mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-10-26 06:22:56 +02:00
Commit Graph

6647 Commits

Author SHA1 Message Date
Rafael Espindola
1fb628bc96 Handle DAG CSE adding new uses during ReplaceAllUsesWith. Fixes PR14333.
llvm-svn: 167912
2012-11-14 05:08:56 +00:00
Anton Korobeynikov
3edf77ac04 Use TARGET2 relocation for TType references on ARM.
Do some cleanup of the code while here.

Inspired by patch by Logan Chien!

llvm-svn: 167904
2012-11-14 01:47:00 +00:00
Eric Christopher
b3e4c78741 Revert "Use the 'count' attribute instead of the 'upper_bound' attribute."
temporarily as it is breaking the gdb bots.

This reverts commit r167806/e7ff4c14b157746b3e0228d2dce9f70712d1c126.

llvm-svn: 167886
2012-11-13 23:30:43 +00:00
Manman Ren
e98ec5dd77 X86: when constructing VZEXT_LOAD from other loads, make sure its output
chain is correctly set up.

As an example, if the original load must happen before later stores, we need
to make sure the constructed VZEXT_LOAD is constrained to be before the stores.

rdar://12684358

llvm-svn: 167859
2012-11-13 19:13:05 +00:00
Ulrich Weigand
9c5e333c90 Do not consider a machine instruction that uses and defines the same
physical register as a candidate for common subexpression elimination
in MachineCSE.

This fixes a bug on PowerPC in MultiSource/Applications/oggenc/oggenc
caused by MachineCSE invalidly merging two separate DYNALLOC insns.

llvm-svn: 167855
2012-11-13 18:40:58 +00:00
Duncan Sands
7c55936d5f Codegen support for arbitrary vector getelementptrs.
llvm-svn: 167830
2012-11-13 13:01:58 +00:00
Bill Wendling
ab44d906b6 Use the 'count' attribute instead of the 'upper_bound' attribute.
If we have a type 'int a[1]' and a type 'int b[0]', the generated DWARF is the
same for both of them because we use the 'upper_bound' attribute. Instead use
the 'count' attribute, which gives the correct number of elements in the array.
<rdar://problem/12566646>

llvm-svn: 167806
2012-11-13 02:31:47 +00:00
Andrew Trick
d8c621a864 Cleanup the main RegisterCoalescer loop.
Block priorities still apply outside loops.

llvm-svn: 167793
2012-11-13 00:34:44 +00:00
Michael Liao
c260a62eb2 Fix test case added in patch fixing PR14314
llvm-svn: 167769
2012-11-12 22:33:18 +00:00
Andrew Trick
ad4b55b3d8 misched: Infrastructure for weak DAG edges.
This adds support for weak DAG edges to the general scheduling
infrastructure in preparation for MachineScheduler support for
heuristics based on weak edges.

llvm-svn: 167738
2012-11-12 19:28:57 +00:00
Michael Liao
5bf5c77881 Fix PR14314
- Fix operand order for atomic sub, where the minuend is the value
  loaded from memory and the subtrahend is the parameter specified.

llvm-svn: 167718
2012-11-12 06:49:17 +00:00
Justin Holewinski
da9a98c364 [NVPTX] Add more precise PTX/SM target attributes
Each SM and PTX version is modeled as a subtarget feature/CPU. Additionally,
PTX 3.1 is added as the default PTX version to be out-of-the-box compatible
with CUDA 5.0.

Available CPUs for this target:

  sm_10 - Select the sm_10 processor.
  sm_11 - Select the sm_11 processor.
  sm_12 - Select the sm_12 processor.
  sm_13 - Select the sm_13 processor.
  sm_20 - Select the sm_20 processor.
  sm_21 - Select the sm_21 processor.
  sm_30 - Select the sm_30 processor.
  sm_35 - Select the sm_35 processor.

Available features for this target:

  ptx30 - Use PTX version 3.0.
  ptx31 - Use PTX version 3.1.
  sm_10 - Target SM 1.0.
  sm_11 - Target SM 1.1.
  sm_12 - Target SM 1.2.
  sm_13 - Target SM 1.3.
  sm_20 - Target SM 2.0.
  sm_21 - Target SM 2.1.
  sm_30 - Target SM 3.0.
  sm_35 - Target SM 3.5.

llvm-svn: 167699
2012-11-12 03:16:43 +00:00
Evan Cheng
2599006e46 Convert an improper CodeGen test to an MC test.
llvm-svn: 167663
2012-11-10 04:30:40 +00:00
Evan Cheng
6ed26ba70c xfail a bad test. This is an MC test, but it's dependent on a codegen optimization which is now disabled.
llvm-svn: 167658
2012-11-10 02:34:36 +00:00
Evan Cheng
ebe241fb9d Disable the Thumb no-return call optimization:
mov lr, pc
b.w _foo

The "mov" instruction doesn't set bit zero to one, it's putting incorrect
value in lr. It messes up backtraces.

rdar://12663632

llvm-svn: 167657
2012-11-10 02:09:05 +00:00
Craig Topper
f424da6ff9 Cleanup pcmp(e/i)str(m/i) instruction definitions and load folding support.
llvm-svn: 167652
2012-11-10 01:23:36 +00:00
Justin Holewinski
be8faeed70 [NVPTX] Use ABI alignment for parameters when alignment is not specified.
Affects SM 2.0+.  Fixes bug 13324.

llvm-svn: 167646
2012-11-09 23:50:24 +00:00
Jakob Stoklund Olesen
887571e652 Fix assertions in updateRegMaskSlots().
The RegMaskSlots contains 'r' slots while NewIdx and OldIdx are 'B'
slots. This broke the checks in the assertions.

This fixes PR14302.

llvm-svn: 167625
2012-11-09 19:18:49 +00:00
Amara Emerson
f7a46cedbc Recommit modified r167540.
Improve ARM build attribute emission for architecture types.
This also changes the default architecture emitted for a generic CPU to "v7".

llvm-svn: 167574
2012-11-08 09:51:45 +00:00
Michael Liao
59114df23b Add support for RTM from the TSX extension
- Add RTM code generation support through 3 X86 intrinsics:
  xbegin()/xend() to start/end a transaction region, and xabort() to abort a
  transaction region

llvm-svn: 167573
2012-11-08 07:28:54 +00:00
Akira Hatanaka
b8f5a8ab0b [mips] Custom-lower ISD::FRAME_TO_ARGS_OFFSET node.
Patch by Sasa Stankovic.

llvm-svn: 167548
2012-11-07 19:10:58 +00:00
Andrew Trick
8b72906a53 misched: Heuristics based on the machine model.
misched is disabled by default. With -enable-misched, these heuristics
balance the schedule to simultaneously avoid saturating processor
resources, expose ILP, and minimize register pressure. I've been
analyzing the performance of these heuristics on everything in the
llvm test suite in addition to a few other benchmarks. I would like
each heuristic check to be verified by a unit test, but I'm still
trying to figure out the best way to do that. The heuristics are still
in considerable flux, but as they are refined we should be rigorous
about unit testing the improvements.

llvm-svn: 167527
2012-11-07 07:05:09 +00:00
Ulrich Weigand
5e496676d0 On PowerPC64, integer return values (as well as arguments) are supposed
to be extended to a full register.   This is modeled in the IR by marking
the return value (or argument) with a signext or zeroext attribute.

However, while these attributes are respected for function arguments,
they are currently ignored for function return values by the PowerPC
back-end.  This patch updates PPCCallingConv.td to ask for the promotion
to i64, and fixes LowerReturn and LowerCallResult to implement it.

The new test case verifies that both arguments and return values are
properly extended when passing them; and also that the optimizers
understand incoming argument and return values are in fact guaranteed
by the ABI to be extended.

The patch caused a spurious breakage in CodeGen/PowerPC/coalesce-ext.ll,
since the test case used a "ret" instruction to create a use of an i32
value at the end of the function (to set up data flow as required for
what the test is intended to test).  Since there's now an implicit
promotion to i64, that data flow no longer works as expected.  To fix
this, this patch now adds an extra "add" to ensure we have an appropriate
use of the i32 value.

llvm-svn: 167396
2012-11-05 19:39:45 +00:00
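A rough C illustration of what the ABI requires (function names are made up; the signext/zeroext markers themselves live on the IR return value, not in the C source):

    /* Under the PPC64 ELF ABI both return values travel in r3 and the callee
       must extend them to the full 64-bit register; frontends express this by
       marking the IR return value with signext or zeroext. */
    int sadd(int a, int b) { return a + b; }                 /* sign-extended */
    unsigned uadd(unsigned a, unsigned b) { return a + b; }  /* zero-extended */
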
Hal Finkel
a82b79fc22 Add support for the PowerPC-specific inline asm Z constraint and y modifier.
The Z constraint specifies an r+r memory address, and the y modifier expands
to the "r, r" in the asm string. For this initial implementation, the base
register is forced to r0 (which has the special meaning of 0 for r+r addressing
on PowerPC) and the full address is taken in the second register. In the
future, this should be improved.

llvm-svn: 167388
2012-11-05 18:18:42 +00:00
Adhemerval Zanella
382ede5fd4 [PATCH] PowerPC: Expand load extend vector operations
This patch expands the SEXTLOAD, ZEXTLOAD, and EXTLOAD operations for
vector types when altivec is enabled.

llvm-svn: 167386
2012-11-05 17:15:56 +00:00
Akira Hatanaka
1bfa522bfe [mips] Set the neverHasSideEffects flag on floating point conversion
instructions.

llvm-svn: 167348
2012-11-03 00:53:12 +00:00
Akira Hatanaka
61434a3632 [mips] Set the isAsCheapAsAMove flag on the LUi instruction.
llvm-svn: 167345
2012-11-03 00:26:02 +00:00
Akira Hatanaka
06b2c52edc [mips] Stop reserving register AT and use register scavenger when a scratch
register is needed.

llvm-svn: 167341
2012-11-03 00:05:43 +00:00
Akira Hatanaka
5a2c763cb0 [mips] Fix bug in test case. Disable machine LICM to prevent instruction from
being moved out of a basic block.

llvm-svn: 167322
2012-11-02 21:46:42 +00:00
Quentin Colombet
522698f693 Vext Lowering was missing opportunities
llvm-svn: 167318
2012-11-02 21:32:17 +00:00
Akira Hatanaka
25aba90c6b [mips] Use register number instead of name to print register $AT.
llvm-svn: 167315
2012-11-02 21:26:03 +00:00
Akira Hatanaka
ba57d98a50 [mips] Delete MipsFunctionInfo::EmitNOAT. Unconditionally print directive
"set .noat" so that the assembler doesn't issue warnings when register $AT is
used.

llvm-svn: 167310
2012-11-02 20:56:25 +00:00
NAKAMURA Takumi
e2c67c353f test/CodeGen/X86/fp-fast.ll: Add +avx.
llvm-svn: 167207
2012-11-01 02:13:45 +00:00
Owen Anderson
8f66d7107c Add a few more simple fast-math constant propagations and cancellations.
llvm-svn: 167200
2012-11-01 02:00:53 +00:00
Shuxin Yang
4762eb1825 (For X86) Enhancement to the add-carry/sub-borrow (adc/sbb) optimization.
The adc/sbb optimization is able to convert the following expressions
into a single adc/sbb instruction:
  (ult) ... = x + 1 // where the ult is an unsigned-less-than comparison
  (ult) ... = x - 1

  This change flips the "x >u y" (i.e. ugt) comparison in order
to expose the adc/sbb opportunity.

llvm-svn: 167180
2012-10-31 23:11:48 +00:00
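A loose C sketch of the kind of pattern involved (illustrative only; the exact DAG patterns matched are the ones described above):

    /* An unsigned (ugt) comparison feeding an add or subtract can be folded
       by the X86 backend into a single adc/sbb once the compare is flipped
       into the form the combine recognizes. */
    unsigned long add_with_carry(unsigned long a, unsigned long x, unsigned long y) {
        return a + (x > y);   /* compare result used as a carry  */
    }
    unsigned long sub_with_borrow(unsigned long a, unsigned long x, unsigned long y) {
        return a - (x > y);   /* compare result used as a borrow */
    }
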
Akira Hatanaka
245eaafd42 [mips] Set isAsCheapAsAMove flag on ADDiu and DADDiu, which enables
re-materialization of immediate loads.

llvm-svn: 167153
2012-10-31 18:37:55 +00:00
Akira Hatanaka
6632837f2e Test case for r167039. Check that tail-call optimization is disabled for
mips16.

llvm-svn: 167139
2012-10-31 17:25:23 +00:00
Reed Kotler
82a7f09a30 Implement ADJCALLSTACKUP and ADJCALLSTACKDOWN
llvm-svn: 167107
2012-10-31 05:21:10 +00:00
Bill Schmidt
f4c899f8e7 This patch addresses an ABI compatibility issue with empty aggregate
parameters.  Examples of these are:

  struct { } a;
  union { } b[256];
  int a[0];

An empty aggregate has an address, although dereferencing that address is
pointless.  When passed as a parameter, an empty aggregate does not consume
a protocol register, nor does it consume a doubleword in the parameter save
area.  Passing an empty aggregate by reference passes an address just as
for any other aggregate.  Returning an empty aggregate uses GPR3 as a hidden
address of the return value location, just as for any other aggregate.

The patch modifies PPCTargetLowering::LowerFormalArguments_64SVR4 and
PPCTargetLowering::LowerCall_64SVR4 to properly skip empty aggregate
parameters passed by value.  The handling of return values and by-reference
parameters was already correct.

Built on powerpc64-unknown-linux-gnu and tested with no new regressions.
A test case is included to test proper handling of empty aggregate
parameters on both sides of the function call protocol.

llvm-svn: 167090
2012-10-31 01:15:05 +00:00
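A small illustration of the by-value case the patch fixes (GNU C, where an empty struct has size 0; names are made up):

    struct empty { };                       /* size 0 */
    /* e consumes neither a protocol register nor a doubleword of the parameter
       save area, so x still arrives in the first GPR (r3). */
    long f(struct empty e, long x) { return x; }
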
Manman Ren
f26bd7d8f9 X86 SSE: update rsqrtss and rcpss to use two source operands, with
the first source operand tied to the destination operand.

This is to accurately model the corresponding instructions where the upper
bits are unmodified.

rdar://12558838
PR14221

llvm-svn: 167064
2012-10-30 23:53:59 +00:00
Manman Ren
584c3daf8d X86 MMX: optimize transfer from mmx to i32
We used to generate a store (movq) + a load.
Now we use movd.

rdar://9946746

llvm-svn: 167056
2012-10-30 22:15:38 +00:00
Akira Hatanaka
dbea525cfc [mips] Allow tail-call optimization for vararg functions and functions which
use the caller's stack.

llvm-svn: 167048
2012-10-30 20:16:31 +00:00
Adhemerval Zanella
74fd05ff3f PowerPC: Expand FSQRT for vector types
This patch expands FSQRT for floating point vector types when altivec is
used.

llvm-svn: 167034
2012-10-30 18:29:42 +00:00
Quentin Colombet
dde058d386 Change ForceSizeOpt attribute into MinSize attribute
llvm-svn: 167020
2012-10-30 16:32:52 +00:00
Adhemerval Zanella
ac3ba40bc2 PowerPC: More support for Altivec compare operations
This patch adds more support for vector type comparisons using altivec.
It adds correct support for v16i8, v8i16, v4i32, and v4f32 vector
types for comparison operators ==, !=, >, >=, <, and <=.

llvm-svn: 167015
2012-10-30 13:50:19 +00:00
Hans Wennborg
40eb1b4055 Use TargetTransformInfo to control switch-to-lookup table transformation
When the switch-to-lookup tables transform landed in SimplifyCFG, it
was pointed out that this could be inappropriate for some targets.
Since there was no way at the time for the pass to know anything about
the target, an awkward reverse-transform was added in CodeGenPrepare
that turned lookup tables back into switches for some targets.

This patch uses the new TargetTransformInfo to determine if a
switch should be transformed, and removes
CodeGenPrepare::ConvertLoadToSwitch.

llvm-svn: 167011
2012-10-30 11:23:25 +00:00
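An illustrative C switch of the kind SimplifyCFG can turn into a constant lookup table; with this change, whether it is converted is driven by TargetTransformInfo rather than undone later in CodeGenPrepare:

    const char *day_name(int d) {
        switch (d) {                        /* dense, side-effect-free cases */
        case 0: return "Sun"; case 1: return "Mon"; case 2: return "Tue";
        case 3: return "Wed"; case 4: return "Thu"; case 5: return "Fri";
        case 6: return "Sat";
        default: return "?";
        }
    }
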
Reed Kotler
de0ea1027e Change mips16 delay slot jumps to non delay slot forms by default.
We will make them delay slot forms if there is something that can be
placed in the delay slot during a separate pass. Mips16 extended instructions
cannot be placed in delay slots.

llvm-svn: 166990
2012-10-30 00:54:49 +00:00
Jakub Staszak
f1cddf738b Re-commit r166971. I reverted it too quickly, before the buildbots had a chance
to test it with chapni's fix (-mattr=+avx).

llvm-svn: 166985
2012-10-30 00:01:57 +00:00
Jakub Staszak
ce95e4429f Revert r166971. It causes buildbot failure. To be investigated.
llvm-svn: 166979
2012-10-29 23:13:50 +00:00
NAKAMURA Takumi
d544c8f4ca llvm/test/CodeGen/X86/vec_shuffle-30.ll: Try to unbreak builds - assuming +avx.
llvm-svn: 166974
2012-10-29 22:45:18 +00:00
Jakub Staszak
ded6f21890 Allow folding a vector load if there is more than one bitcast, so in the case:
%0 = load <8 x i16>* %dest
%1 = shufflevector <8 x i16> %0, <8 x i16> %in,
      <8 x i32> < i32 0, i32 1, i32 2, i32 3, i32 13, i32 undef, i32 14, i32 14>
store <8 x i16> %1, <8 x i16>* %dest

We get:
  vmovlpd (%eax), %xmm0, %xmm0

instead of:
  vmovaps (%eax), %xmm1
  vmovsd  %xmm1, %xmm0, %xmm0

No extra test-case is added. I just fixed the existing one
(also it uses FileCheck now).

llvm-svn: 166971
2012-10-29 21:56:35 +00:00
Bill Schmidt
77a8fd274b This patch solves a problem with passing varargs parameters under the PPC64
ELF ABI.

A varargs parameter consisting of a single-precision floating-point value,
or of a single-element aggregate containing a single-precision floating-point
value, must be passed in the low-order (rightmost) four bytes of the
doubleword stack slot reserved for that parameter.  If there are GPR protocol
registers remaining, the parameter must also be mirrored in the low-order
four bytes of the reserved GPR.

Prior to this patch, such parameters were being passed in the high-order
four bytes of the stack slot and the mirrored GPR.

The patch adds a new test case to verify the correct code generation.

llvm-svn: 166968
2012-10-29 21:18:16 +00:00
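A self-contained C sketch of the affected case (names are made up): the single-element aggregate keeps its single-precision value, which must land in the low-order four bytes of its doubleword slot and of the mirroring GPR, if one remains.

    #include <stdarg.h>

    struct wrap { float f; };               /* single-element aggregate */

    float take_first(int n, ...) {
        va_list ap;
        va_start(ap, n);
        struct wrap w = va_arg(ap, struct wrap);
        va_end(ap);
        return w.f;
    }

    float caller(void) {
        struct wrap w = { 1.0f };
        return take_first(1, w);            /* varargs parameter of struct type */
    }
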
Reed Kotler
3859f5469e Implement patterns for extloadi8 and extloadi16
llvm-svn: 166960
2012-10-29 19:39:04 +00:00
Ulrich Weigand
445bd73056 In various places throughout the code generator, there were special
checks to avoid performing compile-time arithmetic on PPCDoubleDouble.

Now that APFloat supports arithmetic on PPCDoubleDouble, those checks
are no longer needed, and we can treat the type like any other.

llvm-svn: 166958
2012-10-29 18:35:49 +00:00
Chad Rosier
3b32dec25d Remove redundant test case from r166949, per Eli's suggestion.
llvm-svn: 166953
2012-10-29 18:18:26 +00:00
Chad Rosier
651ecf255c [ms-inline asm] Add support for the [] operator. Essentially, [expr1][expr2] is
equivalent to [expr1 + expr2].  See test cases for more examples.
rdar://12470392

llvm-svn: 166949
2012-10-29 18:01:54 +00:00
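A hypothetical MS-style inline asm fragment showing the operator (x86 only, MSVC/clang-cl syntax; the array and names are made up):

    static int arr[8] = { 0, 1, 2, 3, 4, 5, 6, 7 };

    int peek(void) {
        int v;
        __asm {
            mov ebx, 8
            mov esi, 4
            mov eax, arr[ebx][esi]   ; same address as arr[ebx + esi], i.e. arr[3]
            mov v, eax
        }
        return v;
    }
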
Michael Liao
d26c27ad35 Fix PR14204
- Add missing pattern on X86ISD::VZEXT from VR256 to VR256 when AVX2 is enabled.

llvm-svn: 166947
2012-10-29 17:57:12 +00:00
Jakob Stoklund Olesen
05cec5db28 Completely disallow partial copies in adjustCopiesBackFrom().
Partial copies can show up even when CoalescerPair.isPartial() returns
false. For example:

   %vreg24:dsub_0<def> = COPY %vreg31:dsub_0; QPR:%vreg24,%vreg31

Such a partial-partial copy is not good enough for the transformation
adjustCopiesBackFrom() needs to do.

llvm-svn: 166944
2012-10-29 17:51:52 +00:00
Ulrich Weigand
2daab9e4b4 Allow i32/i64 for 'f' constraint on PowerPC.
This fixes PR12757.

llvm-svn: 166943
2012-10-29 17:49:34 +00:00
Reed Kotler
56b8c348f2 Expand all atomic ops for mips16.
llvm-svn: 166935
2012-10-29 16:16:54 +00:00
Preston Gurd
1683865eea This patch addresses a problem with the Post RA scheduler generating an
incorrect instruction sequence due to it not being aware that an
inline assembly instruction may reference memory.

This patch fixes the problem by causing the scheduler to always assume that any
inline assembly code instruction could access memory. This is necessary because
the internal representation of the inline instruction does not include
any information about memory accesses.
 
This should fix PR13504.

llvm-svn: 166929
2012-10-29 15:01:23 +00:00
Bill Schmidt
2682b73f6d This patch adds alignment information for long double to the 64-bit PowerPC
ELF subtarget.

The existing logic is used as a fallback to avoid any changes to the Darwin
ABI.  PPC64 ELF now has two possible data layout strings: one for FreeBSD,
which requires 8-byte alignment, and a default string that requires
16-byte alignment.

I've added a test for PPC64 Linux to verify the 16-byte alignment.  If
somebody wants to add a separate test for FreeBSD, that would be great.

Note that there is a companion patch to update the alignment information
in Clang, which I am committing now as well.

llvm-svn: 166928
2012-10-29 14:59:36 +00:00
Reed Kotler
1b4c985fc8 Implement brind operator for mips16.
llvm-svn: 166903
2012-10-28 23:08:07 +00:00
Reed Kotler
16eaf8644a This patch is for the implementation of mips16 complex pattern addr16.
Previously mips16 was sharing the pattern addr which is used for mips32
and mips64. This had a number of problems:
1) Storing and loading byte and halfword quantities for mips16 has particular
problems due to the primarily non mips16 nature of SP. When we must
load/store byte/halfword stack objects in a function, we must create a mips16
alias register for SP. This functionality is tested in stchar.ll.
2) We need to have an FP register under certain conditions (such as 
dynamically sized alloca). We use mips16 register S0 for this purpose.
In this case, we also use this register when accessing frame objects so this
issue also affects the complex pattern addr16. This functionality is
tested in alloca16.ll.

The Mips16InstrInfo.td has been updated to use addr16 instead of addr.

The complex pattern C++ function for addr has been copied to addr16 and
updated to reflect the above issues.

llvm-svn: 166897
2012-10-28 06:02:37 +00:00
Jakob Stoklund Olesen
2efe8df169 Never attempt to join an early-clobber def with a regular kill.
This fixes PR14194.

llvm-svn: 166880
2012-10-27 17:41:27 +00:00
Quentin Colombet
bcd2bc3437 [code size][ARM] Emit regular call instructions instead of the move, branch sequence
llvm-svn: 166854
2012-10-27 01:10:17 +00:00
Reed Kotler
cc41464454 Implement MipsHi for mips16
llvm-svn: 166852
2012-10-27 00:57:14 +00:00
Akira Hatanaka
a2ae72bd2f [mips] Do not tail-call optimize vararg functions or functions with byval
arguments.

This is rather conservative and should be fixed later to be more aggressive.

llvm-svn: 166851
2012-10-27 00:56:56 +00:00
Akira Hatanaka
4d26f60ca0 [mips] Make sure FuncArg doesn't advance when OrigArgIndex is the same as in the
previous iteration.

llvm-svn: 166850
2012-10-27 00:44:39 +00:00
Jakob Stoklund Olesen
4b4db880a3 Revert r163298 "Optimize codegen for VSETLNi{8,16,32} operating on Q registers."
Keep the integer_insertelement test case, the new coalescer can handle
this kind of lane insertion without help from pseudo-instructions.

llvm-svn: 166835
2012-10-26 23:39:46 +00:00
Reed Kotler
8721a311b3 implement mips16 tls global addr
llvm-svn: 166827
2012-10-26 22:57:32 +00:00
Jakob Stoklund Olesen
1a8c00078d Add GPRPair Register class to ARM.
Some instructions in ARM require 2 even-odd paired GPRs. This
patch adds support for such a register class.

Patch by Weiming Zhao!

llvm-svn: 166816
2012-10-26 21:29:15 +00:00
Reed Kotler
6b0e65fce7 Implement carry for subtract/add for mips16
llvm-svn: 166755
2012-10-26 04:46:26 +00:00
Reed Kotler
fd22c8bfc1 implement large (>16 bit) constant loading.
llvm-svn: 166749
2012-10-26 03:09:34 +00:00
Reed Kotler
72bddf2c9a fix test setgek.ll so that it will not give a false "make check"
failure in some cases

llvm-svn: 166747
2012-10-26 01:29:42 +00:00
Reed Kotler
6ac4e4d880 implement mips16 patterns for select nodes
llvm-svn: 166721
2012-10-25 21:33:30 +00:00
Michael Liao
e5329935bd Add test for ATOM ISA SSSE3
- Remove SSE4.1 feature in other ATOM-based test cases

llvm-svn: 166699
2012-10-25 17:50:05 +00:00
Bill Schmidt
71c462aff2 This patch addresses a PPC64 ELF issue with passing parameters consisting of
structs having size 3, 5, 6, or 7.  Such a struct must be passed and received
as right-justified within its register or memory slot.  The problem is only
present for structs that are passed in registers.

Previously, as part of a patch handling all structs of size less than 8, I
added logic to rotate the incoming register so that the struct was left-
justified prior to storing the whole register.  This was incorrect because
the address of the parameter had already been adjusted earlier to point to
the right-adjusted value in the storage slot.  Essentially I had accidentally
accounted for the right-adjustment twice.

In this patch, I removed the incorrect logic and reorganized the code to make
the flow clearer.

The removal of the rotates changes the expected code generation, so test case
structsinregs.ll has been modified to reflect this.  I also added a new test
case, jaggedstructs.ll, to demonstrate that structs of these sizes can now
be properly received and passed.

I've built and tested the code on powerpc64-unknown-linux-gnu with no new
regressions.  I also ran the GCC compatibility test suite and verified that
earlier problems with these structs are now resolved, with no new regressions.

llvm-svn: 166680
2012-10-25 13:38:09 +00:00
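An illustrative case (names made up): a struct of size 5 passed in a GPR must sit right-justified in its doubleword so that the callee's byte accesses match the layout in the parameter save area.

    struct five { char b[5]; };             /* size 5: right-justified when in a GPR */
    char last(struct five s) { return s.b[4]; }
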
Elena Demikhovsky
d93b6e7cd4 The test avx-intel-ocl.ll failed. I can't reproduce it on any of my machines. I added the -mcpu flag; maybe it will fix the problem.
llvm-svn: 166669
2012-10-25 08:38:42 +00:00
Chad Rosier
492e58a0f7 [ms-inline asm] Add back-end test case for r166632. Make sure we emit the
correct .s output as well as get the correct encoding by the integrated
assembler.

llvm-svn: 166638
2012-10-24 23:10:28 +00:00
Evan Cheng
f97472cdf6 Fix a miscompilation caused by a typo. When turning an adde with a negative value
into an sbc with a positive number, the immediate should be complemented, not
negated. Also added a missing pattern for ARM codegen.

rdar://12559385

llvm-svn: 166613
2012-10-24 19:53:01 +00:00
Elena Demikhovsky
b711ef1960 Special calling conventions for Intel OpenCL built-in library.
llvm-svn: 166566
2012-10-24 14:46:16 +00:00
Michael Liao
18e40965aa Teach DAG combine to fold (buildvec (Xint2fp x)) to (Xint2fp (buildvec x))
- If more than 1 element is defined and the target supports the vectorized
  conversion, use the vectorized one instead to reduce the strength of the
  conversion operation.

llvm-svn: 166546
2012-10-24 04:14:18 +00:00
Michael Liao
70bb8004bd Add custom conversion from v2u32 to v2f32 in 32-bit mode
- As there are no 64-bit GPRs in 32-bit mode, a custom conversion from v2u32 to
  v2f32 is added to improve the efficiency of the code generated.

llvm-svn: 166545
2012-10-24 04:09:32 +00:00
Akira Hatanaka
8c77adc9ce [mips] Make sure sret argument is returned in register V0.
llvm-svn: 166539
2012-10-24 02:10:54 +00:00
Rafael Espindola
8ebde25931 Change x86_fastcallcc to require inreg markers. This allows it to know
the difference between "int x" (which should go in registers) and
"struct y {int x;}" (which should not).

Clang will be updated in the next patches.

llvm-svn: 166536
2012-10-24 01:58:48 +00:00
Michael Liao
7ca099f727 Fix PR14161
- Check that the index being extracted is constant 0 before simplifying.
  Otherwise, retain the original sequence.

llvm-svn: 166504
2012-10-23 21:40:15 +00:00
Michael Liao
23c890e0a8 Enable lowering ZERO_EXTEND/ANY_EXTEND to PMOVZX from SSE4.1
llvm-svn: 166486
2012-10-23 17:34:00 +00:00
Reed Kotler
730040c219 implement setXX patterns
llvm-svn: 166459
2012-10-23 01:35:48 +00:00
Bill Wendling
e97df2d337 When a block ends in an indirect branch, add its successors to the machine basic block.
The CFG of the machine function needs to know that the targets of the indirect
branch are successors to the indirect branch.
<rdar://problem/12529625>

llvm-svn: 166448
2012-10-22 23:30:04 +00:00
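A small GNU C example (computed goto) of the situation being fixed; the indirect branch must list every possible label as a successor in the machine-level CFG:

    int dispatch(int op) {
        static void *targets[] = { &&do_inc, &&do_dec };
        int acc = 0;
        goto *targets[op & 1];              /* indirect branch; successors: do_inc, do_dec */
    do_inc:
        return acc + 1;
    do_dec:
        return acc - 1;
    }
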
Akira Hatanaka
5439c43726 [mips] Use 64-bit registers to return an sret pointer if target ABI is N64.
llvm-svn: 166344
2012-10-19 22:11:40 +00:00
Akira Hatanaka
32235d5612 [mips] Add code to do tail call optimization.
Currently, it is enabled only if option "enable-mips-tail-calls" is given and
all of the callee's arguments are passed in registers.

llvm-svn: 166342
2012-10-19 21:47:33 +00:00
Shuxin Yang
3ad15929e7 This patch is to fix radar://8426430. It is about llvm support of __builtin_debugtrap(),
which is supposed to consistently raise SIGTRAP across all systems. In contrast,
__builtin_trap() behaves differently on different systems: e.g. it raises SIGTRAP on ARM, and
SIGILL on X86. The purpose of __builtin_debugtrap() is to consistently provide "trap"
functionality while preserving compatibility with gcc on __builtin_trap().

  The X86 backend is already able to handle debugtrap(). This patch is to:
  1) make the front-end recognize "__builtin_debugtrap()" (embodied in the one-line change to Clang).
  2) In the DAG legalization phase, by default, "debugtrap" will be replaced with "trap", which
     makes __builtin_debugtrap() "available" to all existing ports without the hassle of
     changing their code.
  3) If a trap function is specified (via -trap-func=xyz to llc), both __builtin_debugtrap() and
     __builtin_trap() will be expanded into a call to the specified trap function.
    This behavior may need to change in the future.

  The provided test case is to make sure 2) and 3) are working for the ARM port, and we
already have a test case for x86.

llvm-svn: 166300
2012-10-19 20:11:16 +00:00
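A minimal usage sketch, assuming a Clang-compatible compiler:

    /* Unlike __builtin_trap(), this is intended to raise SIGTRAP on every
       target; with -trap-func=xyz passed to llc, both builtins instead expand
       into a call to the named trap function. */
    void check_not_null(void *p) {
        if (!p)
            __builtin_debugtrap();
    }
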
Michael Liao
9984a3bee3 Lower BUILD_VECTOR to SHUFFLE + INSERT_VECTOR_ELT for X86
- If INSERT_VECTOR_ELT is supported (above SSE2, either custom or via a
  sequence of legal insns), transform BUILD_VECTOR into SHUFFLE +
  INSERT_VECTOR_ELT if most of the elements can be built from the SHUFFLE with
  only a few (so far 1) elements being inserted.

llvm-svn: 166288
2012-10-19 17:15:18 +00:00
Stepan Dyatkovskiy
ece4c2a9c1 ARM:
Removed the extra stack frame object for fixed byval arguments; the
VarArgsStyleRegisters invocation was reworked due to some improper usage in
the past. PR14099 also demonstrates it.

llvm-svn: 166273
2012-10-19 08:23:06 +00:00
Sebastian Pop
2e267f64cb Clear unknown mem ops when merging stack slots (pr14090)
When merging stack slots, if StackColoring::remapInstructions gets a
value back from GetUnderlyingObject that it does not know about or is
not itself a stack slot, clear the memory operand in case it aliases
the merged slot. This prevents the introduction of incorrect aliasing
information.

Author:    Matthew Curtis <mcurtis@codeaurora.org>
llvm-svn: 166216
2012-10-18 19:53:48 +00:00
Nadav Rotem
12105d6078 In SimplifySelectOps we pulled two loads through a select node despite the fact that one was dependent on the other.
rdar://12513091

llvm-svn: 166196
2012-10-18 18:06:48 +00:00
Ulrich Weigand
2248bca601 This patch fixes failures in the SingleSource/Regression/C/uint64_to_float
test case on PowerPC caused by rounding errors when converting from a 64-bit
integer to a single-precision floating-point value. The reason for this is
double-rounding effects, since on PowerPC we have to convert to an
intermediate double-precision value first, which gets rounded to the
final single-precision result.

The patch fixes the problem by preparing the 64-bit integer so that the
first conversion step to double-precision will always be exact, and the
final rounding step will result in the correctly-rounded single-precision
result.  The generated code sequence is equivalent to what GCC would generate.

When -enable-unsafe-fp-math is in effect, that extra effort is omitted
and we accept possible rounding errors (just like GCC does as well).

llvm-svn: 166178
2012-10-18 13:16:11 +00:00
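The C-level operation being fixed is simply this (sketch); the point is that the result must be correctly rounded in one step rather than rounded once to double and again to float:

    float u64_to_f32(unsigned long long x) {
        return (float)x;    /* must not be double-rounded via an intermediate double */
    }
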
Michael Liao
f58d16a933 Revert part of r166049 back and enable test case in r166125.
- Folding (trunc (concat ... X )) to (concat ... (trunc X) ...) is valid
  when '...' are all 'undef's.
- r166125 relies on this transformation.

llvm-svn: 166155
2012-10-17 23:45:54 +00:00
Michael Liao
1feff8ece7 Disable extract-concat test case temporarily
llvm-svn: 166141
2012-10-17 23:08:19 +00:00