mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-10-22 20:43:44 +02:00
5122 Commits

Author SHA1 Message Date
Akira Hatanaka
6caf61a6ac Test case for 137484
llvm-svn: 137486
2011-08-12 18:12:06 +00:00
Akira Hatanaka
b787f8a8a5 Enclose directive .cprestore with .set macro and nomacro to silence assembler
warning. 

llvm-svn: 137378
2011-08-11 22:42:31 +00:00
Bruno Cardoso Lopes
328a6a980b Add a dag combine to xform 256-bit shuffles into simple vector
inserts and extracts. This simple combine makes us generate only 1
instruction instead of 11 in the v8 case.

llvm-svn: 137362
2011-08-11 21:50:44 +00:00
Bruno Cardoso Lopes
884d8b9cb5 Fix the test added by Nadav in r137308. Make it more strict:
1) check for the "v" version of movaps
2) add a couple of CHECK-NOT to guarantee the behavior
3) move to a more appropriate test file

llvm-svn: 137361
2011-08-11 21:50:35 +00:00
Bruno Cardoso Lopes
38d4afa02f Fix PR10492 by teaching MOVHLPS and MOVLPS mask matching to be more strict.
llvm-svn: 137324
2011-08-11 18:59:13 +00:00
Jim Grosbach
9717a9c0d3 ARM push of a single register encodes as pre-indexed STR.
Per the ARM ARM, a 'push' of a single register encodes as an STR,
not an STM.

llvm-svn: 137318
2011-08-11 18:07:11 +00:00
Jim Grosbach
abaaf4513f ARM pop of a single register encodes as post-indexed LDR.
Per the ARM ARM, a 'pop' of a single register encodes as an LDR,
not an LDM.

llvm-svn: 137316
2011-08-11 17:35:48 +00:00
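As a rough illustration of the two ARM entries above, the sketch below assembles the A32 word-sized load/store (immediate) encoding by hand. The helper names `encode_ldr_str_imm`, `encode_push_single`, and `encode_pop_single` are made up for this example and are not LLVM APIs; the field layout (cond, P, U, B, W, L, Rn, Rt, imm12) is assumed from the A32 single data transfer format.

```python
SP = 13   # stack pointer register number
AL = 0xE  # "always" condition code

def encode_ldr_str_imm(load, pre_indexed, add, writeback, rn, rt, imm12, cond=AL):
    """Hypothetical helper: encode an A32 LDR/STR (immediate, word) instruction."""
    assert 0 <= imm12 < (1 << 12)
    word = cond << 28
    word |= 0b010 << 25             # load/store word with immediate offset
    word |= int(pre_indexed) << 24  # P bit: pre-indexed when set
    word |= int(add) << 23          # U bit: add the offset when set
    # B bit (22) stays 0 for a word-sized access
    word |= int(writeback) << 21    # W bit
    word |= int(load) << 20         # L bit: load when set, store otherwise
    word |= rn << 16
    word |= rt << 12
    word |= imm12
    return word

def encode_push_single(rt):
    # push {rt} is treated as: str rt, [sp, #-4]!
    return encode_ldr_str_imm(load=False, pre_indexed=True, add=False,
                              writeback=True, rn=SP, rt=rt, imm12=4)

def encode_pop_single(rt):
    # pop {rt} is treated as: ldr rt, [sp], #4
    return encode_ldr_str_imm(load=True, pre_indexed=False, add=True,
                              writeback=False, rn=SP, rt=rt, imm12=4)

print(hex(encode_push_single(4)), hex(encode_pop_single(4)))
```

The point of the sketch is only that a single-register push or pop fits the STR/LDR format with SP as the base register, so no register list (STM/LDM) is needed.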
Nadav Rotem
de1b485f3f [AVX] If the data which is going to be saved is already in two XMM registers
(for example, after integer operation), do not pack the registers into a YMM
before saving. It's better to save as two XMM registers.

Before:
                vinsertf128         $1, %xmm3, %ymm0, %ymm3
                vinsertf128         $0, %xmm1, %ymm3, %ymm1
                vmovaps              %ymm1, 416(%rsp)

After:
                vmovaps              %xmm3, 416+16(%rsp)
                vmovaps              %xmm1, 416(%rsp)

llvm-svn: 137308
2011-08-11 16:41:21 +00:00
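A minimal sketch of why the two forms in the entry above leave the same bytes in memory, assuming a flat byte-addressed stack slot at offset 416 as in the listing. The `store` helper and the register contents are invented for illustration; this models the memory effect only, not the instruction cost.

```python
def store(memory, offset, data):
    """Hypothetical helper: copy `data` into `memory` starting at `offset`."""
    memory[offset:offset + len(data)] = data

xmm1 = bytes(range(0, 16))   # value destined for the low 128 bits
xmm3 = bytes(range(16, 32))  # value destined for the high 128 bits

# "Before": pack both halves into one 256-bit value, then do one 32-byte store.
packed = bytearray(512)
ymm = xmm1 + xmm3
store(packed, 416, ymm)

# "After": two independent 16-byte stores, no packing step.
split = bytearray(512)
store(split, 416 + 16, xmm3)
store(split, 416, xmm1)

print(packed == split)
```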
Chris Lattner
3ae8704c4f add missing colon, thanks peter.
llvm-svn: 137306
2011-08-11 16:15:10 +00:00
Chris Lattner
575057916a fix PR10605 / rdar://9930964 by adding a pretty scary missed check.
It's somewhat surprising anything works without this.  Before we would
compile the testcase into:

test:                                   # @test
	movl	$4, 8(%rdi)
	movl	8(%rdi), %eax
	orl	%esi, %eax
	cmpl	$32, %edx
	movl	%eax, -4(%rsp)          # 4-byte Spill
	je	.LBB0_2

now we produce:

test:                                   # @test
	movl	8(%rdi), %eax
	movl	$4, 8(%rdi)
	orl	%esi, %eax
	cmpl	$32, %edx
	movl	%eax, -4(%rsp)          # 4-byte Spill
	je	.LBB0_2
llvm-svn: 137303
2011-08-11 06:26:54 +00:00
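The difference between the two listings in the entry above is the order of the load and the store to `8(%rdi)`. The sketch below models just those instructions on a dictionary-backed memory; the function names and the sample values are assumptions for illustration, and the original testcase is not reproduced here.

```python
def store_then_load(mem, addr, esi):
    """Order from the first listing: the load observes the freshly stored 4."""
    mem[addr] = 4        # movl $4, 8(%rdi)
    eax = mem[addr]      # movl 8(%rdi), %eax
    return eax | esi     # orl %esi, %eax

def load_then_store(mem, addr, esi):
    """Order from the second listing: the load observes the old value."""
    eax = mem[addr]      # movl 8(%rdi), %eax
    mem[addr] = 4        # movl $4, 8(%rdi)
    return eax | esi     # orl %esi, %eax

mem_a = {8: 0x10}
mem_b = {8: 0x10}
print(store_then_load(mem_a, 8, 1), load_then_store(mem_b, 8, 1))
```

Both orders leave 4 in memory, but the value that reaches `%eax` differs whenever the old contents are not already 4.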
Bruno Cardoso Lopes
8674ddf55a Splats for v8i32/v8f32 can be handled by VPERMILPSY. This was causing
infinite recursive calls in legalize. Fix PR10562

llvm-svn: 137296
2011-08-11 02:49:44 +00:00
Bruno Cardoso Lopes
954ac403c7 Use the splat index to generate the desired shuffle. Otherwise we
could only get undefs and the vector shuffle becomes an undef,
generating wrong code.

llvm-svn: 137295
2011-08-11 02:49:41 +00:00
Eli Friedman
17bd9e5d7c Fix X86TargetLowering::LowerExternalSymbol so that it actually works in non-trivial cases. This hasn't been an issue before because the function isn't normally called (but apparently is used to generate a tail-call to sin() on ELF x86-32 with PIC and SSE2).
Fixes PR9693.

llvm-svn: 137292
2011-08-11 01:48:05 +00:00
NAKAMURA Takumi
5d316f7632 test/CodeGen/X86/opt-shuff-tstore.ll: Add explicit -mtriple=x86_64-linux.
llvm-svn: 137262
2011-08-10 22:52:48 +00:00
Devang Patel
393d6e1fd0 While extending definition range of a debug variable, consult lexical scopes also. There is no point extending a debug variable outside its lexical block. This provides a 6x compile-time speedup in some cases.
llvm-svn: 137250
2011-08-10 21:25:34 +00:00
Nadav Rotem
1b3075c0ab Fix the test. Add cpu target.
llvm-svn: 137241
2011-08-10 19:49:19 +00:00
Nadav Rotem
4a8d78d24a When performing a truncating store, it is sometimes possible to rearrange the
data in-register prior to saving to memory.  When we reorder the data in memory
we prevent the need to save multiple scalars to memory, making a single regular
store.

llvm-svn: 137238
2011-08-10 19:30:14 +00:00
Bruno Cardoso Lopes
565ab1542a The following X86 pattern is incorrect:
def : Pat<(X86Movss VR128:$src1,
                   (bc_v4i32 (v2i64 (load addr:$src2)))),
          (MOVLPSrm VR128:$src1, addr:$src2)>;
This matches a MOVSS dag with a MOVLPS instruction. However, MOVSS will replace only the low 32 bits of the register, while the MOVLPS instruction will replace the low 64 bits. A testcase that illustrates the bug is added, and the one that was already present is modified. Patch by Tanya Lattner.

llvm-svn: 137227
2011-08-10 17:45:17 +00:00
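To make the 32-bit versus 64-bit mismatch in the entry above concrete, here is a small model that treats an XMM register as four 32-bit lanes. The functions `movss_node` and `movlps_rm` are illustrative stand-ins written for this sketch, not LLVM or intrinsic names.

```python
def movss_node(src1, loaded):
    """Model of the MOVSS dag node: only lane 0 comes from the loaded value."""
    return [loaded[0]] + list(src1[1:])

def movlps_rm(src1, mem64):
    """Model of MOVLPS from memory: lanes 0 and 1 (64 bits) come from memory."""
    return list(mem64[:2]) + list(src1[2:])

src1 = [10, 11, 12, 13]    # register contents, lane 0 first
loaded = [20, 21, 22, 23]  # 128 bits at addr:$src2, viewed as four lanes

print(movss_node(src1, loaded))
print(movlps_rm(src1, loaded[:2]))
```

The two results agree in lane 0 but disagree in lane 1, which is the extra 32 bits that the incorrect pattern would have overwritten.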
Rafael Espindola
45cd7316b5 Add support for the R and Q constraints.
llvm-svn: 137217
2011-08-10 16:26:42 +00:00
Bruno Cardoso Lopes
4a435a361d Fix a bug in vpermilps mask checking. Fix PR10560
llvm-svn: 137194
2011-08-10 01:54:17 +00:00
Bruno Cardoso Lopes
9a695724bd Add 256-bit support for v8i32, v4i64 and v4f64 ISD::SELECT. Fix PR10556
llvm-svn: 137179
2011-08-09 23:27:13 +00:00
Bruno Cardoso Lopes
7461b930f3 Add v16i16 and v32i8 store patterns
llvm-svn: 137166
2011-08-09 22:39:53 +00:00
Bruno Cardoso Lopes
028c6aa951 Use fp unpack instructions to unpack int types. Until we have AVX2, this
is the best we can do for these patterns. This fixes PR10554.

llvm-svn: 137161
2011-08-09 22:18:37 +00:00
Eli Friedman
44fd5b2b59 Fix a couple ridiculous copy-paste errors. rdar://9914773 .
llvm-svn: 137160
2011-08-09 22:17:39 +00:00
Bill Wendling
250ea7930e Revert r137134. It breaks some code as Eli pointed out.
llvm-svn: 137135
2011-08-09 18:56:35 +00:00
Bill Wendling
ca256c0d2d Print out the variable declaration only if it is a declaration. Otherwise, a
'static' variable will be emitted twice.
PR10081

llvm-svn: 137134
2011-08-09 18:31:50 +00:00
Jakob Stoklund Olesen
e43aca1c39 Inflate register classes after coalescing.
Coalescing can remove copy-like instructions with sub-register operands
that constrained the register class.  Examples are:

  x86: GR32_ABCD:sub_8bit_hi -> GR32
  arm: DPR_VFP2:ssub0 -> DPR

Recompute the register class of any virtual registers that are used by
fewer instructions after coalescing.

This affects code generation for the Cortex-A8 where we use NEON
instructions for f32 operations, c.f. fp_convert.ll:

  vadd.f32  d16, d1, d0
  vcvt.s32.f32  d0, d16

The register allocator is now free to use d16 for the temporary, and
that comes first in the allocation order because it doesn't interfere
with any s-registers.

llvm-svn: 137133
2011-08-09 18:19:41 +00:00
Bruno Cardoso Lopes
633400ee00 Reapply a more appropriate solution than in r137114. AVX supports
v4f64 = sitofp v4i32. This fixes PR10559.
Also add support for v4i32 = fptosi v4f64.

llvm-svn: 137128
2011-08-09 17:39:13 +00:00
Bruno Cardoso Lopes
1962a341d8 Revert r137114
llvm-svn: 137127
2011-08-09 17:39:01 +00:00
Justin Holewinski
021ab783b7 PTX: Add initial support for device function calls
- Calls are supported on SM 2.0+ for function with no return values

llvm-svn: 137125
2011-08-09 17:36:31 +00:00
Bruno Cardoso Lopes
5dac86dac6 Handle sitofp between v4f64 <- v4i32. Fix PR10559
llvm-svn: 137114
2011-08-09 05:48:01 +00:00
Bruno Cardoso Lopes
d521431558 Add support for avx vector fextend
llvm-svn: 137105
2011-08-09 03:04:29 +00:00
Bruno Cardoso Lopes
81534df169 Rename and tidy up tests
llvm-svn: 137103
2011-08-09 03:04:23 +00:00
Bruno Cardoso Lopes
1025d1eb3b Add two patterns to match special vmovss and vmovsd cases. Also fix
the patterns already there to be more strict regarding the predicate.
This fixes PR10558

llvm-svn: 137100
2011-08-09 01:43:09 +00:00
Bruno Cardoso Lopes
d7eac41193 Make LowerVSETCC aware of AVX types and add patterns to match them.
llvm-svn: 137090
2011-08-09 00:46:57 +00:00
Bruno Cardoso Lopes
d8534855ff Add support for several vector shifts operations while in AVX mode. Fix PR10581
llvm-svn: 137067
2011-08-08 21:31:08 +00:00
Eli Friedman
7a34419c6f Fix up the patterns for SXTB, SXTH, UXTB, and UXTH so that they are correctly active without HasT2ExtractPack. PR10611.
llvm-svn: 137061
2011-08-08 19:49:37 +00:00
Jakob Stoklund Olesen
85931574b0 Don't clobber pending ST regs when FP regs are killed.
X86FloatingPoint keeps track of pending ST registers for an upcoming
inline asm instruction with fixed stack register constraints.  It does
this by remembering which FP register holds the value that should appear
at a fixed stack position for the inline asm.

When that FP register is killed before the inline asm, make sure to
duplicate it to a scratch register, so the ST register still has a live
FP reference.

This could happen when the same FP register was copied to two ST
registers, or when a spill instruction is inserted between the ST copy
and the inline asm.

This fixes PR10602.

llvm-svn: 137050
2011-08-08 17:15:43 +00:00
Rafael Espindola
2da6e6a1d8 print st_shndx with the correct number of bits.
llvm-svn: 136880
2011-08-04 15:50:13 +00:00
Rafael Espindola
c1a076eeb1 print st_other with the correct number of bits.
llvm-svn: 136877
2011-08-04 15:38:19 +00:00
Rafael Espindola
368850841d print st_type with the correct number of bits.
llvm-svn: 136875
2011-08-04 15:24:00 +00:00
Rafael Espindola
e08bb3d50f Print st_bind with the correct number of bits.
llvm-svn: 136874
2011-08-04 15:10:35 +00:00
Rafael Espindola
865ab6cb05 Print r_sym with the correct number of bits.
llvm-svn: 136873
2011-08-04 14:48:27 +00:00
Rafael Espindola
f65dd30907 Print r_type with the correct number of bits.
llvm-svn: 136872
2011-08-04 14:39:30 +00:00
Rafael Espindola
edfafcbfb0 Change another counter to decimal.
llvm-svn: 136870
2011-08-04 14:01:03 +00:00
Rafael Espindola
3e8393e6f7 Don't print a counter in hex.
llvm-svn: 136869
2011-08-04 13:39:15 +00:00
Bill Wendling
60e17f8212 Only access both operands of an INSERT_SUBVECTOR if it is an INSERT_SUBVECTOR.
Fixes PR10527.

llvm-svn: 136853
2011-08-04 00:32:58 +00:00
Benjamin Kramer
d93ac7d0b6 Remove underscore that's breaking linux buildbots.
llvm-svn: 136833
2011-08-03 23:13:01 +00:00
Jakub Staszak
9d083611d4 Use MachineBranchProbabilityInfo in If-Conversion instead of its own heuristics.
llvm-svn: 136826
2011-08-03 22:34:43 +00:00
Jakob Stoklund Olesen
002075193b Handle IMPLICIT_DEF instructions in X86FloatingPoint.
This fixes PR10575.

llvm-svn: 136787
2011-08-03 16:33:19 +00:00
Devang Patel
99a2f0d98c Use byte offset, instead of element number, to access merged global.
llvm-svn: 136759
2011-08-03 01:25:46 +00:00
Rafael Espindola
cefc38659a Assume .cfi_startproc is the first thing in a function. If the function is
externally visible, create a local symbol to use in the CFE. If not, use the
function label itself.

Fixes PR10420.

llvm-svn: 136716
2011-08-02 20:24:22 +00:00
Bruno Cardoso Lopes
ac0984dc7e Make this kind of lowering supported by 256-bit instructions:
shuffle (scalar_to_vector (load (ptr + 4))), undef, <0, 0, 0, 0>
To:
  shuffle (vload ptr), undef, <1, 1, 1, 1>
Fix PR10494

llvm-svn: 136691
2011-08-02 16:06:18 +00:00
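The rewrite in the entry above relies on both forms producing the same splat. A minimal sketch, assuming memory is modelled as a list of 32-bit elements so that `ptr + 4` bytes is the element at index 1; both function names are hypothetical.

```python
def splat_via_scalar_load(mem, ptr, width=4):
    """shuffle (scalar_to_vector (load (ptr + 4))), undef, <0, 0, 0, 0>"""
    scalar = mem[ptr + 1]                   # ptr + 4 bytes is the next 32-bit element
    vec = [scalar] + [None] * (width - 1)   # remaining lanes are undef
    return [vec[0] for _ in range(width)]   # all-zero shuffle mask

def splat_via_vector_load(mem, ptr, width=4):
    """shuffle (vload ptr), undef, <1, 1, 1, 1>"""
    vec = mem[ptr:ptr + width]              # one full vector load
    return [vec[1] for _ in range(width)]   # all-one shuffle mask

mem = [100, 101, 102, 103, 104, 105, 106, 107]
print(splat_via_scalar_load(mem, 0), splat_via_vector_load(mem, 0))
```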
Bruno Cardoso Lopes
771876cade Add v4f64 -> v2f32 fp_round support. Also add a testcase to exercise
the legalizer. This commit together with the two previous ones fixes
PR10495.

llvm-svn: 136654
2011-08-01 21:54:09 +00:00
Bruno Cardoso Lopes
d3a5171087 Since vectors with all ones can't be created with a 256-bit instruction,
avoid returning early for v8i32 types, which would only be valid for
vector with all zeros. Also split the handling of zeros and ones into separate
checking logic since they are handled differently. This fixes PR10547

llvm-svn: 136642
2011-08-01 19:51:53 +00:00
Richard Osborne
2cd07cf351 Fix crash with varargs function with no named parameters.
llvm-svn: 136623
2011-08-01 16:45:59 +00:00
Jakob Stoklund Olesen
0f099a3c58 Revert "Don't check liveness of unallocatable registers."
The ARM target depends on CPSR liveness being tracked after register
allocation.

llvm-svn: 136548
2011-07-30 00:57:25 +00:00
Jakob Stoklund Olesen
a05b70241c Don't check liveness of unallocatable registers.
This includes registers like EFLAGS and ST0-ST7. We don't check for
liveness issues in the verifier and scavenger because registers will
never be allocated from these classes.

While in SSA form, we do care about the liveness of unallocatable
unreserved registers. Liveness of EFLAGS and ST0 needs to be correct for
MachineDCE and MachineSinking.

llvm-svn: 136541
2011-07-29 23:36:21 +00:00
Eric Christopher
96b31d5681 Add support for the 'Q' constraint.
Fixes rdar://9866494

llvm-svn: 136523
2011-07-29 21:18:58 +00:00
Bruno Cardoso Lopes
871df895f4 Fix two tests that I crashed in the previous commits. The mask elts
on the second half must be reindexed.

llvm-svn: 136454
2011-07-29 02:05:28 +00:00
Bruno Cardoso Lopes
2b3d85d81c Match VPERMIL masks more strictly and update the target specific mask
generation to always catch the weird cases.

llvm-svn: 136453
2011-07-29 01:31:15 +00:00
Bruno Cardoso Lopes
473d982caf Add v8i32 and v4i64 vpermil patterns
llvm-svn: 136451
2011-07-29 01:31:07 +00:00
Jakob Stoklund Olesen
cc29034b4c Transfer implicit operands in NEONMoveFixPass.
Later passes /are/ using this information when running the register
scavenger.

This fixes the second problem in PR10520.

llvm-svn: 136440
2011-07-29 00:27:35 +00:00
Jakob Stoklund Olesen
f97f492104 Add -verify-arm-pseudo-expand.
This hidden llc option runs the machine code verifier after expanding
ARM pseudo-instructions, but before if-conversion.

The machine code verifier is much better at pointing out liveness errors
that can trip up the register scavenger.

llvm-svn: 136439
2011-07-29 00:27:32 +00:00
Jakob Stoklund Olesen
5f429460ba Handle REG_SEQUENCE with implicitly defined operands.
Code like that would only be produced by bugpoint, but we should still
handle it correctly.

When a register is defined by a REG_SEQUENCE of undefs, the register
itself is undef. Previously, we would create a register with uses but no
defs.

Fixes part of PR10520.

llvm-svn: 136401
2011-07-28 21:38:51 +00:00
Bruno Cardoso Lopes
e24a043703 Add patterns to generate copies for extract_subvector instead of
using vextractf128. This will reduce the number of issued instructions
for several AVX codes.

llvm-svn: 136323
2011-07-28 01:26:50 +00:00
Bruno Cardoso Lopes
1f63a37172 Add a few patterns to match allzeros without having to use the fp unit.
Take advantage that the 128-bit vpxor zeros the higher part and use it.
This also fixes PR10491

llvm-svn: 136321
2011-07-28 01:26:43 +00:00
Bruno Cardoso Lopes
06d8be564f Add SINT_TO_FP and FP_TO_SINT support for v8i32 types. Also move
a convert pattern close to the instruction definition.

llvm-svn: 136320
2011-07-28 01:26:39 +00:00
Bruno Cardoso Lopes
8830fde434 The vpermilps and vpermilpd have different behaviour regarding the
usage of the shuffle bitmask. Both work in 128-bit lanes without
crossing, but in the former the mask of the high part is the same
used by the low part while in the latter both lanes have independent
masks. Handle this properly and add support for vpermilpd.

llvm-svn: 136200
2011-07-27 00:56:34 +00:00
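The mask difference described in the entry above can be sketched as a pair of small simulators for the 256-bit immediate forms. The function names are invented for this example, and the bit layout (four 2-bit selectors shared by both lanes for vpermilps, one bit per destination element for vpermilpd) is my reading of the immediate encoding, offered as an assumption rather than taken from the commit.

```python
def vpermilps_256(src, imm8):
    """src: 8 values, i.e. two 128-bit lanes of four 32-bit elements."""
    out = []
    for lane in range(2):
        base = lane * 4
        for i in range(4):
            sel = (imm8 >> (2 * i)) & 0b11  # same selector reused in both lanes
            out.append(src[base + sel])
    return out

def vpermilpd_256(src, imm8):
    """src: 4 values, i.e. two 128-bit lanes of two 64-bit elements."""
    out = []
    for i in range(4):
        lane_base = (i // 2) * 2
        sel = (imm8 >> i) & 1               # one independent bit per element
        out.append(src[lane_base + sel])
    return out

print(vpermilps_256(list(range(8)), 0b00011011))
print(vpermilpd_256([0, 1, 2, 3], 0b0110))
```

With vpermilps the high lane necessarily mirrors the low lane's permutation, while the vpermilpd call above keeps the low lane in order and swaps only the high lane.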
Devang Patel
e85a416d4e It is quite possible that an inlined function body is split into multiple chunks of consecutive instructions. But there is no way to describe this in the .debug_inline accelerator table used by gdb. However, describe non-contiguous ranges of the inlined function body appropriately using AT_range of the DW_TAG_inlined_subroutine debug info entry.
llvm-svn: 136196
2011-07-27 00:34:13 +00:00
Jakob Stoklund Olesen
3f729850d3 Eliminate copies of undefined values during coalescing.
These copies would coalesce easily, but the resulting value would be
defined by a deleted instruction. Now we also remove the undefined value
number from the destination register.

This fixes PR10503.

llvm-svn: 136174
2011-07-26 23:00:24 +00:00
Benjamin Kramer
32a2ce8416 Update test.
llvm-svn: 136170
2011-07-26 22:45:39 +00:00
Benjamin Kramer
bfc2dfe3f7 Add a neat little two's complement hack for x86.
On x86 we can't encode an immediate LHS of a sub directly. If the RHS comes from a XOR with a constant we can
fold the negation into the xor and add one to the immediate of the sub. Then we can turn the sub into an add,
which can be commuted and encoded efficiently.

This code is generated for __builtin_clz and friends.

llvm-svn: 136167
2011-07-26 22:42:13 +00:00
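The identity behind the entry above is that, in two's complement, `-y == ~y + 1` and `~(x ^ k) == x ^ ~k`, so `C - (x ^ k)` equals `(x ^ ~k) + (C + 1)`. The sketch below checks this on 32-bit values; the function names and the sample constants are assumptions chosen for illustration.

```python
MASK = 0xFFFFFFFF  # model 32-bit wraparound arithmetic

def sub_form(c, x, k):
    """Original shape: immediate on the left-hand side of a subtraction."""
    return (c - (x ^ k)) & MASK

def add_form(c, x, k):
    """Rewritten shape: negation folded into the xor, one added to the immediate."""
    return ((x ^ (~k & MASK)) + (c + 1)) & MASK

# Spot-check a few values, including a clz-like pattern (xor with 31).
samples = [(32, 5, 31), (0, 0, 0), (7, 0xFFFFFFFF, 0x55AA55AA), (MASK, 1, 2)]
print(all(sub_form(c, x, k) == add_form(c, x, k) for c, x, k in samples))
```

The add form has its immediate on an addition, which is commutative, so the constant can be placed in the operand position that x86 is able to encode.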
Bruno Cardoso Lopes
e53bb853ea Recognize unpckh* masks and match 256-bit versions. The new versions are
different from the previous 128-bit because they work in lanes.
Update a few comments and add testcases

llvm-svn: 136157
2011-07-26 22:03:40 +00:00
Eli Friedman
4e16c5341a Prevent x86-specific DAGCombine from creating nodes with illegal type (which could not be selected). Fixes a minor isel issue that was breaking the testcase from r136130.
llvm-svn: 136148
2011-07-26 21:02:58 +00:00
Jim Grosbach
906ecb46ed FileCheck'ize test.
llvm-svn: 136135
2011-07-26 20:49:44 +00:00
Eli Friedman
8779017138 XFAIL this test while I investigate it; it's failing for an unexpected reason.
llvm-svn: 136131
2011-07-26 20:41:03 +00:00
Eli Friedman
e52bee3cc9 Add obvious missing case to switch. PR10497.
llvm-svn: 136130
2011-07-26 20:38:49 +00:00
Bruno Cardoso Lopes
ab40a57cce Add 256-bit isel for movsldup/movshdup
llvm-svn: 136051
2011-07-26 02:39:32 +00:00
Bruno Cardoso Lopes
c94d6a2d2c Codegen allonesvector better while using AVX: vpcmpeqd + vinsertf128
This also fixes PR10452

llvm-svn: 136004
2011-07-25 23:05:32 +00:00
Bruno Cardoso Lopes
9380919dc5 - Handle special scalar_to_vector case: splats. Using a native 128-bit
shuffle before inserting on a 256-bit vector.
- Add AVX versions of movd/movq instructions
- Introduce a few COPY patterns to match insert_subvector instructions.
This turns a trivial insert_subvector instruction into a register copy,
coalescing the xmm into a ymm and avoiding emitting one more instruction.

llvm-svn: 136002
2011-07-25 23:05:25 +00:00
Eli Friedman
dc213dadcc Attempt to fix test failure reported on llvm-commits.
llvm-svn: 135995
2011-07-25 22:28:51 +00:00
Eli Friedman
99fd6d41b5 Make sure this DAGCombine actually returns an UNDEF of the correct type; PR10476.
llvm-svn: 135993
2011-07-25 22:25:42 +00:00
Eli Friedman
234bbb2b95 Get rid of an incorrect optimization for shuffles with PALIGNR and simplify isPALIGNRMask.
Addresses PR10466, although the crash from that PR only triggers in cases where DAGCombine misses optimizing a shuffle.

llvm-svn: 135980
2011-07-25 21:36:45 +00:00
Jakob Stoklund Olesen
0e4f7f92a2 Correctly handle <undef> tied uses when rewriting after a split.
This fixes PR10463. A two-address instruction with an <undef> use
operand was incorrectly rewritten so the def and use no longer used the
same register, violating the tie constraint.

Fix this by always rewriting <undef> operands with the register a def
operand would use.

llvm-svn: 135885
2011-07-24 20:23:50 +00:00
Bruno Cardoso Lopes
7347599e42 Fix test check!
llvm-svn: 135802
2011-07-22 20:55:28 +00:00
Bruno Cardoso Lopes
50a38b479a Fix PR10422 by adding the necessary AVX UCOMISD memory versions to
load folding logic

llvm-svn: 135801
2011-07-22 20:53:20 +00:00
Rafael Espindola
0c8190c4a3 Turn shuffles into unpacks for VT == MVT::v2i64 and MVT::v2f64
too. Patch by Jeff Muizelaar.

llvm-svn: 135789
2011-07-22 18:56:05 +00:00
Bruno Cardoso Lopes
b7b9688aa5 - Inspected an AVX code block added by someone in early Feb. This was never used
and was actually very wrong, fix it and make it simpler. Also remove the
ConcatVectors function, which is unused now.

- Fix an introduction of useless nodes in r126664 and r126264. The
VUNPCKL* should never be introduced because we don't want duplicate
nodes for 128 AVX and non-AVX modes, the actual instruction
difference only exists during isel, but not for target specific DAG
nodes. We only introduce V* target nodes when there is no 128-bit
version already there.

- Fix a fragile test and make it more useful.

llvm-svn: 135729
2011-07-22 00:15:07 +00:00
Bruno Cardoso Lopes
85357a460f Although we already support this, add testcases for consistency
llvm-svn: 135728
2011-07-22 00:15:03 +00:00
Bruno Cardoso Lopes
1ee6122518 Add a DAGCombine for transforming 128->256 casts into a simple
vxorps + vinsertf128 pair of instructions

llvm-svn: 135727
2011-07-22 00:15:00 +00:00
Bruno Cardoso Lopes
3691063149 - Register v16i16 as valid VR256 register class
- Add more bitcasts for v16i16
- Since 135661 and 135662 already added the splat logic,
just add one more splat test for v16i16

llvm-svn: 135663
2011-07-21 02:24:08 +00:00
Bruno Cardoso Lopes
ba1a2a9135 Add support for 256-bit versions of VPERMIL instruction. This is a new
instruction introduced in AVX, which can operate on 128 and 256-bit vectors.
It considers a 256-bit vector as two independent 128-bit lanes. It can permute
any 32- or 64-bit elements inside a lane, and restricts the second lane to
have the same permutation as the first one. With the improved splat support
introduced earlier today, adding codegen for this instruction enables more
efficient 256-bit code:

Instead of:
  vextractf128  $0, %ymm0, %xmm0
  punpcklbw %xmm0, %xmm0
  punpckhbw %xmm0, %xmm0
  vinsertf128 $0, %xmm0, %ymm0, %ymm1
  vinsertf128 $1, %xmm0, %ymm1, %ymm0
  vextractf128  $1, %ymm0, %xmm1
  shufps  $1, %xmm1, %xmm1
  movss %xmm1, 28(%rsp)
  movss %xmm1, 24(%rsp)
  movss %xmm1, 20(%rsp)
  movss %xmm1, 16(%rsp)
  vextractf128  $0, %ymm0, %xmm0
  shufps  $1, %xmm0, %xmm0
  movss %xmm0, 12(%rsp)
  movss %xmm0, 8(%rsp)
  movss %xmm0, 4(%rsp)
  movss %xmm0, (%rsp)
  vmovaps (%rsp), %ymm0
We get:
  vextractf128  $0, %ymm0, %xmm0
  punpcklbw %xmm0, %xmm0
  punpckhbw %xmm0, %xmm0
  vinsertf128 $0, %xmm0, %ymm0, %ymm1
  vinsertf128 $1, %xmm0, %ymm1, %ymm0
  vpermilps $85, %ymm0, %ymm0

llvm-svn: 135662
2011-07-21 01:55:47 +00:00
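The `$85` immediate in the listing above is what you get when all four 2-bit selectors pick element 1. A small sketch of that arithmetic, assuming the 2-bits-per-element immediate layout for vpermilps; `splat_imm` and `decode_selectors` are hypothetical helpers, not LLVM functions.

```python
def splat_imm(index):
    """Build a vpermilps-style immediate that selects `index` for every element."""
    assert 0 <= index < 4
    imm = 0
    for i in range(4):
        imm |= index << (2 * i)  # repeat the 2-bit selector four times
    return imm

def decode_selectors(imm8):
    """Split an 8-bit immediate back into its four 2-bit selectors."""
    return [(imm8 >> (2 * i)) & 0b11 for i in range(4)]

print(splat_imm(1), decode_selectors(85))
```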
Devang Patel
9914fe1aca While emitting constant value, look through derived type and use underlying basic type to determine size and signedness of the constant value.
llvm-svn: 135627
2011-07-20 21:57:04 +00:00
Eli Friedman
3af0eb7b5f PR10421: Fix a straightforward bug in the widening logic for CONCAT_VECTORS.
llvm-svn: 135595
2011-07-20 18:14:33 +00:00
Evan Cheng
380dc98371 Add MCObjectFileInfo and sink the MCSections initialization code from
TargetLoweringObjectFileImpl down to MCObjectFileInfo.

TargetAsmInfo is down to one last method. It's *almost* gone!

llvm-svn: 135569
2011-07-20 05:58:47 +00:00
Eric Christopher
7510091996 New pointer rotate test.
llvm-svn: 135562
2011-07-20 03:09:11 +00:00
Akira Hatanaka
a50bbdfe15 Lower memory barriers to sync instructions.
llvm-svn: 135537
2011-07-19 23:30:50 +00:00
Evan Cheng
9a80b0a7e6 Fix an obvious typo that's preventing x86 (32-bit) from using .literal16.
llvm-svn: 135535
2011-07-19 23:14:32 +00:00
Akira Hatanaka
14e517df43 Use the correct opcodes: SLLV/SRLV or AND must be used instead of SLL/SRL or
ANDi, when the instruction does not have any immediate operands.

llvm-svn: 135520
2011-07-19 20:34:00 +00:00
Akira Hatanaka
f59cbeec14 Remove redundant instructions.
- In EmitAtomicBinaryPartword, mask incr in loopMBB only if atomic.swap is the
  instruction being expanded, instead of masking it in thisMBB. 
- Remove redundant Or in EmitAtomicCmpSwap. 

llvm-svn: 135495
2011-07-19 18:14:26 +00:00
Richard Osborne
b469141419 Add intrinsics for the zext / sext instructions.
llvm-svn: 135476
2011-07-19 13:28:50 +00:00
Richard Osborne
50303e0d38 Add intrinsics for the testct, testwct instructions.
llvm-svn: 135475
2011-07-19 13:00:40 +00:00
Richard Osborne
409c0d7768 Add intrinsics for the peek and endin instructions.
llvm-svn: 135474
2011-07-19 12:50:25 +00:00
Evan Cheng
bfc0cac54d Introduce MCCodeGenInfo, which keeps information that can affect codegen
(including compilation, assembly). Move relocation model Reloc::Model from
TargetMachine to MCCodeGenInfo so it's accessible even without TargetMachine.

llvm-svn: 135468
2011-07-19 06:37:02 +00:00
Devang Patel
72886ba8d8 Revert r135423.
llvm-svn: 135454
2011-07-19 00:28:24 +00:00
Eli Friedman
887bb0b25a FileCheck-ize a couple tests.
llvm-svn: 135427
2011-07-18 21:23:42 +00:00
Devang Patel
389cb9d8c6 During bottom-up fast-isel, instructions emitted to materialize registers are at the top of the basic block and do not have a debug location. This may misguide the debugger while entering the basic block, and sometimes the debugger provides a semi-useful view of the current location to the developer by picking up the previous known location as the current location. Assign a sensible location to the first instruction in a basic block, if it does not have a location derived from the source file, so that the debugger can provide a meaningful user experience to developers in edge cases.
[take 2]

llvm-svn: 135423
2011-07-18 20:55:23 +00:00
Akira Hatanaka
52263f51f1 Do not treat atomic.load.sub differently than other atomic binary intrinsics.
llvm-svn: 135418
2011-07-18 19:58:59 +00:00
Akira Hatanaka
79f38f0ae7 Set mayLoad or mayStore flags for SC and LL in order to prevent LICM from
moving them out of the loop. Previously, stores and loads to a stack frame
object were inserted to accomplish this. Remove the code that was needed to do
this. Patch by Sasa Stankovic.

llvm-svn: 135415
2011-07-18 18:52:12 +00:00
Jakob Stoklund Olesen
89e84069d2 Fix a crash when building 177.mesa for armv6.
When splitting a live range immediately before an LDR_POST instruction
that redefines the address register, make sure to use the correct value
number in leaveIntvBefore.

We need the value number entering the instruction.

<rdar://problem/9793765>

llvm-svn: 135413
2011-07-18 18:47:13 +00:00
Bruno Cardoso Lopes
da90f383ab Add AVX 128-bit sqrt versions
llvm-svn: 135404
2011-07-18 17:51:40 +00:00
Nick Lewycky
47f28ebead Delete empty unused file.
llvm-svn: 135379
2011-07-18 05:54:06 +00:00
Bruno Cardoso Lopes
d258749f73 Add AVX 128-bit patterns for sint_to_fp
llvm-svn: 135332
2011-07-16 00:50:20 +00:00
Bruno Cardoso Lopes
d5b62f3403 Fix a couple of things:
1) Make non-legal 256-bit loads to be promoted to v4i64. This lets us
canonicalize the loads and handle things the same way we used to handle
for 128-bit registers. Despite what one of the removed comments
explained, the load promotion would not mess with VPERM, it's only a
matter of doing the appropriate bitcasts when this instruction comes
to be introduced. Also make LOAD v8i32 legal.

2) Doing 1) exposed two bugs:
- v4i64 was being promoted to itself for several opcodes (introduced
in r124447 by David Greene) causing endless recursion and the stack to
explode.
- there was no support for allOnes BUILD_VECTORs and ANDNP would fail to
match because it was generating early target constant pools during
lowering.

3) The testcases are already checked-in, doing 1) exposed the
bugs in the current testcases.

4) Tidy up code to be more clear and explicit about AVX.

llvm-svn: 135313
2011-07-15 22:24:33 +00:00
Owen Anderson
7a380bac06 Remove VMOVDneon and VMOVQ, which are just aliases for VORR. This continues to simplify the path towards an auto-generated disassembler.
llvm-svn: 135290
2011-07-15 18:46:47 +00:00
Eric Christopher
ca7ae418a5 Check register class matching instead of width of type matching
when determining validity of matching constraint. Allow i1
types access to the GR8 reg class for x86.

Fixes PR10352 and rdar://9777108

llvm-svn: 135180
2011-07-14 20:13:52 +00:00
Bruno Cardoso Lopes
d24f039847 Add 256-bit load/store recognition and matching in several places.
llvm-svn: 135171
2011-07-14 18:50:58 +00:00
Eric Christopher
be21240f6f Add a testcase for r135123.
Part of rdar://9761830

llvm-svn: 135133
2011-07-14 06:23:09 +00:00
Benjamin Kramer
1cab6179ab Don't emit a bit test if there is only one case the test can yield false. A simple SETNE is sufficient.
llvm-svn: 135126
2011-07-14 01:38:42 +00:00
Bruno Cardoso Lopes
f29783ee55 We already support 256-bit packed ADD, SUB, DIV, MUL. Add testcases.
llvm-svn: 135099
2011-07-13 22:28:55 +00:00
Bruno Cardoso Lopes
c0401dddf7 Make X86ISD::ANDNP more general and Codegen 256-bit VANDNP. A more
general version of X86ISD::ANDNP also opened the room for a little bit
of refactoring.

llvm-svn: 135088
2011-07-13 21:36:51 +00:00
Eli Friedman
30d557cc28 Make sure we don't combine a large displacement and a frame index in the same addressing mode on x86-64. It can overflow, leading to a crash/miscompile.
<rdar://problem/9763308>

llvm-svn: 135084
2011-07-13 21:29:53 +00:00
Bruno Cardoso Lopes
cb49278ad6 AVX Codegen support for 256-bit versions of vandps, vandpd, vorps, vorpd, vxorps, vxorpd
llvm-svn: 135023
2011-07-13 01:15:33 +00:00
Evan Cheng
37ff73dfaf Improve codegen for select's:
if (x != 0) x = 1
if (x == 1) x = 1

Previous codegen looks like this:
        mov     r1, r0
        cmp     r1, #1
        mov     r0, #0
        moveq   r0, #1

The naive lowering selects between two different values. It should recognize that the
test is an equality test, so it's more a conditional move than a select:
        cmp     r0, #1
        movne   r0, #0

rdar://9758317

llvm-svn: 135017
2011-07-13 00:42:17 +00:00
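The two ARM sequences in the entry above can be modelled register by register to see that they compute the same result. This is a sketch with invented function names; it mirrors the instruction listings only and says nothing about how the lowering is implemented.

```python
def select_sequence(x):
    """mov r1, r0 ; cmp r1, #1 ; mov r0, #0 ; moveq r0, #1"""
    r1 = x
    r0 = 0
    if r1 == 1:   # moveq executes only when the compare was equal
        r0 = 1
    return r0

def cmov_sequence(x):
    """cmp r0, #1 ; movne r0, #0"""
    r0 = x
    if r0 != 1:   # movne executes only when the compare was not equal
        r0 = 0
    return r0

print(all(select_sequence(x) == cmov_sequence(x) for x in range(-4, 5)))
```

The shorter form works because, on the equal path, the register already holds the constant being selected.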
Jim Grosbach
863f0216d5 Improve test cases from r134746.
Use memory barriers to force if-conversion off for these tests instead of
the internal llc command line option ifcvt-limit.

llvm-svn: 134986
2011-07-12 16:06:01 +00:00
Andrew Trick
a53688c65c Comment correction.
llvm-svn: 134958
2011-07-12 03:39:22 +00:00
Jim Grosbach
93f2ebb5e7 Simplify printing of ARM shifted immediates.
Print shifted immediate values directly rather than as a payload+shifter
value pair. This makes for more readable output assembly code, simplifies
the instruction printer, and is consistent with how Thumb immediates are
 displayed.

llvm-svn: 134902
2011-07-11 16:48:36 +00:00
NAKAMURA Takumi
183ec41f4a test/CodeGen/PowerPC/vector.ll: Tweak redirection >%t >%t to >%t >>%t. See also r134814 (test/CodeGen/X86/vector.ll).
llvm-svn: 134900
2011-07-11 16:21:52 +00:00
Cameron Zwarich
1efde78890 Add a missing test for r134882.
llvm-svn: 134889
2011-07-11 08:35:17 +00:00
Chris Lattner
a106725fc5 Land the long talked about "type system rewrite" patch. This
patch brings numerous advantages to LLVM.  One way to look at it
is through diffstat:
 109 files changed, 3005 insertions(+), 5906 deletions(-)

Removing almost 3K lines of code is a good thing.  Other advantages
include:

1. Value::getType() is a simple load that can be CSE'd, not a mutating
   union-find operation.
2. Types are uniqued and never move once created, defining away PATypeHolder.
3. Structs can be "named" now, and their name is part of the identity that
   uniques them.  This means that the compiler doesn't merge them structurally
   which makes the IR much less confusing.
4. Now that there is no way to get a cycle in a type graph without a named
   struct type, "upreferences" go away.
5. Type refinement is completely gone, which should make LTO much MUCH faster
   in some common cases with C++ code.
6. Types are now generally immutable, so we can use "Type *" instead of
   "const Type *" everywhere.

Downsides of this patch are that it removes some functions from the C API,
so people using those will have to upgrade to (not yet added) new API.  
"LLVM 3.0" is the right time to do this.

There are still some cleanups pending after this, this patch is large enough
as-is.

llvm-svn: 134829
2011-07-09 17:41:24 +00:00
Chris Lattner
4ddffa2acc more tests not making the jump into the brave new world.
llvm-svn: 134820
2011-07-09 16:57:10 +00:00
NAKAMURA Takumi
2cbabf301a test/CodeGen/X86/vector.ll: Tweak temporary output to appease Win32 hosts.
With Lit (not bash) in a test, multiple redirects >%t might open(%t, "w") multiple times. It can be avoided if the latter redirect is >>%t.

It might work even if ">/dev/null" were used.

llvm-svn: 134814
2011-07-09 10:22:28 +00:00
Jakob Stoklund Olesen
fe41eb3bda Hoist spills within a basic block.
Try to move spills as early as possible in their basic block. This can
help eliminate interferences by shortening the live range being
spilled.

This fixes PR10221.

llvm-svn: 134776
2011-07-09 00:25:03 +00:00
Evan Cheng
9719ca7c76 Fix broken x86_64 tests which specify non-64-bit cpu's.
llvm-svn: 134756
2011-07-08 22:29:33 +00:00
Eli Friedman
0ea2c325a9 Default 64-bit target features and SSE2 on when a triple specifies x86-64. Clean up all the other hacks which are now unnecessary.
llvm-svn: 134753
2011-07-08 22:16:47 +00:00
Jim Grosbach
2b8103505a Make tBX_RET and tBX_RET_vararg predicable.
The normal tBX instruction is predicable, so there's no reason the
pseudos for using it as a return shouldn't be. Gives us some nice code-gen
improvements as can be seen by the test changes. In particular, several
tests now have to disable if-conversion because it works too well and defeats
the test.

llvm-svn: 134746
2011-07-08 21:50:04 +00:00
Julien Lerouge
75e462e164 Add _allrem, _aullrem and _allmul to the runtime for MSVC.
http://llvm.org/bugs/show_bug.cgi?id=10305

llvm-svn: 134744
2011-07-08 21:40:25 +00:00
Cameron Zwarich
c23366d357 Add an intrinsic and codegen support for fused multiply-accumulate. The intent
is to use this for architectures that have a native FMA instruction.

llvm-svn: 134742
2011-07-08 21:39:21 +00:00
Jakob Stoklund Olesen
acaf9e9ce1 Be more aggressive about following hints.
RAGreedy::tryAssign will now evict interference from the preferred
register even when another register is free.

To support this, add the EvictionCost struct that counts how many hints
are broken by an eviction. We don't want to break one hint just to
satisfy another.

Rename canEvict to shouldEvict, and add the first bit of eviction policy
that doesn't depend on spill weights: Always make room in the preferred
register as long as the evictees can be split and aren't already
assigned to their preferred register.

Also make the CSR avoidance more accurate. When looking for a cheaper
register it is OK to use a new volatile register. Only CSR aliases that
have never been used before should be avoided.

llvm-svn: 134735
2011-07-08 20:46:18 +00:00
Jim Grosbach
435ca7304c Use ARMPseudoExpand for ARM tail calls.
llvm-svn: 134719
2011-07-08 18:50:22 +00:00
Benjamin Kramer
44c76d239a Emit a more efficient magic number multiplication for exact sdivs.
We have to do this in DAGBuilder instead of DAGCombiner, because the exact bit is lost after building.

  struct foo { char x[24]; };
  long bar(struct foo *a, struct foo *b) { return a-b; }
is now compiled into
  movl	4(%esp), %eax
  subl	8(%esp), %eax
  sarl	$3, %eax
  imull	$-1431655765, %eax, %eax
instead of
  movl	4(%esp), %eax
  subl	8(%esp), %eax
  movl	$715827883, %ecx
  imull	%ecx
  movl	%edx, %eax
  shrl	$31, %eax
  sarl	$2, %edx
  addl	%eax, %edx
  movl	%edx, %eax

llvm-svn: 134695
2011-07-08 10:31:30 +00:00
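The exact-sdiv lowering described in r134695 can be checked with a short sketch. This is an illustration only, not the DAGBuilder code: it assumes 32-bit two's-complement arithmetic, and the helper names `to_signed32` and `exact_sdiv24` are hypothetical. Since 24 = 8 * 3, an exact division by 24 is an arithmetic shift right by 3 followed by a multiply with the inverse of 3 modulo 2**32, which is the constant -1431655765 seen in the `imull` above.

```python
MASK32 = 0xFFFFFFFF
INV3 = -1431655765  # multiplicative inverse of 3 modulo 2**32, as a signed 32-bit value


def to_signed32(v):
    """Wrap an arbitrary Python int to a signed 32-bit value."""
    v &= MASK32
    return v - (1 << 32) if v & 0x80000000 else v


def exact_sdiv24(n):
    """Divide n by 24, assuming n is an exact multiple of 24 in int32 range."""
    shifted = n >> 3                    # sarl $3: exact because 8 divides n
    return to_signed32(shifted * INV3)  # imull: undoes the remaining factor of 3


if __name__ == "__main__":
    for q in (0, 1, -2, 1000, -123456):
        print(q, exact_sdiv24(24 * q))
```

The trick only works because the division is known to be exact; for a general sdiv the longer sequence with the high half of the product is still needed.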
Jakob Stoklund Olesen
99c67603c7 Fix more register allocation sensitive tests.
llvm-svn: 134667
2011-07-08 00:24:06 +00:00
Jakob Stoklund Olesen
47bc41b3c3 Remove a test that no longer makes sense.
It was testing a linear scan feature:

  Test if linearscan is unfavoring registers for allocation to allow
  more reuse of reloads from stack slots.

The greedy register allocator doesn't access any stack slots in this
function, so the linear scan feature was not being tested.

llvm-svn: 134666
2011-07-08 00:24:03 +00:00
Nick Lewycky
a82f7a687e Let the inline asm 'q' constraint match float, and on 64-bit double too.
Fixes PR9602!

llvm-svn: 134665
2011-07-08 00:19:27 +00:00
Eric Christopher
5fb023bb10 Go ahead and emit the barrier on x86-64 even without sse2. The
processor supports it just fine.

Fixes PR9675 and rdar://9740801

llvm-svn: 134664
2011-07-08 00:04:56 +00:00
Eric Christopher
b7597bc669 Add support for the X86 'l' constraint.
Fixes PR10149 and rdar://9738585

llvm-svn: 134648
2011-07-07 22:29:07 +00:00
Evan Cheng
bbed81df25 Add Mode64Bit feature and sink it down to MC layer.
llvm-svn: 134641
2011-07-07 21:06:52 +00:00
Evan Cheng
952943f744 Change some ARM subtarget features to be single bit yes/no in order to sink them down to MC layer. Also fix tests.
llvm-svn: 134590
2011-07-07 03:55:05 +00:00
Lang Hames
2c2f6ed1f7 Added a testcase for PR10220.
llvm-svn: 134573
2011-07-07 00:36:02 +00:00
Jakub Staszak
28bcc8673e Introduce "expect" intrinsic instructions.
llvm-svn: 134516
2011-07-06 18:22:43 +00:00
Dan Gohman
151e8ce446 Revert r134366 and add an explicit triple to make this test host-independent.
llvm-svn: 134447
2011-07-05 22:09:19 +00:00
Jakob Stoklund Olesen
f95a1068bd Fix PR10277.
Remat during spilling triggers dead code elimination. If a phi-def
becomes unused, that may also cause live ranges to split into separate
connected components.

This type of splitting is different from normal live range splitting. In
particular, there may not be a common original interval.

When the split range is its own original, make sure that the new
siblings are also their own originals. The range being split cannot be
used as an original since it doesn't cover the new siblings.

llvm-svn: 134413
2011-07-05 15:38:41 +00:00
NAKAMURA Takumi
c0837d703b test/CodeGen/X86/lsr-nonaffine.ll: Relax expressions for Win64 CC to appease Win32 hosts.
llvm-svn: 134366
2011-07-03 09:26:14 +00:00
Chandler Carruth
e07bb36a9e FileCheck-ize another test. Reduces the llc invocations from 8 to 1, and
makes one of the tests actually mean something (as the string 'add' will
always appear in the output of this file).

llvm-svn: 134358
2011-07-02 21:34:52 +00:00
Chandler Carruth
78b12b3ed4 FileCheck-ize another X86 test, making it more precisely verify the
desired result based on the comments in the file.

llvm-svn: 134354
2011-07-02 20:43:16 +00:00
Chandler Carruth
1926e141f1 FileCheck-ize and simplify RUN lines.
llvm-svn: 134352
2011-07-02 20:43:11 +00:00
Chandler Carruth
5de1d825e4 FileCheck-ize
llvm-svn: 134351
2011-07-02 20:43:08 +00:00
Chandler Carruth
01e8f9314e FileCheck-ize and tighten up assertions to only check the relevant sections.
llvm-svn: 134350
2011-07-02 20:43:04 +00:00
Chandler Carruth
500b05b1bb FileCheck-ize and cleanup IR.
llvm-svn: 134349
2011-07-02 20:43:01 +00:00
Chandler Carruth
c674fb38ef FileCheck-ize
llvm-svn: 134348
2011-07-02 20:42:59 +00:00
Chandler Carruth
341ed5f0a0 Remove a grep that is already checked with FileCheck.
llvm-svn: 134346
2011-07-02 20:42:56 +00:00
Chandler Carruth
88e183829b FileCheck-ize
llvm-svn: 134345
2011-07-02 20:42:53 +00:00
Chandler Carruth
7a0f51e003 FileCheck-ize and modernize IR.
llvm-svn: 134344
2011-07-02 20:42:50 +00:00
Chandler Carruth
4af34fe339 FileCheck-ize and simplify RUNs.
llvm-svn: 134343
2011-07-02 20:42:48 +00:00
Chandler Carruth
9e114fc3ee FileCheck-ize and modernize the RUN line.
llvm-svn: 134342
2011-07-02 20:42:44 +00:00
Chandler Carruth
df1690a113 FileCheck-ize, tightening checks and avoiding a temporary file.
llvm-svn: 134341
2011-07-02 20:42:42 +00:00
Chandler Carruth
a5b1de166b FileCheck-ize, tightening checks and avoiding a temporary file.
llvm-svn: 134340
2011-07-02 20:42:39 +00:00
Chandler Carruth
c041ee0766 FileCheck-ize
llvm-svn: 134339
2011-07-02 20:42:36 +00:00
Chandler Carruth
4f82b948fd FileCheck-ize
llvm-svn: 134338
2011-07-02 20:42:33 +00:00
Chandler Carruth
e344d9c676 FileCheck-ize a test, avoiding a temporary file.
llvm-svn: 134337
2011-07-02 20:42:31 +00:00
Chandler Carruth
d939fba46d FileCheck-ize and simplify this test.
llvm-svn: 134336
2011-07-02 20:42:28 +00:00
Chandler Carruth
b870175dd5 FileCheck-ize
llvm-svn: 134335
2011-07-02 20:42:25 +00:00
Chandler Carruth
d98a57cc5a FileCheck-ize another codegen test.
llvm-svn: 134334
2011-07-02 20:42:22 +00:00
Chandler Carruth
4c7e28777b Partially FileCheck-ize a test to remove a weird quoting situation.
llvm-svn: 134333
2011-07-02 20:42:20 +00:00
Chandler Carruth
0d1da937eb FileCheck-ize another test, and upgrade its syntax a bit.
llvm-svn: 134332
2011-07-02 20:42:17 +00:00
Chandler Carruth
4fd8502d12 FileCheck-ize another codegen test, tightening it up.
llvm-svn: 134331
2011-07-02 20:42:14 +00:00
Chandler Carruth
b74aff3ce8 FileCheck-ize another test, making it much more precise for testing the
individual cases, while hard coding less about registers in use.

llvm-svn: 134330
2011-07-02 20:42:11 +00:00
Chandler Carruth
70fa55f478 FileCheck-ize another test. This one is more clear and runs fewer
commands as a result.

llvm-svn: 134329
2011-07-02 20:42:08 +00:00
Chandler Carruth
72358a4bf8 FileCheck-ize a test, no functionality changed.
llvm-svn: 134328
2011-07-02 20:42:06 +00:00
Jakob Stoklund Olesen
b94d989634 Better diagnostics when inline asm fails to allocate.
asm.c:2:7: error: ran out of registers during register allocation
  asm(""::"r"(0), "r"(1), "r"(2), "r"(3), "r"(4), "r"(5), "r"(6), "r"(7), "r"(8), "r"(9));
        ^

llvm-svn: 134310
2011-07-02 07:17:37 +00:00
Eric Christopher
9689f96b1e Be less specific about register allocation ordering.
llvm-svn: 134308
2011-07-02 04:06:41 +00:00
Eric Christopher
7260817287 TargetConstant immediates won't be placed into registers so tighten
up the valid constant check earlier.

rdar://9692967

llvm-svn: 134286
2011-07-01 23:04:38 +00:00
Dan Gohman
c093f48834 Teach IVUsers to stop at non-affine expressions unless they are both
outside the loop and reducible.

This more completely hides them from LSR, which isn't usually able to
do anything meaningful with non-affine expressions anyway, and this
consequently hides them from SCEVExpander, which is acutely unprepared
for non-affine expressions.

Replace test/CodeGen/X86/lsr-nonaffine.ll with a new test that tests
the new behavior.

This works around the bug in PR10117 / rdar://problem/9633149, and is
generally an improvement besides.

llvm-svn: 134268
2011-07-01 22:05:19 +00:00
Jim Grosbach
461adc233e ARMv7M vs. ARMv7E-M support.
The DSP instructions in the Thumb2 instruction set are an optional extension
in the Cortex-M* architecture. When present, the implementation is considered
an "ARMv7E-M implementation," and when not, an "ARMv7-M implementation."

Add a subtarget feature hook for the v7e-m instructions and hook it up. The
cortex-m3 cpu is an example of a v7m implementation, while the cortex-m4 is
a v7e-m implementation.

rdar://9572992

llvm-svn: 134261
2011-07-01 21:12:19 +00:00
Eric Christopher
d369a9fe83 Add support for the 'j' immediate constraint. This is conditionalized on
supporting the instruction that the constraint is for 'movw'.

Part of rdar://9119939

llvm-svn: 134222
2011-07-01 01:00:07 +00:00
Eric Christopher
4bc6b7e1a6 Add support for the ARM 't' register constraint. And another testcase
for the 'x' register constraint.

Part of rdar://9119939

llvm-svn: 134220
2011-07-01 00:30:46 +00:00
Eric Christopher
d40f06b48f Add support for the 'x' constraint.
Part of rdar://9307836 and rdar://9119939

llvm-svn: 134215
2011-07-01 00:14:47 +00:00
Jakob Stoklund Olesen
8b22811785 Fix a problem with fast-isel return values introduced in r134018.
We would put the return value from long double functions in the wrong
register.

This fixes gcc.c-torture/execute/conversion.c

llvm-svn: 134205
2011-06-30 23:42:18 +00:00
Eric Christopher
2582061ec1 Add support for the 'h' constraint.
Part of rdar://9119939

llvm-svn: 134203
2011-06-30 23:23:01 +00:00
Jim Grosbach
32d3b2625b Thumb1 register to register MOV instruction is predicable.
Fix a FIXME and allow predication (in Thumb2) for the T1 register to
register MOV instructions. This allows some better codegen with
if-conversion (as seen in the test updates), plus it lays the groundwork
for pseudo-izing the tMOVCC instructions.

llvm-svn: 134197
2011-06-30 22:10:46 +00:00
Jim Grosbach
8c1fb3c4e1 Pseudo-ize the t2LDMIA_RET instruction.
It's just a t2LDMIA_UPD instruction with extra codegen properties, so it
doesn't need the encoding information. As a side-benefit, we now correctly
recognize it for instruction printing as a 'pop' instruction.

llvm-svn: 134173
2011-06-30 18:25:42 +00:00
Eric Christopher
7ce905754f Fix a small thinko for constant i64 lock/orq optimization where we
didn't have an opcode for 64-bit constant or expressions.

Fixes rdar://9692967

llvm-svn: 134121
2011-06-30 00:48:30 +00:00
Devang Patel
66c4bc1dda Revert r133953 for now.
llvm-svn: 134116
2011-06-29 23:50:13 +00:00
Cameron Zwarich
2ffbcf9b96 In the ARM global merging pass, allow extraneous alignment specifiers. This pass
already makes the assumption, which is correct on ARM, that a type's alignment is
less than its alloc size. This improves codegen with Clang (which inserts a lot of
extraneous alignment specifiers) and fixes <rdar://problem/9695089>.

llvm-svn: 134106
2011-06-29 22:24:25 +00:00
Benjamin Kramer
d97872524b Don't depend on the optimization reverted in r134067.
llvm-svn: 134068
2011-06-29 14:07:18 +00:00
Benjamin Kramer
cc91642a94 Revert a part of r126557 which could create unschedulable DAGs.
llvm-svn: 134067
2011-06-29 13:47:25 +00:00
Jakob Stoklund Olesen
7d3e1553d2 Clean up the handling of the x87 fp stack to make it more robust.
Drop the FpMov instructions, use plain COPY instead.

Drop the FpSET/GET instruction for accessing fixed stack positions.
Instead use normal COPY to/from ST registers around inline assembly, and
provide a single new FpPOP_RETVAL instruction that can access the return
value(s) from a call. This is still necessary since you cannot tell from
the CALL instruction alone if it returns anything on the FP stack. Teach
fast isel to use this.

This provides a much more robust way of handling fixed stack registers -
we can tolerate arbitrary FP stack instructions inserted around calls
and inline assembly. Live range splitting could sometimes break x87 code
by inserting spill code in unfortunate places.

As a bonus we handle floating point inline assembly correctly now.

llvm-svn: 134018
2011-06-28 18:32:28 +00:00
Roman Divacky
736e37d9b9 Implement ISD::VAARG lowering on PPC32.
llvm-svn: 134005
2011-06-28 15:30:42 +00:00
Jakob Stoklund Olesen
55a0ce1776 FileCheckize a couple of tests.
Also add a test for popping dead return values and avoid testing the
spill precision.

llvm-svn: 133997
2011-06-28 06:25:03 +00:00
Chandler Carruth
910d35b98b FileCheck-ize a test that had the strangest TCL quote I've seen yet: an
opening single quote with no closing single quote, and with {} quotes
"inside" of it. This broke some of our tools that scrape test cases.

Also, while here, make the test actually assert what the comment says it
asserts. This was essentially authored by Nick Lewycky, and merely typed
in by myself. Let me know if this is still missing the mark, but the
previous test only succeeded due to the improper quoting preventing
*anything* from matching the grep -- it had a '4(%...)' sequence in the
output!

llvm-svn: 133980
2011-06-28 02:03:10 +00:00
Evan Cheng
7df851a4ff Remove the experimental (and unused) pre-ra splitting pass. Greedy regalloc can split live ranges.
llvm-svn: 133962
2011-06-27 23:40:45 +00:00
Devang Patel
8fbd4b55ea During bottom up fast-isel, instructions emitted to materialize registers are at top of basic block and do not have debug location. This may misguide debugger while entering the basic block and sometimes debugger provides semi useful view of current location to developer by picking up previous known location as current location. Assign a sensible location to the first instruction in a basic block, if it does not have one location derived from source file, so that debugger can provide meaningful user experience to developers in edge cases.
llvm-svn: 133953
2011-06-27 22:32:04 +00:00
Eric Christopher
bb65f96b18 Allow lr in the register options here.
llvm-svn: 133935
2011-06-27 20:31:01 +00:00
Jakob Stoklund Olesen
58c34c0e80 Move all inline-asm-fpstack tests to a single file.
Also fix some of the tests that were actually testing wrong behavior -
An input operand in {st} is only popped by the inline asm when {st} is
also in the clobber list.

The original bug reports all had ~{st} clobbers as they should.

llvm-svn: 133916
2011-06-27 17:27:37 +00:00
Dan Bailey
8de16fa817 PTX: corrected tests that were failing
llvm-svn: 133875
2011-06-25 19:41:17 +00:00
Dan Bailey
5b68fc5126 PTX: Reverting implementation of i8.
The .b8 operations in PTX are far more limiting than I first thought. The mov operation isn't even supported, so there's no way of converting a .pred value into a .b8 without going via .b16, which is
not sensible. An improved implementation needs to use the fact that loads and stores automatically extend and truncate to implement support for EXTLOAD and TRUNCSTORE in order to correctly support
boolean values.

llvm-svn: 133873
2011-06-25 18:16:28 +00:00
Chad Rosier
2c0dc1fb19 Test case for r133858 (tail call optimize in the presence of byval).
llvm-svn: 133863
2011-06-25 02:44:56 +00:00
Devang Patel
91fee59b74 Handle debug info for i128 constants.
llvm-svn: 133821
2011-06-24 20:46:11 +00:00
Dan Bailey
2237ea06fb PTX: Add support for i8 type and introduce associated .b8 registers
The i8 type is required for boolean values, but can only use ld, st and mov instructions. The i1 type continues to be used for predicates.

llvm-svn: 133814
2011-06-24 19:27:10 +00:00
Chad Rosier
3127a19140 The Neon VCVT (between floating-point and fixed-point, Advanced SIMD)
instructions can be used to match combinations of multiply/divide and VCVT 
(between floating-point and integer, Advanced SIMD).  Basically the VCVT 
immediate operand that specifies the number of fraction bits corresponds to a 
floating-point multiply or divide by the corresponding power of 2.

For example, VCVT (floating-point to fixed-point, Advanced SIMD) can replace a 
combination of VMUL and VCVT (floating-point to integer) as follows:

Example (assume d17 = <float 8.000000e+00, float 8.000000e+00>):
  vmul.f32        d16, d17, d16
  vcvt.s32.f32    d16, d16
becomes:
  vcvt.s32.f32    d16, d16, #3

Similarly, VCVT (fixed-point to floating-point, Advanced SIMD) can replace a 
combination of VCVT (integer to floating-point) and VDIV as follows:

Example (assume d17 = <float 8.000000e+00, float 8.000000e+00>):
  vcvt.f32.s32    d16, d16
  vdiv.f32        d16, d17, d16
becomes:
  vcvt.f32.s32    d16, d16, #3

llvm-svn: 133813
2011-06-24 19:23:04 +00:00
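The equivalence behind r133813 can be sketched numerically. This is a rough model under stated assumptions, not the ARM backend code: the function names are hypothetical, Python floats stand in for f32 lanes, and saturation on overflow is ignored. A conversion with N fraction bits behaves like a multiply (or divide) by 2**N combined with the integer conversion, which rounds toward zero.

```python
import math


def float_to_fixed(x, fbits):
    """Model of float -> fixed-point with fbits fraction bits (round toward zero)."""
    return math.trunc(x * (1 << fbits))


def mul_then_cvt(x, k):
    """Model of a multiply by k followed by float -> integer conversion."""
    return math.trunc(x * k)


def fixed_to_float(n, fbits):
    """Model of fixed-point -> float with fbits fraction bits."""
    return n / (1 << fbits)


def cvt_then_div(n, k):
    """Model of integer -> float conversion followed by a divide by k."""
    return float(n) / k


if __name__ == "__main__":
    for x in (1.75, -1.3, 0.0, 100.125):
        print(x, float_to_fixed(x, 3), mul_then_cvt(x, 8.0))
    for n in (14, -11, 0):
        print(n, fixed_to_float(n, 3), cvt_then_div(n, 8.0))
```

Scaling by a power of two is exact in binary floating point (barring overflow or underflow), which is why folding it into the conversion does not change the result.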
Akira Hatanaka
539ba34c25 Change the chain input of nodes that load the address of a function. This change
enables SelectionDAG::getLoad at MipsISelLowering.cpp:1914 to return a
pre-existing node instead of redundantly creating a new node every time it is
called.

llvm-svn: 133811
2011-06-24 19:01:25 +00:00
Akira Hatanaka
3a3e7dfd84 Prevent generation of redundant addiu instructions that compute address of
static variables or functions. 

llvm-svn: 133803
2011-06-24 17:55:19 +00:00
Justin Holewinski
a1dd1dd26e PTX: Always use registers for return values, but use .param space for device
parameters if SM >= 2.0

- Update test cases to be more robust against register allocation changes
- Bump up the number of registers to 128 per type
- Include Python script to re-generate register file with any number of
  registers

llvm-svn: 133736
2011-06-23 18:10:13 +00:00
Justin Holewinski
acf53a172e PTX: Fixup test cases for device param changes
llvm-svn: 133735
2011-06-23 18:10:08 +00:00
Andrew Trick
aec8bc23bf lit support for REQUIRES: asserts.
Take #2. Don't piggyback on the existing config.build_mode. Instead,
define a new lit feature for each build feature we need (currently
just "asserts"). Teach both autoconf'd and cmake'd Makefiles to define
this feature within test/lit.site.cfg. This doesn't require any lit
harness changes and should be more robust across build systems.

llvm-svn: 133664
2011-06-22 23:23:19 +00:00
Rafael Espindola
e57d6977be Reenable tail duplication of bb with just an unconditional jump, but
don't remove blocks that have their address taken.

llvm-svn: 133659
2011-06-22 22:31:57 +00:00
Nick Lewycky
7f45c2bd84 Needs a triple.
llvm-svn: 133634
2011-06-22 19:42:14 +00:00
Nick Lewycky
bf55e4b776 Emit trailing padding on constant vectors when TargetData says that the vector
is larger than the sum of the elements (including per-element padding).

llvm-svn: 133631
2011-06-22 18:55:03 +00:00
Justin Holewinski
376f1d46d4 PTX: Add signed integer comparisons
llvm-svn: 133599
2011-06-22 02:09:50 +00:00
Justin Holewinski
0844ac41b6 PTX: Add .address_size directive if PTX version >= 2.3
Patch by Wei-Ren Chen

llvm-svn: 133589
2011-06-22 00:43:56 +00:00
Devang Patel
f610afdefb Test case for r133560.
llvm-svn: 133585
2011-06-22 00:03:42 +00:00
Bob Wilson
5b04895bb8 Revert r133452: "Emit movq for 64-bit register to XMM register moves..."
This is breaking compiler-rt and llvm-gcc builds on MacOSX when not using
the integrated assembler.

llvm-svn: 133524
2011-06-21 17:35:13 +00:00
Anna Zaks
488fc45c84 Add support for sadd.with.overflow and uadd.with.overflow intrinsics to the CBackend by emitting definitions for each intrinsic that occurs in the module.
llvm-svn: 133522
2011-06-21 17:18:15 +00:00
Evan Cheng
40adfc21f6 Teach dag combine to match halfword byteswap patterns.
1. (((x) & 0xFF00) >> 8) | (((x) & 0x00FF) << 8)
   => (bswap x) >> 16
2. ((x&0xff)<<8)|((x&0xff00)>>8)|((x&0xff000000)>>8)|((x&0x00ff0000)<<8)
   => (rotl (bswap x) 16)

This allows us to eliminate most of the def : Pat patterns for ARM rev16 and
revsh instructions. It catches many more cases for ARM and x86.

rdar://9609108

llvm-svn: 133503
2011-06-21 06:01:08 +00:00
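The two halfword byteswap identities from r133503 can be verified with a small sketch. It assumes 32-bit unsigned values; `bswap32`, `rotl32`, `pattern1` and `pattern2` are hypothetical helper names used for illustration and are not the DAG combiner's own code.

```python
MASK32 = 0xFFFFFFFF


def bswap32(x):
    """Reverse the four bytes of a 32-bit value."""
    return int.from_bytes((x & MASK32).to_bytes(4, "little"), "big")


def rotl32(x, n):
    """Rotate a 32-bit value left by n bits (0 < n < 32)."""
    x &= MASK32
    return ((x << n) | (x >> (32 - n))) & MASK32


def pattern1(x):
    """Swap the two low bytes; the upper halfword is cleared."""
    return ((x & 0xFF00) >> 8) | ((x & 0x00FF) << 8)


def pattern2(x):
    """Swap the bytes within each halfword."""
    return (((x & 0xFF) << 8) | ((x & 0xFF00) >> 8)
            | ((x & 0xFF000000) >> 8) | ((x & 0x00FF0000) << 8))


if __name__ == "__main__":
    for x in (0x11223344, 0xDEADBEEF, 0x000000FF):
        print(hex(pattern1(x)), hex(bswap32(x) >> 16))
        print(hex(pattern2(x)), hex(rotl32(bswap32(x), 16)))
```

Each pair of printed values should match, which is the property the combine relies on when it rewrites the mask-and-shift form into a bswap.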
Akira Hatanaka
1e08980a21 Re-apply 132758 and 132768 which were speculatively reverted in 132777.
llvm-svn: 133494
2011-06-21 00:40:49 +00:00
Justin Holewinski
e62da847fa PTX: Fix conversion between predicates and value types
llvm-svn: 133454
2011-06-20 18:42:48 +00:00
Nick Lewycky
831fb8200d Emit movq for 64-bit register to XMM register moves, but continue to accept
movd when assembling.

llvm-svn: 133452
2011-06-20 18:33:26 +00:00
Roman Divacky
79578394f5 Don't apply on PPC64 the 32bit ADDIC optimizations as there's no overflow
with 32bit values.

llvm-svn: 133439
2011-06-20 15:28:39 +00:00
Nadav Rotem
ea7e393b4e Fix PromoteIntRes_TRUNCATE: Add support for cases where the
source vector type is to be split while the target vector is to be promoted.
(eg: <4 x i64> -> <4 x i8> )

llvm-svn: 133424
2011-06-20 07:15:58 +00:00
Benjamin Kramer
c20d8728fc Update test.
llvm-svn: 133390
2011-06-19 12:14:34 +00:00
Nadav Rotem
07b7d6858d Reduce the runtime of the test. Keep only the interesting cases.
llvm-svn: 133381
2011-06-19 08:12:43 +00:00
Chris Lattner
6aa403748e Remove support for parsing the "type i32" syntax for defining a numbered
top level type without a specified number.  This syntax isn't documented
and blocks forward progress.

llvm-svn: 133371
2011-06-19 00:03:46 +00:00
Chris Lattner
ad5400fa72 rip out a ton of intrinsic modernization logic from AutoUpgrade.cpp, which is
for pre-2.9 bitcode files.  We keep x86 unaligned loads, movnt, crc32, and the
target indep prefetch change.

As usual, updating the testsuite is a PITA.

llvm-svn: 133337
2011-06-18 06:05:24 +00:00
Jakob Stoklund Olesen
6346426b8c Switch ARM to using AltOrders instead of MethodBodies.
This slightly changes the GPR allocation order on Darwin where R9 is not
a callee-saved register:

Before: %R0 %R1 %R2 %R3 %R12 %R9 %LR %R4 %R5 %R6 %R8 %R10 %R11
After:  %R0 %R1 %R2 %R3 %R9 %R12 %LR %R4 %R5 %R6 %R8 %R10 %R11
llvm-svn: 133326
2011-06-18 01:14:46 +00:00
Galina Kistanova
36039fd720 Moved to the right place.
llvm-svn: 133324
2011-06-18 00:59:37 +00:00
Eric Christopher
169d53e1e0 Fix UMULO support for 2x register width to allow the full
range without a libcall to a new mulo<mode> libcall
that we'd have to create.

Finishes the rest of rdar://9090077 and rdar://9210061

llvm-svn: 133318
2011-06-18 00:09:57 +00:00
Nadav Rotem
0cec5ab356 Fix a bug in the type-lowering of integer-promoted elements. Add a check that
the newly created simple type is valid before checking its legality.
Re-commit the test file.

llvm-svn: 133291
2011-06-17 20:54:12 +00:00
Evan Cheng
df9192b200 Add an alternative rev16 pattern. We should figure out a better way to handle these complex rev patterns. rdar://9609108
llvm-svn: 133289
2011-06-17 20:47:21 +00:00
Eric Christopher
25aa04466a Lower multiply with overflow checking to __mulo<mode>
calls if we haven't been able to lower them any
other way.

Fixes rdar://9090077 and rdar://9210061

llvm-svn: 133288
2011-06-17 20:41:29 +00:00
Galina Kistanova
b46dac6e3d Test 2008-06-04-indirectmem.ll is X86-specific. Move to X86 folder.
llvm-svn: 133275
2011-06-17 18:26:23 +00:00
Chris Lattner
2e2fad280a Stop accepting and ignoring attributes in function types. Attributes are applied
to functions and call/invokes, not to types.

llvm-svn: 133266
2011-06-17 17:37:13 +00:00
Roman Divacky
6778c94b24 Fix a few places where 32bit instructions/registerset were used on PPC64.
llvm-svn: 133260
2011-06-17 15:21:10 +00:00
Justin Holewinski
c515f1b903 PTX: Adjust rounding modes
* rounding modes for fp add, mul, sub now use .rn
* float -> int rounding correctly uses .rzi not .rni
* 32bit fdiv for sm13 uses div.rn (instead of div.approx)
* 32bit fdiv for sm10 now uses div (instead of div.approx)

Approx is not IEEE 754 compatible (and should be optionally set by a flag to the backend instead). The .rn rounding modifier is the PTX default anyway, but it's better to be explicit.

All these modifiers should be available by using __fmul_rz functions for example, but support will need to be added for this in the backend.

Patch by Dan Bailey

llvm-svn: 133253
2011-06-17 12:12:42 +00:00
Chris Lattner
0899957b99 make the asmparser reject function and type redefinitions. 'Merging' hasn't been
needed since llvm-gcc 3.4 days.

llvm-svn: 133248
2011-06-17 07:06:44 +00:00
Chris Lattner
385977c252 remove asmparser support for the old getresult instruction, which has been subsumed by extractvalue.
llvm-svn: 133247
2011-06-17 06:57:15 +00:00
Chris Lattner
9e7c036d09 remove parser support for the obsolete "multiple return values" syntax, which
was replaced with return of a "first class aggregate".

llvm-svn: 133245
2011-06-17 06:49:41 +00:00
Chris Lattner
4eb6f76fa6 Remove support for using "foo" as symbols instead of %"foo". This is ancient
syntax and has been long obsolete.  As usual, updating the tests is the nasty
part of this.

llvm-svn: 133242
2011-06-17 06:36:20 +00:00
Chris Lattner
9ec82f54d4 manually upgrade a bunch of tests to modern syntax, and remove some that
are either unreduced or only test old syntax.

llvm-svn: 133228
2011-06-17 03:14:27 +00:00
Cameron Zwarich
681f02ec26 Update an insertion point iterator after replacing a return instruction with a
tail call pseudoinstruction. This fixes <rdar://problem/9624333>.

llvm-svn: 133227
2011-06-17 02:16:43 +00:00
Jakob Stoklund Olesen
91874697b3 Don't use register classes larger than TLI->getRegClassFor(VT).
In Thumb mode we cannot handle GPR virtual registers, even though some
instructions can. When isel is lowering a CopyFromReg, it should limit
itself to subclasses of getRegClassFor(VT).

<rdar://problem/9624323>

llvm-svn: 133210
2011-06-16 22:50:38 +00:00
Nick Lewycky
ba962a7115 There's no need to be so picky about the particular register.
llvm-svn: 133189
2011-06-16 21:00:00 +00:00
Justin Holewinski
32a7bad9db PTX: Finish new calling convention implementation
llvm-svn: 133172
2011-06-16 17:50:00 +00:00
Bruno Cardoso Lopes
f52f4dd0b8 Add AVX support for fpextend.
Original patch by Syoyo Fujita with more comments by me.

llvm-svn: 133153
2011-06-16 07:03:21 +00:00
Eli Friedman
4594e0f01a FileCheck-ize test, and make it work on EABI hosts, like clang-native-arm-cortex-a9.
llvm-svn: 133139
2011-06-16 02:36:32 +00:00
Eli Friedman
014d4feac5 Force a triple here so this test doesn't fail on EABI hosts (like clang-native-arm-cortex-a9).
llvm-svn: 133134
2011-06-16 01:49:31 +00:00
Nick Lewycky
c62f935caf Commit the right set of tests for r133124. Sorry 'bout that!
llvm-svn: 133133
2011-06-16 01:35:45 +00:00
Andrew Trick
8c37d99180 Reenabling this test with REQUIRES: Asserts
llvm-svn: 133132
2011-06-16 01:34:41 +00:00
Chad Rosier
26513932a2 Typos.
llvm-svn: 133128
2011-06-16 01:24:24 +00:00
Chad Rosier
66fa658a4b Revision r128665 added an optimization to make use of NEON multiplier
accumulator forwarding.  Specifically (from SVN log entry):

Distribute (A + B) * C to (A * C) + (B * C) to make use of NEON multiplier
accumulator forwarding:
vadd d3, d0, d1
vmul d3, d3, d2
=>
vmul d3, d0, d2
vmla d3, d1, d2

Make sure it catches cases where operand 1 is add/fadd/sub/fsub, which was
intended in the original revision.

llvm-svn: 133127
2011-06-16 01:21:54 +00:00
Nick Lewycky
f4886c7374 Add a DAGCombine for (ext (binop (load x), cst)).
llvm-svn: 133124
2011-06-16 01:15:49 +00:00
Anna Zaks
73f1ba0a88 Rename the test. Thanks Cameron! Use shorter/generic names.
llvm-svn: 133115
2011-06-16 00:34:10 +00:00
Anna Zaks
e2a947a4f7 Function::getNumBlockIDs() should be used instead of Function::size() to set the upper limit on the block IDs since basic blocks might get removed (simplified away) after being initially numbered. Plus the test case, in which SelectionDAGBuilder::visitBr() calls llvm::MachineFunction::removeFromMBBNumbering(), which introduces the hole in numbering leading to an assert in llc (prior to the fix).
llvm-svn: 133113
2011-06-16 00:03:21 +00:00
Rafael Espindola
8edd93b519 Testcase for previous commit.
llvm-svn: 133089
2011-06-15 21:18:51 +00:00
John McCall
e6835ee44e Add a new function attribute, nonlazybind, which inhibits lazy-loading
optimizations when emitting calls to the function;  instead those calls may
use faster relocations which require the function to be immediately resolved
upon loading the dynamic object featuring the call.  This is useful when it
is known that the function will be called frequently and pervasively and
therefore there is no merit in delaying binding of the function.

Currently only implemented for x86-64, where it turns into a call through
the global offset table.

Patch by Dan Gohman, who assures me that he's going to add LangRef documentation
for this once it's committed.

llvm-svn: 133080
2011-06-15 20:36:13 +00:00
Andrew Trick
6d96f3a7a2 Disabling this test until I can figure out the right lit flags.
llvm-svn: 133068
2011-06-15 18:25:38 +00:00
Jakob Stoklund Olesen
7b0de9a9e0 Remove custom allocation orders in SystemZ.
Note that this actually changes code generation, and someone who
understands this target better should check the changes.

- R12Q is now allocatable. I think it was omitted from the allocation
  order by mistake since it isn't reserved. It is apparently used as a
  GOT pointer sometimes, and it should probably be reserved if that is
  the case.

- The GR64 registers are allocated in a different order now. The
  register allocator will automatically put the CSRs last. There were
  other changes to the order that may have been significant.

The test fix is because r0 and r1 swapped places in the allocation order.

llvm-svn: 133067
2011-06-15 18:02:56 +00:00
Evan Cheng
30f84a59ae Another revsh pattern. rdar://9609059
llvm-svn: 133064
2011-06-15 17:17:48 +00:00
Andrew Trick
ce93f28a36 Added -stress-sched flag in the Asserts build.
Added a test case for handling physreg aliases during pre-RA-sched.

llvm-svn: 133063
2011-06-15 17:16:12 +00:00
Chad Rosier
f6c1b3b81f TargetLoweringOpt is a struct used by DAGCombine, not a pass.
llvm-svn: 133062
2011-06-15 16:48:02 +00:00
Nadav Rotem
72e51c94b1 This test was failing on X86 machines which do not have SSE4. Fixed the test by
specifying that the target CPU is corei7.

llvm-svn: 133053
2011-06-15 12:26:53 +00:00
Evan Cheng
7624839811 PerformBFICombine - (bfi A, (and B, Mask1), Mask2) -> (bfi A, B, Mask2) iff
the bits being cleared by the AND are not demanded by the BFI.

The previous BFI dag combine rule was actually incorrect (or used to be
correct until BFI representation changed).

rdar://9609030

llvm-svn: 133034
2011-06-15 01:12:31 +00:00
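
The condition in r133034 above can be illustrated with a small model. This is a sketch only: `bfi` below is a simplified bitfield insert written for illustration (it assumes a nonzero, contiguous mask selecting the destination field) and does not reproduce the operand encoding of the actual ARM BFI DAG node. The helper names `demanded_bits_of_b` and `and_is_redundant` are hypothetical.

```python
def bfi(a, b, mask, width=32):
    """Simplified bitfield insert (illustrative model, not the ARMISD encoding).

    Copies the low bits of b into the field of a selected by mask.
    Assumes mask is nonzero and contiguous.
    """
    full = (1 << width) - 1
    lsb = (mask & -mask).bit_length() - 1      # position of the lowest set bit
    return ((a & ~mask) | ((b << lsb) & mask)) & full


def demanded_bits_of_b(mask):
    """Bits of b that can reach the result: the low field-width bits."""
    lsb = (mask & -mask).bit_length() - 1
    return mask >> lsb


def and_is_redundant(mask1, mask2):
    """True if none of the bits cleared by (and b, mask1) are demanded."""
    return (~mask1 & demanded_bits_of_b(mask2)) == 0


a, b = 0xDEADBEEF, 0x12345678
field = 0x00000FF0                              # 8-bit field at bit 4

# Mask1 keeps every demanded bit, so dropping the AND is safe.
safe = and_is_redundant(0xFF, field)
same = bfi(a, b & 0xFF, field) == bfi(a, b, field)

# Mask1 clears demanded bits, so the AND must stay.
unsafe = not and_is_redundant(0x0F, field)
differs = bfi(a, b & 0x0F, field) != bfi(a, b, field)
```

In this model the AND can be dropped exactly when it leaves every demanded bit of B untouched, which is the "iff" condition stated in the commit message.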
Tanya Lattner
5ee64fc868 Add an optimization that looks for a specific pair-wise add pattern and generates a vpaddl instruction instead of scalarizing the add.
Includes a test case.

llvm-svn: 133027
2011-06-14 23:48:48 +00:00
Rafael Espindola
0811c47a3a Add triple.
llvm-svn: 133026
2011-06-14 23:47:36 +00:00
Chad Rosier
30333c668f When pattern matching during instruction selection make sure shl x,1 is not
converted to add x,x if x is undef.  add undef, undef does not guarantee
that the resulting low order bit is zero.
Fixes <rdar://problem/9453156> and <rdar://problem/9487392>.

llvm-svn: 133022
2011-06-14 22:29:10 +00:00
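
The reasoning in r133022 above can be checked by brute force. A minimal sketch, assuming the usual reading of undef in which each use may independently take any bit pattern; the 4-bit width and the function names are chosen only for illustration.

```python
from itertools import product

BITS = 4
MASK = (1 << BITS) - 1


def shl_one_results():
    # shl x, 1 reads x once, so a single value is chosen for undef.
    return {(x << 1) & MASK for x in range(1 << BITS)}


def add_self_results():
    # add undef, undef has two uses; each may pick a different value.
    return {(a + b) & MASK for a, b in product(range(1 << BITS), repeat=2)}


# Collect the possible values of the low-order bit under each form.
shl_low_bits = {r & 1 for r in shl_one_results()}
add_low_bits = {r & 1 for r in add_self_results()}
```

Under this model the shift always yields a zero low bit, while the add can yield either value, so the rewrite is not valid when x is undef.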
Rafael Espindola
d133b092e2 Check the llc output.
llvm-svn: 133021
2011-06-14 22:24:32 +00:00
Stuart Hastings
63da197d28 Test case for x86 MMX inline asm. rdar://problem/8886707
llvm-svn: 133014
2011-06-14 21:51:38 +00:00
Rafael Espindola
4e8b511063 Add a test for the recent regression.
llvm-svn: 133009
2011-06-14 20:38:50 +00:00
Dan Gohman
2071b00bdf This test is still failing. Delete the rest of it.
llvm-svn: 133001
2011-06-14 18:07:36 +00:00
Dan Gohman
d6dcf3e5e3 Revert r132991. This test is failing on the
llvm-gcc-x86_64-linux-selfhost buildbot and others.

llvm-svn: 133000
2011-06-14 18:03:11 +00:00
Rafael Espindola
1e809f99ad Add 132986 back, but avoid non-determinism if a bb address gets reused.
llvm-svn: 132995
2011-06-14 15:31:54 +00:00
Nadav Rotem
7b529545b7 Add a testcase for #9623
llvm-svn: 132991
2011-06-14 13:23:10 +00:00
Rafael Espindola
b90ea8a8c7 revert 132986 to see if the bots go green.
llvm-svn: 132988
2011-06-14 12:48:26 +00:00
Nadav Rotem
b638c3c037 This testcase causes a failure on some bots. Remove the failing test until
further investigation.

llvm-svn: 132986
2011-06-14 09:10:37 +00:00
Nadav Rotem
1b92c3d96c Add a testcase for checking the integer-promotion of many different vector
types (with power of two types such as 8,16,32 .. 512).

Fix a bug in the integer promotion of bitcast nodes. Enable integer expanding
only if the target of the conversion is an integer (when the type action is
scalarize).

Add handling to the legalization of vector load/store in cases where the saved
vector is integer-promoted.

llvm-svn: 132985
2011-06-14 08:11:52 +00:00
Rafael Espindola
434d19ff30 Implement Jakob's suggestion on how to detect fall through without calling
AnalyzeBranch.

llvm-svn: 132981
2011-06-14 06:08:32 +00:00
Bruno Cardoso Lopes
15b9096112 Since ARM's prefetch implementation predicted the presence of an instruction
cache prefetch and now that the info from "prefetch" to "ARMPreload" is present,
only add a testcase for PLI.

llvm-svn: 132978
2011-06-14 05:11:46 +00:00
Bruno Cardoso Lopes
b6afc5168f Add one more argument to the prefetch intrinsic to indicate whether it's a data
or instruction cache access. Update the targets to match it and also teach
autoupgrade.

llvm-svn: 132976
2011-06-14 04:58:37 +00:00
Rafael Espindola
56a82c5ef8 Make the threshold used by branch folding softer. Before we would get a
sharp all or nothing transition when one extra predecessor was added. Now
we still test the first ones for merging.

llvm-svn: 132974
2011-06-14 04:41:17 +00:00
Bill Wendling
77d4d62693 Heuristic: If the number of operands in the alias is greater than the number of
operands in the aliasee, don't print the alias.

llvm-svn: 132963
2011-06-14 03:17:20 +00:00
Jakob Stoklund Olesen
2cac2ea7a1 Be less aggressive about hinting in RAFast.
In particular, don't spill dirty registers only to satisfy a hint. It is
not worth it.

The attached test case provides an example where the fast allocator
would spill a register when other registers are available.

llvm-svn: 132900
2011-06-13 03:26:46 +00:00
Rafael Espindola
8d0f7518b2 Really fix the fall-through logic.
Add a triple to the tests.

llvm-svn: 132885
2011-06-12 05:57:01 +00:00
Rafael Espindola
f73c2dc8f6 Test for the previous commit.
llvm-svn: 132884
2011-06-12 05:35:39 +00:00
Rafael Espindola
db58547906 AnalyzeBranch doesn't change which successors a bb has, just the order
we try to branch to them.

Before we were creating successor lists with duplicated entries. Fixing that
found a bug in isBlockOnlyReachableByFallthrough that would cause it to
return the wrong answer for

-----------
...
jne foo
jmp bar

foo:
----------

llvm-svn: 132882
2011-06-12 03:20:32 +00:00
Eli Friedman
0bb1c525fd Add full x86 fast-isel support for memcpy and memset.
rdar://9431466

llvm-svn: 132864
2011-06-10 23:39:36 +00:00
Eli Friedman
b3764b7c97 Add -mattr=+sse2 to make the buildbots happy.
llvm-svn: 132839
2011-06-10 08:26:26 +00:00
Chad Rosier
670af01484 Adding a test case for revision 132825.
llvm-svn: 132830
2011-06-10 02:44:19 +00:00
Eli Friedman
96581336e6 Add a simple test which makes sure folding immediate float zero to a memory operand works.
llvm-svn: 132824
2011-06-10 00:30:08 +00:00
Cameron Zwarich
af47f4a117 A CCState was being created without setting whether it is in the Call or Prologue state,
causing an assertion failure downstream. This fixes <rdar://problem/9562908>.

This really seems like it should always be set at CCState creation time, so mistakes like
this can never happen. I'll take a look at doing that.

llvm-svn: 132811
2011-06-09 22:30:07 +00:00
Eli Friedman
14c6ce9041 Change this DAGCombine to build AND of SHR instead of SHR of AND; this matches the ordering we prefer in instcombine. Part of rdar://9562809.
The potential DAGCombine which enforces this more generally messes up some other very fragile patterns, so I'm leaving that alone, at least for now.

llvm-svn: 132809
2011-06-09 22:14:44 +00:00
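
The two orderings named in r132809 above compute the same value for logical right shifts once the mask is shifted along with the operand. The log does not show the exact pattern the combine matches, so the following is only a sketch of the underlying identity, checked exhaustively over 8-bit values; the function names are hypothetical.

```python
def shr_of_and(x, c, n):
    # Mask first, then shift right.
    return (x & c) >> n


def and_of_shr(x, c, n):
    # Shift first, then mask with the correspondingly shifted constant.
    return (x >> n) & (c >> n)


# Exhaustive check for all 8-bit operands, masks, and shift amounts.
all_equal = all(
    shr_of_and(x, c, n) == and_of_shr(x, c, n)
    for x in range(256)
    for c in range(256)
    for n in range(8)
)
```

Because the results agree, choosing one form is purely a canonicalization decision, which is consistent with the commit's stated goal of matching the ordering preferred in instcombine.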