Bruno Cardoso Lopes
2d100ca13c
VPERM2F128 is an AVX instruction which permutes between two 256-bit
vectors. It operates on 128-bit elements instead of regular scalar
types. Recognize shuffles that are suitable for VPERM2F128 and teach
the x86 legalizer how to handle them.
llvm-svn: 137519
2011-08-12 21:48:26 +00:00
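Note on r137519: a minimal Python sketch of what "operates on 128-bit elements" means for a shuffle mask. It models VPERM2F128 on 8-element vectors (two 128-bit halves each) following the documented immediate encoding, and checks whether a mask moves whole halves. The function names `vperm2f128` and `match_vperm2f128_mask` are hypothetical and this is not the code LLVM's legalizer uses; undef mask elements are deliberately not handled.

```python
def vperm2f128(src1, src2, imm8):
    """Model VPERM2F128 on two 8-element vectors (two 128-bit halves each)."""
    assert len(src1) == 8 and len(src2) == 8
    # Selector values 0..3 pick: src1 low, src1 high, src2 low, src2 high.
    halves = [src1[:4], src1[4:], src2[:4], src2[4:]]

    def select(ctrl):
        # Bit 3 of each 4-bit control field zeroes that destination half.
        if ctrl & 0x8:
            return [0] * 4
        return list(halves[ctrl & 0x3])

    # imm8[3:0] controls the low destination half, imm8[7:4] the high half.
    return select(imm8 & 0xF) + select((imm8 >> 4) & 0xF)


def match_vperm2f128_mask(mask):
    """Return an imm8 if the 8-element mask moves whole 128-bit halves, else None.

    Mask indices 0-7 refer to src1 and 8-15 to src2 (assumption of this sketch).
    """
    if len(mask) != 8:
        return None
    imm = 0
    for half in range(2):
        chunk = list(mask[half * 4: half * 4 + 4])
        start = chunk[0]
        # Each half must be a contiguous run starting on a 128-bit boundary.
        if start < 0 or start % 4 != 0 or chunk != list(range(start, start + 4)):
            return None
        imm |= (start // 4) << (half * 4)
    return imm


a = list(range(0, 8))
b = list(range(8, 16))
mask = [4, 5, 6, 7, 8, 9, 10, 11]   # high half of a, then low half of b
imm = match_vperm2f128_mask(mask)
result = vperm2f128(a, b, imm)
```

A mask such as `[0, 1, 2, 3, 5, 4, 6, 7]` is rejected by this sketch because it reorders scalars inside a 128-bit half.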
Akira Hatanaka
c9c0190cbe
Define unaligned load and store.
llvm-svn: 137515
2011-08-12 21:30:06 +00:00
Akira Hatanaka
6caf61a6ac
Test case for 137484
llvm-svn: 137486
2011-08-12 18:12:06 +00:00
Akira Hatanaka
b787f8a8a5
Enclose directive .cprestore with .set macro and nomacro to silence assembler
warning.
llvm-svn: 137378
2011-08-11 22:42:31 +00:00
Bruno Cardoso Lopes
328a6a980b
Add a dag combine to xform 256-bit shuffles into simple vector
inserts and extracts. This simple combine makes us generate only 1
instruction instead of 11 in the v8 case.
llvm-svn: 137362
2011-08-11 21:50:44 +00:00
Bruno Cardoso Lopes
884d8b9cb5
Fix the test added by Nadav in r137308. Make it more strict:
1) check for the "v" version of movaps
2) add a couple of CHECK-NOT to guarantee the behavior
3) move to a more appropriate test file
llvm-svn: 137361
2011-08-11 21:50:35 +00:00
Bruno Cardoso Lopes
38d4afa02f
Fix PR10492 by teaching MOVHLPS and MOVLPS mask matching to be more strict.
llvm-svn: 137324
2011-08-11 18:59:13 +00:00
Jim Grosbach
9717a9c0d3
ARM push of a single register encodes as pre-indexed STR.
Per the ARM ARM, a 'push' of a single register encodes as an STR,
not an STM.
llvm-svn: 137318
2011-08-11 18:07:11 +00:00
Jim Grosbach
abaaf4513f
ARM pop of a single register encodes as post-indexed LDR.
Per the ARM ARM, a 'pop' of a single register encodes as an LDR,
not an LDM.
llvm-svn: 137316
2011-08-11 17:35:48 +00:00
Nadav Rotem
de1b485f3f
[AVX] If the data which is going to be saved is already in two XMM registers
(for example, after integer operation), do not pack the registers into a YMM
before saving. It's better to save as two XMM registers.
Before:
vinsertf128 $1, %xmm3, %ymm0, %ymm3
vinsertf128 $0, %xmm1, %ymm3, %ymm1
vmovaps %ymm1, 416(%rsp)
After:
vmovaps %xmm3, 416+16(%rsp)
vmovaps %xmm1, 416(%rsp)
llvm-svn: 137308
2011-08-11 16:41:21 +00:00
Chris Lattner
3ae8704c4f
add missing colon, thanks peter.
llvm-svn: 137306
2011-08-11 16:15:10 +00:00
Chris Lattner
575057916a
fix PR10605 / rdar://9930964 by adding a pretty scary missed check.
It's somewhat surprising anything works without this. Before we would
compile the testcase into:
test: # @test
movl $4, 8(%rdi)
movl 8(%rdi), %eax
orl %esi, %eax
cmpl $32, %edx
movl %eax, -4(%rsp) # 4-byte Spill
je .LBB0_2
now we produce:
test: # @test
movl 8(%rdi), %eax
movl $4, 8(%rdi)
orl %esi, %eax
cmpl $32, %edx
movl %eax, -4(%rsp) # 4-byte Spill
je .LBB0_2
llvm-svn: 137303
2011-08-11 06:26:54 +00:00
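Note on r137303: the two listings differ only in the order of the store and the load to 8(%rdi). The commit message does not say which check was missing, so the following Python sketch only illustrates why that order matters; the helper names and the dictionary-as-memory model are assumptions made for illustration.

```python
def store_then_load(memory, addr, value):
    """Old (miscompiled) order: the store runs first, so the load sees it."""
    mem = dict(memory)
    mem[addr] = value
    loaded = mem[addr]
    return loaded, mem


def load_then_store(memory, addr, value):
    """New order: the load observes the previous contents, then the store runs."""
    mem = dict(memory)
    loaded = mem[addr]
    mem[addr] = value
    return loaded, mem


initial = {8: 100}           # hypothetical prior contents at offset 8
old_loaded, old_mem = store_then_load(initial, 8, 4)
new_loaded, new_mem = load_then_store(initial, 8, 4)
```

Both orders leave the same value in memory, but the loaded value that feeds the later `orl` differs, which is how the reordering changes program behavior.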
Bruno Cardoso Lopes
8674ddf55a
Splats for v8i32/v8f32 can be handled by VPERMILPSY. This was causing
infinite recursive calls in legalize. Fix PR10562
llvm-svn: 137296
2011-08-11 02:49:44 +00:00
Bruno Cardoso Lopes
954ac403c7
Use the splat index to generate the desired shuffle. Otherwise we
could only get undefs and the vector shuffle becomes an undef,
generating wrong code.
llvm-svn: 137295
2011-08-11 02:49:41 +00:00
Eli Friedman
17bd9e5d7c
Fix X86TargetLowering::LowerExternalSymbol so that it actually works in non-trivial cases. This hasn't been an issue before because the function isn't normally called (but apparently is used to generate a tail-call to sin() on ELF x86-32 with PIC and SSE2).
Fixes PR9693.
llvm-svn: 137292
2011-08-11 01:48:05 +00:00
NAKAMURA Takumi
5d316f7632
test/CodeGen/X86/opt-shuff-tstore.ll: Add explicit -mtriple=x86_64-linux.
llvm-svn: 137262
2011-08-10 22:52:48 +00:00
Devang Patel
393d6e1fd0
While extending the definition range of a debug variable, consult lexical scopes also. There is no point extending a debug variable outside its lexical block. This provides a 6x compile-time speedup in some cases.
llvm-svn: 137250
2011-08-10 21:25:34 +00:00
Nadav Rotem
1b3075c0ab
Fix the test. Add cpu target.
llvm-svn: 137241
2011-08-10 19:49:19 +00:00
Nadav Rotem
4a8d78d24a
When performing a truncating store, it is sometimes possible to rearrange the
data in-register prior to saving to memory. When we reorder the data in memory
we prevent the need to save multiple scalars to memory, making a single regular
store.
llvm-svn: 137238
2011-08-10 19:30:14 +00:00
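Note on r137238: a rough Python sketch of the idea, assuming a truncating store of four 32-bit lanes down to four bytes on a little-endian target. The lane width, the byte-shuffle indices, and both function names are assumptions chosen for illustration; the commit itself does not spell out which types are handled.

```python
import struct


def truncstore_scalar(values):
    """Baseline: truncate and store each lane separately (four byte stores)."""
    out = bytearray()
    for v in values:
        out += struct.pack("<B", v & 0xFF)
    return bytes(out)


def truncstore_shuffled(values):
    """Rearrange in-register first, then issue one regular 4-byte store."""
    reg = struct.pack("<4I", *values)      # 16-byte register image
    shuffle = [0, 4, 8, 12]                # low byte of each 32-bit lane
    packed = bytes(reg[i] for i in shuffle)
    return packed                          # stored with a single store


lanes = [0x11223344, 0xAABBCCDD, 0x01020304, 0xFFFFFF80]
scalar_bytes = truncstore_scalar(lanes)
shuffled_bytes = truncstore_shuffled(lanes)
```

Both paths write the same bytes to memory; the shuffled version replaces several scalar stores with one shuffle and one store.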
Bruno Cardoso Lopes
565ab1542a
The following X86 pattern is incorrect:
def : Pat<(X86Movss VR128:$src1,
(bc_v4i32 (v2i64 (load addr:$src2)))),
(MOVLPSrm VR128:$src1, addr:$src2)>;
This matches a MOVSS dag with a MOVLPS instruction. However, MOVSS will replace only the low 32 bits of the register, while the MOVLPS instruction will replace the low 64 bits. A testcase that illustrates the bug is added, and the one that was already present is modified. Patch by Tanya Lattner.
llvm-svn: 137227
2011-08-10 17:45:17 +00:00
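Note on r137227: a small Python sketch of the mismatch described above, treating a 128-bit register as four 32-bit elements. The helper names `movss` and `movlps_load` are hypothetical models of the documented instruction behavior, not LLVM code.

```python
def movss(dst, src):
    """MOVSS-style merge: replace only element 0 (the low 32 bits)."""
    return [src[0]] + list(dst[1:])


def movlps_load(dst, mem):
    """MOVLPS-style load: replace elements 0 and 1 (the low 64 bits)."""
    return list(mem[:2]) + list(dst[2:])


reg = [10, 11, 12, 13]
loaded = [1, 2, 3, 4]
expected = movss(reg, loaded)        # what the dag node asks for
selected = movlps_load(reg, loaded)  # what the bad pattern emitted
```

Element 1 differs between the two results, which is the extra 32 bits that the incorrect pattern overwrote.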
Rafael Espindola
45cd7316b5
Add support for the R and Q constraints.
llvm-svn: 137217
2011-08-10 16:26:42 +00:00
Bruno Cardoso Lopes
4a435a361d
Fix a bug in vpermilps mask checking. Fix PR10560
llvm-svn: 137194
2011-08-10 01:54:17 +00:00
Bruno Cardoso Lopes
9a695724bd
Add 256-bit support for v8i32, v4i64 and v4f64 ISD::SELECT. Fix PR10556
llvm-svn: 137179
2011-08-09 23:27:13 +00:00
Bruno Cardoso Lopes
7461b930f3
Add v16i16 and v32i8 store patterns
llvm-svn: 137166
2011-08-09 22:39:53 +00:00
Bruno Cardoso Lopes
028c6aa951
Use fp unpack instructions to unpack int types. Until we have AVX2, this
is the best we can do for these patterns. This fixes PR10554.
llvm-svn: 137161
2011-08-09 22:18:37 +00:00
Eli Friedman
44fd5b2b59
Fix a couple ridiculous copy-paste errors. rdar://9914773 .
llvm-svn: 137160
2011-08-09 22:17:39 +00:00
Bill Wendling
250ea7930e
Revert r137134. It breaks some code as Eli pointed out.
llvm-svn: 137135
2011-08-09 18:56:35 +00:00
Bill Wendling
ca256c0d2d
Print out the variable declaration only if it is a declaration. Otherwise, a
'static' variable will be emitted twice.
PR10081
llvm-svn: 137134
2011-08-09 18:31:50 +00:00
Jakob Stoklund Olesen
e43aca1c39
Inflate register classes after coalescing.
Coalescing can remove copy-like instructions with sub-register operands
that constrained the register class. Examples are:
x86: GR32_ABCD:sub_8bit_hi -> GR32
arm: DPR_VFP2:ssub0 -> DPR
Recompute the register class of any virtual registers that are used by
fewer instructions after coalescing.
This affects code generation for the Cortex-A8 where we use NEON
instructions for f32 operations, c.f. fp_convert.ll:
vadd.f32 d16, d1, d0
vcvt.s32.f32 d0, d16
The register allocator is now free to use d16 for the temporary, and
that comes first in the allocation order because it doesn't interfere
with any s-registers.
llvm-svn: 137133
2011-08-09 18:19:41 +00:00
Bruno Cardoso Lopes
633400ee00
Reapply a more appropriate solution than in r137114. AVX supports
v4f64 = sitofp v4i32. This fixes PR10559.
Also add support for v4i32 = fptosi v4f64.
llvm-svn: 137128
2011-08-09 17:39:13 +00:00
Bruno Cardoso Lopes
1962a341d8
Revert r137114
llvm-svn: 137127
2011-08-09 17:39:01 +00:00
Justin Holewinski
021ab783b7
PTX: Add initial support for device function calls
- Calls are supported on SM 2.0+ for functions with no return values
llvm-svn: 137125
2011-08-09 17:36:31 +00:00
Bruno Cardoso Lopes
5dac86dac6
Handle sitofp between v4f64 <- v4i32. Fix PR10559
llvm-svn: 137114
2011-08-09 05:48:01 +00:00
Bruno Cardoso Lopes
d521431558
Add support for avx vector fextend
llvm-svn: 137105
2011-08-09 03:04:29 +00:00
Bruno Cardoso Lopes
81534df169
Rename and tidy up tests
llvm-svn: 137103
2011-08-09 03:04:23 +00:00
Bruno Cardoso Lopes
1025d1eb3b
Add two patterns to match special vmovss and vmovsd cases. Also fix
the patterns already there to be more strict regarding the predicate.
This fixes PR10558
llvm-svn: 137100
2011-08-09 01:43:09 +00:00
Bruno Cardoso Lopes
d7eac41193
Make LowerVSETCC aware of AVX types and add patterns to match them.
llvm-svn: 137090
2011-08-09 00:46:57 +00:00
Bruno Cardoso Lopes
d8534855ff
Add support for several vector shift operations while in AVX mode. Fix PR10581
llvm-svn: 137067
2011-08-08 21:31:08 +00:00
Eli Friedman
7a34419c6f
Fix up the patterns for SXTB, SXTH, UXTB, and UXTH so that they are correctly active without HasT2ExtractPack. PR10611.
llvm-svn: 137061
2011-08-08 19:49:37 +00:00
Jakob Stoklund Olesen
85931574b0
Don't clobber pending ST regs when FP regs are killed.
X86FloatingPoint keeps track of pending ST registers for an upcoming
inline asm instruction with fixed stack register constraints. It does
this by remembering which FP register holds the value that should appear
at a fixed stack position for the inline asm.
When that FP register is killed before the inline asm, make sure to
duplicate it to a scratch register, so the ST register still has a live
FP reference.
This could happen when the same FP register was copied to two ST
registers, or when a spill instruction is inserted between the ST copy
and the inline asm.
This fixes PR10602.
llvm-svn: 137050
2011-08-08 17:15:43 +00:00
Rafael Espindola
2da6e6a1d8
print st_shndx with the correct number of bits.
llvm-svn: 136880
2011-08-04 15:50:13 +00:00
Rafael Espindola
c1a076eeb1
print st_other with the correct number of bits.
llvm-svn: 136877
2011-08-04 15:38:19 +00:00
Rafael Espindola
368850841d
print st_type with the correct number of bits.
llvm-svn: 136875
2011-08-04 15:24:00 +00:00
Rafael Espindola
e08bb3d50f
Print st_bind with the correct number of bits.
llvm-svn: 136874
2011-08-04 15:10:35 +00:00
Rafael Espindola
865ab6cb05
Print r_sym with the correct number of bits.
llvm-svn: 136873
2011-08-04 14:48:27 +00:00
Rafael Espindola
f65dd30907
Print r_type with the correct number of bits.
llvm-svn: 136872
2011-08-04 14:39:30 +00:00
Rafael Espindola
edfafcbfb0
Change another counter to decimal.
llvm-svn: 136870
2011-08-04 14:01:03 +00:00
Rafael Espindola
3e8393e6f7
Don't print a counter in hex.
llvm-svn: 136869
2011-08-04 13:39:15 +00:00
Bill Wendling
60e17f8212
Only access both operands of an INSERT_SUBVECTOR if it is an INSERT_SUBVECTOR.
Fixes PR10527.
llvm-svn: 136853
2011-08-04 00:32:58 +00:00
Benjamin Kramer
d93ac7d0b6
Remove underscore that's breaking linux buildbots.
...
llvm-svn: 136833
2011-08-03 23:13:01 +00:00