1
0
mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-10-21 03:53:04 +02:00
Commit Graph

5243 Commits

Author SHA1 Message Date
Eli Friedman
3db429c878 Convert a bunch more tests over to the new atomic instructions.
llvm-svn: 140582
2011-09-26 23:15:09 +00:00
Eli Friedman
d01fc33809 Convert more tests to new atomic instructions.
llvm-svn: 140567
2011-09-26 21:36:10 +00:00
Eli Friedman
6aaaadc188 Convert more tests over to the new atomic instructions.
I did not convert Atomics-32.ll and Atomics-64.ll by hand; the diff is autoupgrade output.

The wmb test is gone because there isn't any way to express wmb with the new atomic instructions; if someone really needs a non-asm way to write a wmb on Alpha, a platform-specific intrisic could be added.

llvm-svn: 140566
2011-09-26 21:30:17 +00:00
Eli Friedman
56e68f7271 Convert more tests over to the new atomic instructions.
llvm-svn: 140559
2011-09-26 20:27:49 +00:00
Justin Holewinski
52c50104d7 PTX: Fix detection of stack load/store vs. global load/store, as well as fix the
printing of local offsets

llvm-svn: 140547
2011-09-26 18:57:22 +00:00
Justin Holewinski
443a122ac3 PTX: Add .align tests to stack object test file
llvm-svn: 140537
2011-09-26 16:20:38 +00:00
Justin Holewinski
859dd9fa59 PTX: Fix some lingering issues with stack allocation
llvm-svn: 140535
2011-09-26 16:20:34 +00:00
Justin Holewinski
83ae9143fd PTX: Unify handling of loads/stores
llvm-svn: 140533
2011-09-26 16:20:28 +00:00
David Meyer
90ed5fdd4f Only run tests in test/CodeGen/CBackend/X86 when both X86 and CBackend are supported
llvm-svn: 140517
2011-09-26 06:44:27 +00:00
David Meyer
a6e588d80c PR11004: Inline memcpy to avoid generating nested call sequence. Un-XFAIL 2011-06-09-TailCallByVal and 2010-11-04-BigByval
llvm-svn: 140516
2011-09-26 06:13:20 +00:00
Jakob Stoklund Olesen
59b2982dcf Only run MF.verify() with EXPENSIVE_CHECKS=1.
llvm-svn: 140441
2011-09-24 01:11:19 +00:00
Jakob Stoklund Olesen
bc6ae70907 Verify that terminators follow non-terminators.
This exposes a -segmented-stacks bug.

llvm-svn: 140429
2011-09-23 22:45:39 +00:00
Eli Friedman
a66a438876 PR10998: It is not legal to sink an instruction past the terminator of a block; make sure we don't do that.
llvm-svn: 140428
2011-09-23 22:41:57 +00:00
Jakob Stoklund Olesen
ca6877343b Also match negative offsets for addrmode3 and addrmode5.
Math is hard, and isScaledConstantInRange() always returned false for
negative constants.  It was doing unsigned division of negative numbers
before casting back to signed.

llvm-svn: 140425
2011-09-23 22:10:33 +00:00
Justin Holewinski
1c0e0dcfbe PTX: Handle function call return values
llvm-svn: 140386
2011-09-23 16:48:41 +00:00
Justin Holewinski
0231798704 PTX: Start fixing function calls
llvm-svn: 140378
2011-09-23 14:31:12 +00:00
Eli Friedman
6f0131b3a7 PR10989: Don't print .hidden on Windows.
llvm-svn: 140356
2011-09-23 00:13:02 +00:00
Eli Friedman
31c7bde95a PR10991: make fast-isel correctly check whether accessing a global through an alias involves thread-local storage. (I'm not entirely sure how this is supposed to work, but this patch makes fast-isel consistent with the normal isel path.)
llvm-svn: 140355
2011-09-22 23:41:28 +00:00
Dan Gohman
d63418e497 Fix SimplifySelectCC to add newly created nodes to the DAGCombiner
worklist, as it may be possible to perform further optimization on them.

llvm-svn: 140349
2011-09-22 23:01:29 +00:00
Duncan Sands
1da590b589 Synthesize SSE3/AVX 128 bit horizontal add/sub instructions from
floating point add/sub of appropriate shuffle vectors.  Does not
synthesize the 256 bit AVX versions because they work differently.

llvm-svn: 140332
2011-09-22 20:15:48 +00:00
Justin Holewinski
9acce6aa64 PTX: fixup test cases for register changes
llvm-svn: 140311
2011-09-22 16:45:51 +00:00
Devang Patel
5d43ab8434 Do not unnecessarily use AT_specification DIE because it does not add any value.
Few weeks ago, llvm completely inverted the debug info graph. Earlier each debug info node used to keep track of its compile unit, now compile unit keeps track of important nodes. One impact of this change is that the global variable's do not have any context, which should be checked before deciding to use AT_specification DIE.

llvm-svn: 140282
2011-09-21 23:41:11 +00:00
Akira Hatanaka
0c87291a10 Remove +.
llvm-svn: 140266
2011-09-21 17:43:48 +00:00
Akira Hatanaka
d987b12b57 Re-enable some of the disabled tests. Use FileCheck instead of grep to check
output.

llvm-svn: 140263
2011-09-21 17:36:30 +00:00
Nadav Rotem
50430e8160 add another testcase for pr10902
llvm-svn: 140257
2011-09-21 17:13:40 +00:00
Nadav Rotem
af5643de3c [VECTOR-SELECT] Address one of the bugs in pr10902.
Vector SetCC result types need to be type-legalized.
This code worked before because scalar result types are known to be legal.

llvm-svn: 140249
2011-09-21 14:34:38 +00:00
Eric Christopher
9b721ff19e Remove llvm-gcc and various compiler handling from llvm. It's not needed
here anymore and has been migrated to the test-suite project.

llvm-svn: 140216
2011-09-20 23:58:15 +00:00
Bill Wendling
67cf034fe3 This test is completely invalid with the modern EH model. Delete.
llvm-svn: 140213
2011-09-20 23:52:09 +00:00
Bruno Cardoso Lopes
1ffbef8ad1 Add a DAGCombine for subvector extracts to remove useless chains of
subvector inserts and extracts. Initial patch by Rackover, Zvi with
some tweak done by me.

llvm-svn: 140204
2011-09-20 23:19:33 +00:00
Bruno Cardoso Lopes
629e7c2410 Revert r140097, working on a better approach
llvm-svn: 140203
2011-09-20 23:19:29 +00:00
Evan Cheng
ead45e2ba6 Fix a bug introduced during refactoring a couple of months ago. Cortex-M3 does not support Thumb2 dsp instructions. rdar://10152911.
llvm-svn: 140181
2011-09-20 21:38:18 +00:00
NAKAMURA Takumi
595c0c8e15 test/CodeGen/X86/avx-minmax.ll: Unbreak Win32.
On Windows x64, 128-bit arguments are not passed by reg but by indirect. eg.

maxpd:
        vmovapd (%rcx), %xmm0
        vmaxpd  (%rdx), %xmm0, %xmm0

FIXME: I don't care YMM on x64 for now.
llvm-svn: 140143
2011-09-20 14:11:35 +00:00
Craig Topper
df17f1cc99 Extend changes from r139986 to produce 256-bit AVX minps/minpd/maxps/maxpd.
llvm-svn: 140140
2011-09-20 07:38:59 +00:00
Andrew Trick
53aeb9f663 ARM isel bug fix for adds/subs operands.
Modified ARMISelLowering::AdjustInstrPostInstrSelection to handle the
full gamut of CPSR defs/uses including instructins whose "optional"
cc_out operand is not really optional. This allowed removal of the
hasPostISelHook to simplify the .td files and make the implementation
more robust.
Fixes rdar://10137436: sqlite3 miscompile

llvm-svn: 140134
2011-09-20 03:17:40 +00:00
Bruno Cardoso Lopes
bed7ef51b6 Attempt to fix -mtriple=i686-{cygwin|mingw|win32} regressions. Nakamura,
if this doesn't work, please provide more details.

llvm-svn: 140107
2011-09-20 00:08:12 +00:00
Bruno Cardoso Lopes
7cf7f02c3d Based on the small opt Zvi's patch was trying to achieve, eliminate
128-bit undef subvector insertion into a 256-bit vector

llvm-svn: 140097
2011-09-19 23:36:50 +00:00
Eli Friedman
b11676fb4b Some additional tests for Thumb atomic load and store (which I somehow forgot to commit earlier).
llvm-svn: 140074
2011-09-19 22:02:33 +00:00
Bruno Cardoso Lopes
9e5ef44daf Match X86ISD::FSETCCsd and X86ISD::FSETCCss while in AVX mode. This fix
PR10955 and PR10948.

llvm-svn: 140069
2011-09-19 21:29:24 +00:00
Nadav Rotem
1cfdc59e94 setOperationAction should be done on the return value of the type, not the operands.
llvm-svn: 140001
2011-09-18 14:57:03 +00:00
Nadav Rotem
cfc77bc719 When promoting integer vectors we often create ext-loads. This patch adds a
dag-combine optimization to implement the ext-load efficiently (using shuffles).

For example the type <4 x i8> is stored in memory as i32, but it needs to
find its way into a <4 x i32> register. Previously we scalarized the memory
access, now we use shuffles.

llvm-svn: 139995
2011-09-18 10:39:32 +00:00
Benjamin Kramer
547157073b Apply Duncan's test fix from r139986 to the avx version of that test too.
llvm-svn: 139992
2011-09-18 00:41:38 +00:00
Duncan Sands
4149334f09 Synthesize x86 max/min instructions also for vectors (i.e. produce
maxps and maxpd).  This broke the sse41-blend.ll testcase by causing
maxpd to be produced rather than a cmp+blend pair, which is the reason
I tweaked it.  Gives a small speedup on doduc with dragonegg when the
GCC vectorizer is used.

llvm-svn: 139986
2011-09-17 16:49:39 +00:00
Andrew Trick
10ea51b841 Test case trial and error. Not sure the proper way to check MBB names.
llvm-svn: 139900
2011-09-16 03:57:19 +00:00
Andrew Trick
5be06c8057 Reduced a stronger test case for coalescer bug PR10920.
llvm-svn: 139898
2011-09-16 03:46:49 +00:00
Eli Friedman
f7bb39b592 Some legalization fixes for atomic load and store.
llvm-svn: 139851
2011-09-15 21:20:49 +00:00
Jakob Stoklund Olesen
b36a98d18f VirtRegMap is counting spill slots, not register spills.
Fix the stats counters to reflect that.

llvm-svn: 139819
2011-09-15 18:31:13 +00:00
Bruno Cardoso Lopes
8e702bba63 Change all checks regarding the presence of any SSE level to always
take into consideration the presence of AVX. This change, together with
the SSEDomainFix enabled for AVX, makes AVX codegen to always (hopefully)
emit the same code as SSE for 128-bit vector ops. I don't
have a testcase for this, but AVX now beats SSE in performance for
128-bit ops in the majority of programas in the llvm testsuite

llvm-svn: 139817
2011-09-15 18:27:36 +00:00
Andrew Trick
e5bb7267ff [regcoalescing] bug fix for RegistersDefinedFromSameValue.
An improper SlotIndex->VNInfo lookup was leading to unsafe copy removal.
Fixes PR10920 401.bzip2 miscompile with no IV rewrite.

llvm-svn: 139765
2011-09-15 01:09:33 +00:00
Nadav Rotem
8e3edccebe Add integer promotion support for vselect
llvm-svn: 139692
2011-09-14 14:42:15 +00:00
Bruno Cardoso Lopes
3e6b9661d1 Vector shuffle mask <i32 4, i32 5, i32 2, i32 3> should yield "movsd", not "movss".
llvm-svn: 139686
2011-09-14 02:36:14 +00:00
Devang Patel
75c70b2315 Remove ancient debug info constructs from test cases, they are not relevant to test case's main objective.
llvm-svn: 139675
2011-09-14 00:29:50 +00:00
Devang Patel
f9dcd6261d Remove unnecessary old test.
llvm-svn: 139674
2011-09-14 00:28:54 +00:00
Akira Hatanaka
3d26b79a9a Delete test cases that generate code for allegrex/psp and cannot be repurposed.
llvm-svn: 139652
2011-09-13 22:29:13 +00:00
Eli Friedman
f13b5ef0e1 Error out on CodeGen of unaligned load/store. Fix test so it isn't accidentally testing that case.
llvm-svn: 139641
2011-09-13 20:50:54 +00:00
Akira Hatanaka
44c745931f Add pattern used to match MipsLo, which is needed when the instruction selector
tries to match a dead MipsLo node (explanation in the link below).

http://article.gmane.org/gmane.comp.compilers.llvm.devel/42757/match=dagcombiner+dead

llvm-svn: 139634
2011-09-13 20:13:58 +00:00
Akira Hatanaka
1376e1eabf Disable tests which generate code for allegrex or psp.
llvm-svn: 139632
2011-09-13 20:00:35 +00:00
Nadav Rotem
5ea703debf update checked pattern
llvm-svn: 139631
2011-09-13 19:59:18 +00:00
Nadav Rotem
60df99b809 Add vselect target support for targets that do not support blend but do support
xor/and/or (For example SSE2).

llvm-svn: 139623
2011-09-13 19:17:42 +00:00
Andrew Trick
a534a558a2 Generalize this test's CHECK statements to handle different indvars modes.
llvm-svn: 139577
2011-09-13 02:46:27 +00:00
Bruno Cardoso Lopes
eb09ab7c3f Change testcase commandline to be more strict and silence buildbots
llvm-svn: 139554
2011-09-12 22:59:26 +00:00
Bruno Cardoso Lopes
a4d2bdfa40 Fix PR10845. SUBREG_TO_REG shouldn't be used when the input and
destination types are equal!

llvm-svn: 139553
2011-09-12 22:59:23 +00:00
Bruno Cardoso Lopes
64e2e852f9 Revert the wrong part of r139528, and fix testcases.
llvm-svn: 139541
2011-09-12 21:24:07 +00:00
Bruno Cardoso Lopes
c67e996fc3 Not sure how CMPPS and CMPPD had already ever worked, I guess it didn't.
However with this fix it does now.

Basically the operand order for the x86 target specific node
is not the same as the instruction, but since the intrinsic need that
specific order at the instruction definition, just change the order
during legalization. Also, there were some wrong invertions of condition
codes, such as GE => LE, GT => LT, fix that too. Fix PR10907.

llvm-svn: 139528
2011-09-12 19:30:40 +00:00
Eli Friedman
08926ecbfb Fix mistake in test runline.
llvm-svn: 139505
2011-09-12 17:32:58 +00:00
Richard Osborne
962b1ca071 Associate a MemOperand with LDWCP nodes introduced during ISel.
This information is required if we want LDWCP to be hoisted out of loops.

llvm-svn: 139495
2011-09-12 14:43:23 +00:00
Eli Friedman
2275f7612e Really un-XFAIL the testcase, like I said I would in r139458.
llvm-svn: 139459
2011-09-10 02:02:27 +00:00
Richard Trieu
0485e133f2 Fixed an assert from:
assert("not implemented for target shuffle node");

to:

  assert(0 && "not implemented for target shuffle node");

This causes a test failure in CodeGen/X86/palignr.ll which has
been marked as XFAIL for the time being.
Test failure filed at PR10901.

llvm-svn: 139454
2011-09-10 01:26:21 +00:00
Akira Hatanaka
a8f0f7babb Fix test cases.
Generate code for Mips32r1 unless a Mips32r2 feature is tested.

llvm-svn: 139433
2011-09-09 23:14:58 +00:00
Eli Friedman
4bae1c4f70 Make the SelectionDAG verify that all the operands of BUILD_VECTOR have the same type. Teach DAGCombiner::visitINSERT_VECTOR_ELT not to make invalid BUILD_VECTORs. Fixes PR10897.
llvm-svn: 139407
2011-09-09 21:04:06 +00:00
Akira Hatanaka
f65d050693 Drop support for Mips1 and Mips2.
llvm-svn: 139405
2011-09-09 20:45:50 +00:00
Nadav Rotem
ccb46031e6 Implement vector-select support for avx256. Refactor the vblend implementation to have tablegen match the instruction by the node type
llvm-svn: 139400
2011-09-09 20:29:17 +00:00
Akira Hatanaka
17df2dfe8c Drop support for Allegrex. Allegrex implements a variant of Mips2.
llvm-svn: 139383
2011-09-09 19:00:51 +00:00
Akira Hatanaka
e1eb015eb9 Change default target architecture from Mips1 to Mips32r1 in preparation for
removing support for Mips1 and Mips2. 

This change and the ones that follow have been discussed with and approved by
Bruno.

llvm-svn: 139344
2011-09-09 01:13:27 +00:00
Devang Patel
ba2d56b1ef Directly point debug info to the stack slot of the arugment, instead of trying to keep track of vreg in which it the arugment is copied. The LiveDebugVariable can keep track of variable's ranges.
llvm-svn: 139330
2011-09-08 22:59:09 +00:00
Bruno Cardoso Lopes
54962ac233 Add a AVX version of a simple i64 -> f64 bitcast. This could be
triggered using llc with -O0, which wouldn't let it be folded and
expose the lack of this pattern.

llvm-svn: 139320
2011-09-08 21:52:33 +00:00
Bruno Cardoso Lopes
50596b096c Reapply testcase from r139309!
llvm-svn: 139318
2011-09-08 21:05:43 +00:00
Bruno Cardoso Lopes
3ecc7a69fd Remove this crashing test, until I figure out what's going wrong here
llvm-svn: 139309
2011-09-08 18:32:36 +00:00
Bruno Cardoso Lopes
74a67e22b0 Add AVX versions of blend vector operations and fix some issues noticed
in Nadav's r139285 and r139287 commits.

1) Rename vsel.ll to a more descriptive name
2) Change the order of BLEND operands to "Op1, Op2, Cond", this is
necessary because PBLENDVB is already used in different places with
this order, and it was being emitted in the wrong way for vselect
3) Add AVX patterns and tests for the same SSE41 instructions

llvm-svn: 139305
2011-09-08 18:05:08 +00:00
Bruno Cardoso Lopes
84c53e3965 Fix PR10844: Add patterns to cover non foldable versions of X86vzmovl.
Triggered using llc -O0. Also fix some SET0PS patterns to their AVX
forms and test it on the testcase.

llvm-svn: 139304
2011-09-08 18:05:02 +00:00
Nadav Rotem
fd68584146 This test is already covered by llvm/trunk/test/CodeGen/X86/vsel.ll
llvm-svn: 139288
2011-09-08 08:43:23 +00:00
Nadav Rotem
dbfa2c8810 add a testcase for the previous patch
llvm-svn: 139287
2011-09-08 08:31:31 +00:00
Nadav Rotem
b461f2190e Add X86-SSE4 codegen support for vector-select.
llvm-svn: 139285
2011-09-08 08:11:19 +00:00
Eli Friedman
9ea5599729 Fix atomic load and store on x86 to pass -verify-machineinstrs (and possibly fix some subtle bugs involving passes which check mayStore()).
This isn't exactly ideal, but it is good enough for the moment.

llvm-svn: 139245
2011-09-07 18:48:32 +00:00
Duncan Sands
8df5170d0d Another forgotten trampoline testcase.
llvm-svn: 139230
2011-09-07 10:05:14 +00:00
Eli Friedman
6a45370c0f Relax the MemOperands on atomics a bit. Fixes -verify-machineinstrs failures for atomic laod/store on ARM.
(The fix for the related failures on x86 is going to be nastier because we actually need Acquire memoperands attached to the atomic load instrs, etc.)

llvm-svn: 139221
2011-09-07 02:23:42 +00:00
Devang Patel
f4483238b6 While sinking machine instructions, sink matching DBG_VALUEs also otherwise live debug variable pass will drop DBG_VALUEs on the floor.
llvm-svn: 139208
2011-09-07 00:07:58 +00:00
Nick Lewycky
4e3daabb26 Disable these tests harder. They're XFAIL'd, but that means they still run, and
these tests all infinitely recurse, bringing my system down into swapping hell.

llvm-svn: 139192
2011-09-06 22:08:18 +00:00
Evan Cheng
891e9696ea Fix fall outs from my recent change on how carry bit is modeled during isel.
Now the 'S' instructions, e.g. ADDS, treat S bit as optional operand as well.
Also fix isel hook to correctly set the optional operand.
rdar://10073745

llvm-svn: 139157
2011-09-06 18:52:20 +00:00
Jakob Stoklund Olesen
7994269719 Atomic pseudos don't use (as in read) CPSR. They clobber it.
llvm-svn: 139148
2011-09-06 17:40:35 +00:00
Duncan Sands
6939ae53ac Split the init.trampoline intrinsic, which currently combines GCC's
init.trampoline and adjust.trampoline intrinsics, into two intrinsics
like in GCC.  While having one combined intrinsic is tempting, it is
not natural because typically the trampoline initialization needs to
be done in one function, and the result of adjust trampoline is needed
in a different (nested) function.  To get around this llvm-gcc hacks the
nested function lowering code to insert an additional parent variable
holding the adjust.trampoline result that can be accessed from the child
function.  Dragonegg doesn't have the luxury of tweaking GCC code, so it
stored the result of adjust.trampoline in the memory GCC set aside for
the trampoline itself (this is always available in the child function),
and set up some new memory (using an alloca) to hold the trampoline.
Unfortunately this breaks Go which allocates trampoline memory on the
heap and wants to use it even after the parent has exited (!).  Rather
than doing even more hacks to get Go working, it seemed best to just use
two intrinsics like in GCC.  Patch mostly by Sanjoy Das.

llvm-svn: 139140
2011-09-06 13:37:06 +00:00
Dan Gohman
cbadb0f92c Revert r129875, XFAILing this test for arm, since the fix was reverted.
llvm-svn: 139058
2011-09-03 00:14:24 +00:00
Jakob Stoklund Olesen
ef8527b836 Pseudo CMOV instructions don't clobber EFLAGS.
The explanation about a 0 argument being materialized as xor is no
longer valid.  Rematerialization will check if EFLAGS is live before
clobbering it.

The code produced by X86TargetLowering::EmitLoweredSelect does not
clobber EFLAGS.

This causes one less testb instruction to be generated in the cmov.ll
test case.

llvm-svn: 139057
2011-09-02 23:52:55 +00:00
Bill Wendling
145872a92c Try to eliminate the use of the 'unwind' instruction.
llvm-svn: 139046
2011-09-02 22:41:11 +00:00
Eli Friedman
383a3c76b2 Don't fast-isel for atomic load/store; some cases require extra handling missing from fast-isel.
llvm-svn: 139044
2011-09-02 22:33:24 +00:00
Bill Wendling
267bc5089a Better fix for this testcase. Update it to the new EH scheme entirely.
llvm-svn: 139039
2011-09-02 21:27:08 +00:00
Bill Wendling
a4f9142ad4 Update for new EH stuff. (I'm not sure if this is 100% correct.)
llvm-svn: 139038
2011-09-02 21:24:17 +00:00
Duncan Sands
33f33411e8 Darwin wants ctors/dtors to be ordered the other way round to linux.
llvm-svn: 139015
2011-09-02 18:07:19 +00:00
Kalle Raiskila
7c154fe467 Pass signed (not unsigned) 10 bit field to SPU 'ori' instruction.
llvm-svn: 139004
2011-09-02 10:05:01 +00:00
Dan Gohman
6d0230847c Revert r131152, r129796, r129761. This code is currently considered
to be unreliable on platforms which require memcpy calls, and it is
complicating broader legalize cleanups. It is hoped that these cleanups
will make memcpy byval easier to implement in the future.

llvm-svn: 138977
2011-09-01 23:07:08 +00:00
Benjamin Kramer
bd939ad83e Don't drop alignment info on local common symbols.
- On COFF the .lcomm directive has an alignment argument.
- On ELF we fall back to .local + .comm

Based on a patch by NAKAMURA Takumi.

Fixes PR9337, PR9483 and PR10128.

llvm-svn: 138976
2011-09-01 23:04:27 +00:00
Benjamin Kramer
4ed9810e77 XFAIL this test on arm until the backend is fixed.
llvm-svn: 138955
2011-09-01 18:40:03 +00:00
Benjamin Kramer
ca7001eedb This test depends on cmov being available.
llvm-svn: 138954
2011-09-01 18:40:01 +00:00
Jakob Stoklund Olesen
c26e2e6221 Permit remat of partial register defs when it is safe.
An instruction may define part of a register where the other bits are
undefined. In that case, it is safe to rematerialize the instruction.
For example:

  %vreg2:ssub_0<def> = VLDRS <cp#0>, 0, pred:14, pred:%noreg, %vreg2<imp-def>

The extra <imp-def> operand indicates that the instruction does not read
the other parts of the virtual register, so a remat is safe.

This patch simply allows multiple def operands for the virtual register.
It is MI->readsVirtualRegister() that determines if we depend on a
previous value so remat is impossible.

llvm-svn: 138953
2011-09-01 18:27:51 +00:00
Bruno Cardoso Lopes
10f234f1a7 Fix vbroadcast matching logic to early unmatch if the node doesn't have
only one use. Fix PR10825.

llvm-svn: 138951
2011-09-01 18:15:06 +00:00
Jakob Stoklund Olesen
bc000bf219 Prevent remat of partial register redefinitions.
An instruction that redefines only part of a larger register can never
be rematerialized since the virtual register value depends on the old
value in other parts of the register.

This was fixed for the inline spiller in r138794.  This patch fixes the
problem for all register allocators, and includes a small test case.

<rdar://problem/10032939>

llvm-svn: 138944
2011-09-01 17:18:50 +00:00
Andrew Trick
e5d7c0d111 PreRA scheduler should avoid cloning compares.
Added canClobberReachingPhysRegUse() to handle a particular pattern in
which a two-address instruction could be forced to interfere with
EFLAGS, causing a compare to be unnecessarilly cloned.
Fixes rdar://problem/5875261

llvm-svn: 138924
2011-09-01 00:54:31 +00:00
Bill Wendling
66b613c8b9 Remove old declare statements.
llvm-svn: 138905
2011-08-31 21:41:20 +00:00
Bill Wendling
78fdf13f12 Update more tests to the new EH scheme.
llvm-svn: 138904
2011-08-31 21:40:15 +00:00
Bill Wendling
aca68bbe94 Update more tests to the new EH scheme.
llvm-svn: 138903
2011-08-31 21:39:05 +00:00
Bill Wendling
da12f7322a Revert r138894. This was failing on cmake-clang-i686-msvc10.
llvm-svn: 138900
2011-08-31 21:20:25 +00:00
Bill Wendling
722de8a9aa Update more tests to the new EH scheme.
llvm-svn: 138894
2011-08-31 21:04:11 +00:00
Eli Friedman
8ae6a88723 Generic expansion for atomic load/store into cmpxchg/atomicrmw xchg; implements 64-bit atomic load/store for ARM.
llvm-svn: 138872
2011-08-31 18:26:09 +00:00
Eli Friedman
5d3814e0c4 64-bit atomic cmpxchg for ARM.
llvm-svn: 138868
2011-08-31 17:52:22 +00:00
David Greene
a975ef3213 Compress Repeated Byte Output
Emit a repeated sequence of bytes using .zero.  This saves an enormous
amount of asm file space for certain programs.

llvm-svn: 138864
2011-08-31 17:30:56 +00:00
Benjamin Kramer
ce13423304 This test requires sse, otherwise x87 ops will block tailcall optimization
llvm-svn: 138859
2011-08-31 16:49:05 +00:00
Bruno Cardoso Lopes
5bd6e92f99 - Move all MOVSS and MOVSD patterns close to their definitions
- Duplicate some store patterns to their AVX forms!
- Catched a bug while restricting the patterns subtarget, fix it
  and update a testcase to check it properly

llvm-svn: 138851
2011-08-31 03:04:20 +00:00
Evan Cheng
bbabe9ff60 Fix (movhps load) lowering / pattern to match more cases. rdar://10050549
llvm-svn: 138848
2011-08-31 02:05:24 +00:00
Eli Friedman
928959bc52 Some minor cleanups for r138845.
llvm-svn: 138846
2011-08-31 00:41:05 +00:00
Eli Friedman
d71c865ae0 Some 64-bit atomic operations on ARM. 64-bit cmpxchg coming next.
llvm-svn: 138845
2011-08-31 00:31:29 +00:00
Benjamin Kramer
2da6e863e7 Fix test typo.
llvm-svn: 138843
2011-08-31 00:02:59 +00:00
Rafael Espindola
17f15bc464 Add a triple.
llvm-svn: 138831
2011-08-30 21:19:37 +00:00
Rafael Espindola
83257fe618 Some test code to check if correct code is being generated.
Patch by Sanjoy Das.

llvm-svn: 138820
2011-08-30 19:51:29 +00:00
Roman Divacky
7ac1bc57f7 Set CR1EQ only when lowering vararg floating arguments (not any vararg
arguments as before), unset CR1EQ otherwise.

llvm-svn: 138802
2011-08-30 17:04:16 +00:00
Evan Cheng
1eacb83316 Change ARM / Thumb2 addc / adde and subc / sube modeling to use physical
register dependency (rather than glue them together). This is general
goodness as it gives scheduler more freedom. However it is motivated by
a nasty bug in isel.

When a i64 sub is expanded to subc + sube.
  libcall #1
     \
      \        subc 
       \       /  \
        \     /    \
         \   /    libcall #2
          sube

If the libcalls are not serialized (i.e. both have chains which are dag
entry), legalizer can serialize them in arbitrary orders. If it's
unlucky, it can force libcall #2 before libcall #1 in the above case.

  subc
   |
  libcall #2
   |
  libcall #1
   |
  sube

However since subc and sube are "glued" together, this ends up being a
cycle when the scheduler combine subc and sube as a single scheduling
unit.

The right solution is to fix LegalizeType too chains the libcalls together.
However, LegalizeType is not processing nodes in order so that's harder than
it should be. For now, the move to physical register dependency will do.

rdar://10019576

llvm-svn: 138791
2011-08-30 01:34:54 +00:00
Eli Friedman
4d90e53381 Explicitly zero out parts of a vector which are required to be zero by the algorithm in LowerUINT_TO_FP_i32. This only has a substantial effect on the generated code when the input is extracted from a vector register; other ways of loading an i32 do the appropriate zeroing implicitly. Fixes PR10802.
llvm-svn: 138768
2011-08-29 21:15:46 +00:00
Owen Anderson
d2fc51c0e5 Add testcase for r138746.
llvm-svn: 138747
2011-08-29 18:02:40 +00:00
Duncan Sands
1a69e0119a Fix PR5329: pay attention to constructor/destructor priority
when outputting them.  With this, the entire LLVM testsuite
passes when built with dragonegg.

llvm-svn: 138724
2011-08-28 13:17:22 +00:00
Bill Wendling
aeeb59947e Update to new EH scheme.
llvm-svn: 138699
2011-08-27 04:53:41 +00:00
Bill Wendling
38b5c3a5bc Cannot have an llvm.eh.exception call in a non-landing pad block.
llvm-svn: 138698
2011-08-27 04:53:28 +00:00
Eli Friedman
9f95c7d381 Add support for generating CMPXCHG16B on x86-64 for the cmpxchg IR instruction.
llvm-svn: 138660
2011-08-26 21:21:21 +00:00
Bill Wendling
5b7cbeacad Revert r138606 until LowerInvoke has been converted to the new EH scheme.
llvm-svn: 138656
2011-08-26 21:11:23 +00:00
Eli Friedman
802dd20495 Atomic load/store on ARM/Thumb.
I don't really like the patterns, but I'm having trouble coming up with a
better way to handle them.

I plan on making other targets use the same legalization
ARM-without-memory-barriers is using... it's not especially efficient, but
if anyone cares, it's not that hard to fix for a given target if there's
some better lowering.

llvm-svn: 138621
2011-08-26 02:59:24 +00:00
Bill Wendling
077e9ea84b Update to the new EH scheme.
llvm-svn: 138606
2011-08-25 23:48:37 +00:00
Bruno Cardoso Lopes
5b3d2c9e17 Add support for AVX 256-bit version of MOVDDUP!
llvm-svn: 138588
2011-08-25 21:40:37 +00:00
Andrew Trick
0dd0ae11f8 ARM fix for missing implicit operands on ldmia_ret.
rdar://10005094: miscompile of 176.gcc

llvm-svn: 138568
2011-08-25 17:50:53 +00:00
Bill Wendling
1eec5affec LSR wants to split the landing pad's critical edge. Let it do it, but use the
proper function to do it.

llvm-svn: 138550
2011-08-25 05:55:40 +00:00
Bruno Cardoso Lopes
5d34219953 Add support for 256-bit versions of VSHUFPD and VSHUFPS.
llvm-svn: 138546
2011-08-25 02:58:26 +00:00
Eli Friedman
b6597a2e70 Hook up 64-bit atomic load/store on x86-32. I plan to write more efficient implementations eventually.
llvm-svn: 138505
2011-08-24 22:33:28 +00:00
Eli Friedman
688794e1d1 Basic tests for atomic load and store on x86.
llvm-svn: 138486
2011-08-24 21:16:59 +00:00
Richard Osborne
6b6b0b535d Add Uses=[SP] to call instructions. This fixes a miscompilation with a
variable sized alloca.

llvm-svn: 138433
2011-08-24 13:32:43 +00:00
Craig Topper
1da38a34a6 Break 256-bit vector int add/sub/mul into two 128-bit operations to avoid costly scalarization. Fixes PR10711.
llvm-svn: 138427
2011-08-24 06:14:18 +00:00
Bruno Cardoso Lopes
8959b54713 Fix a nasty bug where a v4i64 was being wrong emitted with 32-bit
permutations. Also tidy up some patterns and make them close to their
instruction definition!

llvm-svn: 138392
2011-08-23 22:06:37 +00:00
Nick Lewycky
11874a4e0a PerformSubCombine to work on integers larger than i128. Fixes a crasher.
llvm-svn: 138354
2011-08-23 19:01:24 +00:00
Craig Topper
67b22aedb4 Add support for breaking 256-bit v16i16 and v32i8 VSETCC into two 128-bit ones, avoiding sclarization. Add vex form of pcmpeqq and pcmpgtq. Fixes more cases for PR10712.
llvm-svn: 138321
2011-08-23 04:36:33 +00:00
Bruno Cardoso Lopes
8024703a16 Introduce a pass to insert vzeroupper instructions to avoid AVX to
SSE transition penalty. The pass is enabled through the "x86-use-vzeroupper"
llc command line option. This is only the first step (very naive and
conservative one) to sketch out the idea, but proper DFA is coming next
to allow smarter decisions. Comments and ideas now and in further commits
will be very appreciated.

llvm-svn: 138317
2011-08-23 01:14:17 +00:00
Bruno Cardoso Lopes
8007165688 Add support for breaking 256-bit int VETCC into two 128-bit ones,
avoding scalarization of the compare. Reduces code from 59 to 6
instructions. Fix PR10712.

llvm-svn: 138271
2011-08-22 20:31:04 +00:00
Chad Rosier
0bfea70d09 With the fix in r138164: "Add <imp-def> operands to QQ and QQQQ stack loads."
-verify-machineinstrs can be enabled for this test case.

llvm-svn: 138171
2011-08-20 00:34:45 +00:00
Chad Rosier
55c57f07dd VMOVQQQQs pseudo instructions are only created by ARMBaseInstrInfo::copyPhysReg.
Therefore, rather then generate a pseudo instruction, which is later expanded,
generate the necessary instructions in place.

llvm-svn: 138163
2011-08-20 00:17:25 +00:00
Devang Patel
e4127d626e Do not use named md nodes to track variables that are completely optimized. This does not scale while doing LTO with debug info. New approach is to include list of variables in the subprogram info directly.
llvm-svn: 138145
2011-08-19 23:28:12 +00:00
Jim Grosbach
969c7a9037 Use regex to remove false dependencies on register allocation.
llvm-svn: 138137
2011-08-19 23:10:31 +00:00
Jim Grosbach
5481e15390 Update tests.
llvm-svn: 138116
2011-08-19 22:19:48 +00:00
Jakob Stoklund Olesen
f847cb77db Add test case for r138018.
llvm-svn: 138033
2011-08-19 04:30:24 +00:00
Akira Hatanaka
163382894e Use subword loads instead of a 4-byte load when the size of a structure (or a
piece of it) that is being passed by value is smaller than a word.

llvm-svn: 138007
2011-08-18 23:39:37 +00:00
Ivan Krasin
338df71d60 FastISel: avoid function calls between the materialization of the constant and its use.
llvm-svn: 137993
2011-08-18 22:06:10 +00:00
Jim Grosbach
7ecefeb594 Thumb assembly parsing and encoding for LDM instruction.
Fix base register type and canonicallize to the "ldm" spelling rather than
"ldmia." Add diagnostics for incorrect writeback token and out-of-range
registers.

llvm-svn: 137986
2011-08-18 21:50:53 +00:00
Richard Osborne
415c5ff412 Add intrinsics for SETEV, GETED, GETET.
llvm-svn: 137938
2011-08-18 13:00:48 +00:00
Bruno Cardoso Lopes
c174d8ac48 Cleanup vector logical ops in AVX and add use int versions for simple
v2i64

llvm-svn: 137919
2011-08-18 02:11:34 +00:00
Bruno Cardoso Lopes
82795e6b41 Fix PR10688. Add support for spliting 256-bit vector shifts when the
shift amount is variable

llvm-svn: 137885
2011-08-17 22:12:20 +00:00
Jim Grosbach
0115c6f75b Thumb assembly parsing and encoding for ADR.
llvm-svn: 137864
2011-08-17 20:37:40 +00:00
Bruno Cardoso Lopes
98531dfd08 Introduce matching patterns for vbroadcast AVX instruction. The idea is to
match splats in the form (splat (scalar_to_vector (load ...))) whenever
the load can be folded. All the logic and instruction emission is
working but because of PR8156, there are no ways to match loads, cause
they can never be folded for splats. Thus, the tests are XFAILed, but
I've tested and exercised all the logic using a relaxed version for
checking the foldable loads, as if the bug was already fixed. This
should work out of the box once PR8156 gets fixed since MayFoldLoad will
work as expected.

llvm-svn: 137810
2011-08-17 02:29:19 +00:00
Bruno Cardoso Lopes
0a3b3123fd Update test to not use the scalar type to splat from a load
llvm-svn: 137809
2011-08-17 02:29:15 +00:00
Bruno Cardoso Lopes
4ff4ed28af Now that we have a canonical way to handle 256-bit splats:
vinsertf128 $1 + vpermilps $0, remove the old code that used to first
do the splat in a 128-bit vector and then insert it into a larger one.
This is better because the handling code gets simpler and also makes a
better room for the upcoming vbroadcast!

llvm-svn: 137807
2011-08-17 02:29:10 +00:00
Akira Hatanaka
0179c7fa68 Add support for ext and ins.
llvm-svn: 137804
2011-08-17 02:05:42 +00:00
Bruno Cardoso Lopes
d64294fb0a Instead of always leaving the work to the generic legalizer when
there is no support for native 256-bit shuffles, be more smart in some
cases, for example, when you can extract specific 128-bit parts and use
regular 128-bit shuffles for them. Example:

For this shuffle:
  shufflevector <4 x i64> %a, <4 x i64> %b, <4 x i32>
                <i32 1, i32 0, i32 7, i32 6>

This was expanded to:
  vextractf128  $1, %ymm1, %xmm2
  vpextrq $0, %xmm2, %rax
  vmovd %rax, %xmm1
  vpextrq $1, %xmm2, %rax
  vmovd %rax, %xmm2
  vpunpcklqdq %xmm1, %xmm2, %xmm1
  vpextrq $0, %xmm0, %rax
  vmovd %rax, %xmm2
  vpextrq $1, %xmm0, %rax
  vmovd %rax, %xmm0
  vpunpcklqdq %xmm2, %xmm0, %xmm0
  vinsertf128 $1, %xmm1, %ymm0, %ymm0
  ret

Now we get:
  vshufpd $1, %xmm0, %xmm0, %xmm0
  vextractf128  $1, %ymm1, %xmm1
  vshufpd $1, %xmm1, %xmm1, %xmm1
  vinsertf128 $1, %xmm1, %ymm0, %ymm0

llvm-svn: 137733
2011-08-16 18:21:54 +00:00
Akira Hatanaka
dcbf455b98 Add test case for r137711.
llvm-svn: 137725
2011-08-16 17:32:01 +00:00
Akira Hatanaka
12df91513e Fix handling of double precision loads and stores when Mips1 is targeted.
Mips1 does not support double precision loads or stores, therefore two single
precision loads or stores must be used in place of these instructions. This 
patch treats double precision loads and stores as if they are legal
instructions until MCInstLowering, instead of generating the single precision
instructions during instruction selection or Prolog/Epilog code insertion.

Without the changes made in this patch, llc produces code that has the same 
problem described in r137484 or bails out when
MipsInstrInfo::storeRegToStackSlot or loadRegFromStackSlot is called before
register allocation.

llvm-svn: 137711
2011-08-16 03:51:51 +00:00
Bruno Cardoso Lopes
b81c3ed76d Fix PR10656. It's only profitable to use 128-bit inserts and extracts
when AVX mode is one. Otherwise is just more work for the type
legalizer.

llvm-svn: 137661
2011-08-15 21:45:54 +00:00
Eric Christopher
1bb5eaa978 Fix this test to avoid leaving a temporary file behind.
llvm-svn: 137651
2011-08-15 20:55:03 +00:00
Bob Wilson
90799621b3 Expand VMOVQQQQ pseudo instructions.
Apparently we never added code to expand these pseudo instructions, and in
over a year, no one has noticed.  Our register allocator must be awesome!

llvm-svn: 137551
2011-08-13 05:14:55 +00:00
Bruno Cardoso Lopes
2d100ca13c The VPERM2F128 is a AVX instruction which permutes between two 256-bit
vectors. It operates on 128-bit elements instead of regular scalar
types. Recognize shuffles that are suitable for VPERM2F128 and teach
the x86 legalizer how to handle them.

llvm-svn: 137519
2011-08-12 21:48:26 +00:00
Akira Hatanaka
c9c0190cbe Define unaligned load and store.
llvm-svn: 137515
2011-08-12 21:30:06 +00:00
Akira Hatanaka
6caf61a6ac Test case for 137484
llvm-svn: 137486
2011-08-12 18:12:06 +00:00
Akira Hatanaka
b787f8a8a5 Enclose directive .cprestore with .set macro and nomacro to silence assembler
warning. 

llvm-svn: 137378
2011-08-11 22:42:31 +00:00
Bruno Cardoso Lopes
328a6a980b Add a dag combine to xform 256-bit shuffles into simple vector
inserts and extracts. This simple combine makes us generate only 1
instruction instead of 11 in the v8 case.

llvm-svn: 137362
2011-08-11 21:50:44 +00:00
Bruno Cardoso Lopes
884d8b9cb5 Fix the test added by Nadav in r137308. Make it more strict:
1) check for the "v" version of movaps
2) add a couple of CHECK-NOT to guarantee the behavior
3) move to a more appropriate test file

llvm-svn: 137361
2011-08-11 21:50:35 +00:00
Bruno Cardoso Lopes
38d4afa02f Fix PR10492 by teaching MOVHLPS and MOVLPS mask matching to be more strict.
llvm-svn: 137324
2011-08-11 18:59:13 +00:00
Jim Grosbach
9717a9c0d3 ARM push of a single register encodes as pre-indexed STR.
Per the ARM ARM, a 'push' of a single register encodes as an STR,
not an STM.

llvm-svn: 137318
2011-08-11 18:07:11 +00:00
Jim Grosbach
abaaf4513f ARM pop of a single register encodes as post-indexed LDR.
Per the ARM ARM, a 'pop' of a single register encodes as an LDR,
not an LDM.

llvm-svn: 137316
2011-08-11 17:35:48 +00:00
Nadav Rotem
de1b485f3f [AVX] If the data which is going to be saved is already in two XMM registers
(for example, after integer operation), do not pack the registers into a YMM
before saving. Its better to save as two XMM registers.

Before:
                vinsertf128         $1, %xmm3, %ymm0, %ymm3
                vinsertf128         $0, %xmm1, %ymm3, %ymm1
                vmovaps              %ymm1, 416(%rsp)

After:
                vmovaps              %xmm3, 416+16(%rsp)
                vmovaps              %xmm1, 416(%rsp)

llvm-svn: 137308
2011-08-11 16:41:21 +00:00
Chris Lattner
3ae8704c4f add missing colon, thanks peter.
llvm-svn: 137306
2011-08-11 16:15:10 +00:00
Chris Lattner
575057916a fix PR10605 / rdar://9930964 by adding a pretty scary missed check.
It's somewhat surprising anything works without this.  Before we would
compile the testcase into:

test:                                   # @test
	movl	$4, 8(%rdi)
	movl	8(%rdi), %eax
	orl	%esi, %eax
	cmpl	$32, %edx
	movl	%eax, -4(%rsp)          # 4-byte Spill
	je	.LBB0_2

now we produce:

test:                                   # @test
	movl	8(%rdi), %eax
	movl	$4, 8(%rdi)
	orl	%esi, %eax
	cmpl	$32, %edx
	movl	%eax, -4(%rsp)          # 4-byte Spill
	je	.LBB0_2
llvm-svn: 137303
2011-08-11 06:26:54 +00:00
Bruno Cardoso Lopes
8674ddf55a Splats for v8i32/v8f32 can be handled by VPERMILPSY. This was causing
infinite recursive calls in legalize. Fix PR10562

llvm-svn: 137296
2011-08-11 02:49:44 +00:00
Bruno Cardoso Lopes
954ac403c7 Use the splat index to generate the desired shuffle. Otherwise we
could only get undefs and the vector shuffle becomes an undef,
generating wrong code.

llvm-svn: 137295
2011-08-11 02:49:41 +00:00
Eli Friedman
17bd9e5d7c Fix X86TargetLowering::LowerExternalSymbol so that it actually works in non-trivial cases. This hasn't been an issue before because the function isn't normally called (but apparently is used to generate a tail-call to sin() on ELF x86-32 with PIC and SSE2).
Fixes PR9693.

llvm-svn: 137292
2011-08-11 01:48:05 +00:00
NAKAMURA Takumi
5d316f7632 test/CodeGen/X86/opt-shuff-tstore.ll: Add explicit -mtriple=x86_64-linux.
llvm-svn: 137262
2011-08-10 22:52:48 +00:00
Devang Patel
393d6e1fd0 While extending definition range of a debug variable, consult lexical scopes also. There is no point extending debug variable out side its lexical block. This provides 6x compile time speedup in some cases.
llvm-svn: 137250
2011-08-10 21:25:34 +00:00
Nadav Rotem
1b3075c0ab Fix the test. Add cpu target.
llvm-svn: 137241
2011-08-10 19:49:19 +00:00
Nadav Rotem
4a8d78d24a When performing a truncating store, it is sometimes possible to rearrange the
data in-register prior to saving to memory.  When we reorder the data in memory
we prevent the need to save multiple scalars to memory, making a single regular
store.

llvm-svn: 137238
2011-08-10 19:30:14 +00:00
Bruno Cardoso Lopes
565ab1542a The following X86 pattern is incorrect:
def : Pat<(X86Movss VR128:$src1,
                   (bc_v4i32 (v2i64 (load addr:$src2)))),
          (MOVLPSrm VR128:$src1, addr:$src2)>;
This matches a MOVSS dag with a MOVLPS instruction. However, MOVSS will replace only the low 32 bits of the register, while the MOVLPS instruction will replace the low 64 bits. A testcase is added and illustrates the bug and also modified the one that was already present. Patch by Tanya Lattner.

llvm-svn: 137227
2011-08-10 17:45:17 +00:00
Rafael Espindola
45cd7316b5 Add support for the R and Q constraints.
llvm-svn: 137217
2011-08-10 16:26:42 +00:00
Bruno Cardoso Lopes
4a435a361d Fix a bug in vpermilps mask checking. Fix PR10560
llvm-svn: 137194
2011-08-10 01:54:17 +00:00
Bruno Cardoso Lopes
9a695724bd Add 256-bit support for v8i32, v4i64 and v4f64 ISD::SELECT. Fix PR10556
llvm-svn: 137179
2011-08-09 23:27:13 +00:00
Bruno Cardoso Lopes
7461b930f3 Add v16i16 and v32i8 store patterns
llvm-svn: 137166
2011-08-09 22:39:53 +00:00
Bruno Cardoso Lopes
028c6aa951 Use fp unpack instructions to unpack int types. Until we have AVX2, this
is the best we can do for these patterns. This fix PR10554.

llvm-svn: 137161
2011-08-09 22:18:37 +00:00
Eli Friedman
44fd5b2b59 Fix a couple ridiculous copy-paste errors. rdar://9914773 .
llvm-svn: 137160
2011-08-09 22:17:39 +00:00
Bill Wendling
250ea7930e Revert r137134. It breaks some code as Eli pointed out.
llvm-svn: 137135
2011-08-09 18:56:35 +00:00
Bill Wendling
ca256c0d2d Print out the variable declaration only if it is a declaration. Otherwise, a
'static' variable will be emitted twice.
PR10081

llvm-svn: 137134
2011-08-09 18:31:50 +00:00
Jakob Stoklund Olesen
e43aca1c39 Inflate register classes after coalescing.
Coalescing can remove copy-like instructions with sub-register operands
that constrained the register class.  Examples are:

  x86: GR32_ABCD:sub_8bit_hi -> GR32
  arm: DPR_VFP2:ssub0 -> DPR

Recompute the register class of any virtual registers that are used by
less instructions after coalescing.

This affects code generation for the Cortex-A8 where we use NEON
instructions for f32 operations, c.f. fp_convert.ll:

  vadd.f32  d16, d1, d0
  vcvt.s32.f32  d0, d16

The register allocator is now free to use d16 for the temporary, and
that comes first in the allocation order because it doesn't interfere
with any s-registers.

llvm-svn: 137133
2011-08-09 18:19:41 +00:00
Bruno Cardoso Lopes
633400ee00 Reapply a more appropriate solution than in r137114. AVX supports
v4f64 = sitofp v4i32. This fix PR10559.
Also add support for v4i32 = fptosi v4f64.

llvm-svn: 137128
2011-08-09 17:39:13 +00:00
Bruno Cardoso Lopes
1962a341d8 Revert r137114
llvm-svn: 137127
2011-08-09 17:39:01 +00:00
Justin Holewinski
021ab783b7 PTX: Add initial support for device function calls
- Calls are supported on SM 2.0+ for function with no return values

llvm-svn: 137125
2011-08-09 17:36:31 +00:00
Bruno Cardoso Lopes
5dac86dac6 Handle sitofp between v4f64 <- v4i32. Fix PR10559
llvm-svn: 137114
2011-08-09 05:48:01 +00:00
Bruno Cardoso Lopes
d521431558 Add support for avx vector fextend
llvm-svn: 137105
2011-08-09 03:04:29 +00:00
Bruno Cardoso Lopes
81534df169 Rename and tidy up tests
llvm-svn: 137103
2011-08-09 03:04:23 +00:00
Bruno Cardoso Lopes
1025d1eb3b Add two patterns to match special vmovss and vmovsd cases. Also fix
the patterns already there to be more strict regarding the predicate.
This fixes PR10558

llvm-svn: 137100
2011-08-09 01:43:09 +00:00
Bruno Cardoso Lopes
d7eac41193 Make LowerVSETCC aware of AVX types and add patterns to match them.
llvm-svn: 137090
2011-08-09 00:46:57 +00:00
Bruno Cardoso Lopes
d8534855ff Add support for several vector shifts operations while in AVX mode. Fix PR10581
llvm-svn: 137067
2011-08-08 21:31:08 +00:00
Eli Friedman
7a34419c6f Fix up the patterns for SXTB, SXTH, UXTB, and UXTH so that they are correctly active without HasT2ExtractPack. PR10611.
llvm-svn: 137061
2011-08-08 19:49:37 +00:00
Jakob Stoklund Olesen
85931574b0 Don't clobber pending ST regs when FP regs are killed.
X86FloatingPoint keeps track of pending ST registers for an upcoming
inline asm instruction with fixed stack register constraints.  It does
this by remembering which FP register holds the value that should appear
at a fixed stack position for the inline asm.

When that FP register is killed before the inline asm, make sure to
duplicate it to a scratch register, so the ST register still has a live
FP reference.

This could happen when the same FP register was copied to two ST
registers, or when a spill instruction is inserted between the ST copy
and the inline asm.

This fixes PR10602.

llvm-svn: 137050
2011-08-08 17:15:43 +00:00
Rafael Espindola
2da6e6a1d8 print st_shndx with the correct number of bits.
llvm-svn: 136880
2011-08-04 15:50:13 +00:00
Rafael Espindola
c1a076eeb1 print st_other with the correct number of bits.
llvm-svn: 136877
2011-08-04 15:38:19 +00:00
Rafael Espindola
368850841d print st_type with the correct number of bits.
llvm-svn: 136875
2011-08-04 15:24:00 +00:00
Rafael Espindola
e08bb3d50f Print st_bind with the correct number of bits.
llvm-svn: 136874
2011-08-04 15:10:35 +00:00
Rafael Espindola
865ab6cb05 Print r_sym with the correct number of bits.
llvm-svn: 136873
2011-08-04 14:48:27 +00:00
Rafael Espindola
f65dd30907 Print r_type with the correct number of bits.
llvm-svn: 136872
2011-08-04 14:39:30 +00:00
Rafael Espindola
edfafcbfb0 Change anther counter to decimal.
llvm-svn: 136870
2011-08-04 14:01:03 +00:00
Rafael Espindola
3e8393e6f7 Don't print a counter in hex.
llvm-svn: 136869
2011-08-04 13:39:15 +00:00
Bill Wendling
60e17f8212 Only access both operands of an INSERT_SUBVECTOR if it is an INSERT_SUBVECTOR.
Fixes PR10527.

llvm-svn: 136853
2011-08-04 00:32:58 +00:00
Benjamin Kramer
d93ac7d0b6 Remove underscore that's breaking linux buildbots.
llvm-svn: 136833
2011-08-03 23:13:01 +00:00
Jakub Staszak
9d083611d4 Use MachineBranchProbabilityInfo in If-Conversion instead of its own heuristics.
llvm-svn: 136826
2011-08-03 22:34:43 +00:00
Jakob Stoklund Olesen
002075193b Handle IMPLICIT_DEF instructions in X86FloatingPoint.
This fixes PR10575.

llvm-svn: 136787
2011-08-03 16:33:19 +00:00
Devang Patel
99a2f0d98c Use byte offset, instead of element number, to access merged global.
llvm-svn: 136759
2011-08-03 01:25:46 +00:00
Rafael Espindola
cefc38659a Assume .cfi_startproc is the first thing in a function. If the function is
externally visable, create a local symbol to use in the CFE. If not, use the
function label itself.

Fixes PR10420.

llvm-svn: 136716
2011-08-02 20:24:22 +00:00
Bruno Cardoso Lopes
ac0984dc7e Make this kind of lowering to be supported by 256-bit instructions:
shuffle (scalar_to_vector (load (ptr + 4))), undef, <0, 0, 0, 0>
To:
  shuffle (vload ptr)), undef, <1, 1, 1, 1>
Fix PR10494

llvm-svn: 136691
2011-08-02 16:06:18 +00:00
Bruno Cardoso Lopes
771876cade Add v4f64 -> v2f32 fp_round support. Also add a testcase to exercise
the legalizer. This commit together with the two previous ones fixes
PR10495.

llvm-svn: 136654
2011-08-01 21:54:09 +00:00
Bruno Cardoso Lopes
d3a5171087 Since vectors with all ones can't be created with a 256-bit instruction,
avoid returning early for v8i32 types, which would only be valid for
vector with all zeros. Also split the handling of zeros and ones into separate
checking logic since they are handled differently. This fixes PR10547

llvm-svn: 136642
2011-08-01 19:51:53 +00:00
Richard Osborne
2cd07cf351 Fix crash with varargs function with no named parameters.
llvm-svn: 136623
2011-08-01 16:45:59 +00:00
Jakob Stoklund Olesen
0f099a3c58 Revert "Don't check liveness of unallocatable registers."
The ARM target depends on CPSR liveness being tracked after register
allocation.

llvm-svn: 136548
2011-07-30 00:57:25 +00:00
Jakob Stoklund Olesen
a05b70241c Don't check liveness of unallocatable registers.
This includes registers like EFLAGS and ST0-ST7. We don't check for
liveness issues in the verifier and scavenger because registers will
never be allocated from these classes.

While in SSA form, we do care about the liveness of unallocatable
unreserved registers. Liveness of EFLAGS and ST0 neds to be correct for
MachineDCE and MachineSinking.

llvm-svn: 136541
2011-07-29 23:36:21 +00:00
Eric Christopher
96b31d5681 Add support for the 'Q' constraint.
Fixes rdar://9866494

llvm-svn: 136523
2011-07-29 21:18:58 +00:00
Bruno Cardoso Lopes
871df895f4 Fix two tests that I crashed in the previous commits. The mask elts
on the second half must be reindexed.

llvm-svn: 136454
2011-07-29 02:05:28 +00:00
Bruno Cardoso Lopes
2b3d85d81c Match VPERMIL masks more strictly and update the target specific mask
generation to always catch the weird cases.

llvm-svn: 136453
2011-07-29 01:31:15 +00:00
Bruno Cardoso Lopes
473d982caf Add v8i32 and v4i64 vpermil patterns
llvm-svn: 136451
2011-07-29 01:31:07 +00:00
Jakob Stoklund Olesen
cc29034b4c Transfer implicit operands in NEONMoveFixPass.
Later passes /are/ using this information when running the register
scavenger.

This fixes the second problem in PR10520.

llvm-svn: 136440
2011-07-29 00:27:35 +00:00
Jakob Stoklund Olesen
f97f492104 Add -verify-arm-pseudo-expand.
This hidden llc option runs the machine code verifier after expanding
ARM pseudo-instructions, but before if-conversion.

The machine code verifier is much better at pointing out liveness errors
that can trip up the register scavenger.

llvm-svn: 136439
2011-07-29 00:27:32 +00:00
Jakob Stoklund Olesen
5f429460ba Handle REG_SEQUENCE with implicitly defined operands.
Code like that would only be produced by bugpoint, but we should still
handle it correctly.

When a register is defined by a REG_SEQUENCE of undefs, the register
itself is undef. Previously, we would create a register with uses but no
defs.

Fixes part of PR10520.

llvm-svn: 136401
2011-07-28 21:38:51 +00:00
Bruno Cardoso Lopes
e24a043703 Add patterns to generate copies for extract_subvector instead of
using vextractf128. This will reduce the number of issued instruction
for several avx codes.

llvm-svn: 136323
2011-07-28 01:26:50 +00:00
Bruno Cardoso Lopes
1f63a37172 Add a few patterns to match allzeros without having to use the fp unit.
Take advantage that the 128-bit vpxor zeros the higher part and use it.
This also fixes PR10491

llvm-svn: 136321
2011-07-28 01:26:43 +00:00
Bruno Cardoso Lopes
06d8be564f Add SINT_TO_FP and FP_TO_SINT support for v8i32 types. Also move
a convert pattern close to the instruction definition.

llvm-svn: 136320
2011-07-28 01:26:39 +00:00
Bruno Cardoso Lopes
8830fde434 The vpermilps and vpermilpd have different behaviour regarding the
usage of the shuffle bitmask. Both work in 128-bit lanes without
crossing, but in the former the mask of the high part is the same
used by the low part while in the later both lanes have independent
masks. Handle this properly and and add support for vpermilpd.

llvm-svn: 136200
2011-07-27 00:56:34 +00:00
Devang Patel
e85a416d4e It is quiet possible that inlined function body is split into multiple chunks of consequtive instructions. But, there is not any way to describe this in .debug_inline accelerator table used by gdb. However, describe non contiguous ranges of inlined function body appropriately using AT_range of DW_TAG_inlined_subroutine debug info entry.
llvm-svn: 136196
2011-07-27 00:34:13 +00:00
Jakob Stoklund Olesen
3f729850d3 Eliminate copies of undefined values during coalescing.
These copies would coalesce easily, but the resulting value would be
defined by a deleted instruction. Now we also remove the undefined value
number from the destination register.

This fixes PR10503.

llvm-svn: 136174
2011-07-26 23:00:24 +00:00
Benjamin Kramer
32a2ce8416 Update test.
llvm-svn: 136170
2011-07-26 22:45:39 +00:00
Benjamin Kramer
bfc2dfe3f7 Add a neat little two's complement hack for x86.
On x86 we can't encode an immediate LHS of a sub directly. If the RHS comes from a XOR with a constant we can
fold the negation into the xor and add one to the immediate of the sub. Then we can turn the sub into an add,
which can be commuted and encoded efficiently.

This code is generated for __builtin_clz and friends.

llvm-svn: 136167
2011-07-26 22:42:13 +00:00
Bruno Cardoso Lopes
e53bb853ea Recognize unpckh* masks and match 256-bit versions. The new versions are
different from the previous 128-bit because they work in lanes.
Update a few comments and add testcases

llvm-svn: 136157
2011-07-26 22:03:40 +00:00
Eli Friedman
4e16c5341a Prevent x86-specific DAGCombine from creating nodes with illegal type (which could not be selected). Fixes a minor isel issue that was breaking the testcase from r136130.
llvm-svn: 136148
2011-07-26 21:02:58 +00:00
Jim Grosbach
906ecb46ed FileCheck'ize test.
llvm-svn: 136135
2011-07-26 20:49:44 +00:00
Eli Friedman
8779017138 XFAIL this test while I investigate it; it's failing for an unexpected reason.
llvm-svn: 136131
2011-07-26 20:41:03 +00:00
Eli Friedman
e52bee3cc9 Add obvious missing case to switch. PR10497.
llvm-svn: 136130
2011-07-26 20:38:49 +00:00
Bruno Cardoso Lopes
ab40a57cce Add 256-bit isel for movsldup/movshdup
llvm-svn: 136051
2011-07-26 02:39:32 +00:00