1
0
mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-11-01 00:12:50 +01:00
Commit Graph

1907 Commits

Author SHA1 Message Date
Dan Gohman
42337e0ee9 Generalize LSR's OptimizeMax to handle the new kinds of max expressions
that indvars may use, now that indvars is recognizing le and ge loops.

llvm-svn: 102235
2010-04-24 03:13:44 +00:00
Stuart Hastings
85b5c330f2 Per Chris, fuse four trivial tests using grep (r102199) into one that uses FileCheck.
llvm-svn: 102216
2010-04-23 22:12:57 +00:00
Dan Gohman
6a48222bd8 Change TargetData's algorithm for computing defualt vector type
alignment to match what's used in clang and GCC for __alignof, rather
than trying to guess what Legalize is going to be doing.

llvm-svn: 102206
2010-04-23 19:41:15 +00:00
Stuart Hastings
ad81819149 Add some missing x86 patterns for movdq2q. Fixes two (LLVM-)GCC DejaGNU testcases. Radar 6881029.
llvm-svn: 102199
2010-04-23 19:03:32 +00:00
Dan Gohman
38949c2f1f Fix LSR to tolerate cases where ScalarEvolution initially
misses an opportunity to fold add operands, but folds them
after LSR has separated them out. This fixes rdar://7886751.

llvm-svn: 102157
2010-04-23 01:55:05 +00:00
Evan Cheng
a324da99ae Do not try to optimize a copy that has already been marked for deletion.
llvm-svn: 102027
2010-04-21 20:57:54 +00:00
Evan Cheng
dbfb7dc438 Implement -disable-non-leaf-fp-elim which disable frame pointer elimination
optimization for non-leaf functions. This will be hooked up to gcc's
-momit-leaf-frame-pointer option. rdar://7886181

llvm-svn: 101984
2010-04-21 03:18:23 +00:00
Evan Cheng
a0c4b2952f - Clean up some crappy code which deals with coalescing of copies which look at
extract_subreg / insert_subreg, etc.
- Add support for more aggressive insert_subreg coalescing.

llvm-svn: 101971
2010-04-21 00:44:22 +00:00
Dan Gohman
570b621976 Add another variant of this test which found a place where
CodeGen's ComputeMaskedBits was being over-conservative when computing
bits for an ADD.

llvm-svn: 101963
2010-04-21 00:19:28 +00:00
Chris Lattner
6db0f451a7 teach the x86 address matching stuff to handle
(shl (or x,c), 3) the same as (shl (add x, c), 3)
when x doesn't have any bits from c set.

This finishes off PR1135.  Before we compiled the block to:
to:

LBB0_3:                                 ## %bb
	cmpb	$4, %dl
	sete	%dl
	addb	%dl, %cl
	movb	%cl, %dl
	shlb	$2, %dl
	addb	%r8b, %dl
	shlb	$2, %dl
	movzbl	%dl, %edx
	movl	%esi, (%rdi,%rdx,4)
	leaq	2(%rdx), %r9
	movl	%esi, (%rdi,%r9,4)
	leaq	1(%rdx), %r9
	movl	%esi, (%rdi,%r9,4)
	addq	$3, %rdx
	movl	%esi, (%rdi,%rdx,4)
	incb	%r8b
	decb	%al
	movb	%r8b, %dl
	jne	LBB0_1

Now we produce:

LBB0_3:                                 ## %bb
	cmpb	$4, %dl
	sete	%dl
	addb	%dl, %cl
	movb	%cl, %dl
	shlb	$2, %dl
	addb	%r8b, %dl
	shlb	$2, %dl
	movzbl	%dl, %edx
	movl	%esi, (%rdi,%rdx,4)
	movl	%esi, 8(%rdi,%rdx,4)
	movl	%esi, 4(%rdi,%rdx,4)
	movl	%esi, 12(%rdi,%rdx,4)
	incb	%r8b
	decb	%al
	movb	%r8b, %dl
	jne	LBB0_1

llvm-svn: 101958
2010-04-20 23:18:40 +00:00
Bill Wendling
a87efb5d0f Move CodeGen/X86/2010-04-19-DAGCombineCrash.ll into CodeGen/X86/crash.ll. Also
reduce.

llvm-svn: 101925
2010-04-20 18:14:47 +00:00
Bill Wendling
887dac2aa6 The visitXOR method can return the same SDNode. If so, we don't want to delete
it as it's not dead.

llvm-svn: 101855
2010-04-20 01:25:01 +00:00
Dan Gohman
5736cd1e47 Start function numbering at 0.
llvm-svn: 101638
2010-04-17 16:29:15 +00:00
Evan Cheng
d3d5e6793a Add nounwind.
llvm-svn: 101613
2010-04-17 03:43:36 +00:00
Jakob Stoklund Olesen
7e77f60652 Add test case for machine-sink on critical edges
llvm-svn: 101416
2010-04-15 23:19:16 +00:00
Chris Lattner
1b7ecfdf60 enhance the load/store narrowing optimization to handle a
tokenfactor in between the load/store.  This allows us to 
optimize test7 into:

_test7:                                 ## @test7
## BB#0:                                ## %entry
	movl	(%rdx), %eax
                                        ## kill: SIL<def> ESI<kill>
	movb	%sil, 5(%rdi)
	ret

instead of:

_test7:                                 ## @test7
## BB#0:                                ## %entry
	movl	4(%esp), %ecx
	movl	$-65281, %eax           ## imm = 0xFFFFFFFFFFFF00FF
	andl	4(%ecx), %eax
	movzbl	8(%esp), %edx
	shll	$8, %edx
	addl	%eax, %edx
	movl	12(%esp), %eax
	movl	(%eax), %eax
	movl	%edx, 4(%ecx)
	ret

llvm-svn: 101355
2010-04-15 06:10:49 +00:00
Chris Lattner
8c5a5c9094 teach codegen to turn trunc(zextload) into load when possible.
This doesn't occur much at all, it only seems to formed in the case
when the trunc optimization kicks in due to phase ordering.  In that
case it is saves a few bytes on x86-32.

llvm-svn: 101350
2010-04-15 05:40:59 +00:00
Chris Lattner
3282f3d34f Implement rdar://7860110 (also in target/readme.txt) narrowing
a load/or/and/store sequence into a narrower store when it is
safe.  Daniel tells me that clang will start producing this sort
of thing with bitfields, and this does  trigger a few dozen times
on 176.gcc produced by llvm-gcc even now.

This compiles code like CodeGen/X86/2009-05-28-DAGCombineCrash.ll 
into:

        movl    %eax, 36(%rdi)

instead of:

        movl    $4294967295, %eax       ## imm = 0xFFFFFFFF
        andq    32(%rdi), %rax
        shlq    $32, %rcx
        addq    %rax, %rcx
        movq    %rcx, 32(%rdi)

and each of the testcases into a single store.  Each of them used
to compile into craziness like this:

_test4:
	movl	$65535, %eax            ## imm = 0xFFFF
	andl	(%rdi), %eax
	shll	$16, %esi
	addl	%eax, %esi
	movl	%esi, (%rdi)
	ret

llvm-svn: 101343
2010-04-15 04:48:01 +00:00
Chris Lattner
553267e9cc further tweak this to do something useful.
llvm-svn: 101341
2010-04-15 04:31:42 +00:00
Chris Lattner
a4b3756baf remove undef control flow.
llvm-svn: 101340
2010-04-15 04:30:19 +00:00
Jakob Stoklund Olesen
7343a90490 Remove unneeded types from test.
llvm-svn: 101286
2010-04-14 20:56:09 +00:00
Evan Cheng
172f2f9e2d Add test for post-ra machine licm.
llvm-svn: 101182
2010-04-13 22:10:03 +00:00
Evan Cheng
6ffb1ed4fb Fix test on non-x86 hosts.
llvm-svn: 101163
2010-04-13 18:54:04 +00:00
Evan Cheng
b8861dcb04 Re-apply 101075 and fix it properly. Just reuse the debug info of the branch instruction being optimized. There is no need to --I which can deref off start of the BB.
llvm-svn: 101162
2010-04-13 18:50:27 +00:00
Eric Christopher
330ca0c937 Temporarily revert r101075, it's causing invalid iterator assertions
in a nightly tester.

llvm-svn: 101158
2010-04-13 18:37:58 +00:00
Chris Lattner
dabcd9738c add llvm codegen support for -ffunction-sections and -fdata-sections,
patch by Sylvere Teissier!

llvm-svn: 101106
2010-04-13 00:36:43 +00:00
Evan Cheng
ec21a36774 Use .set expression for x86 pic jump table reference to reduce assembly relocation. rdar://7738756
llvm-svn: 101085
2010-04-12 23:07:17 +00:00
Bill Wendling
5a56f7fc20 Third time's a charm...
llvm-svn: 101081
2010-04-12 22:43:21 +00:00
Bill Wendling
e1bf74de52 Genericize the label test.
llvm-svn: 101079
2010-04-12 22:40:37 +00:00
Bill Wendling
fd58812c95 Correct test to test what I mean it to test.
llvm-svn: 101077
2010-04-12 22:25:42 +00:00
Bill Wendling
1f2e71928c Micro-optimization:
If we have this situation:

    jCC  L1
    jmp  L2
L1:
  ...
L2:
  ...

We can get a small performance boost by emitting this instead:

    jnCC L2
L1:
  ...
L2:
  ...

This testcase shows an example of this:

float func(float x, float y) {
    double product = (double)x * y;
    if (product == 0.0)
        return product;
    return product - 1.0;
}

llvm-svn: 101075
2010-04-12 22:19:57 +00:00
Evan Cheng
90788354c9 Enable post regalloc machine licm by default.
llvm-svn: 101023
2010-04-12 06:25:28 +00:00
Dan Gohman
a451b859f9 Merge a few fast-isel tests.
llvm-svn: 100860
2010-04-09 15:03:55 +00:00
Evan Cheng
619f1b8a94 Coalescer should not delete copy instructions whose defs are partially dead. e.g.
%RDI<def,dead> = MOV64rr %RAX<kill>, %EDI<imp-def>

llvm-svn: 100804
2010-04-08 20:02:37 +00:00
Evan Cheng
3fa0b6fb03 Avoid using f64 to lower memcpy from constant string. It's cheaper to use i32 store of immediates.
llvm-svn: 100751
2010-04-08 07:37:57 +00:00
Dan Gohman
0a77dd29de When expanding expressions which are using post-inc mode for multiple loops,
ensure that the expansion is dominated by the increments of those loops.

llvm-svn: 100748
2010-04-08 05:57:57 +00:00
Chris Lattner
23334439e9 add newlines at the end of files.
llvm-svn: 100705
2010-04-07 22:53:17 +00:00
Dan Gohman
b5210c934f Generalize IVUsers to track arbitrary expressions rather than expressions
explicitly split into stride-and-offset pairs. Also, add the
ability to track multiple post-increment loops on the same expression.

This refines the concept of "normalizing" SCEV expressions used for
to post-increment uses, and introduces a dedicated utility routine for
normalizing and denormalizing expressions.

This fixes the expansion of expressions which are post-increment users
of more than one loop at a time. More broadly, this takes LSR another
step closer to being able to reason about more than one loop at a time.

llvm-svn: 100699
2010-04-07 22:27:08 +00:00
Dale Johannesen
4cdb545401 Split big test into multiple directories to cater to
those who don't build all targets.

llvm-svn: 100688
2010-04-07 20:43:35 +00:00
Chris Lattner
367cbc160c this has a pr!
llvm-svn: 100637
2010-04-07 18:04:56 +00:00
Chris Lattner
60e5f55565 fix a latent bug my inline asm stuff exposed:
MachineOperand::isIdenticalTo wasn't handling metadata operands.

llvm-svn: 100636
2010-04-07 18:03:19 +00:00
Jakob Stoklund Olesen
370a3e553f Don't try to collapse DomainValues onto an incompatible SSE domain.
This fixes the Bullet regression on i386/nocona.

llvm-svn: 100553
2010-04-06 19:48:56 +00:00
Evan Cheng
7d177beb1d Add nounwind.
llvm-svn: 100482
2010-04-05 22:30:05 +00:00
Dan Gohman
2aff0055c1 Don't do code sinking on unreachable blocks. It's unprofitable and hazardous.
llvm-svn: 100455
2010-04-05 19:17:22 +00:00
Chris Lattner
779e051357 resolve a fixme.
llvm-svn: 100346
2010-04-04 19:28:59 +00:00
Evan Cheng
5d825988d0 Correctly lower memset / memcpy of undef. It should be a nop. PR6767.
llvm-svn: 100208
2010-04-02 19:36:14 +00:00
Dan Gohman
7051600e3f Revert the recent alignment changes. They're broken for -Os because,
in particular, they end up aligning strings at 16-byte boundaries, and
there's no way for GlobalOpt to check OptForSize.

llvm-svn: 100172
2010-04-02 03:04:37 +00:00
Dan Gohman
ec731a99fa Remove this initializer so that the optimizer doesn't convert
unaligned loads into aligned loads.

llvm-svn: 100166
2010-04-02 01:26:13 +00:00
Dan Gohman
d14fe2a5cd Update this test for the new preferred alignment heuristics.
llvm-svn: 100165
2010-04-02 01:24:08 +00:00
Evan Cheng
9508456bb6 In 64-bit mode, use i64 to lower memcpy / memset instead of f64.
llvm-svn: 100137
2010-04-01 20:27:45 +00:00
Evan Cheng
8728924812 - Avoid using floating point stores to implement memset unless the value is zero.
- Do not try to infer GV alignment unless its type is sized. It's not possible to infer alignment if it has opaque type.

llvm-svn: 100118
2010-04-01 18:19:11 +00:00
Evan Cheng
202b6327c4 Add -mcpu to memcpy / memset tests to ensure they behave the same on all hosts / targets.
llvm-svn: 100101
2010-04-01 08:25:26 +00:00
Evan Cheng
562bb43207 Fix sdisel memcpy, memset, memmove lowering:
1. Makes it possible to lower with floating point loads and stores.
2. Avoid unaligned loads / stores unless it's fast.
3. Fix some memcpy lowering logic bug related to when to optimize a
   load from constant string into a constant.
4. Adjust x86 memcpy lowering threshold to make it more sane.
5. Fix x86 target hook so it uses vector and floating point memory
   ops more effectively.
rdar://7774704

llvm-svn: 100090
2010-04-01 06:04:33 +00:00
Jakob Stoklund Olesen
58296f9543 Replace V_SET0 with variants for each SSE execution domain.
llvm-svn: 99975
2010-03-31 00:40:13 +00:00
Jakob Stoklund Olesen
13a7a0adff Fix typo. Thank you, valgrind.
llvm-svn: 99974
2010-03-31 00:40:08 +00:00
Jakob Stoklund Olesen
c6213a8bc0 Not all platforms start symbols with _
llvm-svn: 99959
2010-03-30 23:12:48 +00:00
Jakob Stoklund Olesen
a027a6d4f2 Enable -sse-domain-fix by default. Now with tests!
llvm-svn: 99954
2010-03-30 22:47:00 +00:00
Eric Christopher
4c3a3208e3 Remove the pmulld intrinsic and autoupdate it as a vector multiply.
Rewrite the pmulld patterns, and make sure that they fold in loads of
arguments into the instruction.

llvm-svn: 99910
2010-03-30 18:49:01 +00:00
Chris Lattner
e04fb0a1a7 teach tblgen to allow patterns like (add (i32 (bitconvert (i32 GPR))), 4),
transforming it into (add (i32 GPR), 4).  This allows us to write type
generic multi patterns and have tblgen automatically drop the bitconvert
in the case when the types align.  This allows us to fold an extra load
in the changed testcase.

llvm-svn: 99756
2010-03-28 08:38:32 +00:00
Evan Cheng
d1ee7e0ba3 Do not sibcall if stack needs to be dynamically aligned.
llvm-svn: 99620
2010-03-26 16:26:03 +00:00
Evan Cheng
377bb993d8 Allow trivial sibcall of vararg callee when no arguments are being passed.
llvm-svn: 99598
2010-03-26 02:13:13 +00:00
Evan Cheng
0b7dd682dd Try trivial remat before the coalescer gives up on a vr / physreg coalescing for fear of tying up a physical register.
llvm-svn: 99575
2010-03-26 00:07:25 +00:00
Evan Cheng
eb5c7f65dc Add nounwind.
llvm-svn: 99546
2010-03-25 20:01:07 +00:00
Chris Lattner
391902aa30 Make the NDEBUG assertion stronger and more clear what is
happening.

Enhance scheduling to set the DEAD flag on implicit defs
more aggressively.  Before, we'd set an implicit def operand
to dead if it were present in the SDNode corresponding to
the machineinstr but had no use.  Now we do it in this case
AND if the implicit def does not exist in the SDNode at all.

This exposes a couple of problems: one is the FIXME, which
causes a live intervals crash on CodeGen/X86/sibcall.ll.
The second is that it makes machinecse and licm more 
aggressive (which is a good thing) but also exposes a case
where licm hoists a set0 and then it doesn't get resunk.

Talking to codegen folks about both these issues, but I need
this patch in in the meantime.

llvm-svn: 99485
2010-03-25 05:40:48 +00:00
Nate Begeman
820b07ed10 BUILD_VECTOR was missing out on some prime opportunities to use SSE 4.1 inserts.
llvm-svn: 99423
2010-03-24 20:49:50 +00:00
Evan Cheng
77f5ed5de5 Stupid svn. Add back to the lost sibcall tests.
llvm-svn: 99033
2010-03-20 03:17:05 +00:00
Kevin Enderby
766909ae3b Fixed the encoding problems of the crc32 instructions. All had the Operand size
override prefix and only the r/m16 forms should have had that.  Also for variant
one, the AT&T syntax, added suffixes to all forms.  Also added the missing
64-bit form for 'CRC32 r64, r/m8'.  Plus added test cases for all forms and
tweaked one test case to add the needed suffixes.

llvm-svn: 98980
2010-03-19 20:04:42 +00:00
Mon P Wang
005a544af8 Fixed a widening bug where we were not using the correct size for the load
llvm-svn: 98920
2010-03-19 01:19:52 +00:00
Evan Cheng
5ace6fffac Turning off post-ra scheduling for x86. It isn't a consistent win.
llvm-svn: 98810
2010-03-18 06:55:42 +00:00
Evan Cheng
41730fc7c8 X86 address mode matching code MatchAddressRecursively does some aggressive hack which require doing a RAUW. It may end up deleting some SDNode up stream. It should avoid referencing deleted nodes.
llvm-svn: 98780
2010-03-17 23:58:35 +00:00
Dan Gohman
7ac4578c0e Add an rdar number to this test.
llvm-svn: 98654
2010-03-16 19:08:20 +00:00
Bill Wendling
8d2ee208ab Forgot testcase for r98599.
llvm-svn: 98602
2010-03-16 01:54:20 +00:00
Daniel Dunbar
241d3cb048 MC: Allow modifiers in MCSymbolRefExpr, and eliminate X86MCTargetExpr.
- Although it would be nice to allow this decoupling, the assembler needs to be able to reason about MCSymbolRefExprs in too many places to make this viable. We can use a target specific encoding of the variant if this becomes an issue.
 - This patch also extends llvm-mc to support parsing of the modifiers, as opposed to lumping them in with the symbol.

llvm-svn: 98592
2010-03-15 23:51:06 +00:00
Dan Gohman
db6002b964 Recognize code for doing vector gather/scatter index calculations with
32-bit indices. Instead of shuffling each element out of the index vector,
when all indices are needed, just store the input vector to the stack and
load the elements out.

llvm-svn: 98588
2010-03-15 23:23:03 +00:00
Chris Lattner
45a0ae21b8 Implement support for the case when a reference to a addr-of-bb
label is generated, but then the block is deleted.  Since the
value is undefined, we just emit the label right after the entry 
label of the function.  It might matter that the label is in the
same section as the function was afterall.

llvm-svn: 98579
2010-03-15 20:39:00 +00:00
Chris Lattner
802ebf9561 Fix the case when a reference to an address taken BB is emitted in one
function, then the BB is RAUW'd before the definition is emitted.  There
are still two cases not being handled, but this should improve us back to
the situation before I touched anything.

llvm-svn: 98566
2010-03-15 19:09:43 +00:00
Chris Lattner
8be174c089 filecheckize a test and mark these wiht a cpu so it passes
on hosts without cmovs.

llvm-svn: 98521
2010-03-14 22:31:16 +00:00
Chris Lattner
70eca8f78e fix ShrinkDemandedOps to not leave dead nodes around,
fixing PR6607

llvm-svn: 98512
2010-03-14 19:46:02 +00:00
Chris Lattner
95e0a61b2d don't have i386-specific tests in CodeGen/Generic, PR6601.
llvm-svn: 98508
2010-03-14 18:51:18 +00:00
Chris Lattner
c50b8b27f5 fix PR6605, X86ISD::CMP always returns i32 (EFLAGS), not
the operand type.

llvm-svn: 98507
2010-03-14 18:44:35 +00:00
Chris Lattner
9331acc6d7 get MMI out of the label uniquing business, just go to MCContext
to get unique assembler temporary labels.

llvm-svn: 98489
2010-03-14 08:36:50 +00:00
Evan Cheng
7d8c39bb1c Do not force indirect tailcall through fixed registers: eax, r11. Add support to allow loads to be folded to tail call instructions.
llvm-svn: 98465
2010-03-14 03:48:46 +00:00
Chris Lattner
683801add5 simplify code to use OutContext.GetOrCreateTemporarySymbol with
no arguments instead of having to come up with a unique name.
This also makes the code less fragile.

llvm-svn: 98364
2010-03-12 18:47:50 +00:00
Chris Lattner
80ab250a1c fix PR6577, a bug in sdbuilder lowering select instructions
whose true value was not Val#0.

llvm-svn: 98336
2010-03-12 07:15:36 +00:00
Bill Wendling
368a68ac82 revert r98270.
llvm-svn: 98281
2010-03-11 19:50:31 +00:00
Evan Cheng
20dbd70316 Bad bad bug. x86 force indirect tail call address into eax when it's meant to force it into a call preserved register instead. Change it to ecx for now.
llvm-svn: 98270
2010-03-11 18:49:14 +00:00
Evan Cheng
4ef6d8fa15 The check for coalescing a virtual register to a physical register, e.g.
cl = EXTRACT_SUBREG reg1024, 1, is overly conservative. It should check
for overlaps of vr's live interval with the super registers of the
physical register (ECX in this case) and let JoinIntervals() handle checking
the coalescing feasibility against the physical register (cl in this case).

llvm-svn: 98251
2010-03-11 08:20:21 +00:00
Eric Christopher
bbdbe41a97 Have fast-isel understand llvm.objectsize. Update testcase for slightly
different codegen.

llvm-svn: 98244
2010-03-11 06:20:22 +00:00
Chris Lattner
d6d11e53ab add support, testcases, and dox for the new GHC calling
convention.  Patch by David Terei!

llvm-svn: 98212
2010-03-11 00:22:57 +00:00
Chris Lattner
7cd70b8066 fix PR6533 by updating the br(xor) code to remember the case
when it looked past a trunc.

llvm-svn: 98203
2010-03-10 23:46:44 +00:00
Evan Cheng
1f7af27386 Fix typo.
llvm-svn: 98142
2010-03-10 07:07:55 +00:00
Evan Cheng
96e1e20fd5 Unbreak test on Linux.
llvm-svn: 98141
2010-03-10 07:07:45 +00:00
Evan Cheng
668ceddeec Enable machine cse pass.
llvm-svn: 98132
2010-03-10 03:07:41 +00:00
Chris Lattner
99ca33d324 move .set generation out of DwarfPrinter into AsmPrinter and
MCize it.

llvm-svn: 98010
2010-03-08 23:58:37 +00:00
Chris Lattner
10d571f349 simplify EmitSectionOffset to always use .set if it is
available, the only thing this affects is that we produce
.set in one case we didn't before, which shouldn't harm
anything.  Make EmitSectionOffset call EmitDifference
instead of duplicating it.

llvm-svn: 98005
2010-03-08 23:23:25 +00:00
Evan Cheng
cfe037000a Add documentation on sibling call optimization. Rename tailcall2.ll test to sibcall.ll.
llvm-svn: 97980
2010-03-08 21:05:02 +00:00
Charles Davis
faa2f44081 Don't emit global symbols into the (__TEXT,__ustring) section on Darwin. This
is a workaround for <rdar://problem/7672401/> (which I filed).

This let's us build Wine on Darwin, and it gets the Qt build there a little bit
further (so Doug says).

llvm-svn: 97845
2010-03-05 22:28:45 +00:00
Jakob Stoklund Olesen
4e033d2070 Better handling of dead super registers in LiveVariables. We used to do this:
CALL ... %RAX<imp-def>
   ... [not using %RAX]
   %EAX = ..., %RAX<imp-use, kill>
   RET %EAX<imp-use,kill>

Now we do this:

   CALL ... %RAX<imp-def, dead>
   ... [not using %RAX]
   %EAX = ...
   RET %EAX<imp-use,kill>

By not artificially keeping %RAX alive, we lower register pressure a bit.

The correct number of instructions for 2008-08-05-SpillerBug.ll is obviously
55, anybody can see that. Sheesh.

llvm-svn: 97838
2010-03-05 21:49:17 +00:00
Jakob Stoklund Olesen
1fce28720a We don't really care about correct register liveness information after the
post-ra scheduler has run. Disable the verifier checks that late in the game.

llvm-svn: 97837
2010-03-05 21:49:13 +00:00
Jakob Stoklund Olesen
67476519d7 Avoid creating bad PHI instructions when BR is being const-folded.
llvm-svn: 97836
2010-03-05 21:49:10 +00:00
Evan Cheng
eb43cbfc75 Fix an oops in x86 sibcall optimization. If the ByVal callee argument is itself passed as a pointer, then it's obviously not safe to do a tail call.
llvm-svn: 97797
2010-03-05 08:38:04 +00:00
Chris Lattner
80aaccb987 Fix PR6497, a bug where we'd fold a load into an addc
node which has a flag.  That flag in turn was used by an
already-selected adde which turned into an ADC32ri8 which
used a selected load which was chained to the load we
folded.  This flag use caused us to form a cycle.  Fix
this by not ignoring chains in IsLegalToFold even in
cases where the isel thinks it can.

llvm-svn: 97791
2010-03-05 06:19:13 +00:00
Chris Lattner
5804690a45 cleanup
llvm-svn: 97790
2010-03-05 06:17:43 +00:00
Evan Cheng
04b9deff58 Rever 96389 and 96990. They are causing some miscompilation that I do not fully understand.
llvm-svn: 97782
2010-03-05 03:08:23 +00:00
Bill Wendling
a92a5b8a6c Revert r97766. It's deleting a tag.
llvm-svn: 97768
2010-03-05 00:33:59 +00:00
Bill Wendling
55b4dfcd9d Micro-optimization:
This code:

float floatingPointComparison(float x, float y) {
    double product = (double)x * y;
    if (product == 0.0)
        return product;
    return product - 1.0;
}

produces this:

_floatingPointComparison:
0000000000000000        cvtss2sd        %xmm1,%xmm1
0000000000000004        cvtss2sd        %xmm0,%xmm0
0000000000000008        mulsd           %xmm1,%xmm0
000000000000000c        pxor            %xmm1,%xmm1
0000000000000010        ucomisd         %xmm1,%xmm0
0000000000000014        jne             0x00000004
0000000000000016        jp              0x00000002
0000000000000018        jmp             0x00000008
000000000000001a        addsd           0x00000006(%rip),%xmm0
0000000000000022        cvtsd2ss        %xmm0,%xmm0
0000000000000026        ret

The "jne/jp/jmp" sequence can be reduced to this instead:

_floatingPointComparison:
0000000000000000        cvtss2sd        %xmm1,%xmm1
0000000000000004        cvtss2sd        %xmm0,%xmm0
0000000000000008        mulsd           %xmm1,%xmm0
000000000000000c        pxor            %xmm1,%xmm1
0000000000000010        ucomisd         %xmm1,%xmm0
0000000000000014        jp              0x00000002
0000000000000016        je              0x00000008
0000000000000018        addsd           0x00000006(%rip),%xmm0
0000000000000020        cvtsd2ss        %xmm0,%xmm0
0000000000000024        ret

for a savings of 2 bytes.

This xform can happen when we recognize that jne and jp jump to the same "true"
MBB, the unconditional jump would jump to the "false" MBB, and the "true" branch
is the fall-through MBB.

llvm-svn: 97766
2010-03-05 00:24:26 +00:00
Jakob Stoklund Olesen
3408cd6de1 Fix the remaining MUL8 and DIV8 to define AX instead of AL,AH.
These instructions technically define AL,AH, but a trick in X86ISelDAGToDAG
reads AX in order to avoid reading AH with a REX instruction.

Fix PR6489.

llvm-svn: 97742
2010-03-04 20:42:07 +00:00
Dan Gohman
265f85f6d8 Fix recognition of 16-bit bswap for C front-ends which emit the
clobber registers in a different order.

llvm-svn: 97741
2010-03-04 19:58:08 +00:00
Dan Gohman
da13ee1220 Revert r97580; that's not the right way to fix this.
llvm-svn: 97639
2010-03-03 04:36:42 +00:00
Bill Wendling
d1f658563d This test case:
long test(long x) { return (x & 123124) | 3; }

Currently compiles to:

_test:
        orl     $3, %edi
        movq    %rdi, %rax
        andq    $123127, %rax
        ret

This is because instruction and DAG combiners canonicalize

  (or (and x, C), D) -> (and (or, D), (C | D))

However, this is only profitable if (C & D) != 0. It gets in the way of the
3-addressification because the input bits are known to be zero.

llvm-svn: 97616
2010-03-03 00:35:56 +00:00
Chris Lattner
9c9c1158cb Fix some issues in WalkChainUsers dealing with
CopyToReg/CopyFromReg/INLINEASM.  These are annoying because
they have the same opcode before an after isel.  Fix this by
setting their NodeID to -1 to indicate that they are selected,
just like what automatically happens when selecting things that
end up being machine nodes.

With that done, give IsLegalToFold a new flag that causes it to
ignore chains.  This lets the HandleMergeInputChains routine be
the one place that validates chains after a match is successful,
enabling the new hotness in chain processing.  This smarter
chain processing eliminates the need for "PreprocessRMW" in the
X86 and MSP430 backends and enables MSP to start matching it's
multiple mem operand instructions more aggressively.

I currently #if out the dead code in the X86 backend and MSP 
backend, I'll remove it for real in a follow-on patch.

The testcase changes are:
  test/CodeGen/X86/sse3.ll: we generate better code
  test/CodeGen/X86/store_op_load_fold2.ll: PreprocessRMW was 
      miscompiling this before, we now generate correct code
      Convert it to filecheck while I'm at it.
  test/CodeGen/MSP430/Inst16mm.ll: Add a testcase for mem/mem
      folding to make anton happy. :)

llvm-svn: 97596
2010-03-02 22:20:06 +00:00
Dan Gohman
f06941597a When expanding an expression such as (A + B + C + D), sort the operands
by loop depth and emit loop-invariant subexpressions outside of loops.
This speeds up MultiSource/Applications/viterbi and others.

llvm-svn: 97580
2010-03-02 19:32:21 +00:00
Chris Lattner
845db3b26d clean up some testcases.
llvm-svn: 97576
2010-03-02 18:56:03 +00:00
Chris Lattner
2019e2922f Fix the xfail I added a couple of patches back. The issue
was that we weren't properly handling the case when interior
nodes of a matched pattern become dead after updating chain
and flag uses.  Now we handle this explicitly in 
UpdateChainsAndFlags.

llvm-svn: 97561
2010-03-02 07:50:03 +00:00
Chris Lattner
0b41a42411 Rewrite chain handling validation and input TokenFactor handling
stuff now that we don't care about emulating the old broken 
behavior of the old isel.  This eliminates the 
'CheckChainCompatible' check (along with IsChainCompatible) which
did an incorrect and inefficient scan *up* the chain nodes which
happened as the pattern was being formed and does the validation
at the end in HandleMergeInputChains when it forms a structural 
pattern.  This scans "down" the graph, which means that it is
quickly bounded by nodes already selected.  This also handles
token factors that get "trapped" in the dag.

Removing the CheckChainCompatible nodes also shrinks the 
generated tables by about 6K for X86 (down to 83K).

There are two pieces remaining before I can nuke PreprocessRMW:
1. I xfailed a test because we're now producing worse code in a 
   case that has nothing to do with the change: it turns out that
   our use of MorphNodeTo will leave dead nodes in the graph
   which (depending on how the graph is walked) end up causing
   bogus uses of chains and blocking matches.  This is really 
   bad for other reasons, so I'll fix this in a follow-up patch.

2. CheckFoldableChainNode needs to be improved to handle the TF.

llvm-svn: 97539
2010-03-02 02:22:10 +00:00
Dan Gohman
56a20fc5eb Fix several places to handle vector operands properly.
Based on a patch by Micah Villmow for PR6438.

llvm-svn: 97538
2010-03-02 02:14:38 +00:00
Devang Patel
6853e2432e Rewrite test to test VLA using new debug info encoding scheme.
llvm-svn: 97465
2010-03-01 18:30:58 +00:00
Chris Lattner
74db1864da add some random nounwinds.
llvm-svn: 97411
2010-02-28 20:36:49 +00:00
Dan Gohman
0799a41c48 Don't try to replace physical registers when doing CSE.
llvm-svn: 97360
2010-02-28 01:33:43 +00:00
Dan Gohman
3d9622fa97 Add nounwinds.
llvm-svn: 97349
2010-02-27 23:53:53 +00:00
Evan Cheng
94051bc37e Re-apply 97040 with fix. This survives a ppc self-host llvm-gcc bootstrap.
llvm-svn: 97310
2010-02-27 07:36:59 +00:00
Chris Lattner
e4b5559cf8 change the scope node to include a list of children to be checked
instead of to have a chained series of scope nodes.  This makes
the generated table smaller, improves the efficiency of the
interpreter, and make the factoring optimization much more 
reasonable to implement.

llvm-svn: 97160
2010-02-25 19:00:39 +00:00
Dan Gohman
084112437d Revert r97064. Duncan pointed out that bitcasts are defined in
terms of store and load, which means bitcasting between scalar
integer and vector has endian-specific results, which undermines
this whole approach.

llvm-svn: 97137
2010-02-25 15:20:39 +00:00
Dan Gohman
52ed61204b Make LoopSimplify change conditional branches in loop exiting blocks
which branch on undef to branch on a boolean constant for the edge
exiting the loop. This helps ScalarEvolution compute trip counts for
loops.

Teach ScalarEvolution to recognize single-value PHIs, when safe, and
ForgetSymbolicName to forget such single-value PHI nodes as apprpriate
in ForgetSymbolicName.

llvm-svn: 97126
2010-02-25 06:57:05 +00:00
Dan Gohman
424e8f22d0 Make getTypeSizeInBits work correctly for array types; it should return
the number of value bits, not the number of bits of allocation for in-memory
storage.

Make getTypeStoreSize and getTypeAllocSize work consistently for arrays and
vectors.

Fix several places in CodeGen which compute offsets into in-memory vectors
to use TargetData information.

This fixes PR1784.

llvm-svn: 97064
2010-02-24 22:05:23 +00:00
Daniel Dunbar
24c99e027e Speculatively revert r97011, "Re-apply 96540 and 96556 with fixes.", again in
the hopes of fixing PPC bootstrap.

llvm-svn: 97040
2010-02-24 17:05:47 +00:00
Dan Gohman
c0c6077fed When forming SSE min and max nodes for UGE and ULE comparisons, it's
necessary to swap the operands to handle NaN and negative zero properly.

Also, reintroduce logic for checking for NaN conditions when forming
SSE min and max instructions, fixed to take into consideration NaNs and
negative zeros. This allows forming min and max instructions in more
cases.

llvm-svn: 97025
2010-02-24 06:52:40 +00:00
Chris Lattner
52a02205d8 Change the scheduler from adding nodes in allnodes order
to adding them in a determinstic order (bottom up from 
the root) based on the structure of the graph itself.

This updates tests for some random changes, interesting
bits: CodeGen/Blackfin/promote-logic.ll no longer crashes.
I have no idea why, but that's good right?

CodeGen/X86/2009-07-16-LoadFoldingBug.ll also fails, but
now compiles to have one fewer constant pool entry, making
the expected load that was being folded disappear.  Since it
is an unreduced mass of gnast, I just removed it.

This fixes PR6370

llvm-svn: 97023
2010-02-24 06:11:37 +00:00
Evan Cheng
5787cd9349 Re-apply 96540 and 96556 with fixes.
llvm-svn: 97011
2010-02-24 01:42:31 +00:00
Jakob Stoklund Olesen
a946f9eb7d DIV8r must define %AX since X86DAGToDAGISel::Select() sometimes uses it
instead of %AL/%AH.

llvm-svn: 97006
2010-02-24 00:39:35 +00:00
Jakob Stoklund Olesen
3406ec2f57 Remember to handle sub-registers when moving imp-defs to a rematted instruction.
llvm-svn: 96995
2010-02-23 22:44:02 +00:00
Jakob Stoklund Olesen
cf29251712 Keep track of phi join registers explicitly in LiveVariables.
Previously, LiveIntervalAnalysis would infer phi joins by looking for multiply
defined registers. That doesn't work if the phi join is implicitly defined in
all but one of the predecessors.

llvm-svn: 96994
2010-02-23 22:43:58 +00:00
Evan Cheng
8096221984 These should not have been committed.
llvm-svn: 96827
2010-02-22 23:37:48 +00:00
Evan Cheng
d9816ef946 Instcombine constant folding can normalize gep with negative index to index with large offset. When instcombine objsize checking transformation sees these geps where the offset seemingly point out of bound, it should just return "i don't know" rather than asserting.
llvm-svn: 96825
2010-02-22 23:34:00 +00:00
Dan Gohman
97481fb5e0 Canonicalize ConstantInts to the right operand of commutative
operators.

The test difference is just due to the multiplication operands
being commuted (and thus requiring a more elaborate match). In
optimized code, that expression would be folded.

llvm-svn: 96816
2010-02-22 22:43:23 +00:00
Dan Gohman
8f672b95f2 Actually enable the -enable-unsafe-fp-math tests.
llvm-svn: 96796
2010-02-22 18:53:26 +00:00
Arnold Schwaighofer
8427969f9c Mark the return address stack slot as mutable when moving the return address
during a tail call. A parameter might overwrite this stack slot during the tail
call. 

The sequence during a tail call is:
1.) load return address to temp reg
2.) move parameters (might involve storing to return address stack slot)
3.) store return address to new location from temp reg

If the stack location is marked immutable CodeGen can colocate load (1) with the
store (3).

This fixes bug 6225.

llvm-svn: 96783
2010-02-22 16:18:09 +00:00
Dan Gohman
c281a5da15 Remove the logic for reasoning about NaNs from the code that forms
SSE min and max instructions. The real thing this code needs to be
concerned about is negative zero.

Update the sse-minmax.ll test accordingly, and add tests for
-enable-unsafe-fp-math mode as well.

llvm-svn: 96775
2010-02-22 04:03:39 +00:00
Chris Lattner
b328c57aa4 fix and un-xfail X86/vec_ss_load_fold.ll
llvm-svn: 96720
2010-02-21 04:53:34 +00:00
Chris Lattner
b28cf9f8d9 temporarily disable this.
llvm-svn: 96717
2010-02-21 03:24:41 +00:00
Dan Gohman
9db0689627 Check for overflow when scaling up an add or an addrec for
scaled reuse.

llvm-svn: 96692
2010-02-19 19:32:49 +00:00
Charles Davis
a64fc8c41b Add support for the 'alignstack' attribute to the x86 backend. Fixes PR5254.
Also, FileCheck'ize a test.

llvm-svn: 96686
2010-02-19 18:17:13 +00:00
Duncan Sands
5d5cce2e19 Revert commits 96556 and 96640, because commit 96556 breaks the
dragonegg self-host build.  I reverted 96640 in order to revert
96556 (96640 goes on top of 96556), but it also looks like with
both of them applied the breakage happens even earlier.  The
symptom of the 96556 miscompile is the following crash:

  llvm[3]: Compiling AlphaISelLowering.cpp for Release build
  cc1plus: /home/duncan/tmp/tmp/llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp:4982: void llvm::SelectionDAG::ReplaceAllUsesWith(llvm::SDNode*, llvm::SDNode*, llvm::SelectionDAG::DAGUpdateListener*): Assertion `(!From->hasAnyUseOfValue(i) || From->getValueType(i) == To->getValueType(i)) && "Cannot use this version of ReplaceAllUsesWith!"' failed.
  Stack dump:
  0.	Running pass 'X86 DAG->DAG Instruction Selection' on function '@_ZN4llvm19AlphaTargetLowering14LowerOperationENS_7SDValueERNS_12SelectionDAGE'
  g++: Internal error: Aborted (program cc1plus)

This occurs when building LLVM using LLVM built by LLVM (via
dragonegg).  Probably LLVM has miscompiled itself, though it
may have miscompiled GCC and/or dragonegg itself: at this point
of the self-host build, all of GCC, LLVM and dragonegg were built
using LLVM.  Unfortunately this kind of thing is extremely hard
to debug, and while I did rummage around a bit I didn't find any
smoking guns, aka obviously miscompiled code.

Found by bisection.

r96556 | evancheng | 2010-02-18 03:13:50 +0100 (Thu, 18 Feb 2010) | 5 lines

Some dag combiner goodness:
Transform br (xor (x, y)) -> br (x != y)
Transform br (xor (xor (x,y), 1)) -> br (x == y)
Also normalize (and (X, 1) == / != 1 -> (and (X, 1)) != / == 0 to match to "test on x86" and "tst on arm"

r96640 | evancheng | 2010-02-19 01:34:39 +0100 (Fri, 19 Feb 2010) | 16 lines

Transform (xor (setcc), (setcc)) == / != 1 to
(xor (setcc), (setcc)) != / == 1.

e.g. On x86_64
  %0 = icmp eq i32 %x, 0
  %1 = icmp eq i32 %y, 0
  %2 = xor i1 %1, %0
  br i1 %2, label %bb, label %return
=>
	testl   %edi, %edi
	sete    %al
	testl   %esi, %esi
	sete    %cl
	cmpb    %al, %cl
	je      LBB1_2

llvm-svn: 96672
2010-02-19 11:30:41 +00:00
Evan Cheng
32031f7404 Transform (xor (setcc), (setcc)) == / != 1 to
(xor (setcc), (setcc)) != / == 1.

e.g. On x86_64
  %0 = icmp eq i32 %x, 0
  %1 = icmp eq i32 %y, 0
  %2 = xor i1 %1, %0
  br i1 %2, label %bb, label %return
=>
	testl   %edi, %edi
	sete    %al
	testl   %esi, %esi
	sete    %cl
	cmpb    %al, %cl
	je      LBB1_2

llvm-svn: 96640
2010-02-19 00:34:39 +00:00
Dan Gohman
58199e30dc When determining the set of interesting reuse factors, consider
strides in foreign loops. This helps locate reuse opportunities
with existing induction variables in foreign loops and reduces
the need for inserting new ones. This fixes rdar://7657764.

llvm-svn: 96629
2010-02-19 00:05:23 +00:00
Mon P Wang
64cd1a8d7f getSplatIndex assumes that the first element of the mask contains the splat index
which is not always true if the mask contains undefs. Modified it to return
the first non undef value.

llvm-svn: 96621
2010-02-18 22:33:18 +00:00
Jakob Stoklund Olesen
5b9d14b55e Always normalize spill weights, also for intervals created by spilling.
Moderate the weight given to very small intervals.

The spill weight given to new intervals created when spilling was not
normalized in the same way as the original spill weights calculated by
CalcSpillWeights. That meant that restored registers would tend to hang around
because they had a much higher spill weight that unspilled registers.

This improves the runtime of a few tests by up to 10%, and there are no
significant regressions.

llvm-svn: 96613
2010-02-18 21:33:05 +00:00
Dan Gohman
34b5cb7deb Make CodePlacementOpt detect special EH control flow by
checking whether AnalyzeBranch disagrees with the CFG
directly, rather than looking for EH_LABEL instructions.
EH_LABEL instructions aren't always at the end of the
block, due to FP_REG_KILL and other things. This fixes
an infinite loop compiling MultiSource/Benchmarks/Bullet.

llvm-svn: 96611
2010-02-18 21:25:53 +00:00
Chris Lattner
a2e094064f remove empty file
llvm-svn: 96573
2010-02-18 06:29:06 +00:00
Evan Cheng
9af06dfc83 Some dag combiner goodness:
Transform br (xor (x, y)) -> br (x != y)
Transform br (xor (xor (x,y), 1)) -> br (x == y)
Also normalize (and (X, 1) == / != 1 -> (and (X, 1)) != / == 0 to match to "test on x86" and "tst on arm"

llvm-svn: 96556
2010-02-18 02:13:50 +00:00