1
0
mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-10-31 16:02:52 +01:00
Commit Graph

2489 Commits

Author SHA1 Message Date
Eric Christopher
f7579ff174 Expand invalid return values for umulo and smulo. Handle these similarly
to add/sub by doing the normal operation and then checking for overflow
afterwards. This generally relies on the DAG handling the later invalid
operations as well.

Fixes the 64-bit part of rdar://8622122 and rdar://8774702.

llvm-svn: 123908
2011-01-20 08:54:28 +00:00
Benjamin Kramer
869dc645f1 Fix an off-by-one error in ctpop combining.
llvm-svn: 123664
2011-01-17 18:00:28 +00:00
Benjamin Kramer
e9488ed8eb Add a DAGCombine to turn (ctpop x) u< 2 into (x & x-1) == 0.
This shaves off 4 popcounts from the hacked 186.crafty source.

This is enabled even when a native popcount instruction is available. The
combined code is one operation longer but it should be faster nevertheless.

llvm-svn: 123621
2011-01-17 12:04:57 +00:00
Rafael Espindola
9afb7af08a Update tests.
llvm-svn: 123591
2011-01-16 18:02:57 +00:00
Chris Lattner
dde85de90f fix PR8514, a bug where the "heroic" transformation of shift/and
into and/shift would cause nodes to move around and a dangling pointer
to happen.  The code tried to avoid this with a HandleSDNode, but 
got the details wrong.

llvm-svn: 123578
2011-01-16 08:48:11 +00:00
Chris Lattner
24ea7f696e fix PR8981, a crash trying to form a conditional inc with a floating point compare.
llvm-svn: 123560
2011-01-16 02:56:53 +00:00
Chris Lattner
c4d1d86d3e reapply my fix for PR8961 with a tweak to properly handle
multi-instruction sequences like calls.  Many thanks to Jakob for
finding a testcase.

llvm-svn: 123559
2011-01-16 02:27:38 +00:00
Chris Lattner
eba719204c revert my fastisel patch again which apparently still gives the
llvm-gcc-i386-linux-selfhost buildbot heartburn...

llvm-svn: 123431
2011-01-14 06:14:33 +00:00
Chris Lattner
ee950eeb24 reapply r123414 now that the botz are calmed down and the fix is already in.
llvm-svn: 123427
2011-01-14 04:24:28 +00:00
Chris Lattner
349735530b r123414 broke llvm-gcc bootstrap apparently, revert
llvm-svn: 123422
2011-01-14 02:07:32 +00:00
Chris Lattner
5baec05809 fix PR8961 - a fast isel miscompilation where we'd insert a new instruction
after sext's generated for addressing that got folded.  Previously we compiled
test5 into:

_test5:                                 ## @test5
## BB#0:
        movq    -8(%rsp), %rax          ## 8-byte Reload
        movq    (%rdi,%rax), %rdi
        addq    %rdx, %rdi
        movslq  %esi, %rax
        movq    %rax, -8(%rsp)          ## 8-byte Spill
        movq    %rdi, %rax
        ret

which is insane and wrong.  Now we produce:

_test5:                                 ## @test5
## BB#0:
	movslq	%esi, %rax
	movq	(%rdi,%rax), %rax
	addq	%rdx, %rax
	ret

llvm-svn: 123414
2011-01-14 00:01:01 +00:00
Eric Christopher
3821f63f4b Experiment with changing the default 32-bit linux stack alignment to
16 bytes for PR8969. Update all testcases accordingly.

llvm-svn: 123367
2011-01-13 06:47:10 +00:00
Jakob Stoklund Olesen
3987889b61 Try again enabling LiveDebugVariables.
llvm-svn: 123342
2011-01-12 23:36:21 +00:00
Jakob Stoklund Olesen
1f7052b53b The world is not ready for LiveDebugVariables yet.
llvm-svn: 123290
2011-01-11 23:20:33 +00:00
Jakob Stoklund Olesen
d7a523358c Enable LiveDebugVariables by default.
llvm-svn: 123282
2011-01-11 22:45:28 +00:00
Dale Johannesen
cd78621861 Fix PR 8916 (qv for analysis), at least the immediate problem.
There's an inherent tension in DAGCombine between assuming
that things will be put in canonical form, and the Depth
mechanism that disables transformations when recursion gets
too deep.  It would not surprise me if there's a lot of little
bugs like this one waiting to be discovered.  The mechanism
seems fragile and I'd suggest looking at it from a design viewpoint.

llvm-svn: 123191
2011-01-10 21:53:07 +00:00
Evan Cheng
1afd04fc59 Recognize inline asm 'rev /bin/bash, ' as a bswap intrinsic call.
llvm-svn: 123048
2011-01-08 01:24:27 +00:00
Evan Cheng
aa16fd02ad Do not model all INLINEASM instructions as having unmodelled side effects.
Instead encode llvm IR level property "HasSideEffects" in an operand (shared
with IsAlignStack). Added MachineInstrs::hasUnmodeledSideEffects() to check
the operand when the instruction is an INLINEASM.

This allows memory instructions to be moved around INLINEASM instructions.

llvm-svn: 123044
2011-01-07 23:50:32 +00:00
Devang Patel
d3ba97949a Speculatively revert r123032.
llvm-svn: 123039
2011-01-07 22:33:41 +00:00
Devang Patel
a52d6c216d Appropriately truncate debug info range in dwarf output.
Enable live debug variables pass.

llvm-svn: 123032
2011-01-07 21:30:41 +00:00
Evan Cheng
ae26b91353 Revert r122955. It seems using movups to lower memcpy can cause massive regression (even on Nehalem) in edge cases. I also didn't see any real performance benefit.
llvm-svn: 123015
2011-01-07 19:35:30 +00:00
Benjamin Kramer
a842a10fc1 Try to unbreak the arm buildbot.
llvm-svn: 122999
2011-01-07 11:35:21 +00:00
Duncan Sands
06444485ee Fix the other problem reported in PR8582. Testcase and patch by
Nadav Rotem.

llvm-svn: 122983
2011-01-06 23:45:22 +00:00
Evan Cheng
1a1771584e Use movups to lower memcpy and memset even if it's not fast (like corei7).
The theory is it's still faster than a pair of movq / a quad of movl. This
will probably hurt older chips like P4 but should run faster on current
and future Intel processors. rdar://8817010

llvm-svn: 122955
2011-01-06 07:58:36 +00:00
Evan Cheng
cb39cc2164 Re-implement r122936 with proper target hooks. Now getMaxStoresPerMemcpy
etc. takes an option OptSize. If OptSize is true, it would return
the inline limit for functions with attribute OptSize.

llvm-svn: 122952
2011-01-06 06:52:41 +00:00
Evan Cheng
70711ea54d Revert r122936. I'll re-implement the change.
llvm-svn: 122949
2011-01-06 06:17:53 +00:00
Bill Wendling
b3bf7cd562 Fix test to coincide with r122934 change from PR8919.
llvm-svn: 122937
2011-01-06 01:09:35 +00:00
Evan Cheng
d425aa5d2a r105228 reduced the memcpy / memset inline limit to 4 with -Os to avoid blowing
up freebsd bootloader. However, this doesn't make much sense for Darwin, whose
-Os is meant to optimize for size only if it doesn't hurt performance.
rdar://8821501

llvm-svn: 122936
2011-01-06 01:04:47 +00:00
Evan Cheng
2af40ae781 Avoid zero extend bit test operands to pointer type if all the masks fit in
the original type of the switch statement key.
rdar://8781238

llvm-svn: 122935
2011-01-06 01:02:44 +00:00
Evan Cheng
bf92316fab Optimize:
r1025 = s/zext r1024, 4
  r1026 = extract_subreg r1025, 4
to:
  r1026 = copy r1024

llvm-svn: 122925
2011-01-05 23:06:49 +00:00
Chris Lattner
3ef9db5cd4 fix PR8900, a shuffle miscompilation. Patch by Nadav Rotem!
llvm-svn: 122921
2011-01-05 22:28:46 +00:00
Evan Cheng
25f7df1bce Use pushq / popq instead of subq $8, %rsp / addq $8, %rsp to adjust stack in
prologue and epilogue if the adjustment is 8. Similarly, use pushl / popl if
the adjustment is 4 in 32-bit mode.

In the epilogue, takes care to pop to a caller-saved register that's not live
at the exit (either return or tailcall instruction).
rdar://8771137

llvm-svn: 122783
2011-01-03 22:53:22 +00:00
Benjamin Kramer
a58b69aa9d Try to reuse the value when lowering memset.
This allows us to compile:
  void test(char *s, int a) {
    __builtin_memset(s, a, 15);
  }
into 1 mul + 3 stores instead of 3 muls + 3 stores.

llvm-svn: 122710
2011-01-02 19:57:05 +00:00
Benjamin Kramer
38491f47ce Lower the i8 extension in memset to a multiply instead of a potentially long series of shifts and ors.
We could implement a DAGCombine to turn x * 0x0101 back into logic operations
on targets that doesn't support the multiply or it is slow (p4) if someone cares
enough.

Example code:
  void test(char *s, int a) {
      __builtin_memset(s, a, 4);
  }
before:
  _test:                                  ## @test
    movzbl  8(%esp), %eax
    movl  %eax, %ecx
    shll  $8, %ecx
    orl %eax, %ecx
    movl  %ecx, %eax
    shll  $16, %eax
    orl %ecx, %eax
    movl  4(%esp), %ecx
    movl  %eax, 4(%ecx)
    movl  %eax, (%ecx)
    ret
after:
  _test:                                  ## @test
    movzbl  8(%esp), %eax
    imull $16843009, %eax, %eax   ## imm = 0x1010101
    movl  4(%esp), %ecx
    movl  %eax, 4(%ecx)
    movl  %eax, (%ecx)
    ret

llvm-svn: 122707
2011-01-02 19:44:58 +00:00
Rafael Espindola
2600dec19b Fix darwin bots.
llvm-svn: 122672
2011-01-01 21:58:41 +00:00
Rafael Espindola
55f7a5057d Add support for the 'H' modifier.
llvm-svn: 122667
2011-01-01 20:58:46 +00:00
NAKAMURA Takumi
e16c40416e test/CodeGen/X86/negative-sin.ll: FileCheck-ize.
llvm-svn: 122619
2010-12-29 03:58:47 +00:00
NAKAMURA Takumi
d66bcf9ada test/CodeGen/X86/fp-in-intregs.ll: FileCheck-ize.
llvm-svn: 122618
2010-12-29 03:58:36 +00:00
Benjamin Kramer
49942a90b7 DAGCombine add (sext i1), X into sub X, (zext i1) if sext from i1 is illegal. The latter usually compiles into smaller code.
example code:
unsigned foo(unsigned x, unsigned y) {
  if (x != 0) y--;
  return y;
}

before:
  _foo:                           ## @foo
    cmpl  $1, 4(%esp)             ## encoding: [0x83,0x7c,0x24,0x04,0x01]
    sbbl  %eax, %eax              ## encoding: [0x19,0xc0]
    notl  %eax                    ## encoding: [0xf7,0xd0]
    addl  8(%esp), %eax           ## encoding: [0x03,0x44,0x24,0x08]
    ret                           ## encoding: [0xc3]

after:
  _foo:                           ## @foo
    cmpl  $1, 4(%esp)             ## encoding: [0x83,0x7c,0x24,0x04,0x01]
    movl  8(%esp), %eax           ## encoding: [0x8b,0x44,0x24,0x08]
    adcl  $-1, %eax               ## encoding: [0x83,0xd0,0xff]
    ret                           ## encoding: [0xc3]

llvm-svn: 122455
2010-12-22 23:17:45 +00:00
Benjamin Kramer
d8387aa9bd X86: Lower a select directly to a setcc_carry if possible.
int test(unsigned long a, unsigned long b) { return -(a < b); }
compiles to
  _test:                              ## @test
    cmpq  %rsi, %rdi                  ## encoding: [0x48,0x39,0xf7]
    sbbl  %eax, %eax                  ## encoding: [0x19,0xc0]
    ret                               ## encoding: [0xc3]
instead of
  _test:                              ## @test
    xorl  %ecx, %ecx                  ## encoding: [0x31,0xc9]
    cmpq  %rsi, %rdi                  ## encoding: [0x48,0x39,0xf7]
    movl  $-1, %eax                   ## encoding: [0xb8,0xff,0xff,0xff,0xff]
    cmovael %ecx, %eax                ## encoding: [0x0f,0x43,0xc1]
    ret                               ## encoding: [0xc3]

llvm-svn: 122451
2010-12-22 23:09:28 +00:00
Chris Lattner
04ef853e23 Fix a bug in ReduceLoadWidth that wasn't handling extending
loads properly.  We miscompiled the testcase into:

_test:                                  ## @test
	movl	$128, (%rdi)
	movzbl	1(%rdi), %eax
	ret

Now we get a proper:

_test:                                  ## @test
	movl	$128, (%rdi)
	movsbl	(%rdi), %eax
	movzbl	%ah, %eax
	ret

This fixes PR8757.

llvm-svn: 122392
2010-12-22 08:02:57 +00:00
Dale Johannesen
e0fb87c3d7 Reapply 122353-122355 with fixes. 122354 was wrong;
the shift type was needed one place, the shift count
type another.  The transform in 123555 had the same
problem.

llvm-svn: 122366
2010-12-21 21:55:50 +00:00
Benjamin Kramer
369872edfc Add some x86 specific dagcombines for conditional increments.
(add Y, (sete  X, 0)) -> cmp X, 1; adc  0, Y
(add Y, (setne X, 0)) -> cmp X, 1; sbb -1, Y
(sub (sete  X, 0), Y) -> cmp X, 1; sbb  0, Y
(sub (setne X, 0), Y) -> cmp X, 1; adc -1, Y

for
  unsigned foo(unsigned a, unsigned b) {
    if (a == 0) b++;
    return b;
  }
we now get:
  foo:
    cmpl  $1, %edi
    movl  %esi, %eax
    adcl  $0, %eax
    ret
instead of:
  foo:
    testl %edi, %edi
    sete  %al
    movzbl  %al, %eax
    addl  %esi, %eax
    ret

llvm-svn: 122364
2010-12-21 21:41:44 +00:00
Dale Johannesen
972aba543a Revert 122353-122355 for the moment, they broke stuff.
llvm-svn: 122360
2010-12-21 21:22:27 +00:00
Dale Johannesen
39186cfb0b Add a new transform to DAGCombiner.
llvm-svn: 122355
2010-12-21 20:10:51 +00:00
Dale Johannesen
5f3e7b08f6 Get the type of a shift from the shift, not from its shift
count operand.  These should be the same but apparently are
not always, and this is cleaner anyway.  This improves the
code in an existing test.

llvm-svn: 122354
2010-12-21 20:06:19 +00:00
Dale Johannesen
036c3da142 Cosmetic changes.
llvm-svn: 122259
2010-12-20 20:10:50 +00:00
Chris Lattner
bee7320c3c now that addc/adde are gone, "ADDC" in the X86 backend uses EFLAGS results,
the same as setcc.  Optimize ADDC(0,0,FLAGS) -> SET_CARRY(FLAGS).  This is
a step towards finishing off PR5443.  In the testcase in that bug we now  get:

	movq	%rdi, %rax
	addq	%rsi, %rax
	sbbq	%rcx, %rcx
	testb	$1, %cl
	setne	%dl
	ret

instead of:

	movq	%rdi, %rax
	addq	%rsi, %rax
	movl	$0, %ecx
	adcq	$0, %rcx
	testq	%rcx, %rcx
	setne	%dl
	ret

llvm-svn: 122219
2010-12-20 01:37:09 +00:00
Chris Lattner
2d4e17d195 We lower setb to sbb with the hope that the and will go away, when it
doesn't, match it back to setb.

On a 64-bit version of the testcase before we'd get:

	movq	%rdi, %rax
	addq	%rsi, %rax
	sbbb	%dl, %dl
	andb	$1, %dl
	ret

now we get:

	movq	%rdi, %rax
	addq	%rsi, %rax
	setb	%dl
	ret

llvm-svn: 122217
2010-12-20 01:16:03 +00:00
Mon P Wang
666259546c Add comment for testcase for 122206
llvm-svn: 122210
2010-12-20 00:54:26 +00:00
Mon P Wang
d3adab7a64 Prevents PerformShuffleCombine from creating a node with an illegal type after legalize types
has run, e.g., prevent creating an i64 node from a v2i64 when i64 is not a legal type.

llvm-svn: 122206
2010-12-19 23:55:53 +00:00
Chris Lattner
297259f6f1 improve the setcc -> setcc_carry optimization to happen more
consistently by moving it out of lowering into dag combine.

Add some missing patterns for matching away extended versions of setcc_c.

llvm-svn: 122201
2010-12-19 22:08:31 +00:00
Chris Lattner
ad85635a93 now that generic vector types aren't selected onto MMX registers, these
tests don't need -disable-mmx.

llvm-svn: 122188
2010-12-19 20:12:58 +00:00
Chris Lattner
ac82ea26da fix PR8642: if a critical edge has a PHI value that can trap,
isel is *required* to split the edge.  PHI values get evaluated
on the edge, not in their predecessor block.

llvm-svn: 122170
2010-12-19 04:58:57 +00:00
Benjamin Kramer
84d3e6cfd0 Just rename the functions, relying on matching a instruction that has the same name as a symbol is way too fragile.
llvm-svn: 122154
2010-12-18 14:23:57 +00:00
Benjamin Kramer
4d591385d1 Test more than just label names and make test work on non-x86 hosts.
llvm-svn: 122153
2010-12-18 14:07:28 +00:00
Nate Begeman
ef5f3c0fa7 Add support for matching psign & plendvb to the x86 target
Remove unnecessary pandn patterns, 'vnot' patfrag looks through bitcasts

llvm-svn: 122098
2010-12-17 22:55:37 +00:00
Dale Johannesen
c2c6ebd82a Add a transform to DAG Combiner. This improves the
code for the case where 32-bit divide by constant is
turned into 64-bit multiply by constant.  8771012.

llvm-svn: 122090
2010-12-17 21:45:49 +00:00
Evan Cheng
68e1ed8752 Teach machine cse to commute instructions.
llvm-svn: 121903
2010-12-15 22:16:21 +00:00
Chris Lattner
81815cd4db take care of some todos, transforming [us]mul_lohi into
a wider mul if the wider mul is legal.

llvm-svn: 121848
2010-12-15 06:04:19 +00:00
Chris Lattner
3bec2e7d0d merge two tests
llvm-svn: 121847
2010-12-15 05:58:59 +00:00
Evan Cheng
7e96e67d98 Fix a minor bug in two-address pass. It was missing a commute opportunity.
regB = move RCX
regA = op regB, regC
RAX  = move regA
where both regB and regC are killed. If regB is constrainted to non-compatible
physical registers but regC is not constrainted at all, then it's better to
commute the instruction.
       movl    %edi, %eax
       shlq    $32, %rcx
       leaq    (%rcx,%rax), %rax
=>
       movl    %edi, %eax
       shlq    $32, %rcx
       orq     %rcx, %rax
rdar://8762995

llvm-svn: 121793
2010-12-14 21:34:53 +00:00
Chris Lattner
d17cbf803b rename test
llvm-svn: 121697
2010-12-13 08:39:40 +00:00
Chris Lattner
14810c808b Add a couple dag combines to transform mulhi/mullo into a wider multiply
when the wider type is legal.  This allows us to compile:

define zeroext i16 @test1(i16 zeroext %x) nounwind {
entry:
	%div = udiv i16 %x, 33
	ret i16 %div
}

into:

test1:                                  # @test1
	movzwl	4(%esp), %eax
	imull	$63551, %eax, %eax      # imm = 0xF83F
	shrl	$21, %eax
	ret

instead of:

test1:                                  # @test1
        movw    $-1985, %ax             # imm = 0xFFFFFFFFFFFFF83F
        mulw    4(%esp)
        andl    $65504, %edx            # imm = 0xFFE0
        movl    %edx, %eax
        shrl    $5, %eax
        ret

Implementing rdar://8760399 and example #4 from:
http://blog.regehr.org/archives/320

We should implement the same thing for [su]mul_hilo, but I don't
have immediate plans to do this.

llvm-svn: 121696
2010-12-13 08:39:01 +00:00
Nate Begeman
cb6d1c8193 Formalize the notion that AVX and SSE are non-overlapping extensions from the compiler's point of view. Per email discussion, we either want to always use VEX-prefixed instructions or never use them, and are taking "HasAVX" to mean "Always use VEX". Passing -mattr=-avx,+sse42 should serve to restore legacy SSE support when desirable.
llvm-svn: 121439
2010-12-10 00:26:57 +00:00
Eric Christopher
0e40452eb0 Rewrite the darwin tlv support to use a chain and return to copying
the output to the correct register. Fixes a hidden problem uncovered
by the last patch where we'd try to DAG combine our MVT::Other node
oddly.

llvm-svn: 121358
2010-12-09 06:25:53 +00:00
Eric Christopher
0100a8fda4 Remove extraneous copy from DAG conversion for darwin tls. This was
popping up at O0 when it wasn't folded and the fast allocator would
complain.

llvm-svn: 121330
2010-12-09 00:27:58 +00:00
Eric Christopher
d601b8288f Move this test to tlv* to make it easier to notice versus linux tls
support.

llvm-svn: 121316
2010-12-08 23:33:23 +00:00
Devang Patel
6fe7fe8dd4 If dbg_declare() or dbg_value() is not lowered by isel then emit DEBUG message instead of creating DBG_VALUE for undefined value in reg0.
llvm-svn: 121059
2010-12-06 22:39:26 +00:00
Rafael Espindola
4ec917db9b Revert previous two patches while I try to find out how to make both
linux and darwin assemblers happy :-(

llvm-svn: 121004
2010-12-06 15:35:15 +00:00
Rafael Espindola
ad6219b193 Update test for the extra =.
llvm-svn: 121001
2010-12-06 15:05:36 +00:00
Chris Lattner
e30adfb732 Teach X86ISelLowering that the second result of X86ISD::UMUL is a flags
result.  This allows us to compile:

void *test12(long count) {
      return new int[count];
}

into:

test12:
	movl	$4, %ecx
	movq	%rdi, %rax
	mulq	%rcx
	movq	$-1, %rdi
	cmovnoq	%rax, %rdi
	jmp	__Znam                  ## TAILCALL

instead of:

test12:
	movl	$4, %ecx
	movq	%rdi, %rax
	mulq	%rcx
	seto	%cl
	testb	%cl, %cl
	movq	$-1, %rdi
	cmoveq	%rax, %rdi
	jmp	__Znam

Of course it would be even better if the regalloc inverted the cmov to 'cmovoq',
which would eliminate the need for the 'movq %rdi, %rax'.

llvm-svn: 120936
2010-12-05 07:49:54 +00:00
Chris Lattner
76601e7a99 it turns out that when ".with.overflow" intrinsics were added to the X86
backend that they were all implemented except umul.  This one fell back
to the default implementation that did a hi/lo multiply and compared the
top.  Fix this to check the overflow flag that the 'mul' instruction
sets, so we can avoid an explicit test.  Now we compile:

void *func(long count) {
      return new int[count];
}

into:

__Z4funcl:                              ## @_Z4funcl
	movl	$4, %ecx                ## encoding: [0xb9,0x04,0x00,0x00,0x00]
	movq	%rdi, %rax              ## encoding: [0x48,0x89,0xf8]
	mulq	%rcx                    ## encoding: [0x48,0xf7,0xe1]
	seto	%cl                     ## encoding: [0x0f,0x90,0xc1]
	testb	%cl, %cl                ## encoding: [0x84,0xc9]
	movq	$-1, %rdi               ## encoding: [0x48,0xc7,0xc7,0xff,0xff,0xff,0xff]
	cmoveq	%rax, %rdi              ## encoding: [0x48,0x0f,0x44,0xf8]
	jmp	__Znam                  ## TAILCALL

instead of:

__Z4funcl:                              ## @_Z4funcl
	movl	$4, %ecx                ## encoding: [0xb9,0x04,0x00,0x00,0x00]
	movq	%rdi, %rax              ## encoding: [0x48,0x89,0xf8]
	mulq	%rcx                    ## encoding: [0x48,0xf7,0xe1]
	testq	%rdx, %rdx              ## encoding: [0x48,0x85,0xd2]
	movq	$-1, %rdi               ## encoding: [0x48,0xc7,0xc7,0xff,0xff,0xff,0xff]
	cmoveq	%rax, %rdi              ## encoding: [0x48,0x0f,0x44,0xf8]
	jmp	__Znam                  ## TAILCALL

Other than the silly seto+test, this is using the o bit directly, so it's going in the right
direction.

llvm-svn: 120935
2010-12-05 07:30:36 +00:00
Chris Lattner
9b4b9e751a fix the rest of the linux miscompares :)
llvm-svn: 120933
2010-12-05 02:08:07 +00:00
Chris Lattner
16bafb2414 generalize the previous check to handle -1 on either side of the
select, inserting a not to compensate.  Add a missing isZero check
that I lost somehow.

This improves codegen of:

void *func(long count) {
      return new int[count];
}

from:

__Z4funcl:                              ## @_Z4funcl
	movl	$4, %ecx                ## encoding: [0xb9,0x04,0x00,0x00,0x00]
	movq	%rdi, %rax              ## encoding: [0x48,0x89,0xf8]
	mulq	%rcx                    ## encoding: [0x48,0xf7,0xe1]
	testq	%rdx, %rdx              ## encoding: [0x48,0x85,0xd2]
	movq	$-1, %rdi               ## encoding: [0x48,0xc7,0xc7,0xff,0xff,0xff,0xff]
	cmoveq	%rax, %rdi              ## encoding: [0x48,0x0f,0x44,0xf8]
	jmp	__Znam                  ## TAILCALL
                                        ## encoding: [0xeb,A]

to:

__Z4funcl:                              ## @_Z4funcl
	movl	$4, %ecx                ## encoding: [0xb9,0x04,0x00,0x00,0x00]
	movq	%rdi, %rax              ## encoding: [0x48,0x89,0xf8]
	mulq	%rcx                    ## encoding: [0x48,0xf7,0xe1]
	cmpq	$1, %rdx                ## encoding: [0x48,0x83,0xfa,0x01]
	sbbq	%rdi, %rdi              ## encoding: [0x48,0x19,0xff]
	notq	%rdi                    ## encoding: [0x48,0xf7,0xd7]
	orq	%rax, %rdi              ## encoding: [0x48,0x09,0xc7]
	jmp	__Znam                  ## TAILCALL
                                        ## encoding: [0xeb,A]

llvm-svn: 120932
2010-12-05 02:00:51 +00:00
Chris Lattner
e1c32a116b relax this to handle linux defaulting to -static.
llvm-svn: 120930
2010-12-05 01:31:13 +00:00
Chris Lattner
474ed0aa9b Improve an integer select optimization in two ways:
1. generalize 
    (select (x == 0), -1, 0) -> (sign_bit (x - 1))
to:
    (select (x == 0), -1, y) -> (sign_bit (x - 1)) | y

2. Handle the identical pattern that happens with !=:
   (select (x != 0), y, -1) -> (sign_bit (x - 1)) | y

cmov is often high latency and can't fold immediates or
memory operands.  For example for (x == 0) ? -1 : 1, before 
we got:

< 	testb	%sil, %sil
< 	movl	$-1, %ecx
< 	movl	$1, %eax
< 	cmovel	%ecx, %eax

now we get:

> 	cmpb	$1, %sil
> 	sbbl	%eax, %eax
> 	orl	$1, %eax

llvm-svn: 120929
2010-12-05 01:23:24 +00:00
Chris Lattner
c383be807e merge some tests into select.ll and make them more specific.
llvm-svn: 120928
2010-12-05 01:13:58 +00:00
Chris Lattner
7e0a594633 rename test
llvm-svn: 120927
2010-12-05 01:02:23 +00:00
Chris Lattner
1a760f6247 remove two tests that aren't really testing anything.
llvm-svn: 120926
2010-12-05 01:02:13 +00:00
Benjamin Kramer
851691ddb2 Add patterns for the x86 popcnt instruction.
- Also adds a new POPCNT subtarget feature that is currently enabled if the target
  supports SSE4.2 (nehalem) or SSE4A (barcelona).

llvm-svn: 120917
2010-12-04 20:32:23 +00:00
Devang Patel
dad3193123 Hide tests, that check .loc, .file in output assembly, from darwin9 buildbot.
llvm-svn: 120750
2010-12-02 23:29:58 +00:00
Devang Patel
822facd787 Use set directive for StartMinusEndExpr.
This is a fix for llvm-gcc-i386-darwin9 buildbot failure.

llvm-svn: 120742
2010-12-02 21:32:30 +00:00
Evan Cheng
402157b66e Fix test.
llvm-svn: 120730
2010-12-02 20:17:34 +00:00
Evan Cheng
4118b24aca Fix and re-enable tail call optimization of expanded libcalls.
llvm-svn: 120622
2010-12-01 22:59:46 +00:00
Evan Cheng
84162760b7 Speculatively disable x86 portion of r120501 to appease the x86_64 buildbot.
llvm-svn: 120549
2010-12-01 03:27:20 +00:00
Evan Cheng
f7e586d749 Enable sibling call optimization of libcalls which are expanded during
legalization time. Since at legalization time there is no mapping from
SDNode back to the corresponding LLVM instruction and the return
SDNode is target specific, this requires a target hook to check for
eligibility. Only x86 and ARM support this form of sibcall optimization
right now.
rdar://8707777

llvm-svn: 120501
2010-11-30 23:55:39 +00:00
Eric Christopher
990bcd83b8 Not all platforms use _<func>. Duh.
llvm-svn: 120418
2010-11-30 09:23:54 +00:00
Eric Christopher
f27f0b5234 Rewrite mwait and monitor support and custom lower arguments.
Fixes PR8573.

llvm-svn: 120404
2010-11-30 07:20:12 +00:00
Benjamin Kramer
84bf47f2d8 Fix some broken CHECK lines.
llvm-svn: 120332
2010-11-29 22:34:55 +00:00
Rafael Espindola
45cd9713f2 Lower TLS_addr32 and TLS_addr64.
llvm-svn: 120225
2010-11-27 20:43:02 +00:00
Benjamin Kramer
632a91cba5 Implement the "if (X == 6 || X == 4)" -> "if ((X|2) == 6)" optimization.
This currently only catches the most basic case, a two-case switch, but can be
extended later.

llvm-svn: 119964
2010-11-22 09:45:38 +00:00
Dale Johannesen
6399550f2f Prefetch has a MemOperand now. FileCheckize a test.
This finishes up 8460971.

llvm-svn: 119848
2010-11-19 21:49:38 +00:00
Mon P Wang
4965983b22 Make isScalarToVector to return false if the node is a scalar. This will prevent
DAGCombine from making an illegal transformation of bitcast of a scalar to a
vector into a scalar_to_vector.

llvm-svn: 119819
2010-11-19 19:08:12 +00:00
Duncan Sands
a61bc1a41a The DAGCombiner was threading select over pairs of extending loads even
if the extension types were not the same.  The result was that if you
fed a select with sext and zext loads, as in the testcase, then it
would get turned into a zext (or sext) of the select, which is wrong
in the cases when it should have been an sext (resp. zext).  Reported
and diagnosed by Sebastien Deldon.

llvm-svn: 119728
2010-11-18 20:05:18 +00:00
Rafael Espindola
93a07b464e Change CodeGen to use .loc directives. This produces a lot more readable output
and testing is easier.  A good example is the unknown-location.ll test that
now can just look for ".loc 1 0 0".  We also don't use a DW_LNE_set_address for
every address change anymore.

llvm-svn: 119613
2010-11-18 02:04:25 +00:00
Dale Johannesen
06f479d543 Do not throw away alignment when generating the DAG for
memset; we may need it to decide between MOVAPS and MOVUPS
later.  Adjust a test that was looking for wrong code.
PR 3866 / 8675131.

llvm-svn: 119605
2010-11-18 01:35:23 +00:00
John Thompson
8c52bd2004 Fixed to use input redirection for source - to eliminate .s output.
llvm-svn: 119599
2010-11-18 00:50:20 +00:00
John Thompson
b33f935bc3 Bug 8621 fix - pointer cast stripped from inline asm constraint argument.
llvm-svn: 119590
2010-11-17 23:58:47 +00:00
Peter Collingbourne
4ec6b2dbe7 Recognise 32-bit ror-based bswap implementation used by uclibc
llvm-svn: 119007
2010-11-13 19:54:30 +00:00
Dan Gohman
8e986f9c2f Remove the memmove->memcpy optimization from CodeGen. MemCpyOpt does this.
llvm-svn: 118789
2010-11-11 16:24:49 +00:00
Duncan Sands
a6d539ccbc Testcase for PR8211 (llc crash at -O0).
llvm-svn: 118509
2010-11-09 16:22:27 +00:00
Dan Gohman
903935bf3e Fix DAGCombiner to avoid folding a sext-in-reg or similar through a shl
in order to fold it into a load.

llvm-svn: 118471
2010-11-09 01:54:35 +00:00
Dan Gohman
f78cd9006b Delete an extraneous svn:executable property.
llvm-svn: 118470
2010-11-09 01:51:06 +00:00
Dale Johannesen
88f85df7f7 Fix an inline asm pasto from 117667; was preventing
{i64, i64} from matching i128.

llvm-svn: 118465
2010-11-09 01:15:07 +00:00
Chris Lattner
0d6d870628 go to great lengths to work around a GAS bug my previous patch
exposed:

GAS doesn't accept "fcomip %st(1)", it requires "fcomip %st(1), %st(0)"
even though st(0) is implicit in all other fp stack instructions.

Fortunately, there is an alias for fcomip named "fcompi" and gas does
accept the default argument for the alias (boggle!).

As such, switch the canonical form of this instruction to "pi" instead
of "ip".  This makes the code generator and disassembler generate pi,
avoiding the gas bug.

llvm-svn: 118356
2010-11-06 21:37:06 +00:00
Dale Johannesen
74ebf1b011 This test assumes SSE is present; that is not the default
on non-X86 hosts.  Hopefully fixes ppc-host buildbot.

llvm-svn: 118182
2010-11-03 18:08:41 +00:00
Dan Gohman
8071a75d31 Fix DAGCombiner to avoid going into an infinite loop when it
encounters (and:i64 (shl:i64 (load:i64), 1), 0xffffffff).
This fixes rdar://8606584.

llvm-svn: 118143
2010-11-03 01:47:46 +00:00
John Thompson
4255425219 Inline asm mult-alt constraint tests.
llvm-svn: 118107
2010-11-02 23:01:44 +00:00
Devang Patel
efd9ac540a Use frameindex, if available, as a last resort to emit debug info for a parameter.
llvm-svn: 118020
2010-11-02 17:01:30 +00:00
Dale Johannesen
b78530f9b0 Fix pastos in handling of AVX cvttsd2si, PR8491.
Bruno, please review, but I'm pretty sure this is right.
Patch by Alex Mac!

llvm-svn: 117514
2010-10-28 00:35:54 +00:00
Dale Johannesen
2566d39d07 An stdcall function calling a non-stdcall function
cannot use tailcall.  PR 8461.

llvm-svn: 117322
2010-10-25 22:17:05 +00:00
Michael J. Spencer
87c8212d41 X86: Emit _fltused instead of __fltused on Windows x64.
llvm-svn: 117205
2010-10-23 09:06:59 +00:00
Evan Cheng
9a06e4c7c8 More accurate estimate / tracking of register pressure.
- Initial register pressure in the loop should be all the live defs into the
  loop. Not just those from loop preheader which is often empty.
- When an instruction is hoisted, update register pressure from loop preheader
  to the original BB.
- Treat only use of a virtual register as kill since the code is still SSA.

llvm-svn: 116956
2010-10-20 22:03:58 +00:00
Evan Cheng
1c8dafd12a Re-enable register pressure aware machine licm with fixes. Hoist() may have
erased the instruction during LICM so UpdateRegPressureAfter() should not
reference it afterwards.

llvm-svn: 116845
2010-10-19 18:58:51 +00:00
Daniel Dunbar
6ff550c84d Revert r116781 "- Add a hook for target to determine whether an instruction def
is", which breaks some nightly tests.

llvm-svn: 116816
2010-10-19 17:14:24 +00:00
Evan Cheng
9c3f6f486e - Add a hook for target to determine whether an instruction def is
"long latency" enough to hoist even if it may increase spilling. Reloading
  a value from spill slot is often cheaper than performing an expensive
  computation in the loop. For X86, that means machine LICM will hoist
  SQRT, DIV, etc. ARM will be somewhat aggressive with VFP and NEON
  instructions.
- Enable register pressure aware machine LICM by default.

llvm-svn: 116781
2010-10-19 00:55:07 +00:00
Michael J. Spencer
e57b670425 X86-Windows: Emit an undefined global __fltused symbol when targeting Windows
if any floating point arguments are passed to an external function.

llvm-svn: 116665
2010-10-16 08:25:41 +00:00
Rafael Espindola
b1ae74bd73 Fix another case where we were preferring instructions with large
immediates instead of 8 bits ones.

llvm-svn: 116410
2010-10-13 17:14:25 +00:00
Rafael Espindola
ff7f11c151 Fix PR8365 by adding a more specialized Pat that checks if an 'and' with
8 bit constants can be used.

llvm-svn: 116403
2010-10-13 13:31:20 +00:00
Eric Christopher
efbd4146d8 FileCheckize this in a hope to quiet a valgrind warning on grep.
llvm-svn: 116376
2010-10-12 23:47:58 +00:00
Andrew Trick
5361f35978 PR8297
llvm-svn: 116223
2010-10-11 21:08:42 +00:00
Andrew Trick
5704a15e36 Fixes bug 8297: i386 cmpxchg8b, missing MachineMemOperand
llvm-svn: 116214
2010-10-11 19:02:04 +00:00
Michael J. Spencer
8f7251a0e9 X86: MinGW should always use libgcc on Windows.
llvm-svn: 116177
2010-10-10 23:11:06 +00:00
Michael J. Spencer
b13cb4dd72 X86: Call _alldiv instead of __divdi3 on Windows (excluding cygwin).
llvm-svn: 116174
2010-10-10 22:04:34 +00:00
Cameron Esfahani
664317d6cd Recommit 116056, now with the missing file...
llvm-svn: 116083
2010-10-08 19:24:18 +00:00
Andrew Trick
0d8a3d67e0 reverting 116056: win64_params.ll may need to be conditionalized?
llvm-svn: 116063
2010-10-08 17:22:42 +00:00
Cameron Esfahani
a9f8bb1356 Small patch to restore home register stack space allocation for the Win64 case. Add test case. This code eventually needs to be tighter, since it's always allocating it, even in leaf routines.
llvm-svn: 116056
2010-10-08 10:31:30 +00:00
Chris Lattner
761f40f7ee testcase that goes with r116053
llvm-svn: 116054
2010-10-08 05:12:30 +00:00
Chris Lattner
c77216343c rename test
llvm-svn: 116052
2010-10-08 05:05:06 +00:00
Chris Lattner
c694d80729 merge tests
llvm-svn: 116051
2010-10-08 05:04:58 +00:00
Chris Lattner
79fbe67839 filecheckize.
llvm-svn: 116050
2010-10-08 05:02:29 +00:00
Chris Lattner
82ce325f16 reapply: Use the new TB_NOT_REVERSABLE flag instead of special
reapply: reimplement the second half of the or/add optimization.  We should now

with no changes.  Turns out that one missing "Defs = [EFLAGS]" can upset things
a bit.

llvm-svn: 116040
2010-10-08 03:57:25 +00:00
Daniel Dunbar
983fae5a86 Revert "reimplement the second half of the or/add optimization. We should now",
which depends on r116007, which I am about to revert.

llvm-svn: 116031
2010-10-08 02:07:26 +00:00
Chris Lattner
7577cb7b49 reimplement the second half of the or/add optimization. We should now
only end up emitting LEA instead of OR.  If we aren't able to promote
something into an LEA, we should never be emitting it as an ADD.

Add some testcases that we emit "or" in cases where we used to produce
an "add".

llvm-svn: 116026
2010-10-08 01:05:10 +00:00
Chris Lattner
bd89233607 convert cmp to use a multipattern
llvm-svn: 115978
2010-10-07 20:56:25 +00:00
Evan Cheng
7c89d70f27 Canonicalize X86ISD::MOVDDUP nodes to v2f64 to make sure all cases match. Also eliminate unneeded isel patterns. rdar://8520311
llvm-svn: 115977
2010-10-07 20:50:20 +00:00
Bill Wendling
4938e4d00a PSHUFW is in SSE, not SSSE3.
llvm-svn: 115691
2010-10-05 21:58:12 +00:00
Owen Anderson
20b48697cd Use a more efficient lowering of uint64_t --> float that can take advantage of hardware signed integer conversion without
having to do a double cast (uint64_t --> double --> float).  This is based on the algorithm from compiler_rt's __floatundisf
for X86-64.

llvm-svn: 115634
2010-10-05 17:24:05 +00:00
NAKAMURA Takumi
33e956ce63 test/CodeGen/X86/atomic_op.ll: Rename @main to @func. Extra sequences will be inserted to @main as prologue on cygming, to fail.
llvm-svn: 115611
2010-10-05 11:16:24 +00:00
Anton Korobeynikov
f1acea8615 va_args support for Win64.
Patch by Cameron!

llvm-svn: 115480
2010-10-03 22:52:07 +00:00
Anton Korobeynikov
31b3b2ca41 Properly emit stack probe on win64 (for non-mingw targets).
Based on the patch by Cameron Esfahani!

llvm-svn: 115479
2010-10-03 22:02:38 +00:00
Chris Lattner
f58652610d unbreak buildbot
llvm-svn: 115476
2010-10-03 20:02:48 +00:00
Bill Wendling
f1552256a1 Add test to make sure that the MMX intrinsic calls make it out the other end in
tact.

llvm-svn: 115458
2010-10-03 03:30:30 +00:00
Bill Wendling
1f47b01f08 Need to specify SSE4 for machines which don't have SSE4. The code checked for is generated by SSE4. Otherwise, we get something else.
llvm-svn: 115352
2010-10-01 21:39:35 +00:00
Bill Wendling
89da1661a3 We must check for something.
llvm-svn: 115309
2010-10-01 10:20:10 +00:00
Bill Wendling
a5e9e55778 Disable tests until I can figure out why they're failing on just two machines but not others.
llvm-svn: 115308
2010-10-01 10:01:10 +00:00
Bill Wendling
18960d3eaa Try adding an mtriple.
llvm-svn: 115307
2010-10-01 09:40:50 +00:00
Bill Wendling
7cb3d0c43e FileCheck-ize this test.
llvm-svn: 115304
2010-10-01 08:55:48 +00:00
Bill Wendling
7b19f58e1c FileCheck-ize this test.
llvm-svn: 115303
2010-10-01 08:50:12 +00:00
Chris Lattner
01c6e93ea4 fix rdar://8494845 + PR8244 - a miscompile exposed by my patch in r101350
llvm-svn: 115294
2010-10-01 05:36:09 +00:00
Dale Johannesen
d2ab5dccdc One more +sse2.
llvm-svn: 115293
2010-10-01 05:08:18 +00:00
Dale Johannesen
4d1ca2ab29 Mark all these as needing SSE2. Should fix PPC and
maybe even Linux.

llvm-svn: 115291
2010-10-01 04:17:55 +00:00
Dale Johannesen
c9a4261060 Disable these tests for now; it's not obvious why they fail on Linux.
llvm-svn: 115257
2010-10-01 00:59:21 +00:00
Dale Johannesen
51bf690108 Make test not sensitive to register choice.
llvm-svn: 115250
2010-10-01 00:16:17 +00:00
Dale Johannesen
c14a1eda84 Massive rewrite of MMX:
The x86_mmx type is used for MMX intrinsics, parameters and
return values where these use MMX registers, and is also
supported in load, store, and bitcast.

Only the above operations generate MMX instructions, and optimizations
do not operate on or produce MMX intrinsics. 

MMX-sized vectors <2 x i32> etc. are lowered to XMM or split into
smaller pieces.  Optimizations may occur on these forms and the
result casted back to x86_mmx, provided the result feeds into a
previous existing x86_mmx operation.

The point of all this is prevent optimizations from introducing
MMX operations, which is unsafe due to the EMMS problem.

llvm-svn: 115243
2010-09-30 23:57:10 +00:00
NAKAMURA Takumi
66978c202a test/CodeGen/X86/sibcall.ll: Add explicit triplets and remove XFAIL: apple-darwin8.
llvm-svn: 115215
2010-09-30 22:02:06 +00:00
Jakob Stoklund Olesen
603950c4af Try again to disable critical edge splitting in CodeGenPrepare.
The bug that broke i386 linux has been fixed in r115191.

llvm-svn: 115204
2010-09-30 20:51:52 +00:00
Jakob Stoklund Olesen
8f4d623f9a When isel is emitting instructions for an x86 target without CMOV, the CFG is
edited during emission.

If the basic block ends in a switch that gets lowered to a jump table, any
phis at the default edge were getting updated wrong. The jump table data
structure keeps a pointer to the header blocks that wasn't getting updated
after the MBB is split.

This bug was exposed on 32-bit Linux when disabling critical edge splitting in
codegen prepare.

The fix is to uipdate stale MBB pointers whenever a block is split during
emission.

llvm-svn: 115191
2010-09-30 19:44:31 +00:00
Bill Wendling
b01224f53c And remove r114997's test.
llvm-svn: 115003
2010-09-28 23:24:18 +00:00
Bill Wendling
fc29f0d706 Revert r114997. It was causing a failure on darwin10-selfhost.
llvm-svn: 115002
2010-09-28 23:11:55 +00:00
Bill Wendling
13775375ed Fix a FIXME. _foo.eh symbols are currently always exported so that the linker
knows about them. This is not necessary on 10.6 and later.

llvm-svn: 114997
2010-09-28 22:36:56 +00:00
Jakob Stoklund Olesen
cfed90fe40 Revert "Disable codegen prepare critical edge splitting. Machine instruction passes now"
This reverts revision 114633. It was breaking llvm-gcc-i386-linux-selfhost.

It seems there is a downstream bug that is exposed by
-cgp-critical-edge-splitting=0. When that bug is fixed, this patch can go back
in.

Note that the changes to tailcallfp2.ll are not reverted. They were good are
required.

llvm-svn: 114859
2010-09-27 18:43:48 +00:00
Evan Cheng
1493b1799e Disable codegen prepare critical edge splitting. Machine instruction passes now
break critical edges on demand.

llvm-svn: 114633
2010-09-23 06:55:34 +00:00
Owen Anderson
7d6373ea9d A select between a constant and zero, when fed by a bit test, can be efficiently
lowered using a series of shifts.
Fixes <rdar://problem/8285015>.

llvm-svn: 114599
2010-09-22 22:58:22 +00:00
Cameron Esfahani
662193bddc Fix PR8201: Update the code to call via X86::CALL64pcrel32 in the 64-bit case.
llvm-svn: 114597
2010-09-22 22:35:21 +00:00
Chris Lattner
1864d6728d Fix an inconsistency in the x86 backend that led it to reject "calll foo" on
x86-32: 32-bit calls were named "call" not "calll".  64-bit calls were correctly
named "callq", so this only impacted x86-32.

This fixes rdar://8456370 - llvm-mc rejects 'calll'

This also exposes that mingw/64 is generating a 32-bit call instead of a 64-bit call,
I will file a bugzilla.

llvm-svn: 114534
2010-09-22 05:49:14 +00:00
Chris Lattner
26d11d7501 reimplement elf TLS support in terms of addressing modes, eliminating SegmentBaseAddress.
llvm-svn: 114529
2010-09-22 04:39:11 +00:00
Chris Lattner
d42791ad4a linux has a different stack alignment than the mac, relax this a bit.
llvm-svn: 114519
2010-09-22 00:46:26 +00:00
Chris Lattner
e52da86fab give VZEXT_LOAD a memory operand, it now works with segment registers.
llvm-svn: 114515
2010-09-22 00:34:38 +00:00
Chris Lattner
706b9206da revert r114386 now that address modes work correctly, we get a nice
call through gs-relative memory now.

llvm-svn: 114510
2010-09-22 00:11:31 +00:00
Chris Lattner
f9861312cb give LCMPXCHG_DAG[8] a memory operand, allowing it to work with addrspace 256/257
llvm-svn: 114508
2010-09-21 23:59:42 +00:00
Chris Lattner
08b4ce2b31 filecheckize
llvm-svn: 114507
2010-09-21 23:57:27 +00:00
Devang Patel
53b709a85c Use FileCheck
llvm-svn: 114475
2010-09-21 20:50:32 +00:00
Owen Anderson
97a8fdc19c When adding the carry bit to another value on X86, exploit the fact that the carry-materialization
(sbbl x, x) sets the registers to 0 or ~0.  Combined with two's complement arithmetic, we can fold
the intermediate AND and the ADD into a single SUB.

This fixes <rdar://problem/8449754>.

llvm-svn: 114460
2010-09-21 18:41:19 +00:00
Chris Lattner
ecdba24738 fix rdar://8453210, a crash handling a call through a GS relative load.
For now, just disable folding the load into the call.

llvm-svn: 114386
2010-09-21 03:37:00 +00:00
Evan Cheng
1ce02d180e Enable machine sinking critical edge splitting. e.g.
define double @foo(double %x, double %y, i1 %c) nounwind {
  %a = fdiv double %x, 3.2
  %z = select i1 %c, double %a, double %y
  ret double %z
}

Was:
_foo:
        divsd   LCPI0_0(%rip), %xmm0
        testb   $1, %dil
        jne     LBB0_2
        movaps  %xmm1, %xmm0
LBB0_2:
        ret

Now:
_foo:
        testb   $1, %dil
        je      LBB0_2
        divsd   LCPI0_0(%rip), %xmm0
        ret
LBB0_2:
        movaps  %xmm1, %xmm0
        ret

This avoids the divsd when early exit is taken.
rdar://8454886

llvm-svn: 114372
2010-09-20 22:52:00 +00:00
Owen Anderson
fc94b337eb When TCO is turned on, it is possible to end up with aliasing FrameIndex's. Therefore,
CombinerAA cannot assume that different FrameIndex's never alias, but can instead use
MachineFrameInfo to get the actual offsets of these slots and check for actual aliasing.

This fixes CodeGen/X86/2010-02-19-TailCallRetAddrBug.ll and CodeGen/X86/tailcallstack64.ll
when CombinerAA is enabled, modulo a different register allocation sequence.

llvm-svn: 114348
2010-09-20 20:39:59 +00:00
NAKAMURA Takumi
a8d8b5f3c3 test/CodeGen/X86: Add explicit triplet -mtriple=i686-linux to 3 tests incompatible to Win32 codegen.
r114297 raises 3 failures. They might fail also on mingw.

llvm-svn: 114317
2010-09-19 21:58:55 +00:00
Owen Anderson
015641f659 Invert the logic of reachesChainWithoutSideEffects(). What we want to check is that there is
NO path to the destination containing side effects, not that SOME path contains no side effects.
In  practice, this only manifests with CombinerAA enabled, because otherwise the chain has little
to no branching, so "any" is effectively equivalent to "all".

llvm-svn: 114268
2010-09-18 04:45:14 +00:00
Evan Cheng
8c2bde65f0 Teach machine sink to
1) Do forward copy propagation. This makes it easier to estimate the cost of the
   instruction being sunk.
2) Break critical edges on demand, including cases where the value is used by
   PHI nodes.
Critical edge splitting is not yet enabled by default.

llvm-svn: 114227
2010-09-17 22:28:18 +00:00
Dan Gohman
aaed2c137f Avoid emitting a PIC base register if no PIC addresses are needed.
This fixes rdar://8396318.

llvm-svn: 114201
2010-09-17 20:24:24 +00:00
Dale Johannesen
cf9dc14249 When substituting sunkaddrs into indirect arguments an asm, we were
walking the asm arguments once and stashing their Values.  This is
wrong because the same memory location can be in the list twice, and
if the first one has a sunkaddr substituted, the stashed value for the
second one will be wrong (use-after-free).  PR 8154.

llvm-svn: 114104
2010-09-16 18:30:55 +00:00
Bruno Cardoso Lopes
49efee5c95 Add one more pattern to fallback movddup
llvm-svn: 113522
2010-09-09 18:48:34 +00:00
Devang Patel
acddbab1a5 remove these tests for now.
llvm-svn: 113293
2010-09-07 22:03:44 +00:00
Devang Patel
751d5ec40a There is no need to force target if the test is going to run on other x86 platforms.
llvm-svn: 113285
2010-09-07 20:59:09 +00:00
Devang Patel
e9dcee0a69 Fix command line used to link these test cases.
llvm-svn: 113237
2010-09-07 18:17:56 +00:00
Devang Patel
ca974014e7 Reintroduce dbg-declare tests.
llvm-svn: 113232
2010-09-07 18:01:49 +00:00
Devang Patel
c1cdbbb6d5 Remove last three tests. I need to make them independent of my setup.
llvm-svn: 113213
2010-09-07 17:08:57 +00:00
Devang Patel
cb223026e6 Add a test case to check handling of dbg-declare during hybrid mode where we begin using fast-isel but switch back to DAG building at some point.
llvm-svn: 113210
2010-09-07 17:03:44 +00:00
Devang Patel
5b5dd1d32b Add a test case to check handling of dbg-declare by selection DAG builder.
llvm-svn: 113209
2010-09-07 16:56:35 +00:00
Devang Patel
bcdf6f20ac Add a test case to check handling of dbg-declare by fast-isel.
llvm-svn: 113208
2010-09-07 16:40:53 +00:00
Chris Lattner
684ae57b8e implement rdar://6653118 - fastisel should fold loads where possible.
Since mem2reg isn't run at -O0, we get a ton of reloads from the stack,
for example, before, this code:

int foo(int x, int y, int z) {
  return x+y+z;
}

used to compile into:

_foo:                                   ## @foo
	subq	$12, %rsp
	movl	%edi, 8(%rsp)
	movl	%esi, 4(%rsp)
	movl	%edx, (%rsp)
	movl	8(%rsp), %edx
	movl	4(%rsp), %esi
	addl	%edx, %esi
	movl	(%rsp), %edx
	addl	%esi, %edx
	movl	%edx, %eax
	addq	$12, %rsp
	ret

Now we produce:

_foo:                                   ## @foo
	subq	$12, %rsp
	movl	%edi, 8(%rsp)
	movl	%esi, 4(%rsp)
	movl	%edx, (%rsp)
	movl	8(%rsp), %edx
	addl	4(%rsp), %edx    ## Folded load
	addl	(%rsp), %edx     ## Folded load
	movl	%edx, %eax
	addq	$12, %rsp
	ret

Fewer instructions and less register use = faster compiles.

llvm-svn: 113102
2010-09-05 02:18:34 +00:00
Dale Johannesen
2f4f8f5705 Remove the rest of the nonexistent 64-bit AVX instructions.
Bruno, please review.

llvm-svn: 113014
2010-09-03 21:23:00 +00:00
NAKAMURA Takumi
736651f79b test/CodeGen/X86: Add explicit -mtriple=(i686|x86_64)-linux for Win32 host.
llvm-svn: 112947
2010-09-03 03:24:08 +00:00
Bruno Cardoso Lopes
f91bd70e9a AVX doesn't support mm operations neither its instrinsics.
The AVX versions of PALIGN and PABS* should only exist for
128-bit. Remove the unnecessary stuff.

llvm-svn: 112944
2010-09-03 02:08:45 +00:00
Anton Korobeynikov
32cecc0ecc Properly emit __chkstk call instead of __alloca on non-mingw windows targets.
Patch by Cameron Esfahani!

llvm-svn: 112902
2010-09-02 23:03:46 +00:00
Dan Gohman
6824bfc554 Don't narrow the load and store in a load+twiddle+store sequence unless
there are clearly no stores between the load and the store. This fixes
this miscompile reported as PR7833.

This breaks the test/CodeGen/X86/narrow_op-2.ll optimization, which is
safe, but awkward to prove safe. Move it to X86's README.txt.

llvm-svn: 112861
2010-09-02 21:18:42 +00:00
NAKAMURA Takumi
f5566eebf8 test/loop-strength-reduce4: Add explicit triplet for Win32 host.
llvm-svn: 112802
2010-09-02 03:45:58 +00:00
NAKAMURA Takumi
8e620796b6 test/twoaddr-coalesce: Do not use @main.
Win32 codegen emits implicit invoking __main into, to fail.

llvm-svn: 112801
2010-09-02 03:45:51 +00:00
Bruno Cardoso Lopes
601bf4c6d3 Using target specific nodes for shuffle nodes makes the mask
check more strict, breaking some cases not checked in the
testsuite, but also exposes some foldings not done before,
as this example:

  movaps  (%rdi), %xmm0
  movaps  (%rax), %xmm1
  movaps  %xmm0, %xmm2
  movss %xmm1, %xmm2
  shufps  $36, %xmm2, %xmm0

now is generated as:

  movaps  (%rdi), %xmm0
  movaps  %xmm0, %xmm1
  movlps  (%rax), %xmm1
  shufps  $36, %xmm1, %xmm0

llvm-svn: 112753
2010-09-01 22:33:20 +00:00
Jakob Stoklund Olesen
fe11c81560 Teach RemoveCopyByCommutingDef to check all aliases, not just subregisters.
This caused a miscompilation in WebKit where %RAX had conflicting defs when
RemoveCopyByCommutingDef was commuting a %EAX use.

llvm-svn: 112751
2010-09-01 22:15:35 +00:00
Dan Gohman
ad593f9d93 Revert 112442 and 112440 until the compile time problems introduced
by 112440 are resolved.

llvm-svn: 112692
2010-09-01 01:45:53 +00:00
Chris Lattner
765e59210c two changes:
1) nuke ConstDataCoalSection, which is dead.
2) revise my previous patch for rdar://8018335,
  which was completely wrong.  Specifically, it doesn't 
  make sense to mark __TEXT,__const_coal as PURE_INSTRUCTIONS,
  because it is for readonly data.  templates (it turns out)
  go to const_coal_nt.  The real fix for rdar://8018335 was
  to give ConstTextCoalSection a section kind of ReadOnly 
  instead of Text.

llvm-svn: 112496
2010-08-30 18:12:35 +00:00
Duncan Sands
254f8ff0a6 Correct bogus module triple specifications.
llvm-svn: 112469
2010-08-30 10:48:29 +00:00
Dan Gohman
9ea315c5ca Make IVUsers iterative instead of recursive.
This has the side effect of reversing the order of most of
IVUser's results.

llvm-svn: 112442
2010-08-29 16:40:03 +00:00
Dan Gohman
0fdab32e0f Make this test less dependent on register allocation choices.
llvm-svn: 112426
2010-08-29 14:49:42 +00:00
Chris Lattner
5f911e6fe9 merge a bunch of shuffle tests into sse2.ll
llvm-svn: 112398
2010-08-29 03:19:04 +00:00
Chris Lattner
c2ef3180a8 add some nounwind's
llvm-svn: 112396
2010-08-29 03:07:47 +00:00
Chris Lattner
8cb4abbc0e fix the buildvector->insertp[sd] logic to not always create a redundant
insertp[sd] $0, which is a noop.  Before:

_f32:                                   ## @f32
	pshufd	$1, %xmm1, %xmm2
	pshufd	$1, %xmm0, %xmm3
	addss	%xmm2, %xmm3
	addss	%xmm1, %xmm0
                                        ## kill: XMM0<def> XMM0<kill> XMM0<def>
	insertps	$0, %xmm0, %xmm0
	insertps	$16, %xmm3, %xmm0
	ret

after:

_f32:                                   ## @f32
	movdqa	%xmm0, %xmm2
	addss	%xmm1, %xmm2
	pshufd	$1, %xmm1, %xmm1
	pshufd	$1, %xmm0, %xmm3
	addss	%xmm1, %xmm3
	movdqa	%xmm2, %xmm0
	insertps	$16, %xmm3, %xmm0
	ret

The extra movs are due to a random (poor) scheduling decision.

llvm-svn: 112379
2010-08-28 17:59:08 +00:00
Chris Lattner
c3b630d64b fix the BuildVector -> unpcklps logic to not do pointless shuffles
when the top elements of a vector are undefined.  This happens all
the time for X86-64 ABI stuff because only the low 2 elements of
a 4 element vector are defined.  For example, on:

_Complex float f32(_Complex float A, _Complex float B) {
  return A+B;
}

We used to produce (with SSE2, SSE4.1+ uses insertps):

_f32:                                   ## @f32
	movdqa	%xmm0, %xmm2
	addss	%xmm1, %xmm2
	pshufd	$16, %xmm2, %xmm2
	pshufd	$1, %xmm1, %xmm1
	pshufd	$1, %xmm0, %xmm0
	addss	%xmm1, %xmm0
	pshufd	$16, %xmm0, %xmm1
	movdqa	%xmm2, %xmm0
	unpcklps	%xmm1, %xmm0
	ret

We now produce:

_f32:                                   ## @f32
	movdqa	%xmm0, %xmm2
	addss	%xmm1, %xmm2
	pshufd	$1, %xmm1, %xmm1
	pshufd	$1, %xmm0, %xmm3
	addss	%xmm1, %xmm3
	movaps	%xmm2, %xmm0
	unpcklps	%xmm3, %xmm0
	ret

This implements rdar://8368414

llvm-svn: 112378
2010-08-28 17:28:30 +00:00
Dan Gohman
507f5a8ae7 Completely disable tail calls when fast-isel is enabled, as fast-isel
doesn't currently support dealing with this.

llvm-svn: 112341
2010-08-28 00:51:03 +00:00
Chris Lattner
d5777f8e47 get this test passing on linux builders.
llvm-svn: 112280
2010-08-27 18:49:08 +00:00
Daniel Dunbar
f642d43594 X86: Fix an encoding issue with LOCK_ADD64mr, which could lead to very hard to find miscompiles with the integrated assembler.
llvm-svn: 112250
2010-08-27 01:30:14 +00:00
Chris Lattner
bc2f7bb5f3 Add a hackaround for PR7993 which is causing failures on x86 builders that lack sse2.
llvm-svn: 112175
2010-08-26 06:57:07 +00:00
Chris Lattner
ab96342d40 I think enough general codegen bugs are fixed to allow this to work
on random hosts, lets see!

llvm-svn: 112172
2010-08-26 05:52:42 +00:00
Chris Lattner
148485f707 implement SplitVecOp_CONCAT_VECTORS, fixing the included testcase with SSE1.
llvm-svn: 112171
2010-08-26 05:51:22 +00:00
Chris Lattner
3bec87fb38 Make sure this forces the x86 targets
llvm-svn: 112169
2010-08-26 05:25:05 +00:00
Chris Lattner
5256226fc8 fix sse1 only codegen in x86-64 mode, which is something we
apparently try to support.

llvm-svn: 112168
2010-08-26 05:24:29 +00:00
Chris Lattner
7bae652c62 temporarily disable this, which started failing on the llvm-i686-linux
builder.  I will investigate tonight.

llvm-svn: 112113
2010-08-25 23:43:14 +00:00
Chris Lattner
fe7c4ec039 Change handling of illegal vector types to widen when possible instead of
expanding: e.g. <2 x float> -> <4 x float> instead of -> 2 floats.  This
affects two places in the code: handling cross block values and handling
function return and arguments.  Since vectors are already widened by 
legalizetypes, this gives us much better code and unblocks x86-64 abi
and SPU abi work.

For example, this (which is a silly example of a cross-block value):
define <4 x float> @test2(<4 x float> %A) nounwind {
 %B = shufflevector <4 x float> %A, <4 x float> undef, <2 x i32> <i32 0, i32 1>
 %C = fadd <2 x float> %B, %B
  br label %BB
BB:
 %D = fadd <2 x float> %C, %C
 %E = shufflevector <2 x float> %D, <2 x float> undef, <4 x i32> <i32 0, i32 1, i32 undef, i32 undef>
 ret <4 x float> %E
}

Now compiles into:

_test2:                                 ## @test2
## BB#0:
 addps %xmm0, %xmm0
 addps %xmm0, %xmm0
 ret

previously it compiled into:

_test2:                                 ## @test2
## BB#0:
 addps %xmm0, %xmm0
 pshufd $1, %xmm0, %xmm1
                                        ## kill: XMM0<def> XMM0<kill> XMM0<def>
 insertps $0, %xmm0, %xmm0
 insertps $16, %xmm1, %xmm0
 addps %xmm0, %xmm0
 ret

This implements rdar://8230384

llvm-svn: 112101
2010-08-25 22:49:25 +00:00
Bruno Cardoso Lopes
c51b6e98c7 Convert test to use filecheck and make it more specific
llvm-svn: 112016
2010-08-25 01:47:16 +00:00
Dan Gohman
e400c660e4 Fix X86's isLegalAddressingMode to recognize that static addresses
need not be RIP-relative in small mode.

llvm-svn: 111917
2010-08-24 15:55:12 +00:00
Chris Lattner
f0f35c4aea Add a new llvm.x86.int intrinsic, allowing access to the
x86 int and int3 instructions.  Patch by Peter Housel!

llvm-svn: 111831
2010-08-23 19:39:25 +00:00
Dan Gohman
30b8e6cfd2 Fix x86 fast-isel's cmp+branch folding to avoid folding when the
comparison is in a different basic block from the branch. In such
cases, the comparison's operands may not have initialized virtual
registers available.

llvm-svn: 111709
2010-08-21 02:32:36 +00:00
Evan Cheng
44e9a498ac It's possible to sink a def if its local uses are PHI's.
llvm-svn: 111537
2010-08-19 18:33:29 +00:00
Dan Gohman
2d5b6bad99 When sending stats output to stdout for grepping, don't emit normal
output to standard output also.

llvm-svn: 111401
2010-08-18 20:32:46 +00:00
Dan Gohman
ee89338e37 Tweak IVUsers' concept of "interesting" to exclude add recurrences
where the step value is an induction variable from an outer loop, to
avoid trouble trying to re-expand such expressions. This effectively
hides such expressions from indvars and lsr, which prevents them
from getting into trouble.

llvm-svn: 111317
2010-08-17 22:50:37 +00:00
Evan Cheng
d6348fe9b2 Add nounwind.
llvm-svn: 111312
2010-08-17 22:35:20 +00:00
Dale Johannesen
535ca58e85 Make fast scheduler handle asm clobbers correctly.
PR 7882.  Follows suggestion by Amaury Pouly, thanks.

llvm-svn: 111306
2010-08-17 22:17:24 +00:00
Evan Cheng
0163d059e4 PHI elimination should not break back edge. It can cause some significant code placement issues. rdar://8263994
good:
LBB0_2:
  mov     r2, r0
  . . .
  mov     r1, r2
  bne     LBB0_2

bad:
LBB0_2:
  mov     r2, r0
  . . .
@ BB#3:
  mov     r1, r2
  b       LBB0_2

llvm-svn: 111221
2010-08-17 01:20:36 +00:00
Benjamin Kramer
0224854fdc Test expects SSE, give him SSE.
llvm-svn: 111115
2010-08-15 23:32:03 +00:00
Benjamin Kramer
3116e6f58d Restore arch on these test, they fail on arm.
llvm-svn: 111109
2010-08-15 20:42:56 +00:00
Dale Johannesen
6e5cf0f5b6 Mark as XFAIL on darwin 8. PR 7886.
llvm-svn: 111108
2010-08-15 19:40:29 +00:00
Dale Johannesen
3f9c148d0e Revert 110491. While not wrong, it was based on a
misanalysis and is undesirable.

llvm-svn: 111028
2010-08-13 18:43:45 +00:00
Bruno Cardoso Lopes
7cb26cb8be - Teach SSEDomainFix to switch between different levels of AVX instructions. Here we guess that AVX will have domain issues, so just implement them for consistency and in the future we remove if it's unnecessary.
- Make foldMemoryOperandImpl aware of 256-bit zero vectors folding and support the 128-bit counterparts of AVX too.
- Make sure MOV[AU]PS instructions are only selected when SSE1 is enabled, and duplicate the patterns to match AVX.
- Add a testcase for a simple 128-bit zero vector creation.

llvm-svn: 110946
2010-08-12 20:20:53 +00:00
Bruno Cardoso Lopes
bb491bd56c Begin to support some vector operations for AVX 256-bit intructions. The long
term goal here is to be able to match enough of vector_shuffle and build_vector
so all avx intrinsics which aren't mapped to their own built-ins but to
shufflevector calls can be codegen'd. This is the first (baby) step, support
building zeroed vectors.

llvm-svn: 110897
2010-08-12 02:06:36 +00:00
Devang Patel
66fc7d88ae This is x86 only test.
llvm-svn: 110887
2010-08-12 00:17:38 +00:00
Bruno Cardoso Lopes
2051068483 Add testcases for all AVX 256-bit intrinsics added in the last couple days
llvm-svn: 110854
2010-08-11 21:12:09 +00:00
Bruno Cardoso Lopes
fa19084e79 Reapply r109881 using a more strict command line for llc.
llvm-svn: 110833
2010-08-11 17:39:23 +00:00
Jakob Stoklund Olesen
99402e857d Fix test for more architectures. Patch by Tobias Grosser.
llvm-svn: 110685
2010-08-10 16:48:24 +00:00
Tobias Grosser
766f219db9 Fix failing testcase.
Those look like typos to me.

llvm-svn: 110664
2010-08-10 09:54:29 +00:00
Devang Patel
84f48b5483 Handle TAG_constant for integers.
llvm-svn: 110656
2010-08-10 07:11:13 +00:00
Dale Johannesen
23f9086dd3 Use sdmem and sse_load_f64 (etc.) for the vector
form of CMPSD (etc.)  Matching a 128-bit memory
operand is wrong, the instruction uses only 64 bits
(same as ADDSD etc.)  8193553.

llvm-svn: 110491
2010-08-07 00:33:42 +00:00
Eric Christopher
cf17d8dfa7 Add an option to always emit realignment code for a particular module.
llvm-svn: 110404
2010-08-05 23:57:43 +00:00
Devang Patel
9801232716 Move x86 specific tests into test/CodeGen/X86.
llvm-svn: 110372
2010-08-05 20:25:37 +00:00
Dan Gohman
d108d2b2f8 Move x86-specific tests out of test/Transforms/LoopStrengthReduce and
into test/CodeGen/X86, so that they aren't run when the x86 target is
not enabled.

Fix uglygep.ll to not be x86-specific.

llvm-svn: 110343
2010-08-05 17:04:15 +00:00
Daniel Dunbar
c93cd33f41 tests: CodeGen/X86/GC tests require X86.
llvm-svn: 110338
2010-08-05 15:45:33 +00:00
Bill Wendling
446a54d234 The lower invoke pass needs to have unreachable code elimination run after it
because it could create such things. This fixes a MingW buildbot test failure.

llvm-svn: 110279
2010-08-04 23:36:02 +00:00
Eli Friedman
401dbe036d PR7814: Truncates cannot be ignored for signed comparisons.
llvm-svn: 110268
2010-08-04 22:40:58 +00:00
Stuart Hastings
003c3778ff call-imm.ll test case regex fix. Patch by Dimitry Andric!
llvm-svn: 110199
2010-08-04 15:31:35 +00:00
Jakob Stoklund Olesen
058d1fc5bd OK, that's it. This test is going away now. But don't worry, I am taking it to a
nice farm in the country where it can play with other tests. And bunnies.

It is not clear what is being tested, and the revision history shows a bunch of
random changes to the expected instruction count. Clearly, we are just fudging
it to pass whenever it fails.

llvm-svn: 110118
2010-08-03 17:21:14 +00:00
Bob Wilson
43273fe746 Revert new AVX intrinsic tests. They are breaking buildbots and Bruno is
away from a computer now.
--- Reverse-merging r109881 into '.':
D    test/CodeGen/X86/avx-intrinsics-x86.ll
D    test/CodeGen/X86/avx-intrinsics-x86_64.ll

llvm-svn: 109959
2010-07-31 22:36:03 +00:00
Bruno Cardoso Lopes
f6ed26ef55 A *bunch* of tests for AVX intrinsics
llvm-svn: 109881
2010-07-30 19:57:56 +00:00
Eli Friedman
bea7c851cf Fix for bug reported by Evzen Muller on llvm-commits: make sure to correctly
check the range of the constant when optimizing a comparison between a
constant and a sign_extend_inreg node.

llvm-svn: 109854
2010-07-30 06:44:31 +00:00
Nate Begeman
133820e806 Implement a vectorized algorithm for <16 x i8> << <16 x i8>
This is about 4x faster and smaller than the existing scalarization.

llvm-svn: 109566
2010-07-28 00:21:48 +00:00
Nate Begeman
068e932975 ~40% faster vector shl <4 x i32> on SSE 4.1 Larger improvements for smaller types coming in future patches.
For:

define <2 x i64> @shl(<4 x i32> %r, <4 x i32> %a) nounwind readnone ssp {
entry:
  %shl = shl <4 x i32> %r, %a                     ; <<4 x i32>> [#uses=1]
  %tmp2 = bitcast <4 x i32> %shl to <2 x i64>     ; <<2 x i64>> [#uses=1]
  ret <2 x i64> %tmp2
}

We get:

_shl:                                   ## @shl
	pslld	$23, %xmm1
	paddd	LCPI0_0, %xmm1
	cvttps2dq	%xmm1, %xmm1
	pmulld	%xmm1, %xmm0
	ret

Instead of:

_shl:                                   ## @shl
	pshufd	$3, %xmm0, %xmm2
	movd	%xmm2, %eax
	pshufd	$3, %xmm1, %xmm2
	movd	%xmm2, %ecx
	shll	%cl, %eax
	movd	%eax, %xmm2
	pshufd	$1, %xmm0, %xmm3
	movd	%xmm3, %eax
	pshufd	$1, %xmm1, %xmm3
	movd	%xmm3, %ecx
	shll	%cl, %eax
	movd	%eax, %xmm3
	punpckldq	%xmm2, %xmm3
	movd	%xmm0, %eax
	movd	%xmm1, %ecx
	shll	%cl, %eax
	movd	%eax, %xmm2
	movhlps	%xmm0, %xmm0
	movd	%xmm0, %eax
	movhlps	%xmm1, %xmm1
	movd	%xmm1, %ecx
	shll	%cl, %eax
	movd	%eax, %xmm0
	punpckldq	%xmm0, %xmm2
	movdqa	%xmm2, %xmm0
	punpckldq	%xmm3, %xmm0
	ret

llvm-svn: 109549
2010-07-27 22:37:06 +00:00
Dan Gohman
1694c4352a Use the proper type for shift counts. This fixes a bootstrap error.
llvm-svn: 109265
2010-07-23 21:08:12 +00:00
Dan Gohman
8859ab786b DAGCombine (shl (anyext x, c)) to (anyext (shl x, c)) if the high bits
are not demanded. This often allows the anyext to be folded away.

llvm-svn: 109242
2010-07-23 18:03:30 +00:00
Eric Christopher
4924d5fb93 Custom lower the memory barrier instructions and add support
for lowering without sse2.  Add a couple of new testcases.

Fixes a few libgomp tests and latent bugs.  Remove a few todos.

llvm-svn: 109078
2010-07-22 02:48:34 +00:00
Dan Gohman
45ba7b5c5c Fix SCEV denormalization of expressions where the exit value from
one loop is involved in the increment of an addrec for another
loop. This fixes rdar://8168938.

llvm-svn: 108863
2010-07-20 17:06:20 +00:00
Duncan Sands
6203c740a9 The same problem was being tracked in PR7652.
llvm-svn: 108843
2010-07-20 15:52:32 +00:00
Dan Gohman
28f747a608 After a custom inserter, in a block which has constant instructions,
update the current basic block in addition to the current insert
position, so that they remain consistent. This fixes rdar://8204072.

llvm-svn: 108765
2010-07-19 22:48:56 +00:00
Owen Anderson
acd445be06 Remove r108639 now that it is handled by InstCombine instead.
llvm-svn: 108688
2010-07-19 08:10:24 +00:00
Owen Anderson
f1ae3b35e7 Add a testcase for r108639.
llvm-svn: 108640
2010-07-18 08:57:19 +00:00
Bill Wendling
85d6ed81b7 Consider this function:
void foo() { __builtin_unreachable(); }

It will output the following on Darwin X86:

_func1:
Leh_func_begin0:
        pushq %rbp
Ltmp0:
        movq %rsp, %rbp
Ltmp1:
Leh_func_end0:

This prolog adds a new Call Frame Information (CFI) row to the FDE with an
address that is not within the address range of the code it describes -- part is
equal to the end of the function -- and therefore results in an invalid EH
frame. If we emit a nop in this situation, then the CFI row is now within the
address range.

llvm-svn: 108568
2010-07-16 22:51:10 +00:00
Jakob Stoklund Olesen
858d6bb512 Remove the X86::FP_REG_KILL pseudo-instruction and the X86FloatingPointRegKill
pass that inserted it.

It is no longer necessary to limit the live ranges of FP registers to a single
basic block.

llvm-svn: 108536
2010-07-16 17:41:44 +00:00
Jakob Stoklund Olesen
114bab20ae Add forgotten test case.
llvm-svn: 108506
2010-07-16 04:45:35 +00:00
Dan Gohman
5e485c833f Use the source-order scheduler instead of the "fast" scheduler at -O0,
because it's more likely to keep debug line information in its original
order.

llvm-svn: 108496
2010-07-16 02:01:19 +00:00
Bill Wendling
756b0a4d45 Revert. This isn't the correct way to go.
llvm-svn: 108478
2010-07-15 23:42:21 +00:00
Bill Wendling
991234752d Handle code gen for the unreachable instruction if it's the only instruction in
the function. We'll just turn it into a "trap" instruction instead.

The problem with not handling this is that it might generate a prologue without
the equivalent epilogue to go with it:

$ cat t.ll
define void @foo() {
entry:
  unreachable
}
$ llc -o - t.ll -relocation-model=pic -disable-fp-elim -unwind-tables
        .section        __TEXT,__text,regular,pure_instructions
        .globl  _foo
        .align  4, 0x90
_foo:                                   ## @foo
Leh_func_begin0:
## BB#0:                                ## %entry
        pushq   %rbp
Ltmp0:
        movq    %rsp, %rbp
Ltmp1:
Leh_func_end0:
...

The unwind tables then have bad data in them causing all sorts of problems.

Fixes <rdar://problem/8096481>.

llvm-svn: 108473
2010-07-15 23:32:40 +00:00
Evan Cheng
ffbae6ad52 Split -enable-finite-only-fp-math to two options:
-enable-no-nans-fp-math and -enable-no-infs-fp-math. All of the current codegen fp math optimizations only care whether the fp arithmetics arguments and results can never be NaN.

llvm-svn: 108465
2010-07-15 22:07:12 +00:00
Chris Lattner
e2f110cba5 fix the definitions of ConstTextCoalSection/ConstDataCoalSection
to keep "Text" in sync with the "pure instructions" section attribute.
Lack of this attribute was preventing the assembler from emitting
multibyte noops instructions for templates (and inlines, and other
coalesced stuff) and was causing the assembler to mismatch .o files.

This fixes rdar://8018335

llvm-svn: 108461
2010-07-15 21:22:00 +00:00
Devang Patel
3028e38bd8 Fix crash reported in PR7653.
llvm-svn: 108441
2010-07-15 18:45:27 +00:00
Dan Gohman
d75ba463e0 Watch out for a constant offset cancelling out a base register, forming
a zero. This situation arrises in Fortran code with induction variables
that start at 1 instead of 0. This fixes PR7651.

llvm-svn: 108424
2010-07-15 15:14:45 +00:00
Devang Patel
8924b27a38 Make it a .ll test case.
llvm-svn: 108370
2010-07-14 23:12:52 +00:00
Dan Gohman
8e01a639c0 Delete fast-isel's trivial load optimization; it breaks debugging because
it can look past points where a debugger might modify user variables.

llvm-svn: 108336
2010-07-14 17:25:37 +00:00
Evan Cheng
f6478f489d Fix for PR7193 was overly conservative. The only case where sibcall callee
address cannot be allocated a register is in 32-bit mode where the first
three arguments are marked inreg. In that case EAX, EDX, and ECX will be
used for argument passing.

This fixes PR7610.

llvm-svn: 108327
2010-07-14 06:44:01 +00:00
Evan Cheng
7ff31f22a4 Re-enable the test with fix.
llvm-svn: 108319
2010-07-14 05:49:23 +00:00
Chris Lattner
b800dea3b6 temporarily disable to test to fix buildbots.
llvm-svn: 108310
2010-07-14 02:21:59 +00:00
Evan Cheng
72e40c4e08 Teach ProcessImplicitDefs to transform more COPY instructions into IMPLICIT_DEF (and subsequently eliminate them). This allows machine LICM to hoist IMPLICIT_DEF's. PR7620.
llvm-svn: 108304
2010-07-14 01:22:19 +00:00
Dale Johannesen
f84a7f2b4f In inline asm treat indirect 'X' constraint as 'm'.
This may not be right in all cases, but it's better
than asserting which it was doing before.  PR 7528.

llvm-svn: 108268
2010-07-13 20:17:05 +00:00
Evan Cheng
8cce7c7351 -enable-unsafe-fp-math should not imply -enable-finite-only-fp-math.
llvm-svn: 108254
2010-07-13 18:46:14 +00:00
Dale Johannesen
57a48e254e Fix PR number.
llvm-svn: 108251
2010-07-13 18:14:47 +00:00
Dan Gohman
e9c4426bb0 Apply the SSE dependence idiom for SSE unary operations to
SD instructions too, in addition to SS instructions. And
add a comment about it.

llvm-svn: 108191
2010-07-12 20:46:04 +00:00
Dan Gohman
7734fa156d Fix this test.
llvm-svn: 108059
2010-07-10 22:42:12 +00:00
Jakob Stoklund Olesen
36e769aa7b FileCheckize inline asm FP stack tests
llvm-svn: 108046
2010-07-10 16:30:25 +00:00
Dan Gohman
fef30fcd5e Reapply bottom-up fast-isel, with several fixes for x86-32:
- Check getBytesToPopOnReturn().
 - Eschew ST0 and ST1 for return values.
 - Fix the PIC base register initialization so that it doesn't ever
   fail to end up the top of the entry block.

llvm-svn: 108039
2010-07-10 09:00:22 +00:00
Jakob Stoklund Olesen
53d777f3bd Fix a few tests
llvm-svn: 108011
2010-07-09 20:43:09 +00:00
Dan Gohman
e7a09d478d Add a target triple.
llvm-svn: 108003
2010-07-09 19:17:36 +00:00
Dan Gohman
0c7d7ee288 Fix MachineLICM to actually visit inner loops.
llvm-svn: 108001
2010-07-09 18:49:45 +00:00
Bob Wilson
9e8c9204ef --- Reverse-merging r107947 into '.':
U    utils/TableGen/FastISelEmitter.cpp
--- Reverse-merging r107943 into '.':
U    test/CodeGen/X86/fast-isel.ll
U    test/CodeGen/X86/fast-isel-loads.ll
U    include/llvm/Target/TargetLowering.h
U    include/llvm/Support/PassNameParser.h
U    include/llvm/CodeGen/FunctionLoweringInfo.h
U    include/llvm/CodeGen/CallingConvLower.h
U    include/llvm/CodeGen/FastISel.h
U    include/llvm/CodeGen/SelectionDAGISel.h
U    lib/CodeGen/LLVMTargetMachine.cpp
U    lib/CodeGen/CallingConvLower.cpp
U    lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
U    lib/CodeGen/SelectionDAG/FunctionLoweringInfo.cpp
U    lib/CodeGen/SelectionDAG/FastISel.cpp
U    lib/CodeGen/SelectionDAG/SelectionDAGISel.cpp
U    lib/CodeGen/SelectionDAG/ScheduleDAGSDNodes.cpp
U    lib/CodeGen/SelectionDAG/InstrEmitter.cpp
U    lib/CodeGen/SelectionDAG/TargetLowering.cpp
U    lib/Target/XCore/XCoreISelLowering.cpp
U    lib/Target/XCore/XCoreISelLowering.h
U    lib/Target/X86/X86ISelLowering.cpp
U    lib/Target/X86/X86FastISel.cpp
U    lib/Target/X86/X86ISelLowering.h

llvm-svn: 107987
2010-07-09 16:37:18 +00:00
Dan Gohman
7e6e4dd058 Re-apply bottom-up fast-isel, with fixes. Be very careful to avoid emitting
a DBG_VALUE after a terminator, or emitting any instructions before an EH_LABEL.

llvm-svn: 107943
2010-07-09 00:39:23 +00:00
Bill Wendling
8f0fcd1623 Extension of r107506. Make sure that we don't mark a function as having a call
if the inline ASM doesn't need a stack frame.

llvm-svn: 107922
2010-07-08 22:38:02 +00:00
Eric Christopher
091bf69467 A slight reworking of the custom patterns for x86-64 tpoff codegen and
correct the testcase for valid assembly.

Needs more tests.

llvm-svn: 107860
2010-07-08 07:36:46 +00:00
Dan Gohman
4dcc56a102 Revert 107840 107839 107813 107804 107800 107797 107791.
Debug info intrinsics win for now.

llvm-svn: 107850
2010-07-08 01:00:56 +00:00
Jakob Stoklund Olesen
34ec644313 Allow copies between GR8_ABCD_L and GR8_ABCD_H.
This fixes PR7540.

llvm-svn: 107809
2010-07-07 20:33:27 +00:00
Dan Gohman
d0caefa601 Implement bottom-up fast-isel. This has the advantage of not requiring
a separate DCE pass over MachineInstrs.

llvm-svn: 107804
2010-07-07 19:20:32 +00:00
Dan Gohman
424cc6b616 Add X86FastISel support for return statements. This entails refactoring
a bunch of stuff, to allow the target-independent calling convention
logic to be employed.

llvm-svn: 107800
2010-07-07 18:32:53 +00:00
Dale Johannesen
81ea05c193 Accept RIP-relative symbols with 'i' constraint, and
print the (%rip) only if the 'a' modifier is present.
PR 7528.

llvm-svn: 107727
2010-07-06 23:27:00 +00:00