1
0
mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-10-27 14:02:50 +01:00
Commit Graph

1487 Commits

Author SHA1 Message Date
Evan Cheng
1afd04fc59 Recognize inline asm 'rev /bin/bash, ' as a bswap intrinsic call.
llvm-svn: 123048
2011-01-08 01:24:27 +00:00
Evan Cheng
ae26b91353 Revert r122955. It seems using movups to lower memcpy can cause massive regression (even on Nehalem) in edge cases. I also didn't see any real performance benefit.
llvm-svn: 123015
2011-01-07 19:35:30 +00:00
Evan Cheng
1a1771584e Use movups to lower memcpy and memset even if it's not fast (like corei7).
The theory is it's still faster than a pair of movq / a quad of movl. This
will probably hurt older chips like P4 but should run faster on current
and future Intel processors. rdar://8817010

llvm-svn: 122955
2011-01-06 07:58:36 +00:00
Evan Cheng
cb39cc2164 Re-implement r122936 with proper target hooks. Now getMaxStoresPerMemcpy
etc. takes an option OptSize. If OptSize is true, it would return
the inline limit for functions with attribute OptSize.

llvm-svn: 122952
2011-01-06 06:52:41 +00:00
Benjamin Kramer
d8387aa9bd X86: Lower a select directly to a setcc_carry if possible.
int test(unsigned long a, unsigned long b) { return -(a < b); }
compiles to
  _test:                              ## @test
    cmpq  %rsi, %rdi                  ## encoding: [0x48,0x39,0xf7]
    sbbl  %eax, %eax                  ## encoding: [0x19,0xc0]
    ret                               ## encoding: [0xc3]
instead of
  _test:                              ## @test
    xorl  %ecx, %ecx                  ## encoding: [0x31,0xc9]
    cmpq  %rsi, %rdi                  ## encoding: [0x48,0x39,0xf7]
    movl  $-1, %eax                   ## encoding: [0xb8,0xff,0xff,0xff,0xff]
    cmovael %ecx, %eax                ## encoding: [0x0f,0x43,0xc1]
    ret                               ## encoding: [0xc3]

llvm-svn: 122451
2010-12-22 23:09:28 +00:00
Benjamin Kramer
369872edfc Add some x86 specific dagcombines for conditional increments.
(add Y, (sete  X, 0)) -> cmp X, 1; adc  0, Y
(add Y, (setne X, 0)) -> cmp X, 1; sbb -1, Y
(sub (sete  X, 0), Y) -> cmp X, 1; sbb  0, Y
(sub (setne X, 0), Y) -> cmp X, 1; adc -1, Y

for
  unsigned foo(unsigned a, unsigned b) {
    if (a == 0) b++;
    return b;
  }
we now get:
  foo:
    cmpl  $1, %edi
    movl  %esi, %eax
    adcl  $0, %eax
    ret
instead of:
  foo:
    testl %edi, %edi
    sete  %al
    movzbl  %al, %eax
    addl  %esi, %eax
    ret

llvm-svn: 122364
2010-12-21 21:41:44 +00:00
Chris Lattner
65c5243bd6 rename MVT::Flag to MVT::Glue. "Flag" is a terrible name for
something that just glues two nodes together, even if it is
sometimes used for flags.

llvm-svn: 122310
2010-12-21 02:38:05 +00:00
Nate Begeman
c7dfecb10e Implement feedback from Bruno on making pblendvb an x86-specific ISD node in addition to being an intrinsic, and convert
lowering to use it.  Hopefully the pattern fragment is doing the right thing with XMM0, looks correct in testing.

llvm-svn: 122277
2010-12-20 22:04:24 +00:00
Chris Lattner
bee7320c3c now that addc/adde are gone, "ADDC" in the X86 backend uses EFLAGS results,
the same as setcc.  Optimize ADDC(0,0,FLAGS) -> SET_CARRY(FLAGS).  This is
a step towards finishing off PR5443.  In the testcase in that bug we now  get:

	movq	%rdi, %rax
	addq	%rsi, %rax
	sbbq	%rcx, %rcx
	testb	$1, %cl
	setne	%dl
	ret

instead of:

	movq	%rdi, %rax
	addq	%rsi, %rax
	movl	$0, %ecx
	adcq	$0, %rcx
	testq	%rcx, %rcx
	setne	%dl
	ret

llvm-svn: 122219
2010-12-20 01:37:09 +00:00
Chris Lattner
16ea7f257f use for loop over types.
llvm-svn: 122214
2010-12-20 01:03:27 +00:00
Chris Lattner
8b1f76cad6 Change the X86 backend to stop using the evil ADDC/ADDE/SUBC/SUBE nodes (which
their carry depenedencies with MVT::Flag operands) and use clean and beautiful
EFLAGS dependences instead.

We do this by changing the modelling of SBB/ADC to have EFLAGS input and outputs
(which is what requires the previous scheduler change) and change X86 ISelLowering
to custom lower ADDC and friends down to X86ISD::ADD/ADC/SUB/SBB nodes.

With the previous series of changes, this causes no changes in the testsuite, woo.

llvm-svn: 122213
2010-12-20 00:59:46 +00:00
Mon P Wang
d3adab7a64 Prevents PerformShuffleCombine from creating a node with an illegal type after legalize types
has run, e.g., prevent creating an i64 node from a v2i64 when i64 is not a legal type.

llvm-svn: 122206
2010-12-19 23:55:53 +00:00
Chris Lattner
297259f6f1 improve the setcc -> setcc_carry optimization to happen more
consistently by moving it out of lowering into dag combine.

Add some missing patterns for matching away extended versions of setcc_c.

llvm-svn: 122201
2010-12-19 22:08:31 +00:00
Chris Lattner
1f31c7fa15 simplify some code to just reuse a setcc if we can instead of
going through the CSE maps to get it.

llvm-svn: 122196
2010-12-19 21:23:48 +00:00
Chris Lattner
2d59eef5fd now that generic vector types aren't selected onto MMX operations,
we don't need -disable-mmx anymore.

llvm-svn: 122189
2010-12-19 20:19:20 +00:00
Chris Lattner
30438e63c8 reduce copy/paste programming with the power of for loops.
llvm-svn: 122187
2010-12-19 20:07:10 +00:00
Chris Lattner
29475c23d0 X86 supports i8/i16 overflow ops (except i8 multiplies), we should
generate them.  

Now we compile:

define zeroext i8 @X(i8 signext %a, i8 signext %b) nounwind ssp {
entry:
  %0 = tail call %0 @llvm.sadd.with.overflow.i8(i8 %a, i8 %b)
  %cmp = extractvalue %0 %0, 1
  br i1 %cmp, label %if.then, label %if.end

into:

_X:                                     ## @X
## BB#0:                                ## %entry
	subl	$12, %esp
	movb	16(%esp), %al
	addb	20(%esp), %al
	jo	LBB0_2

Before we were generating:

_X:                                     ## @X
## BB#0:                                ## %entry
	pushl	%ebp
	movl	%esp, %ebp
	subl	$8, %esp
	movb	12(%ebp), %al
	testb	%al, %al
	setge	%cl
	movb	8(%ebp), %dl
	testb	%dl, %dl
	setge	%ah
	cmpb	%cl, %ah
	sete	%cl
	addb	%al, %dl
	testb	%dl, %dl
	setge	%al
	cmpb	%al, %ah
	setne	%al
	andb	%cl, %al
	testb	%al, %al
	jne	LBB0_2

llvm-svn: 122186
2010-12-19 20:03:11 +00:00
Nate Begeman
ef5f3c0fa7 Add support for matching psign & plendvb to the x86 target
Remove unnecessary pandn patterns, 'vnot' patfrag looks through bitcasts

llvm-svn: 122098
2010-12-17 22:55:37 +00:00
Nate Begeman
cb6d1c8193 Formalize the notion that AVX and SSE are non-overlapping extensions from the compiler's point of view. Per email discussion, we either want to always use VEX-prefixed instructions or never use them, and are taking "HasAVX" to mean "Always use VEX". Passing -mattr=-avx,+sse42 should serve to restore legacy SSE support when desirable.
llvm-svn: 121439
2010-12-10 00:26:57 +00:00
Eric Christopher
0e40452eb0 Rewrite the darwin tlv support to use a chain and return to copying
the output to the correct register. Fixes a hidden problem uncovered
by the last patch where we'd try to DAG combine our MVT::Other node
oddly.

llvm-svn: 121358
2010-12-09 06:25:53 +00:00
Eric Christopher
64e662fce9 Stop confusing people, it's not really a chain, or a tumor.
llvm-svn: 121340
2010-12-09 00:57:19 +00:00
Eric Christopher
0100a8fda4 Remove extraneous copy from DAG conversion for darwin tls. This was
popping up at O0 when it wasn't folded and the fast allocator would
complain.

llvm-svn: 121330
2010-12-09 00:27:58 +00:00
Chris Lattner
e30adfb732 Teach X86ISelLowering that the second result of X86ISD::UMUL is a flags
result.  This allows us to compile:

void *test12(long count) {
      return new int[count];
}

into:

test12:
	movl	$4, %ecx
	movq	%rdi, %rax
	mulq	%rcx
	movq	$-1, %rdi
	cmovnoq	%rax, %rdi
	jmp	__Znam                  ## TAILCALL

instead of:

test12:
	movl	$4, %ecx
	movq	%rdi, %rax
	mulq	%rcx
	seto	%cl
	testb	%cl, %cl
	movq	$-1, %rdi
	cmoveq	%rax, %rdi
	jmp	__Znam

Of course it would be even better if the regalloc inverted the cmov to 'cmovoq',
which would eliminate the need for the 'movq %rdi, %rax'.

llvm-svn: 120936
2010-12-05 07:49:54 +00:00
Chris Lattner
76601e7a99 it turns out that when ".with.overflow" intrinsics were added to the X86
backend that they were all implemented except umul.  This one fell back
to the default implementation that did a hi/lo multiply and compared the
top.  Fix this to check the overflow flag that the 'mul' instruction
sets, so we can avoid an explicit test.  Now we compile:

void *func(long count) {
      return new int[count];
}

into:

__Z4funcl:                              ## @_Z4funcl
	movl	$4, %ecx                ## encoding: [0xb9,0x04,0x00,0x00,0x00]
	movq	%rdi, %rax              ## encoding: [0x48,0x89,0xf8]
	mulq	%rcx                    ## encoding: [0x48,0xf7,0xe1]
	seto	%cl                     ## encoding: [0x0f,0x90,0xc1]
	testb	%cl, %cl                ## encoding: [0x84,0xc9]
	movq	$-1, %rdi               ## encoding: [0x48,0xc7,0xc7,0xff,0xff,0xff,0xff]
	cmoveq	%rax, %rdi              ## encoding: [0x48,0x0f,0x44,0xf8]
	jmp	__Znam                  ## TAILCALL

instead of:

__Z4funcl:                              ## @_Z4funcl
	movl	$4, %ecx                ## encoding: [0xb9,0x04,0x00,0x00,0x00]
	movq	%rdi, %rax              ## encoding: [0x48,0x89,0xf8]
	mulq	%rcx                    ## encoding: [0x48,0xf7,0xe1]
	testq	%rdx, %rdx              ## encoding: [0x48,0x85,0xd2]
	movq	$-1, %rdi               ## encoding: [0x48,0xc7,0xc7,0xff,0xff,0xff,0xff]
	cmoveq	%rax, %rdi              ## encoding: [0x48,0x0f,0x44,0xf8]
	jmp	__Znam                  ## TAILCALL

Other than the silly seto+test, this is using the o bit directly, so it's going in the right
direction.

llvm-svn: 120935
2010-12-05 07:30:36 +00:00
Chris Lattner
16bafb2414 generalize the previous check to handle -1 on either side of the
select, inserting a not to compensate.  Add a missing isZero check
that I lost somehow.

This improves codegen of:

void *func(long count) {
      return new int[count];
}

from:

__Z4funcl:                              ## @_Z4funcl
	movl	$4, %ecx                ## encoding: [0xb9,0x04,0x00,0x00,0x00]
	movq	%rdi, %rax              ## encoding: [0x48,0x89,0xf8]
	mulq	%rcx                    ## encoding: [0x48,0xf7,0xe1]
	testq	%rdx, %rdx              ## encoding: [0x48,0x85,0xd2]
	movq	$-1, %rdi               ## encoding: [0x48,0xc7,0xc7,0xff,0xff,0xff,0xff]
	cmoveq	%rax, %rdi              ## encoding: [0x48,0x0f,0x44,0xf8]
	jmp	__Znam                  ## TAILCALL
                                        ## encoding: [0xeb,A]

to:

__Z4funcl:                              ## @_Z4funcl
	movl	$4, %ecx                ## encoding: [0xb9,0x04,0x00,0x00,0x00]
	movq	%rdi, %rax              ## encoding: [0x48,0x89,0xf8]
	mulq	%rcx                    ## encoding: [0x48,0xf7,0xe1]
	cmpq	$1, %rdx                ## encoding: [0x48,0x83,0xfa,0x01]
	sbbq	%rdi, %rdi              ## encoding: [0x48,0x19,0xff]
	notq	%rdi                    ## encoding: [0x48,0xf7,0xd7]
	orq	%rax, %rdi              ## encoding: [0x48,0x09,0xc7]
	jmp	__Znam                  ## TAILCALL
                                        ## encoding: [0xeb,A]

llvm-svn: 120932
2010-12-05 02:00:51 +00:00
Chris Lattner
474ed0aa9b Improve an integer select optimization in two ways:
1. generalize 
    (select (x == 0), -1, 0) -> (sign_bit (x - 1))
to:
    (select (x == 0), -1, y) -> (sign_bit (x - 1)) | y

2. Handle the identical pattern that happens with !=:
   (select (x != 0), y, -1) -> (sign_bit (x - 1)) | y

cmov is often high latency and can't fold immediates or
memory operands.  For example for (x == 0) ? -1 : 1, before 
we got:

< 	testb	%sil, %sil
< 	movl	$-1, %ecx
< 	movl	$1, %eax
< 	cmovel	%ecx, %eax

now we get:

> 	cmpb	$1, %sil
> 	sbbl	%eax, %eax
> 	orl	$1, %eax

llvm-svn: 120929
2010-12-05 01:23:24 +00:00
Benjamin Kramer
851691ddb2 Add patterns for the x86 popcnt instruction.
- Also adds a new POPCNT subtarget feature that is currently enabled if the target
  supports SSE4.2 (nehalem) or SSE4A (barcelona).

llvm-svn: 120917
2010-12-04 20:32:23 +00:00
Benjamin Kramer
77faee6ba1 Simplify code. No functionality change.
llvm-svn: 120907
2010-12-04 14:22:24 +00:00
Evan Cheng
4118b24aca Fix and re-enable tail call optimization of expanded libcalls.
llvm-svn: 120622
2010-12-01 22:59:46 +00:00
Duncan Sands
cd4f56b8e2 I don't think it makes any sense to assert that the target supports SSE3 here.
The user (i.e. whoever generated a call to the intrinsic in the first place) is
essentially asking for a particular instruction to be placed in the assembler.
If that instruction won't execute on the target machine, that's their problem
not ours.  Two buildbots with processors that don't support SSE3 were barfing
on the apm.ll test in CodeGen/X86 because of this assertion.

llvm-svn: 120574
2010-12-01 12:58:13 +00:00
Evan Cheng
84162760b7 Speculatively disable x86 portion of r120501 to appease the x86_64 buildbot.
llvm-svn: 120549
2010-12-01 03:27:20 +00:00
Evan Cheng
f7e586d749 Enable sibling call optimization of libcalls which are expanded during
legalization time. Since at legalization time there is no mapping from
SDNode back to the corresponding LLVM instruction and the return
SDNode is target specific, this requires a target hook to check for
eligibility. Only x86 and ARM support this form of sibcall optimization
right now.
rdar://8707777

llvm-svn: 120501
2010-11-30 23:55:39 +00:00
Eric Christopher
73365ae8b6 Fix insertion point in pcmp expander.
While I'm there, clean up too many \n even for me.

llvm-svn: 120411
2010-11-30 08:20:21 +00:00
Eric Christopher
2170738538 Fix some cleanups from my last patch.
llvm-svn: 120410
2010-11-30 08:10:28 +00:00
Eric Christopher
f27f0b5234 Rewrite mwait and monitor support and custom lower arguments.
Fixes PR8573.

llvm-svn: 120404
2010-11-30 07:20:12 +00:00
Rafael Espindola
9287c4b38f Move lowering of TLS_addr32 and TLS_addr64 to X86MCInstLower.
llvm-svn: 120263
2010-11-28 21:16:39 +00:00
Rafael Espindola
45cd9713f2 Lower TLS_addr32 and TLS_addr64.
llvm-svn: 120225
2010-11-27 20:43:02 +00:00
Wesley Peck
d589353ad0 Renaming ISD::BIT_CONVERT to ISD::BITCAST to better reflect the LLVM IR concept.
llvm-svn: 119990
2010-11-23 03:31:01 +00:00
Anton Korobeynikov
269e7d3be1 Move hasFP() and few related hooks to TargetFrameInfo.
llvm-svn: 119740
2010-11-18 21:19:35 +00:00
Chris Lattner
9a0a840839 add targetoperand flags for jump tables, constant pool and block address
nodes to indicate when ha16/lo16 modifiers should be used.  This lets
us pass PowerPC/indirectbr.ll.

The one annoying thing about this patch is that the MCSymbolExpr isn't
expressive enough to represent ha16(label1-label2) which we need on
PowerPC.  I have a terrible hack in the meantime, but this will have
to be revisited at some point.

Last major conversion item left is global variable references.

llvm-svn: 119105
2010-11-15 02:46:57 +00:00
Chris Lattner
51168d6510 move the pic base symbol stuff up to MachineFunction
since it is trivial and will be shared between ppc and x86.
This substantially simplifies the X86 backend also.

llvm-svn: 119089
2010-11-14 22:48:15 +00:00
Chris Lattner
09ea2ce78f simplify getPICBaseSymbol a bit.
llvm-svn: 119088
2010-11-14 22:37:11 +00:00
Peter Collingbourne
4ec6b2dbe7 Recognise 32-bit ror-based bswap implementation used by uclibc
llvm-svn: 119007
2010-11-13 19:54:30 +00:00
Peter Collingbourne
6c3094234e Support ; as asm separator
llvm-svn: 119006
2010-11-13 19:54:23 +00:00
Dale Johannesen
659061fcd3 Remove possibly useful info from comment, per Chris.
llvm-svn: 118865
2010-11-12 00:43:18 +00:00
Duncan Sands
41edf30895 Simplify uses of MVT and EVT. An MVT can be compared directly
with a SimpleValueType, while an EVT supports equality and
inequality comparisons with SimpleValueType.

llvm-svn: 118169
2010-11-03 12:17:33 +00:00
Duncan Sands
92f33ea784 Factorize the duplicated logic for choosing the right argument
calling convention out of the fast and normal ISel files, and
into the calling convention TD file.

llvm-svn: 117856
2010-10-31 13:21:44 +00:00
John Thompson
6115a7f1d4 Inline asm multiple alternative constraints development phase 2 - improved basic logic, added initial platform support.
llvm-svn: 117667
2010-10-29 17:29:13 +00:00
Michael J. Spencer
5518dda87e x86-Win32: Switch ftol2 calling convention from stdcall to C.
llvm-svn: 117474
2010-10-27 18:52:38 +00:00
Dale Johannesen
2566d39d07 An stdcall function calling a non-stdcall function
cannot use tailcall.  PR 8461.

llvm-svn: 117322
2010-10-25 22:17:05 +00:00