1
0
mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-10-26 14:33:02 +02:00
Commit Graph

62 Commits

Author SHA1 Message Date
Eli Friedman
a343d87eac Make sure the non-SSE lowering for fences correctly clobbers EFLAGS. PR11768.
llvm-svn: 148240
2012-01-16 16:42:21 +00:00
Eli Friedman
a2b480b010 Get rid of unused codegen-only instruction.
llvm-svn: 148239
2012-01-16 16:29:35 +00:00
Benjamin Kramer
ae4ad5f924 X86: Generalize the x << (y & const) optimization to also catch masks with more set bits set than 31 or 63.
llvm-svn: 148024
2012-01-12 12:41:34 +00:00
Chandler Carruth
9ef50ef1f7 Switch the lowering of CTLZ_ZERO_UNDEF from a .td pattern back to the
X86ISelLowering C++ code. Because this is lowered via an xor wrapped
around a bsr, we want the dagcombine which runs after isel lowering to
have a chance to clean things up. In particular, it is very common to
see code which looks like:

  (sizeof(x)*8 - 1) ^ __builtin_clz(x)

Which is trying to compute the most significant bit of 'x'. That's
actually the value computed directly by the 'bsr' instruction, but if we
match it too late, we'll get completely redundant xor instructions.

The more naive code for the above (subtracting rather than using an xor)
still isn't handled correctly due to the dagcombine getting confused.

Also, while here fix an issue spotted by inspection: we should have been
expanding the zero-undef variants to the normal variants when there is
an 'lzcnt' instruction. Do so, and test for this. We don't want to
generate unnecessary 'bsr' instructions.

These two changes fix some regressions in encoding and decoding
benchmarks. However, there is still a *lot* to be improve on in this
type of code.

llvm-svn: 147244
2011-12-24 10:55:54 +00:00
Chandler Carruth
7564e8371a Begin teaching the X86 target how to efficiently codegen patterns that
use the zero-undefined variants of CTTZ and CTLZ. These are just simple
patterns for now, there is more to be done to make real world code using
these constructs be optimized and codegen'ed properly on X86.

The existing tests are spiffed up to check that we no longer generate
unnecessary cmov instructions, and that we generate the very important
'xor' to transform bsr which counts the index of the most significant
one bit to the number of leading (most significant) zero bits. Also they
now check that when the variant with defined zero result is used, the
cmov is still produced.

llvm-svn: 146974
2011-12-20 11:19:37 +00:00
Rafael Espindola
1958dc7193 Fixes an issue reported by -verify-machineinstrs.
Patch by Sanjoy Das.

llvm-svn: 143064
2011-10-26 21:16:41 +00:00
Rafael Espindola
90896edc6c This commit introduces two fake instructions MORESTACK_RET and
MORESTACK_RET_RESTORE_R10; which are lowered to a RET and a RET
followed by a MOV respectively.  Having a fake instruction prevents
the verifier from seeing a MachineBasicBlock end with a
non-terminator (MOV).  It also prevents the rather eccentric case of a
MachineBasicBlock ending with RET but having successors nevertheless.

Patch by Sanjoy Das.

llvm-svn: 143062
2011-10-26 21:12:27 +00:00
Eli Friedman
34ffc961d7 Fix the assembler strings for a couple of atomic instructions. Doesn't really matter much in practice, but it's a bit cleaner.
llvm-svn: 139563
2011-09-13 00:27:04 +00:00
Eli Friedman
9ea5599729 Fix atomic load and store on x86 to pass -verify-machineinstrs (and possibly fix some subtle bugs involving passes which check mayStore()).
This isn't exactly ideal, but it is good enough for the moment.

llvm-svn: 139245
2011-09-07 18:48:32 +00:00
Jakob Stoklund Olesen
ef8527b836 Pseudo CMOV instructions don't clobber EFLAGS.
The explanation about a 0 argument being materialized as xor is no
longer valid.  Rematerialization will check if EFLAGS is live before
clobbering it.

The code produced by X86TargetLowering::EmitLoweredSelect does not
clobber EFLAGS.

This causes one less testb instruction to be generated in the cmov.ll
test case.

llvm-svn: 139057
2011-09-02 23:52:55 +00:00
Rafael Espindola
7721c15106 Adds a SelectionDAG node X86SegAlloca which will be custom lowered
from DYNAMIC_STACKALLOC.

Two new pseudo instructions (SEG_ALLOCA_32 and SEG_ALLOCA_64) which
will match X86SegAlloca (based on word size) are also added.  They
will be custom emitted to inject the actual stack handling code.

Patch by Sanjoy Das.

llvm-svn: 138814
2011-08-30 19:43:21 +00:00
Eli Friedman
9f95c7d381 Add support for generating CMPXCHG16B on x86-64 for the cmpxchg IR instruction.
llvm-svn: 138660
2011-08-26 21:21:21 +00:00
Eli Friedman
6f95a6ae1b Basic x86 code generation for atomic load and store instructions.
llvm-svn: 138478
2011-08-24 20:50:09 +00:00
Bruno Cardoso Lopes
9a695724bd Add 256-bit support for v8i32, v4i64 and v4f64 ISD::SELECT. Fix PR10556
llvm-svn: 137179
2011-08-09 23:27:13 +00:00
Eli Friedman
44fd5b2b59 Fix a couple ridiculous copy-paste errors. rdar://9914773 .
llvm-svn: 137160
2011-08-09 22:17:39 +00:00
Eli Friedman
1a80401da2 X86ISD::MEMBARRIER does not require SSE2; it doesn't actually generate any code, and all x86 processors will honor the required semantics.
llvm-svn: 136249
2011-07-27 19:43:50 +00:00
Dan Gohman
4762d28ff9 Add a comment describing why transforming (shl x, 1) to (add x, x) is to be
considered safe enough in this context.

llvm-svn: 133159
2011-06-16 15:55:48 +00:00
Benjamin Kramer
85e86083d5 X86: smulo -> add is now done target-independently in DAGCombiner, remove the patterns.
llvm-svn: 131801
2011-05-21 18:32:01 +00:00
Stuart Hastings
e3158f93ec Re-commit 131641 with fixes; de-pseudoize MOVSX16rr8 and friends.
rdar://problem/8614450

llvm-svn: 131746
2011-05-20 19:04:40 +00:00
Stuart Hastings
ff15dfa12e Reverting 131641 to investigate 'bot complaint.
llvm-svn: 131654
2011-05-19 17:54:42 +00:00
Stuart Hastings
7baa1babdb Revise MOVSX16rr8/MOVZX16rr8 (and rm variants) to no longer be
pseudos.  rdar://problem/8614450

llvm-svn: 131641
2011-05-19 16:59:50 +00:00
Eric Christopher
c03ef7ebb3 Support XOR and AND optimization with no return value.
Finishes off rdar://8470697

llvm-svn: 131458
2011-05-17 08:10:18 +00:00
Eric Christopher
3c17ef53c3 Optimize atomic lock or that doesn't use the result value.
Next up: xor and and.

Part of rdar://8470697

llvm-svn: 131171
2011-05-10 23:57:45 +00:00
Eric Christopher
aa7c86ec19 Refactor lock versions of binary operators to be a little less
cut and paste.

llvm-svn: 131139
2011-05-10 18:36:16 +00:00
Benjamin Kramer
ba7c9948e8 X86: Add a bunch of peeps for add and sub of SETB.
"b + ((a < b) ? 1 : 0)" compiles into
	cmpl	%esi, %edi
	adcl	$0, %esi
instead of
	cmpl	%esi, %edi
	sbbl	%eax, %eax
	andl	$1, %eax
	addl	%esi, %eax

This saves a register, a false dependency on %eax
(Intel's CPUs still don't ignore it) and it's shorter.

llvm-svn: 131070
2011-05-08 18:36:07 +00:00
Dan Gohman
71117af2db The labyrinthine X86 backend no longer appears to require
these patterns.

llvm-svn: 125759
2011-02-17 18:50:19 +00:00
NAKAMURA Takumi
8ace7260cc Target/X86: Tweak win64's tailcall.
llvm-svn: 124272
2011-01-26 02:04:09 +00:00
NAKAMURA Takumi
066378440a Fix whitespace.
llvm-svn: 124270
2011-01-26 02:03:37 +00:00
Eric Christopher
e8aa8b114f The stub routine that we're calling uses test and so clobbers
the flags.

llvm-svn: 123712
2011-01-18 01:37:20 +00:00
Chris Lattner
2d4e17d195 We lower setb to sbb with the hope that the and will go away, when it
doesn't, match it back to setb.

On a 64-bit version of the testcase before we'd get:

	movq	%rdi, %rax
	addq	%rsi, %rax
	sbbb	%dl, %dl
	andb	$1, %dl
	ret

now we get:

	movq	%rdi, %rax
	addq	%rsi, %rax
	setb	%dl
	ret

llvm-svn: 122217
2010-12-20 01:16:03 +00:00
Chris Lattner
297259f6f1 improve the setcc -> setcc_carry optimization to happen more
consistently by moving it out of lowering into dag combine.

Add some missing patterns for matching away extended versions of setcc_c.

llvm-svn: 122201
2010-12-19 22:08:31 +00:00
Evan Cheng
72dca1ee17 Only rr forms of ADD*_DB are commutable.
llvm-svn: 121908
2010-12-15 22:57:36 +00:00
Eric Christopher
cc8a622ca4 Add rsp to the uses for the same reason as 32-bit.
llvm-svn: 121328
2010-12-09 00:26:41 +00:00
Rafael Espindola
9287c4b38f Move lowering of TLS_addr32 and TLS_addr64 to X86MCInstLower.
llvm-svn: 120263
2010-11-28 21:16:39 +00:00
Rafael Espindola
45cd9713f2 Lower TLS_addr32 and TLS_addr64.
llvm-svn: 120225
2010-11-27 20:43:02 +00:00
Chris Lattner
9da275f86b reject instructions that contain a \n in their asmstring. Mark
various X86 and ARM instructions that are bitten by this as isCodeGenOnly,
as they are.

llvm-svn: 117884
2010-11-01 00:46:16 +00:00
Chris Lattner
5d088218e5 two changes: make the asmmatcher generator ignore ARM pseudos properly,
and make it a hard error for instructions to not have an asm string.
These instructions should be marked isCodeGenOnly.

llvm-svn: 117861
2010-10-31 19:15:18 +00:00
Michael J. Spencer
5a68d7ce94 X86: Add alloca probing to dynamic alloca on Windows. Fixes PR8424.
llvm-svn: 116984
2010-10-21 01:41:01 +00:00
Michael J. Spencer
54b462089f Fix Whitespace.
llvm-svn: 116972
2010-10-20 23:40:27 +00:00
Rafael Espindola
b1ae74bd73 Fix another case where we were preferring instructions with large
immediates instead of 8 bits ones.

llvm-svn: 116410
2010-10-13 17:14:25 +00:00
Rafael Espindola
ff7f11c151 Fix PR8365 by adding a more specialized Pat that checks if an 'and' with
8 bit constants can be used.

llvm-svn: 116403
2010-10-13 13:31:20 +00:00
Dan Gohman
d904add908 Initial va_arg support for x86-64. Patch by David Meyer!
llvm-svn: 116319
2010-10-12 18:00:49 +00:00
Chris Lattner
82ce325f16 reapply: Use the new TB_NOT_REVERSABLE flag instead of special
reapply: reimplement the second half of the or/add optimization.  We should now

with no changes.  Turns out that one missing "Defs = [EFLAGS]" can upset things
a bit.

llvm-svn: 116040
2010-10-08 03:57:25 +00:00
Chris Lattner
fbdd285dd6 reapply the patch reverted in r116033:
"Reimplement (part of) the or -> add optimization.  Matching 'or' into 'add'"

With a critical fix: the add pseudos clobber EFLAGS.

llvm-svn: 116039
2010-10-08 03:54:52 +00:00
Daniel Dunbar
d3b6b8bf2b Revert "Reimplement (part of) the or -> add optimization. Matching 'or' into
'add'", which seems to have broken just about everything.

llvm-svn: 116033
2010-10-08 02:07:32 +00:00
Daniel Dunbar
983fae5a86 Revert "reimplement the second half of the or/add optimization. We should now",
which depends on r116007, which I am about to revert.

llvm-svn: 116031
2010-10-08 02:07:26 +00:00
Chris Lattner
7577cb7b49 reimplement the second half of the or/add optimization. We should now
only end up emitting LEA instead of OR.  If we aren't able to promote
something into an LEA, we should never be emitting it as an ADD.

Add some testcases that we emit "or" in cases where we used to produce
an "add".

llvm-svn: 116026
2010-10-08 01:05:10 +00:00
Chris Lattner
d8f05bf65e Reimplement (part of) the or -> add optimization. Matching 'or' into 'add'
is general goodness because it allows ORs to be converted to LEA to avoid
inserting copies.  However, this is bad because it makes the generated .s
file less obvious and gives valgrind heartburn (tons of false positives in
bitfield code).

While the general fix should be in valgrind, we can at least try to avoid
emitting ADD instructions that *don't* get promoted to LEA.  This is more
work because it requires introducing pseudo instructions to represents
"add that knows the bits are disjoint", but hey, people really love valgrind.

This fixes this testcase:
https://bugs.kde.org/show_bug.cgi?id=242137#c20

the add r/i cases are coming next.

llvm-svn: 116007
2010-10-07 23:36:18 +00:00
Chris Lattner
ef2e024af8 Move cmov pseudo instructions to InstrCompiler,
convert all the rest of the cmovs to the multiclass,
with good results:

 X86InstrCMovSetCC.td |  598 +--------------------------------------------------
 X86InstrCompiler.td  |   61 +++++
 2 files changed, 77 insertions(+), 582 deletions(-)

llvm-svn: 115707
2010-10-05 23:09:10 +00:00
Chris Lattner
195a9c3877 Use #NAME# to have the CMOV multiclass define things with the same names as before
(e.g. CMOVBE16rr instead of CMOVBErr16).

llvm-svn: 115705
2010-10-05 23:00:14 +00:00