Commit Graph

173 Commits

Robin Morisset
7114ac8258 [X86] Allow atomic operations using immediates to avoid using a register
The only valid lowering of atomic stores in the X86 backend was mov from
register to memory. As a result, storing an immediate required a useless copy
of the immediate in a register. Now these can be compiled as a simple mov.

Similarly, adding/and-ing/or-ing/xor-ing an
immediate to an atomic location (but through an atomic_store/atomic_load,
not a fetch_whatever intrinsic) can now make use of an 'add $imm, x(%rip)'
instead of using a register. The same applies to inc/dec.

This second point matches the first issue identified in
  http://llvm.org/bugs/show_bug.cgi?id=17281

llvm-svn: 216980
2014-09-02 22:16:29 +00:00
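A minimal LLVM IR sketch (not from the commit; modern syntax, names
illustrative) of the two cases this enables:

    define void @store_imm(i32* %p) {
      ; previously: movl $42, %eax / movl %eax, (%rdi); now: movl $42, (%rdi)
      store atomic i32 42, i32* %p monotonic, align 4
      ret void
    }

    define void @add_imm(i32* %p) {
      ; an unobserved read-modify-write via atomic load/store,
      ; now selectable as a single 'addl $7, (%rdi)'
      %v = load atomic i32, i32* %p monotonic, align 4
      %a = add i32 %v, 7
      store atomic i32 %a, i32* %p monotonic, align 4
      ret void
    }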
Reid Kleckner
07d84ca71d Fix failure to invoke exception handler on Win64
When the last instruction prior to a function epilogue is a call, we
need to emit a nop so that the return address is not in the epilogue IP
range.  This is consistent with MSVC's behavior, and may be a workaround
for a bug in the Win64 unwinder.

Differential Revision: http://reviews.llvm.org/D4751

Patch by Vadim Chugunov!

llvm-svn: 214775
2014-08-04 21:05:27 +00:00
Akira Hatanaka
2cf112b51e [X86] Simplify X87 stackifier pass.
Stop using ST registers for function returns and inline-asm instructions and use
FP registers instead. This allows removing a large amount of code in the
stackifier pass that was needed to track register liveness and handle copies
between ST and FP registers and function calls returning floating point values.

It also fixes a bug which manifests when an ST register defined by an
inline-asm instruction is live across another inline-asm instruction, as shown
in the following sequence of machine instructions:

1. INLINEASM <es:frndint> $0:[regdef], %ST0<imp-def,tied5>
2. INLINEASM <es:fldcw $0>
3. %FP0<def> = COPY %ST0

<rdar://problem/16952634>

llvm-svn: 214580
2014-08-01 22:19:41 +00:00
Cameron McInally
e9e4e99ecf Revert r213070. It's breaking the build in MCELFStreamer::EmitInstToData(...).
llvm-svn: 213073
2014-07-15 16:24:24 +00:00
Cameron McInally
6eb8b83c5d Add x86 patterns to match a specific add-with-carry.
llvm-svn: 213070
2014-07-15 15:03:32 +00:00
Tim Northover
60e9ada729 X86: expand atomics in IR instead of as MachineInstrs.
The logic for expanding atomics that aren't natively supported in
terms of cmpxchg loops is much simpler to express at the IR level. It
also allows the normal optimisations and CodeGen improvements to help
out with atomics, instead of using a limited set of possible
instructions.

rdar://problem/13496295

llvm-svn: 212119
2014-07-01 18:53:31 +00:00
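For illustration, an operation x86 has no instruction for, such as
atomicrmw nand, is now rewritten into a cmpxchg loop at the IR level
instead of as MachineInstrs (a sketch, not the commit's code):

    define i32 @nand(i32* %p, i32 %v) {
      ; expanded by the IR-level pass into roughly:
      ;   loop: %t = and %old, %v ; %n = xor %t, -1
      ;         cmpxchg %p, %old, %n ; retry on failure
      %old = atomicrmw nand i32* %p, i32 %v seq_cst
      ret i32 %old
    }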
NAKAMURA Takumi
c5a2c81f7e Re-apply r211399, "Generate native unwind info on Win64" with a fix to ignore SEH pseudo ops in X86 JIT emitter.
--
This patch enables LLVM to emit Win64-native unwind info rather than
DWARF CFI.  It handles all corner cases (I hope), including stack
realignment.

Because the unwind info is not flexible enough to describe stack frames
with a gap of unknown size in the middle, such as the one caused by
stack realignment, I modified register spilling code to place all spills
into the fixed frame slots, so that they can be accessed relative to the
frame pointer.

Patch by Vadim Chugunov!

Reviewed By: rnk

Differential Revision: http://reviews.llvm.org/D4081

llvm-svn: 211691
2014-06-25 12:41:52 +00:00
NAKAMURA Takumi
35a44c8eda Reformat.
llvm-svn: 211689
2014-06-25 12:40:56 +00:00
NAKAMURA Takumi
eca89a0522 Revert r211399, "Generate native unwind info on Win64"
It broke Legacy JIT Tests on x86_64-{mingw32|msvc}, aka Windows x64.

llvm-svn: 211480
2014-06-22 22:00:56 +00:00
Reid Kleckner
40d9c6f936 Generate native unwind info on Win64
This patch enables LLVM to emit Win64-native unwind info rather than
DWARF CFI.  It handles all corner cases (I hope), including stack
realignment.

Because the unwind info is not flexible enough to describe stack frames
with a gap of unknown size in the middle, such as the one caused by
stack realignment, I modified register spilling code to place all spills
into the fixed frame slots, so that they can be accessed relative to the
frame pointer.

Patch by Vadim Chugunov!

Reviewed By: rnk

Differential Revision: http://reviews.llvm.org/D4081

llvm-svn: 211399
2014-06-20 20:35:47 +00:00
Alexey Volkov
a3a5a1d7f1 [X86] Use ADD/SUB instead of INC/DEC for Silvermont
According to the Intel Software Optimization Manual,
on Silvermont INC and DEC instructions require
an additional uop to merge the flags.
As a result, a branch instruction depending
on an INC or a DEC instruction incurs a 1-cycle penalty.

Differential Revision: http://reviews.llvm.org/D3990

llvm-svn: 210466
2014-06-09 11:40:41 +00:00
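A hedged sketch of the affected pattern (names illustrative):

    define i1 @inc_then_test(i32 %x) {
      ; on Silvermont this now selects 'addl $1, ...' instead of 'incl',
      ; so a branch consuming the flags avoids the flag-merge uop
      %y = add i32 %x, 1
      %c = icmp eq i32 %y, 0
      ret i1 %c
    }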
Jay Foad
e0eac700cb Rename ComputeMaskedBits to computeKnownBits. "Masked" has been
inappropriate since it lost its Mask parameter in r154011.

llvm-svn: 208811
2014-05-14 21:14:37 +00:00
Adam Nemet
fc761e9a09 [X86] Add peephole for masked rotate amount
Extend what's currently done for shift because the HW performs this masking
implicitly:

   (rotl:i32 x, (and y, 31)) -> (rotl:i32 x, y)

I use the newly factored-out multiclass that so far supported only
shifts.

For testing I extended my testcase for the new rotation idiom.

<rdar://problem/15295856>

llvm-svn: 203718
2014-03-12 21:20:55 +00:00
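One rotate idiom this applies to, written out as IR (a sketch,
assuming the selector recognizes it as rotl:i32; names illustrative):

    define i32 @rotl32(i32 %x, i32 %y) {
      %m  = and i32 %y, 31          ; redundant: the HW masks to 5 bits
      %n  = sub i32 32, %y
      %nm = and i32 %n, 31
      %hi = shl i32 %x, %m
      %lo = lshr i32 %x, %nm
      %r  = or i32 %hi, %lo         ; matched as (rotl:i32 x, (and y, 31))
      ret i32 %r
    }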
Adam Nemet
4305369df5 [X86] Refactor peepholes for masked shift amount into a multiclass
The peephole (shift x, (and y, 31)) -> (shift x, y) is repeated for each
integer type and each shift variant.

To improve this, a new multiclass is added that covers all integer types.  The
shift patterns are now instantiated from this.  I am planning to add new
instances for rotates as well.

No functional change intended:

  * test/CodeGen/X86/shift-and.ll provides coverage

  * Compared the expanded tablegen output and matched up the defs for these
    Pat<>s before and after

llvm-svn: 203685
2014-03-12 18:02:33 +00:00
Jim Grosbach
3b6ef12947 X86: Enable ISel of 16-bit MOVBE instructions.
When the MOVBE instructions are available, use them for 16-bit endian
swapping as well as for 32 and 64 bit.

The patterns were already present on the instructions, but weren't being
matched because the operation was unconditionally marked 'Expand'.
Change that to be conditional on whether the MOVBE instructions are
available. Use 'rolw' to implement the in-register version (32- and 64-bit
have the dedicated 'bswap' instruction for that).

Patch by Louis Gerbarg <lgg@apple.com>.

rdar://15479984

llvm-svn: 203524
2014-03-11 00:44:14 +00:00
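An IR sketch of the 16-bit swap this enables (assumes a MOVBE-capable
subtarget; names illustrative):

    declare i16 @llvm.bswap.i16(i16)

    define i16 @load_be16(i16* %p) {
      ; with MOVBE this can be one load+swap instruction;
      ; the in-register form now uses 'rolw $8' instead
      %v = load i16, i16* %p
      %s = call i16 @llvm.bswap.i16(i16 %v)
      ret i16 %s
    }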
Craig Topper
e916566881 Merge x86 HasOpSizePrefix/HasOpSize16Prefix into a 2-bit OpSize field, with
0 meaning no 0x66 prefix in any mode. Rename OpSize16->OpSize32 and
OpSize->OpSize16. The classes now refer to their operand size rather than the
mode in which they need a 0x66 prefix. Hopefully REX_W can be merged into this
as OpSize64.
llvm-svn: 200626
2014-02-02 09:25:09 +00:00
David Woodhouse
772280aaa8 [x86] Remove OpSize16 flag from MOV32r0
It's not a real instruction any more and doesn't need encoding information.

llvm-svn: 198778
2014-01-08 18:38:26 +00:00
David Woodhouse
8bc826fd14 [x86] Add OpSize16 to instructions that need it
This fixes the bulk of 16-bit output, and the corresponding test case
x86-16.s now looks mostly like the x86-32.s test case that it was
originally based on. A few irrelevant instructions have been dropped,
and there are still some corner cases to be fixed in subsequent patches.

llvm-svn: 198752
2014-01-08 12:57:40 +00:00
Craig Topper
423b149a44 Remove opcode from MOV32r0 that I accidentally left when I converted it to Pseudo. Remove FIXME as well.
llvm-svn: 198564
2014-01-05 19:25:13 +00:00
Craig Topper
ed98df1d3a Handle MOV32r0 in expandPostRAPseudo instead of MCInst lowering. No functional change intended.
llvm-svn: 198254
2013-12-31 03:05:38 +00:00
Eric Christopher
24d8bb6edd [x86] Rename In32BitMode predicate to Not64BitMode
That's what it actually means, and with 16-bit support it's going to be
a little more relevant since in a few corner cases we may actually want
to distinguish between 16-bit and 32-bit mode (for example the bare 'push'
aliases to pushw/pushl etc.)

Patch by David Woodhouse

llvm-svn: 197768
2013-12-20 02:04:49 +00:00
Duncan P. N. Exon Smith
85e7983ab3 Revert "Revert "Mark vastart_save_xmm_regs as changing EFLAGS""
This reverts commit r197481, recommiting r197469 with an extra fix.

The vastart_save_xmm_regs pseudo-instruction expands to a test and a
branch, so it modifies EFLAGS.  Mark it so, or else the scheduler might
place it in the middle of another test+branch.

This fixes a bug exposed by r192750, which changed the initial scheduler
to source-order as part of enabling the MI Scheduler for X86.

This re-commit changes the VASTART_SAVE_XMM_REGS custom inserter not to
try to save %flags, and adds a test that catches the bad behavior of
r197469.

<rdar://problem/15627766>

llvm-svn: 197503
2013-12-17 15:54:45 +00:00
Duncan P. N. Exon Smith
3f31d678ca Revert "Mark vastart_save_xmm_regs as changing EFLAGS"
This reverts commit r197469.

The sanitizer and dragonegg buildbots are failing, I think because of
this change.  Reverting until I figure out why.

llvm-svn: 197481
2013-12-17 07:13:58 +00:00
Duncan P. N. Exon Smith
2cc99f0e39 Mark vastart_save_xmm_regs as changing EFLAGS
The vastart_save_xmm_regs pseudo-instruction expands to a test and a
branch, so it modifies EFLAGS.  Mark it so, or else the scheduler might
place it in the middle of another test+branch.

This fixes a bug exposed by r192750, which turned on the MI Scheduler
for X86.

<rdar://problem/15627766>

llvm-svn: 197469
2013-12-17 06:12:05 +00:00
Elena Demikhovsky
1c867680b8 AVX-512: Implemented CMOV for 512-bit vectors
llvm-svn: 193747
2013-10-31 13:15:32 +00:00
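A sketch of the kind of 512-bit select this covers (one example
element type; not from the commit):

    define <16 x i32> @sel512(i1 %c, <16 x i32> %a, <16 x i32> %b) {
      %r = select i1 %c, <16 x i32> %a, <16 x i32> %b
      ret <16 x i32> %r
    }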
Eric Christopher
1a04817b81 Revert part of a fix from 2010, changes since then:
a) x86-64 TLS has been documented
b) the code path should use movq for the correct relocation
   to be generated.

I've also added a FIXME for the test case: we should improve the
generated code; it should look something like what is documented in
the TLS ABI document.

llvm-svn: 192631
2013-10-14 21:52:26 +00:00
Eric Christopher
d6f19023b0 Remove some extraneous whitespace.
llvm-svn: 192629
2013-10-14 21:52:18 +00:00
Craig Topper
82f1ff48e7 Mark that the _ftol2 function used by Windows on x86 to handle fptoui modifies ECX.
llvm-svn: 186787
2013-07-21 07:28:13 +00:00
Tim Northover
8efc0e4868 X86: change MOV64ri64i32 into MOV32ri64
The MOV64ri64i32 instruction required hacky MCInst lowering because it
was allocated as setting a GR64, but the eventual instruction ("movl")
only set a GR32. This converts it into a so-called "MOV32ri64" which
still accepts an (appropriate) 64-bit immediate but defines a GR32.
This is then converted to the full GR64 by a SUBREG_TO_REG operation,
thus keeping everyone happy.

This fixes a typo in the opcode field of the original patch, which
should make the legacy JIT work again (and adds a test for that problem).

llvm-svn: 183068
2013-06-01 09:55:14 +00:00
Eric Christopher
e4ab862999 Temporarily Revert "X86: change MOV64ri64i32 into MOV32ri64" as it
seems to have caused PR16192 and other JIT related failures.

llvm-svn: 183059
2013-05-31 23:30:45 +00:00
Tim Northover
8940245595 X86: change MOV64ri64i32 into MOV32ri64
The MOV64ri64i32 instruction required hacky MCInst lowering because it was
allocated as setting a GR64, but the eventual instruction ("movl") only set a
GR32. This converts it into a so-called "MOV32ri64" which still accepts an
(appropriate) 64-bit immediate but defines a GR32. This is then converted to
the full GR64 by a SUBREG_TO_REG operation, thus keeping everyone happy.

llvm-svn: 182991
2013-05-31 09:57:13 +00:00
Tim Northover
aa5932cde5 X86: use sub-register sequences for MOV*r0 operations
Instead of having a bunch of separate MOV8r0, MOV16r0, ... pseudo-instructions,
it's better to use a single MOV32r0 (which will expand to "xorl %reg, %reg")
and obtain other sizes with EXTRACT_SUBREG and SUBREG_TO_REG. The encoding is
smaller and partial register updates can sometimes be avoided.

Until recently, this sequence was a barrier to rematerialization though. That
should now be fixed so it's an appropriate time to make the change.

llvm-svn: 182928
2013-05-30 13:19:42 +00:00
Tim Northover
4a589eb424 X86: change zext moves to use sub-register infrastructure.
32-bit writes on amd64 zero out the high bits of the corresponding 64-bit
register. LLVM makes use of this for zero-extension, but until now relied on
custom MCLowering and other code to fixup instructions. Now we have proper
handling of sub-registers, this can be done by creating SUBREG_TO_REG
instructions at selection-time.

Should be no change in functionality.

llvm-svn: 182921
2013-05-30 10:43:18 +00:00
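For reference, the common case this models (sketch):

    define i64 @zext32to64(i32 %x) {
      ; 'movl %edi, %eax' already clears bits 63:32, so selection can
      ; wrap the 32-bit def in SUBREG_TO_REG with no extra instruction
      %r = zext i32 %x to i64
      ret i64 %r
    }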
Jakob Stoklund Olesen
19c4788c8a Annotate X86InstrCompiler.td with SchedRW lists.
llvm-svn: 177936
2013-03-25 23:07:35 +00:00
Jakob Stoklund Olesen
71393fdd98 Annotate X86InstrCompiler.td with SchedRW lists.
Add a new WriteZero SchedWrite type for the common dependency-breaking
instructions that clear a register.

llvm-svn: 177442
2013-03-19 21:16:56 +00:00
Ulrich Weigand
4570892d73 Remove an invalid and unnecessary Pat pattern from the X86 backend:
def : Pat<(load (i64 (X86Wrapper tglobaltlsaddr :$dst))),
            (MOV64rm tglobaltlsaddr :$dst)>;

This pattern is invalid because the MOV64rm instruction expects a
source operand of type "i64mem", which is a subclass of X86MemOperand
and thus actually consists of five MI operands, but the Pat provides
only a single MI operand ("tglobaltlsaddr" matches an SDnode of
type ISD::TargetGlobalTLSAddress and provides a single output).

Thus, if the pattern were ever matched, subsequent uses of the MOV64rm
instruction pattern would access uninitialized memory.  In addition,
with the TableGen patch I'm about to check in, this would actually be
reported as a build-time error.

Fortunately, the pattern does in fact never match, for at least two
independent reasons.

First, the code generator actually never generates a pattern of the
form (load (X86Wrapper (tglobaltlsaddr))).  For most combinations of
TLS and code models, (tglobaltlsaddr) represents just an offset that
needs to be added to some base register, so it is never directly
dereferenced.  The only exception is the initial-exec model, where
(tglobaltlsaddr) refers to the (pc-relative) address of a GOT slot,
which *is* in fact directly dereferenced: but in that case, the
X86WrapperRIP node is used, not X86Wrapper, so the Pat doesn't match.

Second, even if some patterns along those lines *were* ever generated,
we should not need an extra Pat pattern to match it.  Instead, the
original MOV64rm instruction pattern ought to match directly, since
it uses an "addr" operand, which is implemented via the SelectAddr
C++ routine; this routine is supposed to accept the full range of
input DAGs that may be implemented by a single mov instruction,
including those cases involving ISD::TargetGlobalTLSAddress (and
actually does so e.g. in the initial-exec case as above).

To avoid build breaks (due to the above-mentioned error) after the
TableGen patch is checked in, I'm removing this Pat here.

llvm-svn: 177426
2013-03-19 19:49:52 +00:00
Benjamin Kramer
bdb1d9aad3 X86: Disable cmov-memory patterns on subtargets without cmov.
Fixes PR15115.

llvm-svn: 175962
2013-02-23 10:40:58 +00:00
Michael Liao
f1ce1e547c Fix an issue of pseudo atomic instruction DAG schedule
- Add the list of physical registers clobbered in pseudo atomic insts
  Physical registers are clobbered when pseudo atomic instructions are
  expanded. Add them to the clobber list to prevent the DAG scheduler
  from mis-scheduling them after these insns are declared side-effect free.
- Add test case from Michael Kuperstein <michael.m.kuperstein@intel.com>

llvm-svn: 173200
2013-01-22 21:47:38 +00:00
Craig Topper
8884832622 Remove # from the beginning and end of def names.
llvm-svn: 171696
2013-01-07 05:26:58 +00:00
Craig Topper
fe4506dc6c Add hasSideEffects=0 to some atomic instructions.
llvm-svn: 171122
2012-12-26 23:08:12 +00:00
Michael Liao
a7e5913fde Add __builtin_setjmp/_longjmp support in X86 backend
- Besides being used in SjLj exception handling, __builtin_setjmp/__longjmp is
  also used as a light-weight replacement for setjmp/longjmp, which are used to
  implement continuations, user-level threading, and the like. The support added
  in this patch ONLY addresses this usage and is NOT intended to support SjLj
  exception handling, as zero-cost DWARF exception handling is used by default
  in X86.

llvm-svn: 165989
2012-10-15 22:39:43 +00:00
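At the IR level these builtins arrive as the SjLj intrinsics, not as
calls to libc setjmp/longjmp (a sketch; buffer setup elided):

    declare i32  @llvm.eh.sjlj.setjmp(i8*)
    declare void @llvm.eh.sjlj.longjmp(i8*)

    define i32 @checkpoint(i8* %buf) {
      ; returns 0 on the direct path, non-zero when re-entered via longjmp
      %r = call i32 @llvm.eh.sjlj.setjmp(i8* %buf)
      ret i32 %r
    }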
Benjamin Kramer
ea6041a09a X86: fcmov doesn't handle all possible EFLAGS, fall back to a branch for the others.
Otherwise it will try to use SSE patterns and fail horribly if SSE is disabled.
Fixes PR14035.

llvm-svn: 165377
2012-10-07 15:34:27 +00:00
Craig Topper
ba721cd4c0 Remove some encoding bits I forgot to remove from SETB_C16r and SETB_C64r in r165302.
llvm-svn: 165303
2012-10-05 06:11:52 +00:00
Craig Topper
ec537b4c86 Move expansion of SETB_C(8/16/32/64)r from MCInstLower to ExpandPostRAPseudos and mark them as pseudos in the td file.
llvm-svn: 165302
2012-10-05 06:05:15 +00:00
Michael Liao
5412422f4a Add 'lock' prefix output support in assembly printer
- Instead of embedding 'lock' into each mnemonic of atomic
  instructions except 'xchg', we teach the X86 assembly printer to output
  the 'lock' prefix, consistent with the code emitter.

llvm-svn: 164659
2012-09-26 05:13:44 +00:00
Michael Liao
3d9c40c0c8 Fix 16-bit atomic inst encoding and keep pseudo-inst starting with '#'
llvm-svn: 164453
2012-09-22 05:41:15 +00:00
Michael Liao
0a4f3eefaf Fix typo in r164357
llvm-svn: 164452
2012-09-22 03:39:42 +00:00
Michael Liao
9a17cba52b Fix a typo in r164357
llvm-svn: 164372
2012-09-21 16:03:03 +00:00
Michael Liao
439a9cea68 Revise td of X86 atomic instructions
- Rewrite most atomic instructions as templates for both better
  maintenance and future extensions, such as HLE in TSX.

llvm-svn: 164357
2012-09-21 03:00:17 +00:00
Michael Liao
34658dca78 Re-work X86 code generation of atomic ops with spin-loop
- Rewrite/merge pseudo-atomic instruction emitters to address the
  following issue:
  * Reduce one unnecessary load in spin-loop

    previously the spin-loop looks like

        thisMBB:
        newMBB:
          ld  t1 = [bitinstr.addr]
          op  t2 = t1, [bitinstr.val]
          not t3 = t2  (if Invert)
          mov EAX = t1
          lcs dest = [bitinstr.addr], t3  [EAX is implicit]
          bz  newMBB
          fallthrough -->nextMBB

    the 'ld' at the beginning of newMBB should be lifted out of the loop,
    as lcs (or CMPXCHG on x86) will load the current memory value into
    EAX. This loop is refined as:

        thisMBB:
          EAX = LOAD [MI.addr]
        mainMBB:
          t1 = OP [MI.val], EAX
          LCMPXCHG [MI.addr], t1, [EAX is implicitly used & defined]
          JNE mainMBB
        sinkMBB:

  * Remove immopc as, so far, all pseudo-atomic instructions have
    all-register forms only; there is no immediate operand.

  * Remove unnecessary attributes/modifiers in pseudo-atomic instruction
    td

  * Fix issues in PR13458

- Add comprehensive tests on atomic ops on various data types.
  NOTE: Some of them are turned off due to missing functionality.

- Revise tests due to the new spin-loop generated.

llvm-svn: 164281
2012-09-20 03:06:15 +00:00
Jakob Stoklund Olesen
72138019a9 Fix the TCRETURNmi64 bug differently.
Add a PatFrag to match X86tcret using 6 fixed registers or less. This
avoids folding loads into TCRETURNmi64 using 7 or more volatile
registers.

<rdar://problem/12282281>

llvm-svn: 163819
2012-09-13 18:31:27 +00:00
Jakob Stoklund Olesen
eae8fc91cf Revert r163761 "Don't fold indexed loads into TCRETURNmi64."
The patch caused "Wrong topological sorting" assertions.

llvm-svn: 163810
2012-09-13 16:52:17 +00:00
Jakob Stoklund Olesen
b15912aafd Don't fold indexed loads into TCRETURNmi64.
We don't have enough GR64_TC registers when calling a varargs function
with 6 arguments. Since %al holds the number of vector registers used,
only %r11 is available as a scratch register.

This means that addressing modes using both base and index registers
can't be folded into TCRETURNmi64.

<rdar://problem/12282281>

llvm-svn: 163761
2012-09-13 00:25:00 +00:00
Hans Wennborg
4344ad4a86 Implement the local-dynamic TLS model for x86 (PR3985)
This implements codegen support for accesses to thread-local variables
using the local-dynamic model, and adds a clean-up pass so that the base
address for the TLS block can be re-used between local-dynamic access on
an execution path.

llvm-svn: 157818
2012-06-01 16:27:21 +00:00
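A sketch of local-dynamic accesses the clean-up pass can combine
(names illustrative):

    @x = internal thread_local(localdynamic) global i32 0
    @y = internal thread_local(localdynamic) global i32 0

    define i32 @sum() {
      ; both loads can share a single __tls_get_addr result as the base
      %a = load i32, i32* @x
      %b = load i32, i32* @y
      %s = add i32 %a, %b
      ret i32 %s
    }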
Jakob Stoklund Olesen
88cf278739 Use ptr_rc_tailcall instead of GR32_TC.
The getPointerRegClass() hook will return GR32_TC, or whatever is
appropriate for the current function.

Patch by Yiannis Tsiouris!

llvm-svn: 156459
2012-05-09 01:50:09 +00:00
Manman Ren
6fde9f74b4 X86: optimization for -(x != 0)
This patch will optimize -(x != 0) on X86
FROM 
cmpl	$0x01,%edi
sbbl	%eax,%eax
notl	%eax
TO
negl %edi
sbbl %eax, %eax

In order to generate negl, I added patterns in Target/X86/X86InstrCompiler.td:
def : Pat<(X86sub_flag 0, GR32:$src), (NEG32r GR32:$src)>;

rdar://10961709
llvm-svn: 156312
2012-05-07 18:06:23 +00:00
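The matched pattern as IR, for reference (a sketch):

    define i32 @neg_ne_zero(i32 %x) {
      ; -(x != 0): now 'negl %edi; sbbl %eax, %eax'
      %c = icmp ne i32 %x, 0
      %z = zext i1 %c to i32
      %r = sub i32 0, %z
      ret i32 %r
    }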
Rafael Espindola
88a1aeb123 Always compute all the bits in ComputeMaskedBits.
This allows us to keep passing reduced masks to SimplifyDemandedBits, but
know about all the bits if SimplifyDemandedBits fails. This allows instcombine
to simplify cases like the one in the included testcase.

llvm-svn: 154011
2012-04-04 12:51:34 +00:00
Lang Hames
94d892c492 Make x86 REP_MOV* and REP_STO instructions use the correct operand sizes in 64-bit mode.
llvm-svn: 153680
2012-03-29 19:54:28 +00:00
Preston Gurd
d1ae391210 This patch adds X86 instruction itineraries for non-pseudo opcodes in
X86InstrCompiler.td.

It also adds -mcpu=generic to the legalize-shift-64.ll test so the test
will pass if run on an Intel Atom CPU, which would otherwise produce an
instruction schedule differing from the one the test expects.

llvm-svn: 153033
2012-03-19 14:10:12 +00:00
Michael J. Spencer
d2f0ce2674 Add WIN_FTOL_* pseudo-instructions to model the unique calling convention
used by the Win32 _ftol2 runtime function. Patch by Joe Groff!

llvm-svn: 151382
2012-02-24 19:01:22 +00:00
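A sketch of a conversion that routes through _ftol2 (assuming an
i686-pc-win32 target):

    define i64 @to_u64(double %d) {
      ; lowered to a call to _ftol2, which takes its input in ST(0),
      ; returns in EDX:EAX, and (unusually) clobbers ECX
      %r = fptoui double %d to i64
      ret i64 %r
    }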
Jakob Stoklund Olesen
b498ebe5b7 Use the same CALL instructions for Windows as for everything else.
The different calling conventions and call-preserved registers are
represented with regmask operands that are added dynamically.

llvm-svn: 150708
2012-02-16 17:56:02 +00:00
Eli Friedman
a343d87eac Make sure the non-SSE lowering for fences correctly clobbers EFLAGS. PR11768.
llvm-svn: 148240
2012-01-16 16:42:21 +00:00
Eli Friedman
a2b480b010 Get rid of unused codegen-only instruction.
llvm-svn: 148239
2012-01-16 16:29:35 +00:00
Benjamin Kramer
ae4ad5f924 X86: Generalize the x << (y & const) optimization to also catch masks with more bits set than 31 or 63.
llvm-svn: 148024
2012-01-12 12:41:34 +00:00
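An example of the wider masks now caught: any mask whose low bits
cover the shift width works, not just 31 or 63 (sketch):

    define i32 @shl_masked(i32 %x, i32 %y) {
      ; (y & 255) and y agree on the 5 bits the HW reads,
      ; so the 'and' can be dropped when selecting the shift
      %a = and i32 %y, 255
      %r = shl i32 %x, %a
      ret i32 %r
    }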
Chandler Carruth
9ef50ef1f7 Switch the lowering of CTLZ_ZERO_UNDEF from a .td pattern back to the
X86ISelLowering C++ code. Because this is lowered via an xor wrapped
around a bsr, we want the dagcombine which runs after isel lowering to
have a chance to clean things up. In particular, it is very common to
see code which looks like:

  (sizeof(x)*8 - 1) ^ __builtin_clz(x)

Which is trying to compute the most significant bit of 'x'. That's
actually the value computed directly by the 'bsr' instruction, but if we
match it too late, we'll get completely redundant xor instructions.

The more naive code for the above (subtracting rather than using an xor)
still isn't handled correctly due to the dagcombine getting confused.

Also, while here fix an issue spotted by inspection: we should have been
expanding the zero-undef variants to the normal variants when there is
an 'lzcnt' instruction. Do so, and test for this. We don't want to
generate unnecessary 'bsr' instructions.

These two changes fix some regressions in encoding and decoding
benchmarks. However, there is still a *lot* to improve on in this
type of code.

llvm-svn: 147244
2011-12-24 10:55:54 +00:00
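The idiom in question, as IR (sketch):

    declare i32 @llvm.ctlz.i32(i32, i1)

    define i32 @msb_index(i32 %x) {
      ; (31 ^ clz(x)) is exactly what 'bsr' computes; matching too
      ; early would leave redundant xor instructions behind
      %clz = call i32 @llvm.ctlz.i32(i32 %x, i1 true) ; zero-undef variant
      %r = xor i32 %clz, 31
      ret i32 %r
    }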
Chandler Carruth
7564e8371a Begin teaching the X86 target how to efficiently codegen patterns that
use the zero-undefined variants of CTTZ and CTLZ. These are just simple
patterns for now, there is more to be done to make real world code using
these constructs be optimized and codegen'ed properly on X86.

The existing tests are spiffed up to check that we no longer generate
unnecessary cmov instructions, and that we generate the very important
'xor' to transform bsr which counts the index of the most significant
one bit to the number of leading (most significant) zero bits. Also they
now check that when the variant with defined zero result is used, the
cmov is still produced.

llvm-svn: 146974
2011-12-20 11:19:37 +00:00
Rafael Espindola
1958dc7193 Fixes an issue reported by -verify-machineinstrs.
Patch by Sanjoy Das.

llvm-svn: 143064
2011-10-26 21:16:41 +00:00
Rafael Espindola
90896edc6c This commit introduces two fake instructions MORESTACK_RET and
MORESTACK_RET_RESTORE_R10, which are lowered to a RET and a RET
followed by a MOV, respectively.  Having a fake instruction prevents
the verifier from seeing a MachineBasicBlock end with a
non-terminator (MOV).  It also prevents the rather eccentric case of a
MachineBasicBlock ending with RET but having successors nevertheless.

Patch by Sanjoy Das.

llvm-svn: 143062
2011-10-26 21:12:27 +00:00
Eli Friedman
34ffc961d7 Fix the assembler strings for a couple of atomic instructions. Doesn't really matter much in practice, but it's a bit cleaner.
llvm-svn: 139563
2011-09-13 00:27:04 +00:00
Eli Friedman
9ea5599729 Fix atomic load and store on x86 to pass -verify-machineinstrs (and possibly fix some subtle bugs involving passes which check mayStore()).
This isn't exactly ideal, but it is good enough for the moment.

llvm-svn: 139245
2011-09-07 18:48:32 +00:00
Jakob Stoklund Olesen
ef8527b836 Pseudo CMOV instructions don't clobber EFLAGS.
The explanation about a 0 argument being materialized as xor is no
longer valid.  Rematerialization will check if EFLAGS is live before
clobbering it.

The code produced by X86TargetLowering::EmitLoweredSelect does not
clobber EFLAGS.

This causes one less testb instruction to be generated in the cmov.ll
test case.

llvm-svn: 139057
2011-09-02 23:52:55 +00:00
Rafael Espindola
7721c15106 Adds a SelectionDAG node X86SegAlloca which will be custom lowered
from DYNAMIC_STACKALLOC.

Two new pseudo instructions (SEG_ALLOCA_32 and SEG_ALLOCA_64) which
will match X86SegAlloca (based on word size) are also added.  They
will be custom emitted to inject the actual stack handling code.

Patch by Sanjoy Das.

llvm-svn: 138814
2011-08-30 19:43:21 +00:00
Eli Friedman
9f95c7d381 Add support for generating CMPXCHG16B on x86-64 for the cmpxchg IR instruction.
llvm-svn: 138660
2011-08-26 21:21:21 +00:00
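A sketch of IR that can now select CMPXCHG16B (shown in modern
cmpxchg syntax; the 2011-era form took a single ordering and returned
the value directly):

    define i128 @cas(i128* %p, i128 %old, i128 %new) {
      %pair = cmpxchg i128* %p, i128 %old, i128 %new seq_cst seq_cst
      %val  = extractvalue { i128, i1 } %pair, 0
      ret i128 %val
    }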
Eli Friedman
6f95a6ae1b Basic x86 code generation for atomic load and store instructions.
llvm-svn: 138478
2011-08-24 20:50:09 +00:00
Bruno Cardoso Lopes
9a695724bd Add 256-bit support for v8i32, v4i64 and v4f64 ISD::SELECT. Fix PR10556
llvm-svn: 137179
2011-08-09 23:27:13 +00:00
Eli Friedman
44fd5b2b59 Fix a couple of ridiculous copy-paste errors. rdar://9914773.
llvm-svn: 137160
2011-08-09 22:17:39 +00:00
Eli Friedman
1a80401da2 X86ISD::MEMBARRIER does not require SSE2; it doesn't actually generate any code, and all x86 processors will honor the required semantics.
llvm-svn: 136249
2011-07-27 19:43:50 +00:00
Dan Gohman
4762d28ff9 Add a comment describing why transforming (shl x, 1) to (add x, x) is to be
considered safe enough in this context.

llvm-svn: 133159
2011-06-16 15:55:48 +00:00
Benjamin Kramer
85e86083d5 X86: smulo -> add is now done target-independently in DAGCombiner, remove the patterns.
llvm-svn: 131801
2011-05-21 18:32:01 +00:00
Stuart Hastings
e3158f93ec Re-commit 131641 with fixes; de-pseudoize MOVSX16rr8 and friends.
rdar://problem/8614450

llvm-svn: 131746
2011-05-20 19:04:40 +00:00
Stuart Hastings
ff15dfa12e Reverting 131641 to investigate 'bot complaint.
llvm-svn: 131654
2011-05-19 17:54:42 +00:00
Stuart Hastings
7baa1babdb Revise MOVSX16rr8/MOVZX16rr8 (and rm variants) to no longer be
pseudos.  rdar://problem/8614450

llvm-svn: 131641
2011-05-19 16:59:50 +00:00
Eric Christopher
c03ef7ebb3 Support XOR and AND optimization with no return value.
Finishes off rdar://8470697

llvm-svn: 131458
2011-05-17 08:10:18 +00:00
Eric Christopher
3c17ef53c3 Optimize atomic lock or that doesn't use the result value.
Next up: xor and and.

Part of rdar://8470697

llvm-svn: 131171
2011-05-10 23:57:45 +00:00
Eric Christopher
aa7c86ec19 Refactor lock versions of binary operators to be a little less
cut and paste.

llvm-svn: 131139
2011-05-10 18:36:16 +00:00
Benjamin Kramer
ba7c9948e8 X86: Add a bunch of peeps for add and sub of SETB.
"b + ((a < b) ? 1 : 0)" compiles into
	cmpl	%esi, %edi
	adcl	$0, %esi
instead of
	cmpl	%esi, %edi
	sbbl	%eax, %eax
	andl	$1, %eax
	addl	%esi, %eax

This saves a register, avoids a false dependency on %eax
(Intel's CPUs still don't ignore it), and is shorter.

llvm-svn: 131070
2011-05-08 18:36:07 +00:00
Dan Gohman
71117af2db The labyrinthine X86 backend no longer appears to require
these patterns.

llvm-svn: 125759
2011-02-17 18:50:19 +00:00
NAKAMURA Takumi
8ace7260cc Target/X86: Tweak win64's tailcall.
llvm-svn: 124272
2011-01-26 02:04:09 +00:00
NAKAMURA Takumi
066378440a Fix whitespace.
llvm-svn: 124270
2011-01-26 02:03:37 +00:00
Eric Christopher
e8aa8b114f The stub routine that we're calling uses test and so clobbers
the flags.

llvm-svn: 123712
2011-01-18 01:37:20 +00:00
Chris Lattner
2d4e17d195 We lower setb to sbb with the hope that the and will go away; when it
doesn't, match it back to setb.

On a 64-bit version of the testcase before we'd get:

	movq	%rdi, %rax
	addq	%rsi, %rax
	sbbb	%dl, %dl
	andb	$1, %dl
	ret

now we get:

	movq	%rdi, %rax
	addq	%rsi, %rax
	setb	%dl
	ret

llvm-svn: 122217
2010-12-20 01:16:03 +00:00
Chris Lattner
297259f6f1 improve the setcc -> setcc_carry optimization to happen more
consistently by moving it out of lowering into dag combine.

Add some missing patterns for matching away extended versions of setcc_c.

llvm-svn: 122201
2010-12-19 22:08:31 +00:00
Evan Cheng
72dca1ee17 Only rr forms of ADD*_DB are commutable.
llvm-svn: 121908
2010-12-15 22:57:36 +00:00
Eric Christopher
cc8a622ca4 Add rsp to the uses for the same reason as 32-bit.
llvm-svn: 121328
2010-12-09 00:26:41 +00:00
Rafael Espindola
9287c4b38f Move lowering of TLS_addr32 and TLS_addr64 to X86MCInstLower.
llvm-svn: 120263
2010-11-28 21:16:39 +00:00
Rafael Espindola
45cd9713f2 Lower TLS_addr32 and TLS_addr64.
llvm-svn: 120225
2010-11-27 20:43:02 +00:00
Chris Lattner
9da275f86b reject instructions that contain a \n in their asmstring. Mark
various X86 and ARM instructions that are bitten by this as isCodeGenOnly,
as they are.

llvm-svn: 117884
2010-11-01 00:46:16 +00:00
Chris Lattner
5d088218e5 two changes: make the asmmatcher generator ignore ARM pseudos properly,
and make it a hard error for instructions to not have an asm string.
These instructions should be marked isCodeGenOnly.

llvm-svn: 117861
2010-10-31 19:15:18 +00:00
Michael J. Spencer
5a68d7ce94 X86: Add alloca probing to dynamic alloca on Windows. Fixes PR8424.
llvm-svn: 116984
2010-10-21 01:41:01 +00:00
Michael J. Spencer
54b462089f Fix Whitespace.
llvm-svn: 116972
2010-10-20 23:40:27 +00:00