Commit Graph

10331 Commits

Author SHA1 Message Date
Tim Northover
f7aeb57c87 X86: delegate expanding atomic libcalls to generic code.
On targets without cmpxchg16b or cmpxchg8b, the borderline atomic
operations were slipping through the gaps.

X86AtomicExpand.cpp was delegating to ISelLowering. Generic
ISelLowering was delegating to X86ISelLowering and X86ISelLowering was
asserting. The correct behaviour is to expand to a libcall, preferably
in generic ISelLowering.

This can be achieved by X86ISelLowering deciding it doesn't want the
faff after all.

llvm-svn: 212134
2014-07-01 21:44:59 +00:00
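
As a hedged illustration of the borderline case described above (a hypothetical C sketch, not code from the commit): a 16-byte atomic on x86-64 compiled without cmpxchg16b support, e.g. without -mcx16, cannot be inlined and is expected to lower to a libcall such as __atomic_load_16.

  /* Hypothetical sketch: a 16-byte atomic load that has no inline
     lowering when cmpxchg16b is unavailable. */
  #include <stdatomic.h>

  _Atomic __int128 wide_counter;

  __int128 load_wide(void) {
    /* without -mcx16 this should become a call to __atomic_load_16
       rather than an assertion failure in the backend */
    return atomic_load(&wide_counter);
  }
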
Tim Northover
60e9ada729 X86: expand atomics in IR instead of as MachineInstrs.
The logic for expanding atomics that aren't natively supported in
terms of cmpxchg loops is much simpler to express at the IR level. It
also allows the normal optimisations and CodeGen improvements to help
out with atomics, instead of using a limited set of possible
instructions.

rdar://problem/13496295

llvm-svn: 212119
2014-07-01 18:53:31 +00:00
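
For reference, a C sketch of the kind of compare-exchange loop such an expansion produces for an operation with no native instruction (names are illustrative, not from the patch):

  /* Sketch: what an unsupported atomicrmw (here, nand) expands to. */
  #include <stdatomic.h>

  long atomic_nand(_Atomic long *p, long v) {
    long old = atomic_load_explicit(p, memory_order_relaxed);
    long desired;
    do {
      desired = ~(old & v);  /* the operation being made atomic */
      /* retry until no other thread changed *p between load and store */
    } while (!atomic_compare_exchange_weak(p, &old, desired));
    return old;
  }
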
Adam Nemet
da774db14e [X86] AVX512: Allow writemasks with vpcmp
For now I only updated the _alt variants.  The main variants are used by
codegen and that will need a bit more work to trigger.

<rdar://problem/17492620>

llvm-svn: 212114
2014-07-01 18:03:45 +00:00
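
For context, a small intrinsics-level sketch of a writemasked vpcmp (assumes AVX-512F and <immintrin.h>; illustrative only, since the patch itself only touches the assembler-facing _alt variants):

  /* Sketch: compare under a writemask; inactive lanes yield 0 bits. */
  #include <immintrin.h>

  __mmask16 cmp_lt_masked(__m512i a, __m512i b, __mmask16 k) {
    /* maps onto vpcmpd with a {%k} writemask when AVX-512F is enabled */
    return _mm512_mask_cmp_epi32_mask(k, a, b, _MM_CMPINT_LT);
  }
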
Adam Nemet
3b6318203d [X86] AVX512: Factor generating the AsmString into avx512_icmp_cc
Adding a writemask variant would require a third asm string to be passed to
the template.  Generate the AsmString in the template instead.

No change in X86.td.expanded.

llvm-svn: 212113
2014-07-01 18:03:43 +00:00
Reid Kleckner
f824c863c7 Fix .seh_stackalloc 0
seh_stackalloc 0 is not representable in Win64 SEH info, so emitting it
is a bug.

Reviewers: rnk

Differential Revision: http://reviews.llvm.org/D4334

Patch by Vadim Chugunov!

llvm-svn: 212081
2014-07-01 00:42:47 +00:00
Andrea Di Biagio
2bc066b1b4 [X86] Add support for builtin to read performance monitoring counters.
This patch adds support for a new builtin called
__builtin_ia32_rdpmc.

Builtin '__builtin_ia32_rdpmc' is defined as a 'GCC builtin'; on X86, it can
be used to read performance monitoring counters. It takes as input the index
of the performance counter to read, and returns the value of the specified
performance counter as a 64-bit number.

Calls to this new builtin will map to instruction RDPMC.
The index passed to the builtin call is moved into register %ECX. The result
of the builtin call is the value of the specified performance counter (RDPMC
would return that quantity in registers RDX:RAX).

This patch:
 - Adds builtin int_x86_rdpmc as a GCCBuiltin;
 - Adds a new x86 DAG node called 'RDPMC_DAG';
 - Teaches the backend how to lower this new builtin;
 - Adds an ISel pattern to select instruction RDPMC;
 - Fixes the definition of instruction RDPMC, adding %RAX and %RDX as
   implicit definitions and %ECX as an implicit use;
 - Adds an LLVM test to verify that the new builtin is correctly selected.

llvm-svn: 212049
2014-06-30 17:14:21 +00:00
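
A minimal usage sketch of the builtin (hypothetical example code, assuming a kernel that permits user-mode RDPMC):

  /* Sketch: read performance-monitoring counter 0. */
  unsigned long long read_pmc0(void) {
    /* the index lands in %ecx; the 64-bit result comes back in %edx:%eax */
    return __builtin_ia32_rdpmc(0);
  }
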
Saleem Abdulrasool
73ee195392 X86: fix comment
Fix a comment typo: `DbgLocLImport` where `DLLImport` was meant.

llvm-svn: 212012
2014-06-30 03:11:18 +00:00
Saleem Abdulrasool
499f3c1213 CodeGen: rename Win64 ExceptionHandling to WinEH
This exception format is not specific to Windows x64.  A similar approach is
taken on nearly all architectures.  Generalise the name to reflect reality.
This will eventually be used for Windows on ARM data emission as well.

Switch the enum and namespace into an enum class.

llvm-svn: 212000
2014-06-29 21:43:47 +00:00
Saleem Abdulrasool
d45105fbb4 MC: rename EmitWin64EH routines
Rename the routines to reflect the reality that they are more related to call
frame information than to Win64 EH. Although EH is implemented in an intertwined
manner by augmenting with an exception handler and an associated parameter, the
majority of these routines emit information required to unwind the frames. This
also helps identify that these routines are generic for most Windows platforms
(they apply equally to nearly all architectures except x86) although the
encoding of the information is architecture dependent.

Unwinding data is emitted via EmitWinCFI* and exception handling information via
EmitWinEH*.

llvm-svn: 211994
2014-06-29 01:52:01 +00:00
Chandler Carruth
747f8eca83 [x86] Fix a bug in the v8i16 shuffling exposed by the new splat-like
lowering for v16i8.

ASan and some bots caught this bug with existing test cases. Fixing it
even fixed a miscompile with one of the test cases. I'm still a bit
suspicious of this test case as I've not taken a proper amount of time
to think about it, but the fix here is strict goodness.

llvm-svn: 211976
2014-06-28 05:46:28 +00:00
Chandler Carruth
d3455a84b3 [x86] Add handling for splat-like widenings of v16i8 shuffles.
These show up really frequently, not the least with actual splats. =] We
lowered these quite badly before. The new code path tries to widen i8
shuffles to i16 shuffles in a splat-like way. There are still some
inefficiencies in our i16 splat logic though, so we aren't really done
here.

Also, for certain patterns (bit of a gather-and-splat) we still
generate pretty silly code, and I've left a fixme for addressing it.
However, I'm not actually worried about this code pattern as much. The
old shuffle lowering generates a 29 instruction monstrosity for it that
should execute much more slowly.

llvm-svn: 211974
2014-06-28 05:16:40 +00:00
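
For illustration, a hypothetical sketch (clang vector extensions) of the shuffle shape involved; a v16i8 splat is exactly what the new widening path targets:

  /* Sketch: splat byte 5 across all 16 lanes; the new path widens
     shuffles like this to the i16 level instead of scalarizing. */
  typedef char v16i8 __attribute__((vector_size(16)));

  v16i8 splat_byte5(v16i8 a) {
    return __builtin_shufflevector(a, a,
        5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5);
  }
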
Chandler Carruth
c99cbd9cc7 [x86] Fix another bug hit when bootstrapping with the new shuffle
lowering.

For maximum irony, I had already discovered this bug, diagnosed it, and
left FIXMEs about it in the test cases. =[ I just failed to go back over
those until after I had reduced a bootstrap miscompile down to a single
TU, stared at the assembly for an hour, and figured out the bug. Again.

Oh well.

llvm-svn: 211955
2014-06-27 20:07:40 +00:00
Chandler Carruth
ec4fb8a59b [x86] Fix a miscompile in the new shuffle lowering uncovered by
a bootstrap.

I managed to mis-remember how PACKUS worked on x86, and was using undef
for the high bytes instead of zero. The fix is fairly obvious.

llvm-svn: 211922
2014-06-27 18:25:23 +00:00
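
For context, a short intrinsics sketch of PACKUS (not from the commit): both inputs fully participate in the result, which is why feeding undef instead of zero for the unused half is a miscompile.

  /* Sketch: PACKUS saturates sixteen i16 lanes down to sixteen u8 lanes. */
  #include <emmintrin.h>

  __m128i pack_low8(__m128i lo) {
    /* the high 8 result bytes come from the second operand, so it must
       be a defined value (zero here), not undef */
    return _mm_packus_epi16(lo, _mm_setzero_si128());
  }
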
Juergen Ributzka
f7b405f472 [FastISel][X86] Fix typos.
llvm-svn: 211911
2014-06-27 17:16:34 +00:00
Alexander Kornienko
3be4bd19cf Clean up unused variable warning in release build.
llvm-svn: 211902
2014-06-27 15:30:55 +00:00
Chandler Carruth
608f397a25 [x86] Clean up some unused variables, especially in release builds.
llvm-svn: 211894
2014-06-27 12:04:18 +00:00
Chandler Carruth
e7aaf337d0 [x86] Teach the target combine step to aggressively fold pshufd instructions.
Summary:
This allows it to fold pshufd instructions across intervening
half-shuffles and other noise. This pattern actually shows up in the
generic lowering tests, but I've also added direct tests using
intrinsics to make sure that the specific desired functionality is
working even if the lowering stuff changes in the future.

Differential Revision: http://reviews.llvm.org/D4292

llvm-svn: 211892
2014-06-27 11:40:13 +00:00
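
A hypothetical intrinsics sketch of the kind of chain this combine looks through (whether a given chain actually folds depends on the masks):

  /* Sketch: a pshufd with an intervening half-shuffle in between. */
  #include <emmintrin.h>

  __m128i shuffle_chain(__m128i x) {
    __m128i t = _mm_shufflelo_epi16(x, 0x1B); /* pshuflw: low-half shuffle */
    return _mm_shuffle_epi32(t, 0xB1);        /* pshufd the combine may fold */
  }
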
Chandler Carruth
127f1b532c [x86] Teach the target-specific combining how to aggressively fold
half-shuffles, even looking through intervening instructions in a chain.

Summary:
This doesn't happen to show up with any test cases I've found for the current
shuffle lowering, but previous attempts would benefit from this and it seems
generally useful. I've tested it directly using intrinsics, which also shows
that it will work with hand vectorized code as well.

Note that even though pshufd isn't directly used in these tests, it gets
exercised because we combine some of the half shuffles into a pshufd
first, and then merge them.

Differential Revision: http://reviews.llvm.org/D4291

llvm-svn: 211890
2014-06-27 11:34:40 +00:00
Chandler Carruth
c62b16ce89 [x86] Teach the X86 backend to DAG-combine SSE2 shuffles that are
trivially redundant.

This fixes several cases in the new vector shuffle lowering algorithm
which would generate redundant shuffle instructions for the sake of
simplicity.

I'm also deleting a testcase which was somewhat ridiculous. It was
checking for a bug in 2007 about incorrectly transforming shuffles by
looking for the string "-86" in the output of a pretty substantial
function. This test case doesn't seem to have any value at this point.

Differential Revision: http://reviews.llvm.org/D4240

llvm-svn: 211889
2014-06-27 11:27:52 +00:00
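
An illustrative sketch (hypothetical, clang vector extensions) of a trivially redundant pair, where the second shuffle undoes the first:

  /* Sketch: back-to-back shuffles whose net effect is the identity. */
  typedef float v4f32 __attribute__((vector_size(16)));

  v4f32 round_trip(v4f32 a) {
    v4f32 t = __builtin_shufflevector(a, a, 1, 0, 3, 2);
    return __builtin_shufflevector(t, t, 1, 0, 3, 2); /* undoes the first */
  }
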
Chandler Carruth
e28f70fce3 [x86] Begin a significant overhaul of how vector lowering is done in the
x86 backend.

This sketches out a new code path for vector lowering, hidden behind an
off-by-default flag while it is under development. The fundamental idea
behind the new code path is to aggressively break down the problem space
in ways that ease selecting the odd set of instructions available on
x86, and carefully avoid scalarizing code even when forced to use older
ISAs. Notably, this starts off restricting itself to SSE2 and implements
the complete vector shuffle and blend space for 128-bit vectors in SSE2
without scalarizing. The plan is to layer on top of this ISA extensions
where we can bail out of the complex SSE2 lowering and opt for
a cheaper, specialized instruction (or set of instructions). It also
needs to be generalized to AVX and AVX512 vector widths.

Currently, this does a decent but not perfect job for SSE2. There are
some specific shortcomings that I plan to address:
- We need a peephole combine to fold together shuffles where possible.
  There are cases where a previous shuffle could be modified slightly to
  arrange for elements to be in the correct position and a later shuffle
  eliminated. Doing this eagerly added quite a bit of complexity, and
  so my plan is to combine away these redundancies afterward.
- There are a lot more clever ways to use unpck and pack that need to be
  added. This is essential for real world shuffles as it turns out...

Once SSE2 is polished a bit I should be able to get interesting numbers
on performance improvements on benchmarks conducive to vectorization.
All of this will be off by default until it is functionally equivalent,
of course.

Differential Revision: http://reviews.llvm.org/D4225

llvm-svn: 211888
2014-06-27 11:23:44 +00:00
Craig Topper
9586f4812f Rename getX86ConditonCode -> getX86ConditionCode
llvm-svn: 211869
2014-06-27 05:18:21 +00:00
Adam Nemet
e14e099b2e [X86] AVX512: Add vbroadcasti*
For now I used a separate template for these sub-vector/tuple broadcasts
rather than sharing the mem variants with avx512_int_broadcast_rm.

<rdar://problem/17402869>

llvm-svn: 211828
2014-06-27 00:43:38 +00:00
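
A small usage sketch (assumes AVX-512F; illustrative only) of the sub-vector broadcast being added:

  /* Sketch: vbroadcasti32x4 repeats one 128-bit tuple across a zmm. */
  #include <immintrin.h>

  __m512i broadcast_quad(__m128i quad) {
    /* four copies of quad's 4 x i32 lanes */
    return _mm512_broadcast_i32x4(quad);
  }
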
Alp Toker
97022b0c1f Revert "Introduce a string_ostream string builder facility"
Temporarily back out commits r211749, r211752 and r211754.

llvm-svn: 211814
2014-06-26 22:52:05 +00:00
Eric Christopher
04713c650c Remove extraneous includes from the target machines.
llvm-svn: 211800
2014-06-26 19:30:05 +00:00
Andrea Di Biagio
d30e61d81e Silence a warning due to a comparison between signed and unsigned.
No functional change intended.

llvm-svn: 211782
2014-06-26 13:41:10 +00:00
Andrea Di Biagio
3be8532e69 [X86] Improve the selection of SSE3/AVX addsub instructions.
This patch teaches the backend how to canonicalize vector shuffles
according to the rule:

 - (shuffle (FADD A, B), (FSUB A, B), Mask) ->
       (shuffle (FSUB A, -B), (FADD A, -B), Mask)

Where 'Mask' is:
  <0,5,2,7>            ;; for v4f32 and v4f64 shuffles.
  <0,3>                ;; for v2f64 shuffles.
  <0,9,2,11,4,13,6,15> ;; for v8f32 shuffles.

In general, ISel only knows how to pattern-match a canonical
'fadd + fsub + blendi' dag node sequence into an ADDSUB instruction.

This new rule allows converting a non-canonical dag sequence into a
canonical one that will be matched by a single ADDSUB at ISel stage.
(Sanity check for lane 0: A0+B0 on the left becomes A0-(-B0) on the right;
for lane 5, A1-B1 becomes A1+(-B1); the two forms agree lane by lane.)

The idea of converting a non-canonical ADDSUB into a canonical one by
swapping the first two operands of the shuffle, and then negating the
second operand of the FADD and FSUB, was originally proposed by Hal Finkel.

llvm-svn: 211771
2014-06-26 10:45:21 +00:00
Adam Nemet
aa04918ad4 [X86] AVX512: Fix asm syntax for packed vcmp
The *_alt defs for vcmp are used by the InstParser (the asm string in the main
def is used by the InstPrinter). The former was accepting vector registers
as the destination rather than mask registers.

llvm-svn: 211750
2014-06-26 00:21:12 +00:00
Alp Toker
fd9ead3b6f Introduce a string_ostream string builder facility
string_ostream is a safe and efficient string builder that combines opaque
stack storage with a built-in ostream interface.

small_string_ostream<bytes> additionally permits an explicit stack storage size
other than the default 128 bytes to be provided. Beyond that, storage is
transferred to the heap.

This convenient class can be used in most places an
std::string+raw_string_ostream pair or SmallString<>+raw_svector_ostream pair
would previously have been used, in order to guarantee consistent access
without byte truncation.

The patch also converts much of LLVM to use the new facility. These changes
include several probable bug fixes for truncated output, a programming error
that's no longer possible with the new interface.

llvm-svn: 211749
2014-06-26 00:00:48 +00:00
Juergen Ributzka
97839af0ec [FastISel][X86] More refactoring of select lowering and XALU folding. NFC.
llvm-svn: 211740
2014-06-25 22:50:59 +00:00
Juergen Ributzka
70d663743e [FastISel][X86] Refactor XALU folding. NFC.
llvm-svn: 211735
2014-06-25 22:17:23 +00:00
Juergen Ributzka
87fbbdf13d [FastISel][X86] Only fold the cmp into the select when both instructions are in the same basic block.
If the cmp is in a different basic block, then it is possible that not all
operands of that compare have defined registers. This can happen when one of
the operands to the cmp is a load and the load gets folded into the cmp. In
this case FastISel will skip the load instruction and the vreg is never
defined.

llvm-svn: 211730
2014-06-25 20:06:12 +00:00
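
A hypothetical C shape for the failure mode described above (illustrative; the exact trigger depends on what FastISel folds):

  /* Sketch: the compare (whose load may be folded into it) lives in a
     different basic block than the select that wants to reuse it. */
  int pick(int *p, int a, int b, int gate) {
    int c = *p < a;       /* cmp with a foldable load, in one block */
    if (gate)
      return c ? a : b;   /* select in another block must not fold the cmp */
    return b;
  }
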
Andrea Di Biagio
de7985179f [X86] Always prefer to lower a VECTOR_SHUFFLE into a BLENDI instead of SHUFP (or VPERM2X128).
This patch teaches method 'LowerVECTOR_SHUFFLE' to give higher precedence to
the check for 'isBlendMask'; the idea is that, when possible, we should first
check whether a shuffle performs a blend and, if so, try to lower it into a
BLENDI instead of selecting a SHUFP or (worse) a VPERM2X128.

In general:
 - AVX VBLENDPS/D always have better latency and throughput than VPERM2F128;
 - BLENDPS/D instructions tend to have better 'reciprocal throughput'
   than the equivalent SHUFPS/D;
 - Both BLENDPS/D and SHUFPS/D are often decoded into the same number of
   m-ops; however, an m-op obtained from a BLENDPS/D can be scheduled to more
   than one execution port.

This patch:
 - Moves the check for 'isBlendMask' immediately before the check for
   'isSHUFPMask' within method 'LowerVECTOR_SHUFFLE';
 - Updates existing tests for sse/avx shuffle/blend instructions to verify
   that we select (v)blendps/d when possible (instead of (v)shufps/d or
   vperm2f128).

llvm-svn: 211720
2014-06-25 17:41:58 +00:00
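
For illustration, a sketch (clang vector extensions, hypothetical) of a shuffle that is also a blend, since every lane keeps its position:

  /* Sketch: mask <0,5,2,7> takes even lanes from a and odd lanes from b,
     so it should now select (v)blendps rather than shufps. */
  typedef float v4f32 __attribute__((vector_size(16)));

  v4f32 blend_even_odd(v4f32 a, v4f32 b) {
    return __builtin_shufflevector(a, b, 0, 5, 2, 7);
  }
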
Juergen Ributzka
15c30a5d46 Fix indentation.
llvm-svn: 211717
2014-06-25 16:49:37 +00:00
Chandler Carruth
2262038283 [x86] Add intrinsics for the pshufd, pshuflw, and pshufhw instructions.
llvm-svn: 211694
2014-06-25 13:12:54 +00:00
NAKAMURA Takumi
c5a2c81f7e Re-apply r211399, "Generate native unwind info on Win64" with a fix to ignore SEH pseudo ops in X86 JIT emitter.
--
This patch enables LLVM to emit Win64-native unwind info rather than
DWARF CFI.  It handles all corner cases (I hope), including stack
realignment.

Because the unwind info is not flexible enough to describe stack frames
with a gap of unknown size in the middle, such as the one caused by
stack realignment, I modified register spilling code to place all spills
into the fixed frame slots, so that they can be accessed relative to the
frame pointer.

Patch by Vadim Chugunov!

Reviewed By: rnk

Differential Revision: http://reviews.llvm.org/D4081

llvm-svn: 211691
2014-06-25 12:41:52 +00:00
NAKAMURA Takumi
35a44c8eda Reformat.
llvm-svn: 211689
2014-06-25 12:40:56 +00:00
Andrea Di Biagio
7f4e676911 [X86] Add target combine rule to select ADDSUB instructions from a build_vector
This patch teaches the backend how to combine a build_vector that implements
an 'addsub' between packed float vectors into a sequence of vector add
and vector sub followed by a VSELECT.

The new VSELECT is expected to be lowered into a BLENDI.
At ISel stage, the sequence 'vector add + vector sub + BLENDI' is
pattern-matched against ISel patterns added at r211427 to select
'addsub' instructions.
Added three more ISel patterns for ADDSUB.

Added test sse3-avx-addsub-2.ll to verify that we correctly emit 'addsub'
instructions.

llvm-svn: 211679
2014-06-25 10:02:21 +00:00
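
A hypothetical C sketch of the build_vector shape the new combine recognizes (subtract in even lanes, add in odd lanes):

  /* Sketch: element-wise build alternating sub and add, which the
     combine turns into vsub + vadd + VSELECT (ultimately an addsubps). */
  typedef float v4f32 __attribute__((vector_size(16)));

  v4f32 addsub(v4f32 a, v4f32 b) {
    return (v4f32){ a[0] - b[0], a[1] + b[1], a[2] - b[2], a[3] + b[3] };
  }
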
Juergen Ributzka
236ea1c61a [FastISel][X86] Fold XALU condition into branch and compare.
Optimize the codegen of select and branch instructions to directly use the
EFLAGS from the {s|u}{add|sub|mul}.with.overflow intrinsics.

llvm-svn: 211645
2014-06-24 23:51:21 +00:00
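
A short C sketch of the pattern this optimizes (using the clang/GCC overflow builtins; illustrative):

  /* Sketch: the branch consumes EFLAGS from the add directly, without a
     separate setcc + test sequence. */
  int add_checked(int a, int b, int *out) {
    if (__builtin_sadd_overflow(a, b, out))
      return -1;  /* overflow path */
    return 0;
  }
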
Robert Khasanov
c1b0024016 Fix vpblend intrinsics being combined as shift intrinsics due to a missing return stmt between them
Fixes PR20088

Differential Revision: http://reviews.llvm.org/D4277

llvm-svn: 211617
2014-06-24 18:08:04 +00:00
Adam Nemet
d1b4b6771c [Disasm][AVX512] Implement decoding of top bit for non-destructive reg fields
The V' bit in the P2 byte of the EVEX prefix provides the top bit of the NDD and
NDS register fields.  This was simply not used in the decoder until now.

Fixes <rdar://problem/17402661>

llvm-svn: 211565
2014-06-24 01:42:32 +00:00
Juergen Ributzka
978bad2e2e [FastISel][X86] Lower unsupported selects to control-flow.
This extends the select lowering coverage by emitting pseudo cmov
instructions. These instructions will later be lowered to control flow to
simulate the select.

llvm-svn: 211545
2014-06-23 21:55:44 +00:00
Juergen Ributzka
9a7053da70 [FastISel][X86] Add support for floating-point select.
This extends the select lowering to support floating-point selects. The
lowering depends on SSE instructions and requires that the condition come from
a floating-point compare. Under these conditions it is possible to emit an
optimized instruction sequence that doesn't require any branches to
simulate the select.

llvm-svn: 211544
2014-06-23 21:55:40 +00:00
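
For context, a hand-written intrinsics sketch of the classic branchless pattern (and/andnot/or over a compare mask); the sequence the commit actually emits may differ:

  /* Sketch: branchless float select via an all-ones/all-zeros mask. */
  #include <xmmintrin.h>

  __m128 select_lt(__m128 a, __m128 b, __m128 x, __m128 y) {
    __m128 m = _mm_cmplt_ps(a, b);          /* per-lane mask from the compare */
    return _mm_or_ps(_mm_and_ps(m, x),      /* lanes where a < b take x */
                     _mm_andnot_ps(m, y));  /* remaining lanes take y   */
  }
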
Juergen Ributzka
5ad4bec279 [FastISel][X86] Optimize selects when the condition comes from a compare.
Optimize the select instructions sequence to use the EFLAGS directly from a
compare when possible.

llvm-svn: 211543
2014-06-23 21:55:36 +00:00
NAKAMURA Takumi
eca89a0522 Revert r211399, "Generate native unwind info on Win64"
It broke Legacy JIT Tests on x86_64-{mingw32|msvc}, aka Windows x64.

llvm-svn: 211480
2014-06-22 22:00:56 +00:00
Filipe Cabecinhas
f28bdb1d10 Fix PR20087 by using the source index when changing the vector load
llvm-svn: 211472
2014-06-22 17:21:37 +00:00
Andrea Di Biagio
53a4d382f7 [X86] Add ISel patterns to select SSE3/AVX ADDSUB instructions.
This patch adds ISel patterns to select SSE3/AVX ADDSUB instructions
from a sequence of "vadd + vsub + blend".

Example:

///
typedef float float4 __attribute__((ext_vector_type(4)));

float4 foo(float4 A, float4 B) {
  float4 X = A - B;
  float4 Y = A + B;
  return (float4){X[0], Y[1], X[2], Y[3]};
}
///

Before this patch (with flag -mcpu=corei7), llc produced the following
assembly sequence:
  movaps  %xmm0, %xmm2
  addps   %xmm1, %xmm2
  subps   %xmm1, %xmm0
  blendps $10, %xmm2, %xmm0

With this patch, we now get a single
  addsubps  %xmm1, %xmm0

llvm-svn: 211427
2014-06-21 01:31:15 +00:00
Rafael Espindola
28af33f6db Delete dead code.
The compact unwind info is only used by code that knows it is supported.

llvm-svn: 211412
2014-06-20 22:30:31 +00:00
Rafael Espindola
962bd7f649 Don't produce eh_frame relocations when targeting the IOS simulator.
First step for fixing pr19185.

llvm-svn: 211404
2014-06-20 21:15:27 +00:00
Reid Kleckner
40d9c6f936 Generate native unwind info on Win64
This patch enables LLVM to emit Win64-native unwind info rather than
DWARF CFI.  It handles all corner cases (I hope), including stack
realignment.

Because the unwind info is not flexible enough to describe stack frames
with a gap of unknown size in the middle, such as the one caused by
stack realignment, I modified register spilling code to place all spills
into the fixed frame slots, so that they can be accessed relative to the
frame pointer.

Patch by Vadim Chugunov!

Reviewed By: rnk

Differential Revision: http://reviews.llvm.org/D4081

llvm-svn: 211399
2014-06-20 20:35:47 +00:00
Karthik Bhat
fb8456f1af Add Support to Recognize and Vectorize non-SIMD instructions in SLPVectorizer.
This patch adds support to recognize patterns such as
fadd,fsub,fadd,fsub... / add,sub,add,sub... and vectorizes them as vector
shuffles when profitable.
These vector shuffle patterns can later be converted to instructions such as
addsubpd etc. on X86.
Thanks to Arnold and Hal for the reviews. http://reviews.llvm.org/D4015

llvm-svn: 211339
2014-06-20 04:32:48 +00:00
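
A hypothetical scalar C sketch of the alternating pattern the SLPVectorizer now recognizes:

  /* Sketch: alternating sub/add over adjacent elements; vectorizable as a
     shuffle of one vector sub and one vector add (later an addsubpd). */
  void alt4(double *restrict r, const double *a, const double *b) {
    r[0] = a[0] - b[0];
    r[1] = a[1] + b[1];
    r[2] = a[2] - b[2];
    r[3] = a[3] + b[3];
  }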