1
0
mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-10-26 06:22:56 +02:00
Commit Graph

3550 Commits

Author SHA1 Message Date
Simon Pilgrim
c23de517a6 [X86][AVX] Fold loads + splats into broadcast instructions
On AVX and AVX2, BROADCAST instructions can load a scalar into all elements of a target vector.

This patch improves the lowering of 'splat' shuffles of a loaded vector into a broadcast - currently the lowering only works for cases where we are splatting the zero'th element, which is now generalised to any element.

Fix for PR23022

Differential Revision: http://reviews.llvm.org/D15310

llvm-svn: 255061
2015-12-08 22:17:11 +00:00
Elena Demikhovsky
80454cacaf VX-512: Fixed a bug in FP logic operation lowering
FP logic instructions are supported in DQ extension on AVX-512 target.
I use integer operations instead.
Added tests.
I also enabled FABS in this patch in order to check ANDPS.
The operations are FOR, FXOR, FAND, FANDN.
The instructions, that supported for 512-bit vector under DQ are:
VORPS/PD, VXORPS/PD, VANDPS/PD, FANDNPS/PD.

Differential Revision: http://reviews.llvm.org/D15110

llvm-svn: 254913
2015-12-07 14:33:34 +00:00
Elena Demikhovsky
4e301fd23f AVX-512: Fixed masked load / store instruction selection for KNL.
Patterns were missing for KNL target for <8 x i32>, <8 x float> masked load/store.

This intrinsic comes with all legal types:
<8 x float> @llvm.masked.load.v8f32(<8 x float>* %addr, i32 align, <8 x i1> %mask, <8 x float> %passThru),
but still requires lowering, because VMASKMOVPS, VMASKMOVDQU32 work with 512-bit vectors only.

All data operands should be widened to 512-bit vector.
The mask operand should be widened to v16i1 with zeroes.

Differential Revision: http://reviews.llvm.org/D15265

llvm-svn: 254909
2015-12-07 13:39:24 +00:00
Igor Breger
2e5da39635 AVX-512: implement kunpck intrinsics.
Differential Revision: http://reviews.llvm.org/D14821

llvm-svn: 254908
2015-12-07 13:25:18 +00:00
Igor Breger
83ce603ff7 AVX512: support AVX512BW Intrinsic in 32bit mode.
Differential Revision: http://reviews.llvm.org/D15076

llvm-svn: 254873
2015-12-06 11:35:18 +00:00
Hans Wennborg
b8bd41122a X86: Don't emit SAHF/LAHF for 64-bit targets unless explicitly supported
These instructions are not supported by all CPUs in 64-bit mode. Emitting them
causes Chromium to crash on start-up for users with such chips.

(GCC puts these instructions behind -msahf on 64-bit for the same reason.)

This patch adds FeatureLAHFSAHF, enables it by default for 32-bit targets
and modern CPUs, and changes X86InstrInfo::copyPhysReg back to the lowering
from before r244503 when the instructions are not available.

Differential Revision: http://reviews.llvm.org/D15240

llvm-svn: 254793
2015-12-04 23:00:33 +00:00
Reid Kleckner
5ceb906da9 [X86] Put no-op ADJCALLSTACK markers around all dynamic lowerings
Summary:
These ADJCALLSTACK markers don't generate code, but they keep dynamic
alloca code that calls chkstk out of the prologue.

This slightly pessimizes inalloca calls by preventing some register copy
coalescing, but I can live with that.

Reviewers: qcolombet

Subscribers: hans, llvm-commits

Differential Revision: http://reviews.llvm.org/D15200

llvm-svn: 254645
2015-12-03 20:46:59 +00:00
David Majnemer
56dee65385 Move EH-specific helper functions to a more appropriate place
No functionality change is intended.

llvm-svn: 254562
2015-12-02 23:06:39 +00:00
Simon Pilgrim
2bce37605d [X86][FMA] Optimize FNEG(FMUL) Patterns
On FMA targets, we can avoid having to load a constant to negate a float/double multiply by instead using a FNMSUB (-(X*Y)-0)

Fix for PR24366

Differential Revision: http://reviews.llvm.org/D14909

llvm-svn: 254495
2015-12-02 09:07:55 +00:00
Asaf Badouh
d6d08d5567 [X86][AVX512] add comi with Sae
add builtin_ia32_vcomisd and builtin_ia32_vcomisd

Differential Revision: http://reviews.llvm.org/D14331

llvm-svn: 254493
2015-12-02 08:17:51 +00:00
Craig Topper
1e6d9c9b53 [X86] Change getZeroVector to take an MVT instead of EVT. One minor change needed to only try to perform 256-it shuffle combines on legal vector types.
llvm-svn: 254490
2015-12-02 06:39:19 +00:00
Craig Topper
c5149a5512 [X86] Fix weird identation. NFC
llvm-svn: 254487
2015-12-02 05:24:38 +00:00
Sanjay Patel
01aa788c76 [x86] add a convenience method to check for FMA capability; NFCI
llvm-svn: 254425
2015-12-01 17:27:55 +00:00
Sanjay Patel
4a07635eab fix formatting; NFC
llvm-svn: 254310
2015-11-30 17:52:02 +00:00
Aaron Ballman
b3f88e2a89 Silencing a 32-bit to 64-bit implicit conversion warning; NFC.
llvm-svn: 254302
2015-11-30 14:52:33 +00:00
Craig Topper
32bf88a844 [AVX512] The vpermi2 instructions require an integer vector for the index vector. This is reflected correctly in the intrinsics, but was not refelected in the isel patterns.
For the floating point types, this requires adding a bitcast to the index vector when its passed through to the output.

llvm-svn: 254277
2015-11-30 00:13:24 +00:00
Craig Topper
233dd30406 [X86] int_x86_avx2_permps and X86ISD::VPERMV should take an integer vector for its shuffle indices.
llvm-svn: 254269
2015-11-29 22:53:22 +00:00
Simon Pilgrim
a5289493dc [X86][SSE] Added support for lowering to ADDSUBPS/ADDSUBPD with commuted inputs
We could already recognise shuffle(FSUB, FADD) -> ADDSUB, this allow us to recognise shuffle(FADD, FSUB) -> ADDSUB by commuting the shuffle mask prior to matching.

llvm-svn: 254259
2015-11-29 16:41:04 +00:00
Craig Topper
852ecf7fb6 [X86] Split ISD node for Vfpclass and Vfpclasss so that we can write strong type constraints for each that don't cause ambiguous isel.
llvm-svn: 254172
2015-11-26 19:41:34 +00:00
Artyom Skrobov
3803dae0a6 Expose isXxxConstant() functions from SelectionDAGNodes.h (NFC)
Summary:
Many target lowerings copy-paste the code to test SDValues for known constants.
This code can instead be shared in SelectionDAG.cpp, and reused in the targets.

Reviewers: MatzeB, andreadb, tstellarAMD

Subscribers: arsenm, jyknight, llvm-commits

Differential Revision: http://reviews.llvm.org/D14945

llvm-svn: 254085
2015-11-25 19:41:11 +00:00
Elena Demikhovsky
f792042843 AVX-512: Fixed a bug in VPERMT2* intrinsic.
It was wrong order of operands (from intrinsic to DAG node).
I added more strict type specification for instruction selection.

Differential Revision: http://reviews.llvm.org/D14942

llvm-svn: 254059
2015-11-25 08:17:56 +00:00
Kaelyn Takata
37f6ab1581 Fix an asan error where NumElements > 32 for at least one case in
test/CodeGen/X86/avg.ll.

llvm-svn: 254043
2015-11-25 00:03:29 +00:00
Simon Pilgrim
cded55a15d [X86][FMA] Optimize FNEG(FMA) Patterns
X86 needs to use its own FMA opcodes, preventing the standard FNEG(FMA) pattern table recognition method used by other platforms. This patch adds support for lowering FNEG(FMA(X,Y,Z)) into a single suitably negated FMA instruction.

Fix for PR24364

Differential Revision: http://reviews.llvm.org/D14906

llvm-svn: 254016
2015-11-24 20:31:46 +00:00
Cong Hou
c0bb26286b [X86] Fix several issues related to X86's psadbw instruction.
This patch fixes the following issues:

1. Fix the return type of X86psadbw: it should not be the same type of inputs.
   For vNi8 inputs the output should be vMi64, where M = N/8.
2. Fix the return type of int_x86_avx512_psad_bw_512 accordingly.
3. Fix the definiton of PSADBW, VPSADBW, and VPSADBWY accordingly.
4. Adjust the return type when building a DAG node of X86ISD::PSADBW type.
5. Update related tests.


Differential revision: http://reviews.llvm.org/D14897

llvm-svn: 254010
2015-11-24 19:51:26 +00:00
Cong Hou
6fe6cafdd5 [X86][SSE] Detect AVG pattern during instruction combine for SSE2/AVX2/AVX512BW.
This patch detects the AVG pattern in vectorized code, which is simply
c = (a + b + 1) / 2, where a, b, and c have the same type which are vectors of
either unsigned i8 or unsigned i16. In the IR, i8/i16 will be promoted to
i32 before any arithmetic operations. The following IR shows such an example:

%1 = zext <N x i8> %a to <N x i32>
%2 = zext <N x i8> %b to <N x i32>
%3 = add nuw nsw <N x i32> %1, <i32 1 x N>
%4 = add nuw nsw <N x i32> %3, %2
%5 = lshr <N x i32> %N, <i32 1 x N>
%6 = trunc <N x i32> %5 to <N x i8>

and with this patch it will be converted to a X86ISD::AVG instruction.

The pattern recognition is done when combining instructions just before type
legalization during instruction selection. We do it here because after type
legalization, it is much more difficult to do pattern recognition based
on many instructions that are doing type conversions. Therefore, for
target-specific instructions (like X86ISD::AVG), we need to take care of type
legalization by ourselves. However, as X86ISD::AVG behaves similarly to
ISD::ADD, I am wondering if there is a way to legalize operands and result
types of X86ISD::AVG together with ISD::ADD. It seems that the current design
doesn't support this idea.

Tests are added for SSE2, AVX2, and AVX512BW and both i8 and i16 types of
variant vector sizes.


Differential revision: http://reviews.llvm.org/D14761

llvm-svn: 253952
2015-11-24 05:44:19 +00:00
Elena Demikhovsky
678dc46339 AVX-512: Optimized INSERT_SUBVECTOR for i1 vector types
ISERT_SUBVECTOR for i1 vectors may be done with shifts, when we insert into the lower part, or into the upper part, on into all-zero vector.
CONCAT_VECTORS uses ISERT_SUBVECTOR.

Differential Revision: http://reviews.llvm.org/D14815

llvm-svn: 253819
2015-11-22 13:57:38 +00:00
Sanjay Patel
0b3c9472a6 fix formatting; NFC
llvm-svn: 253802
2015-11-22 00:03:16 +00:00
Simon Pilgrim
764ff90848 [X86][SSE4A] Fix issue with EXTRQI shuffles not starting at the correct start index.
Found during stress testing.

llvm-svn: 253611
2015-11-19 22:13:56 +00:00
Hans Wennborg
1ead7346cd X86: More efficient legalization of wide integer compares
In particular, this makes the code for 64-bit compares on 32-bit targets
much more efficient.

Example:

  define i32 @test_slt(i64 %a, i64 %b) {
  entry:
    %cmp = icmp slt i64 %a, %b
    br i1 %cmp, label %bb1, label %bb2
  bb1:
    ret i32 1
  bb2:
    ret i32 2
  }

Before this patch:

  test_slt:
          movl    4(%esp), %eax
          movl    8(%esp), %ecx
          cmpl    12(%esp), %eax
          setae   %al
          cmpl    16(%esp), %ecx
          setge   %cl
          je      .LBB2_2
          movb    %cl, %al
  .LBB2_2:
          testb   %al, %al
          jne     .LBB2_4
          movl    $1, %eax
          retl
  .LBB2_4:
          movl    $2, %eax
          retl

After this patch:

  test_slt:
          movl    4(%esp), %eax
          movl    8(%esp), %ecx
          cmpl    12(%esp), %eax
          sbbl    16(%esp), %ecx
          jge     .LBB1_2
          movl    $1, %eax
          retl
  .LBB1_2:
          movl    $2, %eax
          retl

Differential Revision: http://reviews.llvm.org/D14496

llvm-svn: 253572
2015-11-19 16:35:08 +00:00
Asaf Badouh
e49f73285d [X86][AVX512CD] add mask broadcast intrinsics
Differential Revision: http://reviews.llvm.org/D14573

llvm-svn: 253450
2015-11-18 09:42:45 +00:00
Reid Kleckner
00daa6cd20 [WinEH] Move WinEHFuncInfo from MachineModuleInfo to MachineFunction
Summary:
Now that there is a one-to-one mapping from MachineFunction to
WinEHFuncInfo, we don't need to use a DenseMap to select the right
WinEHFuncInfo for the current funclet.

The main challenge here is that X86WinEHStatePass is an IR pass that
doesn't have access to the MachineFunction. I gave it its own
WinEHFuncInfo object that it uses to calculate state numbers, which it
then throws away. As long as nobody creates or removes EH pads between
this pass and SDAG construction, we will get the same state numbers.

The other thing X86WinEHStatePass does is to mark the EH registration
node. Instead of communicating which alloca was the registration through
WinEHFuncInfo, I added the llvm.x86.seh.ehregnode intrinsic.  This
intrinsic generates no code and simply marks the alloca in use.

Reviewers: JCTremoulet

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D14668

llvm-svn: 253378
2015-11-17 21:10:25 +00:00
Simon Pilgrim
a8167da1ce [X86][SSE] Tidyup with implicit SDValue bool check. NFC.
llvm-svn: 253171
2015-11-15 14:57:07 +00:00
Cong Hou
2d7895e79a [X86][SSE] Combine UNPCKL with vector_shuffle into UNPCKH to save one instruction for sext from v16i8 to v16i16 and v8i16 to v8i32.
This patch is enabling combining UNPCKL with vector_shuffle that moves the upper
half of a vector into the lower half, into a UNPCKH instruction. For example:

t2: v16i8 = vector_shuffle<8,9,10,11,12,13,14,15,u,u,u,u,u,u,u,u> t1, undef:v16i8
t3: v16i8 = X86ISD::UNPCKL undef:v16i8, t2

will be combined to:

t3: v16i8 = X86ISD::UNPCKH undef:v16i8, t1


Differential revision: http://reviews.llvm.org/D14399

llvm-svn: 253067
2015-11-13 19:47:43 +00:00
Reid Kleckner
1584024037 [WinEH] Make UnwindHelp a fixed stack object allocated after XMM CSRs
Now the offset of UnwindHelp in our EH tables and the offset that we
store to in the prologue agree.

llvm-svn: 253059
2015-11-13 19:06:01 +00:00
Joseph Tremoulet
1a906d8f77 [WinEH] Find root frame correctly in CLR funclets
Summary:
The value that the CoreCLR personality passes to a funclet for the
establisher frame may be the root function's frame or may be the parent
funclet's (mostly empty) frame in the case of nested funclets.  Each
funclet stores a pointer to the root frame in its own (mostly empty)
frame, as does the root function itself.  All frames allocate this slot at
the same offset, measured from the post-prolog stack pointer, so that the
same sequence can accept any ancestor as an establisher frame parameter
value, and so that a single offset can be reported to the GC, which also
looks at this slot.

This change allocate the slot when processing function entry, and records
its frame index on the WinEHFuncInfo object, then inserts the code to
set/copy it during prolog emission.


Reviewers: majnemer, AndyAyers, pgavlin, rnk

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D14614

llvm-svn: 252983
2015-11-13 00:39:23 +00:00
Manman Ren
3e70e28803 [TLS on Darwin] use a different mask for tls calls on x86-64.
Calls involved in thread-local variable lookup save more registers
than normal calls.

rdar://problem/23073171

llvm-svn: 252837
2015-11-12 00:54:04 +00:00
Joseph Tremoulet
022a01ea22 [WinEH] Only generate UnwindHelp slot for MSVCXX
Summary: Other personalities don't use this special frame slot.

Reviewers: majnemer, andrew.w.kaylor, rnk

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D14580

llvm-svn: 252778
2015-11-11 19:21:09 +00:00
Reid Kleckner
30c859c3a9 [WinEH] Insert the MBB for EH_RESTORE after the catchret
Inserting it before the target block could be bad, we might already have
a fallthrough edge to it.

llvm-svn: 252670
2015-11-10 23:22:20 +00:00
Michael Kuperstein
fa11182125 [X86] Do not try to custom-lower sitofp/fptosi in soft-float mode
Differential Revision: http://reviews.llvm.org/D14495

llvm-svn: 252621
2015-11-10 17:37:49 +00:00
David Blaikie
474d4d0ac0 Remove another variable unused in -Asserts build
llvm-svn: 252582
2015-11-10 04:10:04 +00:00
David Blaikie
a9d50b2e4f Remove some unused variables to clean up the -Werror build
llvm-svn: 252580
2015-11-10 03:16:28 +00:00
Andy Ayers
abf4852bd1 Support for emitting inline stack probes
For CoreCLR on Windows, stack probes must be emitted as inline sequences that probe successive stack pages
between the current stack limit and the desired new stack pointer location. This implements support for
the inline expansion on x64.

For in-body alloca probes, expansion is done during instruction lowering. For prolog probes, a stub call
is initially emitted during prolog creation, and expanded after epilog generation, to avoid complications
that arise when introducing new machine basic blocks during prolog and epilog creation.

Added a new test case, modified an existing one to exclude non-x64 coreclr (for now).

Add test case

Fix tests

llvm-svn: 252578
2015-11-10 01:50:49 +00:00
David Majnemer
624c46d055 [WinEH] Don't emit CATCHRET from visitCatchPad
Instead, emit a CATCHPAD node which will get selected to a target
specific sequence.

llvm-svn: 252528
2015-11-09 23:07:48 +00:00
David Majnemer
b682989c7b [WinEH] Update PHIs of CATCHRET successors
The TailDuplication machine pass ran across a malformed CFG: a PHI node
referred it's predecessor's predecessor instead of it's predecessor.
This occurred because we split the edge in X86ISelLowering when we
processed the CATCHRET but forgot to do something about the PHI nodes.

This fixes PR25444.

llvm-svn: 252413
2015-11-08 02:36:00 +00:00
Joseph Tremoulet
6a3a04f00f [WinEH] Update exception pointer registers
Summary:
The CLR's personality routine passes these in rdx/edx, not rax/eax.

Make getExceptionPointerRegister a virtual method parameterized by
personality function to allow making this distinction.

Similarly make getExceptionSelectorRegister a virtual method parameterized
by personality function, for symmetry.


Reviewers: pgavlin, majnemer, rnk

Subscribers: jyknight, dsanders, llvm-commits

Differential Revision: http://reviews.llvm.org/D14344

llvm-svn: 252383
2015-11-07 01:11:31 +00:00
Ahmed Bougacha
fe77e7174c [X86] SRL non-LSB extracts when folding to truncating broadcasts.
Now that we recognize this, we can support it instead of bailing out.
That is, we can fold:
  (v8i16 (shufflevector
    (v8i16 (bitcast (v4i32 (build_vector X, Y, ...)))),
    <1,1,...,1>))
into:
  (v8i16 (vbroadcast (i16 (trunc (srl Y, 16)))))

llvm-svn: 252362
2015-11-06 23:16:43 +00:00
Ahmed Bougacha
92bc790d0a [X86] Don't fold non-LSB extracts into truncating broadcasts.
We used to incorrectly assume that the offset we're extracting from
was a multiple of the element size. So, we'd fold:
  (v8i16 (shufflevector
    (v8i16 (bitcast (v4i32 (build_vector X, Y, ...)))),
    <1,1,...,1>))
into:
  (v8i16 (vbroadcast (i16 (trunc Y))))
whereas we should have extracted the higher bits from X.

Instead, bail out if the assumption doesn't hold.

llvm-svn: 252361
2015-11-06 23:16:38 +00:00
Reid Kleckner
a707d8e0c6 [WinEH] Split EH_RESTORE out of CATCHRET for 32-bit EH
This adds the EH_RESTORE x86 pseudo instr, which is responsible for
restoring the stack pointers: EBP and ESP, and ESI if stack realignment
is involved. We only need this on 32-bit x86, because on x64 the runtime
restores CSRs for us.

Previously we had to keep the CATCHRET instruction around during SEH so
that we could convince X86FrameLowering to restore our frame pointers.
Now we can split these instructions earlier.

This was confusing, because we had a return instruction which wasn't
really a return and was ultimately going to be removed by
X86FrameLowering. This change also simplifies X86FrameLowering, which
really shouldn't be building new MBBs.

No observable functional change currently, but with the new register
mask stuff in D14407, CATCHRET will become a register allocator barrier,
and our existing tests rely on us having reasonable register allocation
around SEH.

llvm-svn: 252266
2015-11-06 01:49:05 +00:00
Reid Kleckner
c9f0155a98 [WinEH] Fix funclet prologues with stack realignment
We already had a test for this for 32-bit SEH catchpads, but those don't
actually create funclets. We had a bug that only appeared in funclet
prologues, where we would establish EBP and ESI as our FP and BP, and
then downstream prologue code would overwrite them.

While I was at it, I fixed Win64+funclets+stackrealign. This issue
doesn't come up as often there due to the ABI requring 16 byte stack
alignment, but now we can rest easy that AVX and WinEH will work well
together =P.

llvm-svn: 252210
2015-11-05 21:09:49 +00:00
Asaf Badouh
f3f551dd7e revert rev. 252153 due to build failure on ubuntu
[X86][AVX512] add comi with Sae

llvm-svn: 252154
2015-11-05 08:55:54 +00:00