1
0
mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-10-22 20:43:44 +02:00
Commit Graph

14827 Commits

Author SHA1 Message Date
Dimitry Andric
d3685839f9 Use correct registers for "A" inline asm constraint
Summary:
In PR32594, inline assembly using the 'A' constraint on x86_64 causes
llvm to crash with a "Cannot select" stack trace.  This is because
`X86TargetLowering::getRegForInlineAsmConstraint` hardcodes that 'A'
means the EAX and EDX registers.

However, on x86_64 it means the RAX and RDX registers, and on 16-bit x86
(ia16?) it means the old AX and DX registers.

Add new register classes in `X86RegisterInfo.td` to support these cases,
and amend the logic in `getRegForInlineAsmConstraint` to cope with
different subtargets.  Also add a test case, derived from PR32594.

Reviewers: craig.topper, qcolombet, RKSimon, ab

Reviewed By: ab

Subscribers: ab, emaste, royger, llvm-commits

Differential Revision: https://reviews.llvm.org/D31902

llvm-svn: 300404
2017-04-15 22:15:01 +00:00
Reid Kleckner
4559e65607 [IR] Make paramHasAttr to use arg indices instead of attr indices
This avoids the confusing 'CS.paramHasAttr(ArgNo + 1, Foo)' pattern.

Previously we were testing return value attributes with index 0, so I
introduced hasReturnAttr() for that use case.

llvm-svn: 300367
2017-04-14 20:19:02 +00:00
Simon Pilgrim
b9c453fa71 [X86][SSE] Update MOVNTDQA non-temporal loads to generic implementation (LLVM)
MOVNTDQA non-temporal aligned vector loads can be correctly represented using generic builtin loads, allowing us to remove the existing x86 intrinsics.

Clang companion patch: D31766.

Differential Revision: https://reviews.llvm.org/D31767

llvm-svn: 300325
2017-04-14 15:05:35 +00:00
Andrew V. Tischenko
b1cf1f0925 This patch closes PR#32216: Better testing of schedule model instruction latencies/throughputs.
The details are here: https://reviews.llvm.org/D30941

llvm-svn: 300311
2017-04-14 07:44:23 +00:00
Serge Pavlov
27205e467c Use methods to access data stored with frame instructions
Instructions CALLSEQ_START..CALLSEQ_END and their target dependent
counterparts keep data like frame size, stack adjustment etc. These
data are accessed by getOperand using hard coded indices. It is
error prone way. This change implements the access by special methods,
which improve readability and allow changing data representation without
massive changes of index values.

Differential Revision: https://reviews.llvm.org/D31953

llvm-svn: 300196
2017-04-13 14:10:52 +00:00
Ayman Musa
90f0f0a976 [X86] Added missing mayLoad/mayStore attributes to some X86 instructions.
Throughout the effort of automatically generating the X86 memory folding tables these missing information were encountered.
This is a preparation work for a future patch including the automation of these tables.

Differential Revision: https://reviews.llvm.org/D31714

llvm-svn: 300190
2017-04-13 10:03:45 +00:00
Ayman Musa
fb33fa064c [X86] Change instructions names to keep consistency with the naming convention. NFC
Differential Revision: https://reviews.llvm.org/D31743

llvm-svn: 300184
2017-04-13 09:12:32 +00:00
Easwaran Raman
92e788d7d1 Fix the bootstrap failure caused by r299986.
llvm-svn: 300069
2017-04-12 15:26:15 +00:00
Igor Breger
3cf58dd1e2 [GlobalIsel][X86] support G_CONSTANT selection.
Summary: [GlobalISel][X86] support G_CONSTANT selection. Add regbank select tests.

Reviewers: zvi, guyblank

Reviewed By: guyblank

Subscribers: llvm-commits, dberris, rovka, kristof.beyls

Differential Revision: https://reviews.llvm.org/D31974

llvm-svn: 300057
2017-04-12 12:54:54 +00:00
Jonas Paulsson
90b172efa0 [SystemZ] TargetTransformInfo cost functions implemented.
getArithmeticInstrCost(), getShuffleCost(), getCastInstrCost(),
getCmpSelInstrCost(), getVectorInstrCost(), getMemoryOpCost(),
getInterleavedMemoryOpCost() implemented.

Interleaved access vectorization enabled.

BasicTTIImpl::getCastInstrCost() improved to check for legal extending loads,
in which case the cost of the z/sext instruction becomes 0.

Review: Ulrich Weigand, Renato Golin.
https://reviews.llvm.org/D29631

llvm-svn: 300052
2017-04-12 11:49:08 +00:00
Easwaran Raman
083f7e69ba [x86] Relax the check in areLoadsFromSameBasePtr
Check if the scale operand is identical (doesn't have to be 1) and
do not check the chaain operand.

Differential revision: https://reviews.llvm.org/D31833

llvm-svn: 299986
2017-04-11 21:05:02 +00:00
Davide Italiano
1805846aae [X86] Create the correct ADC/SBB SDNode when lowering add.
Differential Revision:  https://reviews.llvm.org/D31911

llvm-svn: 299973
2017-04-11 19:11:20 +00:00
Serge Guelton
6ca59da244 Module::getOrInsertFunction is using C-style vararg instead of variadic templates.
From a user prospective, it forces the use of an annoying nullptr to mark the end of the vararg, and there's not type checking on the arguments.
The variadic template is an obvious solution to both issues.

Differential Revision: https://reviews.llvm.org/D31070

llvm-svn: 299949
2017-04-11 15:01:18 +00:00
Diana Picus
14168afb62 Revert "Turn some C-style vararg into variadic templates"
This reverts commit r299925 because it broke the buildbots. See e.g.
http://lab.llvm.org:8011/builders/clang-cmake-armv7-a15/builds/6008

llvm-svn: 299928
2017-04-11 10:07:12 +00:00
Serge Guelton
0c5a821318 Turn some C-style vararg into variadic templates
Module::getOrInsertFunction is using C-style vararg instead of
variadic templates.

From a user prospective, it forces the use of an annoying nullptr
to mark the end of the vararg, and there's not type checking on the
arguments. The variadic template is an obvious solution to both
issues.

llvm-svn: 299925
2017-04-11 08:36:52 +00:00
Simon Pilgrim
786bd13a82 [X86][MMX] Add fast-isel support for MMX non-temporal writes
Differential Revision: https://reviews.llvm.org/D31754

llvm-svn: 299852
2017-04-10 16:58:07 +00:00
Dehao Chen
e2d4caaef2 Use PMADDWD to expand reduction in a loop
Summary:
PMADDWD can help improve 8/16 bit integer mutliply-add operation performance for cases like:

for (int i = 0; i < count; i++)
  a += x[i] * y[i];

Reviewers: wmi, davidxl, hfinkel, RKSimon, zvi, mkuper

Reviewed By: mkuper

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D31679

llvm-svn: 299776
2017-04-07 15:41:52 +00:00
Igor Breger
69332a5db0 [GlobalISel] implement narrowing for G_CONSTANT.
Summary: [GlobalISel] implement narrowing for G_CONSTANT.

Reviewers: bogner, zvi, t.p.northover

Reviewed By: t.p.northover

Subscribers: llvm-commits, dberris, rovka, kristof.beyls

Differential Revision: https://reviews.llvm.org/D31744

llvm-svn: 299772
2017-04-07 14:41:59 +00:00
Michael Kuperstein
9967b578ea [X86] Revert r299387 due to AVX legalization infinite loop.
llvm-svn: 299720
2017-04-06 22:33:25 +00:00
Mehdi Amini
176fa1e694 Revert "Turn some C-style vararg into variadic templates"
This reverts commit r299699, the examples needs to be updated.

llvm-svn: 299702
2017-04-06 20:23:57 +00:00
Mehdi Amini
fc06818ec6 Turn some C-style vararg into variadic templates
Module::getOrInsertFunction is using C-style vararg instead of
variadic templates.

From a user prospective, it forces the use of an annoying nullptr
to mark the end of the vararg, and there's not type checking on the
arguments. The variadic template is an obvious solution to both
issues.

Patch by: Serge Guelton <serge.guelton@telecom-bretagne.eu>

Differential Revision: https://reviews.llvm.org/D31070

llvm-svn: 299699
2017-04-06 20:09:31 +00:00
Daniel Sanders
64f3f18f53 [globalisel][tablegen] Move <Target>InstructionSelector declarations to anonymous namespaces
Summary: This resolves the issue of tablegen-erated includes in the headers for non-GlobalISel builds in a simpler way than before.

Reviewers: qcolombet, ab

Reviewed By: ab

Subscribers: igorb, ab, mgorny, dberris, rovka, llvm-commits, kristof.beyls

Differential Revision: https://reviews.llvm.org/D30998

llvm-svn: 299637
2017-04-06 09:49:34 +00:00
Keno Fischer
c27c9fda8a [X86 TTI] Implement LSV hook
Summary:
LSV wants to know the maximum size that can be loaded to a vector register.
On X86, this always matches the maximum register width. Implement this
accordingly and add a test to make sure that LSV can vectorize up to the
maximum permissible width on X86.

Reviewers: delena, arsenm

Reviewed By: arsenm

Subscribers: wdng, llvm-commits

Differential Revision: https://reviews.llvm.org/D31504

llvm-svn: 299589
2017-04-05 20:51:38 +00:00
Sanjay Patel
bb37f0efa2 [DAGCombiner] add and use TLI hook to convert and-of-seteq / or-of-setne to bitwise logic+setcc (PR32401)
This is a generic combine enabled via target hook to reduce icmp logic as discussed in:
https://bugs.llvm.org/show_bug.cgi?id=32401

It's likely that other targets will want to enable this hook for scalar transforms, 
and there are probably other patterns that can use bitwise logic to reduce comparisons.

Note that we are missing an IR canonicalization for these patterns, and we will probably
prefer the pair-of-compares form in IR (shorter, more likely to fold).

Differential Revision: https://reviews.llvm.org/D31483

llvm-svn: 299542
2017-04-05 14:09:39 +00:00
Simon Pilgrim
f6fed8721d [X86][SSE] Renamed combine to make it clear that it only handles the vector shift by immediate opcodes. NFCI
llvm-svn: 299532
2017-04-05 10:44:42 +00:00
Alex Bradbury
4a2d4860e6 Add MCContext argument to MCAsmBackend::applyFixup for error reporting
A number of backends (AArch64, MIPS, ARM) have been using
MCContext::reportError to report issues such as out-of-range fixup values in
their TgtAsmBackend. This is great, but because MCContext couldn't easily be
threaded through to the adjustFixupValue helper function from its usual
callsite (applyFixup), these backends ended up adding an MCContext* argument
and adding another call to applyFixup to processFixupValue. Adding an
MCContext parameter to applyFixup makes this unnecessary, and even better -
applyFixup can take a reference to MCContext rather than a potentially null
pointer.

Differential Revision: https://reviews.llvm.org/D30264

llvm-svn: 299529
2017-04-05 10:16:14 +00:00
Ahmed Bougacha
aed61d98c5 [X86] Relax assert in broadcast-of-subvector lowering.
Before r294774, there was a problem when lowering broadcasts to use
128-bit subvectors.

When we looked through a bitcast to find the broadcast input, we'd keep
using the original type, so you'd end up with things like:
  (v8f32 (broadcast
    (v4f32 (extract_subvector
      (v8i32 V),
      ...))
    ))

r294774 fixed it to always emit subvectors with the scalar type of the
original source.

It also introduced some asserts, to check that we use scalars with
the same size, and vectors with the same number of elements.

The scalar size equality is checked earlier when looking through bitcasts,
and is a useful assert.

However, the number of elements don't have to be identical: we're always
going to extract a 128-bit subvector, and we can have different size
inputs if we looked through a concat_vector to find a 256-bit source.

Relax the overzealous assert.

Replace it with a check of the original source vector being 256 or 512
bits.  If it's 128 bits, we can't extract_subvector from it.

Fixes PR32371.

llvm-svn: 299490
2017-04-05 00:14:39 +00:00
Sanjay Patel
6ea8420a3c [x86] remove dead select-of-constants transform; NFCI
https://reviews.llvm.org/D30537 / https://reviews.llvm.org/rL296977 added these transforms
and other related transforms to the generic DAGCombiner (with a hook that x86 sets to true),
so these patterns should not exist by the time we reach the target-specific combiner hook.

llvm-svn: 299448
2017-04-04 16:54:58 +00:00
Coby Tayree
75e681a367 [X86][MS-compatability]Allow named synonymous for MS-assembly operators
This patch enhances X86AsmParser's immediate expression parsing abilities, to include a named synonymous for selected binary/unary bitwise operators: {and,shl,shr,or,xor,not}, ultimately achieving better MS-compatability
MASM reference:
https://msdn.microsoft.com/en-us/library/94b6khh4.aspx

Differential Revision: D31277

llvm-svn: 299439
2017-04-04 14:43:23 +00:00
Simon Pilgrim
981bbb8176 Strip trailing whitespace
llvm-svn: 299438
2017-04-04 14:40:53 +00:00
Michael Zuckerman
aaee867d86 [X86][LLVM] Converting __mm{|256|512}_movm_epi{8|16|32|64} LLVMIR call into generic intrinsics.
This patch is a part one of two reviews, one for the clang and the other for LLVM. 
The patch deletes the back-end intrinsics and adds support for them in the auto upgrade.

Differential Revision: https://reviews.llvm.org/D31393

llvm-svn: 299432
2017-04-04 13:32:14 +00:00
Oren Ben Simhon
c298b766f8 [X86] Add 64 bit pattern matching for PSADBW
PSADBW pattern currently supports the 32 bit IR pattern and only GLT (greather than) comparison.
The patch extends the pattern to catch also 64 bit IR pattern and includes all other comparison types (not only GLT).

Differential Revision: https://reviews.llvm.org/D31577

llvm-svn: 299425
2017-04-04 10:23:18 +00:00
Simon Pilgrim
3d76c6d2c8 [X86][SSE]] Lower BUILD_VECTOR with repeated elts as BUILD_VECTOR + VECTOR_SHUFFLE
It can be costly to transfer from the gprs to the xmm registers and can prevent loads merging.

This patch splits vXi16/vXi32/vXi64 BUILD_VECTORS that use the same operand in multiple elements into a BUILD_VECTOR with only a single insertion of each of those elements and then performs an unary shuffle to duplicate the values.

There are a couple of minor regressions this patch unearths due to some missing MOVDDUP/BROADCAST folds that I will address in a future patch.

Note: Now that vector shuffle lowering and combining is pretty good we should be reusing that instead of duplicating so much in LowerBUILD_VECTOR - this is the first of several patches to address this.

Differential Revision: https://reviews.llvm.org/D31373

llvm-svn: 299387
2017-04-03 21:06:51 +00:00
Amjad Aboud
a42dec166e x86 interrupt calling convention: re-align stack pointer on 64-bit if an error code was pushed
The x86_64 ABI requires that the stack is 16 byte aligned on function calls. Thus, the 8-byte error code, which is pushed by the CPU for certain exceptions, leads to a misaligned stack. This results in bugs such as Bug 26413, where misaligned movaps instructions are generated.

This commit fixes the misalignment by adjusting the stack pointer in these cases. The adjustment is done at the beginning of the prologue generation by subtracting another 8 bytes from the stack pointer. These additional bytes are popped again in the function epilogue.

Fixes Bug 26413

Patch by Philipp Oppermann.

Differential Revision: https://reviews.llvm.org/D30049

llvm-svn: 299383
2017-04-03 20:28:45 +00:00
Craig Topper
9b334f2b8c [APInt] Move isMask and isShiftedMask out of APIntOps and into the APInt class. Implement them without memory allocation for multiword
This moves the isMask and isShiftedMask functions to be class methods. They now use the MathExtras.h function for single word size and leading/trailing zeros/ones or countPopulation for the multiword size. The previous implementation made multiple temorary memory allocations to do the bitwise arithmetic operations to match the MathExtras.h implementation.

Differential Revision: https://reviews.llvm.org/D31565

llvm-svn: 299362
2017-04-03 16:34:59 +00:00
Simon Pilgrim
cd51879a46 [X86][MMX] Improve support for folding fptosi from XMM to MMX
llvm-svn: 299338
2017-04-02 17:45:41 +00:00
Simon Pilgrim
bd73b67ed2 [X86][MMX] Simplify tablegen patterns by always combining MOVDQ2Q from v2i64
llvm-svn: 299336
2017-04-02 16:20:34 +00:00
Simon Pilgrim
8a56903b79 [X86][MMX] Added support for subvector extraction to MMX register
llvm-svn: 299335
2017-04-02 15:52:28 +00:00
Craig Topper
40bbf69094 [AVX-512] Update lowering for gather/scatter prefetch intrinsics to match the immediate encodings the frontend uses based on the _MM_HINT_T0/T1 constant values in clang's headers.
Our _MM_HINT_T0/T1 constant values are 3/2 which matches gcc, but not icc or Intel documentation. Interestingly gcc had this same bug on their implementation of the gather/scatter builtins at one point too.

Fixes PR32411.

llvm-svn: 299234
2017-03-31 17:24:29 +00:00
Simon Pilgrim
026e8c9b44 [DAGCombiner] Add vector demanded elements support to ComputeNumSignBits
Currently ComputeNumSignBits returns the minimum number of sign bits for all elements of vector data, when we may only be interested in one/some of the elements.

This patch adds a DemandedElts argument that allows us to specify the elements we actually care about. The original ComputeNumSignBits implementation calls with a DemandedElts demanding all elements to match current behaviour. Scalar types set this to 1.

I've only added support for BUILD_VECTOR and EXTRACT_VECTOR_ELT so far, all others will default to demanding all elements but can be updated in due course.

Followup to D25691.

Differential Revision: https://reviews.llvm.org/D31311

llvm-svn: 299219
2017-03-31 13:54:09 +00:00
Simon Pilgrim
4864971eb2 [DAGCombiner] Add vector demanded elements support to computeKnownBitsForTargetNode
Follow up to D25691, this sets up the plumbing necessary to support vector demanded elements support in known bits calculations in target nodes.

Differential Revision: https://reviews.llvm.org/D31249

llvm-svn: 299201
2017-03-31 11:24:16 +00:00
Craig Topper
d8e999615b [AVX-512] Fix bad comment from r299112. NFC
llvm-svn: 299114
2017-03-30 21:05:33 +00:00
Craig Topper
74c3cc2d1c [AVX-512] Fix another case where fastisel was generating a GR8 to VK1 copy. This time after calls returning i1.
Fixes PR32472.

llvm-svn: 299112
2017-03-30 21:02:52 +00:00
Simon Pilgrim
0743569a2a Spelling mistakes in comments. NFCI.
Based on corrections mentioned in patch for clang for PR27635

llvm-svn: 299072
2017-03-30 12:59:53 +00:00
Simon Pilgrim
b8d88195ba Spelling mistakes in comments. NFCI.
llvm-svn: 299069
2017-03-30 12:30:15 +00:00
Davide Italiano
847c19e1a7 [X86IselLowering] Remove extraneous semicolon. NFCI.
Unbreaks the build with GCC -Werror.

llvm-svn: 299030
2017-03-29 21:34:58 +00:00
Simon Pilgrim
dd2e64944e [X86] Tidied up comment - we don't custom lower add/sub i64 on i686 anymore. NFCI.
llvm-svn: 299004
2017-03-29 15:41:58 +00:00
Simon Pilgrim
1ceebc6423 Spelling mistakes in comments. NFCI.
llvm-svn: 299000
2017-03-29 15:27:24 +00:00
Simon Pilgrim
68651f86d5 [X86][AVX2] Prevent unary interleaving patterns from calling lowerVectorShuffleAsSplitOrBlend (PR32453)
llvm-svn: 298993
2017-03-29 13:00:00 +00:00
Simon Pilgrim
a350d043db [X86] Removed old comment. NFCI.
No longer makes sense as the previous opcode mnemonic it was referring to is long gone.

llvm-svn: 298988
2017-03-29 10:44:51 +00:00
Eric Christopher
24af11b88f Move the x86 cpu feature rtm from Haswell to Skylake matching clang commit r298956.
llvm-svn: 298986
2017-03-29 07:40:44 +00:00
Craig Topper
2eb92d18e0 [AVX-512] Remove explicit KMOVWrk from isel patterns. COPY_TO_REGCLASS to GR32 is enough.
llvm-svn: 298985
2017-03-29 07:31:56 +00:00
Craig Topper
6b818e650d [AVX-512] Remove explicit KMOVWrk/KMOVWKr instructions from patterns where we can just use COPY_TO_REGCLASS instead.
This will result in a KMOVW or KMOVD being emitted during register allocation. And in at least some cases this might allow the register coalescer to remove the copy all together.

llvm-svn: 298984
2017-03-29 06:55:28 +00:00
Craig Topper
c62d4ea689 [AVX-512] Punt on fast-isel of truncates to i1 when AVX512 is enabled.
We should be masking the value and emitting a register copy like we do in non-fast isel. Instead we were just updating the value map and emitting nothing.

After r298928 we started seeing cases where we would create a copy from GR8 to GR32 because the source register in a VK1 to GR32 copy was replaced by the GR8 going into a truncate.

This fixes PR32451.

llvm-svn: 298957
2017-03-28 23:20:37 +00:00
Simon Pilgrim
bbb304742b [X86][MMX] Match MMX fp_to_sint conversions from XMM registers
We currently perform the various fp_to_sint XMM conversion and then transfer to the MMX register (on 32-bit via the stack).

This patch improves support for MOVDQ2Q XMM to MMX transfers and adds the XMM->MMX fp_to_sint direct conversion patterns. The SSE2 specifications are the same as for XMM->XMM and XMM->MMX rounding/exceptions/etc.

Differential Revision: https://reviews.llvm.org/D30868

llvm-svn: 298943
2017-03-28 21:32:11 +00:00
Sanjay Patel
f3276fa38d [x86] use VPMOVMSK to replace memcmp libcalls for 32-byte equality
Follow-up to:
https://reviews.llvm.org/rL298775

llvm-svn: 298933
2017-03-28 17:23:49 +00:00
Simon Pilgrim
5e47794ea5 [X86][AVX2] Add support for combining v16i16 shuffles to VPBLENDW
llvm-svn: 298929
2017-03-28 16:40:38 +00:00
Craig Topper
d41951c118 [AVX-512] Fix accidental uses of AH/BH/CH/DH after copies to/from mask registers
We've had several bugs(PR32256, PR32241) recently that resulted from usages of AH/BH/CH/DH either before or after a copy to/from a mask register.

This ultimately occurs because we create COPY_TO_REGCLASS with VK1 and GR8. Then in CopyToFromAsymmetricReg in X86InstrInfo we find a 32-bit super register for the GR8 to emit the KMOV with. But as these tests are demonstrating, its possible for the GR8 register to be a high register and we end up doing an accidental extra or insert from bits 15:8.

I think the best way forward is to stop making copies directly between mask registers and GR8/GR16. Instead I think we should restrict to only copies between mask registers and GR32/GR64 and use EXTRACT_SUBREG/INSERT_SUBREG to handle the conversion from GR32 to GR16/8 or vice versa.

Unfortunately, this complicates fastisel a bit more now to create the subreg extracts where we used to create GR8 copies. We can probably make a helper function to bring down the repitition.

This does result in KMOVD being used for copies when BWI is available because we don't know the original mask register size. This caused a lot of deltas on tests because we have to split the checks for KMOVD vs KMOVW based on BWI.

Differential Revision: https://reviews.llvm.org/D30968

llvm-svn: 298928
2017-03-28 16:35:29 +00:00
Simon Pilgrim
01b548bf49 [X86][SSE] Refactored shuffle BLEND combining to make future 16i16 support easier. NFCI.
Call the matchVectorShuffleAsBlend test as early as possible.

llvm-svn: 298925
2017-03-28 15:50:23 +00:00
Simon Pilgrim
44109f53a4 Fix signed/unsigned comparison warning
llvm-svn: 298917
2017-03-28 13:40:09 +00:00
Simon Pilgrim
7c1f65226b [X86][SSE] Begin merging vector shuffle to BLEND for lowering and combining.
Split off matchVectorShuffleAsBlend from lowerVectorShuffleAsBlend for reuse in combining.

llvm-svn: 298914
2017-03-28 13:05:48 +00:00
Simon Pilgrim
19851ede33 Wdocumentation fix
llvm-svn: 298911
2017-03-28 12:29:09 +00:00
Simon Pilgrim
a19c4cac9c [X86][SSE] Set second operand to undef instead of first operand in unary shuffle combines.
Copy isn't necessary after the matchVectorShuffleWithUNPCK refactor and undef value will make some future undef/zero handling easier.

llvm-svn: 298910
2017-03-28 12:16:42 +00:00
Simon Pilgrim
5ff4da442a Strip trailing whitespace
llvm-svn: 298909
2017-03-28 11:15:17 +00:00
Igor Breger
a52916b4b8 [GlobalISel][X86] support G_FRAME_INDEX instruction selection.
Summary:
    G_LOAD/G_STORE, add alternative RegisterBank mapping.
    For G_LOAD, Fast and Greedy mode choose the same RegisterBank mapping (GprRegBank ) for the G_GLOAD + G_FADD , can't get rid of cross register bank copy GprRegBank->VecRegBank.

    Reviewers: zvi, rovka, qcolombet, ab

    Reviewed By: zvi

    Subscribers: llvm-commits, dberris, kristof.beyls, eladcohen, guyblank

    Differential Revision: https://reviews.llvm.org/D30979

llvm-svn: 298907
2017-03-28 09:35:06 +00:00
Gadi Haber
7a1dc6aba1 [X86][AVX2] bugzilla bug 21281 Performance regression in vector interleave in AVX2
This is a patch for an on-going bugzilla bug 21281 on the generated X86 code for a matrix transpose8x8 subroutine which requires vector interleaving. The generated code in AVX2 is currently non-optimal and requires 60 instructions as opposed to only 40 instructions generated for AVX1.
 The patch includes a fix for the AVX2 case where vector unpack instructions use less operations than the vector blend operations available in AVX2.
 In this case using vector unpack instructions is more efficient.

Reviewers:
zvi  
delena  
igorb  
craig.topper  
guyblank  
eladcohen  
m_zuckerman  
aymanmus  
RKSimon 

llvm-svn: 298840
2017-03-27 12:13:37 +00:00
Simon Pilgrim
916c4c6c69 [X86][SSE] Add computeKnownBitsForTargetNode support for (V)PSLL/(V)PSRL instructions
llvm-svn: 298806
2017-03-26 13:17:55 +00:00
Simon Pilgrim
4513382159 [X86][AVX512F] Fix reg class for VMOVSSZrr/VMOVSSZrrk and VMOVSDZrr/VMOVSDZrrk
Fixed -verify-machineinstrs errors in fast-isel-select-sse.ll (one of many in PR27481)

The VMOVSSZrr/VMOVSSZrrk and VMOVSDZrr/VMOVSDZrrk instructions were assuming both source registers were V128X when the second is actually supposed to be FR32X/FR64X

Differential Revision: https://reviews.llvm.org/D31200

llvm-svn: 298805
2017-03-26 12:52:28 +00:00
Igor Breger
6596cba796 [GlobalISel][X86] support G_FRAME_INDEX instruction selection.
Summary:
    Support G_FRAME_INDEX instruction selection.

    Reviewers: zvi, rovka, ab, qcolombet

    Reviewed By: ab

    Subscribers: llvm-commits, dberris, kristof.beyls, eladcohen, guyblank

    Differential Revision: https://reviews.llvm.org/D30980

llvm-svn: 298800
2017-03-26 08:11:12 +00:00
Simon Pilgrim
56102564b5 [X86] Pull out repeated ScalarValueSizeInBits code. NFCI.
llvm-svn: 298783
2017-03-25 21:22:12 +00:00
Simon Pilgrim
5a3201823e [X86][SSE] Combine (VSRLI (VSRAI X, Y), (NumSignBits-1)) -> (VSRLI X, (NumSignBits-1))
Part 3 of 3.

Differential Revision: https://reviews.llvm.org/D31347

llvm-svn: 298782
2017-03-25 20:43:01 +00:00
Simon Pilgrim
14a6166f0d [X86][SSE] Added ComputeNumSignBitsForTargetNode support for (V)PSRAI
Part 2 of 3.

Differential Revision: https://reviews.llvm.org/D31347

llvm-svn: 298780
2017-03-25 19:58:36 +00:00
Simon Pilgrim
a587f1bc58 [X86][SSE] Generalised CMP+AND1 combine to ZERO/ALLBITS+MASK
Patch to generalize combinePCMPAnd1 (for handling SETCC + ZEXT cases) to work for any input that has zero/all bits set masked with an 'all low bits' mask.

Replaced the implicit assumption of shift availability with a call to SupportedVectorShiftWithImm.

Part 1 of 3.

Differential Revision: https://reviews.llvm.org/D31347

llvm-svn: 298779
2017-03-25 19:50:14 +00:00
Sanjay Patel
27189e735f [x86] use PMOVMSK to replace memcmp libcalls for 16-byte equality
This is the payoff for D31156 - if a target has efficient comparison instructions for vector-sized equality, 
we can replace memcmp calls with inline code that is both smaller and faster.

Differential Revision: https://reviews.llvm.org/D31290

llvm-svn: 298775
2017-03-25 16:05:33 +00:00
Simon Pilgrim
0e7c178f96 [X86][SSE] Generalised lowerTruncate by PACKSS to work with any 'zero/all bits' result, not just comparisons.
Added vector compare opcodes to X86TargetLowering::ComputeNumSignBitsForTargetNode

Covered by existing tests added for D22814.

llvm-svn: 298704
2017-03-24 16:12:31 +00:00
Nirav Dave
05cdc48115 [X86] Fix Stale SDNode use in X86ISelDAGtoDAG
Summary: Fixes pr32329.

Reviewers: spatel, craig.topper

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D31286

llvm-svn: 298633
2017-03-23 18:25:17 +00:00
Eric Christopher
c430a7cfb7 Remove the subtarget argument from LowerFP_TO_INT since there's one
stored on X86TargetLowering.

llvm-svn: 298628
2017-03-23 17:35:08 +00:00
Eric Christopher
74d2ceecaf Remove unused X86Subtarget argument from getOnesVector.
llvm-svn: 298627
2017-03-23 17:35:06 +00:00
Simon Pilgrim
d401591143 [X86][SSE] Extract elements from narrower shuffle masks.
Add support for widening narrow shuffle masks so we can directly extract from the relevant input vector of the shuffle.

llvm-svn: 298616
2017-03-23 16:09:34 +00:00
Igor Breger
9b6cc48b31 [GlobalISel][X86] Support G_STORE/G_LOAD operation
Summary:
1. Support pointer type as function argumnet and return value
2. G_STORE/G_LOAD - set legal action for i8/i16/i32/i64/f32/f64/vec128
3. RegisterBank - support typeless operations like G_STORE/G_LOAD, for scalar use GPR bank.
4. Support instruction selection for G_LOAD/G_STORE

Reviewers: zvi, rovka, ab, qcolombet

Reviewed By: rovka

Subscribers: llvm-commits, dberris, kristof.beyls, eladcohen, guyblank

Differential Revision: https://reviews.llvm.org/D30973

llvm-svn: 298609
2017-03-23 15:25:57 +00:00
Zvi Rackover
82dc366986 X86FixupBWInsts: Minor cleanup. NFC
Summary: Cleanup some remnants of code from when the X86FixupBWInsts pass did both forward liveness analysis and backward liveness analysis.

Reviewers: MatzeB, myatsina, DavidKreitzer

Reviewed By: MatzeB

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D31264

llvm-svn: 298599
2017-03-23 14:08:26 +00:00
Simon Pilgrim
04c25bb4f6 [X86][SSE] Tidyup canWidenShuffleElements. NFCI.
Pull out mask elements at the start, allowing us to make the widening pattern matching more readable.

llvm-svn: 298594
2017-03-23 13:33:03 +00:00
Igor Breger
c0ea4ba42e [GlobalISel][X86] clang-format. NFC
llvm-svn: 298590
2017-03-23 12:13:29 +00:00
Michael Zuckerman
901af2a6c1 [X86][TD][vpmovm2 ] New TD pattern for the vpmovm2 instruction
Up until now, vpmovm2 instruction described its destination operand size
by the source operand size. This patch adds new pattern for the vpmovm2
instruction. The node describes new expansion of the destination (from
{128|256} to 512).

Differential Revision: https://reviews.llvm.org/D30654

llvm-svn: 298586
2017-03-23 09:57:01 +00:00
Eric Christopher
91db69e5d2 Clean up some Subtarget uses and casts in the X86 backend, removing unnecessary work or calls.
llvm-svn: 298555
2017-03-22 22:44:52 +00:00
Simon Pilgrim
db8434b7d3 [X86] Remove unnecessary duplicate code (PR30649). NFCI.
llvm-svn: 298495
2017-03-22 11:23:49 +00:00
Craig Topper
7ed83fbaf5 [X86] Remove an unused function from release builds. Reported by gccs unused function warning.
llvm-svn: 298485
2017-03-22 06:07:58 +00:00
Coby Tayree
06adb3d9b0 [X86][MS-compatability][llvm] allow MS TYPE/SIZE/LENGTH operators as a part of a compound expression
This patch introduces X86AsmParser with the ability to handle the aforementioned ops within compound "MS" arithmetical expressions.
Currently - only supported as a stand alone Operand, e.g.:
"TYPE X"
now allowed :
"4 + TYPE X * 128"

Clang side: https://reviews.llvm.org/D31174

Differential Revision: https://reviews.llvm.org/D31173

llvm-svn: 298425
2017-03-21 19:31:55 +00:00
Davide Italiano
68c26974a8 [X86] Remove extra semicolon to placate GCC. NFCI.
llvm-svn: 298423
2017-03-21 19:17:23 +00:00
Reid Kleckner
27d17d1713 Rename AttributeSet to AttributeList
Summary:
This class is a list of AttributeSetNodes corresponding the function
prototype of a call or function declaration. This class used to be
called ParamAttrListPtr, then AttrListPtr, then AttributeSet. It is
typically accessed by parameter and return value index, so
"AttributeList" seems like a more intuitive name.

Rename AttributeSetImpl to AttributeListImpl to follow suit.

It's useful to rename this class so that we can rename AttributeSetNode
to AttributeSet later. AttributeSet is the set of attributes that apply
to a single function, argument, or return value.

Reviewers: sanjoy, javed.absar, chandlerc, pete

Reviewed By: pete

Subscribers: pete, jholewinski, arsenm, dschuff, mehdi_amini, jfb, nhaehnle, sbc100, void, llvm-commits

Differential Revision: https://reviews.llvm.org/D31102

llvm-svn: 298393
2017-03-21 16:57:19 +00:00
Sanjay Patel
2c93f98e1b [x86] use PMOVMSK for vector-sized equality comparisons
We could do better by splitting any oversized type into whatever vector size the target supports, 
but I left that for future work if it ever comes up. The motivating case is memcmp() calls on 16-byte
structs, so I think we can wire that up with a TLI hook that feeds into this.

Differential Revision: https://reviews.llvm.org/D31156

llvm-svn: 298376
2017-03-21 13:50:33 +00:00
Andrea Di Biagio
448c951c27 [DebugInfo][X86] Teach Optimize LEAs pass to handle debug values
This patch fixes an issue in the Optimize LEAs pass where redundant LEAs were
not removed because they were being used by debug values. The debug values are
now ignored when determining whether LEAs are redundant.

For now the debug values for the redundant LEAs are marked as undefined,
effectively lost. The intention is for a follow up patch which will attempt to
preserve the debug values where possible.

Patch by Andrew Ng.

Differential Revision: https://reviews.llvm.org/D30835

llvm-svn: 298360
2017-03-21 11:36:21 +00:00
Evgeniy Stepanov
ff7daa50fe [Fuchsia] Use %gs for ABI slots under -mcmodel=kernel
Make x86_64-fuchsia targets under -mcmodel=kernel use %gs rather
than %fs to access ABI slots for stack-protector and safe-stack

Patch by Roland McGrath.

Differential Revision: https://reviews.llvm.org/D30870

llvm-svn: 298302
2017-03-20 20:35:37 +00:00
Craig Topper
c7a2af7fa6 [AVX-512] Handle kor/kand/kandn/kxor/kxnor/knot intrinsics at lowering time instead of isel
Summary:
Currently we handle these intrinsics at isel with special patterns. But as they just map to normal logic operations, we should just handle them at lowering. This will expose them to DAG combine optimizations. Right now the kor-sequence test generates a bunch of regclass copies between GR16 and VK16 that the peephole optimizer and/or register coallescing are removing to keep everything in the mask domain. By handling the logic op intrinsics earlier, these copies become bitcasts in the DAG and get removed by DAG combine which seems more robust.

This should help enable my plan to stop copying between K registers and GR8/GR16. The peephole optimizer can't remove a chain of copies between K and GR32 with insert_subreg/extract_subreg present in the chain so the kor-sequence test break. But this patch should dodge the problem entirely.

Reviewers: zvi, delena, RKSimon, igorb

Reviewed By: igorb

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D31056

llvm-svn: 298228
2017-03-19 17:11:09 +00:00
Oren Ben Simhon
cfab6a83e8 [MIR] Support Customed Register Mask and CSRs
The MIR printer dumps a string that describe the register mask of a function.
A static predefined list of register masks matches a static list of strings.
However when the register mask is not from the static predefined list, there is no descriptor string and the printer fails.
This patch adds support to custom register mask printing and dumping.
Also the list of callee saved registers (describing the registers that must be preserved for the caller) might be dynamic.
As such this data needs to be dumped and parsed back to the Machine Register Info.

Differential Revision: https://reviews.llvm.org/D30971

llvm-svn: 298207
2017-03-19 08:14:18 +00:00
Matthias Braun
ea69eb48b2 ExecutionDepsFix: Let targets specialize the pass; NFC
Let targets specialize the pass with the register class so we can get a
parameterless default constructor and can put the pass into the pass
registry to enable testing with -run-pass=.

llvm-svn: 298184
2017-03-18 05:08:58 +00:00
Matthias Braun
ea385b5555 ExecutionDepsFix: Normalize names; NFC
Normalize ExeDepsFix, execution-fix, ExecutionDependencyFix and
ExecutionDepsFix to the last one.

llvm-svn: 298183
2017-03-18 05:05:40 +00:00
Nirav Dave
88936693e4 Make library calls sensitive to regparm module flag (Fixes PR3997).
Reviewers: mkuper, rnk

Subscribers: mehdi_amini, jyknight, aemerson, llvm-commits, rengolin

Differential Revision: https://reviews.llvm.org/D27050

llvm-svn: 298179
2017-03-18 00:44:07 +00:00
Nirav Dave
c91fe2d028 Capitalize ArgListEntry fields. NFC.
llvm-svn: 298178
2017-03-18 00:43:57 +00:00
Sanjay Patel
333f78b282 [x86] clean up setcc with negated operand transform and add missing test; NFCI
llvm-svn: 298118
2017-03-17 20:29:40 +00:00