1
0
mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-10-26 06:22:56 +02:00
Commit Graph

12722 Commits

Author SHA1 Message Date
Ahmed Bougacha
f01f72bd2d [X86] Remove now-dead variable and redundant assert. NFC.
The variable was made dead in NDEBUG by r260901, but the assert
was redundant anyway: get rid of both.

llvm-svn: 260908
2016-02-15 19:32:54 +00:00
Ahmed Bougacha
7c87497de6 [CodeGen] Document and use getConstant's splat-building feature. NFC.
Differential Revision: http://reviews.llvm.org/D17229

llvm-svn: 260901
2016-02-15 18:07:29 +00:00
Igor Breger
4a5ac08a8e AVX512: Change store size of kmask. Store size of v8i1, v4i1 , v2i1 and i1 are changed to 16 bits.
If KMOVB not supported (require AVX512DQ) only KMOVW can be used so store size should be 2 bytes.

Differential Revision: http://reviews.llvm.org/D17138

llvm-svn: 260878
2016-02-15 08:25:28 +00:00
Simon Pilgrim
2acf1a4365 [X86][AVX] Lower shuffles as repeated lane shuffles then lane-crossing shuffles
This patch attempts to represent a shuffle as a repeating shuffle (recognisable by is128BitLaneRepeatedShuffleMask) with the source input(s) in their original lanes, followed by a single permutation of the 128-bit lanes to their final destinations.

On AVX2 we can additionally attempt to match using 64-bit sub-lane permutation. AVX2 can also now match a similar 'broadcasted' repeating shuffle.

This patch has several benefits:

 * Avoids prematurely matching with lowerVectorShuffleByMerging128BitLanes which can require both inputs to have their input lanes permuted before shuffling.
 * Can replace PERMPS/PERMD instructions - although these are useful for cross-lane unary shuffling, they require their shuffle mask to be pre-loaded (and increase register pressure).
 * Matching the repeating shuffle makes use of a lot of existing shuffle lowering.

There is an outstanding minor AVX1 regression (combine_unneeded_subvector1 in vector-shuffle-combining.ll) of a previously 128-bit shuffle + subvector splat being converted to a subvector splat + (2 instruction) 256-bit shuffle, I intend to fix this in a followup patch for review.

Differential Revision: http://reviews.llvm.org/D16537

llvm-svn: 260834
2016-02-13 21:54:04 +00:00
Craig Topper
7cf1a380ee Remove Proc feature flags for X86 processors that are used to inherit features from one processor to another. This exposed extra features to the -mattr command line that we shouldn't. Replace with just inherited listconcats.
llvm-svn: 260832
2016-02-13 21:35:37 +00:00
Sanjay Patel
d5eac4bb77 [x86-64] allow mfence even with -mno-sse (PR23203)
As shown in:
https://llvm.org/bugs/show_bug.cgi?id=23203
...we currently die because lowering believes that mfence is allowed without SSE2 on x86-64,
but the instruction def doesn't know that.

I don't know if allowing mfence without SSE is right, but if not, at least now it's consistently wrong. :)

Differential Revision: http://reviews.llvm.org/D17219

llvm-svn: 260828
2016-02-13 17:26:29 +00:00
Yunzhong Gao
9e56bc9706 Disable the vzeroupper insertion pass on PS4.
Differential Revision: http://reviews.llvm.org/D16837

llvm-svn: 260764
2016-02-12 23:37:57 +00:00
Sanjay Patel
f95e368e4f [x86] simplify getZeroVector() ; NFCI
Let DAG.getConstant() handle the splatting; there's no need
to repeat that logic here.

See also:
http://reviews.llvm.org/rL258833
http://reviews.llvm.org/rL260582

llvm-svn: 260609
2016-02-11 22:17:04 +00:00
Kevin B. Smith
14481efc9f [X86] New pass to change byte and word instructions to zero-extending versions.
Differential Revision: http://reviews.llvm.org/D17032

llvm-svn: 260572
2016-02-11 19:43:04 +00:00
Hans Wennborg
3625a4f59b Revert r260507: "[X86] Enable the LEA optimization pass by default."
This caused PR26575.

llvm-svn: 260538
2016-02-11 16:44:06 +00:00
Andrey Turetskiy
50853c0d36 [X86] Enable the LEA optimization pass by default.
Differential Revision: http://reviews.llvm.org/D16877

llvm-svn: 260507
2016-02-11 10:51:26 +00:00
Reid Kleckner
ea1c57fa88 [codeview] Describe int local variables using .cv_def_range
Summary:
Refactor common value, scope, and label tracking logic out of DwarfDebug
into a common base class called DebugHandlerBase.

Update an old LLVM IR test case to avoid an assertion in LexicalScopes.

Reviewers: dblaikie, majnemer

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D16931

llvm-svn: 260432
2016-02-10 20:55:49 +00:00
Sanjay Patel
b9786603cc [x86] refactor masked load/store combine logic ; NFCI
llvm-svn: 260426
2016-02-10 20:02:45 +00:00
Andrey Turetskiy
b4ff8194a0 [X86] Fix stack alignment for MCU target, by Anton Nadolskiy.
This patch fixes stack alignments for MCU (should be aligned to 4 bytes).

Differential Revision: http://reviews.llvm.org/D15646

llvm-svn: 260375
2016-02-10 11:57:06 +00:00
JF Bastien
de7879a249 X86: Remove useless semicolon
llvm-svn: 260359
2016-02-10 04:04:12 +00:00
Sanjay Patel
f3f4f7a88a [x86] convert masked load of exactly one element to scalar load
This is the load counterpart to the store optimization that was added in:
http://reviews.llvm.org/rL260145

llvm-svn: 260325
2016-02-09 23:44:35 +00:00
Ahmed Bougacha
f23bc5a4de [CodeGen] Prefer "if (SDValue R = ...)" to "if (R.getNode())". NFCI.
llvm-svn: 260316
2016-02-09 22:54:12 +00:00
Ahmed Bougacha
3589294b5a [X86] Don't reuse an unrelated variable, create a new one. NFC.
Using Op makes it look like we're doing something with it.
We're really not.

llvm-svn: 260315
2016-02-09 22:54:05 +00:00
Ahmed Bougacha
68b447ec28 [X86] Remove unnecessary assignment. NFC.
llvm-svn: 260314
2016-02-09 22:53:58 +00:00
Sanjay Patel
4c9e344480 [SelectionDAG] make getMemBasePlusOffset() accessible; NFCI
I reinvented this functionality in http://reviews.llvm.org/D16828 because it was
hidden away as a static function. The changes in x86 are not based on a complete
audit. I suspect there are other possible uses there, and there are almost certainly
more potential users in other targets.

llvm-svn: 260295
2016-02-09 21:42:04 +00:00
Sanjay Patel
9ad4cf304b [x86] make getOneTrueElt() a helper function ; NFC
As mentioned in http://reviews.llvm.org/D16828 , the related masked load transform
will need this logic, so I'm moving it out to make that patch smaller.

llvm-svn: 260240
2016-02-09 17:39:58 +00:00
Simon Pilgrim
9569b015d4 [X86][AVX2] Fix SIGN_EXTEND vector handling on AVX2 targets.
On AVX2 target we are poorly legalizing SIGN_EXTEND ops for which the input's legalized type doesn't have the same number of elements as the destination, resulting in an ANY_EXTEND followed by a SIGN_EXTEND_INREG.

This patch uses the existing SIGN_EXTEND -> SIGN_EXTEND_VECTOR_INREG combine to extend the input to the size of the result and using SIGN_EXTEND_VECTOR_INREG instead.

Differential Revision: http://reviews.llvm.org/D16994

llvm-svn: 260210
2016-02-09 08:19:19 +00:00
Simon Pilgrim
4d4989655e [X86][SSE1] Add MOVLHPS/MOVHLPS lowering and memory folding support
As discussed on PR26491, this patch adds support for lowering v4f32 shuffles to the MOVLHPS/MOVHLPS instructions. It also adds support for memory folding with their MOVLPS/MOVHPS load equivalents.

This first patch only really helps SSE1 targets as SSE2+ targets will widen the shuffle mask and use v2f64 equivalents (although they still combine to MOVLHPS/MOVHLPS for v2f64 splats). This will have to be addressed in a future patch, most likely when we add support for binary target shuffle combines.

Differential Revision: http://reviews.llvm.org/D16956

llvm-svn: 260168
2016-02-08 23:03:46 +00:00
Sanjay Patel
69b9326059 [x86] convert masked store of one element to scalar store
Another opportunity to reduce masked stores: in D16691, we decided not to attempt the 'one mask element is set'
transform in InstCombine, but this should be a win for any AVX machine.

Code comments note that this transform could be extended for other targets / cases.

Differential Revision: http://reviews.llvm.org/D16828

llvm-svn: 260145
2016-02-08 21:05:08 +00:00
Hans Wennborg
36d0cb3fd9 [X86] Don't zero/sign-extend i1, i8, or i16 return values to 32 bits (PR22532)
This matches GCC and MSVC's behaviour, and saves on code size.

We were already not extending i1 return values on x86_64 after r127766. This
takes that patch further by applying it to x86 target as well, and also for i8
and i16.

The ABI docs have been unclear about the required behaviour here. The new i386
psABI [1] clearly states (Table 2.4, page 14) that i1, i8, and i16 return
vales do not need to be extended beyond 8 bits. The x86_64 ABI doc is being
updated to say the same [2].

Differential Revision: http://reviews.llvm.org/D16907

 [1]. https://01.org/sites/default/files/file_attach/intel386-psabi-1.0.pdf
 [2]. https://groups.google.com/d/msg/x86-64-abi/E8O33onbnGQ/_RFWw_ixDQAJ

llvm-svn: 260133
2016-02-08 19:34:30 +00:00
Michael Zuckerman
c705af63bb [AVX512][PROLQ][PROLD] Change imm8 to int
Differential Revision: http://reviews.llvm.org/D16983

llvm-svn: 260101
2016-02-08 15:13:32 +00:00
Craig Topper
bc3298f1ee [X86] Change FeatureIFMA string to 'avx512ifma'. Matches gcc and fixes PR26461.
llvm-svn: 260069
2016-02-08 01:23:15 +00:00
Simon Pilgrim
9ecb59bace [X86][SSE] Resolve target shuffle inputs to sentinels to permit more combines
The combineX86ShufflesRecursively only supports unary shuffles, but was missing the opportunity to combine binary shuffles with a zero / undef second input.

This patch resolves target shuffle inputs, converting the shuffle mask elements to SM_SentinelUndef/SM_SentinelZero where possible. It then resolves the updated mask to check if we have created a faux unary shuffle.

Additionally, we now attempt to recursively call combineX86ShufflesRecursively for all input operands (we used to just recurse for unary integer shuffles and unary unpacks) - it safely returns early if its not a target shuffle.

Differential Revision: http://reviews.llvm.org/D16683

llvm-svn: 260063
2016-02-07 22:51:06 +00:00
Simon Pilgrim
8d73f8b49c [X86][SSE] Added support for MOVHPD/MOVLPD + MOVHPS/MOVLPS shuffle decoding.
llvm-svn: 260034
2016-02-07 15:39:22 +00:00
Asaf Badouh
5bbbfafa66 [X86][AVX512] add intrinsics of Scalar FP to integer conversion with rounding mode
Differential Revision: http://reviews.llvm.org/D16629

llvm-svn: 260033
2016-02-07 14:59:13 +00:00
Simon Pilgrim
bab30569fb [X86][SSE] Pulled out repeated target shuffle decodes into helper functions. NFCI.
Pulled out the code used by PSHUFB/VPERMV/VPERMV3 shuffle mask decoding into common helper functions.

The helper functions handle masks coming from BROADCAST/BUILD_VECTOR and ConstantPool nodes respectively.

llvm-svn: 260032
2016-02-07 14:33:03 +00:00
Igor Breger
60ac21f165 AVX512: VPBROADCASTB/W/D/Q from GPR intrinsics implementation.
Differential Revision: http://reviews.llvm.org/D16813

llvm-svn: 260024
2016-02-07 08:30:50 +00:00
Simon Pilgrim
b7e95cd192 [X86][AVX512] Added support for VPMOVZX shuffle decoding.
llvm-svn: 260007
2016-02-06 19:51:21 +00:00
Simon Pilgrim
a80ad70644 [X86][SSE] Moved shuffle decode CASE macros earlier. NFC.
To allow the helper functions to make use of them.

llvm-svn: 259997
2016-02-06 17:02:15 +00:00
Simon Pilgrim
520933a81a [X86][SSE] Refactored PMOVZX shuffle decoding to use scalar input types
First step towards being able to decode AVX512 PMOVZX instructions without a massive bloat in the shuffle decode switch statement.

This should also make it easier to decode X86ISD::VZEXT target shuffles in the future.

llvm-svn: 259995
2016-02-06 16:33:42 +00:00
Simon Pilgrim
f1a97ef96e [X86][SSE] Don't replace an existing 32-bit load with its duplicate
If we are already loading a single 32-bit float/integer then just reuse it.

Fix for regression in D16729

llvm-svn: 259991
2016-02-06 15:37:09 +00:00
Simon Pilgrim
354f04bdd5 Comment fix
llvm-svn: 259990
2016-02-06 14:21:49 +00:00
Simon Pilgrim
cc76b8656c [X86][SSE] Select domain for 32/64-bit partial loads for EltsFromConsecutiveLoads
Choose between MOVD/MOVSS and MOVQ/MOVSD depending on the target vector type.

This has a lot fewer test changes than trying to add this to X86InstrInfo::setExecutionDomain.....

llvm-svn: 259816
2016-02-04 19:27:51 +00:00
Simon Pilgrim
da26d272a9 [X86][SSE] Add general 32-bit LOAD + VZEXT_MOVL support to EltsFromConsecutiveLoads
This patch adds support for consecutive (load/undef elements) 32-bit loads, followed by trailing undef/zero elements to be combined to a single MOVD load.

Differential Revision: http://reviews.llvm.org/D16729

llvm-svn: 259796
2016-02-04 16:12:56 +00:00
Elena Demikhovsky
86a7e2549e AVX-512: Fixed a bug in FMA instruction selection on KNL
The FMA instruction was selected from AVX2 set instead of AVX-512

Differential Revision: http://reviews.llvm.org/D16884

llvm-svn: 259792
2016-02-04 15:11:11 +00:00
Michael Zuckerman
d8de4a9888 [AVX512] add vfmadd132ss and vfmadd132sd Intrinsic
Differential Revision: http://reviews.llvm.org/D16589

llvm-svn: 259789
2016-02-04 14:41:08 +00:00
Simon Pilgrim
36bd348c5b [X86] Moved SEXT -> SIGN_EXTEND_VECTOR_INREG combine into helper. NFC.
llvm-svn: 259771
2016-02-04 09:27:19 +00:00
Andrey Turetskiy
93bc15df7c [X86] Use hash table in LEA optimization pass.
Use hash table (key is a memory operand) to store found LEA instructions to reduce compile time.

Differential Revision: http://reviews.llvm.org/D16404

llvm-svn: 259770
2016-02-04 08:57:03 +00:00
Sanjay Patel
58b4f8e215 clean up; NFC
llvm-svn: 259720
2016-02-03 22:37:37 +00:00
Simon Pilgrim
8aa2db1f2d [X86][AVX] Add support for 64-bit VZEXT_LOAD of 256/512-bit vectors to EltsFromConsecutiveLoads
Follow up to D16217 and D16729

This change uncovered an odd pattern where VZEXT_LOAD v4i64 was being lowered to a load of the lower v2i64 (so the 2nd i64 destination element wasn't being zeroed), I can't find any use/reason for this and have removed the pattern and replaced it so only the 1st i64 element is loaded and the upper bits all zeroed. This matches the description for X86ISD::VZEXT_LOAD

Differential Revision: http://reviews.llvm.org/D16768

llvm-svn: 259635
2016-02-03 09:41:59 +00:00
Yunzhong Gao
d9694f67fb Revert r259576: Disable the vzeroupper insertion pass on PS4.
Will re-implement based on review feedback.

llvm-svn: 259615
2016-02-03 01:25:12 +00:00
Yunzhong Gao
3180165799 Disable the vzeroupper insertion pass on PS4.
See comments in test/CodeGen/X86/avx-vzeroupper.ll for more explanation.

Original patch by: Sean Silva

llvm-svn: 259576
2016-02-02 21:39:23 +00:00
Quentin Colombet
d668e1ecc8 [X86] Fix the merging of SP updates in prologue/epilogue insertions.
When the merging was involving LEAs, we were taking the wrong immediate
from the list of operands.

rdar://problem/24446069

llvm-svn: 259553
2016-02-02 20:11:17 +00:00
Eugene Zelenko
0ebce618ad Fix Clang-tidy readability-redundant-control-flow warnings; other minor fixes.
Differential revision: http://reviews.llvm.org/D16793

llvm-svn: 259539
2016-02-02 18:20:45 +00:00
Derek Schuff
c9579c25d0 [MC] Enable eip-relative addressing on x86-64 for X32 ABI
Summary:
Enables eip-based addressing, e.g.,

lea    constant(%eip), %rax
lea    constant(%eip), %eax

in MC, (used for the x32 ABI). EIP-base addressing is also valid in x86_64,
it is left enabled for that architecture as well.

Patch by João Porto

Differential Revision: http://reviews.llvm.org/D16581

llvm-svn: 259528
2016-02-02 17:20:04 +00:00