Commit Graph

12915 Commits

Author SHA1 Message Date
Nirav Dave
7a0e387b04 Remove HasFnAttribute guards to getFnAttribute calls
These checks are redundant and can be removed.

Reviewers: hans

Subscribers: llvm-commits, mzolotukhin

Differential Revision: http://reviews.llvm.org/D18564

llvm-svn: 264872
2016-03-30 15:41:12 +00:00
Simon Pilgrim
2a917c41e4 [X86][XOP] BITREVERSE lowering using VPPERM
XOP's VPPERM can perform some great 'permute operations' in addition to shuffling the bytes of a 128-bit vector - in this case we use it to perform BITREVERSE in a single instruction.
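
A minimal sketch of the kind of IR that can now lower to a single VPPERM (illustrative names, assuming the generic llvm.bitreverse intrinsic):

declare <16 x i8> @llvm.bitreverse.v16i8(<16 x i8>)
; on XOP targets this call can now select to a single vpperm
%r = call <16 x i8> @llvm.bitreverse.v16i8(<16 x i8> %x)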

llvm-svn: 264870
2016-03-30 14:14:00 +00:00
Chandler Carruth
0c67e1a699 [x86] Fix a horrible bug in our lowering of x86 floating point atomic
operations.

Specifically, we had code that tried to badly approximate reconstructing
all of the possible variations on addressing modes in two x86
instructions based on those in one pseudo instruction. This is not the
first bug uncovered with doing this, so stop doing it altogether.
Instead generically and pedantically copy every operand from the address
over to both new instructions, and strip kill flags from any register
operands.

This fixes a subtle bug seen in the wild where we would mysteriously
drop parts of the addressing mode, causing for example the index
argument in the added test case to just be completely ignored.
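
As an illustration (hypothetical IR, written in the bitcast form that relaxed FP atomics typically took at the time), the kind of indexed update involved:

%elt = getelementptr inbounds i32, i32* %a, i64 %i ; the index that could be silently dropped
%old = load atomic i32, i32* %elt monotonic, align 4
%oldf = bitcast i32 %old to float
%newf = fadd float %oldf, %delta
%new = bitcast float %newf to i32
store atomic i32 %new, i32* %elt monotonic, align 4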

Hypothetically, this was an extremely bad miscompile because it actually
caused a predictable and leverageable write of a 64-bit quantity to an
unintended offset (the first element of the array instead of whatever
other element was intended). As a consequence, in theory this could even
have introduced security vulnerabilities.

However, this was only something that could happen with an atomic
floating point add. No other operation could trigger this bug, so it
seems extremely unlikely to have occurred widely in the wild.

But it did in fact occur, and frequently in scientific applications
which were using relaxed atomic updates of a floating point value after
adding a delta. Those would end up being quite badly miscompiled by
LLVM, which is how we found this. Of course, this often looks like
a race condition in the code, but it was actually a miscompile.

I suspect that this whole RELEASE_FADD thing was a complete mistake.
There is no such operation, and I worry that anything other than add
will get remarkably worse code generation. But that's not for this
change....

llvm-svn: 264845
2016-03-30 08:41:59 +00:00
Chandler Carruth
d7d8eed23b [x86] Extract a helper function to compute the full addressing mode from
an x86 MachineInstr's operands. This will be super useful to fix some
bad atomics code in my next commit.

No functionality changed.

llvm-svn: 264819
2016-03-30 03:10:24 +00:00
Manman Ren
620c905661 Swift Calling Convention: add swiftself attribute.
Differential Revision: http://reviews.llvm.org/D17866

llvm-svn: 264754
2016-03-29 17:37:21 +00:00
Elena Demikhovsky
e99dbbd44a AVX-512: fixed a bug in fp_to_uint pattern on KNL
Fixed fp_to_uint instruction selection on KNL.
One pattern was missing for the <4 x double> to <4 x i32> conversion.
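
A minimal example of the conversion whose pattern was missing (illustrative names):

%r = fptoui <4 x double> %x to <4 x i32> ; now selects correctly on KNL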

Differential Revision: http://reviews.llvm.org/D18512

llvm-svn: 264701
2016-03-29 06:33:41 +00:00
Simon Pilgrim
3b416c6ffe [X86][SSE] Vectorize a bit (AND/XOR/OR) op if a BUILD_VECTOR has the same op for all its scalar elements.
If all of a BUILD_VECTOR's source elements are the same bit (AND/XOR/OR) operation type and each has one constant operand, lower to a pair of BUILD_VECTORs and just apply the bit operation to the vectors.

The constant operands will form a constant vector meaning that we still only have a single BUILD_VECTOR to lower and we will have replaced all the scalarized operations with a single SSE equivalent.

It's not in our interest to turn this into a general-purpose vectorizer, but I'm seeing enough of these scalar bit operations coming out of the later legalization/scalarization stages to make supporting them worthwhile.
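
A sketch of the scalar pattern this targets (hypothetical values and constants):

%a0 = and i32 %x0, 15
%a1 = and i32 %x1, 15
%a2 = and i32 %x2, 15
%a3 = and i32 %x3, 15
%v0 = insertelement <4 x i32> undef, i32 %a0, i32 0
%v1 = insertelement <4 x i32> %v0, i32 %a1, i32 1
%v2 = insertelement <4 x i32> %v1, i32 %a2, i32 2
%v3 = insertelement <4 x i32> %v2, i32 %a3, i32 3
; now lowered as a BUILD_VECTOR of %x0..%x3 ANDed with the constant vector <15,15,15,15>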

Differential Revision: http://reviews.llvm.org/D18492

llvm-svn: 264666
2016-03-28 21:33:52 +00:00
Elena Demikhovsky
9018ddedab AVX-512: Fixed ICMP instruction selection for i1 operands
ICMP instruction selection fails on SKX and KNL for i1 operands.
I use XOR to resolve this:
(A == B) is equivalent to ((A xor B) == 0)
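
In IR terms the equivalence looks like this (illustrative):

; (A == B) is equivalent to ((A xor B) == 0), i.e. for i1: not (A xor B)
%x = xor i1 %a, %b
%eq = xor i1 %x, true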

Differential Revision: http://reviews.llvm.org/D18511

llvm-svn: 264566
2016-03-28 07:47:58 +00:00
Simon Pilgrim
af0ed3ac43 [X86][AVX] Enabled SMUL_LOHI/UMUL_LOHI v8i32 vectors on AVX1 targets
Correct splitting of v8i32 vectors into v4i32 vectors to prevent scalarization

llvm-svn: 264517
2016-03-26 18:32:13 +00:00
Simon Pilgrim
22edb62412 [X86][AVX] Enabled MULHS/MULHU v16i16 vectors on AVX1 targets
Correct splitting of v16i16 vectors into v8i16 vectors to prevent scalarization

Differential Revision: http://reviews.llvm.org/D18307

llvm-svn: 264512
2016-03-26 15:44:55 +00:00
Simon Pilgrim
7e5597cd68 [X86][SSE] Add MULHS/MULHU custom lowering for i8 vectors
Currently this is mainly to prevent scalarization of integer division by constants.
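
For example (hypothetical constant), division by a constant expands through a multiply-high, so the following stays vectorized:

%q = sdiv <16 x i8> %x, <i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7, i8 7>
; expands via MULHS rather than sixteen scalar divisions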

Differential Revision: http://reviews.llvm.org/D18307

llvm-svn: 264511
2016-03-26 15:27:20 +00:00
Simon Pilgrim
ea6b9f8ae7 [X86][AVX512BW] AVX512BW can sign-extend v32i8 to v32i16 for simpler v32i8 multiplies.
Only pre-AVX512BW targets need to split v32i8 vectors.

llvm-svn: 264509
2016-03-26 09:44:27 +00:00
Simon Pilgrim
3c89786d54 [X86][SSE] Don't duplicate Lower256IntArith functionality in LowerMul. NFC.
LowerMul v32i8 on AVX2 needs to split the 256-bit sources to allow sign-extension back to v16i16 to occur. Since this is basically the same as Lower256IntArith we simplify by using that here instead.

llvm-svn: 264506
2016-03-26 09:29:04 +00:00
David Majnemer
ff9630a057 [X86] Emit a proper ADJCALLSTACKDOWN in EmitLoweredTLSAddr
We forgot to add the second machine operand to our ADJCALLSTACKDOWN,
resulting in crashes in PEI.

This fixes PR27071.

llvm-svn: 264465
2016-03-25 21:49:11 +00:00
Hans Wennborg
45f934c834 [X86] Use "and $0" and "orl $-1" to store 0 and -1 when optimizing for minsize
64-bit, 32-bit and 16-bit move-immediate instructions are 7, 6, and 5 bytes,
respectively, whereas and/or with an 8-bit immediate is only three bytes.

Since these instructions imply an additional memory read (which the CPU could
elide, but we don't think it does), restrict these patterns to minsize functions.
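
A sketch of the effect (hypothetical IR; the byte counts follow from the encodings above):

store i32 0, i32* %p  ; was movl $0, (%rax), 6 bytes; minsize now emits andl $0, (%rax), 3 bytes
store i32 -1, i32* %q ; was movl $-1, (%rax), 6 bytes; minsize now emits orl $-1, (%rax), 3 bytes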

Differential Revision: http://reviews.llvm.org/D18374

llvm-svn: 264440
2016-03-25 18:11:31 +00:00
Simon Pilgrim
824e119cc6 [X86][SSE] Don't duplicate Lower256IntArith functionality in LowerShift. NFC.
LowerShift was using the same code as Lower256IntArith to split 256-bit vectors into 2 x 128-bit vectors, so now we just call Lower256IntArith.

llvm-svn: 264403
2016-03-25 14:17:54 +00:00
Elena Demikhovsky
e73ee2b606 fixed typo
llvm-svn: 264395
2016-03-25 10:08:36 +00:00
Hans Wennborg
9fe6bf47fd X86: Use push-pop for materializing 8-bit immediates for minsize (take 2)
This is the same as r255936, with added logic for avoiding clobbering of the
red zone (PR26023).

Differential Revision: http://reviews.llvm.org/D18246

llvm-svn: 264375
2016-03-25 01:10:56 +00:00
Simon Pilgrim
cc78833907 [X86][XOP] Fixed instruction postfixes to more closely match operands
Suggested by Sanjay in D18189, as the multiple folding options in XOP instructions can be tricky.

llvm-svn: 264305
2016-03-24 16:31:30 +00:00
Elena Demikhovsky
40f394e95d AVX-512: Generate KTEST instead of TEST for i1 vectors
KTEST instruction may be used instead of TEST in this case:

%int_sel3 = bitcast <8 x i1> %sel3 to i8
%res = icmp eq i8 %int_sel3, 0
br i1 %res, label %L2, label %L1

Differential Revision: http://reviews.llvm.org/D18444

llvm-svn: 264298
2016-03-24 15:53:45 +00:00
Simon Pilgrim
c3ecf6079f [X86][XOP] Merged 128/256 bit 4op instruction definitions. NFCI.
llvm-svn: 264294
2016-03-24 15:28:02 +00:00
Simon Pilgrim
b4f1778bd5 [X86][XOP] Support for VPPERM byte shuffle instruction
This patch begins adding support for lowering to the XOP VPPERM instruction - adding the X86ISD::VPPERM opcode.

Differential Revision: http://reviews.llvm.org/D18189

llvm-svn: 264260
2016-03-24 11:52:43 +00:00
Paul Robinson
082bed0b87 [PS4] Guarantee an instruction after a 'noreturn' call.
We need the "return address" of a noreturn call to be within the
bounds of the calling function; TrapUnreachable turns 'unreachable'
into a 'ud2' instruction, which has that desired effect.
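
A minimal sketch (hypothetical callee):

declare void @crash()

call void @crash() noreturn
unreachable ; TrapUnreachable turns this into ud2, keeping the "return address" in bounds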

Differential Revision: http://reviews.llvm.org/D18414

llvm-svn: 264224
2016-03-24 00:10:03 +00:00
Cong Hou
4458d58ad9 Allow X86::COND_NE_OR_P and X86::COND_NP_OR_E to be reversed.
Currently, AnalyzeBranch() fails on non-equality comparisons between floating-point
values on X86 (see https://llvm.org/bugs/show_bug.cgi?id=23875). This is because this
function can modify the branch by reversing the conditional jump and removing the
unconditional jump if there is a proper fall-through. However, in the case of a
non-equality comparison between floating-point values, this can turn the branch
"unanalyzable". Consider the following case:

jne .BB1
jp .BB1
jmp .BB2
.BB1:
...
.BB2:
...

AnalyzeBranch() will reverse "jp .BB1" to "jnp .BB2" and then "jmp .BB2" will be
removed:

jne .BB1
jnp .BB2
.BB1:
...
.BB2:
...

However, AnalyzeBranch() cannot analyze this branch anymore as there are two
conditional jumps with different targets. This may disable some optimizations
like block-placement: in this case the fall-through behavior is enforced even if
the fall-through block is very cold, which is suboptimal.

Actually this optimization is also done in the block-placement pass, which means
we could remove it from AnalyzeBranch(). However, currently
X86::COND_NE_OR_P and X86::COND_NP_OR_E are not reversible: there are no defined
negation conditions for them.

In order to reverse them, this patch defines two new CondCodes X86::COND_E_AND_NP
and X86::COND_P_AND_NE. It also defines how to synthesize instructions for them.
Here only the second conditional jump is reversed. This is valid as we only need
them to do this "unconditional jump removal" optimization.


Differential Revision: http://reviews.llvm.org/D11393

llvm-svn: 264199
2016-03-23 21:45:37 +00:00
Sanjay Patel
aee6dc6701 [x86] make peekThroughBitcasts() a helper function
This should be hoisted further up so it can be used in DAGCombiner and other backends,
but I'm limiting the scope in the interest of patch minimalism.

It's not quite NFC because some of the replaced code was using an 'if' check rather
than a 'while' loop, so those cases would only look through a single bitcast.

llvm-svn: 264186
2016-03-23 20:16:37 +00:00
Andrey Turetskiy
200b3a62bd [X86] Introduction of FeatureX87.
Add FeatureX87 to the X86 backend to be able to define CPUs which don't have x87.

Differential Revision: http://reviews.llvm.org/D13979

llvm-svn: 264148
2016-03-23 11:13:54 +00:00
Joerg Sonnenberger
e032905164 Typo
llvm-svn: 264110
2016-03-22 22:24:52 +00:00
Simon Pilgrim
83dbb9a4a4 [X86][SSE] Reapplied: Simplify vector LOAD + EXTEND on pre-SSE41 hardware
Improve vector extension of vectors on hardware without dedicated VSEXT/VZEXT instructions.

We already convert these to SIGN_EXTEND_VECTOR_INREG/ZERO_EXTEND_VECTOR_INREG, but we can further improve this by using the legalizer instead of prematurely splitting into legal vectors in the combine, as that only really helps when lowering to VSEXT/VZEXT.

Removes a lot of unnecessary any_extend + mask patterns (fix for PR25718).
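
A sketch of the kind of pattern that benefits (illustrative types):

%v = load <4 x i16>, <4 x i16>* %p
%e = sext <4 x i16> %v to <4 x i32>
; previously split prematurely in the combine; now handled by the legalizer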

Reapplied with a fix for PR26953 (missing vector widening legalization).

Differential Revision: http://reviews.llvm.org/D17932

llvm-svn: 264062
2016-03-22 16:22:08 +00:00
Simon Pilgrim
3e6ae4b752 [X86][SSE] Tidyup setTargetShuffleZeroElements to match computeZeroableShuffleElements
Based on feedback for D14261

llvm-svn: 263911
2016-03-20 17:43:07 +00:00
Simon Pilgrim
6372fe1697 [X86][SSE] Detect zeroable shuffle elements from different value types
Improve computeZeroableShuffleElements to be able to peek through bitcasts and extract zero/undef values from BUILD_VECTOR nodes whose element sizes differ from the shuffle mask's.

Differential Revision: http://reviews.llvm.org/D14261

llvm-svn: 263906
2016-03-20 15:45:42 +00:00
Igor Breger
83f6b21484 AVX512BW: Enable v32i1/v64i1 BUILD_VECTOR
Differential Revision: http://reviews.llvm.org/D18211

llvm-svn: 263898
2016-03-20 13:09:43 +00:00
Michael Kuperstein
36cf287497 Use a range-based for loop. NFC.
llvm-svn: 263889
2016-03-20 00:16:13 +00:00
Manman Ren
dfc9be9be5 [CXX_FAST_TLS] Disable tail call when calling conventions are mismatched.
Since CXX_FAST_TLS has a bigger set of CSRs, we don't tail call when caller
and callee have mismatched calling conventions.

llvm-svn: 263856
2016-03-18 23:41:51 +00:00
Simon Pilgrim
f9f3d37f61 [X86][SSE] Simplified blend-with-zero combining
We were being too aggressive in trying to combine a shuffle into a blend-with-zero pattern, often resulting in an endless loop of contrasting combines.

This patch stops the combine if we already have a blend in place (which means we miss some domain corrections).

llvm-svn: 263717
2016-03-17 15:59:36 +00:00
Sanjay Patel
bd226877f7 fix function names; NFC
llvm-svn: 263646
2016-03-16 18:00:09 +00:00
Igor Breger
bf48be46fb AVX512BW: Fix SRA v64i8 lowering. Use PCMPGTM (compare result in a k register) for 512-bit vectors because PCMPGT is supported only for 128/256-bit vectors.
Differential Revision: http://reviews.llvm.org/D18204

llvm-svn: 263624
2016-03-16 08:48:26 +00:00
Eric Christopher
ddb99141b3 Temporarily Revert "[X86][SSE] Simplify vector LOAD + EXTEND on
pre-SSE41 hardware" as it seems to be causing crashes during code
generation in halide. PR forthcoming.

This reverts commit r263303.

llvm-svn: 263512
2016-03-14 23:59:57 +00:00
Sanjay Patel
f7ad46820f [DAG] use !isUndef() ; NFCI
llvm-svn: 263453
2016-03-14 18:09:43 +00:00
Sanjay Patel
f22bc14a47 [DAG] use isUndef() ; NFCI
llvm-svn: 263448
2016-03-14 17:28:46 +00:00
Sanjay Patel
f3adc07abf [x86, AVX] replace masked load with full vector load when possible
Converting masked vector loads to regular vector loads for x86 AVX should always be a win.
I raised the legality issue of reading the extra memory bytes on llvm-dev. I did not see any
objections.

1. x86 already does this kind of optimization for multiple scalar loads -> vector load.
2. If other targets have the same flexibility, we could move this transform up to CGP or DAGCombiner.
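
A sketch of the transform (2016-era typed-pointer IR, hypothetical constant mask):

declare <4 x i32> @llvm.masked.load.v4i32(<4 x i32>*, i32, <4 x i1>, <4 x i32>)
%r = call <4 x i32> @llvm.masked.load.v4i32(<4 x i32>* %p, i32 4, <4 x i1> <i1 true, i1 true, i1 true, i1 false>, <4 x i32> undef)
; with an undef pass-through, a plain vector load of %p reads one extra lane but is otherwise equivalent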

Differential Revision: http://reviews.llvm.org/D18094

llvm-svn: 263446
2016-03-14 16:54:43 +00:00
Igor Breger
e61fb42d0b AVX512: icmp operations should always be lowered to the CMPM (AVX-512) instruction on SKX.
Implemented by delena.

Differential Revision: http://reviews.llvm.org/D18054

llvm-svn: 263417
2016-03-14 10:26:39 +00:00
Simon Pilgrim
176a4e0ec3 [X86][SSE41] Avoid variable blend for constant v8i16 shifts
The SSE41 v8i16 shift lowering using (v)pblendvb is great for non-constant shift amounts, but if the amount is constant then we can efficiently reduce the VSELECT to shuffles with the pre-SSE41 lowering.
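
For instance (hypothetical amounts):

%r = shl <8 x i16> %x, <i16 0, i16 1, i16 2, i16 3, i16 4, i16 5, i16 6, i16 7>
; constant per-element amounts: the VSELECT reduces to shuffles instead of (v)pblendvb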

llvm-svn: 263383
2016-03-13 18:35:59 +00:00
Craig Topper
f34a1d74e9 [X86] Move many operands that represent memory stores from outs to ins. These operands are the registers and immediates that specify the memory address, not the memory itself; thus they are inputs.
llvm-svn: 263354
2016-03-13 02:56:31 +00:00
Quentin Colombet
aaf2db6c80 [X86] Make sure we do not clobber RBX with cmpxchg when used as a base pointer.
cmpxchg[8|16]b uses RBX as one of its arguments. In other words, using
this instruction clobbers RBX, as it is defined to hold one of the inputs.
When the backend uses a dynamically allocated stack, RBX is used as a
reserved register for the base pointer.

Reserved registers have special semantics that only the target understands
and enforces. Because of that, the register allocator doesn't use them, but
also doesn't try to make sure they are used properly (remember, it does not
know how they are supposed to be used).

Therefore, when RBX is used as a reserved register but defined by something
that is not compatible with that use, the register allocator will not fix the
surrounding code to make sure it gets saved and restored properly around the
broken code. It is the target's responsibility to do the right thing with
its reserved registers.

To fix that, when the base pointer needs to be preserved, we use a different
pseudo instruction for cmpxchg that saves RBX.
That pseudo takes two more arguments than the regular instruction:
- One is the value to be copied into RBX to set the proper value for the
  comparison.
- The other is the virtual register holding the save of the value of RBX as the
  base pointer. This saving is done as part of isel (i.e., we emit a copy from
  rbx).

cmpxchg_save_rbx <regular cmpxchg args>, input_for_rbx_reg, save_of_rbx_as_bp

This gets expanded into:
rbx = copy input_for_rbx_reg
cmpxchg <regular cmpxchg args>
rbx = save_of_rbx_as_bp

Note: the actual modeling of the pseudo is a bit more complicated, to make sure
the interferences that appear after the pseudo is expanded are properly modeled
before that expansion.

This fixes PR26883.

llvm-svn: 263325
2016-03-12 02:25:27 +00:00
Simon Pilgrim
d62ab3da09 [X86][SSE] Simplify vector LOAD + EXTEND on pre-SSE41 hardware
Improve vector extension of vectors on hardware without dedicated VSEXT/VZEXT instructions.

We already convert these to SIGN_EXTEND_VECTOR_INREG/ZERO_EXTEND_VECTOR_INREG, but we can further improve this by using the legalizer instead of prematurely splitting into legal vectors in the combine, as that only really helps when lowering to VSEXT/VZEXT.

Removes a lot of unnecessary any_extend + mask patterns (fix for PR25718).

Differential Revision: http://reviews.llvm.org/D17932

llvm-svn: 263303
2016-03-11 22:18:05 +00:00
Simon Pilgrim
9da9e86e49 Fix spelling.
llvm-svn: 263266
2016-03-11 17:31:43 +00:00
Simon Pilgrim
d1894c7f7a [X86][AVX] Fixed issue where a long chain of shuffles could attempt to combine to a single (illegal) PSHUFB instruction.
It's not enough to test for SSSE3 - that's only OK for 128-bit vectors - we also need to test for AVX2 / AVX512BW for the 256/512-bit vector cases.

llvm-svn: 263239
2016-03-11 14:39:10 +00:00
Sanjay Patel
4aba3720bc [x86] don't use a shuffle when a vselect will do; NFCI
Looking at the IR definition of a masked load made me realize
there was no reason to use a shuffle here, so we don't need
to convert the format of the mask at all.

llvm-svn: 263167
2016-03-10 22:35:33 +00:00
Simon Pilgrim
74609b7c8b [X86][SSE] Reapplied: Improve vector ZERO_EXTEND by combining to ZERO_EXTEND_VECTOR_INREG
Generalise the existing SIGN_EXTEND to SIGN_EXTEND_VECTOR_INREG combine to support zero extension as well and get rid of a lot of unnecessary ANY_EXTEND + mask patterns.
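
A sketch of the kind of extension affected (illustrative types; exact DAG shapes vary):

%z = zext <8 x i16> %v to <8 x i32>
; now combined to ZERO_EXTEND_VECTOR_INREG instead of ANY_EXTEND + mask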

Reapplied with a fix for PR26870 (avoid premature use of TargetConstant in ZERO_EXTEND_VECTOR_INREG expansion).

Differential Revision: http://reviews.llvm.org/D17691

llvm-svn: 263159
2016-03-10 20:40:26 +00:00
Michael Kuperstein
a8857efd95 [X86] Correctly select registers to pop into for x86_64
When trying to replace an add to esp with pops, we need to choose dead
registers to pop into. Registers clobbered by the call and not imp-def'd
by it should be safe. Except that it's not enough to check that the register
itself isn't defined; we also need to make sure no overlapping registers
are defined either.

This fixes PR26711.

Differential Revision: http://reviews.llvm.org/D18029

llvm-svn: 263139
2016-03-10 18:43:21 +00:00