mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-10-20 03:23:01 +02:00
36059 Commits

Author SHA1 Message Date
Chad Rosier
cb79de089b [AArch64] Add an assert to ensure we don't scale an offset that can't be scaled.
llvm-svn: 260272
2016-02-09 20:18:07 +00:00
Chad Rosier
dd552205e1 [AArch64] Add a FIXME about invalid KILL markers after the ld/st opt pass.
llvm-svn: 260264
2016-02-09 19:42:19 +00:00
Chad Rosier
8d7cb088ae [AArch64] Remove redundant calls and clang format. NFC.
llvm-svn: 260260
2016-02-09 19:33:42 +00:00
Colin LeMahieu
7d2b0f70e8 [Hexagon] Fixing relocation generation and adding tests.
llvm-svn: 260259
2016-02-09 19:18:02 +00:00
Chad Rosier
b34ba75bdc [AArch64] Hoist now common logic. NFC.
llvm-svn: 260257
2016-02-09 19:17:18 +00:00
Chad Rosier
236a622860 [AArch64] Rename variable to make it clear we're merging here, not pairing.
llvm-svn: 260256
2016-02-09 19:09:22 +00:00
Chad Rosier
4c9c20482f [AArch64] Separate the codegen logic for widening vs. pairing. NFC.
llvm-svn: 260249
2016-02-09 19:02:12 +00:00
Chad Rosier
5c27cebb99 [AArch64] Cleanup to simplify logic when widening vs. pairing loads/stores. NFC.
The logic to pair instructions and merge narrow instructions has become kludgy
and error-prone.  This patch begins to unravel these two similar, but distinct
optimizations.

llvm-svn: 260242
2016-02-09 18:10:20 +00:00
Sanjay Patel
9ad4cf304b [x86] make getOneTrueElt() a helper function ; NFC
As mentioned in http://reviews.llvm.org/D16828 , the related masked load transform
will need this logic, so I'm moving it out to make that patch smaller.

llvm-svn: 260240
2016-02-09 17:39:58 +00:00
Chad Rosier
04b8aec790 [AArch64] Rename variable to improve readability. NFC.
llvm-svn: 260228
2016-02-09 15:59:57 +00:00
Chad Rosier
271160c32d [AArch64] Remove stale comment.
llvm-svn: 260226
2016-02-09 15:51:33 +00:00
Simon Pilgrim
9569b015d4 [X86][AVX2] Fix SIGN_EXTEND vector handling on AVX2 targets.
On AVX2 targets we are poorly legalizing SIGN_EXTEND ops for which the input's legalized type doesn't have the same number of elements as the destination, resulting in an ANY_EXTEND followed by a SIGN_EXTEND_INREG.

This patch uses the existing SIGN_EXTEND -> SIGN_EXTEND_VECTOR_INREG combine to extend the input to the size of the result and then uses SIGN_EXTEND_VECTOR_INREG instead.

Differential Revision: http://reviews.llvm.org/D16994

llvm-svn: 260210
2016-02-09 08:19:19 +00:00
Simon Pilgrim
4d4989655e [X86][SSE1] Add MOVLHPS/MOVHLPS lowering and memory folding support
As discussed on PR26491, this patch adds support for lowering v4f32 shuffles to the MOVLHPS/MOVHLPS instructions. It also adds support for memory folding with their MOVLPS/MOVHPS load equivalents.

This first patch only really helps SSE1 targets as SSE2+ targets will widen the shuffle mask and use v2f64 equivalents (although they still combine to MOVLHPS/MOVHLPS for v2f64 splats). This will have to be addressed in a future patch, most likely when we add support for binary target shuffle combines.

Differential Revision: http://reviews.llvm.org/D16956

llvm-svn: 260168
2016-02-08 23:03:46 +00:00
Dan Gohman
32a15ba966 [WebAssembly] Update the br_if instructions' operand orders to match the spec.
llvm-svn: 260152
2016-02-08 21:50:13 +00:00
Sanjay Patel
69b9326059 [x86] convert masked store of one element to scalar store
Another opportunity to reduce masked stores: in D16691, we decided not to attempt the 'one mask element is set'
transform in InstCombine, but this should be a win for any AVX machine.

Code comments note that this transform could be extended for other targets / cases.

Differential Revision: http://reviews.llvm.org/D16828

llvm-svn: 260145
2016-02-08 21:05:08 +00:00
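The transform described in r260145 above can be pictured with a small scalar model. This is only a sketch under stated assumptions: `Vec4`, `Mask4`, `maskedStore`, `onlyTrueLane`, and `storeOneLaneOrFallBack` are hypothetical names for illustration, not LLVM identifiers, and the real combine works on SelectionDAG nodes rather than on arrays.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

using Vec4 = std::array<int, 4>;
using Mask4 = std::array<bool, 4>;

// Reference semantics of a masked store: only lanes whose mask bit is set
// are written; all other memory lanes keep their old contents.
inline Vec4 maskedStore(Vec4 Mem, const Vec4 &Val, const Mask4 &Mask) {
  for (std::size_t I = 0; I < Mem.size(); ++I)
    if (Mask[I])
      Mem[I] = Val[I];
  return Mem;
}

// Returns the index of the only set lane, or -1 if zero or several are set.
inline int onlyTrueLane(const Mask4 &Mask) {
  int Found = -1;
  for (std::size_t I = 0; I < Mask.size(); ++I) {
    if (!Mask[I])
      continue;
    if (Found != -1)
      return -1; // a second set lane: not the one-element case
    Found = static_cast<int>(I);
  }
  return Found;
}

// With exactly one lane set, a single scalar store has the same effect as
// the masked store; otherwise fall back to the general form.
inline Vec4 storeOneLaneOrFallBack(Vec4 Mem, const Vec4 &Val,
                                   const Mask4 &Mask) {
  int Lane = onlyTrueLane(Mask);
  if (Lane < 0)
    return maskedStore(Mem, Val, Mask);
  Mem[static_cast<std::size_t>(Lane)] = Val[static_cast<std::size_t>(Lane)];
  return Mem;
}
```

The sketch only shows why the two forms are equivalent when one mask element is set; it says nothing about the relative cost on any particular machine.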
Tom Stellard
bb35f34026 AMDGPU/SI: Implement a work-around for smrd corrupting vccz bit
Summary:
We will hit this once we have enabled uniform branches.  The
smrd-vccz-bug.ll test will be added with the uniform branch commit.

Reviewers: mareko, arsenm

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D16725

llvm-svn: 260137
2016-02-08 19:49:20 +00:00
Hans Wennborg
36d0cb3fd9 [X86] Don't zero/sign-extend i1, i8, or i16 return values to 32 bits (PR22532)
This matches GCC and MSVC's behaviour, and saves on code size.

We were already not extending i1 return values on x86_64 after r127766. This
takes that patch further by applying it to the x86 target as well, and also for i8
and i16.

The ABI docs have been unclear about the required behaviour here. The new i386
psABI [1] clearly states (Table 2.4, page 14) that i1, i8, and i16 return
values do not need to be extended beyond 8 bits. The x86_64 ABI doc is being
updated to say the same [2].

Differential Revision: http://reviews.llvm.org/D16907

 [1]. https://01.org/sites/default/files/file_attach/intel386-psabi-1.0.pdf
 [2]. https://groups.google.com/d/msg/x86-64-abi/E8O33onbnGQ/_RFWw_ixDQAJ

llvm-svn: 260133
2016-02-08 19:34:30 +00:00
Tim Northover
1dabed8e29 AArch64: match correct order in subtraction pattern.
The accumulator in multiply-and-subtract instructions is actually subtracted
*from* so these patterns were computing the wrong value.

llvm-svn: 260131
2016-02-08 19:33:18 +00:00
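The operand-order point in r260131 above can be made concrete with the scalar MSUB instruction, which computes Ra - Rn * Rm. The sketch below is an illustration only; `msub` and `wrongOrder` are hypothetical helper names, and the commit itself changes TableGen patterns, which are not reproduced here.

```cpp
#include <cassert>
#include <cstdint>

// Models the result of AArch64 MSUB: Rd = Ra - Rn * Rm.
// Unsigned arithmetic is used so that wraparound is well defined.
constexpr uint64_t msub(uint64_t Rn, uint64_t Rm, uint64_t Ra) {
  return Ra - Rn * Rm;
}

// The mistaken reading: the accumulator subtracted from the product.
constexpr uint64_t wrongOrder(uint64_t Rn, uint64_t Rm, uint64_t Ra) {
  return Rn * Rm - Ra;
}

static_assert(msub(3, 4, 20) == 8, "20 - 3 * 4");
static_assert(wrongOrder(3, 4, 20) != msub(3, 4, 20),
              "swapping the subtraction operands changes the value");
```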
Matt Arsenault
eed0ad4e3e AMDGPU: Remove bfi and bfm intrinsics
Nothing is using them.

llvm-svn: 260123
2016-02-08 19:06:01 +00:00
Michael Zuckerman
c705af63bb [AVX512][PROLQ][PROLD] Change imm8 to int
Differential Revision: http://reviews.llvm.org/D16983

llvm-svn: 260101
2016-02-08 15:13:32 +00:00
Dan Gohman
33dc7c1040 [WebAssembly] Add another optimization idea to README.txt.
llvm-svn: 260070
2016-02-08 03:42:36 +00:00
Craig Topper
bc3298f1ee [X86] Change FeatureIFMA string to 'avx512ifma'. Matches gcc and fixes PR26461.
llvm-svn: 260069
2016-02-08 01:23:15 +00:00
Simon Pilgrim
9ecb59bace [X86][SSE] Resolve target shuffle inputs to sentinels to permit more combines
The combineX86ShufflesRecursively only supports unary shuffles, but was missing the opportunity to combine binary shuffles with a zero / undef second input.

This patch resolves target shuffle inputs, converting the shuffle mask elements to SM_SentinelUndef/SM_SentinelZero where possible. It then resolves the updated mask to check if we have created a faux unary shuffle.

Additionally, we now attempt to recursively call combineX86ShufflesRecursively for all input operands (we used to just recurse for unary integer shuffles and unary unpacks) - it safely returns early if it's not a target shuffle.

Differential Revision: http://reviews.llvm.org/D16683

llvm-svn: 260063
2016-02-07 22:51:06 +00:00
Simon Pilgrim
8d73f8b49c [X86][SSE] Added support for MOVHPD/MOVLPD + MOVHPS/MOVLPS shuffle decoding.
llvm-svn: 260034
2016-02-07 15:39:22 +00:00
Asaf Badouh
5bbbfafa66 [X86][AVX512] add intrinsics of Scalar FP to integer conversion with rounding mode
Differential Revision: http://reviews.llvm.org/D16629

llvm-svn: 260033
2016-02-07 14:59:13 +00:00
Simon Pilgrim
bab30569fb [X86][SSE] Pulled out repeated target shuffle decodes into helper functions. NFCI.
Pulled out the code used by PSHUFB/VPERMV/VPERMV3 shuffle mask decoding into common helper functions.

The helper functions handle masks coming from BROADCAST/BUILD_VECTOR and ConstantPool nodes respectively.

llvm-svn: 260032
2016-02-07 14:33:03 +00:00
Igor Breger
60ac21f165 AVX512: VPBROADCASTB/W/D/Q from GPR intrinsics implementation.
Differential Revision: http://reviews.llvm.org/D16813

llvm-svn: 260024
2016-02-07 08:30:50 +00:00
Simon Pilgrim
b7e95cd192 [X86][AVX512] Added support for VPMOVZX shuffle decoding.
llvm-svn: 260007
2016-02-06 19:51:21 +00:00
Simon Pilgrim
a80ad70644 [X86][SSE] Moved shuffle decode CASE macros earlier. NFC.
To allow the helper functions to make use of them.

llvm-svn: 259997
2016-02-06 17:02:15 +00:00
Simon Pilgrim
520933a81a [X86][SSE] Refactored PMOVZX shuffle decoding to use scalar input types
First step towards being able to decode AVX512 PMOVZX instructions without a massive bloat in the shuffle decode switch statement.

This should also make it easier to decode X86ISD::VZEXT target shuffles in the future.

llvm-svn: 259995
2016-02-06 16:33:42 +00:00
Simon Pilgrim
f1a97ef96e [X86][SSE] Don't replace an existing 32-bit load with its duplicate
If we are already loading a single 32-bit float/integer then just reuse it.

Fix for regression in D16729

llvm-svn: 259991
2016-02-06 15:37:09 +00:00
Simon Pilgrim
354f04bdd5 Comment fix
llvm-svn: 259990
2016-02-06 14:21:49 +00:00
Evandro Menezes
0e4ef392d8 [AArch64] Add the scheduling model for Exynos-M1
Summary:
Add the core scheduling model for the Samsung Exynos-M1 (ARMv8-A).


Reviewers: jmolloy, rengolin, christof, MinSeongKIM, t.p.northover

Subscribers: aemerson, rengolin, MatzeB

Differential Revision: http://reviews.llvm.org/D16644

llvm-svn: 259958
2016-02-06 00:01:41 +00:00
Jun Bum Lim
62bd130ca7 [AArch64] Refactoring aarch64-ldst-opt. NFC.
Remove narrow load / store instructions from getMatchingPairOpcode(),
and add getMatchingWideOpcode().

llvm-svn: 259914
2016-02-05 20:02:03 +00:00
Matt Arsenault
8009cfb6b5 AMDGPU: Account for LDS alignment
The current situation isn't great, because the amount of padding
required is determined by the inverse order of the first encountered
use. We should eventually somehow sort these to minimize wasted space.

Another problem is that the alignment of kernel arguments isn't
respected. The group_segment_alignment is always emitted as
the default 16, and typed arguments with higher alignments
or an explicitly set alignment are also ignored.

llvm-svn: 259912
2016-02-05 19:47:29 +00:00
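The remark in r259912 above that padding depends on allocation order can be seen with a tiny offset calculator. This is a sketch under stated assumptions: `LdsGlobal`, `alignUp`, and `totalLdsSize` are hypothetical names, and the code does not model how the AMDGPU backend actually chooses the order.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// A local-memory object with a size and a power-of-two alignment, in bytes.
struct LdsGlobal {
  uint32_t Size;
  uint32_t Align;
};

// Rounds Offset up to the next multiple of Align (Align is a power of two).
constexpr uint32_t alignUp(uint32_t Offset, uint32_t Align) {
  return (Offset + Align - 1) & ~(Align - 1);
}

// Total bytes used when the objects are laid out in the given order,
// inserting padding before each object to satisfy its alignment.
inline uint32_t totalLdsSize(const std::vector<LdsGlobal> &Order) {
  uint32_t Offset = 0;
  for (const LdsGlobal &G : Order) {
    assert(G.Align != 0 && (G.Align & (G.Align - 1)) == 0 &&
           "alignment must be a power of two");
    Offset = alignUp(Offset, G.Align) + G.Size;
  }
  return Offset;
}
```

For example, laying out a 4-byte object, then a 16-byte object aligned to 16, then another 4-byte object needs 36 bytes, while placing the 16-byte object first needs only 24. Sorting by decreasing alignment is one way to reduce such padding.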
Matt Arsenault
3c264fc8c0 AMDGPU: Preserve alignments on newly created globals
Also switch to internal linkage, and include the name of the function in
the name.

llvm-svn: 259911
2016-02-05 19:47:23 +00:00
Tom Stellard
350a5c65b3 AMDGPU: Remove some purely R600 functions from AMDGPUInstrInfo
Reviewers: arsenm

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D16862

llvm-svn: 259900
2016-02-05 18:44:57 +00:00
Tom Stellard
9f8aefe66d AMDGPU: Fix ordering of CPU and FS parameters in TargetMachine constructors
Reviewers: arsenm

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D16863

llvm-svn: 259897
2016-02-05 18:29:17 +00:00
Tom Stellard
4948571e05 AMDGPU/SI: Correctly initialize SIInsertWaits pass
Reviewers: arsenm

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D16724

llvm-svn: 259894
2016-02-05 17:42:38 +00:00
Dan Gohman
f44d278023 [WebAssembly] Update the select instructions' operand orders to match the spec.
llvm-svn: 259893
2016-02-05 17:14:59 +00:00
Nemanja Ivanovic
3a405a94b2 Fix for PR 26193
This is a simple fix for a PowerPC intrinsic that was incorrectly defined
(the return type was incorrect).

llvm-svn: 259886
2016-02-05 14:50:29 +00:00
Benjamin Kramer
9b60b05c94 Move classes defined in a cpp file into an anonymous namespace.
No functionality change intended.

llvm-svn: 259883
2016-02-05 13:50:53 +00:00
Renato Golin
c3580d7a36 Revert "[AArch64] Improve load/store optimizer to handle LDUR + LDR (take 3)."
This reverts commit r259812 as it broke AArch64 self-hosting.

llvm-svn: 259881
2016-02-05 12:14:30 +00:00
Nemanja Ivanovic
b7bc445a9f Fix for PR 26356
Use the load immediate only when the immediate (whether signed or unsigned)
can fit in a 16-bit signed field. Namely, from -32768 to 32767 for signed and
0 to 65535 for unsigned. This patch also ensures that we sign-extend under the
right conditions.

llvm-svn: 259840
2016-02-04 23:14:42 +00:00
Chad Rosier
379876e08f [AArch64] Bound the number of instructions we scan when searching for updates.
This only impacts the creation of pre-/post-index instructions.  The bound was
set high enough such that it did not change code generation for SPEC200X.

llvm-svn: 259828
2016-02-04 21:26:02 +00:00
Simon Pilgrim
cc76b8656c [X86][SSE] Select domain for 32/64-bit partial loads for EltsFromConsecutiveLoads
Choose between MOVD/MOVSS and MOVQ/MOVSD depending on the target vector type.

This has a lot fewer test changes than trying to add this to X86InstrInfo::setExecutionDomain.

llvm-svn: 259816
2016-02-04 19:27:51 +00:00
Chad Rosier
b8b5852fe4 [AArch64] Improve load/store optimizer to handle LDUR + LDR (take 3).
This patch allows the mixing of scaled and unscaled load/stores to form
load/store pairs.

PR24465
http://reviews.llvm.org/D12116
Many thanks to Ahmed and Michael for fixes and code review.

This is a reapplication of r246769 and r259790.  The tramp3d failure was caused
by an incorrect refactoring in the patch.  Specifically, we weren't always
properly clearing the SExtIdx flag.

llvm-svn: 259812
2016-02-04 18:59:49 +00:00
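Mixing scaled and unscaled forms, as in r259812 above, hinges on offset units: LDUR takes a byte offset, while the unsigned-offset form of LDR encodes its immediate in multiples of the access size. A minimal sketch of the conversion follows; `canScaleOffset` and `scaleOffset` are hypothetical names and are not taken from the load/store optimizer source.

```cpp
#include <cassert>

// True if a byte offset is an exact multiple of the access size, so it can
// be expressed in the scaled form without losing information.
constexpr bool canScaleOffset(int ByteOffset, int AccessSize) {
  return AccessSize > 0 && ByteOffset % AccessSize == 0;
}

// Converts a byte offset into element units. The assert guards against
// scaling an offset that cannot be scaled.
inline int scaleOffset(int ByteOffset, int AccessSize) {
  assert(canScaleOffset(ByteOffset, AccessSize) &&
         "offset is not a multiple of the access size");
  return ByteOffset / AccessSize;
}
```

A byte offset of 16 with 8-byte accesses becomes a scaled offset of 2, whereas a byte offset of 12 has no scaled equivalent for that access size.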
Silviu Baranga
22ab3adc5c [AArch64] Multiply extended 32-bit ints with `[U|S]MADDL'
During instruction selection, the AArch64 backend can recognise the
following pattern and generate an [U|S]MADDL instruction, i.e. a
multiply of two 32-bit operands with a 64-bit result:

(mul (sext i32), (sext i32))

However, when one of the operands is constant, the sign extension
gets folded into the constant in SelectionDAG::getNode(). This means
that the instruction selection sees this:

(mul (sext i32), i64)

...which doesn't match the pattern. Sign-extension and 64-bit
multiply instructions are generated, which are slower than one 32-bit
multiply.

Add a pattern to match this and generate the correct instruction, for
both signed and unsigned multiplies.

Patch by Chris Diamand!

llvm-svn: 259800
2016-02-04 16:47:09 +00:00
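For orientation, source code of the following shape is the kind that would be expected to give rise to the two DAG forms quoted in r259800 above. This is a sketch only: `mulWide` and `mulByConst` are hypothetical names, the constant is arbitrary, and which instructions a given compiler version actually emits is not claimed here.

```cpp
#include <cassert>
#include <cstdint>

// Both operands are sign-extended 32-bit values, which corresponds to
// (mul (sext i32), (sext i32)): a 32 x 32 -> 64-bit multiply.
constexpr int64_t mulWide(int32_t A, int32_t B) {
  return static_cast<int64_t>(A) * static_cast<int64_t>(B);
}

// One operand is a constant that fits in 32 bits. After constant folding
// the multiply has the shape (mul (sext i32), i64).
constexpr int64_t mulByConst(int32_t A) {
  return static_cast<int64_t>(A) * 100000;
}

static_assert(mulWide(-2, 3) == -6, "sign is preserved");
static_assert(mulByConst(100000) == 10000000000LL,
              "result needs more than 32 bits");
```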
Simon Pilgrim
da26d272a9 [X86][SSE] Add general 32-bit LOAD + VZEXT_MOVL support to EltsFromConsecutiveLoads
This patch adds support for consecutive (load/undef elements) 32-bit loads, followed by trailing undef/zero elements to be combined to a single MOVD load.

Differential Revision: http://reviews.llvm.org/D16729

llvm-svn: 259796
2016-02-04 16:12:56 +00:00
Chad Rosier
fcca55983b Revert "[AArch64] Improve load/store optimizer to handle LDUR + LDR."
This reverts commit r259790. tramp3d-v4 is still having problems.

llvm-svn: 259795
2016-02-04 16:01:40 +00:00