mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-10-19 02:52:53 +02:00
23546 Commits

Author SHA1 Message Date
Simon Pilgrim
369005c7c7 [X86][AVX512] Add DQ+VLX scalar int<->fp test cases for D43441
llvm-svn: 325804
2018-02-22 16:29:08 +00:00
Stefan Maksimovic
1018fe2e77 [mips] Generate memory dependencies for byVal arguments
There were no memory dependencies made between stores generated
when lowering formal arguments and loads generated when
call lowering byVal arguments which made the Post-RA scheduler
place a load before a matching store.

Make the fixed object stored to mutable so that the load
instructions can have their memory dependencies added.

Set the frame object as isAliased which clears the underlying
objects vector in ScheduleDAGInstrs::buildSchedGraph().
This results in the addition of all stores as dependencies for loads.

This problem appeared when passing a byVal parameter
coupled with a fastcc function call.

Differential Revision: https://reviews.llvm.org/D37515

llvm-svn: 325782
2018-02-22 13:40:42 +00:00
Simon Dardis
e27e32a4b5 [mips] Regenerate tests for D38128 (NFC)
llvm-svn: 325770
2018-02-22 11:53:01 +00:00
Sjoerd Meijer
eeeedc6071 Recommit: [ARM] f16 constant pool fix
This recommits r325754; the modified and failing test case
actually didn't need any modifications.

llvm-svn: 325765
2018-02-22 10:43:57 +00:00
David Green
86bf0a6446 [ARM] Fix issue with large xor constants.
Fixup to rL325573 for large xor constants.

Thanks to Eli Friedman for the catch.

Differential revision: https://reviews.llvm.org/D43549

llvm-svn: 325761
2018-02-22 09:38:57 +00:00
Sjoerd Meijer
49c327d9d4 Revert r325754 and r325755 (f16 literal pool) because buildbots were unhappy.
llvm-svn: 325756
2018-02-22 08:41:55 +00:00
Sjoerd Meijer
2e623a8f3f Added a test that I forgot to svn add in my previous commit r325754.
llvm-svn: 325755
2018-02-22 08:20:50 +00:00
Sjoerd Meijer
963d510433 [ARM] f16 constant pool fix
This is a follow up of r325012, that allowed half types in constant pools.
Proper alignment was enforced when a big basic block was split up, but not when
a CPE was placed before/after a block; the successor block had the wrong
alignment.

Differential Revision: https://reviews.llvm.org/D43580

llvm-svn: 325754
2018-02-22 08:16:05 +00:00
Nemanja Ivanovic
237f644f7b [PowerPC] Do not produce invalid CTR loop with an FRem
An FRem instruction inside a loop should prevent the loop from being converted
into a CTR loop since this is not an operation that is legal on any PPC
subtarget. This will always be a call to a library function which means the
loop will be invalid if this instruction is in the body.

Fixes PR36292.

llvm-svn: 325739
2018-02-22 03:02:41 +00:00
Simon Pilgrim
b1c2f455d0 [X86][MMX] Generalize MMX_MOVD64rr combines to accept v4i16/v8i8 build vectors as well as v2i32
Also handle both cases where the lower 32-bits of the MMX is undef or zero extended.

llvm-svn: 325736
2018-02-21 23:07:30 +00:00
Simon Pilgrim
79971ea2d5 [X86][MMX] Add MMX_MOVD64rr build vector tests showing undef elements in the lower half
llvm-svn: 325729
2018-02-21 22:10:48 +00:00
Simon Pilgrim
cfbbbe13ff [X86][MMX] Run MMX bitcast test on 32 and 64-bit targets
llvm-svn: 325707
2018-02-21 18:52:16 +00:00
Simon Pilgrim
e11407b82a [X86][MMX] Regenerate MMX MASKMOV test
llvm-svn: 325698
2018-02-21 16:38:08 +00:00
Jonas Paulsson
1ba71f37a4 [Hexagon] Return true in enableMultipleCopyHints().
Enable multiple COPY hints to eliminate more COPYs during register allocation.

Note that this is something all targets should do, see
https://reviews.llvm.org/D38128.

Review: Krzysztof Parzyszek
llvm-svn: 325697
2018-02-21 16:37:45 +00:00
Simon Pilgrim
1367bf131c [X86][MMX] Regenerate MMX arithmetic tests
llvm-svn: 325696
2018-02-21 16:37:10 +00:00
Jonas Devlieghere
79ff112122 [Sparc] Include __tls_get_addr in symbol table for TLS calls to it
Global Dynamic and Local Dynamic call relocations only implicitly
reference __tls_get_addr; there is no connection in the ELF file between
the relocations and the symbol other than the specification for the
relocations' semantics. However, it still needs to be in the symbol
table despite the lack of explicit references to the symbol table entry,
since it needs to be bound at link time for these relocations, otherwise
any objects will fail to link.

For details, see https://sourceware.org/bugzilla/show_bug.cgi?id=22832.

Patch by: James Clarke (jrtc27)

Differential revision: https://reviews.llvm.org/D43271

llvm-svn: 325688
2018-02-21 15:25:26 +00:00
Simon Pilgrim
df67c3be6f [X86][MMX] Regenerate MMX PSUB commutation test
llvm-svn: 325685
2018-02-21 15:07:47 +00:00
Simon Pilgrim
b9bcc5c1cf [X86] Regenerate GPR:XMM bitcast test
llvm-svn: 325684
2018-02-21 15:05:47 +00:00
Nicolai Haehnle
ab865ff17a AMDGPU: Do not combine loads/store across physreg defs
Summary:
Since this pass operates on machine SSA form, this should only really
affect M0 in practice.

Fixes various piglit variable-indexing/vs-varying-array-mat4-index-*

Change-Id: Ib2a1dc3a8d7b08225a8da49a86f533faa0986aa8
Fixes: r317751 ("AMDGPU: Merge S_BUFFER_LOAD_DWORD_IMM into x2, x4")

Reviewers: arsenm, mareko, rampitec

Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D40343

llvm-svn: 325677
2018-02-21 13:31:35 +00:00
Simon Pilgrim
f41068eb9c [X86][MMX] Add PR29222 test case
llvm-svn: 325675
2018-02-21 12:06:27 +00:00
Simon Pilgrim
5e12a3341d [X86][MMX] Add some MMX build vector tests
llvm-svn: 325674
2018-02-21 12:01:30 +00:00
Craig Topper
bf52c09a2e [X86] Disable CLWB for Cannon Lake
Cannon Lake does not support CLWB, therefore it
does not include all features listed under SKX anymore.

Instead, enumerate all SKX features with the exception of CLWB.

Patch by Gabor Buella

Differential Revision: https://reviews.llvm.org/D43380

llvm-svn: 325654
2018-02-21 00:15:48 +00:00
Simon Dardis
8181753289 [mips] Spectre variant two mitigation for MIPSR2
This patch provides mitigation for CVE-2017-5715, Spectre variant two,
which affects the P5600 and P6600. It implements the LLVM part of
-mindirect-jump=hazard. It is _not_ enabled by default for the P5600.

The mitigation strategy suggested by MIPS for these processors is to use
hazard barrier instructions. 'jalr.hb' and 'jr.hb' are hazard
barrier variants of the 'jalr' and 'jr' instructions respectively.

These instructions impede the execution of the instruction stream until
architecturally defined hazards (changes to the instruction stream,
privileged registers which may affect execution) are cleared. These
instructions in MIPS' designs are not speculated past.

These instructions are used with the attribute +use-indirect-jump-hazard
when branching indirectly and for indirect function calls.

These instructions are defined by the MIPS32R2 ISA, so this mitigation
method is not compatible with processors which implement an earlier
revision of the MIPS ISA.

Performance benchmarking of this option with -fpic and lld using
-z hazardplt shows an overall time increase of roughly 10%
for the LLVM testsuite. Certain benchmarks such as methcall show a
substantially larger increase in time due to their nature.

Reviewers: atanasyan, zoran.jovanovic

Differential Revision: https://reviews.llvm.org/D43486

llvm-svn: 325653
2018-02-21 00:06:53 +00:00
Konstantin Zhuravlyov
b376e1565e Revert "[AMDGPU] Increased vector length for global/constant loads."
https://reviews.llvm.org/rL325518

It breaks following OpenCL conformance tests:
  - Basic - parameter_types
  - Basic - vload_private

llvm-svn: 325643
2018-02-20 23:30:21 +00:00
Craig Topper
e0ad66947f [X86] Fix copy/paste mistake in test.
The contents of the test case didn't match the name of the test case. And they were identical to the test above.

llvm-svn: 325635
2018-02-20 22:33:23 +00:00
Craig Topper
6d7a4fefb4 [SelectionDAG] Support known true/false SimplifySetCC cases for comparing against vector splats of constants.
This is split off from D42948 and includes just the cases that constant fold to true or false. It also includes some refactoring to keep predicate checks together.

This supports things like

(setcc uge X, 0) -> true

Differential Revision: https://reviews.llvm.org/D43489

llvm-svn: 325627
2018-02-20 21:48:14 +00:00
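
As a rough illustration of the kind of fold the commit above describes (this is not the SimplifySetCC code itself), the Python sketch below decides when a comparison against a constant has the same result for every possible value of X. The helper names `fold_setcc_const` and `evaluate` are hypothetical, and the predicate strings only borrow LLVM's condition-code spelling for readability.

```python
def fold_setcc_const(pred, c, bits):
    """Return True/False if `X pred c` holds identically for every
    `bits`-wide X, or None if the result depends on X.

    Hypothetical helper; `c` is passed as an unsigned bit pattern.
    """
    umax = (1 << bits) - 1
    smin = 1 << (bits - 1)   # bit pattern of the most negative signed value
    smax = smin - 1          # bit pattern of the most positive signed value
    always_true = {("uge", 0), ("ule", umax), ("sge", smin), ("sle", smax)}
    always_false = {("ult", 0), ("ugt", umax), ("slt", smin), ("sgt", smax)}
    if (pred, c) in always_true:
        return True
    if (pred, c) in always_false:
        return False
    return None


def to_signed(v, bits):
    """Interpret an unsigned bit pattern as a two's complement value."""
    return v - (1 << bits) if v >> (bits - 1) else v


def evaluate(pred, x, c, bits):
    """Reference evaluation of `x pred c` on concrete bit patterns."""
    if pred[0] == "s":
        x, c = to_signed(x, bits), to_signed(c, bits)
    return {"ge": x >= c, "le": x <= c, "gt": x > c, "lt": x < c}[pred[1:]]
```

For example, `fold_setcc_const("uge", 0, 32)` reports the `(setcc uge X, 0) -> true` case from the message, while a comparison such as `fold_setcc_const("uge", 5, 32)` is left alone.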
Evandro Menezes
b11fbd7d6f [AArch64] Refactor instructions using SIMD immediates
Get rid of icky goto loops and make the code easier to maintain.  Otherwise,
NFC.

Restore r324903 and fix PR36369.

Differential revision: https://reviews.llvm.org/D43364

llvm-svn: 325621
2018-02-20 20:31:45 +00:00
Sjoerd Meijer
c8b0ed116a [ARM] Lower BR_CC for f16
This case wasn't handled yet.

Differential Revision: https://reviews.llvm.org/D43508

llvm-svn: 325616
2018-02-20 19:28:05 +00:00
Stanislav Mekhanoshin
f23fe1bf79 [AMDGPU] Removed redundant run lines for fmuladd.f16 test. NFC.
llvm-svn: 325615
2018-02-20 19:19:56 +00:00
Simon Pilgrim
92e6f3ba0f [X86][MMX] Regenerate MMX bitcast test
llvm-svn: 325611
2018-02-20 18:48:29 +00:00
Simon Pilgrim
34a2d7157c [X86][3DNow] Regenerate intrinsics tests
llvm-svn: 325609
2018-02-20 18:44:21 +00:00
Krzysztof Parzyszek
69728227c1 [Hexagon] Handle *Low8 register classes in early if-conversion
llvm-svn: 325606
2018-02-20 18:19:17 +00:00
Craig Topper
df44496118 [X86] Correct SHRUNKBLEND creation to work correctly when there are multiple uses of the condition.
SimplifyDemandedBits forces the demanded mask to all 1s if the node has multiple uses, unless the AssumeSingleUse flag is set.

So previously we were only really likely to simplify something if the condition had a single use. And on the off chance we did simplify with multiple uses the demanded mask being used was all ones so there was no reason to create a shrunkblend.

This patch now checks that the condition is only used by selects first, and then sets the AssumeSingleUse flag for the simplification. Then we convert the selects to shrunkblend, and finally replace the condition.

Differential Revision: https://reviews.llvm.org/D43446

llvm-svn: 325604
2018-02-20 17:58:17 +00:00
Craig Topper
0558ecc62b [SelectionDAG] Add LegalTypes flag to getShiftAmountTy. Use it to unify and simplify DAGCombiner and simplifySetCC code and fix a bug.
DAGCombiner and SimplifySetCC both use getPointerTy for shift amounts pre-legalization. DAGCombiner uses a single helper function to hide this. SimplifySetCC does it in multiple places.

This patch adds a defaulted parameter to getShiftAmountTy that can make it return getPointerTy for scalar types. Use this parameter to simplify the SimplifySetCC and DAGCombiner.

Additionally, there were two places in SimplifySetCC that were creating shifts using the target's preferred shift amount pre-legalization. If the target uses a narrow type and the type is illegal, this can cause SimplifySetCC to create a shift with an amount that can't represent all possible shift values for the type. To fix this we should use pointer type there too.

Alternatively we could make getScalarShiftAmountTy for each target return a safe value for large types as proposed in D43445. And maybe we should still do that, but fixing the SimplifySetCC code keeps other targets from tripping over this in the future.

Fixes PR36250.

Differential Revision: https://reviews.llvm.org/D43449

llvm-svn: 325602
2018-02-20 17:41:05 +00:00
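
The width problem described in the commit above comes down to simple arithmetic, sketched here in Python. The function name `shift_amount_type_is_safe` and the concrete widths are hypothetical and chosen only to show the failure mode; they are not taken from the patch.

```python
def shift_amount_type_is_safe(value_bits, amount_bits):
    """True if an unsigned `amount_bits`-wide integer can hold every
    valid shift amount (0 .. value_bits - 1) for a `value_bits`-wide value.
    """
    largest_shift = value_bits - 1
    largest_representable = (1 << amount_bits) - 1
    return largest_shift <= largest_representable
```

Under these assumptions an 8-bit shift amount is enough for a 64-bit value, but not for a 512-bit value that has yet to be legalized, since shift amounts up to 511 do not fit in 8 bits. A pointer-sized amount (32 or 64 bits) covers both.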
Craig Topper
198aed3c9e [X86] Promote 16-bit cmovs to 32-bits
This allows us to avoid an opsize prefix. And forcing some move immediates to i32 avoids a length changing prefix on those instructions.

This mostly replaces the existing combine we had for zext/sext+cmov of constants. I left in a case for sign extending a 32 bit cmov of constants to 64 bits.

Differential Revision: https://reviews.llvm.org/D43327

llvm-svn: 325601
2018-02-20 17:41:00 +00:00
Lei Huang
21235789c0 [PowerPC] Reduce stack frame for fastcc functions by only allocating parameter save area when needed
Current implementation always allocates the parameter save area conservatively
for fastcc functions. There is no reason to allocate the parameter save area if
all the parameters can be passed via registers.

Differential Revision: https://reviews.llvm.org/D42602

llvm-svn: 325581
2018-02-20 15:09:45 +00:00
Krzysztof Parzyszek
9eec84ff0c [Hexagon] Fix alignment calculation of stack objects in Hexagon bit tracker
llvm-svn: 325580
2018-02-20 14:29:43 +00:00
Simon Pilgrim
2e780a63b4 [X86] Regenerate XOR tests
llvm-svn: 325579
2018-02-20 14:08:39 +00:00
David Green
25f3a586cf [ARM] Mark -1 as cheap in xor's for thumb1
We can always convert xor %a, -1 into MVN, even in thumb 1 where the -1
would not otherwise be considered a cheap constant. This prevents the
-1's from being pulled out into constants and potentially hoisted.

Differential Revision: https://reviews.llvm.org/D43451

llvm-svn: 325573
2018-02-20 11:07:35 +00:00
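
The identity behind the commit above is that xor with an all-ones constant is a bitwise NOT, which is what MVN computes. A minimal Python check on 32-bit values follows; the names `xor_all_ones` and `mvn` are illustrative only and do not model the instruction's encoding or flags.

```python
MASK32 = 0xFFFFFFFF  # -1 as a 32-bit pattern


def xor_all_ones(a):
    """xor a, -1 on a 32-bit value."""
    return (a ^ MASK32) & MASK32


def mvn(a):
    """Bitwise NOT of a 32-bit value, the operation MVN performs."""
    return ~a & MASK32
```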
George Rimar
b4d7f82f75 [llvm-mc] - Produce R_X86_64_PLT32 for "call/jmp foo".
For instructions like call foo and jmp foo patch changes
relocation produced from R_X86_64_PC32 to R_X86_64_PLT32.
Relocation can be used as a marker for 32-bit PC-relative branches.
Linker will reduce PLT32 relocation to PC32 if function is defined locally.

Differential revision: https://reviews.llvm.org/D43383

llvm-svn: 325569
2018-02-20 10:17:57 +00:00
Tim Renouf
1c9d5cdeab [AMDGPU] stop buffer_store being moved illegally
Summary:
The machine instruction scheduler was illegally moving a buffer store
past a buffer load with the same descriptor and offset. Fixed by marking
buffer ops as mayAlias and isAliased. This may be overly conservative,
and we may need to revisit.

Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, t-tye, llvm-commits

Differential Revision: https://reviews.llvm.org/D43332

Change-Id: Iff3173d9e0653e830474546276ab9d30318b8ef7
llvm-svn: 325567
2018-02-20 10:03:38 +00:00
Craig Topper
2377f160f4 [X86] Add 512-bit unmasked pmulhrsw/pmulhw/pmulhuw intrinsics. Remove and auto upgrade 128/256/512 bit masked pmulhrsw/pmulhw/pmulhuw intrinsics.
The 128 and 256 bit versions were already not used by clang. This adds an equivalent unmasked 512 bit version. Then autoupgrades all sizes to use unmasked intrinsics plus select.

llvm-svn: 325559
2018-02-20 07:28:14 +00:00
Amara Emerson
a99b25a021 [AArch64][GlobalISel] When copying from a gpr32 to an fpr16 reg, convert to fpr32 first.
This is a follow on commit to r[x] where we fix the other direction of copy.
For this case, after converting the source from gpr32 -> fpr32, we use a
subregister copy, which is essentially what EXTRACT_SUBREG does in SDAG land.

https://reviews.llvm.org/D43444

llvm-svn: 325550
2018-02-20 05:11:57 +00:00
Craig Topper
fc84bda806 [X86] Mark XOP vpmac* and vpmadc intrinsics as being commutative so that tablegen will generate patterns with the load in operand 0.
This allows loads to be folded during isel without the peephole pass.

llvm-svn: 325548
2018-02-20 03:58:14 +00:00
Craig Topper
6b23b7924a [X86] Use vpmovq2m/vpmovd2m for truncate to vXi1 when possible.
Previously we used vptestmd, but the scheduling data for SKX says vpmovq2m/vpmovd2m is lower latency. We already used vpmovb2m/vpmovw2m for byte/word truncates. So this is more consistent anyway.

llvm-svn: 325534
2018-02-19 22:07:31 +00:00
Craig Topper
deb8d1bdf2 [X86] Stop swapping the operands of AVX512 setge.
We swapped the operands and used setle, but I don't see any reason to do that. I think this is a holdover from SSE where we swap and then invert to use pcmpgt. But with AVX512 we don't want an invert so we won't use pcmpgt. So there's no need to swap.

llvm-svn: 325527
2018-02-19 19:23:35 +00:00
Craig Topper
e96cad0b0e [X86] Reduce the number of isel pattern variations needed for VPTESTM/VPTESTNM matching.
Canonicalize EQ/NE PCMPM to have build vector all zeros on the RHS so we don't have to pattern match it in both locations. This significantly reduces the number of isel patterns needed since we also had to multiply it out with loads being in either operand of the 'and' input node and in the 'and' masking node.

This removes over 24000 bytes from the isel table.

llvm-svn: 325526
2018-02-19 19:23:31 +00:00
Mark Searles
70cba954aa [AMDGPU] Make note of existing waitcnt instrs; this is add-on work related to suppression of redundant waitcnt instrs. It is necessary to make note of these existing waitcnt instrs so that we do not fall into an infinite loop when handling loops. Also, [NFC] some minor code clean-up.
llvm-svn: 325524
2018-02-19 19:19:59 +00:00
Simon Pilgrim
e1f55f9c90 [SelectionDAG] ComputeKnownBits - add support for SMIN+SMAX clamp patterns
If we have a clamp pattern, SMIN(SMAX(X, LO),HI) or SMAX(SMIN(X, HI),LO) then we can deduce that the number of signbits (zeros/ones) will be at least the minimum of the LO and HI constants.

ComputeKnownBits equivalent of D43338.

Differential Revision: https://reviews.llvm.org/D43463

llvm-svn: 325521
2018-02-19 18:08:16 +00:00
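
To make the clamp reasoning in the commit above concrete, here is a small Python sketch (not the ComputeKnownBits implementation). It reads "the minimum of the LO and HI constants" as the minimum of their sign-bit counts, and assumes LO <= HI. The helper names are hypothetical.

```python
def num_sign_bits(v, bits):
    """Number of leading bits equal to the sign bit (sign bit included)
    when `v` is written as a `bits`-wide two's complement value.
    """
    pattern = v & ((1 << bits) - 1)
    sign = pattern >> (bits - 1)
    count = 0
    for i in range(bits - 1, -1, -1):
        if (pattern >> i) & 1 != sign:
            break
        count += 1
    return count


def clamp(x, lo, hi):
    """SMIN(SMAX(x, lo), hi) on signed integers, assuming lo <= hi."""
    return min(max(x, lo), hi)


def clamp_sign_bits_lower_bound(lo, hi, bits):
    """Lower bound on the sign bits of any value clamped to [lo, hi]."""
    return min(num_sign_bits(lo, bits), num_sign_bits(hi, bits))
```

With 8-bit values clamped to [-5, 10], the bounds have 5 and 4 sign bits respectively, so every clamped result has at least 4 identical leading bits.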
Mark Searles
bf29d8d265 [AMDGPU] Increased vector length for global/constant loads.
Summary: GCN ISA supports instructions that can read 16 consecutive dwords from memory through the scalar data cache; loadstoreVectorizer should take advantage of the wider vector length and pack 16/8 elements of dwords/quadwords.

Author: FarhanaAleen

Reviewed By: rampitec

Subscribers: llvm-commits, AMDGPU

Differential Revision: https://reviews.llvm.org/D43275

llvm-svn: 325518
2018-02-19 16:42:49 +00:00