1
0
mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-11-23 03:02:36 +01:00
Commit Graph

175762 Commits

Author SHA1 Message Date
Simon Pilgrim
73db599280 [X86] Add PR40483 test cases
Demonstrate failure to merge ISD::ADD(x,y)/X86ISD::ADD(x,y) + ISD::SUB(x,y)/X86ISD::SUB(x,y) equivalent ops

llvm-svn: 354758
2019-02-24 21:13:29 +00:00
Simon Pilgrim
54af6d75eb [X86] Combine zext(packus(x),packus(y)) -> concat(x,y) (PR39637)
Its proving tricky to combine shuffles across multiple vector sizes, so for now I'm adding this more specific combine - the pattern is common enough to be worth it as a first step.

llvm-svn: 354757
2019-02-24 19:57:52 +00:00
Craig Topper
3fff3f791a [X86] Fix tls variable lowering issue with large code model
Summary:
The problem here is the lowering for tls variable. Below is the DAG for the code.
SelectionDAG has 11 nodes:

t0: ch = EntryToken
      t8: i64,ch = load<(load 8 from `i8 addrspace(257)* null`, addrspace 257)> t0, Constant:i64<0>, undef:i64
        t10: i64 = X86ISD::WrapperRIP TargetGlobalTLSAddress:i64<i32* @x> 0 [TF=10]
      t11: i64,ch = load<(load 8 from got)> t0, t10, undef:i64
    t12: i64 = add t8, t11
  t4: i32,ch = load<(dereferenceable load 4 from @x)> t0, t12, undef:i64
t6: ch = CopyToReg t0, Register:i32 %0, t4
And when mcmodel is large, below instruction can NOT be folded.

  t10: i64 = X86ISD::WrapperRIP TargetGlobalTLSAddress:i64<i32* @x> 0 [TF=10]
t11: i64,ch = load<(load 8 from got)> t0, t10, undef:i64
So "t11: i64,ch = load<(load 8 from got)> t0, t10, undef:i64" is lowered to " Morphed node: t11: i64,ch = MOV64rm<Mem:(load 8 from got)> t10, TargetConstant:i8<1>, Register:i64 $noreg, TargetConstant:i32<0>, Register:i32 $noreg, t0"

When llvm start to lower "t10: i64 = X86ISD::WrapperRIP TargetGlobalTLSAddress:i64<i32* @x> 0 [TF=10]", it fails.

The patch is to fold the load and X86ISD::WrapperRIP.

Fixes PR26906

Patch by LuoYuanke

Reviewers: craig.topper, rnk, annita.zhang, wxiao3

Reviewed By: rnk

Subscribers: llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D58336

llvm-svn: 354756
2019-02-24 19:33:37 +00:00
Craig Topper
bbd0f9664b [X86][SSE] Use pblendw for v4i32/v2i64 during isel.
Summary:

Previously we used BLENDPS/BLENDPD but that puts the blend in the FP domain. Under optsize, the two address instruction pass can cause blendps/blendpd to commute to blendps/blendpd. But we probably shouldn't do that if the original type was a integer. So use pblendw instead.

Reviewers: spatel, RKSimon

Reviewed By: RKSimon

Subscribers: jdoerfert, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D58574

llvm-svn: 354755
2019-02-24 19:23:41 +00:00
Craig Topper
5e22d95682 [X86] Correct some ADC/SBB with immediate scheduler data for Broadwell and Skylake.
Summary:
The AX/EAX/RAX with immediate forms are 2 uops just like the AL with immediate.

The modrm form with r8 and immediate is a single uop just like r16/r32/r64 with immediate.

Reviewers: RKSimon, andreadb

Reviewed By: RKSimon

Subscribers: gbedwell, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D58581

llvm-svn: 354754
2019-02-24 19:23:39 +00:00
Craig Topper
bbe3f8278e [LegalizeTypes][AArch64][X86] Make type legalization of vector (S/U)ADD/SUB/MULO follow getSetCCResultType for the overflow bits. Make UnrollVectorOverflowOp properly convert from scalar boolean contents to vector boolean contents
Summary:
When promoting the over flow vector for these ops we should use the target's desired setcc result type. This way a v8i32 result type will use a v8i32 overflow vector instead of a v8i16 overflow vector. A v8i16 overflow vector will cause LegalizeDAG/LegalizeVectorOps to have to use v8i32 and truncate to v8i16 in its expansion. By doing this in type legalization instead, we get the truncate into the DAG earlier and give DAG combine more of a chance to optimize it.

We also have to fix unrolling to use the scalar setcc result type for the scalarized operation, and convert it to the required vector element type after the scalar operation. We have to observe the vector boolean contents when doing this conversion. The previous code was just taking the scalar result and putting it in the vector. But for X86 and AArch64 that would have only put a the boolean value in bit 0 of the element and left all other bits in the element 0. We need to ensure all bits in the element are the same. I'm using a select with constants here because that's what setcc unrolling in LegalizeVectorOps used.

Reviewers: spatel, RKSimon, nikic

Reviewed By: nikic

Subscribers: javed.absar, kristof.beyls, dmgreen, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D58567

llvm-svn: 354753
2019-02-24 19:23:36 +00:00
Sanjay Patel
2f589b7798 [InstCombine] add test for icmp+add fold; NFC
llvm-svn: 354750
2019-02-24 17:31:15 +00:00
Simon Pilgrim
86ceac2cf2 [X86][AVX] Rename lowerShuffleByMerging128BitLanes to lowerShuffleAsLanePermuteAndRepeatedMask. NFC.
Name better matches the other similar 'lane permute' and 'repeated mask' functions we have.

llvm-svn: 354749
2019-02-24 17:30:06 +00:00
Sanjay Patel
ab450c2c65 [InstCombine] canonicalize add/sub with bool
add A, sext(B) --> sub A, zext(B)

We have to choose 1 of these forms, so I'm opting for the
zext because that's easier for value tracking.

The backend should be prepared for this change after:
D57401
rL353433

This is also a preliminary step towards reducing the amount
of bit hackery that we do in IR to optimize icmp/select.
That should be waiting to happen at a later optimization stage.

The seeming regression in the fuzzer test was discussed in:
D58359

We were only managing that fold in instcombine by luck, and
other passes should be able to deal with that better anyway.

llvm-svn: 354748
2019-02-24 16:57:45 +00:00
Sanjay Patel
b8c79b6f63 [InstCombine] regenerate checks; NFC
llvm-svn: 354747
2019-02-24 16:11:58 +00:00
Sanjay Patel
15f35365a5 [CGP] add special-cases to form unsigned add with overflow (PR40486)
There's likely a missed IR canonicalization for at least 1 of these
patterns. Otherwise, we wouldn't have needed the pattern-matching
enhancement in D57516.

Note that -- unlike usubo added with D57789 -- the TLI hook for
this transform defaults to 'on'. So if there's any perf fallout
from this, targets should look at how they're lowering the uaddo
node in SDAG and/or override that hook.

The x86 diffs suggest that there's some missing pattern-matching
for forming inc/dec.

This should fix the remaining known problems in:
https://bugs.llvm.org/show_bug.cgi?id=40486
https://bugs.llvm.org/show_bug.cgi?id=31754

llvm-svn: 354746
2019-02-24 15:31:27 +00:00
Simon Pilgrim
47517e0964 Fix "enumeral and non-enumeral type in conditional expression" gcc7 warning. NFCI.
llvm-svn: 354745
2019-02-24 13:31:52 +00:00
Heejin Ahn
57c726900a [WebAssembly] Rename a variable in CFGStackify (NFC)
llvm-svn: 354744
2019-02-24 08:30:06 +00:00
Heejin Ahn
173f0b2ea6 [WebAssembly] Merge two identical switch case routines into one (NFC)
llvm-svn: 354743
2019-02-24 08:19:55 +00:00
Philip Reames
20a4291077 [Hexagon, SystemZ] Be super conservative about atomics
As requested during review of D57601, be equally conservative for atomic MMOs as for volatile MMOs in all in tree backends. At the moment, all atomic MMOs are also volatile, but I'm about to change that.

Reviewed as part of https://reviews.llvm.org/D58490, with other backends still pending review.  

llvm-svn: 354740
2019-02-24 00:45:09 +00:00
Duncan P. N. Exon Smith
e4b7b09efd VFS: Avoid some unnecessary std::string copies
Thread Twine a little deeper through the VFS to avoid unnecessarily
constructing the same std::string twice in a parameter sequence:

    Twine -> std::string -> StringRef -> std::string

Changing a few parameters from StringRef to Twine avoids the early call
to `Twine::str()`.

llvm-svn: 354739
2019-02-23 23:48:47 +00:00
Craig Topper
078d42ee7f [TwoAddressInstructionPass] After commuting an instruction and before trying to look for more commutable operands, resample the number of operands.
The new instruciton might have less operands than the original instruction. If we don't resample, the next loop iteration might read an operand that doesn't exist.

X86 can commute blends to movss/movsd which reduces from 4 operands to 3. This happened in the test case that caused r354363 & company to be reverted. A reduced version of that has been committed here.

Really this whole checking for more commutable operands is a little fragile. It assumes that the new instructions operands are the same order and positions as the original except for the pair that was swapped. I don't know of anything that breaks this assumption today, but I've left a fixme. Fixing this will likely require an interface change.

llvm-svn: 354738
2019-02-23 21:41:44 +00:00
Craig Topper
9fa1560bf9 Recommit r354363 "[X86][SSE] Generalize X86ISD::BLENDI support to more value types"
And its follow ups r354511, r354640.

A follow patch will fix the issue that caused it to be reverted.

llvm-svn: 354737
2019-02-23 21:41:42 +00:00
Craig Topper
4b040b1663 Recommit r354647 and r354648 "[LegalizeTypes] When promoting the result of EXTRACT_SUBVECTOR, also check if the input needs to be promoted. Use that to determine the element type to extract"
r354648 was a follow up to fix a regression "[X86] Add a DAG combine for (aext_vector_inreg (aext_vector_inreg X)) -> (aext_vector_inreg X) to fix a regression from my previous commit."

These were reverted in r354713 as their context depended on other patches that were reverted for a bug.

llvm-svn: 354734
2019-02-23 19:51:32 +00:00
Nikita Popov
55e119a4fc [WebAssembly] Fix select of and (PR40805)
Fixes https://bugs.llvm.org/show_bug.cgi?id=40805 introduced by
patterns added in D53676.

I'm removing the patterns entirely here, as they are not correct
in the general case. If necessary something more specific can be
added in the future.

Differential Revision: https://reviews.llvm.org/D58575

llvm-svn: 354733
2019-02-23 18:59:01 +00:00
Simon Pilgrim
c4075a2be7 [X86][AVX] combineInsertSubvector - remove concat_vectors(load(x),load(x)) --> sub_vbroadcast(x)
D58053/rL354340 added this to EltsFromConsecutiveLoads directly

llvm-svn: 354732
2019-02-23 18:53:03 +00:00
Simon Pilgrim
809b1cc493 Fix MSVC constant truncation warnings. NFCI.
llvm-svn: 354731
2019-02-23 18:49:02 +00:00
Simon Pilgrim
02bf1c71b9 [X86][AVX] concat_vectors(scalar_to_vector(x),scalar_to_vector(x)) --> broadcast(x)
For AVX1, limit this to i32/f32/i64/f64 loading cases only.

llvm-svn: 354730
2019-02-23 18:34:05 +00:00
Simon Pilgrim
c64afdacb4 [X86][AVX] Shuffle->Permute+Blend if we have one v4f64/v4i64 shuffle input in place
Even on AVX1 we can pretty cheaply (VPERM2F128+VSHUFPD) permute a single v4f64/v4i64 input (on AVX2 its just a single VPERMPD), followed by a BLENDPD.

llvm-svn: 354729
2019-02-23 17:10:47 +00:00
Simon Dardis
66e9da0efb [MIPS] Fix a incorrect test. (NFC)
This test is incorrect as it should be using the microMIPSR6 instruction to
return, not the microMIPS version.

llvm-svn: 354726
2019-02-23 15:56:32 +00:00
Craig Topper
91b9e78849 [X86] Sign extend the 8-bit immediate when commuting blend instructions to match isel.
Conversion from ConstantSDNode to MachineInstr sign extends immediates from their APInt representation to int64_t.

This commit makes sure we do the same for commuting. The tests changes show how this improves CSE. This issue was made worse by the MachineCSE using commuteInstruction to undo a commute. So we virtually guarantee the sign extend from isel would be lost.

The improved CSE also occurred with r354363, but that was reverted. I'm working to undo the revert, but wanted to get this fix in while it was easy to see the results.

llvm-svn: 354724
2019-02-23 08:34:10 +00:00
Michael Trent
3711b06634 objdump fails to parse Mach-O binaries with n_desc bearing stabs
Summary:
The objdump Mach-O parser uses MachOObjectFile::checkSymbolTable() to
verify the symbol table is in a legal state before dereferencing the
offsets in the table. This routine missed a test for N_STAB symbols
when validating the two-level name space library ordinal for undefined
symbols. If the binary in question contained a value in the n_desc high
byte that is larger than the list of loaded dylibs, checkSymbolTable()
will flag the library ordinal as being out of range. Most of the time
the n_desc field is set to 0 or to small values, but old final linked
binaries exist with N_STAB symbols bearing non-trivial n_desc fields. 

The change here is simply to verify a symbol is not an N_STAB symbol
before consulting the values of n_other or n_desc.

rdar://44977336

Reviewers: lhames, pete, ab

Reviewed By: pete

Subscribers: llvm-commits, rupprecht

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D58568

llvm-svn: 354722
2019-02-23 06:19:56 +00:00
Daniel Sanders
14dada385a Try again to fix memory leak in r354692
The previous one didn't fix everything.

llvm-svn: 354719
2019-02-23 03:25:37 +00:00
Jordan Rupprecht
2aa95e5635 [NFC] Fix typos: preceeding -> preceding
llvm-svn: 354715
2019-02-23 01:28:32 +00:00
Reid Kleckner
183ab9f47e Revert r354363 & co "[X86][SSE] Generalize X86ISD::BLENDI support to more value types"
r354363 caused https://crbug.com/934963#c1, which has a plain C reduced
test case.

I also had to revert some dependent changes:
- r354648
- r354647
- r354640
- r354511

llvm-svn: 354713
2019-02-23 01:19:42 +00:00
Daniel Sanders
a2df94ba8e Fix memory leak in r354692
llvm-svn: 354712
2019-02-23 01:13:35 +00:00
Craig Topper
474596aa31 [LegalizeTypes] Use PromoteTargetBoolean in PromoteIntOp_ADDSUBCARRY instead of reimplementing it. NFCI
llvm-svn: 354710
2019-02-23 00:38:19 +00:00
Craig Topper
4ae7ed2eba [X86] Enable custom splitting of v8i64/v16i32 sext/zext for avx/avx2 when input type will be promoted by the type legalize to 128-bits.
If the the input type will be promoted to 128 bits its better to put a sign_extend_inreg/and in the 128 bit register before the split occurs. Otherwise we end up doing it on each half in the wider register.

Some of the overflow arithmetic tests are regressions, but I think we can make some improvement using getSetccResultType in DAG combine and/or type legalization.

llvm-svn: 354709
2019-02-23 00:35:02 +00:00
Craig Topper
37bfabbeeb [X86] Add a few test cases for a v8i64 sext/zext from an illegal type that needs to be promoted to 128 bits.
If v8i64 isn't a legal type but v4i64 is, these will be split and then each half will get their input promoted and become an any_extend_vector_inreg/punpckhwd + any_extend + and/sign_extend_inreg.

If we instead recognize the input will be promoted we can emit the and/sign_extend_inreg first in a 128 bit register. Then we can sign_extend/zero_extend one half and pshufd+sign_extend/zero_extend the other half.

llvm-svn: 354708
2019-02-23 00:34:58 +00:00
Sam Clegg
681632d27a [WebAssembly] Update CodeGen test expectations after rL354697. NFC
llvm-svn: 354705
2019-02-23 00:07:39 +00:00
Konstantin Zhuravlyov
2aca58e35e Revert "AMDGPU/NFC: Cleanup subtarget predicates"
It breaks one of our downstream merges, so revert it
temporarily while investigating failures downstream

llvm-svn: 354700
2019-02-22 23:21:06 +00:00
Sanjay Patel
58e20b9b0d [CGP] add tests for uaddo increment/decrement; NFC
llvm-svn: 354699
2019-02-22 23:19:34 +00:00
Sam Clegg
669e962bdd [WebAssembly] Remove unneeded MCSymbolRefExpr variants
We record the type of the symbol (event/function/data/global) in the
MCWasmSymbol and so it should always be clear how to handle a relocation
based on the symbol itself.

The exception is a function which still needs the special @TYPEINDEX
then the relocation contains the signature rather than the address
of the functions.

Differential Revision: https://reviews.llvm.org/D58472

llvm-svn: 354697
2019-02-22 22:29:34 +00:00
Sam Clegg
a3d97c3207 [WebAssembly] MC: Handle aliases of aliases
Differential Revision: https://reviews.llvm.org/D58417

llvm-svn: 354694
2019-02-22 21:41:42 +00:00
David Greene
c8d9305d8b [CMake] Honor LLVM_EXTERNAL_<proj>_SOURCE_DIR
When LLVM_ENABLE_PROJECTS is set, CMake assumes the project
directories are all side-by-side. This is not always the case and
there's no reason to expect it if LLVM_EXTERNAL_<proj>_SOURCE_DIR is
set. Honor that setting if it exists and allow the build configuration
to continue.

Differential Revision: https://reviews.llvm.org/D49672

llvm-svn: 354693
2019-02-22 21:19:48 +00:00
Daniel Sanders
6638a8d664 Restore ability for C++ API users to Enable IPRA.
Summary:
Prior to r310876 one of our out-of-tree targets was enabling IPRA by modifying
the TargetOptions::EnableIPRA. This no longer works on current trunk since the
useIPRA() hook overrides any values that are set in advance. This patch adjusts
the behaviour of the hook so that API users and useIPRA() can both enable it
but useIPRA() cannot disable it if the API user already enabled it.

Reviewers: arsenm

Reviewed By: arsenm

Subscribers: wdng, mgorny, llvm-commits

Differential Revision: https://reviews.llvm.org/D38043

llvm-svn: 354692
2019-02-22 20:59:07 +00:00
Sanjay Patel
07886f07c6 [CGP] move overflow intrinsic insertion to common location; NFCI
We need to enhance the uaddo matching to handle special-cases
as seen in PR40486 and PR31754. That means we won't necessarily
have a def-use pattern, so we'll need to check dominance to
determine where to place the intrinsic (as we already do for
usubo). This preliminary patch is just rearranging the code,
so the planned follow-up to improve uaddo will be more clear.

llvm-svn: 354689
2019-02-22 20:20:24 +00:00
Matt Arsenault
a13af32519 MIR: Preserve incoming frame index numbers
Don't skip incrementing the frame index number
if the object is dead. Instructions can still be
referencing the old frame index number, and this
doesn't attempt to remap those. The resulting
MIR then fails to load because the use instructions
use a higher frame index number than recorded
list of stack objects.

I'm not sure it's possible to craft a testcase
with the existing set of passes. It requires
selectively marking some stack objects
dead in an essentially random order.
StackSlotColoring condenses towards
the low indexes. This avoids a regression in a
future AMDGPU commit when some frame indexes
are lowered separately from PEI.

llvm-svn: 354688
2019-02-22 19:30:38 +00:00
Matt Arsenault
8c9e3b0e07 CodeGen: Make RegAllocRegistry a template class
Will allow re-using the machinery for independent
sets of register allocators.

This will allow AMDGPU to use separate command line
options for the allocator to use for SGPRs separate
from VGPRs.

llvm-svn: 354687
2019-02-22 19:16:52 +00:00
Matt Arsenault
268284573c AMDGPU: Use removeAllRegUnitsForPhysReg
llvm-svn: 354686
2019-02-22 19:03:36 +00:00
Matt Arsenault
6386605787 LiveIntervals: Add removeAllRegUnitsForPhysReg
Convenience wrapper for removing the reg units of
a physical register.

llvm-svn: 354685
2019-02-22 19:03:31 +00:00
Sam Clegg
ff57137091 [WebAssembly] Remove debug statement submitted in rL354657
Subscribers: dschuff, jgravelle-google, hiraditya, aheejin, sunfish, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D58549

llvm-svn: 354684
2019-02-22 19:00:03 +00:00
Mitch Phillips
57292f5f70 [GN] Updated build file to allow GN builds to succeed at ToT.
llvm-svn: 354683
2019-02-22 18:45:41 +00:00
Guozhi Wei
df3e0b152f [MBP] Factor out function hasViableTopFallthrough and enhancement
This patch factor out the function hasViableTopFallthrough from rotateLoop. It is also enhanced. Original code checks only if there is a block can be placed before current loop top. This patch also checks if the loop top is the most possible successor of its predecessor. The attached test case shows its effect.

Differential Revision: https://reviews.llvm.org/D58393

llvm-svn: 354682
2019-02-22 18:04:37 +00:00
Nirav Dave
2f6186d4a3 Disable big-endian constant store merges from rL354676.
llvm-svn: 354677
2019-02-22 16:20:34 +00:00