1
0
mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-10-23 21:13:02 +02:00
Commit Graph

43390 Commits

Author SHA1 Message Date
Simon Pilgrim
1abbd42308 [X86][SSE] Added missing vector logic intrinsic schedules
Improves atom scheduler test coverage (to make it easier to upgrade them for PR32431).

Merged SSE_VEC_BIT_ITINS_P + SSE_BIT_ITINS_P as we were interchanging between them.

llvm-svn: 309715
2017-08-01 17:51:20 +00:00
Craig Topper
a8dd496fff [X86] Use BEXTR/BEXTRI for 64-bit 'and' with a large mask
Summary: The 64-bit 'and' with immediate instruction only supports a 32-bit immediate. So for larger constants we have to load the constant into a register first. If the immediate happens to be a mask we can use the BEXTRI instruction to perform the masking. We already do something similar using the BZHI instruction from the BMI2 instruction set.

Reviewers: RKSimon, spatel

Reviewed By: RKSimon

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D36129

llvm-svn: 309706
2017-08-01 17:18:14 +00:00
Simon Pilgrim
b385b3e7e3 [X86][SSE] Added missing PACKSS/PACKUS intrinsic schedules
Improves atom scheduler test coverage (to make it easier to upgrade them for PR32431).

Checked on Agner that these actually match the UNPACK schedules, but better to include a separate class

llvm-svn: 309701
2017-08-01 16:47:48 +00:00
Simon Pilgrim
be72c0a440 [X86][SSSE3] Added missing PHADDS/PHSUBS/PSIGN intrinsic schedules
llvm-svn: 309699
2017-08-01 16:18:25 +00:00
Craig Topper
acb202221c [AVX-512] Don't use unmasked VMOVDQU8/16 for 8-bit or 16-bit element stores even when BWI instructions are supported. Always use VMOVDQA32/VMOVDQU32.
We were already using the 32 bit element opcode if BWI isn't enabled, but there's no reason to change opcode if we have BWI. We will still use the 8/16 opcodes for masked stores though.

This allows us to use the aligned opcode when we can which makes our test output more consistent between different modes. It also reduces the number of isel patterns we need.

This is a slight inconsistency with loads which default to 64 bit element opcodes. I'll probably rectify that in a future patch.

Differential Revision: https://reviews.llvm.org/D35978

llvm-svn: 309693
2017-08-01 15:31:24 +00:00
Strahinja Petrovic
6c965b443f [Mips] Fix for BBIT octeon instruction
This patch enables control flow optimization for
variations of BBIT instruction. In this case
optimization removes unnecessary branch after
BBIT instruction.

Differential Revision: https://reviews.llvm.org/D35359

llvm-svn: 309679
2017-08-01 13:42:45 +00:00
Krzysztof Parzyszek
cfa276ba08 [Hexagon] Convert HVX vector constants of i1 to i8
Certain operations require vector of i1 values. However, for Hexagon
architecture compatibility, they need to be represented as vector of i8.

Patch by Suyog Sarda.

llvm-svn: 309677
2017-08-01 13:12:53 +00:00
Tom Stellard
773b728b26 AMDGPU/GlobalISel: Add support for amdgpu_vs calling convention
Reviewers: arsenm

Reviewed By: arsenm

Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, rovka, kristof.beyls, igorb, dstuttard, tpr, llvm-commits, t-tye

Differential Revision: https://reviews.llvm.org/D35916

llvm-svn: 309675
2017-08-01 12:38:33 +00:00
Craig Topper
6e16535e63 [AVX-512] Add unmasked subvector inserts and extract to the execution domain tables.
llvm-svn: 309632
2017-07-31 22:07:29 +00:00
Konstantin Belochapka
15b13b5b03 [X86][MMX] Added custom lowering action for MMX SELECT (PR30418)
Fix for pr30418 - error in backend: Cannot select: t17: x86mmx = select_cc t2, Constant:i64<0>, t7, t8, seteq:ch
Differential Revision: https://reviews.llvm.org/D34661

llvm-svn: 309614
2017-07-31 20:11:49 +00:00
Craig Topper
9ecce29130 [AVX-512] Remove patterns that select vmovdqu8/16 for unmasked loads. Prefer vmovdqa64/vmovdqu64 instead.
These were taking priority over the aligned load instructions since there is no vmovda8/16. I don't think there is really a difference between aligned and unaligned on newer cpus so I don't think it matters which instructions we use.

But with this change we reduce the size of the isel table a little and we allow the aligned information to pass through to the evex->vec pass and produce the same output has avx/avx2 in some cases.

I also generally dislike patterns rooted in a bitcast which these were.

Differential Revision: https://reviews.llvm.org/D35977

llvm-svn: 309589
2017-07-31 17:35:44 +00:00
Simon Pilgrim
3c8fbe6ce1 Strip trailing whitespace. NFCI.
llvm-svn: 309584
2017-07-31 17:09:27 +00:00
Simon Pilgrim
fee6b30d2b Fix typo in comment.
llvm-svn: 309583
2017-07-31 17:06:55 +00:00
Aditya Nandakumar
d21361559b [GISel]: Support Widening G_ICMP's destination operand.
Updated AArch64 to widen destination to s32.
https://reviews.llvm.org/D35737

Reviewed by Tim

llvm-svn: 309579
2017-07-31 17:00:16 +00:00
Amaury Sechet
15f2f805f5 Do not recombine FMA when that is not needed.
Summary: As per title. This creates useless recombines.

Reviewers: jyknight, nemanjai, mkuper, spatel, RKSimon, zvi, bkramer

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D33848

llvm-svn: 309578
2017-07-31 16:56:25 +00:00
Florian Hahn
668b7d8a52 Exclude more unused functions from release build.
llvm-svn: 309576
2017-07-31 16:44:28 +00:00
Alexey Bataev
55309303be [Cost] Rename getReductionCost() to getArithmeticReductionCost(), NFC.
llvm-svn: 309563
2017-07-31 14:19:32 +00:00
Florian Hahn
3d431b22ac Guard print() functions only used by dump() functions.
Summary:
Since  r293359, most dump() function are only defined when
`!defined(NDEBUG) || defined(LLVM_ENABLE_DUMP)` holds. print() functions
only used by dump() functions are now unused in release builds,
generating lots of warnings. This patch only defines some print()
functions if they are used.

Reviewers: MatzeB

Reviewed By: MatzeB

Subscribers: arsenm, mzolotukhin, nhaehnle, llvm-commits

Differential Revision: https://reviews.llvm.org/D35949

llvm-svn: 309553
2017-07-31 10:07:49 +00:00
Guy Blank
c243f50073 [X86][AVX512] Add masked MOVS[S|D] patterns
Added patterns to recognize AND 1 on the mask of a scalar masked
move is not needed since only the lower bit is relevant for the
instruction.

Differential Revision:
https://reviews.llvm.org/D35897

llvm-svn: 309546
2017-07-31 08:26:14 +00:00
Hiroshi Inoue
0930805caf [PowerPC] Change method names; NFC
Changed method names based on the discussion in https://reviews.llvm.org/D34986:
getInt64 -> selectI64Imm,
getInt64Count -> selectI64ImmInstrCount.

llvm-svn: 309541
2017-07-31 06:27:09 +00:00
Craig Topper
a9ba3eaba7 [X86] Add pattern to use bzhi for 64-bit 'and' with a mask when there is a load involved.
We already had a pattern without load, but with a load we were falling back to a regular 'and' due to pattern complexity priority.

llvm-svn: 309535
2017-07-31 05:55:54 +00:00
Coby Tayree
a7b1431e75 [x86][inline-asm][ms-compat] legalize the use of "jc/jz short <op>"
MS ignores the keyword "short" when used after a jc/jz instruction, LLVM ought to do the same.
Test: D35893

Differential Revision: https://reviews.llvm.org/D35892

llvm-svn: 309509
2017-07-30 11:12:47 +00:00
Craig Topper
301ff9a0bb [X86] Add addsub intrinsics to the intrinsic lowering table so we have a single set of isel patterns.
llvm-svn: 309502
2017-07-30 06:02:59 +00:00
Florian Hahn
40a1bbcfdf [AArch64] Tie source and destination operands for AESMC/AESIMC.
Summary:
Most CPUs implementing AES fusion require instruction pairs of the form
    AESE Vn, _
    AESMC Vn, Vn
and
    AESD Vn, _
    AESIMC Vn, Vn

The constraint is added to AES(I)MC instructions which use the result of
an AES(E|D) instruction by using AES(I)MCTrr pseudo instructions, which
constraint source and destination registers to be the same.

A nice side effect of this change is that now all possible pairs are
scheduled back-to-back on the exynos-m1 for the misched-fusion-aes.ll
test case.

I had to update aes_load_store. The version I added initially was very
reduced and with the new constraint, AESE/AESMC could not be scheduled
back-to-back. I updated the test to be more realistic and still expose
the same scheduling problem as the initial test case.

Reviewers: t.p.northover, rengolin, evandro, kristof.beyls, silviu.baranga

Reviewed By: t.p.northover, evandro

Subscribers: aemerson, javed.absar, llvm-commits

Differential Revision: https://reviews.llvm.org/D35299

llvm-svn: 309495
2017-07-29 20:35:28 +00:00
Florian Hahn
87ae49923a [AArch64] Use 8 bytes as preferred function alignment on Cortex-A53.
Summary:
This change gives a 0.25% speedup on execution time, a 0.82% improvement
in benchmark scores and a 0.20% increase in binary size on a Cortex-A53.
These numbers are the geomean results on a wide range of benchmarks from
the test-suite and a range of proprietary suites.

Reviewers: t.p.northover, aadg, silviu.baranga, mcrosier, rengolin

Reviewed By: rengolin

Subscribers: grimar, davide, aemerson, rengolin, javed.absar, kristof.beyls, llvm-commits

Differential Revision: https://reviews.llvm.org/D35568

llvm-svn: 309494
2017-07-29 20:04:54 +00:00
Simon Pilgrim
9f39c8ed74 [SelectionDAG][X86] CombineBT - more aggressively determine demanded bits
This patch is in 2 parts:

1 - replace combineBT's use of SimplifyDemandedBits (hasOneUse only) with SelectionDAG::GetDemandedBits to more aggressively determine the lower bits used by BT.

2 - update SelectionDAG::GetDemandedBits to support ANY_EXTEND - if the demanded bits are only in the non-extended portion, then peek through and demand from the source value and then ANY_EXTEND that if we found a match.

Differential Revision: https://reviews.llvm.org/D35896

llvm-svn: 309486
2017-07-29 14:50:25 +00:00
Tom Stellard
aec2949441 AMDGPU: Remove deadcode from AMDGPUInstPrinter
Reviewers: arsenm

Reviewed By: arsenm

Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, tpr, llvm-commits, t-tye

Differential Revision: https://reviews.llvm.org/D36034

llvm-svn: 309477
2017-07-29 03:56:53 +00:00
Tom Stellard
50637dd0c1 AMDGPU: Move INDIRECT_BASE_ADDR definition out of common files
Summary: This is only used by R600.

Reviewers: arsenm

Reviewed By: arsenm

Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, tpr, llvm-commits, t-tye

Differential Revision: https://reviews.llvm.org/D35926

llvm-svn: 309476
2017-07-29 03:44:07 +00:00
Jessica Paquette
09730f1b79 [MachineOutliner] NFC: Change IsTailCall to a call class + frame class
This commit

- Removes IsTailCall and replaces it with a target-defined unsigned
- Refactors getOutliningCallOverhead and getOutliningFrameOverhead so that they don't use IsTailCall
- Adds a call class + frame class classification to OutlinedFunction and Candidate respectively

This accomplishes a couple things.

Firstly, we don't need the notion of *tail call* in the general outlining algorithm.

Secondly, we now can have different "outlining classes" for each candidate within a set of candidates.
This will make it easy to add new ways to outline sequences for certain targets and dynamically choose
an appropriate cost model for a sequence depending on the context that that sequence lives in.

Ultimately, this should get us closer to being able to do something like, say avoid saving the link
register when outlining AArch64 instructions.

llvm-svn: 309475
2017-07-29 02:55:46 +00:00
Matt Arsenault
fa41204583 AMDGPU: Make areMemAccessesTriviallyDisjoint more aware of segment flat
Checking the encoding is insufficient since now there can
be global or scratch instructions.

llvm-svn: 309472
2017-07-29 01:26:21 +00:00
Matt Arsenault
00e223b7a3 AMDGPU: Teach isLegalAddressingMode about global_* instructions
Also refine the flat check to respect flat-for-global feature,
and constant fallback should check global handling, not
specifically MUBUF.

llvm-svn: 309471
2017-07-29 01:12:31 +00:00
Matt Arsenault
686b41774f AMDGPU: Start selecting global instructions
llvm-svn: 309470
2017-07-29 01:03:53 +00:00
Eugene Zelenko
9f5f9de143 [Hexagon] Fix some Clang-tidy modernize-use-using and Include What You Use warnings; other minor fixes (NFC).
llvm-svn: 309469
2017-07-29 00:56:56 +00:00
Adrian Prantl
303aa6a2d1 Remove the unused offset from DBG_VALUE (NFC)
Followup to r309426.
rdar://problem/33580047

llvm-svn: 309450
2017-07-28 23:00:45 +00:00
Krzysztof Parzyszek
7fb96365c5 [Hexagon] Formatting changes, NFC
llvm-svn: 309442
2017-07-28 21:52:21 +00:00
Matt Arsenault
54fe21d18f AMDGPU: Look through a bitcast user of an out argument
This allows handling of a lot more of the interesting
cases in Blender. Most of the large functions unlikely
to be inlined have this pattern.

This is a special case for what clang emits for OpenCL 3
element vectors. Annoyingly, these are emitted as
<3 x elt>* pointers, but accessed as <4 x elt>* operations.
This also needs to handle cases where a struct containing
a single vector is used.

llvm-svn: 309419
2017-07-28 19:06:16 +00:00
Matt Arsenault
b3d988ed66 AMDGPU: Add pass to replace out arguments
It is better to return arguments directly in registers
if we are making a call rather than introducing expensive
stack usage. In one of sample compile from one of
Blender's many kernel variants, this fires on about
~20 different functions. Future improvements may be to
recognize simple cases where the pointer is indexing a small
array. This also fails when the store to the out argument
is in a separate block from the return, which happens in
a few of the Blender functions. This should also probably
be using MemorySSA which might help with that.

I'm not sure this is correct as a FunctionPass, but
MemoryDependenceAnalysis seems to not work with
a ModulePass.

I'm also not sure where it should run.I think it should
run  before DeadArgumentElimination, so maybe either
EP_CGSCCOptimizerLate or EP_ScalarOptimizerLate.

llvm-svn: 309416
2017-07-28 18:40:05 +00:00
Tim Northover
0b9cfde62b GlobalISel: map 128-bit values to an FPR by default.
Eventually we may want to allow a pair of GPRs but absolutely nothing in the
entire world is ready for that yet.

llvm-svn: 309404
2017-07-28 17:11:01 +00:00
Matt Arsenault
2442068eec AMDGPU: Annotate implicitarg.ptr usage
We need to pass something to functions for this to work.
It isn't derivable just from the kernarg segment pointer
because the implicit arguments are placed after the
kernel arguments.

Also fixes missing test for the intrinsic.

llvm-svn: 309398
2017-07-28 15:52:08 +00:00
Joel Jones
a494f4ebf4 [AArch64] Standardize suffixes for LSE Atomics mnemonics (NFCI)
This NFC changeset standardizes the suffixes used for LSE Atomics
instructions.

It changes the existing suffixes - 'b', 'h', 's', 'd' - to the existing
standard 'B', 'H', 'W' and 'X'.

This changeset is the result of the code review discussion for D35319.

Patch by: steleman

Differential Revision: https://reviews.llvm.org/D35927

llvm-svn: 309384
2017-07-28 14:09:24 +00:00
Strahinja Petrovic
ee49acb6b6 [ARM] Add the option to directly access TLS pointer
This patch enables choice for accessing thread local
storage pointer (like '-mtp' in gcc).

Differential Revision: https://reviews.llvm.org/D34408

llvm-svn: 309381
2017-07-28 12:54:57 +00:00
Jessica Paquette
ffc8e4d730 [MachineOutliner] NFC: Split up getOutliningBenefit
This is some more cleanup in preparation for some actual
functional changes. This splits getOutliningBenefit into
two cost functions: getOutliningCallOverhead and
getOutliningFrameOverhead. These functions return the
number of instructions that would be required to call
a specific function and the number of instructions
that would be required to construct a frame for a
specific funtion. The actual outlining benefit logic
is moved into the outliner, which calls these functions.

The goal of refactoring getOutliningBenefit is to:

- Get us closer to getting rid of the IsTailCall flag

- Further split up "target-specific" things and
"general algorithm" things

llvm-svn: 309356
2017-07-28 03:21:58 +00:00
Matthias Braun
76bb6f704f ARMFrameLowering: Only set ExtraCSSpill for actually unused registers.
The code assumed that unclobbered/unspilled callee saved registers are
unused in the function. This is not true for callee saved registers that are
also used to pass parameters such as swiftself.

rdar://33401922

llvm-svn: 309350
2017-07-28 01:36:32 +00:00
Reid Kleckner
7622646303 [X86] Fix latent bug in sibcall eligibility logic
The X86 tail call eligibility logic was correct when it was written, but
the addition of inalloca and argument copy elision broke its
assumptions. It was assuming that fixed stack objects were immutable.

Currently, we aim to emit a tail call if no arguments have to be
re-arranged in memory. This code would trace the outgoing argument
values back to check if they are loads from an incoming stack object.
If the stack argument is immutable, then we won't need to store it back
to the stack when we tail call.

Fortunately, stack objects track their mutability, so we can just make
the obvious check to fix the bug.

This was http://crbug.com/749826

llvm-svn: 309343
2017-07-28 00:58:35 +00:00
Adrian Prantl
c38c834cb8 Remove unused function from AArch64 backend (NFC)
llvm-svn: 309336
2017-07-27 23:52:06 +00:00
Ahmed Bougacha
93df3fb674 [X86] Don't lie about legality to TLI's demanded bits.
Like r309323, X86 had a typo where it passed the wrong flags to TLO.

Found by inspection; I haven't been able to tickle this into having
observable behavior.  I don't think it does, given that X86 doesn't have
custom demanded bits logic, and the generic logic doesn't have a lot of
exposure to illegal constructs.

llvm-svn: 309325
2017-07-27 21:28:59 +00:00
Ahmed Bougacha
ad53a4e70b [AArch64] Remove outdated comment. NFC.
There hasn't been a ternary since r231987.

llvm-svn: 309324
2017-07-27 21:27:58 +00:00
Ahmed Bougacha
bc1b60a38c [AArch64] Fix legality info passed to demanded bits for TBI opt.
The (seldom-used) TBI-aware optimization had a typo lying dormant since
it was first introduced, in r252573:  when asking for demanded bits, it
told TLI that it was running after legalize, where the opposite was
true.

This is an important piece of information, that the demanded bits
analysis uses to make assumptions about the node.  r301019 added such an
assumption, which was broken by the TBI combine.

Instead, pass the correct flags to TLO.

llvm-svn: 309323
2017-07-27 21:27:25 +00:00
Florian Hahn
b38e4bb456 [ARM] Add use-misched feature, to enable the MachineScheduler.
Summary:
This change makes it easier to experiment with the MachineScheduler in
the ARM backend and also makes it very explicit which CPUs use the
MachineScheduler (currently only swift and cyclone).



Reviewers: MatzeB, t.p.northover, javed.absar

Reviewed By: MatzeB

Subscribers: aemerson, kristof.beyls, llvm-commits

Differential Revision: https://reviews.llvm.org/D35935

llvm-svn: 309316
2017-07-27 19:56:44 +00:00
Dinar Temirbulatov
a9bac28124 [X86] SET0 to use XMM registers where possible PR26018 PR32862
Differential Revision: https://reviews.llvm.org/D35839

llvm-svn: 309298
2017-07-27 17:47:01 +00:00