1
0
mirror of https://github.com/RPCS3/llvm-mirror.git synced 2025-02-01 05:01:59 +01:00

44263 Commits

Author SHA1 Message Date
Matt Arsenault
8e264191d7 AMDGPU: Fix invalid copies when copying i1 to phys reg
Insert a VReg_1 virtual register so the i1 workaround pass
can handle it.

llvm-svn: 300113
2017-04-12 21:58:23 +00:00
Stanislav Mekhanoshin
51dba4fa40 [AMDGPU] Generate range metadata for workitem id
If workgroup size is known inform llvm about range returned by local
id  and local size queries.

Differential Revision: https://reviews.llvm.org/D31804

llvm-svn: 300102
2017-04-12 20:48:56 +00:00
Piotr Padlewski
cd98794503 Remove readnone from invariant.group.barrier
Summary:
Readnone attribute would cause CSE of two barriers with
the same argument, which is invalid by example:

    struct Base {
          virtual int foo() { return 42; }
    };

    struct Derived1 : Base {
          int foo() override { return 50; }
    };

    struct Derived2 : Base {
          int foo() override { return 100; }
    };

    void foo() {
        Base *x = new Base{};
        new (x) Derived1{};
        int a = std::launder(x)->foo();
        new (x) Derived2{};
        int b = std::launder(x)->foo();
    }

Here 2 calls of std::launder will produce @llvm.invariant.group.barrier,
which would be merged into one call, causing devirtualization
to devirtualize second call into Derived1::foo() instead of
Derived2::foo()

Reviewers: chandlerc, dberlin, hfinkel

Subscribers: llvm-commits, rsmith, amharc

Differential Revision: https://reviews.llvm.org/D31531

llvm-svn: 300101
2017-04-12 20:45:12 +00:00
Sanjay Patel
013822ac22 [InstCombine] fix wrong undef handling when converting select to shuffle
As discussed in:
https://bugs.llvm.org/show_bug.cgi?id=32486
...the canonicalization of vector select to shufflevector does not hold up
when undef elements are present in the condition vector. 

Try to make the undef handling clear in the code and the LangRef.

Differential Revision: https://reviews.llvm.org/D31980

llvm-svn: 300092
2017-04-12 18:39:53 +00:00
Peter Collingbourne
47c4d3700d llvm-lto2: Add a dump-symtab subcommand.
This allows us to test the symbol table APIs for LTO input files.

Differential Revision: https://reviews.llvm.org/D31920

llvm-svn: 300086
2017-04-12 18:27:00 +00:00
Craig Topper
5d10384cce [InstCombine] Teach SimplifyDemandedInstructionBits that even if we reach an instruction that has multiple uses, if we know all the bits for the demanded bits for this context we can go ahead and create a constant.
Currently if we reach an instruction with multiples uses we know we can't do any optimizations to that instruction itself since we only have the demanded bits for one of the users. But if we know all of the bits are zero/one for that one user we can still go ahead and create a constant to give to that user.

This might then reduce the instruction to having a single use and allow additional optimizations on the other path.

This picks up an additional case that r300075 didn't catch.

Differential Revision: https://reviews.llvm.org/D31552

llvm-svn: 300084
2017-04-12 18:17:46 +00:00
Renato Golin
4d592d0072 [SystemZ] Fix more target specific tests
llvm-svn: 300081
2017-04-12 18:03:09 +00:00
Renato Golin
941c1f67e7 [SystemZ] Fix target specific tests
llvm-svn: 300078
2017-04-12 17:14:46 +00:00
Dmitry Preobrazhensky
6e59517a49 [AMDGPU][MC] Added support for several VI-specific opcodes (s_wakeup, etc)
Added support for VI:

- s_endpgm_saved
- s_wakeup
- s_rfe_restore_b64
- v_perm_b32

Enabled for VI:

- v_mov_fed_b32
- v_mov_fed_b32_e64

See bug 32593: https://bugs.llvm.org//show_bug.cgi?id=32593

Reviewers: artem.tamazov, vpykhtin

Differential Revision: https://reviews.llvm.org/D31931

llvm-svn: 300076
2017-04-12 17:10:07 +00:00
Craig Topper
566511cc61 Teach SimplifyDemandedUseBits that adding or subtractings 0s from every bit below the highest demanded bit can be simplified
If we are adding/subtractings 0s below the highest demanded bit we can just use the other operand and remove the operation.

My primary motivation is observing that we can call ShrinkDemandedConstant for the add/sub and create a 0 constant, rather than removing the add completely. In the case I saw, we modified the constant on an add instruction to a 0, but the add is not put into the worklist. So we didn't revisit it until the next InstCombine iteration. This caused an IR modification to remove add and a subsequent iteration to be ran.

With this change we get bypass the add in the first iteration and prevent the second iteration from changing anything.

Differential Revision: https://reviews.llvm.org/D31120

llvm-svn: 300075
2017-04-12 16:49:59 +00:00
Dmitry Preobrazhensky
51f67824d0 [AMDGPU][MC] Corrected parsing of v_cmp_class* and v_cmpx_class*
Fixed bug 32565: https://bugs.llvm.org//show_bug.cgi?id=32565

Reviewers: vpykhtin

Differential Revision: https://reviews.llvm.org/D31820

llvm-svn: 300073
2017-04-12 16:31:18 +00:00
Dmitry Preobrazhensky
47505ee81b [AMDGPU][MC] Corrected encoding of V_MQSAD_U32_U8 for CI
Corrected encoding of V_MQSAD_U32_U8 for CI

See bug 32552: https://bugs.llvm.org//show_bug.cgi?id=32552

Reviewers: vpykhtin

Differential Revision: https://reviews.llvm.org/D31810

llvm-svn: 300070
2017-04-12 15:36:09 +00:00
Sanjay Patel
3e67e52a7a [InstCombine] morph an existing instruction instead of creating a new one
One potential way to make InstCombine (very slightly?) faster is to recycle instructions 
when possible instead of creating new ones. It's not explicitly stated AFAIK, but we don't
consider this an "InstSimplify". We could, however, make a new layer to house transforms 
like this if that makes InstCombine more manageable (just throwing out an idea; not sure 
how much opportunity is actually here).

Differential Revision: https://reviews.llvm.org/D31863

llvm-svn: 300067
2017-04-12 15:11:33 +00:00
Dmitry Preobrazhensky
845e80dca8 [AMDGPU][MC] Corrected ds_wrxchg2* to support two offsets
Fixed bug 28227: https://bugs.llvm.org//show_bug.cgi?id=28227

Reviewers: vpykhtin

Differential Revision: https://reviews.llvm.org/D31808

llvm-svn: 300066
2017-04-12 14:29:45 +00:00
Jonas Paulsson
beec3fe279 Fix a RUN line in new test.
Use '2>&1 |' and not '|&' to pipe debug output to FileCheck

Hopefully handles a "shell parser error" on
llvm-clang-x86_64-expensive-checks-win

test/Transforms/SLPVectorizer/SystemZ/SLP-cmp-cost-query.ll

llvm-svn: 300064
2017-04-12 14:25:08 +00:00
Jonas Paulsson
eedad8c536 [SLPVectorizer] Pass the right type argument to getCmpSelInstrCost()
In getEntryCost(), make the scalar type for a compare instruction that of the
operands, not i1. This is needed in order to call getCmpSelInstrCost() for a
compare in a sensible way, the same way as the LoopVectorizer does.

New test: test/Transforms/SLPVectorizer/SystemZ/SLP-cmp-cost-query.ll

Review: Matthew Simpson
https://reviews.llvm.org/D31601

llvm-svn: 300061
2017-04-12 13:29:25 +00:00
Jonas Paulsson
a16794b9f4 [LoopVectorizer] Improve handling of branches during cost estimation.
The cost for a branch after vectorization is very different depending on if
the vectorizer will if-convert the block (branch is eliminated), or if
scalarized and predicated blocks will be produced (branch duplicated before
each block). There is also the case of remaining scalar branches, such as the
back-edge branch.

This patch handles these cases differently with TTI based cost estimates.

Review: Matthew Simpson
https://reviews.llvm.org/D31175

llvm-svn: 300058
2017-04-12 13:13:15 +00:00
Igor Breger
3cf58dd1e2 [GlobalIsel][X86] support G_CONSTANT selection.
Summary: [GlobalISel][X86] support G_CONSTANT selection. Add regbank select tests.

Reviewers: zvi, guyblank

Reviewed By: guyblank

Subscribers: llvm-commits, dberris, rovka, kristof.beyls

Differential Revision: https://reviews.llvm.org/D31974

llvm-svn: 300057
2017-04-12 12:54:54 +00:00
Jonas Paulsson
b15d643e4a [LoopVectorizer, TTI] New method supportsEfficientVectorElementLoadStore()
Since SystemZ supports vector element load/store instructions, there is no
need for extracts/inserts if a vector load/store gets scalarized.

This patch lets Target specify that it supports such instructions by means of
a new TTI hook that defaults to false.

The use for this is in the LoopVectorizer getScalarizationOverhead() method,
which will with this patch produce a smaller sum for a vector load/store on
SystemZ.

New test: test/Transforms/LoopVectorize/SystemZ/load-store-scalarization-cost.ll

Review: Adam Nemet
https://reviews.llvm.org/D30680

llvm-svn: 300056
2017-04-12 12:41:37 +00:00
Dmitry Preobrazhensky
1c55b0524e [AMDGPU][MC] Corrected src0 size for s_cbranch_join
Fix for bug 28159: https://bugs.llvm.org//show_bug.cgi?id=28159

Reviewers: vpykhtin, arsenm

Differential Revision: https://reviews.llvm.org/D31595

llvm-svn: 300055
2017-04-12 12:40:19 +00:00
Jonas Paulsson
06c4fb73cd [SystemZ] Updated test fp-cast.ll
This did not get included in the previous commit for SystemZ cost functions.

llvm-svn: 300053
2017-04-12 12:11:41 +00:00
Jonas Paulsson
90b172efa0 [SystemZ] TargetTransformInfo cost functions implemented.
getArithmeticInstrCost(), getShuffleCost(), getCastInstrCost(),
getCmpSelInstrCost(), getVectorInstrCost(), getMemoryOpCost(),
getInterleavedMemoryOpCost() implemented.

Interleaved access vectorization enabled.

BasicTTIImpl::getCastInstrCost() improved to check for legal extending loads,
in which case the cost of the z/sext instruction becomes 0.

Review: Ulrich Weigand, Renato Golin.
https://reviews.llvm.org/D29631

llvm-svn: 300052
2017-04-12 11:49:08 +00:00
Sam Kolton
0f0e788e26 [AMDGPU] SDWA: make pass global
Summary: Remove checks for basic blocks.

Reviewers: vpykhtin, rampitec, arsenm

Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye

Differential Revision: https://reviews.llvm.org/D31935

llvm-svn: 300040
2017-04-12 09:36:05 +00:00
Daniel Sanders
ff52355b6a [globalisel][tablegen] Add experimental support for OperandWithDefaultOps, PredicateOperand, and OptionalDefOperand
Summary:
As far as instruction selection is concerned, all three appear to be same thing.

Support for these operands is experimental since AArch64 doesn't make use
of them and the in-tree targets that do use them (AMDGPU for
OperandWithDefaultOps, AMDGPU/ARM/Hexagon/Lanai for PredicateOperand, and ARM
for OperandWithDefaultOps) are not using tablegen-erated GlobalISel yet.

Reviewers: rovka, aditya_nandakumar, t.p.northover, qcolombet, ab

Reviewed By: rovka

Subscribers: inglorion, aemerson, rengolin, mehdi_amini, dberris, kristof.beyls, igorb, tpr, llvm-commits

Differential Revision: https://reviews.llvm.org/D31135

llvm-svn: 300037
2017-04-12 08:23:08 +00:00
Bjorn Pettersson
4613d4bf95 [LoadCombine] Avoid analysing dead basic blocks
Summary:
Dead basic blocks may be forming a loop, for which SSA form is
fulfilled, but with a circular def-use chain. LoadCombine could
enter an infinite loop when analysing such dead code. This patch
solves the problem by simply avoiding to analyse all basic blocks
that aren't forward reachable, from function entry, in LoadCombine.

Fixes https://bugs.llvm.org/show_bug.cgi?id=27065

Reviewers: mehdi_amini, chandlerc, grosser, Bigcheese, davide

Reviewed By: davide

Subscribers: dberlin, zzheng, bjope, grandinj, Ka-Ka, materi, jholewinski, llvm-commits, mzolotukhin

Differential Revision: https://reviews.llvm.org/D31032

llvm-svn: 300034
2017-04-12 08:07:55 +00:00
Serguei Katkov
dbe48d9fbb [BPI] Refactor post domination calculation and simple fix for ColdCall
Collection of PostDominatedByUnreachable and PostDominatedByColdCall have been
split out of heuristics itself. Update of the data happens now for each basic
block (before update for PostDominatedByColdCall might be skipped if
unreachable or matadata heuristic handled this basic block).

This separation allows re-ordering of heuristics without loosing
the post-domination information.

Reviewers: sanjoy, junbuml, vsk, chandlerc, reames

Reviewed By: chandlerc

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D31701

llvm-svn: 300029
2017-04-12 05:42:14 +00:00
Kannan Narayanan
fd367b26fb [AMDGPU] Add a new pass to insert waitcnts. Leave under an option for testing.
Based on comments in https://reviews.llvm.org/D31161.

llvm-svn: 300023
2017-04-12 03:25:12 +00:00
Bob Haarman
3d8c625236 ThinLTOBitcodeWriter: keep comdats together, rename if leader is renamed
Summary:
COFF requires that every comdat contain a symbol with the same name as
the comdat. ThinLTOBitcodeWriter renames symbols, which may cause this
requirement to be violated. This change avoids such violations by
renaming comdats if their leaders are renamed. It also keeps comdats
together when splitting modules.

Reviewers: pcc, mehdi_amini, tejohnson

Reviewed By: pcc

Subscribers: rnk, Prazek, llvm-commits

Differential Revision: https://reviews.llvm.org/D31963

llvm-svn: 300019
2017-04-12 01:43:07 +00:00
Matt Arsenault
8aa0f7f3d2 AMDGPU: Insert wait at start of callee functions
llvm-svn: 300000
2017-04-11 22:29:31 +00:00
Matt Arsenault
c16c71ecdb AMDGPU: Refactor SIMachineFunctionInfo slightly
Prepare for handling non-entry functions.

llvm-svn: 299999
2017-04-11 22:29:28 +00:00
Matt Arsenault
80352fe61c AMDGPU: Refactor argument lowering
Split into smaller functions and prepare for handling
non-entry functions.

llvm-svn: 299998
2017-04-11 22:29:24 +00:00
Matt Arsenault
b91fb9442d AMDGPU: Fix folding reg_sequence into copy to phys reg
This was producing an illegal reg_sequence defining
a physical register with virtual register inputs.

llvm-svn: 299997
2017-04-11 22:29:19 +00:00
Evgeniy Stepanov
9127f017fc [asan] Give global metadata private linkage.
Internal linkage preserves names like "__asan_global_foo" which may
account to 2% of unstripped binary size.

llvm-svn: 299995
2017-04-11 22:28:13 +00:00
Zvi Rackover
4167b52690 InstSimplify: A shuffle of a splat is always the splat itself
Summary:
Fold:
 shuffle (splat-shuffle), undef, M --> splat-shuffle

Reviewers: spatel, RKSimon, craig.topper

Reviewed By: RKSimon

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D31527

llvm-svn: 299990
2017-04-11 21:37:02 +00:00
Zvi Rackover
af67f021b7 [DAGCombine] Add more test cases for shuffle of splat. NFC.
Tests added contain splat-masks with undef elements.

llvm-svn: 299988
2017-04-11 21:16:59 +00:00
Easwaran Raman
083f7e69ba [x86] Relax the check in areLoadsFromSameBasePtr
Check if the scale operand is identical (doesn't have to be 1) and
do not check the chaain operand.

Differential revision: https://reviews.llvm.org/D31833

llvm-svn: 299986
2017-04-11 21:05:02 +00:00
Anna Thomas
b2455558f1 [LV] Avoid vectorizing first order recurrence when phi uses are outside loop
In the vectorization of first order recurrence, we vectorize such
that the last element in the vector will be the one extracted to pass into the
scalar remainder loop. However, this is not true when there is a phi (other
than the primary induction variable) is used outside the loop.
In such a case, we need the value from the second last iteration (i.e.
the phi value), not the last iteration (which would be the phi update).
I've added a test case for this. Also see PR32396.

A follow up patch would generate the correct code gen for such cases,
and turn this vectorization on.

Differential Revision: https://reviews.llvm.org/D31910

Reviewers: mssimpso
llvm-svn: 299985
2017-04-11 21:02:00 +00:00
Sanjay Patel
0b66966e47 [InstSimplify] add tests for chains of shuffles; NFC
llvm-svn: 299984
2017-04-11 20:54:57 +00:00
Daniel Berlin
e485ce96aa MemorySSA: Move to Analysis, from Transforms/Utils. It's used as
Analysis, it has Analysis passes, and once NewGVN is made an Analysis,
this removes the cross dependency from Analysis to Transform/Utils.
NFC.

llvm-svn: 299980
2017-04-11 20:06:36 +00:00
Justin Bogner
04d43ed37a MIR: Allow parsing of empty machine functions
If you run llc -stop-after=codegenprepare and feed the resulting MIR
to llc -start-after=codegenprepare, you'll have an empty machine
function since we haven't run any isel yet. Of course, this only works
if the MIRParser believes you that this is okay.

This is essentially a revert of r241862 with a fix for the problem it
was papering over.

llvm-svn: 299975
2017-04-11 19:32:41 +00:00
Davide Italiano
1805846aae [X86] Create the correct ADC/SBB SDNode when lowering add.
Differential Revision:  https://reviews.llvm.org/D31911

llvm-svn: 299973
2017-04-11 19:11:20 +00:00
Andrea Di Biagio
f84e42ec43 [AddDiscriminators] Assign discriminators to MemIntrinsic calls.
Before this patch, pass AddDiscriminators always avoided to assign
discriminators to intrinsic calls. This was done mainly for two reasons:
 1) We wanted to minimize the number of based discriminators used.
 2) We wanted to avoid non-deterministic discriminator assignment for
    different debug levels.

Unfortunately, that approach was problematic for MemIntrinsic calls.
MemIntrinsic calls can be split by SROA into loads and stores, and each new
load/store instruction would obtain the debug location from the original
intrinsic call.
If we don't assign a discriminator to MemIntrinsic calls, then we cannot
correctly set the discriminator for the newly created loads and stores.
This may have a negative impact on the basic block weight computation
performed by the SampleLoader.

This patch fixes the issue by letting MemIntrinsic calls have a discriminator.

Differential Revision: https://reviews.llvm.org/D31900

llvm-svn: 299972
2017-04-11 19:07:30 +00:00
Craig Topper
7fd8c77e0f [InstCombine] Add testcases for (B&A)^A -> ~B & A and (B|A)^A -> B & ~A
llvm-svn: 299971
2017-04-11 18:50:48 +00:00
Anna Thomas
e3e490fb9f [LV] Move first order recurrence test to common folder. NFC
llvm-svn: 299969
2017-04-11 18:31:42 +00:00
Peter Collingbourne
6517513ff1 llvm-lto2: Move the LTO::run() action behind a subcommand.
Move LTO::run() to a "run" subcommand so that we can introduce new subcommands
for testing different parts of the LTO implementation.

This doesn't use llvm::cl subcommands because it doesn't appear to be currently
possible to pass an argument not associated with a subcommand to a subcommand
(e.g. -lto-use-new-pm, -mcpu=yonah).

Differential Revision: https://reviews.llvm.org/D31410

llvm-svn: 299967
2017-04-11 18:12:00 +00:00
Yaxun Liu
f386491b0f [AMDGPU] Add A5 to data layout for amdgiz environment
Differential Revision: https://reviews.llvm.org/D31589

llvm-svn: 299964
2017-04-11 17:18:13 +00:00
Reid Kleckner
ce574c8dd0 [PDB] Emit index/offset pairs for TPI and IPI streams
Summary:
This lets PDB readers lookup type record data by type index in O(log n)
time. It also enables makes `cvdump -t` work on PDBs produced by LLD.
cvdump will not dump a PDB that doesn't have an index-to-offset table.

The table is sorted by type index, and has an entry every 8KB. Looking
up a type record by index is a binary search of this table, followed by
a scan of at most 8KB.

Reviewers: ruiu, zturner, inglorion

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D31636

llvm-svn: 299958
2017-04-11 16:26:15 +00:00
Sanjay Patel
8663bd080f revert r299851 - [InstCombine] fix matching of or-of-icmps constants (PR32524)
This is a candidate culprit for multiple bot fails, so reverting pending investigation.

llvm-svn: 299955
2017-04-11 15:57:32 +00:00
Geoff Berry
6b04aef4a3 [GVNHoist] Re-enable GVNHoist by default
Turn GVNHoist back on by default now that PR32153 has been fixed.

llvm-svn: 299944
2017-04-11 14:36:30 +00:00
Keno Fischer
474eb26f5c [StripDeadDebug/DIFinder] Track inlined SPs
Summary:
In rL299692 I improved strip-dead-debug-info's ability to drop CUs that are not
referenced from the current module. However, in doing so I neglected to realize
that some SPs could be referenced entirely from inlined functions. It appears
I was not the only one to make this mistake, because DebugInfoFinder, doesn't
find those SPs either. Fix this in DebugInfoFinder and then use that to make
sure not to drop those CUs in strip-dead-debug-info.

Reviewers: aprantl

Reviewed By: aprantl

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D31904

llvm-svn: 299936
2017-04-11 13:32:11 +00:00