1
0
mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-10-21 03:53:04 +02:00
Commit Graph

144035 Commits

Author SHA1 Message Date
Keno Fischer
4db92211d7 [ExecutionDepsFix] Improve clearance calculation for loops
Summary:
In revision rL278321, ExecutionDepsFix learned how to pick a better
register for undef register reads, e.g. for instructions such as
`vcvtsi2sdq`. While this revision improved performance on a good number
of our benchmarks, it unfortunately also caused significant regressions
(up to 3x) on others. This regression turned out to be caused by loops
such as:

PH -> A -> B (xmm<Undef> -> xmm<Def>) -> C -> D -> EXIT
      ^                                  |
      +----------------------------------+

In the previous version of the clearance calculation, we would visit
the blocks in order, remembering for each whether there were any
incoming backedges from blocks that we hadn't processed yet and if
so queuing up the block to be re-processed. However, for loop structures
such as the above, this is clearly insufficient, since the block B
does not have any unknown backedges, so we do not see the false
dependency from the previous interation's Def of xmm registers in B.

To fix this, we need to consider all blocks that are part of the loop
and reprocess them one the correct clearance values are known. As
an optimization, we also want to avoid reprocessing any later blocks
that are not part of the loop.

In summary, the iteration order is as follows:
Before: PH A B C D A'
Corrected (Naive): PH A B C D A' B' C' D'
Corrected (w/ optimization): PH A B C A' B' C' D

To facilitate this optimization we introduce two new counters for each
basic block. The first counts how many of it's predecssors have
completed primary processing. The second counts how many of its
predecessors have completed all processing (we will call such a block
*done*. Now, the criteria to reprocess a block is as follows:
    - All Predecessors have completed primary processing
    - For x the number of predecessors that have completed primary
      processing *at the time of primary processing of this block*,
      the number of predecessors that are done has reached x.

The intuition behind this criterion is as follows:
We need to perform primary processing on all predecessors in order to
find out any direct defs in those predecessors. When predecessors are
done, we also know that we have information about indirect defs (e.g.
in block B though that were inherited through B->C->A->B). However,
we can't wait for all predecessors to be done, since that would
cause cyclic dependencies. However, it is guaranteed that all those
predecessors that are prior to us in reverse postorder will be done
before us. Since we iterate of the basic blocks in reverse postorder,
the number x above, is precisely the count of the number of predecessors
prior to us in reverse postorder.

Reviewers: myatsina
Differential Revision: https://reviews.llvm.org/D28759

llvm-svn: 293571
2017-01-30 23:37:03 +00:00
Sanjay Patel
c1ccdd6039 [InstCombine] enable (X <<nsw C1) >>s C2 --> X <<nsw (C1 - C2) for vectors with splat constants
llvm-svn: 293570
2017-01-30 23:35:52 +00:00
Derek Schuff
85b3f75e18 [WebAssembly] Add wasm support for llvm-readobj
Create a WasmDumper subclass of ObjDumper to support Webassembly binary
files.

Patch by Sam Clegg

Differential Revision: https://reviews.llvm.org/D27355

llvm-svn: 293569
2017-01-30 23:30:52 +00:00
Matt Arsenault
5865985330 NVPTX: Trivial cleanups of NVPTXInferAddressSpaces
- Move DEBUG_TYPE below includes
- Change unknown address space constant to be consistent with other
  passes
- Grammar fixes in debug output

llvm-svn: 293567
2017-01-30 23:27:11 +00:00
Sanjay Patel
22524fa47b [InstCombine] add vector test for (X <<nsw C1) >>s C2 --> X <<nsw (C1 - C2); NFC
llvm-svn: 293566
2017-01-30 23:26:17 +00:00
Eugene Zelenko
8881587629 [Mips] Fix some Clang-tidy modernize and Include What You Use warnings; other minor fixes (NFC).
llvm-svn: 293565
2017-01-30 23:21:32 +00:00
Benjamin Kramer
43a199dbb5 [ICP] Fix bool conversion warning and actually write out the reason instead of dropping it.
llvm-svn: 293564
2017-01-30 23:11:29 +00:00
Matt Arsenault
dd128e79e5 NVPTX: Refactor NVPTXInferAddressSpaces to check TTI
Add a new TTI hook for getting the generic address space value.

llvm-svn: 293563
2017-01-30 23:02:12 +00:00
Sanjay Patel
94f616d48b [InstCombine] enable more lshr(shl X, C1), C2 folds for vectors with splat constants
llvm-svn: 293562
2017-01-30 23:01:05 +00:00
Simon Pilgrim
886429cd36 [X86][SSE] Fix unsigned <= 0 warning in assert. NFCI.
Thanks to @mkuper

llvm-svn: 293561
2017-01-30 22:58:44 +00:00
Simon Pilgrim
adb2f463ad [X86][SSE] Generalize the number of decoded shuffle inputs. NFCI.
combineX86ShufflesRecursively can still only handle a maximum of 2 shuffle inputs but everything before it now supports any number of shuffle inputs.

This will be necessary for combining OR(SHUFFLE, SHUFFLE) patterns.

llvm-svn: 293560
2017-01-30 22:48:49 +00:00
Dehao Chen
da8eca39f5 Expose isLegalToPromot as a global helper function so that SamplePGO pass can call it for legality check.
Summary: SamplePGO needs to check if it is legal to promote a target before it actually promotes it.

Reviewers: davidxl

Reviewed By: davidxl

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D29306

llvm-svn: 293559
2017-01-30 22:46:37 +00:00
Dehao Chen
60bcbaae33 Revert r292979 which causes compile time failure.
llvm-svn: 293557
2017-01-30 22:26:05 +00:00
Sanjay Patel
2612342500 [InstCombine] add tests for more shift-shift patterns; NFC
llvm-svn: 293555
2017-01-30 22:24:36 +00:00
Eli Friedman
8bace60d14 Fix line endings.
llvm-svn: 293554
2017-01-30 22:04:23 +00:00
Tom Stellard
09bec64e31 AMDGPU: Fix release build broken by r293551
llvm-svn: 293553
2017-01-30 22:02:58 +00:00
Artem Tamazov
3c6b7c9aea Reapply [AMDGPU][mc][tests][NFC] Add coverage/smoke tests for Gfx7 and Gfx8.
llvm-svn: 293552
2017-01-30 21:59:21 +00:00
Tom Stellard
f2ec17e0e6 Re-commit AMDGPU/GlobalISel: Add support for simple shaders
Fix build when global-isel is disabled and fix a warning.

Summary: We can select constant/global G_LOAD, global G_STORE, and G_GEP.

Reviewers: qcolombet, MatzeB, t.p.northover, ab, arsenm

Subscribers: mehdi_amini, vkalintiris, kzhuravl, wdng, nhaehnle, mgorny, yaxunl, tony-tye, modocache, llvm-commits, dberris

Differential Revision: https://reviews.llvm.org/D26730

llvm-svn: 293551
2017-01-30 21:56:46 +00:00
Tim Northover
526c055c63 GlobalISel: correctly translate invoke when callee is a register.
This should fix the GlobalISel verifier.

llvm-svn: 293550
2017-01-30 21:45:21 +00:00
Stanislav Mekhanoshin
0a8e20606c [AMDGPU] Internalize non-kernel symbols
Since we have no call support and late linking we can produce code
only for used symbols. This saves compilation time, size of the final
executable, and size of any intermediate dumps.

Run Internalize pass early in the opt pipeline followed by global
DCE pass. To enable it RT can pass -amdgpu-internalize-symbols option.

Differential Revision: https://reviews.llvm.org/D29214

llvm-svn: 293549
2017-01-30 21:05:18 +00:00
Kevin Enderby
25010f7ab3 Change the llvm-obdump(1) behavior with the -macho flag and inappropriate file types.
To better match the old darwin otool(1) behavior, when llvm-obdump(1) is used
with the -macho option and the input file is not an object file simply print
the file name and this message:

foo: is not an object file

and continue on to process other input files.  Also in this case don’t exit
non-zero.  This should help in some OSS projects' with autoconf scripts
that are expecting the old darwin otool(1) behavior.

rdar://26828015

llvm-svn: 293547
2017-01-30 20:53:17 +00:00
Tim Northover
b663bb327a GlobalISel: account for differing exception selector sizes.
For some reason the exception selector register must be a pointer (that's
assumed by SDag); on the other hand, it gets moved into an IR-level type which
might be entirely different (i32 on AArch64). IRTranslator needs to be aware of
this.

llvm-svn: 293546
2017-01-30 20:52:42 +00:00
Tim Northover
a4f8ce1645 GlobalISel: tidy up def/use test. NFC.
llvm-svn: 293545
2017-01-30 20:52:37 +00:00
Matt Arsenault
9a3ae5f806 LSR: Don't drop address space when type doesn't match
For targets with different addressing modes in each address space,
if this is dropped querying isLegalAddressingMode later with this
will give a nonsense result, breaking the isLegalUse assertions.

This is a candidate for the 4.0 release branch.

llvm-svn: 293542
2017-01-30 19:50:17 +00:00
Tim Northover
018849ece7 GlobalISel: translate memset & memmove.
llvm-svn: 293541
2017-01-30 19:33:07 +00:00
Matt Arsenault
293a680e93 AMDGPU: Undo sub x, c -> add x, -c canonicalization
This is worse if the original constant is an inline immediate.

This should also be done for 64-bit adds, but requires fixing
operand folding bugs first.

llvm-svn: 293540
2017-01-30 19:30:24 +00:00
Krzysztof Parzyszek
105c8a9574 [RDF] Add support for regmasks
llvm-svn: 293538
2017-01-30 19:16:30 +00:00
Tim Northover
6ce5837b4d GlobalISel: permit unused vregs without a register-class after ISel.
This can happen if earlier combining has removed all uses of some VReg, which
is fine and shouldn't flag an error.

llvm-svn: 293537
2017-01-30 19:12:50 +00:00
Benjamin Kramer
a4ba5f3efe Fix the GCC build.
This is fairly ugly, but apparently GCC still doesn't understand C++11.

llvm-svn: 293535
2017-01-30 19:05:09 +00:00
Michael Kuperstein
12880016cc Turn a TableGen FastISelEmitter warning into an error.
Tablegen emitted a warning when the fast isel emitter created dead
code by emitting a pattern that has no predicate before a pattern
that has one.

This should be an error but was originally only a warning because the X86
backend had a buggy definition that unintentionally caused this to be hit
(PR21575). That has been fixed a while ago (r222094), so it's safe to
upgrade the warning to an error.

llvm-svn: 293534
2017-01-30 19:03:26 +00:00
Simon Pilgrim
373f776641 [X86][XOP] Fix test name
llvm-svn: 293533
2017-01-30 18:59:25 +00:00
Simon Pilgrim
a7086940dd Use SelectionDAG::getBuildVector helper function where possible. NFCI.
llvm-svn: 293532
2017-01-30 18:53:45 +00:00
Benjamin Kramer
799a9da4e4 [IR] Remove global constructor from Function.cpp
llvm-svn: 293528
2017-01-30 18:49:24 +00:00
Benjamin Kramer
dccedede36 [MC] Remove global constructors from MCSectionMachO.cpp.
llvm-svn: 293526
2017-01-30 18:46:26 +00:00
Matt Arsenault
335594976a AMDGPU: Run AMDGPUCodeGenPrepare after inlining
With leaf functions, this makes nonsensical decisions
based on the uniformity of the arguments.

llvm-svn: 293525
2017-01-30 18:40:29 +00:00
Sanjay Patel
c49aeb010f [InstCombine] enable (X >>?exact C1) << C2 --> X >>?exact (C1-C2) for vectors with splat constants
llvm-svn: 293524
2017-01-30 18:40:23 +00:00
Justin Bogner
4634635ce4 SDAG: Update ChainNodesMatched during UpdateChains if a node is replaced
Previously, we would hit UB (or the ISD::DELETED_NODE assert) if we
happened to replace a node during UpdateChains, because it would be
left in the list we were iterating over. This nulls out the pointer
when that happens so that we can avoid the issue.

Fixes llvm.org/PR31710

llvm-svn: 293522
2017-01-30 18:29:46 +00:00
Simon Pilgrim
6afbcd6ac7 Use SelectionDAG::getBuildVector/getSplatBuildVector helper functions where possible. NFCI.
llvm-svn: 293520
2017-01-30 18:20:42 +00:00
Sanjay Patel
d5bd701535 [InstCombine] add vector splat tests for (X >>?exact C1) << C2 --> X >>?exact (C1-C2); NFC
llvm-svn: 293517
2017-01-30 18:17:14 +00:00
Marcos Pividori
ce226d28f2 [libFuzzer] Implement TmpDir() for Windows.
Differential Revision: https://reviews.llvm.org/D28977

llvm-svn: 293516
2017-01-30 18:14:53 +00:00
Daniel Berlin
4e849c7a11 NewGVN: Instead of changeToUnreachable, insert an instruction SimplifyCFG will turn into unreachable when it runs
llvm-svn: 293515
2017-01-30 18:12:56 +00:00
Matt Arsenault
58721b2662 AMDGPU: Make i32 uaddo/usubo legal
llvm-svn: 293514
2017-01-30 18:11:38 +00:00
Matt Arsenault
6748c8f28b DAG: Fold fneg into compare with constant into the constant
fcmp (fneg x), c, pred -> fcmp x, -c, (swap pred)

InstCombine already does this.

llvm-svn: 293512
2017-01-30 17:57:28 +00:00
Benjamin Kramer
d087048e1f [Orc] Add missing include.
llvm-svn: 293511
2017-01-30 17:54:57 +00:00
Krzysztof Parzyszek
0a1a944d47 [RDF] Extract the physical register information into a separate class
llvm-svn: 293510
2017-01-30 17:46:56 +00:00
Tom Stellard
d839aa304c Revert "AMDGPU/GlobalISel: Add support for simple shaders"
This reverts commit r293503.

Revert while I investigate some of the buildbot failures.

llvm-svn: 293509
2017-01-30 17:42:41 +00:00
Sanjay Patel
b2beed4c14 [InstCombine] use auto with obvious type; NFC
llvm-svn: 293508
2017-01-30 17:38:55 +00:00
Sanjay Patel
36f9e16e69 [InstCombine] enable (X <<nsw C1) >>s C2 --> X <<nsw (C1-C2) for vectors with splat constants
llvm-svn: 293507
2017-01-30 17:19:32 +00:00
David Blaikie
77e367a5a4 unique_ptrify some containers in GlobalISel::RegisterBankInfo
To simplify/clarify memory ownership, make leaks (as one was found/fixed
recently) harder to write, etc.

(also, while I was there - removed a duplicate lookup in a container)

llvm-svn: 293506
2017-01-30 17:13:56 +00:00
Matt Arsenault
8603018829 AMDGPU: Fix atomic_inc/atomic_dec + ds_swizzle not being divergent
llvm-svn: 293504
2017-01-30 17:09:47 +00:00