1
0
mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-10-18 10:32:48 +02:00
Commit Graph

219380 Commits

Author SHA1 Message Date
David Sherwood
cdd50ed2ff [LoopVectorize] Don't interleave scalar ordered reductions for inner loops
Consider the following loop:

  void foo(float *dst, float *src, int N) {
    for (int i = 0; i < N; i++) {
      dst[i] = 0.0;
      for (int j = 0; j < N; j++) {
        dst[i] += src[(i * N) + j];
      }
    }
  }

When we are not building with -Ofast we may attempt to vectorise the
inner loop using ordered reductions instead. In addition we also try
to select an appropriate interleave count for the inner loop. However,
when choosing a VF=1 the inner loop will be scalar and there is existing
code in selectInterleaveCount that limits the interleave count to 2
for reductions due to concerns about increasing the critical path.
For ordered reductions this problem is even worse due to the additional
data dependency, and so I've added code to simply disable interleaving
for scalar ordered reductions for now.

Test added here:

  Transforms/LoopVectorize/AArch64/strict-fadd-vf1.ll

Differential Revision: https://reviews.llvm.org/D106646
2021-07-27 17:41:01 +01:00
Anna Thomas
3ad8a0b712 Update reduction test. Remove standalone test file
Based on post commit review comments at 68ffed12b.
2021-07-27 12:35:04 -04:00
Matt Arsenault
3b4453b968 AMDGPU: Update tests for lower i1 change
I forgot to squash the test updates for b32d3d9e81cdd9275d19cd2a396c461edc9e7189
2021-07-27 12:14:58 -04:00
Matt Arsenault
8979bda8e8 AMDGPU: Treat IMPLICIT_DEF like a constant lanemask source
This is partially a workaround. SILowerI1Copies does not understand
unstructured loops. This would result in inserting instructions to
merge a mask register in the same block where it was defined in an
unstructured loop.
2021-07-27 11:44:38 -04:00
Thomas Lively
bb4e957ebf [WebAssembly] Codegen for extmul SIMD instructions
Replace the clang builtins and LLVM intrinsics for the SIMD extmul instructions
with normal codegen patterns.

Differential Revision: https://reviews.llvm.org/D106724
2021-07-27 08:41:30 -07:00
Anirudh Prasad
2b967460b4 [SystemZ][z/OS] Initial code to generate assembly files on z/OS
- This patch consists of the bare basic code needed in order to generate some assembly for the z/OS target.
- Only the .text and the .bss sections are added for now.
- The relevant MCSectionGOFF/Symbol interfaces have been added. This enables us to print out the GOFF machine code sections.
- This patch enables us to add simple lit tests wherever possible, and contribute to the testing coverage for the z/OS target
- Further improvements and additions will be made in future patches.

Reviewed By: tmatheson

Differential Revision: https://reviews.llvm.org/D106380
2021-07-27 11:29:15 -04:00
Anna Thomas
23f3f90569 Strip undef implying attributes when moving calls
When hoisting/moving calls to locations, we strip unknown metadata. Such calls are usually marked `speculatable`, i.e. they are guaranteed to not cause undefined behaviour when run anywhere. So, we should strip attributes that can cause immediate undefined behaviour if those attributes are not valid in the context where the call is moved to.

This patch introduces such an API and uses it in relevant passes. See
updated tests.

Fix for PR50744.

Reviewed By: nikic, jdoerfert, lebedev.ri

Differential Revision: https://reviews.llvm.org/D104641
2021-07-27 10:57:05 -04:00
Tres Popp
05308b27ea Revert "[X86][AVX] Add getBROADCAST_LOAD helper function. NFCI."
This reverts commit 1cfecf4fc4278afb0005923f6dff595cd372da5c.

This commit broke LLVM code generated through XLA by removing a
conditional on Ld->getExtensionType() == ISD::NON_EXTLOAD

This is not a perfect revert. The new function is left as other uses of
it exist now.
2021-07-27 16:55:50 +02:00
Tres Popp
d809395fde Revert "Revert "[X86][AVX] Add getBROADCAST_LOAD helper function. NFCI.""
This reverts commit d7bbb1230a94cb239aa4a8cb896c45571444675d.

There were follow up uses of a deleted method and I didn't run the
tests. Undo the revert, so I can do it properly.
2021-07-27 16:48:31 +02:00
Tres Popp
9d32182a3a Revert "[X86][AVX] Add getBROADCAST_LOAD helper function. NFCI."
This reverts commit 1cfecf4fc4278afb0005923f6dff595cd372da5c.

This commit broke LLVM code generated through XLA by removing a
conditional on Ld->getExtensionType() == ISD::NON_EXTLOAD
2021-07-27 16:22:25 +02:00
Jeremy Morse
adf33c8470 [DebugInfo][InstrRef] Correctly update DBG_PHIs during instr scheduling
Avoid several crashes when DBG_INSTR_REF and DBG_PHI instructions are fed
to the instruction scheduler. DBG_INSTR_REFs should be treated like
DBG_LABELs, and just ignored for the purpose of scheduling [0].

DBG_PHIs however behave much more like DBG_VALUEs: they refer to register
operands, and if some register defs get shuffled around during instruction
scheduling, there's a risk that the debug instr will refer to the wrong
value. There's already a facility for updating DBG_VALUEs to reflect this;
add DBG_PHI to the list of instructions that it will update.

[0] Suboptimal, but it's what instr scheduling does right now.

Differential Revision: https://reviews.llvm.org/D106663
2021-07-27 15:12:46 +01:00
Tres Popp
64d9d45951 Handle unused variable when assertions are disabled 2021-07-27 15:43:06 +02:00
Anna Thomas
b8af1d6561 [IVDescriptors] Fix bug in checkOrderedReduction
The Exit instruction passed in for checking if it's an ordered reduction need not be
an FPAdd operation. We need to bail out at that point instead of
assuming it is an FPAdd (and hence has two operands). See added testcase.
It crashes without the patch because the Exit instruction is a phi with
exactly one operand.
This latent bug was exposed by 95346ba which added support for
multi-exit loops for vectorization.

Reviewed-By: kmclaughlin
Differential Revision: https://reviews.llvm.org/D106843
2021-07-27 09:31:44 -04:00
Chris Jackson
fe743a5b25 [DebugInfo][LoopStrengthReduction] SCEV-based salvaging for LSR
This reapplies commit 76f3ffb2b285998f02639db8fd42fb0de8a540d0 that was
reverted due to buildbot failures.

- Update lit tests with REQUIRES condition.
- Abandon salvage attempt if SCEVUnknown::getValue() returns nullptr.

Differential Revision: https://reviews.llvm.org/D105207
2021-07-27 14:22:09 +01:00
Jeremy Morse
47a51741f3 [DebugInfo][InstrRef] Handle llvm.frameaddress intrinsics gracefully
When working out which instruction defines a value, the
instruction-referencing variable location code has a few special cases for
physical registers:
 * Arguments are never defined by instructions,
 * Constant physical registers always read the same value, are never def'd

This patch adds a third case for the llvm.frameaddress intrinsics: you can
read the framepointer in any block if you so choose, and use it as a
variable location, as shown in the added test.

This rather violates one of the assumptions behind instruction referencing,
that LLVM-ir shouldn't be able to read from an arbitrary register at some
arbitrary point in the program. The solution for now is to just emit a
DBG_PHI that reads the register value: this works, but if we wanted to do
something clever with DBG_PHIs in the future then this would probably get
in the way. As it stands, this patch avoids a crash.

Differential Revision: https://reviews.llvm.org/D106659
2021-07-27 13:44:37 +01:00
Chris Jackson
e13e00f7b3 [DebugInfo][LoopStrengthReduction] SCEV-based salvaging for LSR
This reverts commit 76f3ffb2b285998f02639db8fd42fb0de8a540d0 because
of a failure on sanitixer-X86-64-linux-autoconf.
2021-07-27 13:36:56 +01:00
Chris Jackson
2e863e863d [DebugInfo][LoopStrengthReduction] SCEV-based salvaging for LSR
This patch extends salvaging of debuginfo in the Loop Strength Reduction
(LSR) pass by translating Scalar Evaluations (SCEV) into DIExpressions.
The method is as follows:
- Cache dbg.value intrinsics that are salvageable.
- Obtain a loop Induction Variable (IV) from ScalarExpressionExpander or
  the loop header.
- Translate the IV SCEV into an expression that recovers the current
  loop iteration count. Combine this with the dbg.value's location
  op SCEV to create a DIExpression that salvages the value.

Review by: jmorse

Differential Revision: https://reviews.llvm.org/D105207
2021-07-27 13:00:36 +01:00
Chen Zheng
b5cbfeed77 [PowerPC] add more testcases for ld_splat; nfc 2021-07-27 11:45:26 +00:00
Fraser Cormack
ac97ba67e8 [LangRef][NFC] Fix variable name in llvm.maxnum docs 2021-07-27 12:04:28 +01:00
Simon Pilgrim
a99adc793b [X86] Add PR37025 test coverage 2021-07-27 12:09:25 +01:00
Sander de Smalen
e1c8650040 [LV] Disable Scalable VFs when tail folding is enabled b/c of low tripcount.
The loop vectorizer may decide to use tail folding when the trip-count
is low. When that happens, scalable VFs are no longer a candidate,
since tail folding/predication is not yet supported for scalable vectors.

This can be re-enabled in a future patch.

Reviewed By: kmclaughlin

Differential Revision: https://reviews.llvm.org/D106657
2021-07-27 11:37:21 +01:00
Jay Foad
3bc8cd6a0b [GlobalISel] Constant fold G_SITOFP and G_UITOFP in CSEMIRBuilder
Differential Revision: https://reviews.llvm.org/D104528
2021-07-27 11:27:58 +01:00
Fraser Cormack
cf6bdfc026 [SelectionDAG] Support scalable splats in U(ADD|SUB)SAT combines
This patch builds on top of D106575 in which scalable-vector splats were
supported in `ISD::matchBinaryPredicate`. It teaches the DAGCombiner how
to perform a variety of the pre-existing saturating add/sub combines on
scalable-vector types.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D106652
2021-07-27 10:52:34 +01:00
Fraser Cormack
5afacc5171 [RISCV] Add support for vector saturating add/sub operations
This patch adds support for lowering the saturating vector add/sub
intrinsics to RVV instructions, for both fixed-length and
scalable-vector forms alike.

Note that some of the DAG combines are still not triggering for the
scalable-vector tests. These require a bit more work in the DAGCombiner
itself.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D106651
2021-07-27 10:04:14 +01:00
David Green
c2bd628550 [NFC] Reflow some debug messages. 2021-07-27 10:11:51 +01:00
Cullen Rhodes
1ae2c0fb16 [AArch64][SME] Add zero instruction
This patch adds the zero instruction for zeroing a list of 64-bit
element ZA tiles. The instruction takes a list of up to eight tiles
ZA0.D-ZA7.D, which must be in order, e.g.

  zero {za0.d,za1.d,za2.d,za3.d,za4.d,za5.d,za6.d,za7.d}
  zero {za1.d,za3.d,za5.d,za7.d}

The assembler also accepts 32-bit, 16-bit and 8-bit element tiles which
are mapped to corresponding 64-bit element tiles in accordance with the
architecturally defined mapping between different element size tiles,
e.g.

  * Zeroing ZA0.B, or the entire array name ZA, is equivalent to zeroing
    all eight 64-bit element tiles ZA0.D to ZA7.D.
  * Zeroing ZA0.S is equivalent to zeroing ZA0.D and ZA4.D.

The preferred disassembly of this instruction uses the shortest list of
tile names that represent the encoded immediate mask, e.g.

  * An immediate which encodes 64-bit element tiles ZA0.D, ZA1.D, ZA4.D and
    ZA5.D is disassembled as {ZA0.S, ZA1.S}.
  * An immediate which encodes 64-bit element tiles ZA0.D, ZA2.D, ZA4.D and
    ZA6.D is disassembled as {ZA0.H}.
  * An all-ones immediate is disassembled as {ZA}.
  * An all-zeros immediate is disassembled as an empty list {}.

This patch adds the MatrixTileList asm operand and related parsing to support
this.

Depends on D105570.

The reference can be found here:
https://developer.arm.com/documentation/ddi0602/2021-06

Reviewed By: david-arm

Differential Revision: https://reviews.llvm.org/D105575
2021-07-27 08:35:45 +00:00
Fraser Cormack
b7fc1f8c1c [RISCV] Add tests showing missed vector saturating add/sub combines
These will be optimized by upcoming patches. The tests are primarily not
being optimized due to the lack of support for saturating vector
arithmetic in the RISC-V backend.

On top of that, however, a large percentage of the scalable-vector tests
are also lacking support in the DAGCombiner: either in
`ISD::matchBinaryPredicate` or due to checks specifically for
`BUILD_VECTOR` and not `SPLAT_VECTOR`.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D106649
2021-07-27 09:11:05 +01:00
David Green
4d9d8789ad [ARM] Implement isLoad/StoreFromStackSlot for MVE stack stores accesses
This implements the isLoadFromStackSlot and isStoreToStackSlot for MVE
MVE_VSTRWU32 and MVE_VLDRWU32 functions. They behave the same as many
other loads/stores, expecting a FI in Op1 and zero offset in Op2. At the
same time this alters VLDR_P0_off and VSTR_P0_off to use the same code
too, as they too should be returning VPR in Op0, take a FI in Op1 and
zero offset in Op2.

Differential Revision: https://reviews.llvm.org/D106797
2021-07-27 09:11:58 +01:00
Rosie Sumpter
565fcd6a48 [LoopFlatten] Use SCEV and Loop APIs to identify increment and trip count
Replace pattern-matching with existing SCEV and Loop APIs as a more
robust way of identifying the loop increment and trip count. Also
rename 'Limit' as 'TripCount' to be consistent with terminology.

Differential Revision: https://reviews.llvm.org/D106580
2021-07-27 08:42:59 +01:00
Lang Hames
ee2584c4c2 [docs] Update release notes with all LLVM-C API changes
Patch by Mats Larsen. Thanks Mats!

Reviewed By: lhames

Differential Revision: https://reviews.llvm.org/D106764
2021-07-27 17:33:20 +10:00
Esme-Yi
e8ead0fbb2 [Debug-Info][llvm-dwarfdump] Don't try to dump location
list for attributes that don't have the loclist class.

Summary: The overflow error occurs when we try to dump
location list for those attributes that do not have the
loclist class, like DW_AT_count and DW_AT_byte_size.
After re-reviewed the entire list, I sorted those
attributes into two parts, one for dumping location list
and one for dumping the location expression.

Reviewed By: probinson

Differential Revision: https://reviews.llvm.org/D105613
2021-07-27 07:28:59 +00:00
LLVM GN Syncbot
10658f9d9e [gn build] Port 2487db1f2862 2021-07-27 06:54:07 +00:00
Lang Hames
d08c2fabb8 [ORC] Require ExecutorProcessControl when constructing an ExecutionSession.
Wrapper function call and dispatch handler helpers are moved to
ExecutionSession, and existing EPC-based tools are re-written to take an
ExecutionSession argument instead.

Requiring an ExecutorProcessControl instance simplifies existing EPC based
utilities (which only need to take an ES now), and should encourage more
utilities to use the EPC interface. It also simplifies process termination,
since the session can automatically call ExecutorProcessControl::disconnect
(previously this had to be done manually, and carefully ordered with the
rest of JIT tear-down to work correctly).
2021-07-27 16:53:49 +10:00
Johannes Doerfert
e9073c9b78 [OpenMP] Try to simplify all loads in device code
Eliminating loads/stores in the device code is worth the extra effort,
especially for the new device runtime.

At the same time we do not compute AAExecutionDomain for non-device code
anymore, there is no point.

Differential Revision: https://reviews.llvm.org/D106845
2021-07-27 01:44:15 -05:00
Johannes Doerfert
756885168c [Attributor][FIX] Copy all members in the assignment operator
Also improve debug output slightly.
2021-07-27 01:44:13 -05:00
Johannes Doerfert
7e9917e379 [Attributor] Utilize the InstSimplify interface to simplify instructions
When we simplify at least one operand in the Attributor simplification
we can use the InstSimplify to work on the simplified operands. This
allows us to avoid duplication of the logic.

Depends on D106189

Differential Revision: https://reviews.llvm.org/D106190
2021-07-27 00:56:23 -05:00
Johannes Doerfert
86cba7bf5e [InstSimplify] Expose generic interface for replaced operand simplification
Users, especially the Attributor, might replace multiple operands at
once. The actual implementation of simplifyWithOpReplaced is able to
handle that just fine, the interface was simply not allowing to replace
more than one operand at a time. This is exposing a more generic
interface without intended changes for existing code.

Differential Revision: https://reviews.llvm.org/D106189
2021-07-27 00:56:12 -05:00
Johannes Doerfert
35ac57a3f1 [Attributor] Update check lines for all AMDGPU attributor tests
I thought there was only one when I pushed
cdb4cfe8b3ce2b0c50d4855ec260eab07fe63611, these should be all (in the
CodeGen/AMDGPU folder).
2021-07-27 00:55:26 -05:00
Johannes Doerfert
2b8eaac9b5 [Attributor][FIX] Update AMDGPU attributor test
The test contains UB and should be improved, for now we update the check
lines pass it.
2021-07-27 00:23:47 -05:00
Chuanqi Xu
700878ee36 [Coroutine] Record the elided coroutines
Reviewed By: lxfind

Differential Revision: https://reviews.llvm.org/D105606
2021-07-27 13:14:09 +08:00
Tom Stellard
ffb4271d86 Merge all the llvm-exegesis unit tests into a single binary
These tests access private symbols in the backends, so they cannot link
against libLLVM.so and must be statically linked.  Linking these tests
can be slow and with debug builds the resulting binaries use a lot of
disk space.

By merging them into a single test binary means we now only need to
statically link 1 test instead of 6, which helps reduce the build
times and saves disk space.

Reviewed By: courbet

Differential Revision: https://reviews.llvm.org/D106464
2021-07-26 22:07:07 -07:00
wlei
eca1a76d00 [CSSPGO] Tweak ICP threshold in top-down inliner
This change slightly relaxed the current ICP threshold in top-down inliner, specifically always allow one ICP for it. It shows some perf improvements on SPEC and our internal benchmarks. Also renamed the previous flag. We can also try to turn off PGO ICP in the future.

Reviewed By: wenlei, hoy, wmi

Differential Revision: https://reviews.llvm.org/D106588
2021-07-26 21:49:20 -07:00
Johannes Doerfert
4ce16022d3 [Local] Do not introduce a new llvm.trap before unreachable
This is the second attempt to remove the `llvm.trap` insertion after
https://reviews.llvm.org/rGe14e7bc4b889dfaffb7180d176a03311df2d4ae6
reverted the first one. It is not clear what the exact issue was back
then and it might already be gone by now, it has been >5 years after
all.

Replaces D106299.

Differential Revision: https://reviews.llvm.org/D106308
2021-07-26 23:33:36 -05:00
Johannes Doerfert
95f415bd34 [Attributor] Delete dead stores
D106185 allows us to determine if a store is needed easily. Using that
knowledge we can start to delete dead stores.

In AAIsDead we now track more state as an instruction can be dead (= the
old optimisitc state) or just "removable". A store instruction can be
removable while being very much alive, e.g., if it stores a constant
into an alloca or internal global. If we would pretend it was dead
instead of only removablewe we would ignore it when we determine what
values a load can see, so that is not what we want.

Differential Revision: https://reviews.llvm.org/D106188
2021-07-26 23:33:36 -05:00
Johannes Doerfert
5ee3942ae7 [Attributor] Introduce getPotentialCopiesOfStoredValue and use it
This patch introduces `getPotentialCopiesOfStoredValue` which uses
AAPointerInfo to determine all "aliases" or "potential copies" of a
value that is stored into memory. This operation can fail but if it
succeeds it means we can visit all "uses" of a value even if it is
temporarily stored in memory.

There are two users for the function:
  1) `Attributor::checkForAllUses` which will now ignore the value use
     in a store if all "potential copies" can be identified and instead
     be visited. This allows various AAs, including AAPointerInfo
     itself, to look through memory.
  2) `AANoCapture` which uses a custom use tracking through the
     CaptureTracker interface and therefore needs to be thought
     explicitly.

Differential Revision: https://reviews.llvm.org/D106185
2021-07-26 23:33:36 -05:00
Mehdi Amini
60670bde11 Build libSupport with -Werror=global-constructors (NFC)
Ensure that libSupport does not carry any static global initializer.
libSupport can be embedded in use cases where we don't want to load all
cl::opt unless we want to parse the command line.
ManagedStatic can be used to enable lazy-initialization of globals.

The -Werror=global-constructors is only added on platform that have
support for the flag and for which std::mutex does not have a global
destructor. This is ensured by having CMake trying to compile a file
with a global mutex before adding the flag to libSupport.
2021-07-27 04:27:18 +00:00
Craig Topper
eb0ba2928d [AArch64] Fix -Wparentheses warning with gcc 5.4. NFC 2021-07-26 21:08:56 -07:00
Jun Ma
ce23772b94 [NFC][InstCombine] Fix typo 2021-07-27 11:33:10 +08:00
Lang Hames
52c4e0dd10 [llvm-jitlink] Don't hardcode LLVM version number into the runtime path.
This should unbreak builders that were failing due to different patch numbers.
2021-07-27 13:04:50 +10:00
Mitch Phillips
fcd0279b17 Revert "[GlobalISel] Add scalar widening for G_MERGE_VALUES destination"
This reverts commit 0a37163d1d855a2db41e1f46ddbc3f4570bd7ca6.

Reason: Broke the sanitizer msan bots. More details are available in the
original Phabricator review: https://reviews.llvm.org/D106814.
2021-07-26 19:52:12 -07:00