1
0
mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-11-23 03:02:36 +01:00
Commit Graph

213522 Commits

Author SHA1 Message Date
Simon Pilgrim
cf39147d1e [DAG] MergeInnerShuffle with BinOps - sometimes accept undef mask elements
If the inner shuffle already contains undef elements, then accept them in the merged shuffle as well.

This helps some X86 HADD/SUB patterns where slow targets were ending up with HADD/SUB because the (un)merged shuffles were stuck either side of the ADD/SUB - meaning we ended up with a total cost much higher than the "2*shuffle+add" that a slow target usually expands a HADD/SUB to.
2021-04-01 14:33:00 +01:00
Alexey Bataev
4a573da3f8 [SLP]Remove else after return, NFC.` 2021-04-01 05:33:01 -07:00
Dmitry Preobrazhensky
985d51d2a5 [AMDGPU][MC][GFX10][GFX90A] Corrected _e32/_e64 suffices
Fixed bugs https://bugs.llvm.org//show_bug.cgi?id=49643, https://bugs.llvm.org//show_bug.cgi?id=49644, https://bugs.llvm.org//show_bug.cgi?id=49645.

Differential Revision: https://reviews.llvm.org/D99413
2021-04-01 14:21:00 +03:00
Simon Pilgrim
834f9c5710 [X86][SSE] Fold HOP(HOP(X,X),HOP(Y,Y)) -> HOP(PERMUTE(HOP(X,Y)),PERMUTE(HOP(X,Y))
For slow-hop targets, attempt to merge HADD/SUB pairs used in chains.
2021-04-01 11:54:10 +01:00
Simon Pilgrim
3a797eccba [X86][SSE] Enable (F)HADD/SUB handling to SimplifyMultipleUseDemandedVectorElts
Attempt to bypass unused horiz-op operands.

This is very similar to the PACKSS/PACKUS handling - we should try to merge these.
2021-04-01 11:54:09 +01:00
Simon Pilgrim
96cbf2a08f [X86][SSE] Add isHorizOp helper function. NFCI. 2021-04-01 11:54:09 +01:00
Dmitry Preobrazhensky
f7f94d099c [AMDGPU][MC] Added flag to identify VOP instructions which have a single variant
By convention, VOP1/2/C instructions which can be promoted to VOP3 have _e32 suffix while promoted instructions have _e64 suffix. Instructions which have a single variant should have no _e32/_e64 suffix. Unfortunately there was no simple way to identify single variant instructions - it was implemented by a hack. See bug https://bugs.llvm.org/show_bug.cgi?id=39086.

This fix simplifies handling of single VOP instructions by adding a dedicated flag.

Differential Revision: https://reviews.llvm.org/D99408
2021-04-01 13:53:12 +03:00
Florian Hahn
ab06955ff7 [SLP] Add test cases for missing SLP vectorization on AArch64. 2021-04-01 11:48:16 +01:00
David Sherwood
f9929db64e [NFC] Add tests for scalable vectorization of loops with large stride acesses
This patch just adds tests that we can vectorize loop such as these:

  for (i = 0; i < n; i++)
    dst[i * 7] += 1;

and

  for (i = 0; i < n; i++)
    if (cond[i])
      dst[i * 7] += 1;

using scalable vectors, where we expect to use gathers and scatters in the
vectorized loop. The vector of pointers used for the gather is identical
to those used for the scatter so there should be no memory dependences.

Tests are added here:

  Transforms/LoopVectorize/AArch64/sve-large-strides.ll

Differential Revision: https://reviews.llvm.org/D99192
2021-04-01 10:25:06 +01:00
Yevgeny Rouban
5d014af6d9 [LoopFlatten] Do not report CFG analyses as up-to-date
Removes CFGAnalyses from the preserved analyses set
returned by LoopFlattenPass::run().

Reviewed By: Dave Green, Ta-Wei Tu

Differential Revision: https://reviews.llvm.org/D99700
2021-04-01 15:52:36 +07:00
Sam Parker
a0fbbf61ec [WebAssembly] Invert branch condition on xor input
A frequent pattern for floating point conditional branches use an xor
to invert the input for the branch. Instead we can fold away the xor
by swapping the branch target instead.

Differential Revision: https://reviews.llvm.org/D99171
2021-04-01 09:23:28 +01:00
Max Kazantsev
2231fbd3ed [NFC] Undo some erroneous renamings
Some vars renamed by mistake during auto-replacements. Undoing them.
2021-04-01 13:10:10 +07:00
Max Kazantsev
467293274d [NFC] Disambiguate LI in GVN
Name GVN uses name 'LI' for two different unrelated things:
LoadInst and LoopInfo. This patch relates the variables with
former meaning into 'Load' to disambiguate the code.
2021-04-01 12:40:35 +07:00
Chen Zheng
1c951bda95 [debug-info] support new tuning debugger type DBX for XCOFF DWARF
Based on this debugger type, for now, we plan to:
1: use inline string by default for XCOFF DWARF
2: generate no column info for debug line table.

Reviewed By: aprantl

Differential Revision: https://reviews.llvm.org/D99400
2021-04-01 00:11:30 -04:00
KAWASHIMA Takahiro
dd33b4b9e5 [GVN] Propagate llvm.access.group metadata of loads
Before this change, the `llvm.access.group` metadata was dropped
when moving a load instruction in GVN. This prevents vectorizing
a C/C++ loop with `#pragma clang loop vectorize(assume_safety)`.
This change propagates the metadata as well as other metadata if
it is safe (the move-destination basic block and source basic
block belong to the same loop).

Differential Revision: https://reviews.llvm.org/D93503
2021-04-01 10:00:48 +09:00
KAWASHIMA Takahiro
1bdd822573 [GVN][NFC] Pre-commit test for D93503 2021-04-01 10:00:48 +09:00
qixingxue
df8bbdf70e [GVN][NFC] Refactor analyzeLoadFromClobberingWrite
This commit adjusts the order of two swappable if statements to
make code cleaner.

Reviewed By: lattner, nikic
Differential Revision: https://reviews.llvm.org/D99648
2021-04-01 08:35:35 +08:00
Philip Reames
5a0375405f [ValueTracking] Handle non-zero ashr/lshr recurrences
If we know we don't shift out bits (e.g. exact), all we need to know is that input is non-zero.
2021-03-31 16:48:32 -07:00
Philip Reames
e571959eb5 [tests] Add tests for ashr/lshr recurrences in isKnownNonZero 2021-03-31 16:48:32 -07:00
Philip Reames
1966c159a4 Add debug printers for KnownBits [nfc] 2021-03-31 15:36:07 -07:00
Simonas Kazlauskas
c1d491f5a6 Support {S,U}REMEqFold before legalization
This allows these optimisations to apply to e.g. `urem i16` directly
before `urem` is promoted to i32 on architectures where i16 operations
are not intrinsically legal (such as on Aarch64). The legalization then
later can happen more directly and generated code gets a chance to avoid
wasting time on computing results in types wider than necessary, in the end.

Seems like mostly an improvement in terms of results at least as far as x86_64 and aarch64 are concerned, with a few regressions here and there. It also helps in preventing regressions in changes like {D87976}.

Reviewed By: lebedev.ri

Differential Revision: https://reviews.llvm.org/D88785
2021-04-01 01:35:41 +03:00
Craig Topper
ae21ac30d0 [RISCV] Add UnsupportedSchedZfh multiclass to reduce duplicate lines from RISCVSchedRocket.td and RISCVSchedSiFive7.td. NFC 2021-03-31 15:06:14 -07:00
YangKeao
bc3527f3c1 [X86] add dwarf annotation for inline stack probe
While probing stack, the stack register is moved without dwarf
information, which could cause panic if unwind the backtrace.
This commit only add annotation for the inline stack probe case.
Dwarf information for the loop case should be done in another
patch and need further discussion.

Reviewed By: nagisa

Differential Revision: https://reviews.llvm.org/D99579
2021-04-01 00:32:50 +03:00
Thomas Preud'homme
737d322a2a [test, InferFunctionAttrs] Fix use of var defined in CHECK-NOT
LLVM test Transforms/InferFunctionAttrs/annotate contains two RUN
invokation (UNKNOWN and NVPTX lines) which involve a CHECK-NOT directive
with a variable not defined by the enabled CHECK prefixes. This commit
fixes that by:

- enabling CHECK prefix for unknown target with specialisation when it
  differs from other targets
- checking for absence of bcmp with any attribute for NVPTX

Reviewed By: tra

Differential Revision: https://reviews.llvm.org/D99589
2021-03-31 22:06:34 +01:00
Roman Lebedev
5b224b04d8 [NFC][LoopRotation] Count the number of instructions hoisted/cloned into preheader 2021-03-31 23:27:36 +03:00
Philip Reames
4739f740ea Revert "Make TableGenGlobalISel an object library"
This reverts commit 2c3cf62d4a26de85aab180bb43a579c913b17f3e.

Causes build failures on x86_64, will respond to commit thread with link errors.
2021-03-31 13:27:00 -07:00
Aaron Puchert
936867007c Make TableGenGlobalISel an object library
That's how it was originally intended but that wasn't possible because
we still needed to support older CMake versions.

The problem here is that the sources in TableGenGlobalISel are meant to
be linked into both llvm-tblgen and TableGenTests (a unit test), but not
be part of LLVM proper. So they shouldn't be an ordinary LLVM component.
Because they are used in llvm-tblgen, they can't draw in the LLVM dylib
dependency, but then we'd have to do the same thing in TableGenTests to
make sure we don't link both a static Support library and another copy
through the LLVM dylib.

With an object library we're just reusing the object files and don't
have to care about dependencies at all.

Reviewed By: beanz

Differential Revision: https://reviews.llvm.org/D74588
2021-03-31 22:20:56 +02:00
Philip Reames
b4fffe7b16 [tests] Exercise cases where SCEV can use trip counts to refine ashr/lshr recurrences 2021-03-31 12:48:50 -07:00
Alexey Bataev
43bf28eee8 [SLP]Update test checks, NFC 2021-03-31 12:35:58 -07:00
Craig Topper
1e557f9750 [SelectionDAG] Remove unneeded vector resize from the end of FoldConstantArithmetic. NFC
There's an assert right before that makes sure the size already matches.
Earlier in this function's life, scalars and vectors shared more
code.
2021-03-31 12:33:10 -07:00
George Mitenkov
b6034064b6 [ConstantFolding] Fixing addo/subo with undef
When folding addo/subo with undef, the current
convention is to use { -1, false } for addo and
{ 0, false } for subo. This was fixed for InstSimplify in
https://reviews.llvm.org/rGf094d65beaa492e845b03561eddd75b5be653a01,
but not in ConstantFolding.

Reviewed By: nikic, lebedev.ri

Differential Revision: https://reviews.llvm.org/D99564
2021-03-31 21:47:29 +03:00
Alexey Bataev
897d734376 [SLP]Add a test for the bug in getVectorElementSize(), NFC. 2021-03-31 11:40:44 -07:00
Huihui Zhang
7272e7d7a3 [LoopVectorize] Use SetVector to track uniform uses to prevent non-determinism.
Use SetVector instead of SmallPtrSet to track values with uniform use. Doing this
can help avoid non-determinism caused by iterating over unordered containers.

This bug was found with reverse iteration turning on,
--extra-llvm-cmake-variables="-DLLVM_REVERSE_ITERATION=ON".
Failing LLVM test consecutive-ptr-uniforms.ll .

Reviewed By: MaskRay

Differential Revision: https://reviews.llvm.org/D99549
2021-03-31 11:21:07 -07:00
Thomas Lively
a9b1f02ebd [WebAssembly] Implement i64x2 comparisons
Removes the prototype builtin and intrinsic for i64x2.eq and implements that
instruction as well as the other i64x2 comparison instructions in the final SIMD
spec. Unsigned comparisons were not included in the final spec, so they still
need to be scalarized via a custom lowering.

Differential Revision: https://reviews.llvm.org/D99623
2021-03-31 10:46:17 -07:00
Juneyoung Lee
3e94a07500 [ValueTracking] Add with.overflow intrinsics to poison analysis functions
This is a patch teaching ValueTracking that `s/u*.with.overflow` intrinsics do not
create undef/poison and they propagate poison.
I couldn't write a nice example like the one with ctpop; ValueTrackingTest.cpp were simply updated
to check these instead.
This patch helps reducing regression while fixing https://llvm.org/pr49688 .

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D99671
2021-04-01 02:41:38 +09:00
Philip Reames
4da7b4cf4f [SCEV] Handle unreachable binop when matching shift recurrence
This fixes an issue introduced with my change d4648e, and reported in pr49768.

The root problem is that dominance collapses in unreachable code, and that LoopInfo explicitly only models reachable code.  Since the recurrence matcher doesn't filter by reachability (and can't easily because not all consumers have domtree), we need to bailout before assuming that finding a recurrence implies we found a loop.
2021-03-31 10:33:34 -07:00
Craig Topper
ba24a9b976 [X86] Improve SMULO/UMULO codegen for vXi8 vectors.
The default expansion creates a MUL and either a MULHS/MULHU. Each
of those separately expand to sequences that use one or more
PMULLW instructions as well as additional instructions to
extend the types to vXi16. The MULHS/MULHU expansion computes the
whole 16-bit product, but only keeps the high part.

We can improve the lowering of SMULO/UMULO for some cases by using the MULHS/MULHU
expansion, but keep both the high and low parts. And we can use
those parts to calculate the overflow.

For AVX512 we might have vXi1 overflow outputs. We can improve those by using
vpcmpeqw to produce a k register if AVX512BW is enabled. This is a little better
than truncating the high result to use vpcmpeqb. If we don't have avx512bw we
can extend up to v16i32 to use vpcmpeqd to produce a k register.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D97624
2021-03-31 10:13:50 -07:00
Shimin Cui
af509b77f3 [PowerPC] [MLICM] Enable hoisting of caller preserved registers on AIX
On ppc64 linux , MachineLICM will hoist caller preserved registers, including TOC loads of the global variable address, out of loops. This is to enable this on AIX for both ppc64 and ppc32.

Differential Revision: https://reviews.llvm.org/D99076
2021-03-31 12:46:25 -04:00
Craig Topper
37baf7a26c [X86] Improve optimizeCompareInstr for signed comparisons after BMI/TBM instructions
We previously couldn't optimize out a TEST if the branch/setcc/cmov
used the overflow flag. This patches allows the TEST to be removed
if the flag producing instruction is known to clear the OF flag.
Thats what the TEST instruction would have done so that should be
equivalent.

Need to add test cases. I'll try to get back to this if I have bandwidth.

Fixes PR48768.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D94856
2021-03-31 09:45:29 -07:00
Wael Yehia
1925a6cfe8 [LTO][Legacy] Decouple option parsing from LTOCodeGenerator
in this patch we add a new libLTO API to specify debug options independent of an lto_code_gen_t.
This allows clients to pass codegen flags (through libLTO) which otherwise today are ignored.

Reviewed By: steven_wu

Differential Revision: https://reviews.llvm.org/D92611
2021-03-31 16:43:26 +00:00
Craig Topper
c662f4964b [RISCV] Add RISCVISD opcodes for CLZW and CTZW.
Our CLZW isel pattern is quite easily broken by surrounding code
preventing it from matching sometimes. This usually results in
failing to remove the and X, 0xffffffff inserted by type
legalization. The add with -32 that type legalization also inserts
will often gets combined into other add/sub nodes. That doesn't
usually result in extra code when we don't use clzw.

CTTZ seems to be less fragile, but I wanted to keep it consistent
with CTLZ.

Reviewed By: asb, HsiangKai

Differential Revision: https://reviews.llvm.org/D99317
2021-03-31 09:40:07 -07:00
Jay Foad
b26eea9c23 [AMDGPU] Add some image tests with enable-prt-strict-null disabled. NFC. 2021-03-31 17:27:20 +01:00
Jay Foad
a35262e5d6 [AMDGPU] Use a common check prefix for some image tests. NFC. 2021-03-31 17:27:20 +01:00
Craig Topper
588b65d9ad [RISCV] Add isel patterns to select vsub_vx intrinsic to vadd.vi if it uses a small enough immediate
Also modify the simm5_plus1 check because Imm-1 is UB if Imm happens
to be INT64_MIN. I don't think the compiler would optimize based on that in this
usage, but it could fail UBSan or -ftrapv.

Reviewed By: HsiangKai, frasercrmck

Differential Revision: https://reviews.llvm.org/D99637
2021-03-31 09:26:41 -07:00
Arthur Eubanks
2e601b2808 [llvm-jitlink] Fix -Wunused-function on Windows
Reviewed By: sgraenitz

Differential Revision: https://reviews.llvm.org/D99604
2021-03-31 09:26:09 -07:00
Heejin Ahn
ac42de94f4 [WebAssembly] Raname a test and fix comments
D99627 fixed a decoding bug, not an encoding bug. This renames the test
to correct it and fix comments.

Reviewed By: dschuff

Differential Revision: https://reviews.llvm.org/D99644
2021-03-31 09:13:08 -07:00
Sanjay Patel
276f54740e [InstCombine] fold abs(srem X, 2)
This is a missing optimization based on an example in:
https://llvm.org/PR49763

As noted there and the test here, we could add a more
general fold if that is shown useful.

https://alive2.llvm.org/ce/z/xEHdTv
https://alive2.llvm.org/ce/z/97dcY5
2021-03-31 11:29:20 -04:00
Sanjay Patel
b2b2a53d04 [InstCombine] add tests for srem+abs; NFC 2021-03-31 11:29:20 -04:00
Bradley Smith
64f8889766 [AArch64][SVE] Add tests for UREM/SREM using fixed SVE types
Differential Revision: https://reviews.llvm.org/D99265
2021-03-31 16:09:55 +01:00
Sander de Smalen
772162c240 [SVE] Fix LoopVectorizer test scalalable-call.ll
This marks FSIN and other operations to EXPAND for scalable
vectors, so that they are not assumed to be legal by the cost-model.

Depends on D97470

Reviewed By: dmgreen, paulwalker-arm

Differential Revision: https://reviews.llvm.org/D97471
2021-03-31 14:52:49 +01:00