1
0
mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-11-23 03:02:36 +01:00
Commit Graph

214855 Commits

Author SHA1 Message Date
Hongtao Yu
92586868ec [CSSPGO] Fix an AV caused by a block that has only pseudo pseudo instructions.
Reviewed By: wenlei

Differential Revision: https://reviews.llvm.org/D101415
2021-04-27 17:54:34 -07:00
Han Zhu
41672a71c6 [loop-idiom] Hoist loop memcpys to loop preheader
For a simple loop like:
```
struct S {
  int x;
  int y;
  char b;
};

unsigned foo(S* __restrict__ a, S* b, int n) {
  for (int i = 0; i < n; i++)
    a[i] = b[i];

  return sizeof(a[0]);
}
```
We could eliminate the loop and convert it to a large memcpy of 12*n bytes. Currently this is not handled. Output of `opt -loop-idiom -S < memcpy_before.ll`
```
%struct.S = type { i32, i32, i8 }

define dso_local i32 @_Z3fooP1SS0_i(%struct.S* noalias nocapture %a, %struct.S* nocapture readonly %b, i32 %n) local_unnamed_addr {
entry:
  %cmp7 = icmp sgt i32 %n, 0
  br i1 %cmp7, label %for.body.preheader, label %for.cond.cleanup

for.body.preheader:                               ; preds = %entry
  br label %for.body

for.cond.cleanup.loopexit:                        ; preds = %for.body
  br label %for.cond.cleanup

for.cond.cleanup:                                 ; preds = %for.cond.cleanup.loopexit, %entry
  ret i32 12

for.body:                                         ; preds = %for.body, %for.body.preheader
  %i.08 = phi i32 [ %inc, %for.body ], [ 0, %for.body.preheader ]
  %idxprom = zext i32 %i.08 to i64
  %arrayidx = getelementptr inbounds %struct.S, %struct.S* %b, i64 %idxprom
  %arrayidx2 = getelementptr inbounds %struct.S, %struct.S* %a, i64 %idxprom
  %0 = bitcast %struct.S* %arrayidx2 to i8*
  %1 = bitcast %struct.S* %arrayidx to i8*
  call void @llvm.memcpy.p0i8.p0i8.i64(i8* nonnull align 4 dereferenceable(12) %0, i8* nonnull align 4 dereferenceable(12) %1, i64 12, i1 false)
  %inc = add nuw nsw i32 %i.08, 1
  %cmp = icmp slt i32 %inc, %n
  br i1 %cmp, label %for.body, label %for.cond.cleanup.loopexit
}

; Function Attrs: argmemonly nofree nosync nounwind willreturn
declare void @llvm.memcpy.p0i8.p0i8.i64(i8* noalias nocapture writeonly, i8* noalias nocapture readonly, i64, i1 immarg) #0

attributes #0 = { argmemonly nofree nosync nounwind willreturn }

```
The loop idiom pass currently only handles load and store instructions. Since struct S is too big to fit in a register, the loop body contains a memcpy intrinsic.

With this change, re-run `opt -loop-idiom -S < memcpy_before.ll`. The loop memcpy is promoted to loop preheader. For this trivial case, the loop is dead and will be removed by another pass.
```
%struct.S = type { i32, i32, i8 }

define dso_local i32 @_Z3fooP1SS0_i(%struct.S* noalias nocapture %a, %struct.S* nocapture readonly %b, i32 %n) local_unnamed_addr {
entry:
  %a1 = bitcast %struct.S* %a to i8*
  %b2 = bitcast %struct.S* %b to i8*
  %cmp7 = icmp sgt i32 %n, 0
  br i1 %cmp7, label %for.body.preheader, label %for.cond.cleanup

for.body.preheader:                               ; preds = %entry
  %0 = zext i32 %n to i64
  %1 = mul nuw nsw i64 %0, 12
  call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 4 %a1, i8* align 4 %b2, i64 %1, i1 false)
  br label %for.body

for.cond.cleanup.loopexit:                        ; preds = %for.body
  br label %for.cond.cleanup

for.cond.cleanup:                                 ; preds = %for.cond.cleanup.loopexit, %entry
  ret i32 12

for.body:                                         ; preds = %for.body, %for.body.preheader
  %i.08 = phi i32 [ %inc, %for.body ], [ 0, %for.body.preheader ]
  %idxprom = zext i32 %i.08 to i64
  %arrayidx = getelementptr inbounds %struct.S, %struct.S* %b, i64 %idxprom
  %arrayidx2 = getelementptr inbounds %struct.S, %struct.S* %a, i64 %idxprom
  %2 = bitcast %struct.S* %arrayidx2 to i8*
  %3 = bitcast %struct.S* %arrayidx to i8*
  %inc = add nuw nsw i32 %i.08, 1
  %cmp = icmp slt i32 %inc, %n
  br i1 %cmp, label %for.body, label %for.cond.cleanup.loopexit
}

; Function Attrs: argmemonly nofree nosync nounwind willreturn
declare void @llvm.memcpy.p0i8.p0i8.i64(i8* noalias nocapture writeonly, i8* noalias nocapture readonly, i64, i1 immarg) #0

attributes #0 = { argmemonly nofree nosync nounwind willreturn }
```

Reviewed By: zino

Differential Revision: https://reviews.llvm.org/D97667
2021-04-27 17:37:51 -07:00
David Tenty
1a6d5eb0da [AIX] Add %pluginext and update tests to use proper pluginext
As a follow on to D96282, since bug point passes is built as a module the proper file extension to use is LLVM_PLUGIN_EXT, rather than SHLIBEXT. Using SHLIBEXT causes the tests to load a non-existent file on AIX. We also adjust the PluginsTest unittest  to use LLVM_PLUGIN_EXT for similar reasons.

This change should hopefully make little difference to other platforms, since generally `SHLIBEXT=LTDL_SHLIB_EXT=CMAKE_SHARED_LIBRARY_SUFFIX` and `LLVM_PLUGIN_EXT=CMAKE_SHARED_LIBRARY_SUFFIX` on every platform except AIX.

Reviewed By: hubert.reinterpretcast

Differential Revision: https://reviews.llvm.org/D101412
2021-04-27 20:34:54 -04:00
Jim Radford
8377f04809 [CMake][llvm] avoid conflict w/ (and use when available) new builtin check_linker_flag
Match the API for the new check_linker_flag and use it directly when
available, leaving the old code as a fallback.

Differential Revision: https://reviews.llvm.org/D100901
2021-04-27 16:41:28 -07:00
Alexander Shaposhnikov
8d7527eb07 Revert "[llvm-objcopy][MachO] Add support for LC_THREAD/LC_UNIXTHREAD"
This reverts commit 4dfddf715b94857998601aa79c25e4f327d44dfa
since it breaks some build bots (e.g. clang-ppc64be-linux)
2021-04-27 16:19:59 -07:00
Heejin Ahn
0a60d3f451 [WebAssembly] Error when wasm EH is used with Emscripten EH/SjLj
- Error out when both Emscripten EH and wasm EH are used together, i.e.,
  both `-enable-emscripten-cxx-exceptions` and `-exception-model=wasm`
  are given together. This will not happen if you use Emscripten, but
  this can happen when you call `llc` manually with wrong set of
  arguments.
- Currently we don't yet support using wasm EH with Emscripten SjLj.
  Unlike `-enable-emscripten-cxx-exceptions` which is turned on only
  when you use `emcc -s DISABLE_EXCEPTION_CATCHING=0`,
  `-enable-emscripten-sjlj` is turned on by Emscripten by default. So we
  error out only when it is turned on and `setjmp` or `longjmp` is
  actually used.

Reviewed By: tlively

Differential Revision: https://reviews.llvm.org/D101403
2021-04-27 16:07:53 -07:00
Alexander Shaposhnikov
501b460768 [llvm-objcopy][MachO] Add support for LC_THREAD/LC_UNIXTHREAD
Add support for LC_THREAD/LC_UNIXTHREAD
(these load commands can be copied over without any modifications).

Test plan: make check-all

Differential revision: https://reviews.llvm.org/D101384
2021-04-27 15:54:51 -07:00
Joseph Huber
b73245a627 [OpenMP] Remove legacy pass manager run lines
Summary:
Two tests in OpenMPOpt currently fail using the legacy pass manager. Remove
these run lines to prevent tests from failing.
2021-04-27 18:03:28 -04:00
Craig Topper
50539da1c1 [SelectionDAG] Use a VTSDNode to store the saturation width for FP_TO_SINT_SAT/FP_TO_UINT_SAT
Previously we used an i32 constant to store the saturation width, but i32 isn't
legal on RISCV64. This wasn't a big deal to fix, but it is extra work for the
type legalizer.

This patch uses a VTSDNode to store the type similar to SEXT_INREG. This makes
it opaque to the type legalizer.

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D101262
2021-04-27 14:38:42 -07:00
Craig Topper
d473172d42 [RISCV] Select 5 bit immediate for VSETIVLI during isel rather than peepholing in the custom inserter.
This adds a special operand type that is allowed to be either
an immediate or register. By giving it a unique operand type the
machine verifier will ignore it.

This perturbs a lot of tests but mostly it is just slightly different
instruction orders. Something bad did happen to some min/max reduction
tests. We're spilling vector registers when we weren't before.

Reviewed By: khchen

Differential Revision: https://reviews.llvm.org/D101246
2021-04-27 14:38:16 -07:00
Reid Kleckner
2cd68dac41 [NFC][SimplifyCFG] Precommit SimplifyCFG tests from D29428 2021-04-28 00:35:44 +03:00
Roman Lebedev
6624460bfc [NFC][SimplifyCFG] Autogenerate check lines in few more tests 2021-04-28 00:35:44 +03:00
Arthur Eubanks
6c8d16d78d [ConstFold] Use const-folded operands in more places
Previously we were const folding operands but not passing them.

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D101394
2021-04-27 14:30:19 -07:00
Dávid Bolvanský
5efc03c236 [DSE] Added testcases for 11896, NFC 2021-04-27 22:56:10 +02:00
Han Zhu
3d67f27527 [loop-idiom][NFC] Extract processLoopStoreOfLoopLoad into a helper function
Differential Revision: https://reviews.llvm.org/D100979
2021-04-27 13:42:30 -07:00
Nikita Popov
e7bbddfa9f [SCEV] Handle uge/ugt predicates in applyLoopGuards()
These can be handled the same way as ule/ult, just using umax
instead of umin. This is useful in cases where the umax prevents
the upper bound from overflowing.

Differential Revision: https://reviews.llvm.org/D101196
2021-04-27 22:41:05 +02:00
Alexey Bataev
2f428c693f [SLP]Add a test for possibly vectorized tiny tree, NFC. 2021-04-27 13:39:02 -07:00
Nikita Popov
0fd5fddf4c [SCEV] Improve loop guard tests (NFC)
Invert the branch order to make the predicate more obvious.
Add tests with two predicates, to show that rewrites are
combined.
2021-04-27 22:34:56 +02:00
Arthur Eubanks
08189b14ea [test] Fix some func-attrs tests under the legacy PM
The new PM doesn't visit declarations in CGSCC passes. These tests
aren't testing that detail, so just run them against the new PM.
2021-04-27 13:07:56 -07:00
Sanjay Patel
a7d424d173 [InstCombine] fold clamp to 2 values from min/max intrinsics
The "select" versions of these folds is also missing and can
cause infinite loops as shown in:
https://llvm.org/PR48900
...but it seems easier to match these as max/min as a first fix.

https://alive2.llvm.org/ce/z/wv-_dT
2021-04-27 15:35:49 -04:00
Sanjay Patel
c9aa2ef02f [InstCombine] add tests for clamp patterns using min/max intrinsics; NFC 2021-04-27 15:35:49 -04:00
Andy Kaylor
64bce9007c [Dependence Analysis] Fix ExactSIV producing wrong analysis
Patch by Artem Radzikhovskyy!

Symptom: ExactSIV test produced incorrect analysis of dependencies see LIT tests
Bug: At the end of the algorithm when determining dependence direction original author forgot to divide intermediate results by gcd and round result toward zero

Although this bug can be fixed with significantly fewer changes I opted to write the code in such a way that reflects the original algorithm that Banerjee proposed, for easier reference in the future. This surprisingly results in shorter code, and fewer quotient and max/min calculations.

Changes Summary:

- fixed findGCD to return valid x and y so that they match the function description where: ax - by = gcd(a,b)
- Fixed ExactSIV test, to produce proper results
- Documented the extension of Banerjee's algorithm that the original code author introduced. Banerjee's original algorithm only tested whether Dst depends on Src, the extension also allows us to test whether Src depends on Dst, in one pass.
- ExactRDIV test worked fine. Since it uses findGCD(), it needed to be updated.Since ExactRDIV test has very few changes from the core algorithm of ExactSIV I modified the test to have consistent format as ExactSIV.
- Updated the LIT tests to be testing for correct values.

Differential Revision: https://reviews.llvm.org/D100331
2021-04-27 12:24:00 -07:00
Jay Foad
63b9806743 [AMDGPU] GCNHazardRecognizer: ignore all meta instructions
This is hopefully NFC, but should be more robust in ignoring all
instructions that should be ignored, instead of just some of them.

Differential Revision: https://reviews.llvm.org/D101372
2021-04-27 20:17:15 +01:00
Roman Lebedev
c2509ff480 [NFC][SimplifyCFG] Autogenerate check lines in many test files
These are potentially being affected by an upcoming patch.
2021-04-27 22:05:42 +03:00
David Green
5f03ae3ed7 [ARM] Recognize VIDUP from BUILDVECTORs of additions
This adds a pattern to recognize VIDUP from BUILD_VECTOR of incrementing
adds. This can come up from either geps or adds, and came up recently in
D100550. We are just looking for a BUILD_VECTOR where each lane is an
add of the first lane with N*i, where i is the lane and N is one of 1,
2, 4, or 8, supported by the VIDUP instruction.

Differential Revision: https://reviews.llvm.org/D101263
2021-04-27 19:33:24 +01:00
David Green
529c8f9a7d [ARM] Additional VIDUP tests. NFC 2021-04-27 19:33:24 +01:00
Alexey Bataev
eadc8ecdd1 [COST][X86]Improve cost model for reverse shuffle v32i16/v64i8 in AVX512F.
Improved cost model for reverse shuffle on AVX512F for types
v32i16/v64i8.

Differential Revision: https://reviews.llvm.org/D100974
2021-04-27 11:14:21 -07:00
Roman Lebedev
b865b8ef48 [NFC][Verifier] Fixup token PHINode test cases
It would still pass in non-assert build,
but with asserts it would now crash.

I haven't checked, but hopefully `not`'s `--crash` argument
should be enough to support both paths.
2021-04-27 21:09:43 +03:00
Ahmed Bougacha
4a04d21325 [docs] Replace Apple representative to security group.
Differential Revision: https://reviews.llvm.org/D100864
2021-04-27 11:00:49 -07:00
Roman Lebedev
0c820c996f [NFC][IR] PHINode: ... and assert in another ctor too 2021-04-27 20:52:44 +03:00
Roman Lebedev
90e0e1e078 [NFC][IR] PHINode: assert we aren't trying to create token-typed PHI
Verifier will complain, but by then it may be too late,
because we might have never reached it because
we already crashed with some bogus bug.
It is best to catch this the moment it happens.
2021-04-27 20:49:42 +03:00
Anirudh Prasad
bf78623d49 [SystemZ][z/OS] Remove register prefixes when printing out the register.
- This patch is the first part in enforcing prefix-less registers for the HLASM dialect in z/OS
- This patch removes the "%[r|f|v]" prefix while printing registers
- To achieve this, the `AssemblerDialect` field of MAI was used
- There is also a bit of refactoring done to ensure code repetition is reduced.
- Currently the LLVM assembler for SystemZ/z/OS accepts both prefixed registers and prefix-less registers. A subsequent follow-up patch will restrict the SystemZAsmParser to only accept prefix-less registers.

Crediting @kianm as an author as well.

Reviewed By: uweigand, abhina.sreeskantharajan

Differential Revision: https://reviews.llvm.org/D101308
2021-04-27 13:47:32 -04:00
Craig Topper
6cf780d2b7 [TableGen] Add predicate checks to isel patterns for default HwMode.
As discussed in D100691 and based on D100889.

I removed the ModeChecks cache which provides little value. Reduced
from three loops to two. Used ArrayRef to pass the Predicate to
AppendPattern to avoid needing to construct a vector for single
mode. Used SmallVector to avoid heap allocation constructing
DefaultCheck for the in tree targets the use it.

Reviewed By: kparzysz

Differential Revision: https://reviews.llvm.org/D101240
2021-04-27 10:46:51 -07:00
Nick Desaulniers
01023814c6 [CodeGenOptions] make StackProtectorGuardOffset signed
GCC supports negative values for -mstack-protector-guard-offset=, this
should be a signed value. Pre-req to D100919.

Reviewed By: MaskRay

Differential Revision: https://reviews.llvm.org/D101325
2021-04-27 10:12:58 -07:00
LLVM GN Syncbot
869b0c1aeb [gn build] Port 241c2da4064c 2021-04-27 16:56:33 +00:00
Victor Huang
cca8b9f1d5 [AIX][Power10] Restrict prefixed instructions from crossing the 64byte boundary
This patch adds the support to restrict prefixed instruction from
crossing the 64 byte boundary:
- Add the infrastructure to register a custom XCOFF streamer
- Add a custom XCOFF streamer for PowerPC to allow us to
  intercept instructions as they are being emitted and align all 8 byte
  instructions to a 64 byte boundary if required by adding a 4 byte nop.

Reviewed By: stefanp

Differential Revision: https://reviews.llvm.org/D101107
2021-04-27 11:55:18 -05:00
Nico Weber
8951199ee3 [llvm, clang] Remove stdlib includes from .h files without std::
Found files not containing `std::` with:

    INCL="algorithm|array|list|map|memory|queue|set|string|utility|vector|unordered_map|unordered_set"
    git ls-files llvm/include/llvm | grep '\.h$' | xargs grep -L std:: | \
        xargs grep -El "#include <($INCL)>$" > to_process.txt
    git ls-files clang/include/clang | grep '\.h$' | xargs grep -L std:: | \
        xargs grep -El "#include <($INCL)>$" >> to_process.txt

Then removed these headers from those files with

    INCL_ESCAPED="$(echo $INCL|sed 's/|/\\|/g')"
    cat to_process.txt | xargs sed -i "/^#include <\($INCL_ESCAPED\)>$/d"
    cat to_process.txt | xargs sed -i '/^$/N;/^\n$/D'

No behavior change.

Differential Revision: https://reviews.llvm.org/D101378
2021-04-27 12:41:39 -04:00
Christian Kühnel
6d2a6bfee4 [doc] added documentation for pre-merge testing
fixes https://github.com/google/llvm-premerge-checks/issues/275

Differential Revision: https://reviews.llvm.org/D100936
2021-04-27 16:53:16 +02:00
David Sherwood
ab72b625ed Revert "[LoopVectorize] Simplify scalar cost calculation in getInstructionCost"
This reverts commit 4afeda9157cffd2daa83f8075d73f1e11ea34c81.
2021-04-27 15:46:03 +01:00
Simon Pilgrim
79c123d251 Revert rG9b7a0a50355d5 - Revert "[X86] Add support for reusing ZF etc. from locked XADD instructions (PR20841)"
Still causing some sanitizer buildbot failures.
2021-04-27 15:39:20 +01:00
David Sherwood
d25250f6ac [LoopVectorize] Simplify scalar cost calculation in getInstructionCost
This patch simplifies the calculation of certain costs in
getInstructionCost when isScalarAfterVectorization() returns a true value.
There are a few places where we multiply a cost by a number N, i.e.

  unsigned N = isScalarAfterVectorization(I, VF) ? VF.getKnownMinValue() : 1;
  return N * TTI.getArithmeticInstrCost(...

After some investigation it seems that there are only these cases that occur
in practice:

1. VF is a scalar, in which case N = 1.
2. VF is a vector. We can only get here if: a) the instruction is a
GEP/bitcast/PHI with scalar uses, or b) this is an update to an induction
variable that remains scalar.

I have changed the code so that N is assumed to always be 1. For GEPs
the cost is always 0, since this is calculated later on as part of the
load/store cost. PHI nodes are costed separately and were never previously
multiplied by VF. For all other cases I have added an assert that none of
the users needs scalarising, which didn't fire in any unit tests.

Only one test required fixing and I believe the original cost for the scalar
add instruction to have been wrong, since only one copy remains after
vectorisation.

I have also added a new test for the case when a pointer PHI feeds directly
into a store that will be scalarised as we were previously never testing it.

Differential Revision: https://reviews.llvm.org/D99718
2021-04-27 15:26:15 +01:00
Simon Pilgrim
7257ef16c9 [X86] Add support for reusing ZF etc. from locked XADD instructions (PR20841)
XADD has the same EFLAGS behaviour as ADD

Reapplies rG2149aa73f640 (after it was reverted at rG535df472b042) - AFAICT rG029e41ec9800 should ensure we correctly tag the LXADD* ops as load/stores - I haven't been able to repro the sanitizer buildbot fails locally so this is a speculative commit.
2021-04-27 15:01:13 +01:00
Jay Foad
39f247cd42 [AMDGPU] Minor refactoring in AMDGPUUnifyDivergentExitNodes. NFC.
Make unifyReturnBlockSet a member function so we don't have to pass TTI
around as an argument.
2021-04-27 14:21:51 +01:00
Simon Pilgrim
d5cacc9334 [X86] Ensure multiclass ATOMIC_RMW_BINOP is tagged as MayLoad and MayStore
These are RMW ops and should be tagged as both loads and stores.
2021-04-27 14:11:22 +01:00
Alexey Bataev
2b896fc0d5 [SLP]Improved isGatherShuffledEntry, NFC.
Reworked isGatherShuffledEntry function, simplified and moved
common code to the lambda (it shall go away when non-power-2 patch will
be landed).
2021-04-27 05:59:46 -07:00
Florian Hahn
0cb75ec25f [LV,LAA] Add test cases with pointer phis in loops.
Pre-commits tests for D101286.
2021-04-27 13:49:32 +01:00
Petar Avramovic
29902ee1b4 AMDGPU/GlobalISel: Fix negative offset folding for buffer_load
Buffer_load does unsigned offset calculations. Don't fold
operands of 32-bit add that are likely to cause unsigned add
overflow (common case is when one of the operands is negative).

Differential Revision: https://reviews.llvm.org/D91336
2021-04-27 14:45:22 +02:00
Petar Avramovic
b0f45068ce AMDGPU/GlobalISel: Add test for buffer_load with negative offset
Pre-commit test for D91336.
2021-04-27 14:45:21 +02:00
Florian Hahn
eed27336cf [LV] Hoist code to get vector loop latch (NFC).
Address suggestion from D99294.
2021-04-27 13:30:17 +01:00
Sanjay Patel
5d538c6cd1 [IndVars] avoid crash in LFTR when assuming an add recurrence
The test is a crasher reduced from:
https://llvm.org/PR49993

linearFunctionTestReplace() assumes that we have an add recurrence,
so check for that as a condition of matching a loop counter.

Differential Revision: https://reviews.llvm.org/D101291
2021-04-27 08:26:02 -04:00