mirror of https://github.com/RPCS3/llvm-mirror.git synced 2025-01-31 20:51:52 +01:00

16381 Commits

Author SHA1 Message Date
Nikita Popov
a0c7f8edc6 [MemCpyOpt] Add test scaffolding for MSSA based MemCpyOpt
This adds an -enable-memcpyopt-memoryssa option that currently does
nothing apart from requiring MSSA as a dependency. The tests are
split to run both with the option disabled and enabled. I went with
this rather than the separate directory DSE uses, as I found it
convenient to have a direct side-by-side comparison of differences.

Differential Revision: https://reviews.llvm.org/D89206
2020-10-13 21:45:05 +02:00
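A rough sketch of how such a split test can look (hypothetical file; the in-tree tests presumably use separate FileCheck prefixes per RUN line so MSSA-specific output can be checked on its own):

  ; RUN: opt -memcpyopt -S %s | FileCheck %s
  ; RUN: opt -memcpyopt -enable-memcpyopt-memoryssa -S %s | FileCheck %s

  declare void @llvm.memcpy.p0i8.p0i8.i64(i8*, i8*, i64, i1)

  ; CHECK-LABEL: @copy(
  define void @copy(i8* %d, i8* %s) {
    call void @llvm.memcpy.p0i8.p0i8.i64(i8* %d, i8* %s, i64 16, i1 false)
    ret void
  }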
Nikita Popov
0f9751aa58 [MemCpyOpt] Fix MemorySSA preservation
moveUp() moves instructions, so we should move the corresponding
memory accesses as well. We should also move the store instruction
itself: Even though we'll end up removing it later, this gives us
a correct MemoryDef to replace.

The implementation is somewhat more complicated than it should be,
because we also handle the case where P does not have a memory
access due to a degenerate AA pipeline. Hopefully, the need for this
will go away in the future, when the rest of the pass is based on
MSSA.

Differential Revision: https://reviews.llvm.org/D88778
2020-10-13 21:39:09 +02:00
Nikita Popov
5617405db4 [MemCpyOpt] Don't shorten memset if memcpy operands may be the same
If the memcpy operands are the same (which is allowed since D86815)
then the memcpy is effectively a no-op and the partially overlapping
memset is not dead.

Differential Revision: https://reviews.llvm.org/D89192
2020-10-13 21:19:19 +02:00
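For illustration, a hypothetical IR shape of the case being fixed; since the source and destination of the memcpy are identical, the copy is a no-op and the leading memset bytes stay live:

  declare void @llvm.memset.p0i8.i64(i8*, i8, i64, i1)
  declare void @llvm.memcpy.p0i8.p0i8.i64(i8*, i8*, i64, i1)

  define void @self_memcpy(i8* %p) {
    call void @llvm.memset.p0i8.i64(i8* %p, i8 0, i64 16, i1 false)
    ; dest == src, so this copies nothing; shortening the memset to the
    ; last 8 bytes would drop stores that are still observable.
    call void @llvm.memcpy.p0i8.p0i8.i64(i8* %p, i8* %p, i64 8, i1 false)
    ret void
  }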
Nikita Popov
a8bea0172a [MemCpyOpt] Don't shorten memset if destination observable through unwinding
MemCpyOpt can shorten a memset if it is later partially overwritten
by a memcpy. It checks that the destination is not read in between,
but we also need to make sure that the destination cannot be observed
via unwinding.

Differential Revision: https://reviews.llvm.org/D89190
2020-10-13 21:12:19 +02:00
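A hypothetical sketch of the hazard (function names made up): if the call between the memset and the memcpy unwinds, the caller can still observe %dst, so the memset must not be shortened to just the non-overwritten tail:

  declare void @may_throw()
  declare void @llvm.memset.p0i8.i64(i8*, i8, i64, i1)
  declare void @llvm.memcpy.p0i8.p0i8.i64(i8*, i8*, i64, i1)

  define void @no_shorten(i8* %dst, i8* %src) {
    call void @llvm.memset.p0i8.i64(i8* %dst, i8 0, i64 16, i1 false)
    call void @may_throw()  ; on unwind, %dst is visible to the caller
    call void @llvm.memcpy.p0i8.p0i8.i64(i8* %dst, i8* %src, i64 8, i1 false)
    ret void
  }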
Sjoerd Meijer
52915d0f00 [InstructionSimplify] Precommit tests for D89317. NFC. 2020-10-13 15:40:33 +01:00
Simon Pilgrim
ac2fbd0695 [InstCombine] visitTrunc - pass through undefs for trunc(shift(trunc/ext(x),c)) patterns
Based on the recent patches D88475 and D88429 where we are losing undef values due to extension/comparisons.

I've added a Constant::mergeUndefsWith method that merges the undef scalar/elements from another Constant into a specific Constant.

Differential Revision: https://reviews.llvm.org/D88687
2020-10-13 14:35:18 +01:00
Florian Hahn
afe8317d71 [ConstraintElimination] Add add/sub/and/or test cases.
This adds a new set of tests for upcoming constraint elimination
changes.
2020-10-13 13:18:32 +01:00
Simon Pilgrim
6df62d9e30 [InstCombine] Support uniform vector splats in ((((X >> C) & CC) + Y) << C) folds.
Add support for uniform vector splats (no undefs).
2020-10-13 09:28:39 +01:00
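For reference, a hypothetical uniform-splat instance of the ((((X >> C) & CC) + Y) << C) pattern that is now handled:

  define <2 x i32> @shift_mask_add_shift(<2 x i32> %x, <2 x i32> %y) {
    %s = lshr <2 x i32> %x, <i32 4, i32 4>     ; same shift amount in all lanes
    %m = and <2 x i32> %s, <i32 255, i32 255>  ; same mask in all lanes
    %a = add <2 x i32> %m, %y
    %r = shl <2 x i32> %a, <i32 4, i32 4>
    ret <2 x i32> %r
  }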
Max Kazantsev
5d36bfd596 [Test] Use generated auto-checks to make further changes more visible 2020-10-13 15:16:32 +07:00
Roman Lebedev
47d13f688a Reland "[SCEV] Model ptrtoint(SCEVUnknown) cast not as unknown, but as zext/trunc/self of SCEVUnknown"
This relands commit 1c021c64caef83cccb719c9bf0a2554faa6563af which was
reverted in commit 17cec6a11a12f815052d56a17ef738cf246a2d9a because
an assertion was being triggered, since `BuildConstantFromSCEV()`
wasn't updated to handle the case where the constant we want to truncate
is actually a pointer. I was unsuccessful in coming up with a test case
where we'd end up there with a constant zext/sext of a pointer,
so I didn't handle those cases there until there is a test case.

Original commit message:

While we indeed can't treat them as no-ops, I believe we can/should
do better than just modelling them as `unknown`. The `inttoptr` story
is complicated, but for `ptrtoint`, it seems straightforward
to model it just as a zext-or-trunc of unknown.

This may be important now that we are working towards
making inttoptr/ptrtoint casts not be no-ops,
and towards preventing them from being folded into loads/etc.
(see D88979/D88789/D88788)

Reviewed By: mkazantsev

Differential Revision: https://reviews.llvm.org/D88806
2020-10-12 23:02:55 +03:00
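As a small hypothetical example of what this modelling enables: a narrowing ptrtoint can now be described as a truncation of an unknown rather than as a plain unknown.

  define i32 @narrow_ptrtoint(i8* %p) {
    ; Assuming 64-bit pointers, SCEV can model %i as
    ; (trunc i64 (ptrtoint i8* %p) to i32) instead of a bare SCEVUnknown.
    %i = ptrtoint i8* %p to i32
    ret i32 %i
  }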
Arthur Eubanks
8e4adad70b [Inliner][NPM] Fix various tests under NPM
alloca-dbgdeclare-merge.ll:
alloca-merge-align.ll:
array_merge.ll:
 NPM inliner does not merge allocas

delete-call.ll:
 NPM inliner does not delete readonly calls

externally_available.ll:
 NPM inliner does not delete available_externally functions

inline-cold-callee.ll:
inline-hot-callee.ll:
 inline-hot-callee.ll has a comment saying it only applies to the legacy PM;
 I assume the same holds for inline-cold-callee.ll

devirtualize-2.ll:
inline-hot-callsite:
monster_scc.ll:
pr22285.ll:
 already has legacy and new PM RUN lines

inline-cold.ll:
 profile-summary required to see callee as cold

prof-update-sample.ll:
 profile-summary required to update branch_weights

Reviewed By: davidxl

Differential Revision: https://reviews.llvm.org/D89093
2020-10-12 10:41:40 -07:00
Hans Wennborg
bcad3ef754 Revert 1c021c64c "[SCEV] Model ptrtoint(SCEVUnknown) cast not as unknown, but as zext/trunc/self of SCEVUnknown"
> While we indeed can't treat them as no-ops, i believe we can/should
> do better than just modelling them as `unknown`. `inttoptr` story
> is complicated, but for `ptrtoint`, it seems straight-forward
> to model it just as a zext-or-trunc of unknown.
>
> This may be important now that we track towards
> making inttoptr/ptrtoint casts not no-op,
> and towards preventing folding them into loads/etc
> (see D88979/D88789/D88788)
>
> Reviewed By: mkazantsev
>
> Differential Revision: https://reviews.llvm.org/D88806

It caused the following assert during Chromium builds:

  llvm/lib/IR/Constants.cpp:1868:
  static llvm::Constant *llvm::ConstantExpr::getTrunc(llvm::Constant *, llvm::Type *, bool):
  Assertion `C->getType()->isIntOrIntVectorTy() && "Trunc operand must be integer"' failed.

See code review for a link to a reproducer.

This reverts commit 1c021c64caef83cccb719c9bf0a2554faa6563af.
2020-10-12 18:39:35 +02:00
Dávid Bolvanský
6ca1daa81f [Tests] Regenerate test checks; NFC 2020-10-12 17:55:00 +02:00
Simon Pilgrim
371ddf42dc [InstCombine] matchFunnelShift - fold or(shl(a,x),lshr(b,sub(bw,x))) -> fshl(a,b,x) iff x < bw (REAPPLIED)
If value tracking can confirm that a shift value is less than the type bitwidth then we can more confidently fold general or(shl(a,x),lshr(b,sub(bw,x))) patterns to a funnel/rotate intrinsic pattern without causing bad codegen regressions in the backend (see D89139).

Reapplied after the shift canonicalization in rG02295e6d1a15 which removed the need to flip the shift values.

Differential Revision: https://reviews.llvm.org/D88783
2020-10-12 16:06:41 +01:00
Simon Pilgrim
030fbbba7c Revert rGb97093e520036f8 - "[InstCombine] matchFunnelShift - fold or(shl(a,x),lshr(b,sub(bw,x))) -> fshl(a,b,x) iff x < bw"
This reverts commit b97093e520036f88c5b39e572966f1c8c387661e.

Funnel shift argument commutation isn't working correctly
2020-10-12 11:38:52 +01:00
Roman Lebedev
e731a4fcca [SCEV] Model ptrtoint(SCEVUnknown) cast not as unknown, but as zext/trunc/self of SCEVUnknown
While we indeed can't treat them as no-ops, I believe we can/should
do better than just modelling them as `unknown`. The `inttoptr` story
is complicated, but for `ptrtoint`, it seems straightforward
to model it just as a zext-or-trunc of unknown.

This may be important now that we are working towards
making inttoptr/ptrtoint casts not be no-ops,
and towards preventing them from being folded into loads/etc.
(see D88979/D88789/D88788)

Reviewed By: mkazantsev

Differential Revision: https://reviews.llvm.org/D88806
2020-10-12 11:04:03 +03:00
Roman Lebedev
794d4b6ead [InstCombine] combineLoadToOperationType(): don't fold int<->ptr cast into load
And another step towards transforms not introducing inttoptr and/or
ptrtoint casts that weren't there already.

As we've been establishing (see D88788/D88789), if there is a int<->ptr cast,
it basically must stay as-is, we can't do much with it.

I've looked, and as far as I can tell, the biggest source of new such
casts being introduced is this transform, which, ironically,
tries to reduce the number of casts.

On the vanilla llvm test-suite + RawSpeed, at `-O3`, this results in
33.58% fewer `IntToPtr`s (19014 -> 12629)
and 76.20% more `PtrToInt`s (18589 -> 32753),
which is an increase of 20.69% in total.

However, just on RawSpeed, where I know there are basically
no `IntToPtr`s in the original source code,
this results in 99.27% fewer `IntToPtr`s (2724 -> 20)
and 82.92% more `PtrToInt`s (4513 -> 8255),
which is again an increase of 14.34% in total.

To me this does seem like a step in the right direction:
we end up with strictly fewer `IntToPtr`s, but strictly more `PtrToInt`s,
which seems like a reasonable trade-off.

See https://reviews.llvm.org/D88860 / https://reviews.llvm.org/D88995
for some more discussion on the subject.

(Eventually, `CastInst::isNoopCast()`/`CastInst::isEliminableCastPair`
should be taught about this, yes)

Reviewed By: nlopes, nikic

Differential Revision: https://reviews.llvm.org/D88979
2020-10-11 20:24:28 +03:00
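A minimal hypothetical sketch of the affected pattern: InstCombine used to rewrite the integer load into a pointer-typed load to absorb the inttoptr; after this change the integer load is kept as-is.

  define i8* @load_int_then_cast(i64* %p) {
    %i = load i64, i64* %p
    %q = inttoptr i64 %i to i8*
    ret i8* %q
  }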
David Green
15eddfee76 [LV] Tail folded inloop reductions.
This expands upon the inloop reductions added in e9761688e41cb9e976,
allowing them to be inserted into tail-folded loops. Reductions are
generated in the form:

  x = select(mask, vecop, zero)
  v = vecreduce.add(x)
  c = add chain, v

Where zero here is chosen as the identity value for add reductions. The
backend is then expected to fold the select and the vecreduce into a
single predicated instruction.

Most of the code is fairly straightforward, except for the creation of
block masks, which need to be created in dominance order. The order in
which they are added is altered so they come after any phis, keeping to
the requirements of the underlying IR.

Differential Revision: https://reviews.llvm.org/D84451
2020-10-11 16:58:34 +01:00
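Spelled out as IR, one step of such a reduction could look like the hypothetical sketch below (the reduce intrinsic is shown under its current name; LLVM of this era used the llvm.experimental.vector.reduce.* prefix):

  declare i32 @llvm.vector.reduce.add.v4i32(<4 x i32>)

  define i32 @masked_reduce_step(<4 x i1> %mask, <4 x i32> %vecop, i32 %chain) {
    ; lanes disabled by the fold mask contribute the add identity (zero)
    %x = select <4 x i1> %mask, <4 x i32> %vecop, <4 x i32> zeroinitializer
    %v = call i32 @llvm.vector.reduce.add.v4i32(<4 x i32> %x)
    %c = add i32 %chain, %v
    ret i32 %c
  }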
Nikita Popov
992463d714 [MemCpyOpt] Add lifetime may alias test (NFC)
Test the case where a lifetime intrinsic may alias the memcpy
source. Other cases test must-alias or no-alias.
2020-10-11 17:08:28 +02:00
David Green
76941ccb01 [LV] Extra predicated inloop reduction tests. NFC 2020-10-11 15:06:21 +01:00
Nikita Popov
d785af869c [MemCpyOpt] Add additional byval tests (NFC)
Test read/write clobbers and the non-local case.
2020-10-11 15:22:31 +02:00
Sanjay Patel
f2e6883077 [InstCombine] allow vector splats for add+xor --> shifts 2020-10-11 09:04:24 -04:00
Sanjay Patel
372d448d9b [InstCombine] add one-use check to add+xor transform
As shown in the affected test, we could increase the instruction
count without this limitation. There's another test with an extra
use that shows we still convert directly to a real "sext" if
possible.
2020-10-11 09:04:24 -04:00
Sanjay Patel
846ed8e1d0 [InstCombine] add tests with extra uses for add+xor transform; NFC 2020-10-11 09:04:24 -04:00
Sanjay Patel
cc5fc0ded3 [InstCombine] add/adjust tests for add+xor -> shifts; NFC 2020-10-11 09:04:24 -04:00
Simon Pilgrim
9d1902bc07 [InstCombine] matchFunnelShift - fold or(shl(a,x),lshr(b,sub(bw,x))) -> fshl(a,b,x) iff x < bw
If value tracking can confirm that a shift value is less than the type bitwidth then we can more confidently fold general or(shl(a,x),lshr(b,sub(bw,x))) patterns to a funnel/rotate intrinsic pattern without causing bad codegen regressions in the backend (see D89139).

Differential Revision: https://reviews.llvm.org/D88783
2020-10-11 10:37:20 +01:00
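As a hedged illustration (not a test from the patch), masking the shift amount is one way value tracking can prove x < bw; whether a given instance actually becomes @llvm.fshl depends on that analysis:

  define i32 @or_shl_lshr_sub(i32 %a, i32 %b, i32 %c) {
    %x   = and i32 %c, 31       ; known to be less than the bit width (32)
    %shl = shl i32 %a, %x
    %sub = sub i32 32, %x
    %shr = lshr i32 %b, %sub
    %or  = or i32 %shl, %shr
    ret i32 %or
  }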
Xun Li
bcb2d04b33 [Coroutines] Refactor/Rewrite Spill and Alloca processing
This patch is a refactoring of how we process spills and allocas during CoroSplit.
In the previous implementation, everything that needs to go to the heap is put into Spills, including all the values defined by allocas.
And the way to identify a Spill is to check whether there exists a use-def relationship that crosses suspension points.

This approach is fundamentally confusing and, unfortunately, incorrect.
First of all, allocas are always processed differently from spills, so it is quite confusing to put them together; it is much cleaner to separate them and process them independently.
Doing so simplifies lots of code and makes the logic clearer and easier to reason about.

Secondly, a use-def relationship is insufficient to decide whether a value defined by an AllocaInst needs to go to the heap.
There are many cases where a value defined by an AllocaInst can implicitly be used across suspension points without a direct use-def relationship.
For example, you can store the address of an alloca into the heap and load that address after suspension, or you can escape the address into an object through a function call.
Or you can have a PHINode that takes two allocas and is used across a suspension point (when this happens, the existing implementation will spill the PHINode, i.e. a stack address, to the heap!).
All these issues suggest that we need to separate spills and allocas in order to implement this properly.
This patch does not yet fix these bugs; however, it sets up the code in a better shape so that we can start fixing them in the next patch.

The core idea of this patch is to add a new struct called FrameDataInfo, which contains all Spills, all Allocas, and a map from each definition to its layout index in the frame (FieldIndexMap).
Spills and Allocas are identified, stored, and processed independently. When they are initially added to the frame, we record their field index through FieldIndexMap. When the frame layout is finalized, we update each index to its final layout index.

In doing so, I also cleaned up a few things and also discovered a few other bugs.

Cleanups:
1. Found that PromiseFieldId is not used; deleted it.
2. Previously, SpillInfo was a vector, which is strange because every def can have multiple users. This patch cleans it up by turning it into a map from def to users.
3. Previously, a frame Field struct contained a list of the Spills that the field corresponds to. This isn't necessary, since we only need the layout index for each given definition. This patch removes that list; instead, we connect each field and definition using the FieldIndexMap.
4. All the loops that process Spills are simplified now because we use a map instead of a vector.

Bugs:
It seems that we are only keeping llvm.dbg.declare intrinsics in the .resume part of the function. The ramp function will no longer have them. This means we are dropping some debug information in the ramp function.

The next step is to start fixing the bugs where the implementation fails to identify some allocas that should live on the frame.

Differential Revision: https://reviews.llvm.org/D88872
2020-10-10 22:21:34 -07:00
Simon Pilgrim
473789ee5d Remove %tmp variables from test cases to appease update_test_checks.py 2020-10-10 19:13:16 +01:00
Simon Pilgrim
724010c5dd [InstCombine] Add test case showing rotate intrinsic being split by SimplifyDemandedBits
Noticed while triaging regression report on D88834
2020-10-10 18:19:42 +01:00
Nikita Popov
6d9e2cd7e0 [MemCpyOpt] Add test for incorrect memset DSE (NFC)
We can't shorten the memset if there's a throwing call in between
and the destination is non-local.
2020-10-10 16:11:14 +02:00
David Green
d5dc3dc1f4 [AArch64][LV] Move vectorizer test to Transforms/LoopVectorize/AArch64. NFC 2020-10-10 10:15:43 +01:00
Nikita Popov
12c9ccbc9d [MemCpyOpt] Don't hoist store that's not guaranteed to execute
MemCpyOpt can hoist stores while combining load+store pairs into memcpy.
This hoisting can currently result in stores being executed that
weren't guaranteed to execute in the original program.

Differential Revision: https://reviews.llvm.org/D89154
2020-10-10 10:26:28 +02:00
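A minimal hypothetical example of the hazard: forming a memcpy from this load/store pair would effectively hoist the store above the intervening call, executing it on paths where it never ran before.

  declare void @may_throw()

  define void @copy_one(i32* noalias %dst, i32* noalias %src) {
    %v = load i32, i32* %src
    call void @may_throw()   ; the store below is not guaranteed to execute
    store i32 %v, i32* %dst
    ret void
  }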
Changpeng Fang
7c619d4ea7 Sink: Handle instruction sink when a user is dead
Summary:
  The current instruction sink pass uses findNearestCommonDominator of all users to find the block to sink the instruction to.
However, a user may be in a dead block, which will result in unexpected behavior.

This patch handles such cases by skipping dead blocks. This patch fixes:
https://bugs.llvm.org/show_bug.cgi?id=47415

Reviewers:
  MaskRay, arsenm

Differential Revision:
  https://reviews.llvm.org/D89166
2020-10-09 16:20:26 -07:00
Eli Friedman
8b737bdfec [SCCP] Reduce the number of times ResolvedUndefsIn is called for large modules.
If a module has many values that need to be resolved by
ResolvedUndefsIn, compilation takes quadratic time overall. Solve should
do a small amount of work, since not much is added to the worklists each
time markOverdefined is called. But ResolvedUndefsIn is linear over the
length of the function/module, so resolving one undef at a time is
quadratic in general.

To solve this, make ResolvedUndefsIn resolve every undef value at once,
instead of resolving them one at a time. This loses a little
optimization power, but can be a lot faster.

We still need a loop around ResolvedUndefsIn because markOverdefined
could change the set of blocks that are live. That should be uncommon,
hopefully. We could optimize it by tracking which blocks transition from
dead to live, instead of iterating over the whole module to find them.
But I'll leave that for later. (The whole function will become a lot
simpler once we start pruning branches on undef.)

The regression test changes seem minor. The specific cases in question
could probably be optimized with a bit more work, but they seem like
edge cases that don't really matter.

Fixes an "infinite" compile issue my team found on an internal workload.

Differential Revision: https://reviews.llvm.org/D89080
2020-10-09 15:24:16 -07:00
Arthur Eubanks
b450556f70 [Reg2Mem][NewPM] Pin test to legacy PM
This pass hasn't been touched in a long time and isn't used in tree.
2020-10-09 12:36:08 -07:00
Nikita Popov
b9a4cf95b0 [MemCpyOpt] Add test for incorrectly hoisted store (NFC) 2020-10-09 20:52:08 +02:00
Giorgis Georgakoudis
5c60ef51ef [OpenMPOpt] Merge parallel regions
There are cases where generated OpenMP code consists of multiple,
consecutive OpenMP parallel regions, either due to high-level
programming models, such as RAJA or Kokkos, lowering to OpenMP code, or
simply because the programmer parallelized the code this way.  This
optimization merges consecutive parallel OpenMP regions to: (1) reduce
the runtime overhead of re-activating a team of threads; (2) enlarge the
scope for other OpenMP optimizations, e.g., runtime call deduplication
and synchronization elimination.

This implementation defensively merges parallel regions, only when they
are within the same BB and any in-between instructions are safe to
execute in parallel.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D83635
2020-10-09 09:59:04 -07:00
Arthur Eubanks
a70ba1098b [FixIrreducible][NewPM] Port -fix-irreducible to NPM
In the NPM, a pass cannot depend on another non-analysis pass. So pin
the test that checks that -lowerswitch is run automatically to the legacy PM.

Reviewed By: sameerds

Differential Revision: https://reviews.llvm.org/D89051
2020-10-09 09:22:09 -07:00
Arthur Eubanks
a2aedea790 [LoopInterchange][NewPM] Port -loop-interchange to NPM
Reviewed By: fhahn

Differential Revision: https://reviews.llvm.org/D89058
2020-10-09 09:21:31 -07:00
Simon Pilgrim
99440aa5b8 [InstCombine] Support lshr(trunc(lshr(x,c1)), c2) -> trunc(lshr(lshr(x,c1),c2)) uniform vector tests
FoldShiftByConstant is currently hardcoded for scalar/uniform outer shift amounts, so that needs to be fixed first to support non-uniform cases
2020-10-09 16:54:46 +01:00
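For reference, the scalar shape of the pattern these tests cover (hypothetical example); the candidate fold performs both shifts in the wide type and truncates once:

  define i8 @lshr_trunc_lshr(i32 %x) {
    %s1 = lshr i32 %x, 8
    %t  = trunc i32 %s1 to i8
    %s2 = lshr i8 %t, 2
    ret i8 %s2
  }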
Simon Pilgrim
9735e12152 [InstCombine] Add lshr(trunc(lshr(x,c1)), c2) -> trunc(lshr(lshr(x,c1),c2)) vector tests 2020-10-09 16:54:46 +01:00
Simon Pilgrim
23e4436662 [InstCombine] commonShiftTransforms - add support for pow2 nonuniform constant vectors in srem fold
Note: we already fold srem to undef if any denominator vector element is undef.
2020-10-09 15:59:33 +01:00
Sanjay Patel
45b43d4e29 [InstCombine] allow vector splats for add+and with high-mask
There might be a better way to specify the pre-conditions,
but this is hopefully clearer than the way it was written:
https://rise4fun.com/Alive/Jhk3

  Pre: C2 < 0 && isShiftedMask(C2) && (C1 == C1 & C2)
  %a = and %x, C2
  %r = add %a, C1
  =>
  %a2 = add %x, C1
  %r = and %a2, C2
2020-10-09 10:39:11 -04:00
Simon Pilgrim
f2ba86985b [InstCombine] Add tests for X shift (A srem B) -> X shift (A and B-1) pow2 nonuniform constant vectors 2020-10-09 15:33:06 +01:00
Florian Hahn
9cb6002d04 [SCEV] Do not apply info from loop guards in AddRecs.
We cannot guarantee that the replacement expression is loop-invariant in
all AddRecs in the source expression. Use a rewriter that skips
AddRecExpr for now.

Fixes PR47776.
2020-10-09 14:47:26 +01:00
Simon Pilgrim
af14bbf8c2 [InstCombine] foldShiftOfShiftedLogic - add support for nonuniform constant vectors 2020-10-09 14:25:12 +01:00
Simon Pilgrim
6d17aca7ab [InstCombine] Add nonuniform/undef vector tests for shift(binop(shift(x,c1),y),c2) patterns 2020-10-09 13:42:11 +01:00
Simon Pilgrim
8f93933775 [InstCombine] visitTrunc - trunc(shl(X, C)) --> shl(trunc(X),trunc(C)) vector support
Annoyingly vectors aren't supported by shouldChangeType(), but we have precedents for always performing this on vector types (e.g. narrowBinOp).

Differential Revision: https://reviews.llvm.org/D89067
2020-10-08 22:07:51 +01:00
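A small hypothetical example of the vector case this enables; the shift amount (3) fits in the narrow i16 type, so the shift can be done after the truncation:

  define <2 x i16> @trunc_shl_vec(<2 x i32> %x) {
    %s = shl <2 x i32> %x, <i32 3, i32 3>
    %t = trunc <2 x i32> %s to <2 x i16>
    ; -> %t0 = trunc <2 x i32> %x to <2 x i16>; %r = shl <2 x i16> %t0, <i16 3, i16 3>
    ret <2 x i16> %t
  }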
Simon Pilgrim
a0c0bac22a [InstCombine] Add additional trunc(shl(x,c)) -> shl(trunc(x),trunc(c)) vector tests 2020-10-08 21:11:48 +01:00
Sanjay Patel
43799dbe92 [InstCombine] allow vector splats for add+xor with low-mask
This can be allowed with undef elements too, but that can be another step:
https://alive2.llvm.org/ce/z/hnC4Z-
2020-10-08 15:53:38 -04:00