mirror of https://github.com/RPCS3/llvm-mirror.git synced 2025-02-01 05:01:59 +01:00

16365 Commits

Author SHA1 Message Date
Roman Lebedev
794d4b6ead [InstCombine] combineLoadToOperationType(): don't fold int<->ptr cast into load
And another step towards transforms not introducing inttoptr and/or
ptrtoint casts that weren't there already.

As we've been establishing (see D88788/D88789), if there is an int<->ptr cast,
it basically must stay as-is; we can't do much with it.

I've looked, and as far as I can tell, the biggest source of such new casts
being introduced is this transform, which, ironically,
tries to reduce the number of casts.
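
A minimal sketch (with illustrative types) of the kind of load/cast pair this
fold used to rewrite by changing the load's type; after this change the cast
is left in place:

  ; before: a load whose only use is an int<->ptr cast
  %v = load i8*, i8** %p
  %i = ptrtoint i8* %v to i64
  ; the old fold changed the load type to eliminate the cast:
  %p.cast = bitcast i8** %p to i64*
  %i = load i64, i64* %p.cast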

On vanilla llvm test-suite + RawSpeed, @ `-O3`, this results in
-33.58% fewer `IntToPtr`s (19014 -> 12629)
and +76.20% more `PtrToInt`s (18589 -> 32753),
which is an increase of +20.69% in total.

However, just on RawSpeed, where I know there are basically
no `IntToPtr`s in the original source code,
this results in -99.27% fewer `IntToPtr`s (2724 -> 20)
and +82.92% more `PtrToInt`s (4513 -> 8255),
which is again an increase of 14.34% in total.

To me this does seem like a step in the right direction:
we end up with strictly fewer `IntToPtr`s, but strictly more `PtrToInt`s,
which seems like a reasonable trade-off.

See https://reviews.llvm.org/D88860 / https://reviews.llvm.org/D88995
for some more discussion on the subject.

(Eventually, `CastInst::isNoopCast()`/`CastInst::isEliminableCastPair`
should be taught about this, yes)

Reviewed By: nlopes, nikic

Differential Revision: https://reviews.llvm.org/D88979
2020-10-11 20:24:28 +03:00
David Green
15eddfee76 [LV] Tail folded inloop reductions.
This expands upon the inloop reductions added in e9761688e41cb9e976,
allowing them to be inserted into tail-folded loops. Reductions are
generated in the form:

  x = select(mask, vecop, zero)
  v = vecreduce.add(x)
  c = add chain, v

Where zero here is chosen as the identity value for add reductions. The
backend is then expected to fold the select and the vecreduce into a
single predicated instruction.
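
A concrete instance might look like this (intrinsic mangling and value names
are illustrative):

  %masked = select <4 x i1> %mask, <4 x i32> %vec.op, <4 x i32> zeroinitializer
  %red = call i32 @llvm.vector.reduce.add.v4i32(<4 x i32> %masked)
  %acc.next = add i32 %acc, %red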

Most of the code is fairly straightforward, except for the creation of
block masks, which need to be created in dominance order. The
order in which they are added is altered to be after any phis, keeping to the
requirements of the underlying IR.

Differential Revision: https://reviews.llvm.org/D84451
2020-10-11 16:58:34 +01:00
Nikita Popov
992463d714 [MemCpyOpt] Add lifetime may alias test (NFC)
Test the case where a lifetime intrinsic may alias the memcpy
source. Other cases test must-alias or no-alias.
2020-10-11 17:08:28 +02:00
David Green
76941ccb01 [LV] Extra predicated inloop reduction tests. NFC 2020-10-11 15:06:21 +01:00
Nikita Popov
d785af869c [MemCpyOpt] Add additional byval tests (NFC)
Test read/write clobbers and the non-local case.
2020-10-11 15:22:31 +02:00
Sanjay Patel
f2e6883077 [InstCombine] allow vector splats for add+xor --> shifts 2020-10-11 09:04:24 -04:00
Sanjay Patel
372d448d9b [InstCombine] add one-use check to add+xor transform
As shown in the affected test, we could increase instruction
count without this limitation. There's another test with extra
use that shows we still convert directly to a real "sext" if
possible.
2020-10-11 09:04:24 -04:00
Sanjay Patel
846ed8e1d0 [InstCombine] add tests with extra uses for add+xor transform; NFC 2020-10-11 09:04:24 -04:00
Sanjay Patel
cc5fc0ded3 [InstCombine] add/adjust tests for add+xor -> shifts; NFC 2020-10-11 09:04:24 -04:00
Simon Pilgrim
9d1902bc07 [InstCombine] matchFunnelShift - fold or(shl(a,x),lshr(b,sub(bw,x))) -> fshl(a,b,x) iff x < bw
If value tracking can confirm that a shift value is less than the type bitwidth then we can more confidently fold general or(shl(a,x),lshr(b,sub(bw,x))) patterns to a funnel/rotate intrinsic pattern without causing bad codegen regressions in the backend (see D89139).
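
An i32 sketch of the pattern, assuming value tracking has already shown %x < 32:

  %hi = shl i32 %a, %x
  %amt = sub i32 32, %x
  %lo = lshr i32 %b, %amt
  %or = or i32 %hi, %lo
  ; folds to:
  %or = call i32 @llvm.fshl.i32(i32 %a, i32 %b, i32 %x)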

Differential Revision: https://reviews.llvm.org/D88783
2020-10-11 10:37:20 +01:00
Xun Li
bcb2d04b33 [Coroutines] Refactor/Rewrite Spill and Alloca processing
This patch is a refactoring of how we process spills and allocas during CoroSplit.
In the previous implementation, everything that needs to go to the heap is put into Spills, including all the values defined by allocas.
The way to identify a Spill is to check whether there exists a use-def relationship that crosses suspension points.

This approach is fundamentally confusing, and unfortunately, incorrect.
First of all, allocas are always processed differently than spills, hence it's quite confusing to put them together. It is much cleaner to separate them and process them separately.
Doing so simplifies lots of code and makes the logic clearer and easier to reason about.

Secondly, a use-def relationship is insufficient to decide whether a value defined by an AllocaInst needs to go to the heap.
There are many cases where a value defined by an AllocaInst can implicitly be used across suspension points without a direct use-def relationship.
For example, you can store the address of an alloca into the heap and load that address after suspension. Or you can escape the address into an object through a function call.
Or you can have a PHINode that takes two allocas, and this PHINode is used across a suspension point (when this happens, the existing implementation will spill the PHINode, i.e. a stack address, to the heap!).
All these issues suggest that we need to separate spills and allocas in order to properly implement this.
This patch does not yet fix these bugs, however it sets up the code in a better shape so that we can start fixing them in the next patch.
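
A hypothetical sketch of the PHINode-of-allocas case mentioned above (names
are illustrative):

  %a = alloca i32
  %b = alloca i32
  ; ... control flow picks one of the two allocas ...
  %p = phi i32* [ %a, %then ], [ %b, %else ]
  %sp = call i8 @llvm.coro.suspend(token %save, i1 false)
  store i32 0, i32* %p    ; a stack address used across the suspension point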

The core idea of this patch is to add a new struct called FrameDataInfo, which contains all Spills, all Allocas, and a map from each definition to its layout index in the frame (FieldIndexMap).
Spills and Allocas are identified, stored and processed independently. When they are initially added to the frame, we record their field index through FieldIndexMap. When the frame layout is finalized, we update each index into their final layout index.

In doing so, I also cleaned up a few things and discovered a few other bugs.

Cleanups:
1. Found out that PromiseFieldId is not used; deleted it.
2. Previously, SpillInfo was a vector, which was strange because every def can have multiple users. This patch cleans it up by turning it into a map from def to users.
3. Previously, a frame Field struct contained a list of the Spills that the field corresponds to. This isn't necessary, since we only need the layout index for each given definition. This patch removes that list. Instead, we connect each field and definition using the FieldIndexMap.
4. All the loops that process Spills are simplified now because we use a map instead of a vector.

Bugs:
It seems that we are only keeping llvm.dbg.declare intrinsics in the .resume part of the function. The ramp function no longer has them. This means we are dropping some debug information in the ramp function.

The next step is to start fixing the bugs where the implementation fails to identify some allocas that should live on the frame.

Differential Revision: https://reviews.llvm.org/D88872
2020-10-10 22:21:34 -07:00
Simon Pilgrim
473789ee5d Remove %tmp variables from test cases to appease update_test_checks.py 2020-10-10 19:13:16 +01:00
Simon Pilgrim
724010c5dd [InstCombine] Add test case showing rotate intrinsic being split by SimplifyDemandedBits
Noticed while triaging a regression report on D88834
2020-10-10 18:19:42 +01:00
Nikita Popov
6d9e2cd7e0 [MemCpyOpt] Add test for incorrect memset DSE (NFC)
We can't shorten the memset if there's a throwing call in between
and the destination is non-local.
2020-10-10 16:11:14 +02:00
David Green
d5dc3dc1f4 [AArch64][LV] Move vectorizer test to Transforms/LoopVectorize/AArch64. NFC 2020-10-10 10:15:43 +01:00
Nikita Popov
12c9ccbc9d [MemCpyOpt] Don't hoist store that's not guaranteed to execute
MemCpyOpt can hoist stores while transforming load+store pairs into memcpy.
This hoisting can currently result in stores being executed that
weren't guaranteed to execute in the original program.
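
Schematically (with a hypothetical @may_throw; the store must not be hoisted
above the call):

  %v = load i32, i32* %src
  call void @may_throw()       ; may unwind, so the store below might never run
  store i32 %v, i32* %dst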

Differential Revision: https://reviews.llvm.org/D89154
2020-10-10 10:26:28 +02:00
Changpeng Fang
7c619d4ea7 Sink: Handle instruction sink when a user is dead
Summary:
  The current instruction sink pass uses findNearestCommonDominator of all users to find the block to sink the instruction to.
However, a user may be in a dead block, which will result in unexpected behavior.

This patch handles such cases by skipping dead blocks. This patch fixes:
https://bugs.llvm.org/show_bug.cgi?id=47415
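
A hypothetical shape of the problem (the user in %dead must be skipped when
picking the sink target):

  entry:
    %add = add i32 %a, %b      ; sink candidate
    br label %live
  live:
    %u1 = mul i32 %add, 3      ; reachable user
    ret i32 %u1
  dead:                        ; block with no predecessors
    %u2 = mul i32 %add, 5      ; user in a dead block
    ret i32 %u2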

Reviewers:
  MaskRay, arsenm

Differential Revision:
  https://reviews.llvm.org/D89166
2020-10-09 16:20:26 -07:00
Eli Friedman
8b737bdfec [SCCP] Reduce the number of times ResolvedUndefsIn is called for large modules.
If a module has many values that need to be resolved by
ResolvedUndefsIn, compilation takes quadratic time overall. Solve should
do a small amount of work, since not much is added to the worklists each
time markOverdefined is called. But ResolvedUndefsIn is linear over the
length of the function/module, so resolving one undef at a time is
quadratic in general.

To solve this, make ResolvedUndefsIn resolve every undef value at once,
instead of resolving them one at a time. This loses a little
optimization power, but can be a lot faster.

We still need a loop around ResolvedUndefsIn because markOverdefined
could change the set of blocks that are live. That should be uncommon,
hopefully. We could optimize it by tracking which blocks transition from
dead to live, instead of iterating over the whole module to find them.
But I'll leave that for later. (The whole function will become a lot
simpler once we start pruning branches on undef.)

The regression test changes seem minor. The specific cases in question
could probably be optimized with a bit more work, but they seem like
edge cases that don't really matter.

Fixes an "infinite" compile issue my team found on an internal workoad.

Differential Revision: https://reviews.llvm.org/D89080
2020-10-09 15:24:16 -07:00
Arthur Eubanks
b450556f70 [Reg2Mem][NewPM] Pin test to legacy PM
This pass hasn't been touched in a long time and isn't used in tree.
2020-10-09 12:36:08 -07:00
Nikita Popov
b9a4cf95b0 [MemCpyOpt] Add test for incorrectly hoisted store (NFC) 2020-10-09 20:52:08 +02:00
Giorgis Georgakoudis
5c60ef51ef [OpenMPOpt] Merge parallel regions
There are cases where generated OpenMP code consists of multiple,
consecutive OpenMP parallel regions, either due to high-level
programming models, such as RAJA, Kokkos, lowering to OpenMP code, or
simply because the programmer parallelized code this way.  This
optimization merges consecutive parallel OpenMP regions to: (1) reduce
the runtime overhead of re-activating a team of threads; (2) enlarge the
scope for other OpenMP optimizations, e.g., runtime call deduplication
and synchronization elimination.

This implementation defensively merges parallel regions only when they
are within the same BB and any in-between instructions are safe to
execute in parallel.
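
Schematically, at the IR level (the outlined function names and the @loc
ident are illustrative):

  ; two consecutive parallel regions lowered into the same basic block:
  call void (%struct.ident_t*, i32, void (i32*, i32*, ...)*, ...) @__kmpc_fork_call(%struct.ident_t* @loc, i32 0, void (i32*, i32*, ...)* @.omp_outlined.1)
  call void (%struct.ident_t*, i32, void (i32*, i32*, ...)*, ...) @__kmpc_fork_call(%struct.ident_t* @loc, i32 0, void (i32*, i32*, ...)* @.omp_outlined.2)
  ; after merging there is a single fork whose outlined body runs both regions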

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D83635
2020-10-09 09:59:04 -07:00
Arthur Eubanks
a70ba1098b [FixIrreducible][NewPM] Port -fix-irreducible to NPM
In the NPM, a pass cannot depend on another non-analysis pass. So pin
the test that checks that -lowerswitch is run automatically to the legacy PM.

Reviewed By: sameerds

Differential Revision: https://reviews.llvm.org/D89051
2020-10-09 09:22:09 -07:00
Arthur Eubanks
a2aedea790 [LoopInterchange][NewPM] Port -loop-interchange to NPM
Reviewed By: fhahn

Differential Revision: https://reviews.llvm.org/D89058
2020-10-09 09:21:31 -07:00
Simon Pilgrim
99440aa5b8 [InstCombine] Support lshr(trunc(lshr(x,c1)), c2) -> trunc(lshr(lshr(x,c1),c2)) uniform vector tests
FoldShiftByConstant is hardcoded for scalar/uniform outer shift amounts atm so that needs to be fixed first to support non-uniform cases
2020-10-09 16:54:46 +01:00
Simon Pilgrim
9735e12152 [InstCombine] Add lshr(trunc(lshr(x,c1)), c2) -> trunc(lshr(lshr(x,c1),c2)) vector tests 2020-10-09 16:54:46 +01:00
Simon Pilgrim
23e4436662 [InstCombine] commonShiftTransforms - add support for pow2 nonuniform constant vectors in srem fold
Note: we already fold srem to undef if any denominator vector element is undef.
2020-10-09 15:59:33 +01:00
Sanjay Patel
45b43d4e29 [InstCombine] allow vector splats for add+and with high-mask
There might be a better way to specify the pre-conditions,
but this is hopefully clearer than the way it was written:
https://rise4fun.com/Alive/Jhk3

  Pre: C2 < 0 && isShiftedMask(C2) && (C1 == C1 & C2)
  %a = and %x, C2
  %r = add %a, C1
  =>
  %a2 = add %x, C1
  %r = and %a2, C2
2020-10-09 10:39:11 -04:00
Simon Pilgrim
f2ba86985b [InstCombine] Add tests for X shift (A srem B) -> X shift (A and B-1) pow2 nonuniform constant vectors 2020-10-09 15:33:06 +01:00
Florian Hahn
9cb6002d04 [SCEV] Do not apply info from loop guards in AddRecs.
We cannot guarantee that the replacement expression is loop-invariant in
all AddRecs in the source expression. Use a rewriter that skips
AddRecExpr for now.

Fixes PR47776.
2020-10-09 14:47:26 +01:00
Simon Pilgrim
af14bbf8c2 [InstCombine] foldShiftOfShiftedLogic - add support for nonuniform constant vectors 2020-10-09 14:25:12 +01:00
Simon Pilgrim
6d17aca7ab [InstCombine] Add nonuniform/undef vector tests for shift(binop(shift(x,c1),y),c2) patterns 2020-10-09 13:42:11 +01:00
Simon Pilgrim
8f93933775 [InstCombine] visitTrunc - trunc(shl(X, C)) --> shl(trunc(X),trunc(C)) vector support
Annoyingly, vectors aren't supported by shouldChangeType(), but we have precedent for always performing this on vector types (e.g. narrowBinOp).
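
One possible vector instance (shift amount below the truncated width):

  %s = shl <2 x i32> %x, <i32 5, i32 5>
  %t = trunc <2 x i32> %s to <2 x i16>
  ; ->
  %x.tr = trunc <2 x i32> %x to <2 x i16>
  %t = shl <2 x i16> %x.tr, <i16 5, i16 5>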

Differential Revision: https://reviews.llvm.org/D89067
2020-10-08 22:07:51 +01:00
Simon Pilgrim
a0c0bac22a [InstCombine] Add additional trunc(shl(x,c)) -> shl(trunc(x),trunc(c)) vector tests 2020-10-08 21:11:48 +01:00
Sanjay Patel
43799dbe92 [InstCombine] allow vector splats for add+xor with low-mask
This can be allowed with undef elements too, but that can be another step:
https://alive2.llvm.org/ce/z/hnC4Z-
2020-10-08 15:53:38 -04:00
Sanjay Patel
75c88e4411 [InstCombine] remove unnecessary one-use check from add-xor transform
Pre-conditions seem to be optimal, but we don't need a use check
because we are only replacing an add with a sub.

https://rise4fun.com/Alive/hzN

  Pre: (~C1 | C2 == -1) && isPowerOf2(C2+1)
  %m = and i8 %x, C1
  %f = xor i8 %m, C2
  %r = add i8 %f, C3
  =>
  %r = sub i8 C2 + C3, %m
2020-10-08 15:08:51 -04:00
Sanjay Patel
5f6285202c [InstCombine] add tests for add-xor; NFC 2020-10-08 15:08:51 -04:00
Joseph Huber
92d68263b1 [OpenMP] Replace OpenMP RTL Functions With OMPIRBuilder and OMPKinds.def
Summary:
Replace the OpenMP Runtime Library functions used in CGOpenMPRuntimeGPU
for OpenMP device code generation with ones in OMPKinds.def and use
OMPIRBuilder for generating runtime calls. This allows us to
consolidate more OpenMP code generation into the OMPIRBuilder. Future
additions to the GPU runtime functions should now go in OMPKinds.def

Reviewers: jdoerfert

Subscribers: aaron.ballman cfe-commits guansong llvm-commits sstefan1 yaxunl

Tags: #OpenMP #LLVM #clang

Differential Revision: https://reviews.llvm.org/D88430
2020-10-08 14:00:22 -04:00
Sanjay Patel
43cd7a3a13 [InstCombine] allow vector splats for add+xor with signmask 2020-10-08 10:46:34 -04:00
Sanjay Patel
2ffd36d959 [InstCombine] add vector splat tests for add of signmask; NFC 2020-10-08 10:46:33 -04:00
Simon Pilgrim
86361d772e [InstCombine] matchFunnelShift - support non-uniform constant vector shift amounts (PR46895)
Complete basic PR46895 fixes by refactoring D87452/D88402 to allow us to match non-uniform constant values.

We still don't handle non-uniform vectors that contain undef elements, but that can wait until we have a decent generic mechanism for this.

Differential Revision: https://reviews.llvm.org/D88420
2020-10-08 12:56:27 +01:00
Markus Lavin
ad940d62c6 [DebugInfo] Improve dbg preservation in LSR.
Use SCEV to salvage additional @llvm.dbg.value intrinsics that have ended up
referencing undef after the transformation (and traditional
salvageDebugInfo). Before the transformation, compute the SCEV for each
@llvm.dbg.value in the loop body and store it (alongside its current
DIExpression). After the transformation, update those @llvm.dbg.value
intrinsics now referencing undef by comparing their stored SCEVs to the SCEVs
of the current loop-header PHI-nodes. Allow matches with an offset by
inserting compensation code in the DIExpression.

Includes fix for the nullptr deref that caused the original commit
to be reverted in 9d63029770.

Fixes: PR38815

Differential Revision: https://reviews.llvm.org/D87494
2020-10-08 13:16:43 +02:00
Simon Pilgrim
c6978f43ad [InstCombine] matchRotate - add support for matching general funnel shifts with constant shift amounts (PR46896)
First step towards extending the existing rotation support to full funnel shift handling now that the backend legalization support has improved.

This enables us to match the shift by constant cases, which are pretty trivial to expand again if necessary.

D88420 will add non-uniform support for funnel shifts as well once it's been finalized.

Differential Revision: https://reviews.llvm.org/D88834
2020-10-08 11:05:14 +01:00
Max Kazantsev
e4585c353d [Test] Add test showing that we fail to eliminate implied exit conditions 2020-10-08 16:36:28 +07:00
David Green
bc52cc0c7b [LV] Collect dead induction truncates
We currently collect the ICmp and Add from an induction variable,
marking them as dead so that vplan values are not created for them. This
extends that to include any single-use trunc from the ICmp, which allows
the Add to more readily be removed too.
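
Schematically (illustrative types), the trunc feeding the exit compare is now
also marked dead:

  %iv.next = add i64 %iv, 1
  %iv.trunc = trunc i64 %iv.next to i32    ; single-use trunc
  %exit.cmp = icmp eq i32 %iv.trunc, %n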

This can help with costing vplan nodes, as the ICmp and Add are more
reliably removed and are not double-counted.

Differential Revision: https://reviews.llvm.org/D88873
2020-10-08 08:28:58 +01:00
Max Kazantsev
04c0255443 Return "[SCEV] Prove implications via AddRec start"
The initial version of the patch was reverted because it missed the check that
the predicate being proved is actually guarded by this check on the 1st iteration.
If it was not executed on the 1st iteration (but possibly executes after that), then
it is incorrect to use reasoning about the IV start to prove it.

Added the test where the miscompile was seen. Unfortunately, my attempts
to reduce it with bugpoint did not succeed; it can be reduced further once
we understand how to do so without losing the essence of the initial bug.

Returning, assuming the miscompiles are now gone.

Differential Revision: https://reviews.llvm.org/D88208
2020-10-08 11:15:35 +07:00
Reid Kleckner
4191643790 Port StripGCRelocates pass to NPM
Fixes one test under NPM

Differential Revision: https://reviews.llvm.org/D88766
2020-10-07 14:41:29 -07:00
Reid Kleckner
1ec8cebaf1 [NPM] Port strip nonlinetable debuginfo pass to the new pass manager
Fixes a few tests in llvm/test/Transforms/Utils.

Differential Revision: https://reviews.llvm.org/D88762
2020-10-07 14:35:36 -07:00
Simon Pilgrim
3a1cb2438c [InstCombine] Add checks for and(logicalshift(zext(x),undef),y) cases
Prep work before some cleanup in narrowMaskedBinOp
2020-10-07 20:59:31 +01:00
Florian Hahn
fde77a4988 [LAA] Use DL to get element size for bound computation.
Currently LAA uses getScalarSizeInBits to compute the size of an element
when computing the end bound of an access.

This does not work as expected for pointers to pointers, because
getScalarSizeInBits will return 0 for pointer types.

By using DataLayout to get the size of the element we can also correctly
handle pointer element types.
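
For example, for an access whose element type is itself a pointer:

  %gep = getelementptr inbounds i32*, i32** %A, i64 %iv
  %val = load i32*, i32** %gep
  ; getScalarSizeInBits(i32*) returns 0; the element size from DataLayout
  ; (8 bytes on a typical 64-bit target) is what the end-bound computation needs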

Note the changes to the existing test, which seems to also use the wrong
offset for the end.

Fixes PR47751.

Reviewed By: anemet

Differential Revision: https://reviews.llvm.org/D88953
2020-10-07 18:57:07 +01:00
Amara Emerson
59c2440372 [llvm][mlir] Promote the experimental reduction intrinsics to be first class intrinsics.
This change renames the intrinsics to not have "experimental" in the name.

The autoupgrader will handle legacy intrinsics.
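
For example, for an integer add reduction over <4 x i32> (exact mangled names
assumed):

  ; legacy spelling, still accepted and auto-upgraded:
  %r = call i32 @llvm.experimental.vector.reduce.add.v4i32(<4 x i32> %v)
  ; first-class spelling after this change:
  %r = call i32 @llvm.vector.reduce.add.v4i32(<4 x i32> %v)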

Relevant ML thread: http://lists.llvm.org/pipermail/llvm-dev/2020-April/140729.html

Differential Revision: https://reviews.llvm.org/D88787
2020-10-07 10:36:44 -07:00