1
0
mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-11-26 04:32:44 +01:00
Commit Graph

214320 Commits

Author SHA1 Message Date
Cullen Rhodes
9d799468bf [TTI] NFC: Remove unused 'OptSize' parameter from shouldMaximizeVectorBandwidth
Reviewed By: sdesmalen

Differential Revision: https://reviews.llvm.org/D100377
2021-04-19 11:01:34 +00:00
Dmitry Preobrazhensky
3b9392fa2f [AMDGPU][MC] Corrected parsing of carry in/out operands in VOP3
Disabled constants as carry in/out operands. See bug 48711.

Differential Revision: https://reviews.llvm.org/D100642
2021-04-19 13:42:31 +03:00
Roman Lebedev
438e79af49 [X86][CostModel] X86TTIImpl::getShuffleCost(): subvector insertions are cheap
This is similar to the subvector extractions,
except that the 0'th subvector isn't free to insert,
because we generally don't know whether or not
the upper elements need to be preserved:
https://godbolt.org/z/rsxP5W4sW

This is needed to avoid regressions in D100684

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D100698
2021-04-19 13:24:58 +03:00
Fraser Cormack
d35a61570c [RISCV] Lower vector shuffles to vrgather operations
This patch extends the lowering of RVV fixed-length vector shuffles to
avoid the default stack expansion and instead lower to vrgather
instructions.

For "permute"-style shuffles where one vector is swizzled, we can lower
to one vrgather. For shuffles involving two vector operands, we lower to
one unmasked vrgather (or splat, where appropriate) followed by a masked
vrgather which blends in the second half.

On occasion, when it's not possible to create a legal BUILD_VECTOR for
the indices, we use vrgatherei16 instructions with 16-bit index types.

For 8-bit element vectors where we may have indices over 255, we have a
fairly blunt fallback to the stack expansion to avoid custom-splitting
of the vector types.

To enable the selection of masked vrgather instructions, this patch
extends the various RISCVISD::VRGATHER nodes to take a passthru operand.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D100549
2021-04-19 11:13:13 +01:00
Kerry McLaughlin
23bab76d04 [NFC] Add tests for scalable vectorization of loops with in-order reductions
D98435 added support for in-order reductions and included tests for fixed-width
vectorization with the -enable-strict-reductions flag.

This patch adds similar tests to verify support for scalable vectorization of loops
with in-order reductions.

Reviewed By: david-arm

Differential Revision: https://reviews.llvm.org/D100385
2021-04-19 11:15:55 +01:00
OCHyams
9028e71ae0 [DebugInfo] Replace debug uses in replaceUsesOutsideBlock
Value::replaceUsesOutsideBlock doesn't replace debug uses which leads to an
unnecessary reduction in variable location coverage. Fix this, add a unittest for
it, and add a regression test demonstrating the change through instcombine's
replacedSelectWithOperand.

Reviewed By: djtodoro

Differential Revision: https://reviews.llvm.org/D99169
2021-04-19 11:06:53 +01:00
OCHyams
fbe80b0c21 [DebugInfo] Move the findDbg* functions into DebugInfo.cpp
Move the findDbg* functions into lib/IR/DebugInfo.cpp from
lib/Transforms/Utils/Local.cpp.

D99169 adds a call to a function (findDbgUsers) that lives in
lib/Transforms/Utils/Local.cpp (LLVMTransformUtils) from lib/IR/Value.cpp
(LLVMCore). The Core lib doesn't include TransformUtils. The builtbots caught
this here: https://lab.llvm.org/buildbot/#/builders/109/builds/12664. This patch
moves the function, and the 3 similar ones for consistency, into DebugInfo.cpp
which is part of LLVMCore.

Reviewed By: dblaikie, rnk

Differential Revision: https://reviews.llvm.org/D100632
2021-04-19 10:30:25 +01:00
Clement Courbet
09f6560512 [llvm-exegesis] Honor -mcpu in analysis mode.
This is useful to set the baseline model for an unknown CPU.

Fixes PR50013.

Differential Revision: https://reviews.llvm.org/D100743
2021-04-19 10:44:28 +02:00
David Sherwood
40e085b6bd [CodeGen] Improve code generation for clamping of constant indices with scalable vectors
When trying to clamp a constant index into a scalable vector we can
test if the index is less than the minimum number of elements in the
vector. If so, we can simply return the index because we know it is
guaranteed to fit inside the vector.

Differential Revision: https://reviews.llvm.org/D100639
2021-04-19 08:34:17 +01:00
Serguei Katkov
3c8b0754c7 [GreedyRA ORE] Add stats for copy of virtual registers.
Greedy RA adds copies of virtual registers when splitting live interval.
This stat might be useful.

Reviewers: reames, MatzeB, anemet, thegameg
Reviewed By: reames
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D100017
2021-04-19 12:43:44 +07:00
Serguei Katkov
353cafc332 [Greedy RA] Add a check to MachineVerifier
If Virtual Register is alive in landing pad its def must be
before the call causing the exception or it should be statepoint instruction itself and
in this case def actually means the relocation of gc pointer and is alive in
landing pad.

The test shows the triggering this check for an option under development
use-registers-for-gc-values-in-landing-pad which is off by default until
it is functionally correct.

Reviewers: reames, void, jyknight, nickdesaulniers, efriedma, arsenm, rnk
Reviewed By: rnk
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D100525
2021-04-19 12:31:18 +07:00
Evgeniy Brevnov
8e314ca696 [CVP] processCallSite returns wrong status
Recently processMinMaxIntrinsic has been added and we started to observe a number of analysis get invalidated after CVP. The problem is CVP conservatively returns 'true'  even if there were no modifications to IR. I found one more place besides processMinMaxIntrinsic  which has the same problem. I think processMinMaxIntrinsic and similar should better have boolean return status to prevent similar issue reappear in future.

Reviewed By: lebedev.ri

Differential Revision: https://reviews.llvm.org/D100538
2021-04-19 12:13:22 +07:00
Xun Li
8ba502ea6f Revert "[Coroutines] Set presplit attribute in Clang instead of CoroEarly pass"
This reverts commit fa6b54c44ab1d5f579304eadb7ac8bd7e72d0e77.
The commited patch broke mlir tests. It seems that mlir tests depend on coroutine function properties set in CoroEarly pass.
2021-04-18 17:22:28 -07:00
Craig Topper
59da2ddc9a [TableGen] Pass SmallVector to union_modes instead of returning a std::vector.
The number of modes is small so this should avoid a heap allocation.

Also replace std::set with SmallSet.
2021-04-18 15:59:52 -07:00
Xun Li
fcafc059cd [Coroutines] Set presplit attribute in Clang instead of CoroEarly pass
Presplit coroutines cannot be inlined. During AlwaysInliner we check if a function is a presplit coroutine, if so we skip inlining.
The presplit coroutine attributes are set in CoroEarly pass.
However in O0 pipeline, AlwaysInliner runs before CoroEarly, so the attribute isn't set yet and will still inline the coroutine.
This causes Clang to crash: https://bugs.llvm.org/show_bug.cgi?id=49920

To fix this, we set the attributes in the Clang front-end instead of in CoroEarly pass.

Reviewed By: rjmccall, ChuanqiXu

Differential Revision: https://reviews.llvm.org/D100282
2021-04-18 15:41:09 -07:00
Xun Li
72ec560e95 Revert "[Coroutines] Move CoroEarly pass to before AlwaysInliner"
This reverts commit 2b50f5a4343f8fb06acaa5c36355bcf58092c9cd.
Forgot to update the description of the commit to sync with phabricator. Going to redo the commit.
2021-04-18 15:38:19 -07:00
Xun Li
c42f8ec702 [Coroutines] Move CoroEarly pass to before AlwaysInliner
Presplit coroutines cannot be inlined. During AlwaysInliner we check if a function is a presplit coroutine, if so we skip inlining.
The presplit coroutine attributes are set in CoroEarly pass.
However in O0 pipeline, AlwaysInliner runs before CoroEarly, so the attribute isn't set yet and will still inline the coroutine.
This causes Clang to crash: https://bugs.llvm.org/show_bug.cgi?id=49920

Differential Revision: https://reviews.llvm.org/D100282
2021-04-18 14:54:04 -07:00
Martin Storsjö
86db67416f [lit] Fix the return code for "not not" after evaluating "not" internally
This fixes cases where "not not <command>" is supposed to return
only the error codes 0 or 1, but after efee57925c3f46c74c6697,
it passed the original error code through.

This was visible on AIX in the shtest-output-printing.py testcase,
where 'wc' returns 2, while it returns 1 on other platforms, and the
test required "not not" to normalize it to 1.
2021-04-19 00:37:13 +03:00
Craig Topper
fdef0cf249 [TableGen] Use MachineValueTypeSet in place of SmallSet.
MachineValueTypeSet is effectively a std::bitset<256>. This allows
us quickly insert into the set and check if a type is in the set.
2021-04-18 13:38:30 -07:00
Nikita Popov
bf2498bc3f [LoopDeletion] Add test for PR49967 (NFC)
Test case for a SCEV invalidation bug caused by D100264, which
has since been reverted.
2021-04-18 22:08:51 +02:00
Craig Topper
bf5c787068 [TableGen] Use range-based for loop. NFC 2021-04-18 12:41:09 -07:00
Roman Lebedev
5a174914f6 Revert "[SCEV] Model ashr exact x, C as (abs(x) EXACT/u (1<<C)) * signum(x)"
As being discussed in https://reviews.llvm.org/D100721,
this modelling is lossy, we can't reconstruct `ash`/`ashr exact`
from it, which means that whenever we actually expand the IR,
we've just pessimized the code..

It would be good to model this pattern, after all it comes up every time
you want to compute a distance between two pointers, but not at this cost.

This reverts commit ec54867df5e7f20e12146e628af34f0384308bcb.
2021-04-18 16:26:45 +03:00
xgupta
79c9c7433b [Docs] Correct Boehm collector weblink in GarbageCollection.rst 2021-04-18 17:30:17 +05:30
LLVM GN Syncbot
60e1274472 [gn build] Port 01ace074fcb6 2021-04-18 11:35:28 +00:00
Florian Hahn
a77ecd76f3 [IndVarSimplify] Add test requiring ashr expansion.
Add test cases showing large ashr expansion during IndVarSimplify
after ec54867df5e7.
2021-04-18 12:28:49 +01:00
Roman Lebedev
f06347058d [NFC][X86][CostModel] Rewrite load_store.ll
Test SSE41, since that added float/i64/i32/i8 inserts/extracts.
Don't forget to test vectors of pointers.
Do test byte-aligned loads/stores.
Fixup test coverage to be rather more exhaustive,
testing all reasonable element sizes vs element counts permutations
that fit up to witin ZMM.
2021-04-18 11:12:36 +03:00
Roman Lebedev
6861dd5bf7 [NFC][LoopVectorize] Autogenerate check lines in X86/gather_scatter.ll test 2021-04-18 10:26:16 +03:00
Juneyoung Lee
49d8db7466 Update InstCombine to use undef matcher instead
This is a patch to use m_Undef() matcher instead of isa<UndefValue>().

As suggested in D100122, this update is separately committed.
2021-04-18 11:05:36 +09:00
Juneyoung Lee
e279a5783d Update m_Undef to match vectors/aggrs with undefs and poisons mixed
This fixes https://reviews.llvm.org/D93990#2666922
by teaching `m_Undef` to match vectors/aggrs with poison elements.

As suggested, fixes in InstCombine files to use the `m_Undef` matcher instead
of `isa<UndefValue>` will be followed.

Reviewed By: lebedev.ri

Differential Revision: https://reviews.llvm.org/D100122
2021-04-18 10:57:04 +09:00
Florian Hahn
bd2bad88ac [ADT] Update RPOT to work with specializations of different types.
At the moment, ReversePostOrderTraversal performs a post-order walk on
the entry node of the passed in graph, rather than the graph type
itself.

If GT::NodeRef is the same as GraphT, everything works as expected and
this is the case for the current uses in-tree. But it does not work as
expected if GraphT != GT::NodeRef. In that case, we either fail to build
(if there is no GraphTrait specialization for GT:NodeRef) or we pick the
GraphTrait specialization for GT::NodeRef, instead of the specialization
of GraphT.

Both the depth-first and post-order iterators pick the expected
specalization and this patch updates ReversePostOrderTraversal to
delegate to po_begin & po_end to pick the right specialization, rather
than forcing using GraphTraits<GT::NodeRef>, by first getting the entry
node.

This makes `ReversePostOrderTraversal<Graph<6>> RPOT(G);` build and
work as expected in the test.

Reviewed By: dexonsmith

Differential Revision: https://reviews.llvm.org/D100169
2021-04-17 20:45:04 +01:00
Nikita Popov
ac5625799c [LoopUnroll] Regenerate test checks (NFC) 2021-04-17 20:59:20 +02:00
Nikita Popov
cdf04cd5e0 [LoopUnroll] Make some tests more robust (NFC)
Replace branch on undef by branch on unknown condition.
2021-04-17 20:59:20 +02:00
Lang Hames
c2126ffe7e [JITLink] Add testcase that was accidentally left out of 19e402d2b34. 2021-04-17 11:55:55 -07:00
Alexandre Ganea
5098c69ed1 [Support] ThreadPool tests: silence warning unused variable 'It' 2021-04-17 14:22:50 -04:00
Craig Topper
e51b3ceb31 [TableGen] Remove local SmallSet from TypeSetByHwMode::insert.
This keeps track of which modes are in VVT so we can find out
if a mode is missing later. But we can just ask VVT whether it
has a particular mode.
2021-04-17 10:48:57 -07:00
Florian Hahn
dce8472e88 [ADT] Take graph as const & in some post-order iterators (NFC).
This patch updates a couple of functions that unnecessarily took the
input graph by value, when it was not needed. They can take the graph by
const-reference instead, which does not require GraphT to provide a copy
constructor.

Split off from D100169.
2021-04-17 17:05:24 +01:00
Yaxun (Sam) Liu
2596649ce8 [AMDGPU] Add GlobalDCE before internalization pass
The internalization pass only internalizes global variables
with no users. If the global variable has some dead user,
the internalization pass will not internalize it.

To be able to internalize global variables with dead
users, a global dce pass is needed before the
internalization pass.

This patch adds that.

Reviewed by: Artem Belevich, Matt Arsenault

Differential Revision: https://reviews.llvm.org/D98783
2021-04-17 11:25:25 -04:00
Nikita Popov
875a81f11e [LICM] Add more tests for promotion and capture (NFC)
We could optimize the first case, as the pointer is captured only
after the loop.
2021-04-17 16:57:15 +02:00
Florian Hahn
b26528306e [SimplifyCFG] Skip dbg intrinsics when checking for branch-only BBs.
Debug intrinsics are free to hoist and should be skipped when looking
for terminator-only blocks. As a consequence, we have to delegate to the
main hoisting loop to hoist any dbg intrinsics instead of jumping to the
terminator case directly.

This fixes PR49982.

Reviewed By: lebedev.ri

Differential Revision: https://reviews.llvm.org/D100640
2021-04-17 15:17:50 +01:00
Nikita Popov
174ac0349e [Inline] Don't add noalias metadata to inaccessiblememonly calls
It will not do anything useful for them, as we already know that
they don't modref with any accessible memory.

In particular, this prevents noalias metadata from being placed
on noalias.scope.decl intrinsics. This reduces the amount of
metadata needed, and makes it more likely that unnecessary decls
can be eliminated.
2021-04-17 14:56:13 +02:00
Simon Pilgrim
47aaabf409 [Support] AbsoluteDifference - add brackets to appease static analyzer warning. NFCI. 2021-04-17 13:47:02 +01:00
Serge Guelton
b2fe6dcbc2 Normalize interaction with boolean attributes
Such attributes can either be unset, or set to "true" or "false" (as string).
throughout the codebase, this led to inelegant checks ranging from

        if (Fn->getFnAttribute("no-jump-tables").getValueAsString() == "true")

to

        if (Fn->hasAttribute("no-jump-tables") && Fn->getFnAttribute("no-jump-tables").getValueAsString() == "true")

Introduce a getValueAsBool that normalize the check, with the following
behavior:

no attributes or attribute set to "false" => return false
attribute set to "true" => return true

Differential Revision: https://reviews.llvm.org/D99299
2021-04-17 08:17:33 +02:00
Craig Topper
da04af3736 [TableGen] Replace two SmallDenseSets with SmallSets.
The key here is HwMode indices. They're going to be small numbers,
contiguous, and only a few different values. I don't think we need
to go through the SmallDenseSet hashing.

A BitVector would be even better, but we don't have the upper
bound here.
2021-04-16 17:57:53 -07:00
Nemanja Ivanovic
d506c40420 [PowerPC] Minor improvement for insert_vector_elt codegen
For v2f64, all VSX subtargets can insert an element with a single
XXPERMDI.
2021-04-16 18:52:37 -05:00
Philip Reames
e84c42802f [inferattrs] Don't infer lib func attributes for nobuiltin functions
If we have a nobuiltin function, we can't assume we know anything about the implementation.

I noticed this when tracing through a log from an in the wild miscompile (https://github.com/emscripten-core/emscripten/issues/9443) triggered after 8666463.  We were incorrectly assuming that a custom allocator could not free.  (It's not clear yet this is the only problem in said issue.)

I also noticed something similiar mentioned in the commit message of ab243e when scrolling back through history.  Through, from what I can tell, that commit fixed symptom not root cause.

The interface we have for library function detection is extremely error prone, but given the interaction between ``nobuiltin`` decls and ``builtin`` callsites, it's really hard to imagine something much cleaner.  I may iterate on that, but it'll be invasive enough I didn't want to hold an obvious functional fix on it.
2021-04-16 15:36:15 -07:00
Nico Weber
af14859209 [gn build] (manually) port ca6751043d88 better 2021-04-16 18:16:29 -04:00
Craig Topper
d27dd156eb [TableGen] Run GenerateVariants before ExpandHwModeBasedTypes.
A large portion of the patterns are duplicated for HwMode on RISCV.
If we expand HwMode first, we need to check nearly twice as many
patterns for variants. HwModes shouldn't affect whether a variant
is valid so we should be able to expand after.

This also reduces the RISCV isel table by 539 bytes due to factoring
working better on this pattern order. Unfortunately it increases
Hexagon table size by ~50 bytes. But I think this is a reasonable
trade.
2021-04-16 15:05:33 -07:00
Nico Weber
b82ca2a3f3 [gn build] (manually) port ca6751043d88 2021-04-16 18:03:44 -04:00
Philip Reames
9e6192bcdf [funcattrs] Add the maximal set of implied attributes to definitions
Have funcattrs expand all implied attributes into the IR. This expands the infrastructure from D100400, but for definitions not declarations this time.

Somewhat subtly, this mostly isn't semantic. Because the accessors did the inference, any client which used the accessor was already getting the stronger result. Clients that directly checked presence of attributes (there are some), will see a stronger result now.

The old behavior can end up quite confusing for two reasons:
* Without this change, we have situations where function-attrs appears to fail when inferring an attribute (as seen by a human reading IR), but that consuming code will see that it should have been implied. As a human trying to sanity check test results and study IR for optimization possibilities, this is exceeding error prone and confusing. (I'll note that I wasted several hours recently because of this.)
* We can have transforms which trigger without the IR appearing (on inspection) to meet the preconditions. This change doesn't prevent this from happening (as the accessors still involve multiple checks), but it should make it less frequent.

I'd argue in favor of deleting the extra checks out of the accessors after this lands, but I want that in it's own review as a) it's purely stylistic, and b) I already know there's some disagreement.

Once this lands, I'm also going to do a cleanup change which will delete some now redundant duplicate predicates in the inference code, but again, that deserves to be a change of it's own.

Differential Revision: https://reviews.llvm.org/D100226
2021-04-16 14:22:19 -07:00
serge-sans-paille
5678d281d2 Simplify BitVector code
Instead of managing memory by hand, delegate it to std::vector. This makes the
code much simpler, and also avoids repeatedly computing the storage size.

According to valgrind --tool=callgrind, this also slightly decreases the
instruction count, but by a small margin.

This is a recommit of 82f0e3d3ea6bf927e3397b2fb423abbc5821a30f with one usage
fixed in llvm/lib/CodeGen/RegisterScavenging.cpp.

Not the slight API change: BitVector::clear() now has the same behavior as any
other container: it does not free memory, but indeed sets the size of the
BitVector to 0. It is thus incorrect to access its content right afterwards, a
scenario which wasn't enforced in previous implementation.

Differential Revision: https://reviews.llvm.org/D100387
2021-04-16 22:48:33 +02:00