This change fixes a latent bug which was exposed by a change currently in review (https://reviews.llvm.org/D99802#2685032).
The story on this is a bit involved. Without this change, what ended up happening with the pending review was that we'd strip attributes off intrinsics, and then selectiondag would fail to lower the intrinsic. Why? Because the lowering of the intrinsic relies on the presence of the readonly attribute. We don't have a matcher to select the case where there's a glue node needed.
Now, on the surface, this still seems like a codegen bug. However, here it gets fun. I was unable to reproduce this with a standalone test at all, and was pretty much struck until skatkov provided the critical detail. This reproduces only when RS4GC and codegen are run in the same process and context. Why? Because it turns out we can't roundtrip the stripped attribute through serialized IR!
We'll happily print out the missing attribute, but when we parse it back, the auto-upgrade logic has a side effect of blindly overwriting attributes on intrinsics with those specified in Intrinsics.td. This makes it impossible to exercise SelectionDAG from a standalone test case.
At this point, I decided to treat this an RS4GC bug as a) we don't need to strip in this case, and b) I could write a test which shows the correct behavior to ensure this doesn't break again in the future.
As an aside, I'd originally set out to handle libfuncs too - since in theory they might have the same issues - but backed away quickly when I realized how the semantics of builtin, nobuiltin, and no-builtin-x all interacted. I'm utterly convinced that no part of the optimizer handles that correctly, and decided not to open that can of worms here.
We previously used splats instead of v128.const to materialize vector constants
because V8 did not support v128.const. Now that V8 supports v128.const, we can
use v128.const instead. Although this increases code size, it should also
increase performance (or at least require fewer engine-side optimizations), so
it is an appropriate change to make.
Differential Revision: https://reviews.llvm.org/D100716
This patch removes -fixed-abi check for indirect calls
and also adds queue-ptr which is required for indirect calls to work.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D100633
Comparisons to zero or one after cset instructions can be safely
removed in examples like:
cset w9, eq cset w9, eq
cmp w9, #1 ---> <removed>
b.ne .L1 b.ne .L1
cset w9, eq cset w9, eq
cmp w9, #0 ---> <removed>
b.ne .L1 b.eq .L1
Peephole optimization to detect suitable cases and get rid of that
comparisons added.
Differential Revision: https://reviews.llvm.org/D98564
During store promotion, we check whether the pointer was captured
to exclude potential reads from other threads. However, we're only
interested in captures before or inside the loop. Check this using
PointerMayBeCapturedBefore against the loop header.
Differential Revision: https://reviews.llvm.org/D100706
As noted in the FIXME there's a sort of agreement that the any
extra bits stored will be 0.
The generated code is pretty terrible. I was really hoping we
could use a tail undisturbed trick, but tail undisturbed no
longer applies to masked destinations in the current draft
spec.
Fingers crossed that it isn't common to do this. I doubt IR
from clang or the vectorizer would ever create this kind of store.
Reviewed By: frasercrmck
Differential Revision: https://reviews.llvm.org/D100618
This is a partial port of AArch64TargetLowering::LowerCTPOP.
This custom lowering tries to uses NEON instructions to give a more efficient
CTPOP lowering when possible.
In the non-NEON/noimplicitfloat case, this should use the generic lowering
(see: https://godbolt.org/z/GcaPvWe4x). I think that's worth implementing after
implementing the widening code for s16/s8 though.
Differential Revision: https://reviews.llvm.org/D100399
These constraints are machine agnostic; there's no reason to handle
these per-arch. If arches don't support these constraints, then they
will fail elsewhere during instruction selection. We don't need virtual
calls to look these up; TargetLowering::getInlineAsmMemConstraint should
only be overridden by architectures with additional unique memory
constraints.
Reviewed By: echristo, MaskRay
Differential Revision: https://reviews.llvm.org/D100416
It turns out we actually import a bunch of selection code for intrinsics. The
imported code checks that the register banks on the G_INTRINSIC instruction
are correct. If so, it goes ahead and selects it.
This adds code to AArch64RegisterBankInfo to allow us to correctly determine
register banks on intrinsics which have known register bank constraints.
For now, this only handles @llvm.aarch64.neon.uaddlv. This is necessary for
porting AArch64TargetLowering::LowerCTPOP.
Also add a utility for getting the intrinsic ID from a G_INTRINSIC instruction.
This seems a little nicer than having to know about how intrinsic instructions
are structured.
Differential Revision: https://reviews.llvm.org/D100398
I guess this case hasn't come up thus far, and i'm not sure if it can
really happen for the existing usages, thus no test in *this* commit.
But, the following commit adds test coverage,
there we'd expirience a crash without this fix.
Currently, InsertNoopCastOfTo() would implicitly insert that cast,
but now that we have SCEVPtrToIntExpr, i'm hoping we could stop
InsertNoopCastOfTo() from doing that. But first all users must be fixed.
Move <string> include to ImportedFunctionsInliningStatistics.cpp and add missing <memory> include as we have explicit uses of std::unique_ptr in the header.
These are needed when buildling `clang-format` in a clean build dir.
It's a bit unfortunate that clang's lib/Basic depends on these
random TableGen targets. In the CMake build, this is less visible
because I think all llvm-tblgen's complete before all compiles there
(not sure though).
protectMappedMemory no longer returns an error message, so we don't need std::string - I've fixed an unnecessary doxygen entry as well (oddly I wasn't seeing a Wdocumentation warning)
This patch corrects more instances of text files being opened as text.
Reviewed By: Jonathan.Crowther
Differential Revision: https://reviews.llvm.org/D100654
Remove the MachineDCE pass after the first SIFoldOperands pass now
that SIFoldOperands deletes its own dead instructions.
Reapply after fixing dependent change D100188.
Differential Revision: https://reviews.llvm.org/D100189
This is fairly cheap to implement and means less work for future
passes like MachineDCE.
Reapply with a fix for using InstToErase after it had been erased.
Differential Revision: https://reviews.llvm.org/D100188
This is similar to the subvector extractions,
except that the 0'th subvector isn't free to insert,
because we generally don't know whether or not
the upper elements need to be preserved:
https://godbolt.org/z/rsxP5W4sW
This is needed to avoid regressions in D100684
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D100698
This patch extends the lowering of RVV fixed-length vector shuffles to
avoid the default stack expansion and instead lower to vrgather
instructions.
For "permute"-style shuffles where one vector is swizzled, we can lower
to one vrgather. For shuffles involving two vector operands, we lower to
one unmasked vrgather (or splat, where appropriate) followed by a masked
vrgather which blends in the second half.
On occasion, when it's not possible to create a legal BUILD_VECTOR for
the indices, we use vrgatherei16 instructions with 16-bit index types.
For 8-bit element vectors where we may have indices over 255, we have a
fairly blunt fallback to the stack expansion to avoid custom-splitting
of the vector types.
To enable the selection of masked vrgather instructions, this patch
extends the various RISCVISD::VRGATHER nodes to take a passthru operand.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D100549
D98435 added support for in-order reductions and included tests for fixed-width
vectorization with the -enable-strict-reductions flag.
This patch adds similar tests to verify support for scalable vectorization of loops
with in-order reductions.
Reviewed By: david-arm
Differential Revision: https://reviews.llvm.org/D100385
Value::replaceUsesOutsideBlock doesn't replace debug uses which leads to an
unnecessary reduction in variable location coverage. Fix this, add a unittest for
it, and add a regression test demonstrating the change through instcombine's
replacedSelectWithOperand.
Reviewed By: djtodoro
Differential Revision: https://reviews.llvm.org/D99169
Move the findDbg* functions into lib/IR/DebugInfo.cpp from
lib/Transforms/Utils/Local.cpp.
D99169 adds a call to a function (findDbgUsers) that lives in
lib/Transforms/Utils/Local.cpp (LLVMTransformUtils) from lib/IR/Value.cpp
(LLVMCore). The Core lib doesn't include TransformUtils. The builtbots caught
this here: https://lab.llvm.org/buildbot/#/builders/109/builds/12664. This patch
moves the function, and the 3 similar ones for consistency, into DebugInfo.cpp
which is part of LLVMCore.
Reviewed By: dblaikie, rnk
Differential Revision: https://reviews.llvm.org/D100632
When trying to clamp a constant index into a scalable vector we can
test if the index is less than the minimum number of elements in the
vector. If so, we can simply return the index because we know it is
guaranteed to fit inside the vector.
Differential Revision: https://reviews.llvm.org/D100639
Greedy RA adds copies of virtual registers when splitting live interval.
This stat might be useful.
Reviewers: reames, MatzeB, anemet, thegameg
Reviewed By: reames
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D100017
If Virtual Register is alive in landing pad its def must be
before the call causing the exception or it should be statepoint instruction itself and
in this case def actually means the relocation of gc pointer and is alive in
landing pad.
The test shows the triggering this check for an option under development
use-registers-for-gc-values-in-landing-pad which is off by default until
it is functionally correct.
Reviewers: reames, void, jyknight, nickdesaulniers, efriedma, arsenm, rnk
Reviewed By: rnk
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D100525
Recently processMinMaxIntrinsic has been added and we started to observe a number of analysis get invalidated after CVP. The problem is CVP conservatively returns 'true' even if there were no modifications to IR. I found one more place besides processMinMaxIntrinsic which has the same problem. I think processMinMaxIntrinsic and similar should better have boolean return status to prevent similar issue reappear in future.
Reviewed By: lebedev.ri
Differential Revision: https://reviews.llvm.org/D100538