Teach SCEV about the @loop.decrement.reg intrinsic, which has exactly the same
semantics as a sub expression. This allows us to query hardware-loops, which
contain this @loop.decrement.reg intrinsic, so that we can calculate iteration
counts, exit values, etc. of hardwareloops.
This "int_loop_decrement_reg" intrinsic is defined as "IntrNoDuplicate". Thus,
while hardware-loops and tripcounts now become analysable by SCEV, this
prevents the usual loop transformations from applying transformations on
hardware-loops, which is what we want at this point, for which I have added
test cases for loopunrolling and IndVarSimplify and LFTR.
Differential Revision: https://reviews.llvm.org/D71563
- Update documentation now that the move to monorepo has been made
- Do not tie compiler extension testing to LLVM_BUILD_EXAMPLES
- No need to specify LLVM libraries for plugins
- Add NO_MODULE option to match Polly specific requirements (i.e. building the
module *and* linking it statically)
- Issue a warning when building the compiler extension with
LLVM_BYE_LINK_INTO_TOOLS=ON, as it modifies the behavior of clang, which only
makes sense for testing purpose.
Still mark llvm/test/Feature/load_extension.ll as XFAIL because of a
ManagedStatic dependency that's going to be fixed in a seperate commit.
Differential Revision: https://reviews.llvm.org/D72327
PowerPC uses a dedicated method to check if the machine instr is
predicable by opcode. However, there's a bit `isPredicable` in instr
definition. This patch removes the method and set the bit only to
opcodes referenced in it.
Differential Revision: https://reviews.llvm.org/D71921
Memory instruction widening recipes use the pointer operand of their load/store
ingredient for generating the needed GEPs, making it difficult to feed these
recipes with pointers based on other ingredients or none at all.
This patch modifies these recipes to use a VPValue for the pointer instead, in
order to reduce ingredient def-use usage by ILV as a step towards full
VPlan-based def-use relations. The recipes are constructed with VPValues bound
to these ingredients, maintaining current behavior.
Differential revision: https://reviews.llvm.org/D70865
Currently running the xray tools generates a number of errors:
$ ./bin/llvm-xray
: for the -k option: cl::alias must not have cl::sub(), aliased option's cl::sub() will be used!
: for the -d option: cl::alias must not have cl::sub(), aliased option's cl::sub() will be used!
: for the -o option: cl::alias must not have cl::sub(), aliased option's cl::sub() will be used!
: for the -f option: cl::alias must not have cl::sub(), aliased option's cl::sub() will be used!
: for the -s option: cl::alias must not have cl::sub(), aliased option's cl::sub() will be used!
: for the -r option: cl::alias must not have cl::sub(), aliased option's cl::sub() will be used!
: for the -p option: cl::alias must not have cl::sub(), aliased option's cl::sub() will be used!
: for the -m option: cl::alias must not have cl::sub(), aliased option's cl::sub() will be used!
<snip>
Patch by Ryan Mansfield.
Differential Revision: https://reviews.llvm.org/D69386
down to pass builder in ltobackend.
Currently CodeGenOpts like UnrollLoops/VectorizeLoop/VectorizeSLP in clang
are not passed down to pass builder in ltobackend when new pass manager is
used. This is inconsistent with the behavior when new pass manager is used
and thinlto is not used. Such inconsistency causes slp vectorization pass
not being enabled in ltobackend for O3 + thinlto right now. This patch
fixes that.
Differential Revision: https://reviews.llvm.org/D72386
If an SGPR vector is indexed with a VGPR, the actual indexing will be
done on the SGPR and produce an SGPR. A copy needs to be inserted
inside the waterwall loop to the VGPR result.
For arguments that are not expected to be materialized with
G_CONSTANT, this was emitting predicates which could never match. It
was first adding a meaningless LLT check, which would always fail due
to the operand not being a register.
Infer the cases where a literal should check for an immediate operand,
instead of a register This avoids needing to invent a special way of
representing timm literal values.
Also handle immediate arguments in GIM_CheckLiteralInt. The comments
stated it handled isImm() and isCImm(), but that wasn't really true.
This unblocks work on the selection of all of the complicated AMDGPU
intrinsics in future commits.
The current implementation assumes there is an instruction associated
with the transform, but this is not the case for
timm/TargetConstant/immarg values. These transforms should directly
operate on a specific MachineOperand in the source
instruction. TableGen would assert if you attempted to define an
equivalent GISDNodeXFormEquiv using timm when it failed to find the
instruction matcher.
Specially recognize SDNodeXForms on timm, and pass the operand index
to the render function.
Ideally this would be a separate render function type that looks like
void renderFoo(MachineInstrBuilder, const MachineOperand&), but this
proved to be somewhat mechanically painful. Add an optional operand
index which will only be passed if the transform should only look at
the one source operand.
Theoretically it would also be possible to only ever pass the
MachineOperand, and the existing renderers would check the parent. I
think that would be somewhat ugly for the standard usage which may
want to inspect other operands, and I also think MachineOperand should
eventually not carry a pointer to the parent instruction.
Use it in one sample pattern. This isn't a great example, since the
transform exists to satisfy DAG type constraints. This could also be
avoided by just changing the MachineInstr's arbitrary choice of
operand type from i16 to i32. Other patterns have nontrivial uses, but
this serves as the simplest example.
One flaw this still has is if you try to use an SDNodeXForm defined
for imm, but the source pattern uses timm, you still see the "Failed
to lookup instruction" assert. However, there is now a way to avoid
it.
Only PPC seems to be using it, and only checks some simple cases and
doesn't distinguish between FP. Just switch to using LLT to simplify
use from GlobalISel.
As an intermediate step, some TLI functions can be converted to using
LLT instead of MVT. Move this somewhere out of GlobalISel so DAG
functions can use these.
When these arguments are broken down by the EVT based callbacks, the
pointer information is lost. Hack around this by coercing the register
types to be the expected pointer element type when building the
remerge operations.
This should be legal, but will require future selection work. 16-bit
shift amounts were already removed from being legal, but this didn't
adjust the transformation rules.
Summary:
In commit b91f239485fb7bb8d29be3e0b60660a2de7570a9 I updated the
MipsDelaySlotFiller to skip BUNDLE instructions.
However, in addition to not considering BUNDLE instructions for the delay
slot, we also need to ensure that the register def-use information is
updated. Not updating this information caused run-time crashes (when using
the out-of-tree CHERI backend) since later definitions could be overwritten
with earlier register values.
Reviewers: atanasyan
Reviewed By: atanasyan
Differential Revision: https://reviews.llvm.org/D72254
The ONE expansion calls OGT/OLT libcalls which will signal for QNAN.
The UEQ expansion uses unord and eq libcalls which won't signal.
We should probably use those libcalls for ONE with appropriate
logic.
Quiet OGT/OLT/OLE/OGE have similar issue, but not sure how to fix
those yet.
This adds support for selecting a large chunk of the load/store *roW patterns.
This is pretty much a straight port of AArch64DAGToDAGISel::SelectAddrModeWRO
into GISel. The code is very similar to the XRO code. The main difference is
that in the *roW patterns, we want to try and fold in an extend, and *possibly*
a shift along with it. A good portion of this patch is refactoring the existing
XRO code.
- Add selectAddrModeWRO
- Factor out the code from selectAddrModeShiftedExtendXReg which is used by both
selectAddrModeXRO and selectAddrModeWRO into selectExtendedSHL.
This is similar to the function of the same name in AArch64DAGToDAGISel.
- Add support for extends to the factored out code in selectExtendedSHL.
- Teach getExtendTypeForInst how to handle AND masks that are intended to be
used in loads/stores (necessary for this addressing mode.)
- Make getExtendTypeForInst not static because moving it made an annoying diff
and I wanted to have the WRO/XRO functions close to each other while I was
writing the code.
Differential Revision: https://reviews.llvm.org/D72426
Summary:
Extend D71677 to apply to all branch-target operands, rather than special-casing call instructions.
Also add a regression test for llvm.org/PR44272, since this finishes fixing it.
Reviewers: thakis, rnk
Reviewed By: thakis
Subscribers: merge_guards_bot, hiraditya, cfe-commits, llvm-commits
Tags: #clang, #llvm
Differential Revision: https://reviews.llvm.org/D72417
The patch gives out the details of the znver2 scheduler model.
There are few improvements with respect to execution units, latencies and
throughput when compared with znver1.
The tests that were present for znver1 for llvm-mca tool were replicated.
The latencies, execution units, timeline and throughput information are updated for znver2.
Reviewers: craig.topper, Simon Pilgrim
Differential Revision: https://reviews.llvm.org/D66088
Fix a conditional that guarded code for execution only on 32-bit ELF by
checking that the Subtarget was not 64-bit and not-Darwin. By adding a new
target ABI (AIX), the condition is no longer correct. This code is dead for
AIX, due to a 'report_fatal_error' for thread local storage usage earlier in the
pipeline, but needs to be modifed as part of Darwins removal from the
PowerPC backend.
If we're doing a compare that only tests the sign bit and only the sign bit is demanded, we can just bypass the node. This removes one of the blend dependencies in our v2i64->v2f32 uint_to_fp codegen on pre-sse4.2 targets.
Differential Revision: https://reviews.llvm.org/D72356
SystemZDAGToDAGISel::Select will attempt to split logical instruction
with a large immediate constant. This must not happen if the result
matches one of the z15 combined operations, so the code checks for
those. However, one of them was missed, causing invalid code to
be generated in the test case for PR44496.
pass.
Summary: This patch changes LoopUnrollAndJamPass to a function pass, and
keeps the loops traversal order same as defined in
FunctionToLoopPassAdaptor LoopPassManager.h.
The next patch will change the loop traversal to outer to inner order,
so more loops can be transform.
Discussion in llvm-dev mailing list:
https://groups.google.com/forum/#!topic/llvm-dev/LF4rUjkVI2g
Reviewer: dmgreen, jdoerfert, Meinersbur, kbarton, bmahjour, etiotto
Reviewed By: dmgreen
Subscribers: hiraditya, zzheng, llvm-commits
Tag: LLVM
Differential Revision: https://reviews.llvm.org/D72230