Add unit tests for this behavior, since the integration test for
clang-cl did not catch these bugs.
Fixes PR47604
Differential Revision: https://reviews.llvm.org/D90866
This is the cmp/sel sibling to D90692.
Again, the reasoning is: the throughput cost is number of instructions/uops,
so size/blended costs are identical except in special cases (for example,
fdiv or other known-expensive machine instructions or things like MVE that
may require cracking into >1 uops).
We need to check for a valid (non-null) condition type parameter because
SimplifyCFG may pass nullptr for that (and so we will crash multiple
regression tests without that check). I'm not sure if passing nullptr makes
sense, but other code in the cost model does appear to check if that param
is set or not.
Differential Revision: https://reviews.llvm.org/D90781
The debug location is removed from any outlined instruction. This
causes the MachineVerifier to crash on outlined DBG_VALUE
instructions.
Then, debug instructions are "invisible" to the outliner, that is, two
ranges of instructions from different functions are considered
identical if the only difference is debug instructions. Since a debug
instruction from one function is unlikely to provide sensible debug
information about all functions, sharing an outlined sequence, this
patch just removes debug instructions from the outlined functions.
Differential Revision: https://reviews.llvm.org/D89485
These were previously handled by pattern matching shuffles in the selector, but
adding a new opcode and making it equivalent to the AArch64duplane SDAG node
allows us to select more patterns, like lane indexed FMLAs (patch adding a test
for that will be committed later).
The pattern matching code has been simply moved to postlegalize lowering.
Differential Revision: https://reviews.llvm.org/D90820
The if was checking !Res.getNode() but that's always true since
Res was initialized to SDValue() and not touched before the if.
This appears to be a leftover from a previous implementation of
Custom legalization where Res was updated instead of returning
immediately.
D80526 added custom lowering to pick the si lib call on RV64, but this custom handling is only enabled when the F and D extension are both disabled. This prevents the si library call from being used for double when F is enabled but D is not.
This patch changes the behavior so we always enable the Custom hook on RV64 and decide in ReplaceNodeResults if we should emit a libcall based on whether the FP type should be softened or not.
Differential Revision: https://reviews.llvm.org/D90817
This change adds a real glc operand to the return atomic
instead of just string " glc" in the middle of the asm
string.
Improves asm parser diagnostics.
Differential Revision: https://reviews.llvm.org/D90730
The _F and _D registers are already sub/super registers. When one gets allocated all its aliases are already marked as allocated. We don't need to explicitly shadow it too.
I believe shadow is for calling conventions like 64-bit Windows on X86 where have rules like this
CCIfType<[i32], CCAssignToRegWithShadow<[ECX , EDX , R8D , R9D ],
[XMM0, XMM1, XMM2, XMM3]>>
For that calling convention the argument number determines which register is used regardless of how many scalars or vectors came before it.
Removing this removes a question I had in D90738.
Differential Revision: https://reviews.llvm.org/D90801
There is no FSLI instruction, but we can emulate it using FSRI by swapping operands and subtracting the immediate from the bitwidth.
Differential Revision: https://reviews.llvm.org/D90826
This reverts commit 15694fd6ad955c6a16b446a6324364111a49ae8b.
Need to investigate and fix a failing clang test: synchronized.m.
Might need a test update.
This moves WidenIV from IndVarSimplify to Utils/SimplifyIndVar so that we have
createWideIV available as a generic helper utility. I.e., this is not only
useful in IndVarSimplify, but could be useful for loop transformations. For
example, motivation for this refactoring is the loop flatten transformation: if
induction variables in a loop nest can be widened, we can avoid having to
perform certain overflow checks, enabling this transformation.
Differential Revision: https://reviews.llvm.org/D90421
CapturesBefore tracker has an overly restrictive dominates check when
the `BeforeHere` and the capture point are in different basic blocks.
All we need to check is that there is no path from the capture point
to `BeforeHere` (which is less stricter than the dominates check).
See added testcase in one of the users of CapturesBefore.
Reviewed-By: jdoerfert
Differential Revision: https://reviews.llvm.org/D90688
Update the Programmer's Reference document.
Add a test. Update a couple of tests with an improved error message.
Differential Revision: https://reviews.llvm.org/D90635
When replacing an assume(false) with a store, we have to be more careful
with the order we insert the new access. This patch updates the code to
look at the accesses in the block to find a suitable insertion point.
Alterantively we could check the defining access of the assume, but IIRC
there has been some discussion about making assume() readnone, so
looking at the access list might be more future proof.
Fixes PR48072.
Reviewed By: asbirlea
Differential Revision: https://reviews.llvm.org/D90784
Convert GISelKnownBits.computeKnownBitsImpl shift handling to use the common KnownBits implementations, which makes use of the known leading/trailing bits for shifted values in cases where we don't know the shift amount value, as detailed in https://blog.regehr.org/archives/1709
Differential Revision: https://reviews.llvm.org/D90527
To accommodate frame layouts that have both fixed and scalable objects
on the stack, describing a stack location or offset using a pointer + uint64_t
is not sufficient. For this reason, we've introduced the StackOffset class,
which models both the fixed- and scalable sized offsets.
The TargetFrameLowering::getFrameIndexReference is made to return a StackOffset,
so that this can be used in other interfaces, such as to eliminate frame indices
in PEI or to emit Debug locations for variables on the stack.
This patch is purely mechanical and doesn't change the behaviour of how
the result of this function is used for fixed-sized offsets. The patch adds
various checks to assert that the offset has no scalable component, as frame
offsets with a scalable component are not yet supported in various places.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D90018
Some targets may add required passes via
TargetMachine::registerPassBuilderCallbacks(). We need to run those even
under -O0. As an example, BPFTargetMachine adds
BPFAbstractMemberAccessPass, a required pass.
This also allows us to clean up BackendUtil.cpp (and out-of-tree Rust
usage of the NPM) by allowing us to share added passes like coroutines
and sanitizers between -O0 and other optimization levels.
Tests are a continuation of those added in
https://reviews.llvm.org/D89083.
In order to prevent TargetMachines from adding unnecessary optimization
passes at -O0, TargetMachine::registerPassBuilderCallbacks() will be
changed to take an OptimizationLevel, but that will be done separately.
Reviewed By: asbirlea
Differential Revision: https://reviews.llvm.org/D89158
Add more profitable sinking patterns if the target bb register pressure
is not too high.
Reviewed By: qcolombet
Differential Revision: https://reviews.llvm.org/D88126
With new test file this time.
Original message
This new test covers both with and without the F extension enabled.
This shows that the fptosi/fptoui for double->i32 use a different
libcall depending on whether the F extension is enabled. If it's
not enabled we use the 'si' library call. If it is enabled we use 'di'.
This new test covers both with and without the F extension enabled.
This shows that the fptosi/fptoui for double->i32 use a different
libcall depending on whether the F extension is enabled. If it's
not enabled we use the 'si' library call. If it is enabled we use 'di'.
This patch adds the llvm.loop.mustprogress loop metadata. This is to be
added to loops where the frontend language requires that the loop makes
observable interactions with the environment. This is the loop-level
equivalent to the function attribute `mustprogress` defined in D86233.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D88464
The current compact unwind scheme does not work when the prologue is not at the
start (the instructions before the prologue cannot be described). (Technically
this is fixable, but it requires multiple compact unwind descriptors for one
function.)
rL255175 chose to not perform shrink-wrapping for no-frame-pointer functions not
marked as nounwind to work around PR25614. This is overly limited, as platforms
not supporting compact unwind (all non-Darwin) does not need the workaround.
This patch restricts the limitation to compact unwind platforms.
Reviewed By: qcolombet
Differential Revision: https://reviews.llvm.org/D89930
Allow single-quoted strings and double-quoted character values, as well as doubled-quote escaping.
Reviewed By: thakis
Differential Revision: https://reviews.llvm.org/D89731
Hook up legalizations for VECREDUCE_SEQ_FMUL. This is following up on the VECREDUCE_SEQ_FADD work from D90247.
Differential Revision: https://reviews.llvm.org/D90644
The operations in these patterns shouldn't be effected by sign
bits. And the pattern is starting from a sign_extend_inreg so
we aren't expecting sign bits to be passed through either.
Differential Revision: https://reviews.llvm.org/D90739
If getClobberingMemoryAccess() is called with an explicit
MemoryLocation, but the starting access happens to be a call, the
provided location is currently ignored, and alias analysis queries
will be performed against the call instruction instead. Something
similar happens if the starting access is a load with a MemoryDef.
Change the implementation to not set Q.Inst in the first place if
we want to perform a MemoryLocation-based query, to make sure it
can't be turned into an Instruction-based query along the way...
Additionally, remove the special handling that lifetime.start
intrinsics currently get. They simply report NoAlias for clobbers
between lifetime.start and other calls, but that's obviously not
right if the other call is something like a memset or memcpy. The
default behavior we get from getModRefInfo() will already do the
right thing here.
Differential Revision: https://reviews.llvm.org/D88782
`mftb` and `mftbl` are equivalent, there is no need to have two names for doing the same thing, rename `mftbl` to only have `mftb`.
Differential Revision: https://reviews.llvm.org/D89506