We encountered an issue where LTO running on IR that used the DSOLocalEquivalent
constant would result in bad codegen. The underlying issue was ValueMapper wasn't
properly handling DSOLocalEquivalent, so this just adds the machinery for handling
it. This code path is triggered by a fix to DSOLocalEquivalent::handleOperandChangeImpl
where DSOLocalEquivalent could potentially not have the same type as its underlying GV.
This updates DSOLocalEquivalent::handleOperandChangeImpl to change the type if
the GV type changes and handles this constant in ValueMapper.
Differential Revision: https://reviews.llvm.org/D97978
This test shows a case where we can potentially scalarize the store in a
predicated loop, creating a lot of instructions that would be much
slower than scalar.
Turn `-Wreturn-type` into an error.
This is currently used by libcxx, libcxxabi, and libunwind, and would be a good default
for all of llvm. I'm not aware of any cases where this shouldn't be an error. This
ensures different build configs, merges, and downstream branches catch issues sooner.
Differential Revision: https://reviews.llvm.org/D98224
This pull request implements patterns to exploit the load rightmost vector
element instructions for loading element 0 on little endian PowerPC subtargets
into v8i16 and v16i8 vector registers for i16 and i8 data types.
Differential Revision: https://reviews.llvm.org/D94816#inline-921403
This was suggested by lebedev.ri over on D96534. You'll note lack of tests. During review, we weren't actually able to find a case which exercises it, but both I and lebedev.ri feel it's a reasonable change, straight forward, and near free.
Differential Revision: https://reviews.llvm.org/D97064
This removes hard-coded shadow width references and adds more RUN
lines to increase test coverage under different options (fast16 labels
mode).
Also, shortens the test by unifying common lines under both combine- and no-combine-ptr-label options.
Reviewed By: stephan.yichao.zhao
Differential Revision: https://reviews.llvm.org/D98227
LSR prefers to schedule iv increments just before the latch. The recent 80511565 broadened this to moving increments in the original IR. This pointed out a robustness problem with the CGP transform.
When we have a use of an induction increment outside of the loop (we canonicalize away from this form, but it happens e.g. unanalyzeable loops) we'd avoid performing the uadd/usub transform. Interestingly, all of these involve moving the increment closer to it's operands, so there's no concern about dominating all uses. We can handle that case cheaply, resulting in a more robust transform.
For <2 x s32>, we can use G_DUPLANE32, but with a <4 x s32> source. To make it
work, we can just widen the original source with a concat_vectors.
Doing this allows <2 x float> indexed fmul instruction selection patterns to
fire, which gives a nice 0.3% code size saving on Bullet with -Os.
Differential Revision: https://reviews.llvm.org/D98059
If every element is extracted from a G_BUILD_VECTOR, pass through the source
registers. This is different to the extract(build_vector) combine because this
one tolerates multiple users as long as they're exhaustive.
Differential Revision: https://reviews.llvm.org/D97890
Refactor and add comments to explain where the magic numbers come from
in terms of the instruction cache line size. NFC.
Differential Revision: https://reviews.llvm.org/D98266
This patch implements DBG_VALUE_LIST handling to the LiveDebugValues pass. This
is a substantial change, and makes a few fundamental changes to the existing
logic.
We still use the basic model of a VarLocMap that is indexed by a LocIndex, with
a VarLocSet (a CoalescingBitVector underneath) giving us efficient lookups of
existing variable locations for a given location type. The main change is that
the VarLocMap may contain a given VarLoc multiple times (once for each unique
location operand), so that a VarLoc can be looked up from any of the registers
that it uses. This means that each VarLoc has multiple corresponding LocIndexes;
to allow us to iterate through the set of VarLocs (previously we would iterate
through the VarLocSet), we now also maintain a single entry in the VarLocMap
that contains every VarLoc exactly once.
The VarLoc class itself is also changed; this change is much simpler,
refactoring out location-specific members into a MachineLocation class and
adding a vector of these locations.
Differential Revision: https://reviews.llvm.org/D83890
This test currently fails to compile when using a MinGW toolchain as setenv is not defined. This function is a POSIX function Windows does not implement.
This patch enables the setenv macro used in the unit test for all of Windows, making the test compile and run successfully.
Differential Revision: https://reviews.llvm.org/D98271
llvm-jitlink and llvm-jitlink-executor make use of APIs that are
part of the socket and nsl libraries on SunOS systems (Solaris and
Illumos). Make sure they get linked.
Ran into this in Rust CI when cross-compiling LLVM 12 to these
targets.
Differential Revision: https://reviews.llvm.org/D97633
I've left mask registers to a future patch as we'll need
to convert them to full vectors, shuffle, and then truncate.
Reviewed By: frasercrmck
Differential Revision: https://reviews.llvm.org/D97609
AMDGPU target tries to handle the SGPR and VGPR spills in a
custom pass before the actual frame lowering pass. Once they
are handled and the respective frames are eliminated in the
custom pass, certain uses of them still remain. For instance,
the DBG_VALUE instructions inserted by the allocator alongside
the spill instruction will use the corresponding frame index.
They become dead later during PEI and causes a crash while trying to
replace the frame indices. We should possibly avoid this custom pass.
For now, replacing such dead references with null register value.
Reviewed By: arsenm, scott.linder
Differential Revision: https://reviews.llvm.org/D98038
All extractvalues of the same value at the same index will map to
the same register, so even if one specific extractvalue only has
one use, we should not mark it as a trivial kill, as there may be
more extractvalues later.
Fixes https://bugs.llvm.org/show_bug.cgi?id=49467.
Differential Revision: https://reviews.llvm.org/D98145
The LiveDebugValues and LiveDebugVariables implementations for handling
DBG_VALUE_LIST instructions can be simplified significantly if they do not have
to deal with any duplicated operands, such as a DBG_VALUE_LIST that uses the
same register multiple times in its expression. This patch adds a function,
replaceArg, that can be used to simplify a DIExpression in the case of
duplicated operands.
Differential Revision: https://reviews.llvm.org/D83896
I've included tests that require type legalization to split the
vector. The i64 version of these scalarizes on RV32 due to type
legalization visiting the result before the vector type. So we
have to abort our custom expansion to avoid creating target
specific nodes with an illegal type. Then type legalization ends
up scalarizing. We might be able to fix this by doing custom
splitting for large vectors in our handler to get down to a legal
type.
Reviewed By: frasercrmck
Differential Revision: https://reviews.llvm.org/D98102
Previously we set the value to -1, but the SEW information could
be useful for scheduling.
Reviewed By: frasercrmck, rogfer01
Differential Revision: https://reviews.llvm.org/D98062
The default fixed vector expansion uses sra+xor+add since it can't
see that smax is legal due to our custom handling. So we select
smax(X, sub(0, X)) manually.
Scalable vectors are able to use the smax expansion automatically
for most cases. It crashes in one case because getConstant can't build a
SPLAT_VECTOR for nxvXi64 when i64 scalars aren't legal. So
we manually emit a SPLAT_VECTOR_I64 for that case.
Reviewed By: frasercrmck
Differential Revision: https://reviews.llvm.org/D97991
As far as I know we're not enforcing the StdExtM must be enabled
to use the V extension. If we use an assert here and hit this
code in a release build we'll silently emit an invalid instruction.
By using a diagnostic we report the error to the user in release
builds. I think there may still be a later fatal error from
the code emitter though.
Reviewed By: frasercrmck
Differential Revision: https://reviews.llvm.org/D97970
This patch updates the various IR passes to correctly handle dbg.values with a
DIArgList location. This patch does not actually allow DIArgLists to be produced
by salvageDebugInfo, and it does not affect any pass after codegen-prepare.
Other than that, it should cover every IR pass.
Most of the changes simply extend code that operated on a single debug value to
operate on the list of debug values in the style of any_of, all_of, for_each,
etc. Instances of setOperand(0, ...) have been replaced with with
replaceVariableLocationOp, which takes the value that is being replaced as an
additional argument. In places where this value isn't readily available, we have
to track the old value through to the point where it gets replaced.
Differential Revision: https://reviews.llvm.org/D88232
This is another step towards parity between existing select
transforms and min/max intrinsics (D98152)..
The existing 'not' folds around select are complicated, so
it's likely that we will need to enhance this, but this
should be a safe step.
Add a comment explaining how we lay out stack frames for ARM targets,
based on the existing one for AArch64. Also expand the comment to
explain reserved call frames for both architectures.
Differential revision: https://reviews.llvm.org/D98258
This is a partial translation of the existing select-based
folds. We need to recreate several different transforms to
avoid regressions as noted in D98152.
https://alive2.llvm.org/ce/z/teuZ_J
This patch completes ISel support for DIArgList dbg.values by allowing
SDDbgValues with multiple location operands to be emitted as DBG_VALUE_LIST
instructions.
The primary change of this patch is refactoring EmitDbgValue by pulling location
operand emission out to the new function AddDbgValueLocationOps, which is used
for both DIArgList and single value dbg.values. Outside of that, the only
behaviour change is that the scheduler has a lambda added, HasUnknownVReg, to
prevent us from attempting to emit a DBG_VALUE_LIST before all of its used VRegs
have become available.
Differential Revision: https://reviews.llvm.org/D88592
Some EVEX instructions should check the predicates when compress to VEX
encoding. For example, avx512vnni instructions. This is because avx512vnni
doesn't mean that avxvnni is supported on the target.
This patch moving the manually added check to .inc that generated by tablegen.
Differential Revision: https://reviews.llvm.org/D98011
If the incoming values of a phi are pointer casts of the same original
value, replace the phi with a single cast. Such redundant phis are
somewhat common after loop-rotate and removing them can avoid some
unnecessary code bloat, e.g. because an iteration of a loop is peeled
off to make the phi invariant. It should also simplify further analysis
on its own.
InstCombine already uses stripPointerCasts in a couple of places and
also simplifies phis based on the incoming values, so the patch should
fit in the existing scope.
The patch causes binary changes in 47 out of 237 benchmarks in
MultiSource/SPEC2000/SPEC2006 with -O3 -flto on X86.
Reviewed By: lebedev.ri
Differential Revision: https://reviews.llvm.org/D98058