In SelectionDAGBuilder always translate the fshl and fshr intrinsics to
FSHL and FSHR (or ROTL and ROTR) instead of lowering them to shifts and
ORs. Improve the legalization of FSHL and FSHR to avoid code quality
regressions.
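For reference, a minimal IR sketch (function name illustrative) of the rotate case, where both value operands of the funnel shift are the same:
```
declare i32 @llvm.fshl.i32(i32, i32, i32)

define i32 @rotl32(i32 %x, i32 %amt) {
  ; fshl concatenates its first operand above its second and shifts left by
  ; %amt modulo 32. With both value operands equal this is a rotate left,
  ; which can now be selected as ROTL instead of a sequence of shifts and ORs.
  %r = call i32 @llvm.fshl.i32(i32 %x, i32 %x, i32 %amt)
  ret i32 %r
}
```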
Differential Revision: https://reviews.llvm.org/D77152
Move fixed length SDIV tests from sve-fixed-length-int-arith.ll to sve-fixed-length-int-div.ll, so they can use CHECK lines that verify legalization decisions. That level of detail would be overkill for the simple arithmetic tests, but the i8/i16 SDIV tests need it, since they have a tricky legalization.
There are no nxv16i8/nxv8i16 SDIV instructions, so these fixed width operations must be promoted to nxv4i32.
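For illustration, a sketch of the kind of fixed length i8 divide these tests cover (the vector width is arbitrary and assumes a wide enough SVE vector length):
```
define <32 x i8> @sdiv_v32i8(<32 x i8> %a, <32 x i8> %b) {
  ; SVE has no byte or halfword SDIV, so the elements are sign extended and
  ; the division is performed on 32-bit element vectors before truncating
  ; the result back to i8.
  %r = sdiv <32 x i8> %a, %b
  ret <32 x i8> %r
}
```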
Differential Revision: https://reviews.llvm.org/D86114
For scalable vector shifts the predicate is typically all active,
which gets selected to an unpredicated shift by immediate. When
generating code for fixed length vectors the predicate is based
on the vector length, so additional patterns are required to
make use of SVE's predicated shift by immediate instructions.
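As a sketch, the kind of fixed length shift this applies to (the vector width is illustrative):
```
define <8 x i32> @shl_by_imm(<8 x i32> %a) {
  ; With fixed length SVE code generation the governing predicate only covers
  ; the first 8 lanes, so matching this to LSL (immediate) relies on the new
  ; predicated shift-by-immediate patterns.
  %r = shl <8 x i32> %a, <i32 3, i32 3, i32 3, i32 3, i32 3, i32 3, i32 3, i32 3>
  ret <8 x i32> %r
}
```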
Differential Revision: https://reviews.llvm.org/D86204
The check for the landingpad instructions was overly restrictive. In optimized builds PHI nodes can appear
before the landingpad instructions, resulting in a fallback to SelectionDAG.
This change relaxes the check to allow PHI nodes.
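A reduced example of the IR shape in question, where PHI nodes legitimately appear before the landingpad instruction:
```
declare void @may_throw()
declare i32 @__gxx_personality_v0(...)

define i32 @f(i1 %c) personality i32 (...)* @__gxx_personality_v0 {
entry:
  br i1 %c, label %call1, label %call2
call1:
  invoke void @may_throw() to label %cont unwind label %lpad
call2:
  invoke void @may_throw() to label %cont unwind label %lpad
cont:
  ret i32 0
lpad:
  ; PHI nodes may precede the landingpad instruction, which only has to be
  ; the first non-PHI instruction of the block.
  %v = phi i32 [ 1, %call1 ], [ 2, %call2 ]
  %lp = landingpad { i8*, i32 } cleanup
  ret i32 %v
}
```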
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D86141
TargetRegisterInfo::getMinimalPhysRegClass() returns rtcGPR64RegClassID for X16
and X17, as it's the last matching class. This in turn gets passed to
AArch64RegisterBankInfo::getRegBankFromRegClass(), which hits an unreachable.
It seems sensible to handle this case, so copies from X16 and X17 work.
Copying from X17 is used in inline assembly in libunwind for pointer
authentication.
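A reduced example (the asm body is illustrative, not the libunwind code) of inline assembly that forces a copy from X17:
```
define i64 @read_x17() {
  ; The "={x17}" constraint pins the asm result to the physical register X17,
  ; so instruction selection has to emit a copy out of X17.
  %v = call i64 asm sideeffect "mov x17, #1", "={x17}"()
  ret i64 %v
}
```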
Differential Revision: https://reviews.llvm.org/D85720
If we have a mask, and a value x, where (x & mask) == x, we can drop the AND
and just use x.
This is about a 0.4% geomean code size improvement on CTMark at -O3 for AArch64.
In AArch64, this is most useful post-legalization. Patterns like this often
show up when legalizing s1s, which must be extended to larger types.
e.g.
```
%cmp:_(s32) = G_ICMP ...
%and:_(s32) = G_AND %cmp, 1
```
Since G_ICMP only produces a single bit, there's no reason to mask it with the
G_AND.
Differential Revision: https://reviews.llvm.org/D85463
In DAGTypeLegalizer::GenWidenVectorLoads the algorithm assumes it only
ever deals with fixed width types, hence the offsets for each individual
store never take 'vscale' into account. I've changed the code in that
function to use TypeSize instead of unsigned for tracking the remaining
load amount. In addition, I've changed the load loop to use the new
IncrementPointer helper function for updating the addresses in each
iteration, since this handles scalable vector types.
Also, I've added report_fatal_errors in GenWidenVectorExtLoads,
TargetLowering::scalarizeVectorLoad and TargetLowering::scalarizeVectorStore,
since these functions currently use a sequence of element-by-element
scalar loads/stores. In a similar vein, I've also added a fatal error
report in FindMemType for the case when we decide to return the element
type for a scalable vector type.
I've added new tests in
CodeGen/AArch64/sve-split-load.ll
CodeGen/AArch64/sve-ld-addressing-mode-reg-imm.ll
for the changes in GenWidenVectorLoads.
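Purely as an illustrative sketch (assuming a non-power-of-two element count is legalized by widening), this is the kind of scalable vector load whose per-part offsets must be expressed in vscale-scaled units:
```
define <vscale x 6 x i32> @widen_load(<vscale x 6 x i32>* %p) {
  ; If nxv6i32 is widened, the loads generated for the individual parts need
  ; offsets that account for vscale rather than fixed byte amounts.
  %v = load <vscale x 6 x i32>, <vscale x 6 x i32>* %p
  ret <vscale x 6 x i32> %v
}
```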
Differential Revision: https://reviews.llvm.org/D85909
We probably want to introduce pseudo-instructions at some point, like
we have for binary operations, but this seems okay for now.
One thing I'm not sure about is whether we should be doing this as a
DAGCombine instead of directly pattern-matching it. I don't see any big
downside to doing it this way, though.
Differential Revision: https://reviews.llvm.org/D85681
This isn't necessary for ACLE, but could be useful in other situations.
And the change is simple.
Differential Revision: https://reviews.llvm.org/D85251
Similar to this commit:
faf8065a99817bcb10e6f09b558fe3e0972c35ce
The testcase is pretty much the same as
test/CodeGen/AArch64/tailcall-explicit-sret.ll,
except it uses i64 (since we don't handle i1024 return values yet) and
doesn't have indirect tail call testcases (because we can't translate those
yet).
Differential Revision: https://reviews.llvm.org/D86148
This is restricted to single use loads; folding these to sextloads lets us
find more optimal addressing modes on AArch64.
This also fixes an overload of the MachineFunction::getMachineMemOperand() method,
which was incorrectly using the MF alignment instead of the MMO alignment.
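A minimal IR-level illustration of the shape being folded (at the MIR level it is the G_SEXT of a single-use G_LOAD that becomes a G_SEXTLOAD):
```
define i64 @sext_of_load(i32* %p) {
  ; The load has a single use, so the sign extension can be folded into it and
  ; the extending load can then pick a better addressing mode (e.g. ldrsw).
  %v = load i32, i32* %p
  %e = sext i32 %v to i64
  ret i64 %e
}
```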
Differential Revision: https://reviews.llvm.org/D85966
By detecting this sign extend pattern early, we can uncover opportunities for
more optimizations.
Differential Revision: https://reviews.llvm.org/D85965
We weren't looking through the parameters on calls at all.
E.g., say you had
```
declare i32 @zext(i32 zeroext %x)
...
%y = call i32 @zext(i32 %something)
...
```
At the point of the call, we wouldn't know that %something should have the
zeroext attribute.
This sets flags in about the same way as
TargetLoweringBase::ArgListEntry::setAttributes.
Differential Revision: https://reviews.llvm.org/D86125
Right shift patterns will no longer incorrectly accept a shift
amount of zero. At the same time they will allow larger shift
amounts that are now saturated to their upper bound.
Patterns have been extended to enable immediate forms for shifts
taking an arbitrary predicate.
This patch also unifies the code path for immediate parsing so the
i64 based shifts are no longer treated specially.
Differential Revision: https://reviews.llvm.org/D86084
If we can't identify the alloca used in a lifetime marker we
need to assume the worst case scenario.
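A sketch of one way the alloca can become unidentifiable, here via a select of two allocas:
```
declare void @llvm.lifetime.start.p0i8(i64 immarg, i8* nocapture)

define void @f(i1 %c) {
  %a = alloca [16 x i8]
  %b = alloca [16 x i8]
  %a8 = bitcast [16 x i8]* %a to i8*
  %b8 = bitcast [16 x i8]* %b to i8*
  ; The lifetime marker's pointer operand is a select, so no single alloca can
  ; be identified and the worst case has to be assumed.
  %p = select i1 %c, i8* %a8, i8* %b8
  call void @llvm.lifetime.start.p0i8(i64 16, i8* %p)
  ret void
}
```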
Reviewed By: eugenis
Differential Revision: https://reviews.llvm.org/D84630
This cleans up copies that the legalizer or other combines leave around. They
can occasionally end up escaping as moves.
Differential Revision: https://reviews.llvm.org/D85964
These should really match either G_BUILD_VECTOR or
G_BUILD_VECTOR_TRUNC, but there doesn't seem to be an existing
mechanism for matching alternative opcodes. There is GIM_SwitchOpcode,
but it seems to assume it's only used for matcher optimization.
I could also omit any opcode check and rely on the matcher directly
checking the opcode, but the table optimizer currently assumes there
has to be an opcode check.
Also doesn't try to handle undef elements like the DAG version.
The code wasn't taking into account that the two operands
passed to ptest could be identical and was trying to erase
them twice.
Differential Revision: https://reviews.llvm.org/D85892
This patch restricts the behaviour of referencing via .Lfoo$local
local aliases, introduced in https://reviews.llvm.org/D73230, to
STV_DEFAULT globals only.
Hidden symbols, via -fvisibility=hidden (https://gcc.gnu.org/wiki/Visibility),
are an important scenario.
Benefits:
- Improves the size of object files by using fewer STT_SECTION symbols.
- The code reads a bit better (it was not obvious to me without going
back to the code reviews why the canBenefitFromLocalAlias function
currently doesn't consider visibility).
- There is also a side benefit in restoring the effectiveness of the
--wrap linker option and making the behavior of --wrap consistent
between LTO and normal builds for references within a translation-unit.
Note: this --wrap behavior (which is specific to LLD) should not be
considered reliable. See comments on https://reviews.llvm.org/D73230
for more.
Differential Revision: https://reviews.llvm.org/D85782
As with pointers, an equality comparison a == b is usually false for integers too.
GCC also uses this heuristic.
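A small sketch of the kind of branch this heuristic applies to:
```
define i32 @pick(i64 %a, i64 %b) {
entry:
  %eq = icmp eq i64 %a, %b
  ; The edge taken when the values compare equal is now given a low
  ; probability, mirroring the existing pointer-equality heuristic.
  br i1 %eq, label %equal, label %unequal
equal:
  ret i32 1
unequal:
  ret i32 0
}
```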
Reviewed By: ebrevnov
Differential Revision: https://reviews.llvm.org/D85781
This patch changes SplitVecOp_EXTRACT_VECTOR_ELT to work correctly
for scalable vectors and also fixes a bug in DAGCombiner where
the scalable property is dropped in visitTRUNCATE when attempting
to fold an extract + a truncate.
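A hedged sketch of the case being fixed, assuming the operand type is wide enough to require splitting:
```
define i32 @extract_from_wide(<vscale x 16 x i32> %v, i32 %idx) {
  ; nxv16i32 is wider than the largest legal SVE type, so the operand is split
  ; and the extract is redirected to the relevant half without dropping the
  ; scalable property.
  %e = extractelement <vscale x 16 x i32> %v, i32 %idx
  ret i32 %e
}
```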
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D85754
In DAGTypeLegalizer::GenWidenVectorStores the algorithm assumes it only
ever deals with fixed width types, hence the offsets for each individual
store never take 'vscale' into account. I've changed the main loop in
that function to use TypeSize instead of unsigned for tracking the
remaining store amount and offset increment. In addition, I've changed
the loop to use the new IncrementPointer helper function for updating
the addresses in each iteration, since this handles scalable vector
types.
Whilst fixing this function I also fixed a minor issue in
IncrementPointer whereby we were not adding the no-unsigned-wrap flag
for the add instruction in the same way as the fixed width case does.
Also, I've added a report_fatal_error in GenWidenVectorTruncStores,
since this code currently uses a sequence of element-by-element scalar
stores.
I've added new tests in
CodeGen/AArch64/sve-intrinsics-stores.ll
CodeGen/AArch64/sve-st1-addressing-mode-reg-imm.ll
for the changes in GenWidenVectorStores.
Differential Revision: https://reviews.llvm.org/D84937
In narrowExtractedVectorLoad there is an optimisation that tries to
combine extract_subvector with a narrowing vector load. At the moment
this produces warnings due to the incorrect calls to
getVectorNumElements() for scalable vector types. I've got this
working for scalable vectors too when the extract subvector index
is a multiple of the minimum number of elements. I have added a
new variant of the function:
MachineFunction::getMachineMemOperand
that copies an existing MachineMemOperand, but replaces the pointer
info with a null version since we cannot currently represent scaled
offsets.
I've added a new test for this particular case in:
CodeGen/AArch64/sve-extract-subvector.ll
Differential Revision: https://reviews.llvm.org/D83950
Testing is performed when targeting 128, 256 and 512-bit wide vectors.
For 128-bit vectors, the original behavior of using NEON instructions is
preserved.
Differential Revision: https://reviews.llvm.org/D85479
This is mostly a straight port from SelectionDAG. We re-use the actual bit-test
analysis part from SwitchLoweringUtils, which was factored out earlier to
support jump-tables.
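A small example (case values are illustrative) of a switch that the bit-test analysis turns into a single mask-and-test rather than a chain of compares:
```
define i1 @is_interesting(i32 %x) {
entry:
  ; All cases share one destination and fit in a small range, so this becomes
  ; a test of (1 << %x) against the mask (1<<1)|(1<<3)|(1<<8).
  switch i32 %x, label %no [
    i32 1, label %yes
    i32 3, label %yes
    i32 8, label %yes
  ]
yes:
  ret i1 true
no:
  ret i1 false
}
```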
Differential Revision: https://reviews.llvm.org/D85233
In this patch I have fixed two issues:
1. Our SVE tuple get/set intrinsics were using the wrong constant type
for the index passed to EXTRACT_SUBVECTOR. I have fixed this by using the
function SelectionDAG::getVectorIdxConstant to create the value. Also, I
have updated the documentation for EXTRACT_SUBVECTOR describing what type
the constant index should be and we now enforce this when creating the
node.
2. The AArch64 backend was missing the appropriate patterns for
extracting certain subvectors (nxv4f16 and nxv2f32) from legal SVE types.
I have added them as part of this patch.
The only way that I could find to test the new patterns was to use the
SVE tuple get intrinsics, although I realise it looks a bit unusual.
Tests added here:
test/CodeGen/AArch64/sve-extract-subvector.ll
Differential Revision: https://reviews.llvm.org/D85516
This implements
```
(logic_op (op x...), (op y...)) -> (op (logic_op x, y))
```
when `op` is an extend, a shift, or an and.
This is similar to `DAGCombiner::hoistLogicOpWithSameOpcodeHands`
(with a bunch of missing cases, e.g. G_TRUNC, G_BITCAST, etc.)
This is implemented so it works both pre and post-legalization.
This also adds a general way to add a series of instructions in a combine.
(`applyBuildInstructionSteps`).
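A concrete IR-level instance of the shift case, hedged in that the combine itself operates on the generic G_LSHR/G_OR instructions and only fires when the shift amounts match:
```
define i64 @hoist_or_of_shifts(i64 %x, i64 %y) {
  ; (or (lshr x, 4), (lshr y, 4)) -> (lshr (or x, y), 4)
  %sx = lshr i64 %x, 4
  %sy = lshr i64 %y, 4
  %r = or i64 %sx, %sy
  ret i64 %r
}
```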
Differential Revision: https://reviews.llvm.org/D85050
ISD::ATOMIC_STORE arbitrarily has the operands in the opposite order
from regular ISD::STORE, which always introduced an annoying
duplication of patterns to handle both cases. Since in GlobalISel
there's just the one G_STORE, we need to swap the operands to
correctly emit the type check for the pointer operand.
Some work started in 20aafa31569b5157e792daa8860d71dd0df8a53a to
migrate SelectionDAG to use ISD::STORE for atomics, but that work
seems to have stalled. Since this is pretty much the last operation
that matters and isn't supported for AMDGPU, use this
compatibility hack to unblock declaring it functionally complete.
Not sure what's going on with the pending_phis AArch64 test. It seems
it didn't always use atomics, and I'm not sure whether what it was
originally testing still matters.
When the result type of insertelement needs to be split,
SplitVecRes_INSERT_VECTOR_ELT will try to store the vector to a
stack temporary, store the element at the location of the stack
temporary plus the index, and reload the Lo/Hi parts.
This patch does the following to ensure this works for scalable vectors:
- Sets the StackID with getStackIDForScalableVectors() in CreateStackTemporary
- Adds an IsScalable flag to getMemBasePlusOffset() and scales the
offset by VScale when this is true
- Ensures the immediate is clamped correctly by clampDynamicVectorIndex
so that we don't try to use an out of range index
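A sketch of the case this targets, assuming a result type that needs splitting and a non-constant index:
```
define <vscale x 4 x i64> @insert_at(<vscale x 4 x i64> %v, i64 %elt, i32 %idx) {
  ; nxv4i64 must be split; with a variable index the vector is spilled to a
  ; stack temporary, the element is stored at a vscale-scaled offset and the
  ; two halves are reloaded.
  %r = insertelement <vscale x 4 x i64> %v, i64 %elt, i32 %idx
  ret <vscale x 4 x i64> %r
}
```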
Reviewed By: david-arm
Differential Revision: https://reviews.llvm.org/D84874