This implements
```
(logic_op (op x...), (op y...)) -> (op (logic_op x, y))
```
when `op` is an extend, a shift, or an and.
This is similar to `DAGCombiner::hoistLogicOpWithSameOpcodeHands`
(with a bunch of missing cases, e.g. G_TRUNC, G_BITCAST, etc.)
This is implemented so it works both pre and post-legalization.
This also adds a general way to add a series of instructions in a combine.
(`applyBuildInstructionSteps`).
Differential Revision: https://reviews.llvm.org/D85050
ISD::ATOMIC_STORE arbitrarily has the operands in the opposite order
from regular ISD::STORE, which always introduced an annoying
duplication of patterns to handle both cases. Since in GlobalISel
there's just the one G_STORE, we need to swap the operands to
correctly emit the type check for the pointer operand.
Some work started in 20aafa31569b5157e792daa8860d71dd0df8a53a to
migrate SelectionDAG to use ISD::STORE for atomics, but that work
seems to have stalled. Since this is the pretty much the last
operation which matters which isn't supported for AMDGPU, use this
compatibility hack to unblock declaring it functionally complete.
Not sure what's going on with the pending_phis AArch64 test. It seems
it didn't always use atomics, and I'm not sure what it was originally
testing matters anymore.
When the result type of insertelement needs to be split,
SplitVecRes_INSERT_VECTOR_ELT will try to store the vector to a
stack temporary, store the element at the location of the stack
temporary plus the index, and reload the Lo/Hi parts.
This patch does the following to ensure this works for scalable vectors:
- Sets the StackID with getStackIDForScalableVectors() in CreateStackTemporary
- Adds an IsScalable flag to getMemBasePlusOffset() and scales the
offset by VScale when this is true
- Ensures the immediate is clamped correctly by clampDynamicVectorIndex
so that we don't try to use an out of range index
Reviewed By: david-arm
Differential Revision: https://reviews.llvm.org/D84874
In cases where MachineOutliner candidates either are:
* noreturn
* have calls with no available LR or free regs
* Don't use SP
we can end up hitting stack fixup code for the caller and the callee for
a FrameID of MachineOutlinerDefault. This triggers the assert:
`assert(OF.FrameConstructionID != MachineOutlinerDefault &&
"Can only fix up stack references once");`
in AArch64InstrInfo.cpp. This assert exists for now because a lot of the
fixup code is not tested to handle fixing up more than once and needs
some better checks and enhancements to avoid potentially generating
illegal code.
I've filed a Bugzilla report to track this until these cases are handled
by the AArch64 MachineOutliner: https://bugs.llvm.org/show_bug.cgi?id=46767
This diff detects cases that will cause these multiple stack fixups and
prune the Candidates from `RepeatedSequenceLocs`.
Differential Revision: https://reviews.llvm.org/D83923
Fixed an incorrect pattern in lib/Target/AArch64/AArch64SVEInstrInfo.td
for storing out <vscale x 2 x f32> unpacked scalable vectors. Added
a couple of tests to
test/CodeGen/AArch64/sve-st1-addressing-mode-reg-imm.ll
Differential Revision: https://reviews.llvm.org/D85441
If it is load cluster, we don't need to create the dependency edges(SUb->reg) from SUb to SUa
as they both depend on the base register "reg"
+-------+
+----> reg |
| +---+---+
| ^
| |
| |
| |
| +---+---+
| | SUa | Load 0(reg)
| +---+---+
| ^
| |
| |
| +---+---+
+----+ SUb | Load 4(reg)
+-------+
But if it is store cluster, we need to create it as follow shows to avoid the instruction store
depend on scheduled in-between SUb and SUa.
+-------+
+----> reg |
| +---+---+
| ^
| | Missing +-------+
| | +-------------------->+ y |
| | | +---+---+
| +---+-+-+ ^
| | SUa | Store x 0(reg) |
| +---+---+ |
| ^ |
| | +------------------------+
| | |
| +---+--++
+----+ SUb | Store y 4(reg)
+-------+
Reviewed By: evandro, arsenm, rampitec, foad, fhahn
Differential Revision: https://reviews.llvm.org/D72031
If we can't identify alloca used in lifetime marker we
need to assume to worst case scenario.
Reviewed By: eugenis
Differential Revision: https://reviews.llvm.org/D84630
Add given input and mark it as tied.
Doesn't create additional copy compared to
matching input constraint to virtual register.
Differential Revision: https://reviews.llvm.org/D85122
This allows us to remove extra patterns from AArch64SVEInstrInfo.td
because we can reuse those required for fixed length vectors.
Differential Revision: https://reviews.llvm.org/D85328
NOTE: Also uses SVE code generation for NEON size vectors, instead
of expanding i64 based vector multiplications.
Differential Revision: https://reviews.llvm.org/D85327
Since there are no ill effects when performing these operations
with undefined elements, they are lowered to the already supported
unpredicated scalable vector equivalents.
Differential Revision: https://reviews.llvm.org/D85117
This fixes an issue triggered by the following code, where emitEpilogue
got confused when trying to restore the SVE registers after the call,
whereas the call to bar() is implemented as a TCReturn:
int non_sve();
int sve(svint32_t x) { return non_sve(); }
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D84869
Get the argument register and ensure there's a copy to the virtual
register. AMDGPU and AArch64 have similarish code to get the livein
value, and I also want to use this in multiple places.
This is a bit more aggressive about setting the register class than
the original function, but that's probably OK.
I think we're missing a few verifier checks for function live ins. I
noticed AArch64's calling convention code is not actually adding
liveins to functions, only the entry block (which apparently might not
matter that much?). There should probably be a verifier check that
entry block live ins are also live into the function. We also might
need a verifier check that the copy to the livein virtual register is
in the entry block.
The SVE instruction set only supports sdiv/udiv for 32-bit and 64-bit
integers. If we see an 8-bit or 16-bit divide, widen the operands to 32
bits, and narrow the result.
Differential Revision: https://reviews.llvm.org/D85170
This one is pretty easy and shrinks the list of unhandled
intrinsics. I'm not sure how relevant the insert point is. Using the
insert position of EntryBuilder will place this after
constants. SelectionDAG seems to end up emitting these after argument
copies and before anything else, but I don't think it really
matters. This also ends up emitting these in the opposite order from
SelectionDAG, but I don't think that matters either.
This also needs a fix to stop the later passes dropping this as a dead
instruction. DeadMachineInstructionElim's version of isDead special
cases LOCAL_ESCAPE for some reason, and I'm not sure why it's excluded
from MachineInstr::isLabel (or why isDead doesn't check it).
I also noticed DeadMachineInstructionElim never considers inline asm
as dead, but GlobalISel will drop asm with no constraints.
This patch stops unconditionally transforming FSUB(-0, X) into an FNEG(X) while building the MIR.
This corresponds with the SelectionDAGISel change in D84056.
Differential Revision: https://reviews.llvm.org/D85139
This patch adds a CFI entry for each SVE callee saved register
that needs unwind info at an offset from the CFA. The offset is
a DWARF expression because the offset is partly scalable.
The CFI entries only cover a subset of the SVE callee-saves and
only encodes the lower 64-bits, thus implementing the lowest
common denominator ABI. Existing unwinders may support VG but
only restore the lower 64-bits.
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D84044
The CFA is calculated as (SP/FP + offset), but when there are
SVE objects on the stack the SP offset is partly scalable and
should instead be expressed as the DWARF expression:
SP + offset + scalable_offset * VG
where VG is the Vector Granule register, containing the
number of 64bits 'granules' in a scalable vector.
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D84043
This is the final bit of work to relax the register allocation
requirements when code generating normal LLVM IR, which rarely
care about the result of inactive lanes. By using _PRED nodes
we can make better use of SVE's reversed instructions.
Also removes a redundant parameter from the min/max tests.
Differential Revision: https://reviews.llvm.org/D85142
Currently, instruction level fast math flags are not considered when
generating patterns for the machine combiner.
This currently leads to some missed opportunities to generate FMAs in
combination with `#pragma clang fp contract (fast)`.
For example, when building the example below with -O3 for AArch64, no
FMADD is generated. If built with -O2 and the DAGCombiner is used
instead of the MachineCombiner for FMAs, an FMADD is generated.
With this patch, the same code is generated in both cases.
float madd_contract(float a, float b, float c) {
#pragma clang fp contract (fast)
return (a * b) + c;
}
Reviewed By: dmgreen
Differential Revision: https://reviews.llvm.org/D84930
GlobalISel is the default ISel for aarch64 at -O0. Prior to D78465, GlobalISel
didn't have support for dealing with address-of-global lowerings, so it fell
back to SelectionDAGISel.
HWASan Globals require special handling, as they contain the pointer tag in the
top 16-bits, and are thus outside the code model. We need to generate a `movk`
in the instruction sequence with a G3 relocation to ensure the bits are
relocated properly. This is implemented in SelectionDAGISel, this patch does
the same for GlobalISel.
GlobalISel and SelectionDAGISel differ in their lowering sequence, so there are
differences in the final instruction sequence, explained in
`tagged-globals.ll`. Both of these implementations are correct, but GlobalISel
is slightly larger code size / slightly slower (by a couple of arithmetic
instructions). I don't see this as a problem for now as GlobalISel is only on
by default at `-O0`.
Reviewed By: aemerson, arsenm
Differential Revision: https://reviews.llvm.org/D82615
Fixes test-suite compile failure caused by 8dfb5d7.
While I'm in the area, add some more test coverage to related
operations, to make sure we aren't missing any other patterns.
Refer to LangRef http://llvm.org/docs/LangRef.html#llvm-masked-load-intrinsics
'llvm.masked.load/store.*’ intrinsics are overloaded intrinsic, which allow the
load/store data to be a vector of any integer, floating-point or pointer data type.
Therefore, allow pointer data type when checking 'isLegalMaskedLoadStore()'.
Reviewed By: paulwalker-arm
Differential Revision: https://reviews.llvm.org/D85045
https://reviews.llvm.org/D84909
Patch adds two new GICombinerRules, one for G_INTTOPTR and one for
G_PTRTOINT. The G_INTTOPTR elides ptr2int(int2ptr(x)) to a copy of x, if
the cast is within the same address space. The G_PTRTOINT elides
int2ptr(ptr2int(x)) to a copy of x. Patch additionally adds new combiner
tests for the AArch64 target to test these new combiner rules.
Patch by mkitzan
Just the obvious implementation that rewrites the result type. Also fix
warning from EXTRACT_SUBVECTOR legalization that triggers on the test.
Differential Revision: https://reviews.llvm.org/D84706
This fixes an assertion failure that was being triggered in
SelectionDAG::getZeroExtendInReg(), where it was trying to extend the <2xi32>
to i64 (which should have been <2xi64>).
Fixes: rdar://66016901
Differential Revision: https://reviews.llvm.org/D84884
When building code at -O0 We weren't falling back to DAG ISel correctly
when encountering alloca instructions with scalable vector types. This
is because the alloca has no operands that are scalable. I've fixed this by
adding a check in AArch64ISelLowering::fallBackToDAGISel for alloca
instructions with scalable types.
Differential Revision: https://reviews.llvm.org/D84746
I still think it's highly questionable that we have two intrinsics
with identical behavior and only vary by the name of the libcall used
if it happens to be lowered that way, but try to reduce the feature
delta between SDAG and GlobalISel for recently added intrinsics. I'm
not sure which opcode should be considered the canonical one, but
lower roundeven back to round.
In future, we'd like to use the perfect-shuffle mechanism to deal with these
shuffle permutations. For now, this improves performance by avoiding the
super-expensive const-pool load + tbl instruction.
Differential Revision: https://reviews.llvm.org/D84866
Port the wide immediate case from AArch64DAGToDAGISel::SelectAddrModeXRO.
If we have a wide immediate which can't be represented in an add, we can end up
with code like this:
```
mov x0, imm
add x1, base, x0
ldr x2, [x1, 0]
```
If we use the [base, xN] addressing mode instead, we can produce this:
```
mov x0, imm
ldr x2, [base, x0]
```
This saves 0.4% code size on 7zip at -O3, and gives a geomean code size
improvement of 0.1% on CTMark.
Differential Revision: https://reviews.llvm.org/D84784
I have added tests to:
CodeGen/AArch64/sve-intrinsics-int-arith.ll
for doing simple integer add operations on tuple types. Since these
tests introduced new warnings due to incorrect use of
getVectorNumElements() I have also fixed up these warnings in the
same patch. These fixes are:
1. In narrowExtractedVectorBinOp I have changed the code to bail out
early for scalable vector types, since we've not yet hit a case that
proves the optimisations are profitable for scalable vectors.
2. In DAGTypeLegalizer::WidenVecRes_CONCAT_VECTORS I have replaced
calls to getVectorNumElements with getVectorMinNumElements in cases
that work with scalable vectors. For the other cases I have added
asserts that the vector is not scalable because we should not be
using shuffle vectors and build vectors in such cases.
Differential revision: https://reviews.llvm.org/D84016
Previous patches fixed up all the warnings in this test:
llvm/test/CodeGen/AArch64/sve-sext-zext.ll
and this change simply checks that no new warnings are added in future.
Differential revision: https://reviews.llvm.org/D83205
I think these were added as a workaround for SelectionDAG lacking half
legalization support in the past. I think they should probably be
removed from the IR, but clang does still have a target control to
emit these instead of the native half fpext/fptrunc.
While deallocating the stackframe, the offset used to reload the
callee-saved registers was not pointing to the SVE callee-saves,
but rather to the whole SVE area.
+--------------+
| GRP callee |
| saves |
+--------------+ <- FP
| SVE callee |
| saves |
+--------------+ <- Should restore SVE callee saves from here
| SVE Spills |
| and Locals |
+--------------+ <- instead of from here.
| |
: :
| |
+--------------+ <- SP
Reviewed By: paulwalker-arm
Differential Revision: https://reviews.llvm.org/D84539
Instead of aligning the last callee-saved-register slot to the stack
alignment (16 bytes), just align the SVE callee-saved block. This also
simplifies the code that allocates space for the callee-saves.
This change is needed to make sure the offset to which the callee-saved
register is spilled, corresponds to the offset used for e.g. unwind call
frame instructions.
Reviewers: efriedma, paulwalker-arm, david-arm, rengolin
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D84042
Fixed stack objects are preallocated and defined to be allocated before
any of the regular stack objects. These are normally used to model stack
arguments.
The AAPCS does not support passing SVE registers on the stack by value
(only by reference). The current layout also doesn't place them before
all stack objects, but rather before all SVE objects. Removing this
simplifies the code that emits the allocation/deallocation
around callee-saved registers (D84042).
This patch also removes all uses of fixedStack from from
framelayout-sve.mir, where this was used purely for testing purposes.
Reviewers: paulwalker-arm, efriedma, rengolin
Reviewed By: paulwalker-arm
Differential Revision: https://reviews.llvm.org/D84538