For now, we only correct the result for sqrt when iteration > 0. This doesn't
make sense, as the two are not strictly related.
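To illustrate why the fixup shouldn't be tied to the iteration count, here is a
minimal self-contained C++ sketch of an estimate-based sqrt expansion (this
assumes the correction in question is the usual fixup of the x == 0 case, which
the message above does not spell out): the zero input misbehaves whether or not
a Newton-Raphson refinement step runs.
```
#include <cmath>
#include <cstdio>

// Stand-in for a hardware reciprocal-sqrt estimate instruction.
static float rsqrtEstimate(float x) { return 1.0f / std::sqrt(x); }

// sqrt(x) computed as x * rsqrt(x), optionally refined with Newton-Raphson.
static float sqrtViaEstimate(float x, int iterations) {
  float est = rsqrtEstimate(x);
  for (int i = 0; i < iterations; ++i)
    est = est * (1.5f - 0.5f * x * est * est); // one refinement step
  return x * est; // 0 * inf == NaN when x == 0, regardless of `iterations`
}

int main() {
  std::printf("%f\n", sqrtViaEstimate(0.0f, 0)); // NaN without the fixup
  std::printf("%f\n", sqrtViaEstimate(0.0f, 1)); // still NaN with one step
}
```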
Reviewed By: dmgreen, spatel, RKSimon
Differential Revision: https://reviews.llvm.org/D94480
Reassociate some patterns to generate more fma instructions in order to
reduce register pressure.
Reviewed By: jsji
Differential Revision: https://reviews.llvm.org/D92071
InstrEmitter.h needs TargetMachine but relies on a forward declaration
of TargetMachine in MachineOperand.h. This patch adds a forward
declaration right in InstrEmitter.h.
While we are at it, this patch removes the one in MachineOperand.h,
where it is unnecessary.
In RISC-V there is a single addressing mode of the form imm(reg), where imm
is a signed 12-bit integer, giving a range of [-2048..2047] bytes from reg.
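As a concrete reference for the reachability arguments below, here is a small
self-contained check for whether an offset fits that addressing mode (LLVM
itself uses isInt<12>() from MathExtras.h for this purpose):
```
#include <cstdint>

// True if Offset can be folded directly into the imm(reg) addressing mode,
// i.e. it fits in a signed 12-bit immediate.
static bool fitsInRVImm12(int64_t Offset) {
  return Offset >= -2048 && Offset <= 2047;
}
```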
The test MultiSource/UnitTests/C++11/frame_layout of the LLVM test-suite
exercises several scenarios with the stack, including function calls
where the stack will need to be realigned due to a local variable having
a large alignment of 4096 bytes.
In situations of large stacks, the RISC-V backend (in
RISCVFrameLowering) reserves an extra emergency spill slot which can be
used (if no free register is found) by the register scavenger after the
frame indexes have been eliminated. PrologEpilogInserter already takes
care of keeping the emergency spill slots as close as possible to the
stack pointer or frame pointer (depending on what the function will
use). However, there is a final alignment step to honour the maximum
alignment of the stack which, when the stack pointer is used to access the
emergency spill slots, has the side effect of moving them farther away from
the stack pointer.
In the case of the frame_layout testcase, the net result is that we do
have an emergency spill slot but it is so far from the stack pointer
(more than 2048 bytes due to the extra alignment of a variable to 4096
bytes) that it becomes unreachable via any immediate offset.
During elimination of the frame index, many (regular) offsets of the
stack may be immediately unreachable already. Their address needs to be
computed using a register. A virtual register is created and later
RegisterScavenger should be able to find an unused (physical) register.
However if no register is available, RegisterScavenger will pick a
physical register and spill it onto an emergency stack slot, while we
compute the offset (restoring the chosen register after all this). This
assumes that the emergency stack slot is easily reachable (that is,
without requiring another register!).
This is the assumption we seem to break when we perform the extra
alignment in PrologEpilogInserter.
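For reference, a self-contained sketch of the usual hi/lo split applied when a
frame offset is out of range: the bulk of the offset goes into a scavenged
scratch register and only a 12-bit remainder is folded into the access.
Spilling that scratch register into the emergency slot only works if the slot
itself is reachable with a plain imm(reg) access, which is exactly the
assumption described above.
```
#include <cassert>
#include <cstdint>

// Split an out-of-range frame offset into a part that is materialized into a
// scratch register (e.g. via LUI/ADDI) and a remainder that still fits the
// signed 12-bit immediate field.
struct SplitOffset {
  int64_t ScratchPart; // materialized into the scavenged register
  int64_t Imm12;       // folded into the final imm(reg) access
};

static SplitOffset splitFrameOffset(int64_t Offset) {
  int64_t Hi = (Offset + 0x800) & ~int64_t(0xFFF); // round to nearest 0x1000
  int64_t Lo = Offset - Hi;
  assert(Lo >= -2048 && Lo <= 2047 && "remainder must fit in 12 bits");
  return {Hi, Lo};
}
```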
We can "float" the emergency spill slots by increasing (in absolute
value) their offsets from the incoming stack pointer. This way the
emergency spill slots will remain close to the stack pointer (once the
function has allocated storage for the stack, including the needed
realignment). The new size computed in PrologEpilogInserter is padding, so
it should be OK to move the emergency spill slots there. Also, because we
are increasing the alignment, the new location should stay aligned for the
purposes of the emergency spill slots.
Note that this change also impacts other backends as shown by the tests.
Changes are minor adjustments to the emergency stack slot offset.
Differential Revision: https://reviews.llvm.org/D89239
The only caller of this function is in LocalStackSlotAllocation, and it
creates a base register of the class returned by the target's
getPointerRegClass(). AMDGPU wants to use a different register class here,
so let materializeFrameBaseRegister just create and return whatever it
wants.
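A rough before/after sketch of what such an interface change looks like; the
signatures and the sketch class below are illustrative, not copied from the
tree:
```
#include <cstdint>

// Illustrative stand-ins so the sketch is self-contained; in LLVM these are
// llvm::Register, llvm::MachineBasicBlock and llvm::TargetRegisterInfo.
using Register = unsigned;
struct MachineBasicBlock;

struct TargetRegisterInfoSketch {
  // Before: the caller (LocalStackSlotAllocation) created BaseReg from
  // getPointerRegClass() and passed it in:
  //   void materializeFrameBaseRegister(MachineBasicBlock *MBB,
  //                                     Register BaseReg, int FrameIdx,
  //                                     int64_t Offset) const;
  //
  // After: the hook creates a register of whatever class it prefers (e.g.
  // AMDGPU can pick its own) and returns it.
  virtual Register materializeFrameBaseRegister(MachineBasicBlock *MBB,
                                                int FrameIdx,
                                                int64_t Offset) const = 0;
  virtual ~TargetRegisterInfoSketch() = default;
};
```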
Differential Revision: https://reviews.llvm.org/D95268
The widenScalar implementations for signed and unsigned overflowing
operations were very similar: in both, overflow is checked by truncating the
result, re-sign/zero-extending it, and checking that it matches the computed
operation.
Using a truncate + zero-extend for the unsigned case instead of manually
producing the AND instruction like before leads to an extra copy
instruction during legalization, but this should be harmless.
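The shape of the check, as a self-contained C++ sketch for an 8-bit add widened
to 32 bits (the signed and unsigned flavours differ only in how the truncated
result is re-extended):
```
#include <cstdint>

// Signed: truncate the wide result, sign-extend it back, and compare.
static bool saddOverflow8(int8_t A, int8_t B, int8_t &Res) {
  int32_t Wide = int32_t(A) + int32_t(B);
  Res = static_cast<int8_t>(Wide); // truncate
  return int32_t(Res) != Wide;     // sext(trunc(x)) != x => overflow
}

// Unsigned: same structure, but zero-extend instead of sign-extend.
static bool uaddOverflow8(uint8_t A, uint8_t B, uint8_t &Res) {
  uint32_t Wide = uint32_t(A) + uint32_t(B);
  Res = static_cast<uint8_t>(Wide); // truncate
  return uint32_t(Res) != Wide;     // zext(trunc(x)) != x => overflow
}
```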
Differential Revision: https://reviews.llvm.org/D95035
Noticed while I was touching other nearby code. I don't have a
test where this matters because the targets I work on
use zero-or-one boolean contents. And the test cases I've seen this fire on
happen before type legalization, where the result type is MVT::i1, so the
distinction doesn't matter.
There was code to handle the first operand being different from the result
type, and code to handle the first operand having the same type as the type
to extend from. Neither should ever happen for a correctly formed
SIGN_EXTEND_INREG, so I've replaced that code with asserts.
I also noticed we created the same APInt twice so I've reused it.
Implement widening for G_SADDO and G_SSUBO. Previously it was only
implemented for G_UADDO and G_USUBO. Also add legalize-add/sub tests for
narrow overflowing add/sub on AArch64.
Differential Revision: https://reviews.llvm.org/D95034
Make this look more like the DAG handling and move to common code.
I also noticed AArch64 seems to not be properly adding the
physreg:virtreg mapping to the function live ins.
Add DemandedElts support inside the TRUNCATE analysis.
REAPPLIED - this was reverted by @hans at rGa51226057fc3 due to an issue with vector shift amount types, which was fixed in rG935bacd3a724, with an additional test case added at rG0ca81b90d19d.
Differential Revision: https://reviews.llvm.org/D56387
As noticed on D56387, for vectors we must always correctly adjust the shift amount type during truncation (not just after legalization). We were getting away with it as we currently only accepted scalars via the dyn_cast<ConstantSDNode>.
Previous code built a model in which the tile config register is a user of
each AMX instruction. This is a problem for spilling the tile config
register: when it is live across a function call, an ldtilecfg instruction
may be inserted for each AMX instruction that uses the tile config register,
which causes all tile data registers to be clobbered.
To fix this issue, we remove the modeling of the tile config register. We
analyze the regmask of each call instruction and insert an ldtilecfg if any
tile data register is live across the call. Inserting an sttilecfg before
the call is unnecessary, because the tile config doesn't change and we can
just reload it.
Besides that, we also need to check tile config register interference.
Since we don't model the config register, we should check interference from
the ldtilecfg to each tile data register def.
```
               ldtilecfg
               /       \
            BB1         BB2
           /    \
        call    BB3
       /    \
%1=tileload  %2=tilezero
```
We can start from each tile def instruction and walk backward to the
ldtilecfg. If there is any call instruction along the way, and the tile data
registers are not preserved across it, we should insert an ldtilecfg after
the call instruction.
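A simplified single-block sketch of that backward walk; the helpers
definesTileReg, isTileConfigLoad, preservesTileRegs and insertConfigReloadAfter
are hypothetical stand-ins for the real checks rather than actual LLVM APIs:
```
#include "llvm/CodeGen/MachineBasicBlock.h"
#include "llvm/CodeGen/MachineInstr.h"
using namespace llvm;

// Hypothetical helpers, used only to outline the walk.
bool definesTileReg(const MachineInstr &MI);      // does MI def a tile reg?
bool isTileConfigLoad(const MachineInstr &MI);    // is MI an ldtilecfg?
bool preservesTileRegs(const MachineInstr &Call); // regmask keeps tile regs?
void insertConfigReloadAfter(MachineInstr &Call); // emit ldtilecfg after Call

void reloadConfigAcrossCalls(MachineBasicBlock &MBB) {
  for (MachineInstr &Def : MBB) {
    if (!definesTileReg(Def))
      continue;
    // Walk backward from the tile def towards the dominating ldtilecfg.
    MachineBasicBlock::iterator It(Def);
    while (It != MBB.begin()) {
      MachineInstr &MI = *--It;
      if (isTileConfigLoad(MI))
        break; // the config reaches this def untouched
      if (MI.isCall() && !preservesTileRegs(MI)) {
        insertConfigReloadAfter(MI); // tile config may be clobbered: reload
        break;
      }
    }
  }
}
```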
Differential Revision: https://reviews.llvm.org/D94155
It caused "Vector shift amounts must be in the same as their first arg"
asserts in Chromium builds. See the code review for repro instructions.
> Add DemandedElts support inside the TRUNCATE analysis.
>
> Differential Revision: https://reviews.llvm.org/D56387
This reverts commit cad4275d697c601761e0819863f487def73c67f8.
Add the aarch64[_be]-*-gnu_ilp32 targets to support the GNU ILP32 ABI for AArch64.
The needed codegen changes were mostly already implemented in D61259, which added support for the watchOS ILP32 ABI. The main changes are:
- Wiring up the new target to enable ILP32 codegen and MC.
- ILP32 va_list support.
- ILP32 TLSDESC relocation support.
There was existing MC support for ELF ILP32 relocations from D25159 which could be enabled by passing "-target-abi ilp32" to llvm-mc. This was changed to check for "gnu_ilp32" in the target triple instead. This shouldn't cause any issues since the existing support was slightly broken: it was generating ELF64 objects instead of the ELF32 object files expected by the GNU ILP32 toolchain.
This target has been tested by running the full rustc testsuite on a big-endian ILP32 system based on the GCC ILP32 toolchain.
Reviewed By: kristof.beyls
Differential Revision: https://reviews.llvm.org/D94143
If constants are hidden behind G_ANYEXT, we can treat them the same way as
G_SEXT. For that purpose we extend getConstantVRegValWithLookThrough with an
option to handle G_ANYEXT the same way as G_SEXT.
Differential Revision: https://reviews.llvm.org/D92219
When constraining an operand register using constrainOperandRegClass(),
the function may emit a COPY in case the provided register class does
not match the current operand register class. However, the operand
itself is not updated to make use of the COPY, thereby resulting in
incorrect code. This patch fixes that bug by updating the machine
operand accordingly.
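A minimal sketch of the shape of the fix, assuming the GlobalISel utility
constrainRegToClass and a surrounding loop over the instruction's operands
(simplified fragment, not the verbatim patch):
```
// Inside constrainOperandRegClass (simplified): if constraining inserted a
// COPY into a new virtual register, rewrite the operand to use it.
Register Reg = MO.getReg();
Register Constrained =
    constrainRegToClass(MRI, TII, RBI, Reg, *OpRC); // may emit a COPY
if (Constrained != Reg)
  MO.setReg(Constrained); // previously the operand kept the old register
```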
Reviewed By: dsanders
Differential Revision: https://reviews.llvm.org/D91244
function-instrument=xray-never wasn't actually honored before. We were
getting lucky that it worked because CodeGenFunction would omit the
other xray attributes when a function was annotated with
xray_never_instrument. This patch adds proper support.
Differential Revision: https://reviews.llvm.org/D89441
Just like llvm.assume, there are a lot of cases where we can just ignore llvm.experimental.noalias.scope.decl.
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D93042
This is a restricted version of the combine in `DAGCombiner::MatchLoadCombine`.
(See D27861)
This tries to recognize patterns like below (assuming a little-endian target):
```
s8* a = ...
s32 val = a[0] | (a[1] << 8) | (a[2] << 16) | (a[3] << 24)
->
s32 val = *((s32)a)

s8* a = ...
s32 val = a[3] | (a[2] << 8) | (a[1] << 16) | (a[0] << 24)
->
s32 val = BSWAP(*((s32)a))
```
(This patch handles the big-endian target case as well, in which the first
example above has a BSWAP, and the second does not.)
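For reference, the same equivalence expressed in plain C++, assuming a
little-endian host (matching the first example above):
```
#include <cassert>
#include <cstdint>
#include <cstring>

// OR of zero-extended, shifted byte loads...
static uint32_t orOfBytes(const uint8_t *A) {
  return uint32_t(A[0]) | (uint32_t(A[1]) << 8) | (uint32_t(A[2]) << 16) |
         (uint32_t(A[3]) << 24);
}

// ...is the same value as one wide load of those four bytes.
static uint32_t wideLoad(const uint8_t *A) {
  uint32_t V;
  std::memcpy(&V, A, sizeof(V));
  return V;
}

int main() {
  const uint8_t Bytes[] = {0x78, 0x56, 0x34, 0x12};
  assert(orOfBytes(Bytes) == wideLoad(Bytes)); // 0x12345678 on little-endian
}
```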
To recognize the pattern, this searches from the last G_OR in the expression
tree.
E.g.
```
Reg Reg
\ /
OR_1 Reg
\ /
OR_2
\ Reg
.. /
Root
```
Each non-OR register in the tree is put in a list. Each register in the
list is then checked to see if it is defined by an appropriate load plus an
optional shift.
If every register is a load + potentially a shift, the combine checks if
those loads + shifts, when OR'd together, are equivalent to a wide load
(possibly with a BSWAP).
To simplify things, this patch
(1) Only handles G_ZEXTLOADs (which appear to be the common case)
(2) Only works in a single MachineBasicBlock
(3) Only handles G_SHL as the bit twiddling to stick the small load into a
specific location
An IR example of this is here: https://godbolt.org/z/4sP9Pj (lifted from
test/CodeGen/AArch64/load-combine.ll)
At -Os on AArch64, this is a 0.5% code size improvement for CTMark/sqlite3,
and a 0.4% improvement for CTMark/7zip-benchmark.
Also fix a bug in `isPredecessor` which caused it to fail whenever `DefMI` was
the first instruction in the block.
Differential Revision: https://reviews.llvm.org/D94350
This is an additional bug fix for c5be0e0cc0. The distance for the spill
instructions was wrong in the previous patch.
Differential Revision: https://reviews.llvm.org/D94772
This recommits 2c51bef76cbf0149101b9e7c7c658b4a58657929.
I've fixed the broken check line from when I renamed the test function.
Original commit message:
This builds on D94142 where scalable vectors are allowed in structs.
I did have to fix one scalable vector issue in the vector type
creation for these intrinsics where we used getVectorNumElements
instead of ElementCount.
This builds on D94142 where scalable vectors are allowed in structs.
I did have to fix one scalable vector issue in the vector type
creation for these intrinsics where we used getVectorNumElements
instead of ElementCount.
Differential Revision: https://reviews.llvm.org/D94149
Currently, when spilling statepoint register operands in FixupStatepoints,
we do not pay attention to the fact that an operand might be `undef`. We
just generate a spill, which may lead to a verifier error because we have a
use without a def.
To handle it, let FixupStatepoints ignore `undef` register operands
completely and change them to some constant value when generating the stack
map. Use the same value as ISel uses for this purpose (0xFEFEFEFE).
Reviewed By: reames
Differential Revision: https://reviews.llvm.org/D94703
Use the KnownBits icmp comparisons to determine when an ISD::UMIN/UMAX op is unnecessary, i.e. when either operand is known to be ULT/ULE or UGT/UGE relative to the other.
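A short sketch of the kind of check involved, assuming the static KnownBits
comparison helpers (KnownBits::ule and friends), which only give an answer when
the known bits decide the comparison:
```
#include "llvm/Support/KnownBits.h"
using namespace llvm;

// umin(A, B) can be folded to A when the known bits prove A <= B (unsigned);
// symmetrically, umax(A, B) folds to B.
static bool uminFoldsToLHS(const KnownBits &LHS, const KnownBits &RHS) {
  if (auto Res = KnownBits::ule(LHS, RHS)) // set only when decidable
    return *Res;
  return false;
}
```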
Differential Revision: https://reviews.llvm.org/D94532
RISC-V would like to use a struct of scalable vectors to return multiple
values from intrinsics. This would also be needed for target-independent
intrinsics like llvm.sadd.overflow.
This patch removes the existing restriction for this. I've modified
StructType::isSized to consider a struct containing scalable vectors
as unsized so the verifier won't allow loads/stores/allocas of these
structs.
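A small example of the resulting behaviour using the IR type APIs (a minimal
sketch; error handling omitted):
```
#include "llvm/IR/DerivedTypes.h"
#include "llvm/IR/LLVMContext.h"
using namespace llvm;

int main() {
  LLVMContext Ctx;
  // A struct containing scalable vectors can now be formed...
  Type *VecTy = ScalableVectorType::get(Type::getInt64Ty(Ctx), 2);
  StructType *STy = StructType::get(Ctx, {VecTy, VecTy});
  // ...but it is treated as unsized, so the verifier rejects
  // loads/stores/allocas of it.
  return STy->isSized() ? 1 : 0; // expected to return 0 (not sized)
}
```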
Reviewed By: sdesmalen
Differential Revision: https://reviews.llvm.org/D94142
Reassociate some patterns to generate more fma instructions in order to
reduce register pressure.
Reviewed By: jsji
Differential Revision: https://reviews.llvm.org/D92071
This patch promotes the result integer type of FP_TO_XINT during expansion,
so the crash in the conversion from ppc_fp128 to i1 is fixed.
Reviewed By: steven.zhang
Differential Revision: https://reviews.llvm.org/D92473
Add a one-use check to lookThruCopyLike.
The root node is safe to delete if we are sure that every definition in the
copy chain has only one use.
Reviewed By: jsji
Differential Revision: https://reviews.llvm.org/D92069