The problem is the following. With fast8, we broke an important
invariant when loading shadows. A wide shadow of 64 bits used to
correspond to 4 application bytes with fast16; so, generating a single
load was okay since those 4 application bytes would share a single
origin. Now, using fast8, a wide shadow of 64 bits corresponds to 8
application bytes that should be backed by 2 origins (but we kept
generating just one).
Let’s say our wide shadow is 64-bit and consists of the following:
0xABCDEFGH. To check whether we need the second origin value, we could do
the following (in the 64-bit wide shadow case):
- bitwise shift the wide shadow left by 32 bits (yielding 0xEFGH0000)
- push the result along with the first origin load to the shadow/origin vectors
- load the second 32-bit origin of the 64-bit wide shadow
- push the wide shadow along with the second origin to the shadow/origin vectors.
combineOrigins would then select the second origin if the wide
shadow is of the form 0xABCD0000. The tests illustrate how this
change affects the generated bitcode.
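A minimal sketch of the combining logic in plain C++ (illustrative names,
not the actual DFSan instrumentation), assuming combineOrigins keeps the
origin of the first pair whose shadow is non-zero:

  #include <cstdint>

  // WideShadow is the 64-bit wide shadow (0xABCDEFGH above, one shadow byte
  // per application byte); FirstOrigin/SecondOrigin are the two 32-bit
  // origins loaded for it.
  uint32_t combineWideShadowOrigins(uint64_t WideShadow, uint32_t FirstOrigin,
                                    uint32_t SecondOrigin) {
    // Pair 1: the shadow shifted left by 32 bits (0xEFGH0000), first origin.
    // Pair 2: the full wide shadow, second origin.
    if ((WideShadow << 32) != 0)
      return FirstOrigin;
    if (WideShadow != 0) // e.g. 0xABCD0000: only the second origin applies
      return SecondOrigin;
    return 0; // fully clean shadow, no origin needed
  }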
Reviewed By: stephan.yichao.zhao
Differential Revision: https://reviews.llvm.org/D101584
This extends the early-ifcvt pass to avoid a few more cases where the resulting
select instructions would have matching operands. Additionally, we now use TII
to determine "sameness" of the operands so that as TII gets smarter, so too
will ifcvt.
The attached test case was bugpoint-reduced down from CINT2000/252.eon in the
test-suite. See: https://clang.godbolt.org/z/WvnrcrGEn
Differential Revision: https://reviews.llvm.org/D101508
Related to PR50172.
Protects us against regressions once we start doing the cttz(zext(x)) -> zext(cttz(x)) transformation in the middle-end.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D101662
This reverts commit 3d27b5d28aabf8516aa1fefc78a6878b89a992f0.
Broke one of the PPC tests, which I didn't see because I usually build with
only the x86/AArch64 targets enabled... oops.
https://lab.llvm.org/buildbot#builders/109/builds/13834
llvm/test/CodeGen/PowerPC/expand-foldable-isel.ll
This is a long overdue cleanup. Not every use is eliminated; I stuck to uses
that were being called directly from select(), not the render functions.
Differential Revision: https://reviews.llvm.org/D101590
This extends the early-ifcvt pass to avoid a few more cases where the resulting
select instructions would have matching operands. Additionally, we now use TII
to determine "sameness" of the operands so that as TII gets smarter, so too
will ifcvt.
The attached test case was bugpoint-reduced down from CINT2000/252.eon in the
test-suite. See: https://clang.godbolt.org/z/WvnrcrGEn
Differential Revision: https://reviews.llvm.org/D101508
The right symbol flag mask is ~0x7, not ~0xf.
Also emit string names for the other flags (we were missing some).
Reviewed By: #lld-macho, gkm
Differential Revision: https://reviews.llvm.org/D101548
The relative lookup table converter pass caused an issue when full LTO
is enabled (reported in https://reviews.llvm.org/D94355).
This patch disables that pass in the full LTO pre-link optimization
pipeline until the issue is fixed.
Differential Revision: https://reviews.llvm.org/D101664
SIPreEmitPeephole did not try to remove redundant s_set_gpr_idx_*
instructions in blocks that end with a conditional branch instruction.
This seems like a simple oversight.
Differential Revision: https://reviews.llvm.org/D101629
The current code can scan an unlimited number of instructions,
if the containing basic block is very large. The test case from
PR50155 contains a basic block with approximately 100k instructions.
To avoid this, limit the number of instructions we inspect. At
the same time, drop the limit on the number of basic blocks, as
this will be implicitly limited by the number of instructions as
well.
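A minimal sketch of the idea in plain C++ (hypothetical names, not the
actual pass code): a single instruction budget shared across all visited
blocks replaces the old cap on the number of blocks:

  #include <vector>

  struct Instruction { /* ... */ };
  using Block = std::vector<Instruction>;

  // Walk instructions across blocks, but give up once the shared budget of
  // inspected instructions is exhausted, no matter how few blocks were seen.
  bool scanWithBudget(const std::vector<const Block *> &Blocks,
                      unsigned InstructionBudget,
                      bool (*Visit)(const Instruction &)) {
    for (const Block *B : Blocks)
      for (const Instruction &I : *B) {
        if (InstructionBudget-- == 0)
          return false; // bail out on pathologically large blocks
        if (Visit(I))
          return true;
      }
    return false;
  }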
This introduces a flag that aborts if we ever reduce to IR that fails
the verifier.
Reviewed By: swamulism, arichardson
Differential Revision: https://reviews.llvm.org/D101279
Fixes the following warnings observed when building the experimental
m68k backend (-DLLVM_EXPERIMENTAL_TARGETS_TO_BUILD="M68k"):
../lib/Target/M68k/M68kMachineFunction.h:71:3: warning: explicitly
defaulted default constructor is implicitly deleted
[-Wdefaulted-function-deleted]
M68kMachineFunctionInfo() = default;
^
../lib/Target/M68k/M68kMachineFunction.h:24:20: note: default
constructor of 'M68kMachineFunctionInfo' is implicitly deleted because
field 'MF' of reference type 'llvm::MachineFunction &' would not be
initialized
MachineFunction &MF;
^
In file included from ../lib/Target/M68k/M68kISelLowering.cpp:18:
In file included from ../lib/Target/M68k/M68kSubtarget.h:17:
../lib/Target/M68k/M68kFrameLowering.h:60:8: warning:
'llvm::M68kFrameLowering::emitCalleeSavedFrameMoves' hides overloaded
virtual functions [-Woverloaded-virtual]
void emitCalleeSavedFrameMoves(MachineBasicBlock &MBB,
^
../include/llvm/CodeGen/TargetFrameLowering.h:215:3: note: hidden
overloaded virtual function
'llvm::TargetFrameLowering::emitCalleeSavedFrameMoves' declared here:
different number of parameters (2 vs 3)
emitCalleeSavedFrameMoves(MachineBasicBlock &MBB,
^
../include/llvm/CodeGen/TargetFrameLowering.h:218:16: note: hidden
overloaded virtual function
'llvm::TargetFrameLowering::emitCalleeSavedFrameMoves' declared here:
different number of parameters (4 vs 3)
virtual void emitCalleeSavedFrameMoves(MachineBasicBlock &MBB,
^
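For illustration, a hedged sketch of common ways to address these two
warnings (standalone stand-in types; not necessarily the exact change made
by this patch):

  class MachineFunction;      // stand-ins for the real LLVM types
  class MachineBasicBlock;

  struct MachineFunctionInfoSketch {
    MachineFunction &MF;
    // A reference member deletes the defaulted default constructor, so
    // construct from the MachineFunction instead of using "= default".
    explicit MachineFunctionInfoSketch(MachineFunction &MF) : MF(MF) {}
  };

  struct TargetFrameLoweringSketch {
    virtual ~TargetFrameLoweringSketch() = default;
    virtual void emitCalleeSavedFrameMoves(MachineBasicBlock &MBB) {}
  };

  struct M68kFrameLoweringSketch : TargetFrameLoweringSketch {
    // Re-expose the hidden base-class overloads so the extra overload below
    // no longer triggers -Woverloaded-virtual.
    using TargetFrameLoweringSketch::emitCalleeSavedFrameMoves;
    void emitCalleeSavedFrameMoves(MachineBasicBlock &MBB, bool ForceFP) {}
  };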
pr/50071
Reviewed By: myhsu
Differential Revision: https://reviews.llvm.org/D101588
Stop using the compatibility spellings of `OF_{None,Text,Append}`
left behind by 1f67a3cba9b09636c56e2109d8a35ae96dc15782. A follow-up
will remove them.
Differential Revision: https://reviews.llvm.org/D101650
These operations don't exist natively, so just let the
target-independent code expand to plain shifts.
The generated sequences could probably be optimized a bit more, but
they seem good enough for now.
Differential Revision: https://reviews.llvm.org/D101574
Move some types in STLExtras.h which are named and behave identically to
STL types from future standards into a dedicated header. This keeps them
organized (they are not "extras" in the same sense as most types in
STLExtras.h are) and fixes circular dependencies in future patches.
Reviewed By: dblaikie
Differential Revision: https://reviews.llvm.org/D100668
If the extracts from the non-power-of-2 vectors are recognized as shuffles,
we need some extra checks to avoid crashing the cost calculations when trying
to get the cost of subvector extracts. In this case we need to check carefully
that we do not step out of bounds of the original vector; otherwise the
TTI cost model will crash on an assert.
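A tiny sketch of the guard being described (hypothetical helper, not the
vectorizer's actual code):

  // A subvector extract of NumSubElts elements starting at Index is only a
  // valid cost-model query when it lies entirely within the NumSrcElts
  // elements of the source vector; otherwise fall back to per-element costs.
  bool subvectorExtractInBounds(unsigned Index, unsigned NumSubElts,
                                unsigned NumSrcElts) {
    return Index + NumSubElts <= NumSrcElts;
  }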
Differential Revision: https://reviews.llvm.org/D101477
This patch merges STR<S,D,Q,W,X>pre-STR<S,D,Q,W,X>ui and
LDR<S,D,Q,W,X>pre-LDR<S,D,Q,W,X>ui instruction pairs into a single
STP<S,D,Q,W,X>pre and LDP<S,D,Q,W,X>pre instruction, respectively.
For each pair, there is a MIR test that verifies this optimization.
Differential Revision: https://reviews.llvm.org/D99272
The functionality in SVEIntrinsicOpts::isReinterpretToSVBool was moved in
D101302; however, the original, now-unused function was not removed (NFC).
Differential Revision: https://reviews.llvm.org/D101642
atomicrmw instructions are expanded by AtomicExpandPass before register allocation
into cmpxchg loops. Register allocation can insert spills between the exclusive loads
and stores, which invalidates the exclusive monitor and can lead to infinite loops.
To avoid this, reimplement atomicrmw operations as pseudo-instructions and expand them
after register allocation.
Floating point legalisation:
f16 ATOMIC_LOAD_FADD(*f16, f16) is legalised to
f32 ATOMIC_LOAD_FADD(*i16, f32) and then eventually
f32 ATOMIC_LOAD_FADD_16(*i16, f32)
Differential Revision: https://reviews.llvm.org/D101164
This patch fixes two bugs that arise when a 'defm' inherits from a multiclass
and also from a class with assertions.
Differential Revision: https://reviews.llvm.org/D101626
This patch introduces a new infrastructure that is used to select the load and
store instructions in the PPC backend.
The primary motivation is that the current implementation of selecting load/stores
depends on the ordering of patterns in TableGen. Given this limitation, we
are not able to easily and reliably generate the P10 prefixed load and store
instructions (such as when the immediates fit within 34 bits). This
refactoring is meant to provide us with more control over the patterns/different
forms to exploit, as well as eliminating the dependency on pattern
declaration ordering in TableGen.
The idea of this refactoring is that it introduces a set of addressing modes that
correspond to different instruction formats of a particular load and store
instruction, along with a set of common flags that describes a load/store.
Whenever a load/store instruction is being selected, we analyze the instruction
and compute a set of flags for it. The computed flags are then used to
select the most optimal load/store addressing mode.
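A hedged sketch of that flow in plain C++ (the flag and mode names are
illustrative, not the actual backend identifiers):

  #include <cstdint>

  enum class AddrMode { DForm, DSForm, XForm, PrefixedDForm };

  enum MemFlags : uint32_t {
    MF_ImmFitsSigned16  = 1u << 0, // classic 16-bit displacement
    MF_ImmFitsSigned34  = 1u << 1, // P10 prefixed 34-bit displacement
    MF_NeedsDSAlignment = 1u << 2, // displacement must be a multiple of 4
    MF_HasIndexReg      = 1u << 3, // base register + index register
  };

  // Map the flags computed for a load/store to an addressing mode, preferring
  // the most compact encoding that the flags allow.
  AddrMode selectAddrMode(uint32_t Flags) {
    if (Flags & MF_ImmFitsSigned16)
      return (Flags & MF_NeedsDSAlignment) ? AddrMode::DSForm : AddrMode::DForm;
    if (Flags & MF_ImmFitsSigned34)
      return AddrMode::PrefixedDForm;
    return AddrMode::XForm; // register + register form as the fallback
  }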
This patch is the first of a series of patches to be committed - it contains the
initial implementation of the refactored load/store selection infrastructure and
also updates P8/P9 patterns to adopt this infrastructure. The idea is that
incremental patches will add more implementation and support, and eventually
the old implementation will be removed.
Differential Revision: https://reviews.llvm.org/D93370
Summary:
This patch implements the backend support for adding global variables
directly to the table of contents (TOC), rather than adding the address of the
variable to the TOC.
Currently, this patch will look for the "toc-data" attribute on symbols in the
IR, and then add those symbols to the TOC.
At the moment, this is implemented for 32-bit AIX.
Reviewers: sfertile
Differential Revision: https://reviews.llvm.org/D101178
This patch implements expansion of llvm.vp.* intrinsics
(https://llvm.org/docs/LangRef.html#vector-predication-intrinsics).
VP expansion is required for targets that do not implement VP code
generation. Since expansion is controllable with TTI, targets can switch
on the VP intrinsics they do support in their backend, offering a smooth
transition strategy for VP code generation (VE, RISC-V V, ARM SVE,
AVX512, ...).
Reviewed By: rogfer01
Differential Revision: https://reviews.llvm.org/D78203
The only effect of the optimization is to remove s_set_gpr_idx_*
instructions, and update_mir_test_checks.py always inserts CHECK: rather
than CHECK-NEXT: checks, so without this implicit negative check, the
tests would always pass even if the optimization did nothing.
Differential Revision: https://reviews.llvm.org/D101622
Early exit from method DispatchStage::isAvailable() if the dispatch group is
already full. Not all instructions declare at least one uOP.
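A hedged sketch of the problem (illustrative code, not the actual llvm-mca
implementation): without the early exit, an instruction declaring zero
micro-ops would look dispatchable even when the group is full:

  bool isAvailableSketch(unsigned UsedSlots, unsigned GroupSize,
                         unsigned NumMicroOps) {
    if (UsedSlots >= GroupSize)
      return false; // early exit: the dispatch group is already full
    // With NumMicroOps == 0 this check is vacuously true, which is why the
    // early exit above is needed.
    return GroupSize - UsedSlots >= NumMicroOps;
  }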
Fixes PR50174.
The previous attempt to fix infinite recursion in min/max reassociation was not fully successful (D100170). The newly discovered failing case is due to the single-use case not being handled properly; it should be processed separately from the two-use case.
Reviewed By: mkazantsev
Differential Revision: https://reviews.llvm.org/D101359
Hoisting and sinking instructions out of conditional blocks enables
additional vectorization by:
1. Executing memory accesses unconditionally.
2. Reducing the number of instructions that need predication.
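As a hedged illustration of both points (example source, not taken from the
benchmark): once the two stores, which differ only in the stored value, are
sunk into a single unconditional store of a selected value, the loop body is
straight-line code the vectorizer can handle without predication:

  void clamp(int *out, const int *in, int n, int limit) {
    for (int i = 0; i < n; ++i) {
      if (in[i] > limit)
        out[i] = limit; // both stores target the same address...
      else
        out[i] = in[i]; // ...so sinking turns them into one select + store
    }
  }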
After disabling early hoisting / sinking, we miss out on a few
vectorization opportunities. One of those is causing a ~10% performance
regression in one of the Geekbench benchmarks on AArch64.
This patch tries to recover the regression by running hoisting/sinking
as part of a SimplifyCFG run after LoopRotate and before LoopVectorize.
Note that in the legacy pass-manager, we run LoopRotate just before
vectorization again and there's no SimplifyCFG run in between, so the
sinking/hoisting may impact the later run of LoopRotate. But the impact
should be limited and the benefit of hoisting/sinking at this stage
should outweigh the risk of not rotating.
Compile-time impact looks slightly positive for most cases.
http://llvm-compile-time-tracker.com/compare.php?from=2ea7fb7b1c045a7d60fcccf3df3ebb26aa3699e5&to=e58b4a763c691da651f25996aad619cb3d946faf&stat=instructions
NewPM-O3: geomean -0.19%
NewPM-ReleaseThinLTO: geomean -0.54%
NewPM-ReleaseLTO-g: geomean -0.03%
With a few benchmarks seeing a notable increase, but also some
improvements.
Alternative to D101290.
Reviewed By: lebedev.ri
Differential Revision: https://reviews.llvm.org/D101468