This moves SinkIntoLoop from MachineLICM to MachineSink. The motivation for
this work is that hoisting is a canonicalisation transformation, but we do not
really have a good story to sink instructions back if that is better, e.g. to
reduce live-ranges, register pressure and spilling. This has been discussed a
few times on the list, the latest thread is:
https://lists.llvm.org/pipermail/llvm-dev/2020-December/147184.html
There it was pointed out that we have the LoopSink IR pass, but that works on
IR, lacks register pressure informatiom, and is focused on profile guided
optimisations, and then we have MachineLICM and MachineSink that both perform
sinking. MachineLICM is more about hoisting and CSE'ing of hoisted
instructions. It also contained a very incomplete and disabled-by-default
SinkIntoLoop feature, which we now move to MachineSink.
Getting loop-sinking to do something useful is going to be at least a 3-step
approach:
1) This is just moving the code and is almost a NFC, but contains a bug fix.
This uses helper function `isLoopInvariant` that was factored out in D94082 and
added to MachineLoop.
2) A first functional change to make loop-sink a little bit less restrictive,
which it really is at the moment, is the change in D94308. This lets it do
more (alias) analysis using functions in MachineSink, making it a bit more
powerful. Nothing changes much: still off by default. But it shows that
MachineSink is a better home for this, and it starts using its functionality
like `hasStoreBetween`, and in the next step we can use `isProfitableToSinkTo`.
3) This is the going to be he interesting step: decision making when and how
many instructions to sink. This will be driven by the register pressure, and
deciding if reducing live-ranges and loop sinking will help in better
performance.
4) Once we are happy with 3), this should be enabled by default, that should be
the end goal of this exercise.
Differential Revision: https://reviews.llvm.org/D93694
This adds sadd.sat, uadd.sat, ssub.sat and usub.sat costs for AArch64,
similar to how they were recently added for ARM.
Differential Revision: https://reviews.llvm.org/D95292
This patch fixes some crashes coming from
`RISCVISelLowering::getSetCCResultType`, which would occasionally return
an EVT constructed from an invalid MVT, which has a null Type pointer.
The attached test shows this happening currently for some fixed-length
vectors, which hit this issue when the V extension was enabled, even
though they're not legal types under the V extension. The fix was also
pre-emptively extended to scalable vectors which can't be represented as
an MVT, even though a test case couldn't be found for them.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D95434
... and similarly for some other cases. This is for consistency and to
make it easier to search for mentions of a particular architecture.
Differential Revision: https://reviews.llvm.org/D95453
This adds some simple fp16 scalar_to_vector patterns, preventing a
selection failure if this came up.
Differential Revision: https://reviews.llvm.org/D95427
This makes G_SADDE and G_SSUBE legal in preparation for further work
legalizing overflowing operations. It's fine that they don't have an
instruction selector implementation yet, because G_UADDE and G_USUBE are
already legal on AArch64 without an instruction selector implementation. This
completes the set of G_[SU]{ADD,SUB}[EO] operations on AArch64.
Reviewed By: paquette
Differential Revision: https://reviews.llvm.org/D95325
A fix has been implemented in the ittap repo to fix an error about implicit fallthrough in a switch that was occurring during self build.
A new tag has been created for that fix. This is to update the tag.
Reviewed By: bader
Differential Revision: https://reviews.llvm.org/D95462
Patch by Zahira Ammarguellat.
AMDGPUInstructionSelector.h needs TargetRegisterClass but relies on a
forward declaration of TargetRegisterClass in InstructionSelector.h.
This patch adds a forward declaration right in
AMDGPUInstructionSelector.h.
While we are at it, this patch removes the one in
InstructionSelector.h, where it is unnecessary.
Remove common instructions from rv64 tests since they are now
covered by the rv64 run lines in the rv32 tests.
Add rv32-only* tests for a few cases that aren't common between
r32 and rv64.
Addresses review feedback from D95150.
Reviewed By: frasercrmck
Differential Revision: https://reviews.llvm.org/D95272
This change implements support for applying profile instrumentation
only to selected files or functions. The implementation uses the
sanitizer special case list format to select which files and functions
to instrument, and relies on the new noprofile IR attribute to exclude
functions from instrumentation.
Differential Revision: https://reviews.llvm.org/D94820
Add a new `raw_pwrite_ostream` variant, `buffer_unique_ostream`, which
is like `buffer_ostream` but with unique ownership of the stream it's
wrapping. Use this in CompilerInstance to simplify the ownership of
non-seeking output streams, avoiding logic sprawled around to deal with
them specially.
This also simplifies future work to encapsulate output files in a
different class.
Differential Revision: https://reviews.llvm.org/D93260
Just use the existing `Known.sextInReg` implementation.
- Update KnownBitsTest.cpp.
- Update combine-redundant-and.mir for a more concrete example.
Differential Revision: https://reviews.llvm.org/D95484
This patch improves the availability for variables stored in the
coroutine frame by emitting an alloca to hold the pointer to the frame
object and rewriting dbg.declare intrinsics to point inside the frame
object using salvaged DIExpressions. Finally, a new alloca is created
in the funclet to hold the FramePtr pointer to ensure that it is
available throughout the entire function at -O0.
This path also effectively reverts D90772. The testcase updates
highlight nicely how every removed CHECK for a dbg.value is preceded
by a new CHECK for a dbg.declare.
Thanks to JunMa, Yifeng, and Bruno for their thoughtful reviews!
Differential Revision: https://reviews.llvm.org/D93497
rdar://71866936
STRT, STRHT, and STRBT are store instructions and their source register
$Rt should be treated as an input operand instead of an output operand.
This should fix things (e.g., liveness tracking in LivePhysRegs) if
these instructions were used in CodeGen.
Differential Revision: https://reviews.llvm.org/D95074
As it looks like NewPM generally is using SimpleLoopUnswitch
instead of LoopUnswitch, this patch also use SimpleLoopUnswitch
in the ExtraVectorizerPasses sequence (compared with LegacyPM
which use the LoopUnswitch pass).
Reviewed By: aeubanks
Differential Revision: https://reviews.llvm.org/D95457
Revert the change to use APInt::isSignedIntN from
5ff5cf8e057782e3e648ecf5ccf1d9990b53ee90.
Its clear that the games we were playing to avoid the topological
sort aren't working. So just fix it once and for all.
Fixes PR48888.
Before this change, when reading ELF file, elfabi determines number of
entries in .dynsym by reading the .gnu.hash section. This change makes
elfabi read section headers directly first. This change allows elfabi
works on ELF files which do not have .gnu.hash sections.
Differential Revision: https://reviews.llvm.org/D93362
There are two use cases.
Assembler
We have accrued some code gated on MCAsmInfo::useIntegratedAssembler(). Some
features are supported by latest GNU as, but we have to use
MCAsmInfo::useIntegratedAs() because the newer versions have not been widely
adopted (e.g. SHF_LINK_ORDER 'o' and 'unique' linkage in 2.35, --compress-debug-sections= in 2.26).
Linker
We want to use features supported only by LLD or very new GNU ld, or don't want
to work around older GNU ld. We currently can't represent that "we don't care
about old GNU ld". You can find such workarounds in a few other places, e.g.
Mips/MipsAsmprinter.cpp PowerPC/PPCTOCRegDeps.cpp X86/X86MCInstrLower.cpp
AArch64 TLS workaround for R_AARCH64_TLSLD_MOVW_DTPREL_* (PR ld/18276),
R_AARCH64_TLSLE_LDST8_TPREL_LO12 (https://bugs.llvm.org/show_bug.cgi?id=36727https://sourceware.org/bugzilla/show_bug.cgi?id=22969)
Mixed SHF_LINK_ORDER and non-SHF_LINK_ORDER components (supported by LLD in D84001;
GNU ld feature request https://sourceware.org/bugzilla/show_bug.cgi?id=16833 may take a while before available).
This feature allows to garbage collect some unused sections (e.g. fragmented .gcc_except_table).
This patch adds `-fbinutils-version=` to clang and `-binutils-version` to llc.
It changes one codegen place in SHF_MERGE to demonstrate its usage.
`-fbinutils-version=2.35` means the produced object file does not care about GNU
ld<2.35 compatibility. When `-fno-integrated-as` is specified, the produced
assembly can be consumed by GNU as>=2.35, but older versions may not work.
`-fbinutils-version=none` means that we can use all ELF features, regardless of
GNU as/ld support.
Both clang and llc need `parseBinutilsVersion`. Such command line parsing is
usually implemented in `llvm/lib/CodeGen/CommandFlags.cpp` (LLVMCodeGen),
however, ClangCodeGen does not depend on LLVMCodeGen. So I add
`parseBinutilsVersion` to `llvm/lib/Target/TargetMachine.cpp` (LLVMTarget).
Differential Revision: https://reviews.llvm.org/D85474
Support for XNACK and SRAMECC is not static on some GPUs. We must be able
to differentiate between different scenarios for these dynamic subtarget
features.
The possible settings are:
- Unsupported: The GPU has no support for XNACK/SRAMECC.
- Any: Preference is unspecified. Use conservative settings that can run anywhere.
- Off: Request support for XNACK/SRAMECC Off
- On: Request support for XNACK/SRAMECC On
GCNSubtarget will track the four options based on the following criteria. If
the subtarget does not support XNACK/SRAMECC we say the setting is
"Unsupported". If no subtarget features for XNACK/SRAMECC are requested we
must support "Any" mode. If the subtarget features XNACK/SRAMECC exist in the
feature string when initializing the subtarget, the settings are "On/Off".
The defaults are updated to be conservatively correct, meaning if no setting
for XNACK or SRAMECC is explicitly requested, defaults will be used which
generate code that can be run anywhere. This corresponds to the "Any" setting.
Differential Revision: https://reviews.llvm.org/D85882
This change implements support for applying profile instrumentation
only to selected files or functions. The implementation uses the
sanitizer special case list format to select which files and functions
to instrument, and relies on the new noprofile IR attribute to exclude
functions from instrumentation.
Differential Revision: https://reviews.llvm.org/D94820
Recent shouldAssumeDSOLocal changes (introduced by 961f31d8ad14c66)
do not take in consideration the relocation model anymore. The ARM
fast-isel pass uses the function return to set whether a global symbol
is loaded indirectly or not, and without the expected information
llvm now generates an extra load for following code:
```
$ cat test.ll
@__asan_option_detect_stack_use_after_return = external global i32
define dso_local i32 @main(i32 %argc, i8** %argv) #0 {
entry:
%0 = load i32, i32* @__asan_option_detect_stack_use_after_return,
align 4
%1 = icmp ne i32 %0, 0
br i1 %1, label %2, label %3
2:
ret i32 0
3:
ret i32 1
}
attributes #0 = { noinline optnone }
$ lcc test.ll -o -
[...]
main:
.fnstart
[...]
movw r0, :lower16:__asan_option_detect_stack_use_after_return
movt r0, :upper16:__asan_option_detect_stack_use_after_return
ldr r0, [r0]
ldr r0, [r0]
cmp r0, #0
[...]
```
And without 'optnone' it produces:
```
[...]
main:
.fnstart
[...]
movw r0, :lower16:__asan_option_detect_stack_use_after_return
movt r0, :upper16:__asan_option_detect_stack_use_after_return
ldr r0, [r0]
clz r0, r0
lsr r0, r0, #5
bx lr
[...]
```
This triggered a lot of invalid memory access in sanitizers for
arm-linux-gnueabihf. I checked this patch both a stage1 built with
gcc and a stage2 bootstrap and it fixes all the Linux sanitizers
issues.
Reviewed By: MaskRay
Differential Revision: https://reviews.llvm.org/D95379
239cfbccb0509da1a08d9e746706013b732e646b add support for legalizing
i8/i16 UDIV/UREM/SDIV to use *W instructions. So we need to truncate
to i8/i16 if we're legalizing one of those.
The initial problem with the remaining bot config was resolved.
We can now use Python3. Let's use `os.cpu_count()` to cleanup this
helper.
Differential Revision: https://reviews.llvm.org/D94734
If a function has stack objects, and a call, we require an FP. If we
did not initially have any stack objects, and only introduced them
during PrologEpilogInserter for CSR VGPR spills, SILowerSGPRSpills
would end up spilling the FP register as if it were a normal
register. This would result in an assert in a debug build, or
redundant handling of the FP register in a release build.
Try to predict that we will have an FP later, although this is ugly.
The existing test has less FMF than we might expect if
our FMF was fixed (on all FP values), so this additional
test is intended to check propagation in a more "normal"
example.
The switch must set the predicate correctly; anything else
should lead to unreachable/assert.
I'm trying to fix FMF propagation here and the callers,
so this is a preliminary cleanup.
This patch adds additional checks to avoid partial unswitching
in cases where it won't be profitable, e.g. because the path directly
exits the loop anyways.