If a global object is listed in `@llvm.used`, place it in a unique section with
the `SHF_GNU_RETAIN` flag. The section is a GC root under `ld --gc-sections`
with LLD>=13 or GNU ld>=2.36.
For front ends which do not expect to see multiple sections of the same name,
consider emitting `@llvm.compiler.used` instead of `@llvm.used`.
SHF_GNU_RETAIN is restricted to ELFOSABI_GNU and ELFOSABI_FREEBSD in
binutils. We do not enforce that restriction; see the rationale in D95749.
The integrated assembler has supported SHF_GNU_RETAIN since D95730.
GNU as>=2.36 supports section flag 'R'.
We don't need to worry about GNU ld support because older GNU ld just ignores
the unknown SHF_GNU_RETAIN.
With this change, `__attribute__((retain))` functions/variables emitted
by clang will get the SHF_GNU_RETAIN flag.
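As a hedged illustration from the source level (the symbol name and inspection
command here are examples, not from this patch):
```
// With a recent Clang targeting ELF, a global carrying the retain attribute is
// added to @llvm.used, placed in its own section, and that section gets the
// SHF_GNU_RETAIN flag, so `ld --gc-sections` treats it as a GC root.
__attribute__((retain, used))
static const char build_tag[] = "build-tag-example";

int main() { return 0; }

// `llvm-readelf -S` on the resulting object should show the section holding
// build_tag with an "R" (SHF_GNU_RETAIN) flag next to "A" (SHF_ALLOC).
```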
Differential Revision: https://reviews.llvm.org/D97448
There are existing patterns for FMOVHi, FMOVSi, and FMOVDi in
AArch64InstrFormats.td.
Importing these allows us to remove the manual selection code for FMOV.
It also allows us to select FMOVHi for non-zero constants when we have full
fp16 support.
Refactor some of the code in AArch64InstrFormats.td so that we can create
equivalent custom renderers in GlobalISel.
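For reference, a GlobalISel custom renderer for such an immediate looks roughly
like the sketch below (shown out of its class context; this mirrors, but is not
necessarily identical to, the renderer wired up from the TableGen patterns):
```
// Convert the APFloat payload of a G_FCONSTANT into the 8-bit encoded
// immediate that FMOVDi expects.
void renderFPImm64(MachineInstrBuilder &MIB, const MachineInstr &MI,
                   int OpIdx) const {
  assert(MI.getOpcode() == TargetOpcode::G_FCONSTANT && OpIdx == -1 &&
         "Expected G_FCONSTANT");
  MIB.addImm(
      AArch64_AM::getFP64Imm(MI.getOperand(1).getFPImm()->getValueAPF()));
}
```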
Differential Revision: https://reviews.llvm.org/D97511
Many optimizers (e.g. GlobalOpt/ConstantMerge) do not respect linker semantics
for comdat and may not discard the sections as a unit.
The interconnected `__llvm_prf_{cnts,data}` sections (in comdat for ELF)
are similar to D97432: `__profd_` is not directly referenced, so
`__profd_` may be discarded while `__profc_` is retained, breaking the
interconnection. We currently conservatively add all such sections to
`llvm.used` and let the linker do GC for ELF.
In D97448, we will change GlobalObjects in the llvm.used list to use SHF_GNU_RETAIN,
causing the metadata sections to be unnecessarily retained (some `check-profile` tests check for GC).
Use `llvm.compiler.used` to retain the current GC behavior.
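For illustration, a pass can pin such a variable against IR-level GC without
forcing the linker to keep it (a minimal sketch; `pinProfileData` is a made-up
helper name):
```
#include "llvm/IR/GlobalVariable.h"
#include "llvm/IR/Module.h"
#include "llvm/Transforms/Utils/ModuleUtils.h"
using namespace llvm;

// llvm.compiler.used stops GlobalDCE/GlobalOpt from dropping the variable,
// but unlike llvm.used it does not mark the section SHF_GNU_RETAIN, so the
// linker can still discard the whole comdat group under --gc-sections.
static void pinProfileData(Module &M, GlobalVariable *ProfData) {
  appendToCompilerUsed(M, {ProfData});
}
```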
Differential Revision: https://reviews.llvm.org/D97585
And clarify in the "writing a pass" docs that both the legacy and new
PMs are being used for the codegen/optimization pipelines.
Reviewed By: ychen, asbirlea
Differential Revision: https://reviews.llvm.org/D97515
Previously we would use a bundle to hint the register allocator to not
overwrite the pointers in a sequence of loads to avoid breaking soft
clauses. This bundling was based on a fuzzy register pressure
heuristic, so we could not guarantee that a bundle would not require
more registers than are actually available. This would result in the
register allocator failing on unsatisfiable bundles. Use a kill to
artificially extend the live
ranges, so we can always succeed at register allocation even if it
means extra spills in the worst case.
This seems to capture most of the benefit of the bundle while avoiding
most of the risk presented by the bundle. However the lit tests do
show a handful of regressions. In some cases with sequences of
volatile loads, unused load components end up getting reallocated to
the next load, which forces a wait between them. There are also a few small
scheduling regressions where a hazard used to be avoided, and one
spill torture test which for some reason nearly doubles the stack
usage. There is also a bit of noise from leftover kills (it may make
sense for post-RA pseudos to strip all of these out).
Using ComputeNumSignBits or computeKnownBits we might be able
to determine that overflow is impossible.
This especially helps after type legalization if the type was
promoted from a type with half the bits or more. Type legalization
conservatively creates a promoted smulo/umulo and an overflow
check for the promoted bits. The overflow from the promoted
smulo/umulo is ORed with the result of the promoted bits
overflow check. Proving that the promoted smulo/umulo can never
overflow will leave us with just the promoted bits overflow check.
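As a hedged, source-level illustration of the kind of check this lets us fold
(not taken from the patch's tests):
```
#include <cstdint>

// Both operands are sign-extended from 8 bits, so the 16-bit product is at
// most 128 * 128 = 16384 in magnitude and can never overflow int16_t.  The
// overflow bit of the resulting llvm.smul.with.overflow.i16 is provably
// false and the check can be removed.
bool mul16_never_overflows(int8_t A, int8_t B) {
  int16_t Wide;
  return __builtin_mul_overflow((int16_t)A, (int16_t)B, &Wide);
}
```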
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D97160
This will allow identifying exactly how many shadow bytes were used
during compilation, for when fast8 mode is introduced.
Also, it will provide a consistent matching point for instrumentation
tests so that the exact llvm type used (i8 or i16) for the shadow can
be replaced by a pattern substitution. This is handy for tests with
multiple prefixes.
Reviewed by: stephan.yichao.zhao, morehouse
Differential Revision: https://reviews.llvm.org/D97409
Use `APInt` to convert a 32-bit or 64-bit immediate to an `APFloat` rather than
`bit_cast` to a `float` or `double` to avoid going through host floating-point and
potentially changing the bit pattern of NaNs.
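A minimal sketch of the pattern (the helper name is illustrative):
```
#include "llvm/ADT/APFloat.h"
#include "llvm/ADT/APInt.h"
#include <cstdint>
using namespace llvm;

// Build the APFloat directly from the raw bit pattern.  Round-tripping the
// bits through a host double instead could quiet a signaling NaN or
// otherwise alter NaN payloads.
static APFloat decodeFP64Imm(uint64_t ImmBits) {
  return APFloat(APFloat::IEEEdouble(), APInt(64, ImmBits));
}
```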
Differential Revision: https://reviews.llvm.org/D97490
This is a case D97178 tried to solve but missed. D97178 could not handle
the case when multiple consecutive delegates are generated:
- Before:
```
block
br (a)
try
catch
end_try
end_block
<- (a)
```
- After:
```
block
br (a)
try
...
try
try
catch
end_try
<- (a)
delegate
delegate
end_block
<- (b)
```
(The `br` should point to (b) now)
D97178 assumed `end_block` comes two BBs after `end_try`, because
it assumed the order `end_try` BB -> `delegate` BB -> `end_block` BB.
But it turned out there can be multiple `delegate`s in between. This
patch changes the logic so we just search from `end_try` BB until we
find `end_block`.
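A self-contained sketch of the new search (illustrative only; the real code
walks machine basic blocks in the WebAssembly backend):
```
#include <cstddef>
#include <string>
#include <vector>

// BlockStarts[i] names the construct that begins basic block i.  Instead of
// assuming end_block is exactly two blocks after end_try, keep walking until
// the block that starts with end_block is reached, skipping any number of
// intervening delegates.
static std::size_t findEndBlock(const std::vector<std::string> &BlockStarts,
                                std::size_t EndTryIdx) {
  std::size_t I = EndTryIdx + 1;
  while (I < BlockStarts.size() && BlockStarts[I] != "end_block")
    ++I;
  return I;
}
```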
Fixes https://github.com/emscripten-core/emscripten/issues/13515.
(More precisely, fixes
https://github.com/emscripten-core/emscripten/issues/13515#issuecomment-784711318.)
Reviewed By: dschuff, tlively
Differential Revision: https://reviews.llvm.org/D97569
If a region was not constrained by high register pressure
and was not rescheduled without clustering, we can skip
rescheduling it in the ClusteredLowOccupancyReschedule stage.
This improves scheduling speed by 25% on some kernels.
Differential Revision: https://reviews.llvm.org/D97506
We attempt rescheduling without load/store clustering if occupancy
limits were not met with clustering. Skip this for regions which do
not have any loads or stores at all.
In a set of kernels I am experimenting with, this improves
scheduling time by ~30%.
Differential Revision: https://reviews.llvm.org/D97342
- This patch introduces a different assembler dialect ("hlasm") for z/OS.
The default dialect has now been given the "att" dialect name. For this,
appropriate changes have been added to SystemZ.td.
- This patch also makes a few changes to SystemZInstrFormats.td which
restrict a few condition code mnemonics to just the "att" dialect
variant (he, le, lh, nhe, nle, nlh). These extended condition code
mnemonics are not available in HLASM.
- A new private function has been introduced in SystemZAsmParser.cpp to
return the assembler dialect set in SystemZMCAsmInfo.cpp. The reason we
couldn't/haven't explicitly queried the overridden getAssemblerDialect
function from AsmParser is outlined in this thread here. This returned
dialect is directly passed onto the relevant matcher functions, which take
in a variant ID, so that the matcher functions can appropriately choose an
instruction based on the variant (a rough sketch follows below).
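A rough sketch of what such a query can look like (the function name here is
illustrative, not necessarily the one added by this patch):
```
#include "llvm/MC/MCAsmInfo.h"
#include "llvm/MC/MCContext.h"
#include "llvm/MC/MCParser/MCAsmParser.h"
using namespace llvm;

// Fetch the dialect recorded by SystemZMCAsmInfo via the generic MCAsmInfo
// hook; the result is then passed as the variant ID to the generated matcher
// so it only accepts mnemonics from that dialect.
static unsigned getConfiguredAsmDialect(MCAsmParser &Parser) {
  return Parser.getContext().getAsmInfo()->getAssemblerDialect();
}
```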
Reviewed By: uweigand
Differential Revision: https://reviews.llvm.org/D94250
This will allow a FrameIndex to be used as the base address instead of
emitting a separate ADDI from isel. eliminateFrameIndex will likely turn
it back into an ADDI, but this makes things consistent with the
SDPatterns and VLPatterns.
I only tested one case for simplicity. I can test more if reviewers
want.
Reviewed By: frasercrmck
Differential Revision: https://reviews.llvm.org/D97221
This clarifies the interface of the matchSimpleRecurrence helper introduced in 8020be0b8 for non-commutative operators. After ebd3aeba, I realized the original way I framed the routine was inconsistent. For shifts, we only matched the LHS form, but for sub we matched both and the caller wanted that information. So, instead, we now consistently match both forms for non-commutative operators and the caller becomes responsible for filtering if needed. I tried to put a clear warning in the header because I suspect the RHS form of e.g. a sub recurrence is non-obvious for most folks. (It was for me.)
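A purely illustrative pair of loops showing the two phrasings for subtraction (nothing here is lifted from the patch):
```
#include <cstdint>

// "LHS form": the recurrence value is the left operand of the sub.
int64_t lhs_form(int64_t Start, int64_t Step, int N) {
  int64_t X = Start;
  for (int I = 0; I < N; ++I)
    X = X - Step;
  return X;
}

// "RHS form": the recurrence value is the right operand of the sub.  This is
// the phrasing that is easy to overlook for non-commutative operators.
int64_t rhs_form(int64_t Start, int64_t Step, int N) {
  int64_t X = Start;
  for (int I = 0; I < N; ++I)
    X = Step - X;
  return X;
}
```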
This is a part of https://reviews.llvm.org/D95835.
Each customized function has two wrappers. The
first one, dfsw, is for the normal shadow propagation. The second one, dfso, is
used when origin tracking is on. It calls the first one and does additional
origin propagation. Which one to use can be decided at instrumentation
time. This is to ensure minimal additional overhead when origin tracking
is off.
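A heavily simplified, hedged sketch of this scheme for a hypothetical custom
function my_custom (the signatures are stand-ins, not the exact DFSan ABI):
```
extern "C" {
typedef unsigned short dfsan_label;   // stand-in width
typedef unsigned int dfsan_origin;    // stand-in width

int my_custom(int X);                 // the function being wrapped

// Shadow-only wrapper: propagates labels.
int __dfsw_my_custom(int X, dfsan_label XLabel, dfsan_label *RetLabel) {
  *RetLabel = XLabel;
  return my_custom(X);
}

// Origin-tracking wrapper: reuses the shadow wrapper, then additionally
// propagates the origin.
int __dfso_my_custom(int X, dfsan_label XLabel, dfsan_label *RetLabel,
                     dfsan_origin XOrigin, dfsan_origin *RetOrigin) {
  *RetOrigin = XOrigin;
  return __dfsw_my_custom(X, XLabel, RetLabel);
}
}
```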
Reviewed-by: morehouse
Differential Revision: https://reviews.llvm.org/D97483
`__sancov_pcs` parallels the other metadata section(s). While some optimizers
(e.g. GlobalDCE) respect linker semantics for comdat and retain or discard the
sections as a unit, some (e.g. GlobalOpt/ConstantMerge) do not. So we have to
conservatively retain all unconditionally in the compiler.
When a comdat is used, the COFF/ELF linkers' GC semantics ensure the
associated parallel array elements are retained or discarded together,
so `llvm.compiler.used` is sufficient.
Otherwise (MachO (see rL311955/rL311959), COFF special case where comdat is not
used), we have to use `llvm.used` to conservatively make all sections retained
by the linker. This will fix the Windows problem once internal linkage
GlobalObjects in `llvm.used` are retained via `/INCLUDE:`.
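A hedged sketch of that retention decision (the helper and parameter names are
illustrative):
```
#include "llvm/IR/GlobalVariable.h"
#include "llvm/IR/Module.h"
#include "llvm/Transforms/Utils/ModuleUtils.h"
using namespace llvm;

static void retainSanCovSection(Module &M, GlobalVariable *SectionVar,
                                bool UsesComdat) {
  if (UsesComdat)
    // The linker's GC can still drop the whole comdat group as a unit.
    appendToCompilerUsed(M, {SectionVar});
  else
    // MachO / comdat-less COFF: make the linker itself keep the section.
    appendToUsed(M, {SectionVar});
}
```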
Reviewed By: morehouse, vitalybuka
Differential Revision: https://reviews.llvm.org/D97432
This allows GlobalISel to use this instruction where available. I assume
SelectionDAG always selects s_xnor_b32 so it isn't affected by this
change.
Differential Revision: https://reviews.llvm.org/D97560
num_children is "last_index" + 1, thus
num_children + 1 = "last_index" + 2
this worked anyway because the index of `$$dereference$$` would work as long as
it was past the last index.
Add deref support to `llvm::Optional` in `lldbDataFormatters.py`.
This creates a synthetic provider that adds dereference support, but otherwise proxies all access to the underlying value.
With this change, an optional value can be displayed by running `v *someOptional`, and its contents can be accessed with the arrow `operator->`, for example `v someOpt->HasThing`. This matches expressions usable from expression evaluation.
See also D97165 and D97524.
Differential Revision: https://reviews.llvm.org/D97525
collectBitParts uses int8_t for the bit indices, leaving a 128-bit limit.
We already test for this before calling collectBitParts, but rGb94c215592bd added truncate handling which meant we could end up processing wider integers.
Thanks to @manojgupta for the repro.
This is useful for projects that pull in libcxx and libcxxabi and build
them using out-of-tree build files, but don't make them sibling
directories (or don't call the sibling directories libcxx and libcxxabi
for some reason).
Fixes PR49313.
Differential Revision: https://reviews.llvm.org/D97379
Spilling and reloading AMX registers are expensive. We allow PTILEZEROV
and PTILELOADDV to be rematerializable to avoid the register spilling.
Reviewed By: LuoYuanke
Differential Revision: https://reviews.llvm.org/D97453
This patch modifies TryToSinkInstruction in the InstCombine pass, to prevent
redundant debug intrinsics from being produced, and also prevent the intrinsics
from being emitted in an incorrect order. It does this by ensuring that when
this pass sinks an instruction and creates clones of the debug intrinsics that
use that instruction, it inserts those debug intrinsics in their original order,
and only inserts the last debug intrinsic for each variable in the Instruction's
block.
Differential revision: https://reviews.llvm.org/D95463
So far we had no way to distinguish between JITLink and RuntimeDyld in lli. Instead, we used implicit knowledge that RuntimeDyld would be used for linking ELF. In order to get D97337 to work with lli though, we have to move on and allow JITLink for ELF. This patch uses extensible RTTI to allow external clients to add their own layers without touching the LLVM sources.
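For context, LLVM's extensible-RTTI mechanism works roughly as in this
self-contained sketch (the class names are made up; they are not the layers
touched here):
```
#include "llvm/Support/Casting.h"
#include "llvm/Support/ExtensibleRTTI.h"
using namespace llvm;

class BaseLayerLike : public RTTIExtends<BaseLayerLike, RTTIRoot> {
public:
  static char ID;
};
class ClientLayerLike : public RTTIExtends<ClientLayerLike, BaseLayerLike> {
public:
  static char ID;
};
char BaseLayerLike::ID = 0;
char ClientLayerLike::ID = 0;

// isa<>/dyn_cast<> work across the hierarchy without built-in C++ RTTI, so
// out-of-tree code can add new subclasses and still have them identified.
static bool isClientLayer(const BaseLayerLike &L) {
  return isa<ClientLayerLike>(L);
}
```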
Reviewed By: lhames
Differential Revision: https://reviews.llvm.org/D97338
As discussed on D97478, the removal of the custom tag causes some changes in the add/sub-overflow expansion as it no longer expands to sat-arith codegen.