This re-architects the RISCV relocation handling to bring the
implementation closer in line with binutils. Previously, we would
aggressively resolve the relocation at assembly time. With this
restructuring, we will always emit a paired relocation for any symbolic
difference of the form S±T[±C], where S and T are labels and C is a
constant.
GAS has a special target hook controlled by `RELOC_EXPANSION_POSSIBLE`
which indicates that a fixup may be expanded into multiple relocations.
This is used by the RISCV backend to always emit a paired relocation -
either ADD[WIDTH] + SUB[WIDTH] for text relocations or SET[WIDTH] +
SUB[WIDTH] for debug info relocations. Irrespective of whether linker
relaxation support is enabled, a symbolic difference is always emitted as
a paired relocation.
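For example (a sketch; labels and sections are illustrative, assuming
32-bit data), a symbolic difference between two labels now always yields
an ADD32+SUB32 pair rather than being folded to a constant:
```
        .text
.Lstart:
        nop
.Lend:

        .data
        # a symbolic difference of the form S-T; rather than resolving it,
        # the assembler emits R_RISCV_ADD32 + R_RISCV_SUB32 here
        .word .Lend - .Lstart
```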
This change also sinks the target-specific behaviour down into the
target-specific layer rather than exposing it to the shared relocation
handling. In the process, we also sink the "special" handling for debug
information down into the RISCV target. Although this improves the path
for the other targets, it is not necessarily entirely ideal either.
The changes in the debug info emission could be done through another
type of hook, as this functionality would be required by any other target
which wishes to do linker relaxation. However, as there are no other
targets in LLVM which currently do this, it is a reasonable approach
until such time as the code needs to be shared.
Improve the handling of the relocation (and add a reduced test case from
the Linux kernel) to ensure that we handle complex expressions for
symbolic difference. This ensures that we correctly relocate symbols with
the addends normalized and associated with the addition portion of the
paired relocation.
This change also addresses some review comments from Alex Bradbury about
the relocations meant for use in the DWARF CFA being named incorrectly
(using ADD6 instead of SET6) in the original change which introduced the
relocation type.
This resolves the issues with the symbolic difference emission
sufficiently to enable building the Linux kernel with clang+IAS+lld
(without linker relaxation).
Resolves PR50153, PR50156!
Fixes: ClangBuiltLinux/linux#1023, ClangBuiltLinux/linux#1143
Reviewed By: nickdesaulniers, maskray
Differential Revision: https://reviews.llvm.org/D103539
Reapply the commit which previously caused build failures due to
mismatched template arguments between the return type and the returned
SmallVector.
This reverts commit e8991caea8690ec2d17b0b7e1c29bf0da6609076.
In D103169 I'm adding InstSimplify support for NaN handling in
constrained intrinsics that have a regular FP IR instruction counterpart.
Precommit the tests for clarity when that patch lands.
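For reference, a minimal sketch of the kind of case involved (the
function name and FP metadata arguments here are illustrative):
```
declare double @llvm.experimental.constrained.fadd.f64(double, double, metadata, metadata)

define double @fadd_nan(double %x) {
  ; a quiet NaN operand to the constrained counterpart of fadd
  %r = call double @llvm.experimental.constrained.fadd.f64(double 0x7FF8000000000000, double %x, metadata !"round.dynamic", metadata !"fpexcept.ignore")
  ret double %r
}
```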
This patch fixes an issue where builds of programs with multiple dbg.values
with DIArgList locations could have non-deterministic output. This issue
was caused by ReplaceableMetadataImpl::getAllArgListUsers, which
returned DIArgList pointers in a random order; the output of this
function would later be used to insert dbg.values, causing the order of
insertion to be non-deterministic. This patch changes getAllArgListUsers
to return pointers in a fixed order.
Differential Revision: https://reviews.llvm.org/D104105
This patch adds a test case showing how a single extra .loc can cause
binary differences when using -x86-pad-for-align=true.
The issue has been discussed in D94542, PR42138, PR48742.
As noted in PR45210: https://bugs.llvm.org/show_bug.cgi?id=45210
...the bug is triggered, as Eli says, when sext(idx) * ElementSize overflows.
```
// assume that GV is an array of 4-byte elements
GEP = gep GV, 0, Idx // this is accessing Idx * 4
L = load GEP
ICI = icmp eq L, value
=>
ICI = icmp eq Idx, NewIdx
```
The foldCmpLoadFromIndexedGlobal function simplifies this GEP+load pattern to an icmp.
The problem is that Idx * ElementSize can overflow.
Let's assume that the wanted value is at offset 0.
Then, there are actually four possible values for Idx to match offset 0: 0x00..00, 0x40..00, 0x80..00, 0xC0..00.
We should return true for all these values, but currently, the new icmp only returns true for 0x00..00.
This problem can be solved by masking off the top bits of Idx (as many bits as there are trailing zeros in ElementSize):
```
...
=>
Idx' = and Idx, 0x3F..FF
ICI = icmp eq Idx', NewIdx
```
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D99481
Also use the default LLVM target as the default for dlltool. This
matches how GNU dlltool behaves; it is compiled with one default
target, which is used if no option is provided.
Extend the anonymous namespace in the implementation file instead
of using static functions.
Based on a patch by Mateusz Mikuła.
The effect of the default LLVM target, if neither the -m option
nor a tool triple prefix is provided, isn't tested, as we can't
make assumptions about what it is set to.
(We could make the default be forced to one of the four supported
architectures if the default triple is another arch, and then just
test that llvm-dlltool without an -m option is able to produce an
import library, without checking the actual architecture though.)
Differential Revision: https://reviews.llvm.org/D104212
The existing tests only check that some options (but not e.g. arm)
are accepted; they don't test the options' functional effect on the
generated object files.
Differential Revision: https://reviews.llvm.org/D104215
The --coff-exports option to llvm-readobj prints the exported symbols
from a DLL/EXE; it doesn't do anything with regard to an import
library.
Differential Revision: https://reviews.llvm.org/D104214
With the exception of `frem`, this patch supports the current set of VP
floating-point binary intrinsics by lowering them to RVV instructions. It
does so by using the existing `RISCVISD::*_VL` custom nodes as an
intermediate layer. Both scalable and fixed-length vectors are supported
using this method.
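For example, a scalable-vector VP floating-point add in IR looks like the
following (a sketch; the mask and explicit vector length are the last two
operands):
```
declare <vscale x 2 x double> @llvm.vp.fadd.nxv2f64(<vscale x 2 x double>, <vscale x 2 x double>, <vscale x 2 x i1>, i32)

define <vscale x 2 x double> @vp_fadd(<vscale x 2 x double> %a, <vscale x 2 x double> %b, <vscale x 2 x i1> %m, i32 %evl) {
  ; lowered through the RISCVISD::FADD_VL custom node
  %r = call <vscale x 2 x double> @llvm.vp.fadd.nxv2f64(<vscale x 2 x double> %a, <vscale x 2 x double> %b, <vscale x 2 x i1> %m, i32 %evl)
  ret <vscale x 2 x double> %r
}
```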
The `frem` node is unsupported due to a lack of available instructions. For
fixed-length vectors we could scalarize, but that option is not (currently)
available for scalable-vector types. The support is intentionally left out so
that it is equivalent for both vector types.
The matching of vector/scalar forms is currently lacking, as scalable vector
types do not lower to the custom `VFMV_V_F_VL` node. We could either make
floating-point scalable vector splats lower to this node, or support the
matching of multiple kinds of splat via a `ComplexPattern`, much like we do for
integer types.
Reviewed By: rogfer01
Differential Revision: https://reviews.llvm.org/D104237
The InterleavedAccess pass will convert shuffle(binop(load, load)) to
binop(shuffle(load), shuffle(load)), in order to create more
interleaving load patterns (VLD2/3/4) that might have been messed up by
instcombine. As shown in D104247, we were failing to copy IR flags to the
new instruction; they should be kept the same as on the original
instruction.
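A sketch of the transform (types and shuffle masks are illustrative); the
`fast` flags on the original fadd must be carried over to the rebuilt one:
```
; before
%l1 = load <8 x float>, <8 x float>* %p1
%l2 = load <8 x float>, <8 x float>* %p2
%a  = fadd fast <8 x float> %l1, %l2
%s0 = shufflevector <8 x float> %a, <8 x float> undef, <4 x i32> <i32 0, i32 2, i32 4, i32 6>

; after: the shuffles move onto the loads, and the new fadd keeps `fast`
%s1 = shufflevector <8 x float> %l1, <8 x float> undef, <4 x i32> <i32 0, i32 2, i32 4, i32 6>
%s2 = shufflevector <8 x float> %l2, <8 x float> undef, <4 x i32> <i32 0, i32 2, i32 4, i32 6>
%r  = fadd fast <4 x float> %s1, %s2
```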
Differential Revision: https://reviews.llvm.org/D104255
The llvm::demangle function is currently used by llvm-objdump and llvm-readobj,
so this effectively adds support for Rust v0 mangling to those
applications.
Reviewed By: dblaikie
Differential Revision: https://reviews.llvm.org/D104340
At the moment, we create insertelement instructions directly after
LastInst when inserting scalar values into a vector in
VPTransformState::get.
This results in invalid IR when LastInst is a phi, followed by another
phi. In that case, the new instructions should be inserted just after
the last PHI node in the block.
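A sketch of the invalid IR this could produce (block and value names are
hypothetical):
```
merge:
  %p1  = phi i32 [ 0, %left ], [ 1, %right ]
  ; invalid: a non-PHI instruction placed between two PHI nodes
  %ins = insertelement <2 x i32> undef, i32 %p1, i32 0
  %p2  = phi i32 [ 2, %left ], [ 3, %right ]
```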
At the moment, I don't think the problematic case can be triggered, but
it can happen once predicate regions are merged and multiple
VPredInstPHI recipes are in the same block (D100260).
Reviewed By: Ayal
Differential Revision: https://reviews.llvm.org/D104188
This can be seen as a follow-up to commit 0ee439b705e82a4fe20e2,
which changed the second argument of __powidf2, __powisf2 and
__powitf2 in compiler-rt from si_int to int. That was to align with
how those runtimes are defined in libgcc.
One thing that seems to have been missing in that patch was making
sure that the rest of LLVM also handles that the argument now depends
on the size of int (not assuming the si_int machine mode of 32 bits).
When using __builtin_powi for a target with 16-bit int, clang crashed.
And when emitting libcalls to those rtlib functions, typically when
lowering @llvm.powi, the backend would always prepare the exponent
argument as an i32, which caused miscompiles when the rtlib was
compiled with 16-bit int.
The solution used here is to use an overloaded type for the second
argument in @llvm.powi. This way clang can use the "correct" type
when lowering __builtin_powi, and then later when emitting the libcall
it is assumed that the type used in @llvm.powi matches the rtlib
function.
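For example, with the exponent type overloaded, the intrinsic's name
encodes that type, so a declaration can match the target's int width:
```
; on a target where int is 32 bits
declare double @llvm.powi.f64.i32(double, i32)

; on a target where int is 16 bits
declare double @llvm.powi.f64.i16(double, i16)
```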
One thing that needed some extra attention was that, when vectorizing
calls, several passes did not support intrinsics with more than one
overloaded argument. This patch allows overloading of a scalar operand
by adding hasVectorInstrinsicOverloadedScalarOpd, with an entry for
powi.
Differential Revision: https://reviews.llvm.org/D99439
The recently added convertConstantExprsToInstructions() does not handle
the case where the same ConstantExpr is used multiple times in the same
instruction. The first use is replaced, and the remaining uses in the
instruction are replaced along with it by replaceUsesOfWith(). The
function then attempts to replace a constant that has already been
destroyed.
So far this interface is only used by the AMDGPU BE.
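A sketch of a triggering pattern (the global and types are illustrative),
where one ConstantExpr appears as both operands of a single instruction:
```
@g = global i32 0

define i32 @f() {
  ; the same ptrtoint ConstantExpr is used for both operands, so replacing
  ; the first use also replaces (and destroys) the second before it is visited
  %v = add i32 ptrtoint (i32* @g to i32), ptrtoint (i32* @g to i32)
  ret i32 %v
}
```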
Differential Revision: https://reviews.llvm.org/D104425
Currently the value is only used when calling `F->viewCFG()`, which misses out on its potential usefulness.
So I added the check to the printer passes as well.
Reviewed By: MaskRay
Differential Revision: https://reviews.llvm.org/D102011
Put the dtor of mca::CustomBehaviour into the cpp file to avoid an
undefined vtable when linking libLLVMMCACustomBehaviourAMDGPU as a shared
library.
Differential Revision: https://reviews.llvm.org/D104401
Previously, dangling samples were represented by INT64_MAX in the sample profile, while probes that were never executed were not reported. This was based on an observation that dangling probes made up a smaller portion of all probes than zero-count probes. However, with compiler optimizations, dangling probes end up being a large portion of all probes in general, and reporting them does not make sense from a profile size point of view. This change flips sample reporting to report zero-count probes instead, which allows a dangling probe to be represented by nothing (a missing entry in the profile). This has several benefits:
1. Reducing sample profile size in optimize mode, even when the number of non-executed probes exceeds the number of dangling probes, since INT64_MAX takes more space than 0 to encode.
2. Binary size savings. No need to encode dangling probes anymore, since missing probes are treated as dangling in the profile reader.
3. Reducing compiler work to track dangling probes. However, for probes that are really dead and removed, we still need the compiler to identify them so that they can be reported as zero-count, instead of being mistreated as dangling probes.
4. Improving counts quality by respecting the counts already collected on the non-dangling copy of a probe. A probe, when duplicated, gets two copies at runtime. If one of them is dangling while the other is not, merging the two probes at profile generation time would cause the real samples collected on the non-dangling one to be discarded. Not reporting the dangling counterpart keeps the real samples.
5. Better readability.
6. Consistency with the non-CS DWARF line-number-based profile. Zero counts are trusted by the compiler's counts inferencer, while missing counts will be inferred by the compiler.
Note that the current patch does not include any work for #3; there will be follow-up changes.
For #1, for a large Facebook service I've seen the text profile reduced by 7%; for the extbinary profile, the size of the LBRProfileSection is reduced by 35%.
For #4, I have seen the general counts quality for SPEC2017 improve by 10%.
Reviewed By: wenlei, wlei, wmi
Differential Revision: https://reviews.llvm.org/D104129
I'm not sure what behavior we want if the FP environment is
not default (also not sure if there's a way to enumerate
the full list of intrinsics programmatically), but currently
these are all defaulting to 'false' (doesn't propagate).
There is no need to differentiate whether `UseSegments` is true or
false. Unifying the cases makes the behavior closer to BinaryWriter.
This improves compatibility with objcopy because SHF_ALLOC sections not in
a PT_LOAD will not be skipped. Such cases are usually erroneous input, though.
Reviewed By: jhenderson
Differential Revision: https://reviews.llvm.org/D104186
The original change was pushed in main as commit f7a23ecece52.
It was then reverted by commit a04f01bab2 because it caused linker failures
on buildbots that don't build the AMDGPU target.
--
Some instructions are not defined well enough within the target’s scheduling
model for llvm-mca to be able to properly simulate their behaviour. The ideal
solution to this situation is to modify the scheduling model, but that’s not
always a viable strategy. Maybe other parts of the backend depend on that
instruction being modelled the way that it is. Or maybe the instruction is quite
complex and it’s difficult to fully capture its behaviour with tablegen. The
CustomBehaviour class (which I will refer to as CB frequently) is designed to
provide intuitive scaffolding for developers to implement the correct modelling
for these instructions.
More details are available in the original commit log message (f7a23ecece52).
Differential Revision: https://reviews.llvm.org/D104149
We already have this fold:
fadd float poison, 1.0 --> poison
...via ConstantFolding, so this makes the behavior consistent
if the other operand(s) are non-constant.
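A minimal sketch of the non-constant case:
```
define float @f(float %x) {
  ; now simplifies to poison even though %x is not a constant
  %r = fadd float poison, %x
  ret float %r
}
```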
The fold for undef was added before poison existed as a
value/type in IR.
This came up in D102673 / D103169
because we're trying to sort out the more complicated handling
for constrained math ops.
We should have the handling for the regular instructions done
first, so we can build on that (or diverge as needed).
Differential Revision: https://reviews.llvm.org/D104383
This only applies to FastIsel. GlobalIsel seems to sidestep
the issue.
This fixes https://bugs.llvm.org/show_bug.cgi?id=46996
One of the things we do in LLVM is decide whether a type needs
consecutive registers. Previously, we just checked whether it
was an array or not
(plus an SVE-specific check that is not changing here).
This causes some confusion when you have arbitrary IR like:
```
%T1 = type { double, i1 };
define [ 1 x %T1 ] @foo() {
entry:
ret [ 1 x %T1 ] zeroinitializer
}
```
We see it is an array so we call CC_AArch64_Custom_Block
which bails out when it sees the i1, a type we don't want
to put into a block.
This leaves the location of the double in some kind of
intermediate state and leads to odd codegen, which then crashes
the backend because it doesn't know how to implement
what it's been asked for.
You get this:
```
renamable $d0 = FMOVD0
$w0 = COPY killed renamable $d0
```
Rather than this:
```
$d0 = FMOVD0
$w0 = COPY $wzr
```
The backend knows how to copy a 64-bit register to a 64-bit register,
but not a 64-bit register to a 32-bit one. It can certainly be taught
how, but the real issue seems to be that we even try to assign a
register block in the first place.
This change makes the logic of
AArch64TargetLowering::functionArgumentNeedsConsecutiveRegisters
a bit more thorough. If we find an array, we also check that all the
nested aggregates in that array have a single member type.
Then CC_AArch64_Custom_Block's assumption of a type that looks
like [ N x type ] will be valid and we get the expected codegen.
New tests have been added to exercise these situations. Note that
some of the output is not ABI compliant. The aim of this change is
to simply handle these situations and not to make our processing
of arbitrary IR ABI compliant.
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D104123
When instructions are issued to the underlying pipeline resources, the
mca::ResourceManager should also check for the presence of extra uses induced by
the explicit consumption of multiple partially overlapping group resources.
Fixes PR50725
I believe that after https://reviews.llvm.org/D102355 the behaviour of --print-source-context-lines has changed.
Before: --print-source-context-lines=3 prints 4 lines.
After: --print-source-context-lines=3 prints 3 lines.
Adjust the example in the docs for this change and make the testing a little more robust.
Differential Revision: https://reviews.llvm.org/D104114