The original change was pushed in main as commit f7a23ecece52.
It was then reverted by commit a04f01bab2 because it caused linker failures
on buildbots that don't build the AMDGPU target.
--
Some instructions are not defined well enough within the target’s scheduling
model for llvm-mca to be able to properly simulate their behaviour. The ideal
solution to this situation is to modify the scheduling model, but that’s not
always a viable strategy. Maybe other parts of the backend depend on that
instruction being modelled the way that it is. Or maybe the instruction is quite
complex and it’s difficult to fully capture its behaviour with tablegen. The
CustomBehaviour class (which I will refer to as CB frequently) is designed to
provide intuitive scaffolding for developers to implement the correct modelling
for these instructions.
More details are available in the original commit log message (f7a23ecece52).
Differential Revision: https://reviews.llvm.org/D104149
We already have this fold:
fadd float poison, 1.0 --> poison
...via ConstantFolding, so this makes the behavior consistent
if the other operand(s) are non-constant.
The fold for undef was added before poison existed as a
value/type in IR.
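For illustration, here is a minimal sketch of this kind of fold against the LLVM
C++ API; the helper name and placement are assumptions for this example, not the
actual InstSimplify change:
```
#include "llvm/IR/Constants.h"
#include "llvm/IR/Value.h"
using namespace llvm;

// Hypothetical helper: an fadd with a poison operand simplifies to poison.
static Value *simplifyFAddWithPoison(Value *Op0, Value *Op1) {
  if (isa<PoisonValue>(Op0) || isa<PoisonValue>(Op1))
    return PoisonValue::get(Op0->getType());
  return nullptr; // No simplification applies.
}
```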
This came up in D102673 / D103169
because we're trying to sort out the more complicated handling
for constrained math ops.
We should have the handling for the regular instructions done
first, so we can build on that (or diverge as needed).
Differential Revision: https://reviews.llvm.org/D104383
This only applies to FastISel. GlobalISel seems to sidestep
the issue.
This fixes https://bugs.llvm.org/show_bug.cgi?id=46996
One of the things we do in LLVM is decide whether a type needs
consecutive registers. Previously, we just checked whether it
was an array or not
(plus an SVE-specific check that is not changing here).
This causes some confusion when you have arbitrary IR like:
```
%T1 = type { double, i1 };
define [ 1 x %T1 ] @foo() {
entry:
ret [ 1 x %T1 ] zeroinitializer
}
```
We see it is an array so we call CC_AArch64_Custom_Block
which bails out when it sees the i1, a type we don't want
to put into a block.
This leaves the location of the double in some kind of
intermediate state and leads to odd codegen, which then crashes
the backend because it doesn't know how to implement
what it has been asked for.
You get this:
```
renamable $d0 = FMOVD0
$w0 = COPY killed renamable $d0
```
Rather than this:
```
$d0 = FMOVD0
$w0 = COPY $wzr
```
The backend knows how to copy a 64-bit register to a 64-bit register,
but not a 64-bit to a 32-bit one. It could certainly be taught how, but the real
issue seems to be that we are even trying to assign a register block
in the first place.
This change makes the logic of
AArch64TargetLowering::functionArgumentNeedsConsecutiveRegisters
a bit more thorough. If we find an array, we also check that all the
nested aggregates in that array have a single member type.
Then CC_AArch64_Custom_Block's assumption of a type that looks
like [ N x type ] will be valid and we get the expected codegen.
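A rough sketch of that extra check, assuming a recursive walk over nested
aggregate types (the helper name is hypothetical; this is not the exact code in
the AArch64 backend):
```
#include "llvm/IR/DerivedTypes.h"
using namespace llvm;

// Hypothetical helper: require that every nested aggregate has a single
// member type, so an array argument really has the shape [ N x type ].
static bool hasSingleNestedMemberType(Type *Ty) {
  if (auto *ST = dyn_cast<StructType>(Ty)) {
    if (ST->getNumElements() != 1)
      return false;
    return hasSingleNestedMemberType(ST->getElementType(0));
  }
  if (auto *AT = dyn_cast<ArrayType>(Ty))
    return hasSingleNestedMemberType(AT->getElementType());
  return true; // Scalar or vector leaf type.
}
```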
New tests have been added to exercise these situations. Note that
some of the output is not ABI compliant. The aim of this change is
to simply handle these situations and not to make our processing
of arbitrary IR ABI compliant.
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D104123
When instructions are issued to the underlying pipeline resources, the
mca::ResourceManager should also check for the presence of extra uses induced by
the explicit consumption of multiple partially overlapping group resources.
Fixes PR50725
I believe that after https://reviews.llvm.org/D102355 the behaviour of --print-source-context-lines has changed.
Before: --print-source-context-lines=3 prints 4 lines.
After: --print-source-context-lines=3 prints 3 lines.
Adjust the example in the docs for this change and make the testing a little more robust.
Differential Revision: https://reviews.llvm.org/D104114
Under MVE, v4f32 and v8f16 vectors should be using v4i1/v8i1 predicates
for the setcc result type, as they have predicated registers for those
types. Setting this correctly prevents some inefficient optimizations
from happening.
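For illustration, a minimal sketch of the mapping, assuming it is wired up
through the target's getSetCCResultType hook (a sketch of the idea, not the
committed ARM code):
```
#include "llvm/CodeGen/ValueTypes.h"
using namespace llvm;

// Hypothetical helper: under MVE, a setcc on v4f32/v8f16 should produce a
// v4i1/v8i1 predicate; other vector types fall back to i1 elements with the
// same lane count.
static EVT mveSetCCResultType(EVT VT) {
  if (VT == MVT::v4f32)
    return MVT::v4i1;
  if (VT == MVT::v8f16)
    return MVT::v8i1;
  if (VT.isVector())
    return VT.changeVectorElementType(MVT::i1);
  return MVT::i1;
}
```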
This patch adds support for a new field in the FileHeader, which states
the name to use for the section header string table. This also allows
combining the string table with another string table in the object, e.g.
the symbol name string table. The field is optional. By default,
.shstrtab will continue to be used.
This partially fixes https://bugs.llvm.org/show_bug.cgi?id=50506.
Reviewed by: Higuoxing
Differential Revision: https://reviews.llvm.org/D104035
There was an off-by-one error caused by an index (which included an
index for the null section header) being used to check against the size
of a list of sections (which didn't include the null section header).
This is a partial fix for https://bugs.llvm.org/show_bug.cgi?id=50506.
Reviewed by: MaskRay
Differential Revision: https://reviews.llvm.org/D104098
This does not affect codegen, which only tests these flags on Pseudo
instructions, but might help llvm-mca which has to work with Real
instructions. In particular setting LGKM_CNT on DS instructions helps
with the problem identified in D104149.
Differential Revision: https://reviews.llvm.org/D104293
Factor out repeated !cast<SOP*_Pseudo>(NAME) into a new "defvar ps",
just to improve readability and maintainability.
Differential Revision: https://reviews.llvm.org/D104306
Addresses FIXMEs in TPC-based EH-frame and debug object registration code by
replacing manual argument serialization with WrapperFunction utility calls.
As per the discussion in D103818, so far, this does not appear to be worthwhile.
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D103818
As per @reames's change rGac81cb7e6dde9b0890ee1780eae94ab96743569b (committed without review),
we are now allowed to produce `ptrtoint` for non-integral pointers.
This will unblock further unbreaking of SCEV regarding int-vs-pointer type confusion.
Reviewed By: mkazantsev
Differential Revision: https://reviews.llvm.org/D104322
We create flag variable "__llvm_fs_discriminator__" in the binary
to indicate that FSAFDO hierarchical discriminators are used.
This variable might be GC'ed by the linker since it is not explicitly
referenced. I initially added the variable to the use list in the
MIRFSDiscriminator pass, but that did not work: it turned out the used-globals
list is collected during lowering (before the MIR passes) and then emitted at
the end of the pass pipeline.
Here I add the variable to the use list in the IR-level AddDiscriminators
pass. The machine-level code is still kept in case IR's
AddDiscriminators is not invoked. If that is the case, just use
-Wl,--export-dynamic-symbol=__llvm_fs_discriminator__
to force the symbol to be emitted.
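A rough sketch of that idea, assuming the standard ModuleUtils helper for
appending to llvm.used is used (illustrative, not the exact code in
AddDiscriminators):
```
#include "llvm/IR/GlobalVariable.h"
#include "llvm/IR/Module.h"
#include "llvm/Transforms/Utils/ModuleUtils.h"
using namespace llvm;

// Keep the flag variable alive by adding it to the module's llvm.used list,
// so the linker will not garbage-collect it.
static void keepFSDiscriminatorFlagAlive(Module &M) {
  if (GlobalVariable *GV = M.getGlobalVariable("__llvm_fs_discriminator__"))
    appendToUsed(M, {GV});
}
```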
Differential Revision: https://reviews.llvm.org/D103988
This reverts commit 434fed5aff5e62460e2e984c7cc2674c12779b1e.
Post-commit review suggested using another implementation.
Details can be found in the review.
This was broken in ba1509da7b89c850c89f0f98afbab375794cd3c8. The Win64
frame lowering would not perform the setup of the Swift async context parameter
but would tear down the setup in the epilogue, resulting in crashes.
This ensures that we do the full setup when we do the tear down.
Although this is non-conforming to the Win64 calling convention, it
corrects the setup and exposes the actual issue that the change
introduced: incorrect frame setup.
Reviewed By: rnk
Differential Revision: https://reviews.llvm.org/D104246
The original implementation calculating UserBonus uses operator ^, which means XOR
in C++.
At first glance during review, I thought it was meant to be a power operation, my bad.
It doesn't make sense to use XOR here, so I believe it was carelessness,
just as mine was.
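For illustration, a tiny standalone example of the difference (not the patched
code):
```
#include <cmath>
#include <cstdio>

// In C++, operator ^ is bitwise XOR on integers, not exponentiation.
int main() {
  int a = 2, b = 10;
  std::printf("2 ^ 10 (XOR)  = %d\n", a ^ b);            // prints 8
  std::printf("pow(2, 10)    = %.0f\n", std::pow(a, b)); // prints 1024
  return 0;
}
```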
Test Plan: check-all
Reviewed By: SjoerdMeijer
Differential Revision: https://reviews.llvm.org/D104282
This allows overriding the `CMAKE_CXX_VISIBILITY_PRESET` on the command line. For example, setting the value to `default` lets PIC LLVM static libraries be converted to DSOs, without the need to rebuild LLVM with BUILD_SHARED_LIBS=ON.
Reviewed By: wenlei
Differential Revision: https://reviews.llvm.org/D104168
Verify opaque pointers as function parameters when used with `byval`, `byref`,
`inalloca`, and `preallocated`.
Differential Revision: https://reviews.llvm.org/D104309
We create flag variable "__llvm_fs_discriminator__" in the binary
to indicate that FSAFDO hierarchical discriminators are used.
This variable might be GC'ed by the linker since it is not explicitly
referenced. I initially added the variable to the use list in the
MIRFSDiscriminator pass, but that did not work: it turned out the used-globals
list is collected during lowering (before the MIR passes) and then emitted at
the end of the pass pipeline.
In this patch, we use a "common" linkage for this variable so that
it will not be GC'ed by the linker.
Differential Revision: https://reviews.llvm.org/D103988
Some instructions are not defined well enough within the target’s scheduling
model for llvm-mca to be able to properly simulate their behaviour. The ideal
solution to this situation is to modify the scheduling model, but that’s not
always a viable strategy. Maybe other parts of the backend depend on that
instruction being modelled the way that it is. Or maybe the instruction is quite
complex and it’s difficult to fully capture its behaviour with tablegen. The
CustomBehaviour class (which I will refer to as CB frequently) is designed to
provide intuitive scaffolding for developers to implement the correct modelling
for these instructions.
Implementation details:
llvm-mca does its best to extract relevant register, resource, and memory
information from every MCInst when lowering it to an mca::Instruction. It then
uses this information to detect dependencies and simulate stalls within the
pipeline. For some instructions, the information that gets captured within the
mca::Instruction is not enough for mca to simulate them properly. In these
cases, there are two main possibilities:
1. The instruction has a dependency that isn’t detected by mca.
2. mca is incorrectly enforcing a dependency that shouldn’t exist.
For the rest of this discussion, I will be focusing on (1), but I have put some
thought into (2) and I may revisit it in the future.
So we have an instruction that has dependencies that aren’t picked up by mca.
The basic idea for both pipelines in mca is that when an instruction wants to be
dispatched, we first check for register hazards and then we check for resource
hazards. This is where CB is injected. If no register or resource hazards have
been detected, we make a call to CustomBehaviour::checkCustomHazard() to give
the target-specific CB the chance to detect and enforce any custom dependencies.
The return value for checkCustomHazard() is an unsigned int representing the
(minimum) number of cycles that the instruction needs to stall for. It’s fine to
underestimate this value because when StallCycles gets down to 0, we’ll end up
checking for all the hazards again before the instruction is actually
dispatched. However, it’s important not to overestimate the value and the more
accurate your estimate is, the more efficient mca’s execution can be.
In general, for checkCustomHazard() to be able to detect these custom
dependencies, it needs information about the current instruction and also all of
the instructions that are still executing within the pipeline. The mca pipeline
uses mca::Instruction rather than MCInst and the current information encoded
within each mca::Instruction isn’t sufficient for my use cases. I had to add a
few extra attributes to the mca::Instruction class and have them get set by the
MCInst during instruction building. For example, the current mca::Instruction
doesn’t know its opcode, and it also doesn’t know anything about its immediate
operands (both of which I had to add to the class).
With information about the current instruction, a list of all currently
executing instructions, and some target specific objects (MCSubtargetInfo and
MCInstrInfo which the base CB class has references to), developers should be
able to detect and enforce most custom dependencies within checkCustomHazard. If
you need more information than is present in the mca::Instruction, feel free to
add attributes to that class and have them set during the lowering sequence from
MCInst.
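As a rough sketch, a target-specific CB might look like the following; the
base-class signature shown and the hazard condition are assumptions for
illustration, not code from a real target:
```
#include "llvm/MCA/CustomBehaviour.h"
using namespace llvm;
using namespace llvm::mca;

class MyTargetCustomBehaviour : public CustomBehaviour {
  // Placeholder for whatever target-specific dependency test is needed; a real
  // implementation would inspect the extra attributes captured on the
  // mca::Instruction (opcode, immediate operands, etc.).
  bool hasCustomDependency(const InstRef &IR, const InstRef &Executing) const {
    return false;
  }

public:
  using CustomBehaviour::CustomBehaviour;

  // Return 0 for "no custom hazard"; a non-zero value is the (minimum) number
  // of cycles the instruction should stall before being re-checked.
  unsigned checkCustomHazard(ArrayRef<InstRef> IssuedInst,
                             const InstRef &IR) override {
    for (const InstRef &Executing : IssuedInst)
      if (hasCustomDependency(IR, Executing))
        return 1;
    return 0;
  }
};
```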
Fortunately, in the in-order pipeline, it’s very convenient for us to pass these
arguments to checkCustomHazard. The hazard checking is taken care of within
InOrderIssueStage::canExecute(). This function takes a const InstRef as a
parameter (representing the instruction that currently wants to be dispatched)
and the InOrderIssueStage class maintains a SmallVector<InstRef, 4> which holds
all of the currently executing instructions. For the out-of-order pipeline, it’s
a bit trickier to get the list of executing instructions and this is why I have
held off on implementing it myself. This is the main topic I will bring up when
I eventually make a post to discuss and ask for feedback.
CB is a base class where targets implement their own derived classes. If a
target specific CB does not exist (or we pass in the -disable-cb flag), the base
class is used. This base class trivially returns 0 from its checkCustomHazard()
implementation (meaning that the current instruction needs to stall for 0 cycles
aka no hazard is detected). For this reason, targets or users who choose not to
use CB shouldn’t see any negative impacts to accuracy or performance (in
comparison to pre-patch llvm-mca).
Differential Revision: https://reviews.llvm.org/D104149
We can look through invariant group intrinsics for the purposes of
simplifying the result of a load.
Since intrinsics can't be constants, but we also don't want to
completely rewrite load constant folding, we convert the load operand to
a constant. For GEPs and bitcasts we just treat them as constants. For
invariant group intrinsics, we treat them as a bitcast.
Relanding with a check for self-referential values.
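As an illustration of the "treat invariant group intrinsics as a bitcast" idea
described above, here is a hypothetical helper (the name and structure are
assumptions, not the actual constant-folding code):
```
#include "llvm/IR/IntrinsicInst.h"
using namespace llvm;

// Look through llvm.launder/strip.invariant.group to the underlying pointer,
// treating the intrinsic like a bitcast for simplification purposes.
static Value *lookThroughInvariantGroup(Value *V) {
  if (auto *II = dyn_cast<IntrinsicInst>(V)) {
    Intrinsic::ID ID = II->getIntrinsicID();
    if (ID == Intrinsic::launder_invariant_group ||
        ID == Intrinsic::strip_invariant_group)
      return II->getArgOperand(0); // The pointer the intrinsic wraps.
  }
  return V;
}
```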
Reviewed By: lebedev.ri
Differential Revision: https://reviews.llvm.org/D101103
Commit c98ebda325c996b3a12f4fded0368734dc0fe28a renamed the fp-op fusion option (yet
again) for compatibility with the GCC option.
The comment in the header should be updated too to avoid confusion.
We have added STXVP/LXVP for spilling and restoring the registers
but we neglected to add frame index (FI) elimination code for these. The result
is that we end up producing impossible MachineInstrs that have
register operands in place of immediates.
Remove the compatibility spellings of `OF_{None,Text,Append}` that
were left behind by 1f67a3cba9b09636c56e2109d8a35ae96dc15782.
No functionality change here, just an API cleanup.
Differential Revision: https://reviews.llvm.org/D101506
Fixes:
- PR36507 Floating point varargs are not handled correctly with
-mno-implicit-float
- PR48528 __builtin_va_start assumes it can pass SSE registers
when using -Xclang -msoft-float -Xclang -no-implicit-float
On x86_64, floating-point parameters are normally passed in XMM
registers. For va_start, we spill those to memory so va_arg can
find them. There is an interaction here with -msoft-float and
-no-implicit-float:
When -msoft-float is in effect, instead of passing floating-point
parameters in XMM registers, they are passed in general-purpose
registers.
When -no-implicit-float is in effect, it "disables implicit
floating-point instructions" (per the LangRef). The intended
effect is to not have the compiler generate floating-point code
unless explicit floating-point operations are present in the
source code, but what exactly counts as an explicit floating-point
operation is not specified. The existing behavior of LLVM here has
led to some surprises and PRs.
This change modifies the behavior as follows:
| soft | no-implicit | old behavior    | new behavior    |
|------|-------------|-----------------|-----------------|
| no   | no          | spill XMM regs  | spill XMM regs  |
| yes  | no          | don't spill XMM | don't spill XMM |
| no   | yes         | don't spill XMM | spill XMM regs  |
| yes  | yes         | assert          | don't spill XMM |
In particular, this avoids the assert that happens when
-msoft-float and -no-implicit-float are both in effect. This
seems like a perfectly reasonable combination: If we don't want
to rely on hardware floating-point support, we want to both
avoid using float registers to pass parameters and avoid having
the compiler generate floating-point code that wasn't in the
original program. Instead of crashing the compiler, the new
behavior is to not synthesize floating-point code in this
case. This fixes PR48528.
The other interesting case is when -no-implicit-float is in
effect, but -msoft-float is not. In that case, any floating-point
parameters that are present will be in XMM registers, and so we
have to spill them to correctly handle those. This fixes
PR36507. The spill is conditional on %al indicating that
parameters are present in XMM registers, so no floating-point
code will be executed unless the function is called with
floating-point parameters.
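As a usage example of the scenario this fixes (not code from the patch),
consider a variadic function that may receive floating-point arguments; with
-no-implicit-float but without -msoft-float, those arguments arrive in XMM
registers, so va_start must still spill them, and the spill is guarded by %al
so it only runs when FP arguments are actually passed:
```
#include <cstdarg>

// Sums `count` double arguments passed through varargs.
double sum_doubles(int count, ...) {
  va_list args;
  va_start(args, count); // Needs the XMM spill area when FP args are present.
  double total = 0.0;
  for (int i = 0; i < count; ++i)
    total += va_arg(args, double);
  va_end(args);
  return total;
}
```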
Reviewed By: rnk
Differential Revision: https://reviews.llvm.org/D104001
Addition of this pass has been botched.
There is no particular reason why it had to be sold as an inseparable part
of the new-pm transition. It was added when old-pm was still the default,
and very *very* few users were actually tracking new-pm,
so its effects weren't measured.
Which means some of the turmoil of the new-pm transition
is actually likely due to regressions caused by this pass.
Likewise, there have been a number of pieces of post-commit feedback
(post new-pm switch), namely
* https://reviews.llvm.org/D37467#2787157 (regresses HW-loops)
* https://reviews.llvm.org/D37467#2787259 (should not be in middle-end, should run after LSR, not before)
* https://reviews.llvm.org/D95789 (an attempt to fix bad loop backedge metadata)
and in the half year since, the pass authors (Google) still haven't found time to respond to any of that.
Hereby it is proposed to back the pass out of the pipeline,
until someone who cares about it can address the issues reported,
and properly start the process of adding a new pass into the pipeline,
with proper performance evaluation.
Furthermore, neither Google nor Facebook reports any perf changes
from this change, so I'm dropping the pass completely.
It can always be re-reverted if/when anyone wants to pick it up again.
Reviewed By: aeubanks
Differential Revision: https://reviews.llvm.org/D104099
This commit adds nodes that might not always be used, which the
expensive checks builder does not like. Reverting for now to think up a
better way of handling it.