This shrinks the immediate that the isel table needs to emit for these
instructions. I'm hoping this allows me to change OPC_EmitInteger to
use a better variable-length encoding for representing negative
numbers, similar to what was done a few months ago for OPC_CheckInteger.
The alternative encoding uses fewer bytes for negative numbers, but
increases the number of bytes needed to encode 64, which was a very
common number in the RISCV table due to SEW=64. By using Log2, this
becomes 6 and is no longer a problem.
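For illustration, here is a minimal sketch of why that matters, assuming an
SLEB128-style signed variable-length encoding (the exact scheme used by the
matcher table may differ): 64 needs two bytes because its top value bit
collides with the sign bit, while 6 fits in a single byte.
```
#include <cstdint>
#include <cstdio>

// Byte count for an SLEB128-style encoding: 7 value bits per byte, high bit
// is the continuation flag, value is sign-extended. Illustrative only.
static unsigned slebSize(int64_t V) {
  unsigned Bytes = 0;
  bool More = true;
  while (More) {
    uint8_t Byte = V & 0x7f;
    V >>= 7; // arithmetic shift preserves the sign
    More = !((V == 0 && !(Byte & 0x40)) || (V == -1 && (Byte & 0x40)));
    ++Bytes;
  }
  return Bytes;
}

int main() {
  printf("64 -> %u byte(s)\n", slebSize(64)); // 2: bit 6 is set, needs another byte
  printf("6  -> %u byte(s)\n", slebSize(6));  // 1: Log2(64) stays compact
  printf("-8 -> %u byte(s)\n", slebSize(-8)); // 1: small negatives stay compact too
  return 0;
}
```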
This is as opposed to going through the Aliasee type.
For opaque pointers, we're trying to remove uses of PointerType::getElementType().
Reviewed By: dblaikie
Differential Revision: https://reviews.llvm.org/D101715
This patch introduces a helper to obtain an iterator range for the
PHI-like recipes in a block.
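As a rough illustration of the idea (not the actual VPlan API; the types and
the helper name here are made up), the helper exposes the leading run of
PHI-like entries as an iterator range so callers no longer have to
scan-and-filter every recipe themselves:
```
#include <algorithm>
#include <cstdio>
#include <utility>
#include <vector>

// Stand-in types; in VPlan, PHI-like recipes are grouped at the start of a block.
struct Recipe {
  bool IsPhiLike;
  int Id;
};

struct Block {
  std::vector<Recipe> Recipes;

  // Hypothetical helper: iterators bounding the leading run of PHI-like
  // recipes, mirroring an "iterator range over the block's phis".
  auto phis() {
    auto End = std::find_if(Recipes.begin(), Recipes.end(),
                            [](const Recipe &R) { return !R.IsPhiLike; });
    return std::make_pair(Recipes.begin(), End);
  }
};

int main() {
  Block B{{{true, 0}, {true, 1}, {false, 2}, {false, 3}}};
  for (auto [I, E] = B.phis(); I != E; ++I)
    printf("phi-like recipe %d\n", I->Id); // prints 0 and 1 only
  return 0;
}
```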
Reviewed By: Ayal
Differential Revision: https://reviews.llvm.org/D100101
This update supports the following transformation:
```
select(extract(mul_with_overflow(a, _), _), (a == 0), false)
=>
and(extract(mul_with_overflow(a, _), _), (a == 0))
```
which is correct because the only way `(a == 0)` can be poison is if `a` is
poison, and in that case the select's condition is poison as well, so the
fold does not introduce any new poison.
This patch is split off from D101423.
X32 uses 32-bit ELF object files with 32-bit alignment, so the
.note.gnu.property section needs to be emitted as it is for X86.
Reviewed By: MaskRay
Differential Revision: https://reviews.llvm.org/D101689
Introduce a basic schedule model for AMD Zen 3 CPUs, a.k.a. `znver3`.
This is fully built from scratch, from llvm-mca measurements
and documented reference materials.
Nothing was copied from `znver2`/`znver1`.
I believe this is in a reasonable state of completion for inclusion,
probably better than D52779 `bdver2` was :)
Namely:
* uops are pretty spot-on (at least what llvm-mca can measure)
{F16422596}
* latency is also pretty spot-on (at least what llvm-mca can measure)
{F16422601}
* throughput is within reason
{F16422607}
I haven't run many benchmarks with this;
however, the RawSpeed benchmarks say this is beneficial:
{F16603978}
{F16604029}
I'll call out the obvious problems there:
* I didn't really bother with X87 instructions
* I didn't really bother with obviously-microcoded/system instructions
* There are large discrepancies in throughput for `mr` and `rm` instructions.
I'm not really sure whether that's a modelling defect that needs to be fixed,
or a defect of the measurements.
* Pipe distributions are probably bad :)
I can't do much here until AMD allows that to be fixed
by documenting the appropriate counters and updating libpfm.
That being said, as @RKSimon notes:
>>! In D94395#2647381, @RKSimon wrote:
> I'll mention again that all the znver* models appear to be very inaccurate wrt SIMD/FPU instructions <...>
so how much worse could this possibly be?!
Things that aren't there:
* Various tunings: zero idioms, etc. Those are follow-ups.
Differential Revision: https://reviews.llvm.org/D94395
This seems to be a leftover from when the BackedgeTakenInfo
stored multiple exit counts with manual memory management. At
some point this was switched to a simple vector, and there should
be no need to micro-manage the clearing anymore. We can simply
drop the loop from the map and let the destructor do its job.
Apply the same logic used to check if CMPXCHG nodes should be expanded
at -O0: the register allocator may end up spilling some register in
between the atomic load/store pairs, breaking the atomicity and possibly
stalling the execution.
Fixes PR48017
Reviewed By: efriedman
Differential Revision: https://reviews.llvm.org/D101163
Prerequisite for D101163; the `NOLSE-O0` case shows registers being
spilled inside the rmw loop.
Use two separate prefixes for the `LSE-O0` case as some outputs differ
only by a comment that update_llc_test_checks.py ignores but lit does
not, causing the test to fail unexpectedly when run.
Commit 70c433a184a54819835e54c62c3e6891e7069861 added this
test case, which uses -stop-before with a pass that is only
added in non-release builds. Add a requirement for asserts.
The problem is the following. With fast8, we broke an important
invariant when loading shadows. A wide shadow of 64 bits used to
correspond to 4 application bytes with fast16; so, generating a single
load was okay since those 4 application bytes would share a single
origin. Now, using fast8, a wide shadow of 64 bits corresponds to 8
application bytes that should be backed by 2 origins (but we kept
generating just one).
Let's say our wide shadow is 64-bit and consists of the following:
0xABCDEFGH, where each letter denotes one byte. To check if we need the
second origin value, we could do the following (on the 64-bit wide shadow):
- bitwise shift the wide shadow left by 32 bits (yielding 0xEFGH0000)
- push the result along with the first origin load to the shadow/origin vectors
- load the second 32-bit origin of the 64-bit wide shadow
- push the wide shadow along with the second origin to the shadow/origin vectors.
The combineOrigins would then select the second origin if the wide
shadow is of the form 0xABCD0000. The tests illustrate how this
change affects the generated bitcode.
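A minimal sketch of the intended selection, with the pairing logic collapsed
into a plain function and the names assumed (the real instrumentation emits IR
that builds a select chain; the little-endian layout and the tie-break when
both halves are tainted are assumptions of this sketch):
```
#include <cstdint>
#include <cstdio>

// A 64-bit wide shadow now covers 8 application bytes, so it is paired with
// two 32-bit origins. The first pair uses the shadow shifted left by 32, so
// only the low half (shadows of the first four application bytes, assuming
// little-endian) decides whether the first origin is chosen.
static uint32_t combineOrigins(uint64_t WideShadow, uint32_t Origin1,
                               uint32_t Origin2) {
  uint64_t FirstHalf = WideShadow << 32; // low-half shadow bytes shifted up
  if (FirstHalf != 0)
    return Origin1; // some of the first four application bytes are tainted
  if (WideShadow != 0)
    return Origin2; // only the last four application bytes carry taint
  return 0;         // nothing tainted
}

int main() {
  // Upper half tainted, lower half clean (the 0xABCD0000 shape): second origin.
  printf("%u\n", combineOrigins(0xABCD000000000000ULL, 11, 22)); // prints 22
  // Lower half tainted: first origin.
  printf("%u\n", combineOrigins(0x00000000000000FFULL, 11, 22)); // prints 11
  return 0;
}
```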
Reviewed By: stephan.yichao.zhao
Differential Revision: https://reviews.llvm.org/D101584
This extends the early-ifcvt pass to avoid a few more cases where the resulting
select instructions would have matching operands. Additionally, we now use TII
to determine "sameness" of the operands so that as TII gets smarter, so too
will ifcvt.
The attached test case was bugpoint-reduced down from CINT2000/252.eon in the
test-suite. See: https://clang.godbolt.org/z/WvnrcrGEn
Differential Revision: https://reviews.llvm.org/D101508
Related to PR50172.
Protects us against regressions once we start doing the cttz(zext(x)) -> zext(cttz(x)) transformation in the middle-end.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D101662
This reverts commit 3d27b5d28aabf8516aa1fefc78a6878b89a992f0.
Broke one of the PPC tests, which I didn't see because I usually build with
only the x86/AArch64 targets enabled... oops.
https://lab.llvm.org/buildbot#builders/109/builds/13834
llvm/test/CodeGen/PowerPC/expand-foldable-isel.ll
This is a long-overdue cleanup. Not every use is eliminated; I stuck to the
uses that are called directly from select(), and not the render functions.
Differential Revision: https://reviews.llvm.org/D101590
This extends the early-ifcvt pass to avoid a few more cases where the resulting
select instructions would have matching operands. Additionally, we now use TII
to determine "sameness" of the operands so that as TII gets smarter, so too
will ifcvt.
The attached test case was bugpoint-reduced down from CINT2000/252.eon in the
test-suite. See: https://clang.godbolt.org/z/WvnrcrGEn
Differential Revision: https://reviews.llvm.org/D101508
The right symbol flag mask is ~0x7, not ~0xf.
Also emit string names for the other flags (we were missing some).
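As a worked illustration (the bit layout here is made up, not the actual
lld-macho flag encoding): if only the low three bits are non-flag bits,
masking with ~0xf silently drops the 0x8 flag bit.
```
#include <cstdint>
#include <cstdio>

int main() {
  // Illustrative value only: low 3 bits are non-flag bits, 0x8 and up are flags.
  uint32_t Raw = 0x2B;                // 0b101011
  uint32_t WrongFlags = Raw & ~0xfu;  // 0x20: the 0x8 flag bit is lost
  uint32_t RightFlags = Raw & ~0x7u;  // 0x28: all flag bits survive
  printf("wrong=0x%x right=0x%x\n", WrongFlags, RightFlags);
  return 0;
}
```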
Reviewed By: #lld-macho, gkm
Differential Revision: https://reviews.llvm.org/D101548
The relative lookup table converter pass caused an issue when full LTO
is enabled (reported in https://reviews.llvm.org/D94355).
This patch disables that pass in the full LTO pre-link optimization
pipeline until the issue is fixed.
Differential Revision: https://reviews.llvm.org/D101664
SIPreEmitPeephole did not try to remove redundant s_set_gpr_idx_*
instructions in blocks that end with a conditional branch instruction.
This seems like a simple oversight.
Differential Revision: https://reviews.llvm.org/D101629
The current code can scan an unlimited number of instructions,
if the containing basic block is very large. The test case from
PR50155 contains a basic block with approximately 100k instructions.
To avoid this, limit the number of instructions we inspect. At
the same time, drop the limit on the number of basic blocks, as
this will be implicitly limited by the number of instructions as
well.
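A minimal sketch of the shape of the change, with the limit value and names
assumed (not the patch's actual constants): a single instruction budget bounds
the walk, which also implicitly bounds how many blocks can be visited.
```
#include <vector>

// Stand-in types for illustration.
struct Instruction {};
struct BasicBlock { std::vector<Instruction> Insts; };

static const unsigned ScanLimit = 128; // assumed budget, not the real constant

// Walk instructions across the given blocks; give up once the budget is
// exhausted so one ~100k-instruction block cannot blow up compile time.
static bool walkWithBudget(const std::vector<const BasicBlock *> &Blocks) {
  unsigned Budget = ScanLimit;
  for (const BasicBlock *BB : Blocks)
    for (const Instruction &I : BB->Insts) {
      if (Budget-- == 0)
        return false; // conservatively bail out
      (void)I; // ...the real per-instruction analysis would run here...
    }
  return true;
}

int main() {
  BasicBlock Huge;
  Huge.Insts.resize(100000);
  return walkWithBudget({&Huge}) ? 0 : 1; // exits with 1: budget exhausted
}
```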
This introduces a flag that aborts if we ever reduce to IR that fails
the verifier.
Reviewed By: swamulism, arichardson
Differential Revision: https://reviews.llvm.org/D101279