Fix for https://bugs.llvm.org/show_bug.cgi?id=40846.
This adds a combine for cases where a (a + b) < a style overflow
check is performed, but with a + b being the result of
uadd.with.overflow, so the overflow result is also already available
and we can just use it. Subsequently GVN/CSE will deduplicate the extracts.
We can run into this situation if you have both a uadd.with.overflow
and a manual add + overflow check in the same function (on the same
operands), in which case GVN will rewrite the add to the with.overflow
result and leave you with this pattern.
The implementation is a bit ugly because I'm handling the various
canonicalization edge cases.
This does not yet handle the negated version of this pattern.
Differential Revision: https://reviews.llvm.org/D58644
If the pointer was loaded/stored before the null check, the check
is redundant and can be removed. For now the optimizers do not
remove the nullptr check, see https://gcc.godbolt.org/z/H2r5GG.
The patch allows to use more nonnull constraints. Also, it found
one more optimization in some PowerPC test. This is my first llvm
review, I am free to any comments.
Differential Revision: https://reviews.llvm.org/D71177
Fix for https://bugs.llvm.org/show_bug.cgi?id=44236. This code was
originally introduced in rG36512330041201e10f5429361bbd79b1afac1ea1.
However, the attribute copying was done in the wrong place (in general
call replacement, not thunk generation) and a proper fix was
implemented in D12581.
Previously this code was just unnecessary but harmless (because
FunctionComparator ensured that the attributes of the two functions
are exactly the same), but since byval was changed to accept a type
this copying is actively wrong and may result in malformed IR.
Differential Revision: https://reviews.llvm.org/D71173
This is an alternate fix for the bug discussed in D70595.
This also includes minimal tests for other in-tree targets
to show the problem more generally.
We check the number of uses as a predicate for whether some
value is free to negate, but that use count can change as we
rewrite the expression in getNegatedExpression(). So something
that was marked free to negate during the cost evaluation
phase becomes not free to negate during the rewrite phase (or
the inverse - something that was not free becomes free).
This can lead to a crash/assert because we expect that
everything in an expression that is negatible to be handled
in the corresponding code within getNegatedExpression().
This patch skips the use check during the rewrite phase.
So we determine that some expression isNegatibleForFree
(identically to without this patch), but during the rewrite,
don't rely on use counts to decide how to create the optimal
expression.
Differential Revision: https://reviews.llvm.org/D70975
Summary:
The current da printer shows the dependence without indicating
which instructions are being considered as the src vs dst. It
also silently ignores call instructions, despite the fact that
they create confused dependence edges to other memory
instructions. This patch addresses these two issues plus a
couple of minor non-functional improvements.
Authored By: bmahjour
Reviewer: dmgreen, fhahn, philip.pfaffe, chandlerc
Reviewed By: dmgreen, fhahn
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D71088
AMDGPU was the last in tree target to use this tablegen mode. I plan to
split up the global intrinsic enum similar to the way that clang
diagnostics are split up today. I don't plan to build on this mode.
Reviewers: arsenm, echristo, efriedma
Reviewed By: echristo
Differential Revision: https://reviews.llvm.org/D71318
Before z14, we did not have any FMA instruction for 128-bit
floating-point, so the @llvm.fma.f128 intrinsic needs to be
expanded to a libcall on those platforms.
This worked correctly for regular FMA, but was implemented
incorrectly for the strict version. This was not noticed
because we did not have test coverage for this case.
This patch fixes that incorrect expansion and adds the
missing test cases.
Summary:
This patch adds a method to determine if a loop is in rotated form (the latch is
an exiting block). It also modifies the getLoopGuardBranch method to use this
new method. This method can also be used in Loopfusion. Once this patch lands I
will make the corresponding changes there.
Reviewers: jdoerfert, Meinersbur, dmgreen, etiotto, Whitney, fhahn, hfinkel
Reviewed By: Meinersbur
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D65958
This simplifies code where no extra details are required
Also don't write out detail when it is empty.
Differential Revision: https://reviews.llvm.org/D71347
There are a few places that check specific string attributes have
particular values, and assert if they are something else. The verifier
should catch these kinds of cases.
In some cases, we can rename a store operand, in order to enable pairing
of stores. For store pairs, that cannot be merged because the first
tored register is defined in between the second store, we try to find
suitable rename register.
First, we check if we can rename the given register:
1. The first store register must be killed at the store, which means we
do not have to rename instructions after the first store.
2. We scan backwards from the first store, to find the definition of the
stored register and check all uses in between are renamable. Along
they way, we collect the minimal register classes of the uses for
overlapping (sub/super)registers.
Second, we try to find an available register from the minimal physical
register class of the original register. A suitable register must not be
1. defined before FirstMI
2. between the previous definition of the register to rename
3. a callee saved register.
We use KILL flags to clear defined registers while scanning from the
beginning to the end of the block.
This triggers quite often, here are the top changes for MultiSource,
SPEC2000, SPEC2006 compiled with -O3 for iOS:
Metric: aarch64-ldst-opt.NumPairCreated
Program base patch diff
test-suite...nch/fourinarow/fourinarow.test 2.00 39.00 1850.0%
test-suite...s/ASC_Sequoia/IRSmk/IRSmk.test 46.00 80.00 73.9%
test-suite...chmarks/Olden/power/power.test 70.00 96.00 37.1%
test-suite...cations/hexxagon/hexxagon.test 29.00 39.00 34.5%
test-suite...nchmarks/McCat/05-eks/eks.test 100.00 132.00 32.0%
test-suite.../Trimaran/enc-rc4/enc-rc4.test 46.00 59.00 28.3%
test-suite...T2006/473.astar/473.astar.test 160.00 200.00 25.0%
test-suite.../Trimaran/enc-md5/enc-md5.test 8.00 10.00 25.0%
test-suite...telecomm-gsm/telecomm-gsm.test 113.00 139.00 23.0%
test-suite...ediabench/gsm/toast/toast.test 113.00 139.00 23.0%
test-suite...Source/Benchmarks/sim/sim.test 91.00 111.00 22.0%
test-suite...C/CFP2000/179.art/179.art.test 41.00 49.00 19.5%
test-suite...peg2/mpeg2dec/mpeg2decode.test 245.00 279.00 13.9%
test-suite...marks/Olden/health/health.test 16.00 18.00 12.5%
test-suite...ks/Prolangs-C/cdecl/cdecl.test 90.00 101.00 12.2%
test-suite...fice-ispell/office-ispell.test 91.00 100.00 9.9%
test-suite...oxyApps-C/miniGMG/miniGMG.test 430.00 465.00 8.1%
test-suite...lowfish/security-blowfish.test 39.00 42.00 7.7%
test-suite.../Applications/spiff/spiff.test 42.00 45.00 7.1%
test-suite...arks/mafft/pairlocalalign.test 2473.00 2646.00 7.0%
test-suite.../VersaBench/ecbdes/ecbdes.test 29.00 31.00 6.9%
test-suite...nch/beamformer/beamformer.test 220.00 235.00 6.8%
test-suite...CFP2000/177.mesa/177.mesa.test 2110.00 2252.00 6.7%
test-suite...ve-susan/automotive-susan.test 109.00 116.00 6.4%
test-suite...s-C/unix-smail/unix-smail.test 65.00 69.00 6.2%
test-suite...CI_Purple/SMG2000/smg2000.test 1194.00 1265.00 5.9%
test-suite.../Benchmarks/nbench/nbench.test 472.00 500.00 5.9%
test-suite...oxyApps-C/miniAMR/miniAMR.test 248.00 262.00 5.6%
test-suite...quoia/CrystalMk/CrystalMk.test 18.00 19.00 5.6%
test-suite...rks/tramp3d-v4/tramp3d-v4.test 7331.00 7710.00 5.2%
test-suite.../Benchmarks/Bullet/bullet.test 5651.00 5938.00 5.1%
test-suite...ternal/HMMER/hmmcalibrate.test 750.00 788.00 5.1%
test-suite...T2006/456.hmmer/456.hmmer.test 764.00 802.00 5.0%
test-suite...ications/JM/ldecod/ldecod.test 1028.00 1079.00 5.0%
test-suite...CFP2006/444.namd/444.namd.test 1368.00 1434.00 4.8%
test-suite...marks/7zip/7zip-benchmark.test 4471.00 4685.00 4.8%
test-suite...6/464.h264ref/464.h264ref.test 3122.00 3271.00 4.8%
test-suite...pplications/oggenc/oggenc.test 1497.00 1565.00 4.5%
test-suite...T2000/300.twolf/300.twolf.test 742.00 774.00 4.3%
test-suite.../Prolangs-C/loader/loader.test 24.00 25.00 4.2%
test-suite...0.perlbench/400.perlbench.test 1983.00 2058.00 3.8%
test-suite...ications/JM/lencod/lencod.test 4612.00 4785.00 3.8%
test-suite...yApps-C++/PENNANT/PENNANT.test 995.00 1032.00 3.7%
test-suite...arks/VersaBench/dbms/dbms.test 54.00 56.00 3.7%
Reviewers: efriedma, thegameg, samparker, dmgreen, paquette, evandro
Reviewed By: paquette
Differential Revision: https://reviews.llvm.org/D70450
A number of the --debug-* options in llvm-dwarfdump are not particularly
well tested. In some cases, the option is only tested as part of testing
another feature, or a specific part of the section that the options
dump. This change adds four new tests to address some of these holes. It
is not aiming to address every hole however.
I kept the --debug-line switch test separate to X86/brief.s because the
latter only considers the parts of the line table that are affected by
verbose printing, thus missing out things like the header and different
values for things like the Line, Column etc registers.
Reviewed by: JDevlieghere
Differential Revision: https://reviews.llvm.org/D71276
The Isa register is a uint8_t, but at least on Windows this is
internally an unsigned char, which meant that prior to this patch it got
formatted as an ASCII character, rather than a decimal number. This
patch fixes this by casting it to a uint64_t before printing. I did it
this way instead of using a uint8_t formatter because a) it is simpler,
and b) it allows us to change the internal type of Isa in the future
without this code breaking.
I also took the opportunity to test the printing of the other standard
opcodes.
Reviewed by: probinson
Differential Revision: https://reviews.llvm.org/D71274
Summary: Rollback of parts of D71213. After digging more into the code I think we should leave 0 when creating the instructions (CreateMemcpy, CreateMaskedStore, CreateMaskedLoad). It's probably fine for MemorySanitizer because Alignement is resolved but I'm having a hard time convincing myself it has no impact at all (although tests are passing).
Reviewers: courbet
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D71332
After recent changes it is now seems possible to get rid of
printing '\n' before each error and warning. This makes the output
cleaner.
Differential revision: https://reviews.llvm.org/D71246
Summary:
These allow you to get and set the operator of a dag node, without
affecting its list of arguments.
`!getop` is slightly fiddly because in many contexts you need its
return value to have a static type more specific than 'any record'. It
works to say `!cast<BaseClass>(!getop(...))`, but it's cumbersome, so
I made `!getop` take an optional type suffix itself, so that can be
written as the shorter `!getop<BaseClass>(...)`.
Reviewers: hfinkel, nhaehnle
Reviewed By: nhaehnle
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D71191
Summary:
Adds the following intrinsics:
- llvm.aarch64.sve.ldnt1
- llvm.aarch64.sve.stnt1
This patch creates masked loads and stores with the
MONonTemporal flag set when used with the intrinsics above.
Reviewers: sdesmalen, paulwalker-arm, dancgr, mgudim, efriedma, rengolin
Reviewed By: efriedma
Subscribers: tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, cfe-commits, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D71000
After creating a low-overhead loop, the loop update instruction was still
lingering around hurting performance. This removes dead loop update
instructions, which in our case are mostly SUBS instructions.
To support this, some helper functions were added to MachineLoopUtils and
ReachingDefAnalysis to analyse live-ins of loop exit blocks and find uses
before a particular loop instruction, respectively.
This is a first version that removes a SUBS instruction when there are no other
uses inside and outside the loop block, but there are some more interesting
cases in test/CodeGen/Thumb2/LowOverheadLoops/mve-tail-data-types.ll which
shows that there is room for improvement. For example, we can't handle this
case yet:
..
dlstp.32 lr, r2
.LBB0_1:
mov r3, r2
subs r2, #4
vldrh.u32 q2, [r1], #8
vmov q1, q0
vmla.u32 q0, q2, r0
letp lr, .LBB0_1
@ %bb.2:
vctp.32 r3
..
which is a lot more tricky because r2 is not only used by the subs, but also by
the mov to r3, which is used outside the low-overhead loop by the vctp
instruction, and that requires a bit of a different approach, and I will follow
up on this.
Differential Revision: https://reviews.llvm.org/D71007
This adds the family of `vshlq_n` and `vshrq_n` ACLE intrinsics, which
shift every lane of a vector left or right by a compile-time
immediate. They mostly work by expanding to the IR `shl`, `lshr` and
`ashr` operations, with their second operand being a vector splat of
the immediate.
There's a fiddly special case, though. ACLE specifies that the
immediate in `vshrq_n` can take values up to //and including// the bit
size of the vector lane. But LLVM IR thinks that shifting right by the
full size of the lane is UB, and feels free to replace the `lshr` with
an `undef` half way through the optimization pipeline. Hence, to keep
this legal in source code, I have to detect it at codegen time.
Logical (unsigned) right shifts by the element size are handled by
simply emitting the zero vector; arithmetic ones are converted into a
shift of one bit less, which will always give the same output.
In order to do that check, I also had to enhance the tablegen
MveEmitter so that it can cope with converting a builtin function's
operand into a bare integer to pass to a code-generating subfunction.
Previously the only bare integers it knew how to handle were flags
generated from within `arm_mve.td`.
Reviewers: dmgreen, miyuki, MarkMurrayARM, ostannard
Reviewed By: dmgreen, MarkMurrayARM
Subscribers: echristo, hokein, rdhindsa, kristof.beyls, hiraditya, cfe-commits, llvm-commits
Tags: #clang, #llvm
Differential Revision: https://reviews.llvm.org/D71065
Enable the TypePromotion pass my default (again).
This patch was originally committed in 393dacacf7e7.
This patch was reverted in a38396939c54.
Differential Revision: https://reviews.llvm.org/D70998
Summary:
In the function `EarlyIfPredicator::shouldConvertIf()`, we call
`TII->isProfitableToIfCvt()` with `BranchProbability::getUnknown()`, it may
cause the potential assertion error for those hook which use `BranchProbability`
in `isProfitableToIfCvt()`, for example `SystemZ`.
`SystemZ` use `Probability < BranchProbability(1, 8))` in the function
`SystemZInstrInfo::isProfitableToIfCvt()`, if we call this function with
`BranchProbability::getUnknown()`, it will cause assertion error.
This patch is to fix the potential bug.
Reviewed By: ThomasRaoux
Differential Revision: https://reviews.llvm.org/D71273
This iterator range just includes physical registers and register masks,
which are interesting when dealing with register liveness.
Reviewers: evandro, t.p.northover, paquette, MatzeB, arsenm
Reviewed By: paquette
Differential Revision: https://reviews.llvm.org/D70562
It is discussed here https://reviews.llvm.org/D71118#inline-643172
Currently when a version is empty, llvm-readelf prints:
"000: 0 (*local*) 2 (<corrupt>)"
But GNU readelf does not treat empty section as corrupt.
There is no sense in having empty versions anyways it seems, but
this change is for consistency with GNU.
Differential revision: https://reviews.llvm.org/D71243
ARMWinEHPrinter was already designed to handle linked PE images
(since d2941b43f40d), but resolving symbols didn't consistently
take the image base into account (as linked images seldom have a
symbol table, except for in MinGW setups).
Win64EHDumper wasn't really designed to handle linked images (it would
crash if executed on such a file), but a few concepts (getSymbol,
taking a virtual address instead of a relocation, and
getSectionContaining for finding the section containing a certain
virtual address) can be borrowed from ARMWinEHPrinter.
Adjust ARMWinEHPrinter to print the address of the exception handler
routine as a VA instead of an RVA, consistently with other addresses
in the same printout, and make Win64EHDumper print addresses similarly
for image cases.
Differential Revision: https://reviews.llvm.org/D71303
PowerPC has instruction to do the semantics of this piece of code:
vector int foo(vector int m, vector int n) {
return (m + n + 1) >> 1;
}
This patch is adding the match rule to select it.
Differential Revision: https://reviews.llvm.org/D71002
I think this is no longer needed. The system should take care
of legalizing any new nodes that are added. I think this might
have been needed prior to r371709 or r307053.