No changes relative to last time; resubmitting now that a mitigation for
an AMDGPU regression has landed.
---
If SimplifyInstruction() does not succeed in simplifying the
instruction, it will compute the known bits of the instruction
in the hope that all bits are known and the instruction can be
folded to a constant. I have removed a similar optimization
from InstCombine in D75801, and would like to drop this one as well.
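For context, a minimal sketch of the pattern being dropped (illustrative, not
the exact InstSimplify code): after simplification fails, the known bits are
computed and the instruction is folded to a constant only when every bit is
known.
```
static Value *foldToConstantViaKnownBits(Instruction &I, const DataLayout &DL,
                                         AssumptionCache *AC,
                                         const DominatorTree *DT) {
  // Expensive query that only rarely produces a fully-known value.
  KnownBits Known = computeKnownBits(&I, DL, /*Depth=*/0, AC, &I, DT);
  if (Known.isConstant())
    return ConstantInt::get(I.getContext(), Known.getConstant());
  return nullptr;
}
```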
On average, we spend ~1% of total compile-time performing this
known bits calculation. However, if we introduce some additional
statistics for known bits computations and how many of them succeed
in simplifying the instruction we get (on test-suite):
instsimplify.NumKnownBits: 216
instsimplify.NumKnownBitsComputed: 13828375
valuetracking.NumKnownBitsComputed: 45860806
Out of ~14M known bits calculations (accounting for approximately
one third of all known bits calculations), only 0.0015% succeed in
producing a constant. The cases where we do succeed in computing
all known bits will get folded by other passes like InstCombine
later. On test-suite, only lencod.test and GCC-C-execute-pr44858.test
show a hash difference after this change. On lencod we see an
improvement (a loop phi is optimized away); on the GCC torture
test, a regression (a function return value is determined only
after IPSCCP, preventing propagation from a noinline function).
There are various regressions in InstSimplify tests. However, all
of these cases are already handled by InstCombine, and corresponding
tests have already been added there.
Differential Revision: https://reviews.llvm.org/D79294
DW_OP_call_ref is the only operation that has an operand which depends
on the DWARF format. The patch fixes the handling of that operation in
DWARF64 units.
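As a hedged illustration of why DWARF64 matters here: DW_OP_call_ref's operand
is a section offset, so its width has to follow the unit's DWARF format rather
than being fixed at 4 bytes (the helper name below is made up).
```
static unsigned getCallRefOperandSize(llvm::dwarf::DwarfFormat Format) {
  // DWARF64 uses 8-byte section offsets, DWARF32 uses 4-byte ones.
  return Format == llvm::dwarf::DwarfFormat::DWARF64 ? 8 : 4;
}
```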
Differential Revision: https://reviews.llvm.org/D79501
This is preparation for D79294, which removes an expensive
InstSimplify optimization, on the assumption that it will be
picked up by InstCombine instead. Of course, this does not hold
up if a backend performs non-trivial IR expansions without running
a canonicalization pipeline afterwards, which turned up as an
issue in the context of AMDGPU div/rem expansion.
This patch mitigates the issue by explicitly performing a known
bits calculation where it matters. No test changes, as those would
only be visible after the other patch lands.
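Roughly, the mitigation looks like the sketch below (names are illustrative,
not the exact AMDGPU code): the expansion queries known bits itself and picks
the narrow form directly, instead of relying on a later InstSimplify or
InstCombine run to clean things up.
```
static unsigned getNumDivBits(Value *Den, const DataLayout &DL) {
  // Ask ValueTracking directly how many bits the denominator can occupy.
  KnownBits Known = computeKnownBits(Den, DL);
  return Known.getBitWidth() - Known.countMinLeadingZeros();
}
```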
Differential Revision: https://reviews.llvm.org/D79596
The check for `DynSecSize % sizeof(Elf_Dyn) != 0` is unneeded in this context.
1. If the .dynamic section is acquired from the program headers, it is
"cut off" by
```
makeArrayRef(..., Phdr.p_filesz / sizeof(Elf_Dyn));
DynSecSize = Phdr.p_filesz;
```
2. If the .dynamic section is acquired from section headers, the .dynamic
section is checked in `getSectionContentsAsArray<Elf_Dyn>(&Sec)`.
Reviewed By: MaskRay
Differential Revision: https://reviews.llvm.org/D79560
This patch removes FC0.ExitBlock and FC1GuardBlock from DT and LI
after fusion of guarded loops. They become unreachable, and LI
verification fails when they happen to be inside another loop.
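A rough sketch of the idea (the helper below is illustrative, not the actual
loop-fusion code): both analyses have to forget the blocks that fusion made
unreachable.
```
static void forgetUnreachableBlock(BasicBlock *BB, DominatorTree &DT,
                                   LoopInfo &LI) {
  DT.eraseNode(BB);   // BB must no longer dominate anything at this point
  LI.removeBlock(BB); // drop BB from any loop that still lists it
}
```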
Reviewed By: kbarton
Differential Revision: https://reviews.llvm.org/D78679
CorrectExtraCFGEdges function.
The latter was a workaround for "various pieces of code" leaving bogus
extra CFG edges in place, where by "various" it meant only
IfConverter::MergeBlocks, which failed to clear all of the successors
of dead blocks it emptied out. This wouldn't matter a whole lot,
except that the dead blocks remained listed as predecessors of
still-useful blocks, inhibiting optimizations.
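A hedged sketch of the underlying fix (illustrative, not the exact IfConverter
code): once a dead block has been emptied, its successor edges need to be
removed so it stops appearing as a predecessor of live blocks.
```
static void clearDeadBlockSuccessors(MachineBasicBlock &DeadMBB) {
  // Removing each successor edge also erases DeadMBB from that successor's
  // predecessor list.
  while (!DeadMBB.succ_empty())
    DeadMBB.removeSuccessor(DeadMBB.succ_begin());
}
```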
This fix slightly changed two thumb tests, because the correct CFG
successors allowed the "diamond" if-conversion pattern to be
detected, where only the "simple" pattern could be used before.
Additionally, the removal of a now-redundant call to analyzeBranch
(with AllowModify=true) in BranchFolder::OptimizeFunction caused a
later check for an empty block in BranchFolder::OptimizeBlock to
fail. Correct this by moving the call to analyzeBranch in
OptimizeBlock higher.
Differential Revision: https://reviews.llvm.org/D79527
In a recent patch we introduced a problem with abstract attributes that
were assumed dead at some point. Since `Attributor::updateAA` was
introduced in 95e0d28b71e42c9b7cd77c96f728311981a021f6, we did not
remember the dependence on the liveness AA when an abstract attribute
was assumed dead and therefore not updated.
Explicit reproducer added in liveness.ll.
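The idea, as a minimal sketch (names and the exact API are assumptions, not
necessarily the real Attributor interface): even when an update is skipped
because the attribute is assumed dead, a dependence on the liveness AA has to
be recorded so the attribute is revisited if liveness changes later.
```
// Inside the update driver; AA is the attribute we were asked to update.
if (isAssumedDead(AA, &LivenessAA)) {
  recordDependence(LivenessAA, AA, DepClassTy::OPTIONAL); // previously missing
  return false; // skipped, but the dependence is now remembered
}
```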
---
Single run of the Attributor module and then CGSCC pass (oldPM)
for SPASS/clause.c (~10k LLVM-IR loc):
Before:
```
calls to allocation functions: 509242 (345483/s)
temporary memory allocations: 98666 (66937/s)
peak heap memory consumption: 18.60MB
peak RSS (including heaptrack overhead): 103.29MB
total memory leaked: 269.10KB
```
After:
```
calls to allocation functions: 529332 (355494/s)
temporary memory allocations: 102107 (68574/s)
peak heap memory consumption: 19.40MB
peak RSS (including heaptrack overhead): 102.79MB
total memory leaked: 269.10KB
```
Difference:
```
calls to allocation functions: 20090 (1339333/s)
temporary memory allocations: 3441 (229400/s)
peak heap memory consumption: 801.45KB
peak RSS (including heaptrack overhead): 0B
total memory leaked: 0B
```
Once we hit AT_NULL, we need to bail out of the loop, not just the
enclosing switch. This fixes basic usage (e.g. `cc --version`) when
AT_EXECPATH isn't present on older branches (e.g. under
qemu-user-static, at the moment), where we would previously run off
the end of ::environ.
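The control-flow issue, as a self-contained sketch (types and constants below
are made up for illustration): a `break` inside the switch only leaves the
switch, so a flag is needed to stop the scan at AT_NULL instead of walking
past the end of the vector.
```
struct AuxEntry { long a_type; long a_val; };
enum { AT_NULL = 0, AT_EXECPATH = 15 };

static const char *findExecPath(const AuxEntry *Aux) {
  const char *Path = nullptr;
  bool Done = false;
  for (; !Done; ++Aux) {
    switch (Aux->a_type) {
    case AT_NULL:
      Done = true; // previously this only broke out of the switch
      break;
    case AT_EXECPATH:
      Path = reinterpret_cast<const char *>(Aux->a_val);
      break;
    }
  }
  return Path;
}
```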
Patch By: kevans
Reviewed By: arichardson
Differential Revision: https://reviews.llvm.org/D79239
Summary:
Update the check for the default exit block to not only check that the
terminator is not unreachable, but also check that the unreachable block
contains *only* the unreachable instruction.
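A hedged sketch of the strengthened condition (illustrative only): the default
destination counts as a dead exit only when the block consists of nothing but
the `unreachable` terminator.
```
static bool isTriviallyUnreachableDefault(SwitchInst *SI) {
  BasicBlock *DefaultBB = SI->getDefaultDest();
  // Not just an unreachable terminator: it must be the block's sole instruction.
  return isa<UnreachableInst>(DefaultBB->getTerminator()) &&
         &DefaultBB->front() == DefaultBB->getTerminator();
}
```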
Reviewers: chandlerc
Subscribers: hiraditya, uabelho, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D78277
Summary:
This patch fixes the following issues with visitExtractElementInst:
1. Restrict VectorUtils::findScalarElement to fixed-length vectors.
For scalable types, the number of elements in the shuffle mask is
unknown at compile time.
2. Fix an out-of-range calculation for fixed-length vectors.
3. Skip scalable types when the analysis relies on a fixed number of elements.
4. Add unit tests to check the functionality of extractelement for scalable types.
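The guard behind items 1-3, as an illustrative sketch (the helper name is made
up): anything that needs a compile-time element count has to bail out on
scalable vectors before asking for the number of elements.
```
static unsigned getFixedNumEltsOrZero(Value *Vec) {
  auto *FixedTy = dyn_cast<FixedVectorType>(Vec->getType());
  if (!FixedTy)
    return 0; // scalable vector: element count unknown at compile time
  return FixedTy->getNumElements();
}
```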
Reviewers: sdesmalen, efriedma, spatel, nikic
Reviewed By: efriedma
Subscribers: tschuett, hiraditya, rkruppe, psnobl, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D78267
See PR45812 for motivation.
No explicit test since I couldn't figure out how to get the
current disk drive in lower case into a form in lit where I could
mkdir it and cd to it. But the change does have test coverage in
that I can remove the case normalization in lit, and tests failed
on several bots (and for me locally if in a pwd with a lower-case
drive) without that normalization prior to this change.
Differential Revision: https://reviews.llvm.org/D79531
Summary:
This patch fixes the following issues in visitInsertElementInst:
1. Bail out for scalable types when the analysis requires a fixed number of vector elements.
2. Use cast<FixedVectorType> to get the number of vector elements. This ensures an assertion
fires on scalable vector types.
3. For scalable types, avoid folding a chain of insertelement into a splat:
insertelt(insertelt(insertelt(insertelt X, %k, 0), %k, 1), %k, 2) ...
->
shufflevector(insertelt(X, %k, 0), undef, zero)
The length of a scalable vector is unknown at compile time, therefore we don't know if
a given insertelement sequence forms a valid splat.
Reviewers: sdesmalen, efriedma, spatel, nikic
Reviewed By: sdesmalen, efriedma
Subscribers: tschuett, hiraditya, rkruppe, psnobl, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D78895
These were testing byval private kernel arguments, which doesn't make
any sense and has never been used. There didn't seem to be any tests
for real value struct arguments, which are used.
The original patch (rG86dfbc676ebe) exposed an existing bug:
we could wrongly cast a constant expression to BinaryOperator
because the pattern matching allows that. This adds a check
for that case, and there's a reduced test case to verify no
crashing.
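The added guard, sketched under assumed names (not the exact SLP code):
pattern matchers such as m_Or() also match constant expressions, so the
matched value must be verified to be a real instruction before casting it to
BinaryOperator.
```
static BinaryOperator *matchOrInstruction(Value *V) {
  using namespace llvm::PatternMatch;
  if (match(V, m_Or(m_Value(), m_Value())) && isa<BinaryOperator>(V))
    return cast<BinaryOperator>(V);
  return nullptr; // a constant expression matched the pattern; do not cast
}
```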
Original commit message:
This builds on the or-reduction bailout that was added with D67841.
We still do not have IR-level load combining, although that could
be a target-specific enhancement for -vector-combiner.
The heuristic is narrowly defined to catch the motivating case from
PR39538:
https://bugs.llvm.org/show_bug.cgi?id=39538
...while preserving existing functionality.
That is, there's an unmodified test of pure load/zext/store that is
not seen in this patch at llvm/test/Transforms/SLPVectorizer/X86/cast.ll.
That's the reason for the logic difference to require the 'or'
instructions. The chances that vectorization would actually help a
memory-bound sequence like that seem small, but it looks nicer with:
vpmovzxwd (%rsi), %xmm0
vmovdqu %xmm0, (%rdi)
rather than:
movzwl (%rsi), %eax
movl %eax, (%rdi)
...
In the motivating test, we avoid creating a vector mess that is
unrecoverable in the backend, and SDAG forms the expected bswap
instructions after load combining:
movzbl (%rdi), %eax
vmovd %eax, %xmm0
movzbl 1(%rdi), %eax
vmovd %eax, %xmm1
movzbl 2(%rdi), %eax
vpinsrb $4, 4(%rdi), %xmm0, %xmm0
vpinsrb $8, 8(%rdi), %xmm0, %xmm0
vpinsrb $12, 12(%rdi), %xmm0, %xmm0
vmovd %eax, %xmm2
movzbl 3(%rdi), %eax
vpinsrb $1, 5(%rdi), %xmm1, %xmm1
vpinsrb $2, 9(%rdi), %xmm1, %xmm1
vpinsrb $3, 13(%rdi), %xmm1, %xmm1
vpslld $24, %xmm0, %xmm0
vpmovzxbd %xmm1, %xmm1 # xmm1 = xmm1[0],zero,zero,zero,xmm1[1],zero,zero,zero,xmm1[2],zero,zero,zero,xmm1[3],zero,zero,zero
vpslld $16, %xmm1, %xmm1
vpor %xmm0, %xmm1, %xmm0
vpinsrb $1, 6(%rdi), %xmm2, %xmm1
vmovd %eax, %xmm2
vpinsrb $2, 10(%rdi), %xmm1, %xmm1
vpinsrb $3, 14(%rdi), %xmm1, %xmm1
vpinsrb $1, 7(%rdi), %xmm2, %xmm2
vpinsrb $2, 11(%rdi), %xmm2, %xmm2
vpmovzxbd %xmm1, %xmm1 # xmm1 = xmm1[0],zero,zero,zero,xmm1[1],zero,zero,zero,xmm1[2],zero,zero,zero,xmm1[3],zero,zero,zero
vpinsrb $3, 15(%rdi), %xmm2, %xmm2
vpslld $8, %xmm1, %xmm1
vpmovzxbd %xmm2, %xmm2 # xmm2 = xmm2[0],zero,zero,zero,xmm2[1],zero,zero,zero,xmm2[2],zero,zero,zero,xmm2[3],zero,zero,zero
vpor %xmm2, %xmm1, %xmm1
vpor %xmm1, %xmm0, %xmm0
vmovdqu %xmm0, (%rsi)
movl (%rdi), %eax
movl 4(%rdi), %ecx
movl 8(%rdi), %edx
movbel %eax, (%rsi)
movbel %ecx, 4(%rsi)
movl 12(%rdi), %ecx
movbel %edx, 8(%rsi)
movbel %ecx, 12(%rsi)
Differential Revision: https://reviews.llvm.org/D78997
Summary:
This helps detect some missed BFI updates during CodeGenPrepare.
This is debug build only and disabled behind a flag.
Fix a missed update in CodeGenPrepare::dupRetToEnableTailCallOpts().
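A sketch of what such a verification can look like (illustrative; the real
check is debug-only and behind a flag): rebuild BFI from scratch and compare
every block's frequency against the incrementally maintained analysis.
```
static void verifyBFIUpdates(Function &F, BlockFrequencyInfo &IncrementalBFI) {
  DominatorTree DT(F);
  LoopInfo LI(DT);
  BranchProbabilityInfo BPI(F, LI);
  BlockFrequencyInfo FreshBFI(F, BPI, LI);
  for (BasicBlock &BB : F)
    assert(IncrementalBFI.getBlockFreq(&BB).getFrequency() ==
               FreshBFI.getBlockFreq(&BB).getFrequency() &&
           "stale BFI after a CodeGenPrepare transform");
}
```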
Reviewers: davidxl
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D77417
For empty directories (except the first one) we've been adding a file
with the same name as the directory to the result VFS mapping.
Differential Revision: https://reviews.llvm.org/D79551
Summary:
Avoid relocating DW_AT_call_pc, e.g. when a call PC is equal to the
function's low_pc as is the case in the test:
```
__Z5func1v:
0000000100007f94 b __Z5func2v
```
rdar://62952440
Reviewers: friss, JDevlieghere
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D79536
When peeling out the epilogue we need to ignore illegal phis coming from stages
greater than the producer stage. Otherwise we end up with circular phi
dependencies.
Differential Revision: https://reviews.llvm.org/D79581
Summary:
Remove incorrect usage of getNumElements() from visitCallInst(). The
number of elements was being used to construct a DemandedElts bitfield.
This operation does not make sense for scalable vectors. Cast to
FixedVectorType instead.
Identified by test case Clang :: CodeGen/aarch64-sve-intrinsics/acle_sve_mla.c
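As a hedged sketch of the fix (the intrinsic call `II` is illustrative): a
DemandedElts mask needs a concrete element count, and the explicit
fixed-vector cast makes the scalable case assert rather than silently using a
meaningless count.
```
static APInt getAllDemandedElts(const IntrinsicInst *II) {
  unsigned NumElts = cast<FixedVectorType>(II->getType())->getNumElements();
  return APInt::getAllOnesValue(NumElts); // every lane demanded
}
```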
Reviewers: rengolin, efriedma, sdesmalen, c-rhodes, david-arm
Reviewed By: david-arm
Subscribers: tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D79524