This is based on the assumption that most simulated instructions don't define
more than one or two registers. This is true, for example, on x86, where
most instruction definitions don't declare more than one register write.
The default code region size has been increased from 8 to 16. This is based on
the assumption that, for small microbenchmarks, the typical code snippet size is
often less than 16 instructions.
mca::Instruction now uses bitfields to pack flags.
No functional change intended.
It breaks up the function pass manager in the codegen pipeline.
With empty parameters, it looks at the -mllvm flag -rewrite-map-file.
This is likely not in use.
Add a check that we only have one function pass manager in the codegen
pipeline.
Some tests relied on the fact that we had a module pass somewhere in the
codegen pipeline.
addr-label.ll crashes on ARM due to this change. This is because an
ARMConstantPoolConstant containing a BasicBlock to represent a
blockaddress may hold an invalid pointer to a BasicBlock if the
blockaddress is invalidated by its BasicBlock getting removed. In that
case, all referencing blockaddresses are RAUW'd with a constant int. Making
ARMConstantPoolConstant::CVal a WeakVH fixes the crash, but I'm not sure
that's the right fix. As a workaround, create a barrier right before
ISel so that IR optimizations can't happen after an
ARMConstantPoolConstant has been created.
Reviewed By: rnk, MaskRay, compnerd
Differential Revision: https://reviews.llvm.org/D99707
- This patch is the second (and hopefully final) part of adding HLASM syntax support for z/OS inline asm statements to LLVM (continuing on from https://reviews.llvm.org/D98276)
- This second part deals with providing label support
- As mentioned in https://reviews.llvm.org/D98276, if the first token is not a space we process the first token as a label, and the remaining tokens as a possible machine instruction
- To achieve this, a new `parseAsHLASMLabel` function is introduced. This function processes the first token, validates whether it is an "acceptable" label according to HLASM standards, and then emits it
- After handling and emitting the label, we call the `parseAsMachineInstruction` function to process the remaining tokens as a machine instruction.
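A rough C++ sketch of the label split described above. The rules in `isAcceptableHLASMLabel` are an assumption about HLASM ordinary symbols (a letter or one of `$ _ # @` first, then letters, digits or those characters, at most 63 characters in total), and `splitHLASMLine` is a hypothetical helper, not the code of `parseAsHLASMLabel` itself:

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Assumed HLASM "alphabetic" characters: A-Z, a-z, '$', '_', '#', '@'.
bool isAlphabeticHLASM(char C) {
  return std::isalpha(static_cast<unsigned char>(C)) || C == '$' ||
         C == '_' || C == '#' || C == '@';
}

// Assumed ordinary-symbol rules: alphabetic first character, then
// alphanumeric characters, at most 63 characters in total.
bool isAcceptableHLASMLabel(const std::string &Tok) {
  if (Tok.empty() || Tok.size() > 63 || !isAlphabeticHLASM(Tok[0]))
    return false;
  for (char C : Tok)
    if (!isAlphabeticHLASM(C) && !std::isdigit(static_cast<unsigned char>(C)))
      return false;
  return true;
}

// Hypothetical helper: if the line does not start with a blank, the first
// token is a label candidate and the rest is a possible machine instruction.
struct SplitLine {
  std::string Label;
  std::string Rest;
};

SplitLine splitHLASMLine(const std::string &Line) {
  if (Line.empty() || Line[0] == ' ' || Line[0] == '\t')
    return {"", Line}; // no label: the whole line is the instruction
  std::size_t End = Line.find_first_of(" \t");
  if (End == std::string::npos)
    return {Line, ""}; // label only
  return {Line.substr(0, End), Line.substr(End)};
}
```

In the parser proper, the label candidate would then be validated and emitted before the rest is handed to `parseAsMachineInstruction`.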
Reviewed By: uweigand
Differential Revision: https://reviews.llvm.org/D103320
1. Removed redundant includes.
2. Removed `releaseMemory()`, which was never defined or used.
3. Fixed the case of the first letter of member function names.
4. Renamed the duplicate name `NonLocalDeps` (in nested struct `NonLocalPointerInfo`)
   to `NonLocalDepsMap`.
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D102358
I accidentally pushed a draft of D103280 that was discussed
during the review, but it was not supposed to be the final
version.
Rather than revert and recommit, I'm updating the existing
code. This way we have a record of the codegen diff that
would result if we decide to remove this predicate in the
future.
ExprValueMap is a map from SCEV * to a set-vector of (Value *, ConstantInt *) pairs,
and while the map itself will likely be big-ish (have many keys),
it is reasonable to assume that each key will refer to a small-ish
number of pairs.
In particular, looking at the n=512 case from
https://bugs.llvm.org/show_bug.cgi?id=50384,
the small size of 4 appears to be the sweet spot: it comes within a few
hundred allocations of the larger sizes while giving the lowest peak heap
memory consumption.
```
$ for i in $(ls heaptrack.opt.*.gz); do echo $i; heaptrack_print $i | tail -n 6; echo ""; done
heaptrack.opt.0-orig.gz
total runtime: 14.32s.
calls to allocation functions: 8222442 (574192/s)
temporary memory allocations: 2419000 (168924/s)
peak heap memory consumption: 190.98MB
peak RSS (including heaptrack overhead): 239.65MB
total memory leaked: 67.58KB
heaptrack.opt.1-n1.gz
total runtime: 13.72s.
calls to allocation functions: 7184188 (523705/s)
temporary memory allocations: 2419017 (176338/s)
peak heap memory consumption: 191.38MB
peak RSS (including heaptrack overhead): 239.64MB
total memory leaked: 67.58KB
heaptrack.opt.2-n2.gz
total runtime: 12.24s.
calls to allocation functions: 6146827 (502355/s)
temporary memory allocations: 2418997 (197695/s)
peak heap memory consumption: 163.31MB
peak RSS (including heaptrack overhead): 211.01MB
total memory leaked: 67.58KB
heaptrack.opt.3-n4.gz
total runtime: 12.28s.
calls to allocation functions: 6068532 (494260/s)
temporary memory allocations: 2418985 (197017/s)
peak heap memory consumption: 155.43MB
peak RSS (including heaptrack overhead): 201.77MB
total memory leaked: 67.58KB
heaptrack.opt.4-n8.gz
total runtime: 12.06s.
calls to allocation functions: 6068042 (503321/s)
temporary memory allocations: 2418992 (200646/s)
peak heap memory consumption: 166.03MB
peak RSS (including heaptrack overhead): 213.55MB
total memory leaked: 67.58KB
heaptrack.opt.5-n16.gz
total runtime: 12.14s.
calls to allocation functions: 6067993 (499958/s)
temporary memory allocations: 2418999 (199307/s)
peak heap memory consumption: 187.24MB
peak RSS (including heaptrack overhead): 233.69MB
total memory leaked: 67.58KB
```
While that test may be a worst-case edge scenario,
https://llvm-compile-time-tracker.com/compare.php?from=dee85d47d9f15fc268f7b18f279dac2774836615&to=98a57e31b1947d5bcdf4a5605ac2ab32b4bd5f63&stat=instructions
agrees that this also results in improvements in the usual situations.
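To illustrate the trade-off, here is a hypothetical, heavily simplified container in the spirit of LLVM's small containers (not the real `SmallSetVector` implementation): the first `N` elements live inline, so keys with at most `N` pairs never touch the heap, but every entry pays for `N` inline slots whether it uses them or not. That is consistent with the n16 run above using more peak memory than n4:

```cpp
#include <cstddef>
#include <vector>

// Hypothetical model of an "N inline slots, then heap" container.
template <typename T, std::size_t N>
struct InlineBuffer {
  T Inline[N];          // inline slots, paid for by every instance
  std::size_t Size = 0; // total number of elements
  std::vector<T> Heap;  // holds all elements once Size exceeds N

  void push(const T &V) {
    if (Heap.empty() && Size < N) {
      Inline[Size++] = V; // still small: no heap allocation
      return;
    }
    if (Heap.empty())
      Heap.assign(Inline, Inline + Size); // spill: first heap allocation
    Heap.push_back(V);
    ++Size;
  }

  bool spilled() const { return !Heap.empty(); }
};
```

With `N = 4`, four pushes stay inline and the fifth spills to the heap; meanwhile `sizeof(InlineBuffer<T, 16>)` is larger than `sizeof(InlineBuffer<T, 4>)` for every instance, used or not.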
During the review of D102277, it was decided to remove lazy options processing
from the llvm-objcopy CopyConfig structure. This patch converts the lazy processing
of ELF options into in-place processing.
Differential Revision: https://reviews.llvm.org/D103260
sext (vsetcc X, Y) --> vsetcc (zext X), (zext Y) --
(when the zexts are free and a bunch of other conditions)
We have a couple of similar folds to this already for vector selects,
but this pattern slips through because it is only a setcc.
The tests are based on the motivating case from:
https://llvm.org/PR50055
...but we need extra logic to get that example, so I've left that as
a TODO for now.
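An exhaustive scalar check of one lane of the fold, as a sketch: it models an `i8` to `i16` widening, assumes each lane of a vector setcc is all-ones or all-zeros, and does not model the "zexts are free" or other legality conditions. Under those assumptions it shows that the rewrite preserves equality and unsigned compares but not signed ones, so the predicate has to be among the conditions that are checked:

```cpp
#include <cstdint>
#include <type_traits>

enum class Pred { EQ, ULT, SLT };

// Evaluate a compare predicate on two lanes of unsigned type U.
template <typename U>
bool evalPred(Pred P, U A, U B) {
  using S = std::make_signed_t<U>;
  switch (P) {
  case Pred::EQ:
    return A == B;
  case Pred::ULT:
    return A < B;
  case Pred::SLT:
    return static_cast<S>(A) < static_cast<S>(B);
  }
  return false;
}

// True if sext(setcc P, X, Y) == setcc P, (zext X), (zext Y) for all i8 lanes.
bool foldIsSound(Pred P) {
  for (unsigned X = 0; X < 256; ++X)
    for (unsigned Y = 0; Y < 256; ++Y) {
      uint8_t X8 = static_cast<uint8_t>(X), Y8 = static_cast<uint8_t>(Y);
      // Before: compare i8 lanes; the sign-extended 0/-1 mask is 0/-1 in i16.
      int16_t Before = evalPred<uint8_t>(P, X8, Y8) ? -1 : 0;
      // After: zero-extend both lanes to i16 (implicit conversion), then compare.
      int16_t After = evalPred<uint16_t>(P, X8, Y8) ? -1 : 0;
      if (Before != After)
        return false;
    }
  return true;
}
```

For example, with `X = 0x80` and `Y = 0`, `slt` is true on the `i8` lanes (-128 < 0) but false after zero-extension (128 < 0).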
Differential Revision: https://reviews.llvm.org/D103280
D35953, D62650 and D73691 introduced trimming of variable locations
in the LiveDebugVariables pass, since in some cases, after
virtregrewrite, an exploding number of DBG_VALUEs was created for some
inlined variables. It appears that all the problematic cases involved
inlined variables, so it seems reasonable to stop trimming the location
ranges for non-inlined variables.
This has a very positive impact on the llvm-locstats report.
Differential Revision: https://reviews.llvm.org/D102917
This patch fixes a bug in lowering scalable-vector types in RISC-V's
main calling convention. When scalable-vector types are split and passed
indirectly, the target is responsible for scaling the offset --
initially set to the known-minimum store size -- by the scalable factor.
Before this we were issuing overlapping loads or stores to the different
parts, leading to incorrect codegen.
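A sketch of the offset arithmetic only, with hypothetical names: each split part has a known-minimum store size, but at run time it occupies that size times vscale. Stepping by the unscaled size makes consecutive parts overlap whenever vscale > 1:

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Byte offsets of the parts of a split scalable vector passed indirectly.
// KnownMinStoreSize is a part's store size for vscale == 1.
std::vector<uint64_t> partOffsets(unsigned NumParts, uint64_t KnownMinStoreSize,
                                  uint64_t VScale, bool ScaleByVScale) {
  uint64_t Stride =
      ScaleByVScale ? KnownMinStoreSize * VScale : KnownMinStoreSize;
  std::vector<uint64_t> Offsets;
  for (unsigned I = 0; I < NumParts; ++I)
    Offsets.push_back(I * Stride);
  return Offsets;
}

// True if consecutive parts, each PartBytes long at run time, overlap.
bool partsOverlap(const std::vector<uint64_t> &Offsets, uint64_t PartBytes) {
  for (std::size_t I = 1; I < Offsets.size(); ++I)
    if (Offsets[I] - Offsets[I - 1] < PartBytes)
      return true;
  return false;
}
```

With two 8-byte (known-minimum) parts and vscale = 2, each part really spans 16 bytes: the unscaled offsets {0, 8} overlap, while the scaled offsets {0, 16} do not.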
Credit to @HsiangKai for spotting this.
Reviewed By: HsiangKai
Differential Revision: https://reviews.llvm.org/D103262
This patch replaces the placeholder value of shufflevector and insertelement with poison.
The underlying motivation is to fix the semantics of shufflevector with undef mask elements to return poison instead of undef
(D93818).
Consensus was reached in late 2020 on the mailing list, as well as in the thread at https://bugs.llvm.org/show_bug.cgi?id=44185 .
This patch is a simple syntactic change to the existing code, hence it was pushed directly as a commit.
DSE will currently only remove stores in the same block unless they can
be guaranteed to be loop invariant. This expands that to any stores that
are in the same Loop, at the same loop level. This should still account
for where AA/MSSA will not handle aliasing between loops, but allow the
dead stores to be removed where they overlap in the same loop iteration.
It requires adding loop info to DSE, but that looks fairly harmless.
The test case this helps is from code like this, which can come up in
certain matrix operations:
  for(i=..)
    dst[i] = 0;
    for(j=..)
      dst[i] += src[i*n+j];
After LICM, this becomes:
  for(i=..)
    dst[i] = 0;
    sum = 0;
    for(j=..)
      sum += src[i*n+j];
    dst[i] = sum;
The first store is dead, and with this patch is now removed.
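A runnable C++ version of the post-LICM loop, as a sketch with hypothetical names and a row-major `src` with `n` columns. Running it with and without the first `dst[i] = 0` store gives identical results, because that store is always overwritten by `dst[i] = sum` in the same iteration of the outer loop:

```cpp
#include <cstddef>
#include <vector>

// Row sums in the post-LICM form. KeepDeadStore toggles the `dst[i] = 0`
// store that DSE can now remove.
std::vector<int> rowSums(const std::vector<int> &Src, std::size_t N,
                         bool KeepDeadStore) {
  std::size_t Rows = Src.size() / N;
  std::vector<int> Dst(Rows, -1); // sentinel: every element gets written
  for (std::size_t I = 0; I < Rows; ++I) {
    if (KeepDeadStore)
      Dst[I] = 0; // dead: overwritten below in the same outer iteration
    int Sum = 0;
    for (std::size_t J = 0; J < N; ++J)
      Sum += Src[I * N + J];
    Dst[I] = Sum; // the only store whose value is ever observed
  }
  return Dst;
}
```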
Differential Revision: https://reviews.llvm.org/D100464
This patch custom lowers FP_TO_[US]INT and [US]INT_TO_FP conversions
between floating-point and boolean vectors. As the default action is
scalarization, this patch both supports scalable-vector conversions and
improves the code generation for fixed-length vectors.
The lowering for these conversions can piggy-back on the existing
lowering, which lowers the operations to a supported narrowing/widening
conversion and then either an extension or truncation.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D103312
This patch adds TargetStackID::WasmLocal. This stack holds locations of
values that are only addressable by name -- not via a pointer to memory.
For the WebAssembly target, these objects are lowered to WebAssembly
local variables, which are managed by the WebAssembly run-time and are
not addressable by linear memory.
For the WebAssembly target, IR indicates that an AllocaInst should be put
on TargetStackID::WasmLocal by putting it in the non-integral address
space WASM_ADDRESS_SPACE_WASM_VAR, with value 1. SROA will mostly lift
these allocations to SSA locals, but any alloca that reaches instruction
selection (usually in non-optimized builds) will be assigned the new
TargetStackID there. Loads and stores to those values are transformed
to new WebAssemblyISD::LOCAL_GET / WebAssemblyISD::LOCAL_SET nodes,
which then lower to the type-specific LOCAL_GET_I32 etc instructions via
tablegen patterns.
Differential Revision: https://reviews.llvm.org/D101140
https://reviews.llvm.org/D95745 introduced a new `unwind` keyword for inline assembler expressions. Inline asms marked with the `unwind` keyword allow stack unwinding from inline assembly, because the compiler emits unwinding information ("around" the inline asm) as it would for calls/invokes. Unwinding the stack from within a non-unwind inline asm may cause UB.
Reviewed By: Amanieu
Differential Revision: https://reviews.llvm.org/D102642
As noted in PR45210: https://bugs.llvm.org/show_bug.cgi?id=45210
...the bug is triggered, as Eli noted, when sext(idx) * ElementSize overflows.
```
// assume that GV is an array of 4-byte elements
GEP = gep GV, 0, Idx // this is accessing Idx * 4
L = load GEP
ICI = icmp eq L, value
=>
ICI = icmp eq Idx, NewIdx
```
The foldCmpLoadFromIndexedGlobal function simplifies a GEP+load+icmp sequence into a single icmp on the index.
The problem is that Idx * ElementSize can overflow.
Let's assume that the wanted value is at offset 0.
Then, there are actually four possible values for Idx to match offset 0: 0x00..00, 0x40..00, 0x80..00, 0xC0..00.
We should return true for all these values, but currently, the new icmp only returns true for 0x00..00.
This problem can be solved by masking off the high bits of Idx, as many as the number of trailing zeros of ElementSize.
```
...
=>
Idx' = and Idx, 0x3F..FF
ICI = icmp eq Idx', NewIdx
```
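A small C++ model of the overflow, assuming (as the example does) a 32-bit index and pointer width and 4-byte elements; `byteOffset`, `indexMask` and the two `foldedCmp*` helpers are illustrative names, not InstCombine code. It shows that all four indices listed above wrap to offset 0, and that masking with `0x3FFFFFFF` makes the folded compare accept all of them:

```cpp
#include <cstdint>

// Byte offset of `gep GV, 0, Idx` for 4-byte elements; wraps modulo 2^32.
uint32_t byteOffset(uint32_t Idx) { return Idx * 4u; }

// Clears the top countr_zero(ElementSize) bits of a 32-bit index.
// Assumes ElementSize != 0. For ElementSize == 4 this is 0x3FFFFFFF.
uint32_t indexMask(uint32_t ElementSize) {
  unsigned TZ = 0;
  while (((ElementSize >> TZ) & 1u) == 0)
    ++TZ;
  return ~0u >> TZ;
}

// Old fold: compares the raw index, missing indices that wrap.
bool foldedCmpBuggy(uint32_t Idx, uint32_t NewIdx) { return Idx == NewIdx; }

// Fixed fold: drops the high bits that are lost in Idx * ElementSize.
bool foldedCmpFixed(uint32_t Idx, uint32_t NewIdx, uint32_t ElementSize) {
  return (Idx & indexMask(ElementSize)) == NewIdx;
}
```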
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D99481
This ensures that the operands of any gather/scatter instructions that
we attempt to push out of the loop are invariant, preventing invalid IR
from being generated.
This is similar to the fix in c590a9880d7a ( PR49832 ), but
we missed handling the pattern for select of bools (no compare
inst).
We can't substitute a vector value because the equality condition
replacement that we are attempting requires that the condition
is true/false for the entire value. Vector select can be partly
true/false.
I added an assert for vector types, so we shouldn't hit this again.
Fixed formatting while auditing the callers.
https://llvm.org/PR50500
extractelement is poison if the index is out-of-bounds, so just
scalarizing the load may introduce an out-of-bounds load, which is UB.
To avoid introducing new UB, we can mask the index so it only contains
valid indices.
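A scalar sketch of the idea, assuming a vector with a power-of-two number of lanes so that masking with `NumElts - 1` keeps every index in bounds. This is one possible masking scheme for illustration, not necessarily the exact sequence VectorCombine emits, and `scalarizedExtract` is a hypothetical name:

```cpp
#include <cstddef>
#include <cstdint>

// Models "extractelement (load <4 x i32>, ptr), Idx" as a single scalar load.
constexpr std::size_t NumElts = 4; // assumed power of two

// The masked index never reads outside Vec[0..NumElts). In-bounds indices
// are unchanged; for out-of-bounds ones the original extractelement was
// poison, so returning any lane's value is a valid refinement.
int32_t scalarizedExtract(const int32_t *Vec, uint64_t Idx) {
  return Vec[Idx & (NumElts - 1)];
}
```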
Fixes PR50382.
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D103077
Defining a new DEBUG_TYPE in a header file can conflict with the DEBUG_TYPE
definitions placed around the #includes in files that include the header,
resulting in redefinition warnings or even compile errors.
Reviewed By: tejohnson
Differential Revision: https://reviews.llvm.org/D102594
Using the proper API automatically sets `__stack_chk_guard` to `dso_local` if
`Reloc::Static`. This wasn't strictly necessary until recently, when dso_local was
no longer implied by `TargetMachine::shouldAssumeDSOLocal` for
`__stack_chk_guard`. By using the proper API, we can avoid generating unnecessary
GOT relocations.
Reviewed By: vitalybuka
Differential Revision: https://reviews.llvm.org/D102646
If the operand of the WhileLoopStart is flagged as killed, that flag
currently gets propagated both to the t2CMPri (as the instruction is
reverted) and to the newly created t2DoLoopStart. Only the latter should
keep the kill flag; the former should have its flags dropped.
On FreeBSD, absolute paths are passed unmodified in AT_EXECPATH, but
relative paths are resolved to absolute paths, and any symlinks will be
followed in the process. This means that the resource dir calculation
will be wrong if Clang is invoked as an absolute path to a symlink, and
this currently causes clang/test/Driver/rocm-detect.hip to fail on
FreeBSD. Thus, make sure to call realpath on the result, just as is
done on macOS.
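A minimal sketch of the "call realpath on the result" step, assuming a POSIX system. It only shows the canonicalization, not the FreeBSD-specific `elf_aux_info(AT_EXECPATH)` lookup that produces the path, and `canonicalizePath` is a hypothetical name:

```cpp
#include <cstdlib>
#include <string>

// Canonicalize a path with POSIX realpath(3): the result is absolute, with
// symlinks, "." and ".." resolved. Passing nullptr asks realpath to allocate
// the buffer, which must be released with free(). If resolution fails
// (e.g. the file does not exist), the input is returned unchanged.
std::string canonicalizePath(const std::string &Path) {
  char *Resolved = realpath(Path.c_str(), nullptr);
  if (Resolved == nullptr)
    return Path;
  std::string Result(Resolved);
  std::free(Resolved);
  return Result;
}
```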
Whilst here, clean up the old fallback auxargs loop to use the actual
type for auxargs rather than lots of hacky casts that rely on
addresses and pointers being the same. That is not the case on CHERI,
and thus on Arm's prototype Morello, although on little-endian systems it
still happens to work, as the word-sized integer will be padded to a full
pointer. The point is also somewhat academic: dereferencing past the end of
environ would give a bounds fault, but CheriBSD is new enough that the
elf_aux_info path will be used instead. This also makes the code easier to
follow, and removes the confusing double-increment of p.
Reviewed By: dim, arichardson
Differential Revision: https://reviews.llvm.org/D103346
This does not solve PR17101, but it is one of the
underlying diffs noted here:
https://bugs.llvm.org/show_bug.cgi?id=17101#c8
We could ease the one-use checks for the 'clear'
(no 'not' op) half of the transform, but I do not
know if that asymmetry would make things better
or worse.
Proofs:
https://rise4fun.com/Alive/uVB
Name: masked bit set
%sh1 = shl i32 1, %y
%and = and i32 %sh1, %x
%cmp = icmp ne i32 %and, 0
%r = zext i1 %cmp to i32
=>
%s = lshr i32 %x, %y
%r = and i32 %s, 1
Name: masked bit clear
%sh1 = shl i32 1, %y
%and = and i32 %sh1, %x
%cmp = icmp eq i32 %and, 0
%r = zext i1 %cmp to i32
=>
%xn = xor i32 %x, -1
%s = lshr i32 %xn, %y
%r = and i32 %s, 1
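The same two rewrites transcribed into C++ for 32-bit values, as a quick sanity check rather than a proof (the Alive link above is the proof). Shift amounts of 32 or more are excluded: in IR they yield poison on both sides, and in C++ they are undefined behaviour. The function names are illustrative:

```cpp
#include <cstdint>

uint32_t maskedBitSetBefore(uint32_t X, uint32_t Y) {
  uint32_t Sh1 = 1u << Y;     // %sh1 = shl i32 1, %y
  uint32_t And = Sh1 & X;     // %and = and i32 %sh1, %x
  return And != 0 ? 1u : 0u;  // icmp ne + zext
}
uint32_t maskedBitSetAfter(uint32_t X, uint32_t Y) {
  return (X >> Y) & 1u;       // lshr, and 1
}
uint32_t maskedBitClearBefore(uint32_t X, uint32_t Y) {
  return ((1u << Y) & X) == 0 ? 1u : 0u; // icmp eq + zext
}
uint32_t maskedBitClearAfter(uint32_t X, uint32_t Y) {
  return (~X >> Y) & 1u;      // xor -1, lshr, and 1
}

// Compares both forms for every valid shift amount over all 16-bit X
// values plus a few values with high bits set (a spot check, not a proof).
bool rewritesAgree() {
  const uint32_t HighSamples[] = {0x80000000u, 0xFFFFFFFFu, 0x12345678u,
                                  0xDEADBEEFu, 0x55555555u};
  auto Check = [](uint32_t X) {
    for (uint32_t Y = 0; Y < 32; ++Y)
      if (maskedBitSetBefore(X, Y) != maskedBitSetAfter(X, Y) ||
          maskedBitClearBefore(X, Y) != maskedBitClearAfter(X, Y))
        return false;
    return true;
  };
  for (uint32_t X = 0; X <= 0xFFFFu; ++X)
    if (!Check(X))
      return false;
  for (uint32_t X : HighSamples)
    if (!Check(X))
      return false;
  return true;
}
```

For instance, `X = 0b1010` has bit 1 set and bit 2 clear, so the "set" form yields 1 for `Y = 1` and the "clear" form yields 1 for `Y = 2`.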
Note: this is a re-post of a patch that I committed at:
rGa041c4ec6f7a
The commit was reverted because it exposed another bug:
rGb212eb7159b40
But that has since been corrected with:
rG8a156d1c2795189 ( D101191 )
Differential Revision: https://reviews.llvm.org/D72396
The implementation of subword atomics does not actually
guarantee that the result is zero-extended, which caused
buildbot failures after https://reviews.llvm.org/D101342
landed.
Follow the same strategy used for atomic loads/stores by converting the operands to equally-sized integer types.
This change prevents the atomic expansion pass from generating illegal LL/SC pairs when targeting AArch64: `expand-atomicrmw-xchg-fp.ll` would previously instantiate intrinsics such as `llvm.aarch64.ldaxr.p0f32` that cannot be lowered.
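The same strategy expressed in C++ terms, as an illustration rather than the AtomicExpand code: bitcast the float to a same-sized integer, exchange the integer atomically, and bitcast the old value back. The helper names are hypothetical; in ordinary C++ code, `std::atomic<float>::exchange` would also work:

```cpp
#include <atomic>
#include <cstdint>
#include <cstring>

static_assert(sizeof(float) == sizeof(uint32_t), "float must be 32 bits");

uint32_t bitsOf(float F) {
  uint32_t U;
  std::memcpy(&U, &F, sizeof U); // bitcast float -> i32
  return U;
}

float floatOf(uint32_t U) {
  float F;
  std::memcpy(&F, &U, sizeof F); // bitcast i32 -> float
  return F;
}

// Atomically stores NewVal into Slot and returns the previous float value,
// performing the exchange itself on the integer representation.
float atomicXchgFloat(std::atomic<uint32_t> &Slot, float NewVal) {
  return floatOf(Slot.exchange(bitsOf(NewVal)));
}
```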
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D103232
* Change linkage/visibility of __profn_ variables to match reality
* alwaysinline.ll: Add "EnableValueProfiling", otherwise it doesn't test available_externally alwaysinline.
* Delete PR23499.ll - covered by other comdat tests.
We have special handling for a zext of a load narrower than 32 bits, because the load
does a zext for free. In that case, we just select the G_ZEXT as if it were a copy, but
this caused the copy-checking code to balk at the mismatched size.
This was being hidden because normally these get combined into G_ZEXTLOAD but
for atomics this doesn't happen. The test case here just uses a normal load
because the particular atomic isn't supported yet anyway.
When fully unrolling with a non-latch exit, the latch block is
folded to unreachable. Replace this folding with the existing
changeToUnreachable() helper, rather than performing it manually.
This also moves the fold to happen after the manual DT update
for exit blocks. I believe this is correct in that the conversion
of an unconditional backedge into unreachable should not affect
the DT at all.
Differential Revision: https://reviews.llvm.org/D103340
This is cleaner than slicing the MxList to remove elements from
the beginning or end since that requires hardcoding the size.
I don't expect the size of the list to change, but we shouldn't
repeat it in multiple places.
This is to show that we currently only convert the terminator to
unreachable, but don't clean up instructions before it (unless
trivial DCE removes them).
Also clean up excessive whitespace in this test.
This does some non-functional cleanup of exit folding during
unrolling. The two main changes are:
* First rewrite latch->header edges, which is unrelated to exit
folding.
* Combine folding for latch and non-latch exits. After the
previous change, the only difference in their logic is that
for non-latch exits we currently only fold "known non-exit"
cases, but not "known exit" cases.
I think this helps a lot to clarify this code and prepare it for
future changes.
Differential Revision: https://reviews.llvm.org/D103333