This patch is the initial support for the General Dynamic Thread Local
Storage model, producing code sequences and relocations that are correct
according to the ABI for that model when PC-relative memory operations are used.
Patch by: NeHuang
Reviewed By: stefanp
Differential Revision: https://reviews.llvm.org/D82315
Move fixed length SDIV tests from sve-fixed-length-int-arith.ll to sve-fixed-length-int-div.ll. The former uses CHECK lines that verify legalization decisions. That's overkill for the i8/i16 SDIV tests, since they have a tricky legalization.
Then it is trivial to make the output indented (the second parameter of
json::OStream::OStream specifies the indentation).
Reviewed By: jhenderson, echristo
Differential Revision: https://reviews.llvm.org/D86045
There are no nxv16i8/nxv8i16 SDIV instructions, so these fixed width operations must be promoted to nxv4i32.
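As a rough scalar model of that promotion (not the actual SelectionDAG or SVE lowering), the sketch below treats a 16-lane i8 division as four groups of 4 lanes, sign-extends each group to 32 bits, divides at 32-bit width and truncates the quotients back to i8. The grouping only loosely mirrors the nxv4i32 pieces, and the function name is hypothetical. C++ integer division truncates toward zero, like `sdiv`; the assert reflects that an i8 `sdiv` by zero, or of -128 by -1, is undefined in LLVM IR.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical scalar model: divide 16 x i8 lanes using only 32-bit
// division, four lanes at a time, then truncate each quotient back to i8.
std::array<int8_t, 16> sdivV16i8ViaI32(const std::array<int8_t, 16> &A,
                                       const std::array<int8_t, 16> &B) {
  std::array<int8_t, 16> Result{};
  for (int Part = 0; Part < 4; ++Part) {
    // Sign-extend one group of four lanes to i32.
    std::array<int32_t, 4> WideA{}, WideB{};
    for (int Lane = 0; Lane < 4; ++Lane) {
      WideA[Lane] = static_cast<int32_t>(A[Part * 4 + Lane]);
      WideB[Lane] = static_cast<int32_t>(B[Part * 4 + Lane]);
    }
    // Divide at 32-bit width, then truncate back to i8.
    for (int Lane = 0; Lane < 4; ++Lane) {
      assert(WideB[Lane] != 0 && !(WideA[Lane] == -128 && WideB[Lane] == -1) &&
             "lane would be undefined for an i8 sdiv");
      Result[Part * 4 + Lane] = static_cast<int8_t>(WideA[Lane] / WideB[Lane]);
    }
  }
  return Result;
}
```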
Differential Revision: https://reviews.llvm.org/D86114
This ensures that we never encode an instruction which is unavailable,
such as if we explicitly insert a forbidden instruction when lowering.
This is particularly important on RISC-V given its high degree of
modularity, and will become increasingly important as new standard
extensions appear.
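A minimal sketch of the idea, assuming a simple bitmask of enabled extensions: each opcode records the features it requires, and the encoder checks them before emitting anything. The feature enum, `InstrDesc`, and both functions are hypothetical illustrations, not the TableGen-generated RISC-V tables, and the returned value is a placeholder for a real encoding.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical subset of RISC-V extension feature bits (illustration only).
enum : uint32_t {
  FeatureStdExtM = 1u << 0, // integer multiply/divide
  FeatureStdExtA = 1u << 1, // atomics
  FeatureStdExtC = 1u << 2, // compressed instructions
};

// Hypothetical per-opcode descriptor listing the features it requires.
struct InstrDesc {
  const char *Name;
  uint32_t Opcode;
  uint32_t RequiredFeatures;
};

// Returns the required features that are not enabled (0 if encodable).
uint32_t missingFeatures(const InstrDesc &Desc, uint32_t EnabledFeatures) {
  return Desc.RequiredFeatures & ~EnabledFeatures;
}

// Encoder entry point: check the predicates first, so that lowering code
// which inserts a forbidden instruction is caught instead of encoded.
uint32_t encodeInstruction(const InstrDesc &Desc, uint32_t EnabledFeatures) {
  assert(missingFeatures(Desc, EnabledFeatures) == 0 &&
         "instruction requires a feature that is not enabled");
  return Desc.Opcode; // placeholder for the real bit-level encoding
}
```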
Reviewed By: asb, lenary
Differential Revision: https://reviews.llvm.org/D85015
Currently we don't do anything about these,
neither in InstCombine, nor in SimplifyCFG's sinking.
These happen exceedingly rarely, but I've seen them in the cases where
PHI-aware aggregate reconstruction would have fired if not for them.
This exposes the module optimization pipeline as a pass that can be
applied stand-alone when using 'opt'. This helps ML inliner training
scenarios, where we start with IR captured right before inlining,
perform the inlining (-scc-oz-module-inliner) and then want to continue
and observe the final IR (where this patch comes into play). We can then
apply llc on the resulting IR to continue compilation down to native.
Differential Revision: https://reviews.llvm.org/D86224
The normal scheme for tail folding reductions is to use:
```
loop:
  p = phi(0, a)
  mask = ...
  x = masked_load(..., mask)
  a = add(x, p)
  s = select(mask, a, p)
```
This means we need to keep the registers p and a, plus the mask, live out
of the loop. On a target with predicated operations we can instead generate
the phi as p = phi(0, s). This keeps the select inside the loop, so we can
fold select(m, add(a, b), c) to something like a vaddt c, a, b using the
m predicate. This in turn allows us to tail predicate the entire loop.
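The lane-level effect of generating the phi as p = phi(0, s) can be modelled in plain C++: inactive lanes keep their previous partial sum, which is exactly what a predicated add computes, so any value loaded into an inactive lane never reaches the result. This is a sketch under assumptions: the vectorization factor of 4, the function name, and the -999 stand-in for an arbitrary masked-off load value are all hypothetical.

```cpp
#include <array>
#include <vector>

constexpr int VF = 4; // hypothetical vectorization factor

// Tail-folded sum reduction with the select feeding the accumulator phi:
//   p = phi(0, s); a = add(x, p); s = select(mask, a, p)
int tailFoldedSum(const std::vector<int> &Data) {
  std::array<int, VF> P{}; // p = phi(0, s): all lanes start at 0
  const int N = static_cast<int>(Data.size());
  for (int Base = 0; Base < N; Base += VF) {
    for (int Lane = 0; Lane < VF; ++Lane) {
      bool Mask = Base + Lane < N; // active-lane mask for the tail
      // masked_load: inactive lanes get an arbitrary value.
      int X = Mask ? Data[Base + Lane] : -999;
      int A = X + P[Lane];          // a = add(x, p)
      P[Lane] = Mask ? A : P[Lane]; // s = select(mask, a, p), fed back to p
    }
  }
  // Horizontal reduction of the vector accumulator after the loop.
  int Sum = 0;
  for (int V : P)
    Sum += V;
  return Sum;
}
```

For example, `tailFoldedSum({1, 2, 3, 4, 5, 6})` returns 21 even though the second iteration has two masked-off lanes.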
Differential Revision: https://reviews.llvm.org/D84741
The getSrcFromCopy helper nowadays returns a MachineOperand pointer,
so referring to zero_reg was incorrect: the helper now returns
a nullptr when it does not find a copy-like instruction.
Currently, although we handle the `CallBase` case in updateImpl, we give up on that case in initialize.
That is problematic when we propagate a range from a call site returned position to a floating position.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D86196
This reverts commit 455d5a8a065b4b93df11d1696dc1546c403465a5.
It broke UBSan:
http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux-bootstrap-ubsan/builds/21386/steps/check-llvm%20ubsan/logs/stdio
/b/sanitizer-x86_64-linux-bootstrap-ubsan/build/llvm-project/llvm/test/tools/llvm-readobj/ELF/malformed-pt-dynamic.test:62:10: error: WARN3: expected string not found in input
# WARN3: error: '[[FILE]]': Invalid data was encountered while parsing the file
^
<stdin>:2:1: note: scanning from here
/b/sanitizer-x86_64-linux-bootstrap-ubsan/build/llvm-project/llvm/tools/llvm-readobj/ELFDumper.cpp:1956:46: runtime error: addition of unsigned offset to 0x0000020c5b30 overflowed to 0x0000020c5b2f
^
<stdin>:2:1: note: with "FILE" equal to "/b/sanitizer-x86_64-linux-bootstrap-ubsan/build/llvm_build_ubsan/test/tools/llvm-readobj/ELF/Output/malformed-pt-dynamic\\.test\\.tmp3"
/b/sanitizer-x86_64-linux-bootstrap-ubsan/build/llvm-project/llvm/tools/llvm-readobj/ELFDumper.cpp:1956:46: runtime error: addition of unsigned offset to 0x0000020c5b30 overflowed to 0x0000020c5b2f
^
<stdin>:2:117: note: possible intended match here
/b/sanitizer-x86_64-linux-bootstrap-ubsan/build/llvm-project/llvm/tools/llvm-readobj/ELFDumper.cpp:1956:46: runtime error: addition of unsigned offset to 0x0000020c5b30 overflowed to 0x0000020c5b2f
^
Input file: <stdin>
Check file: /b/sanitizer-x86_64-linux-bootstrap-ubsan/build/llvm-project/llvm/test/tools/llvm-readobj/ELF/malformed-pt-dynamic.test
For scalable vector shifts the predicate is typically all active,
which gets selected to an unpredicated shift by immediate. When
code generating for fixed length vectors the predicate is based
on the vector length and so additional patterns are required to
make use of SVE's predicated shift by immediate instructions.
Differential Revision: https://reviews.llvm.org/D86204
When removing a non-constant store to a global in
CleanupPointerRootUsers(), the GlobalOpt pass could incorrectly return
false.
This was caught using the check introduced by D80916.
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D86149
Relanded since the buildbot issue was unrelated to this commit.
When hoisting simple values out from a loop, and an optsize attribute, a
convergent call, or an invoke instruction hindered the pass from
unswitching the loop, the pass would return an incorrect Modified
status.
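The shape of this bug can be shown with a heavily simplified, hypothetical model of a pass's return value: once hoisting has changed the IR, the pass must report "modified" even if a later check makes it give up on unswitching. `LoopInfoStub` and `runUnswitchSketch` are illustrative names only, not SimpleLoopUnswitch code.

```cpp
// Hypothetical stand-in for the properties of the loop being processed.
struct LoopInfoStub {
  bool HasHoistableValues; // simple values that get hoisted out first
  bool UnswitchBlocked;    // e.g. optsize, a convergent call, or an invoke
};

// Returns whether the (modelled) IR was modified.
bool runUnswitchSketch(const LoopInfoStub &L) {
  bool Changed = false;
  if (L.HasHoistableValues)
    Changed = true; // hoisting has already mutated the IR
  if (L.UnswitchBlocked)
    return Changed; // returning false here would misreport the hoisting
  // ... unswitch the loop (modelled as always succeeding) ...
  return true;
}
```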
This was caught using the check introduced by D80916.
Reviewed By: asbirlea
Differential Revision: https://reviews.llvm.org/D86085
The byte swapping, when dealing with 4 byte (float) FP constants
in DwarfExpression::addConstantFP, added in commit ef8992b9f0189005
was not correct. It always performed the byte swap on a
uint64_t value. When dealing with 4-byte values the 4 interesting
bytes ended up in the high-order half of the uint64_t, but we later
emitted the 4 bytes from the low-order half. So we ended up emitting
zeroes and faulty debug information.
This patch simplifies things a bit, IMHO, by using the APInt
representation throughout the function instead of looking at
the internal representation via getRawBytes and reinterpret_cast.
Using APInt::byteSwap() should result in correct byte swapping
regardless of whether the APInt is 4 or 8 bytes wide.
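The failure mode can be reproduced in plain C++, assuming an IEEE-754 binary32 `float`: widening the 4-byte pattern to 64 bits and swapping all 8 bytes moves the interesting bytes into the high half, so keeping the low 4 bytes yields zero, while swapping at the value's own width works. This is a model of the bug, not the DwarfExpression code; all function names are hypothetical.

```cpp
#include <cstdint>
#include <cstring>

// Reverse the byte order of a 64-bit value.
uint64_t byteSwap64(uint64_t V) {
  uint64_t R = 0;
  for (int I = 0; I < 8; ++I) {
    R = (R << 8) | (V & 0xFF);
    V >>= 8;
  }
  return R;
}

// Reverse the byte order of a 32-bit value.
uint32_t byteSwap32(uint32_t V) {
  uint32_t R = 0;
  for (int I = 0; I < 4; ++I) {
    R = (R << 8) | (V & 0xFF);
    V >>= 8;
  }
  return R;
}

// Buggy pattern: swap the float's bits as a uint64_t, then keep the low
// 4 bytes. The swapped bytes live in the high half, so this returns 0.
uint32_t buggySwapFloatBits(float F) {
  uint32_t Bits;
  std::memcpy(&Bits, &F, sizeof(Bits));
  return static_cast<uint32_t>(byteSwap64(Bits));
}

// Fixed pattern: swap at the value's own width, as APInt::byteSwap()
// does for a 32-bit APInt.
uint32_t fixedSwapFloatBits(float F) {
  uint32_t Bits;
  std::memcpy(&Bits, &F, sizeof(Bits));
  return byteSwap32(Bits);
}
```

For 1.0f (bit pattern 0x3F800000), the buggy version yields 0 and the fixed version yields 0x0000803F.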
Differential Revision: https://reviews.llvm.org/D86272
The code that reports "PT_DYNAMIC segment offset + size exceeds the size of the file"
has an issue: it is possible to bypass the validation by overflowing the size + offset result.
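A minimal sketch of the problem and one common overflow-safe formulation, using unsigned 64-bit arithmetic: the naive `Offset + Size > FileSize` test can wrap around and pass, while comparing `Size` against the remaining space never computes a sum that can overflow. The function names are hypothetical and this is not the actual ELFDumper code.

```cpp
#include <cstdint>

// Naive check: Offset + Size may wrap around, so a huge Size can slip past.
bool exceedsFileNaive(uint64_t Offset, uint64_t Size, uint64_t FileSize) {
  return Offset + Size > FileSize;
}

// Overflow-safe check: never forms a sum that can wrap.
bool exceedsFileSafe(uint64_t Offset, uint64_t Size, uint64_t FileSize) {
  return Offset > FileSize || Size > FileSize - Offset;
}
```

With Offset = 0x100, Size = UINT64_MAX and FileSize = 0x1000, the naive sum wraps to 0xFF and the segment is wrongly accepted; the safe check rejects it.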
Differential revision: https://reviews.llvm.org/D85519
This reverts commit dfd447c22043b0a64bf1d146735ca33f926bd22d.
After I pushed this commit, llvm-sphinx-docs started failing, due to:
Warning, treated as error:
extension 'recommonmark' has no setup() function;
is it really a Sphinx extension module?
I don't see how this commit may have caused that, but I'm still
reverting it since I don't know how to proceed with that
troubleshooting.
When sampling from images with coordinates that only have 16-bit
accuracy, convert the image intrinsic call to use a16 or g16.
This only happens if the target hardware supports it.
An alternative would be to always apply this combination, independent of
the target hardware, and extend 16-bit arguments to 32-bit arguments
during legalization. To me, this sounds like an unnecessary roundtrip
that could prevent some further InstCombine optimizations.
Differential Revision: https://reviews.llvm.org/D85887
The check for the landingpad instructions was overly restrictive. In optimized builds PHI nodes can appear
before the landingpad instructions, resulting in a fallback to SelectionDAG.
This change relaxes the check to allow PHI nodes.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D86141
Currently we have to set 'Machine' to something in our
YAML descriptions. Usually we use 'EM_X86_64' for 64-bit targets
and 'EM_386' for 32-bit targets. At the same time, in fact, in most
cases our tests do not need a machine type and we can use
'EM_NONE'.
This is cleaner, because it avoids the need to use a particular machine.
In this patch I've made the 'Machine' key optional (when it is not
specified, the default value is `EM_NONE`) and removed it (where possible)
from yaml2obj, obj2yaml and llvm-readobj tests.
There are a few tests left where I decided not to remove it, because
I didn't want to touch CHECK lines or do anything more complex
than removing a "Machine: *" line and formatting the lines around it.
Differential revision: https://reviews.llvm.org/D86202
This patch moves FixedPointSemantics and APFixedPoint
from Clang to LLVM ADT.
This will make it easier to use the fixed-point
classes in LLVM for constructing an IR builder for
fixed-point and for reusing the APFixedPoint class
for constant evaluation purposes.
RFC: http://lists.llvm.org/pipermail/llvm-dev/2020-August/144025.html
Reviewed By: leonardchan, rjmccall
Differential Revision: https://reviews.llvm.org/D85312
The `UnrollMaxBlockToAnalyze` parameter is used at the stage when we have no
information about a loop body BB cost. In some cases, e.g. for a simple loop such as
```
for (int i = 0; i < 32; ++i) {
  D = Arr2[i * 8 + C1];
  Arr1[i * 64 + C2] += C3 * D;
  Arr1[i * 64 + C2 + 2048] += C4 * D;
}
```
the current default value is not enough to run a deeper cost analysis, so the
loop is not completely unrolled.
Reviewed By: rampitec
Differential Revision: https://reviews.llvm.org/D86248
Use the stack to save and restore the link register when there is no
available register to do it.
Differential Revision: https://reviews.llvm.org/D76069
Comparison against null is a common pattern that is usually followed by
error handling code and the like. We now use AANonNull to simplify
these comparisons optimistically in order to make more code dead early
on.
Reviewed By: uenoku
Differential Revision: https://reviews.llvm.org/D86145
`AADereferenceable::getAssumedDereferenceableBytes()` is actually
deducing `dereferenceable_or_null`. We should not use that information
to deduce `nonnull`, since it doesn't imply `nonnull`.
This patch adds support for constrained scalar fp to int operations on
PowerPC. It also fixes the FP exception bit of the quad-precision
convert and truncate instructions.
Reviewed By: steven.zhang, uweigand
Differential Revision: https://reviews.llvm.org/D81537
This commit introduced a non-trivial compile time regression that needs
to be addressed: https://reviews.llvm.org/D70365#2227627
Given that it is unclear how long that will take, I'll revert it for
now.
This reverts commit eedf18fc1f5fc71bb896204abf41fc5a2dbf25f7.
This commit breaks certain OpenMP codes (on power) because it expanded
the Attributor scope without telling the Attributor about the extended
SCC. See: https://reviews.llvm.org/D85544#2227611
This reverts commit b0b32e649011d9a60165b9b53eb2764b7da9c8ca.
- Rename AMDGPU SCC DWARF register to STATUS since the scalar
condition code is a bit within the STATUS register.
- Correct the bit size of the VCC_64 register to 64, which is its size in
  wave64 mode.
Differential Revision: https://reviews.llvm.org/D86259
We don't need a std::string for a literal string, we can use a
StringRef.
Adding StringRefs produces a Twine on which we can just call
str() without converting to a SmallString ourselves; Twine
does that internally.
-force-attribute adds an attribute to a function via the command line.
However, there was no counterpart to remove an attribute. This patch
adds -force-remove-attribute, which removes an attribute from a function.
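The behaviour of the two options can be modelled as applying "function-name:attribute-name" requests (the pair format used by -force-attribute, e.g. foo:noinline) to a table of per-function attribute sets. This is a hypothetical sketch, not the ForceFunctionAttrs implementation: the table type, both helpers, the add-before-remove order, and silently skipping malformed pairs or unknown functions are all assumptions.

```cpp
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical model: function name -> set of attribute names.
using AttrTable = std::map<std::string, std::set<std::string>>;

// Splits "foo:noinline" into "foo" and "noinline"; false if malformed.
bool splitPair(const std::string &Pair, std::string &Fn, std::string &Attr) {
  std::string::size_type Colon = Pair.find(':');
  if (Colon == std::string::npos)
    return false;
  Fn = Pair.substr(0, Colon);
  Attr = Pair.substr(Colon + 1);
  return !Fn.empty() && !Attr.empty();
}

// Applies add requests, then remove requests, to functions in the table.
void applyForcedAttrs(AttrTable &Table, const std::vector<std::string> &ToAdd,
                      const std::vector<std::string> &ToRemove) {
  std::string Fn, Attr;
  for (const std::string &Pair : ToAdd)
    if (splitPair(Pair, Fn, Attr) && Table.count(Fn))
      Table[Fn].insert(Attr);
  for (const std::string &Pair : ToRemove)
    if (splitPair(Pair, Fn, Attr) && Table.count(Fn))
      Table[Fn].erase(Attr);
}
```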
Differential Revision: https://reviews.llvm.org/D85586
There's a potential motivating case to increase this limit in PR47191:
http://bugs.llvm.org/PR47191
But first we should make it less hacky. The limit in InstCombine is directly tied
to this value because an increase there can cause asserts in the underlying value
tracking calls if not changed together. The usage in VectorUtils is independent,
but the comment suggests that we should use the same value unless there's a known
reason to diverge. There are similar limits in codegen analysis, but I think we
should leave those independent in case we intentionally want the optimization
power/cost to be different there.
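Why the InstCombine limit must move together with the value-tracking limit can be sketched with a recursive analysis that asserts its caller never starts deeper than its own limit: a client bounded by the same shared constant always satisfies that precondition, while an independently raised client limit could trip it. The constant's name and value, and both functions, are hypothetical illustrations rather than the actual ValueTracking API.

```cpp
#include <cassert>

// Hypothetical shared recursion limit (name and value are illustrative).
constexpr unsigned MaxAnalysisDepth = 6;

// Hypothetical recursive analysis over a chain of values: counts how many
// levels it inspects before hitting the end of the chain or the limit.
unsigned countKnownLevels(unsigned Depth, unsigned ChainLength) {
  assert(Depth <= MaxAnalysisDepth && "caller exceeded the shared limit");
  if (ChainLength == 0 || Depth == MaxAnalysisDepth)
    return 0;
  return 1 + countKnownLevels(Depth + 1, ChainLength - 1);
}

// Hypothetical client with its own depth that queries the analysis one
// level deeper. Stopping at the same constant keeps the assert satisfied.
unsigned clientQuery(unsigned ClientDepth, unsigned ChainLength) {
  if (ClientDepth >= MaxAnalysisDepth)
    return 0;
  return countKnownLevels(ClientDepth + 1, ChainLength);
}
```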
Differential Revision: https://reviews.llvm.org/D86113