When sampling from images with coordinates that only have 16 bit
accuracy, convert the image intrinsic call to use a16 or g16.
This does only happen if the target hardware supports it.
An alternative would be to always apply this combination, independent of
the target hardware and extend 16 bit arguments to 32 bit arguments
during legalization. To me, this sounds like an unnecessary roundtrip
that could prevent some further InstCombine optimizations.
Differential Revision: https://reviews.llvm.org/D85887
The check for the landingpad instructions was overly restrictive. In optimimized builds PHI nodes can appear
before the landingpad instructions, resulting in a fallback to SelectionDAG.
This change relaxes the check to allow PHI nodes.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D86141
Currently we have to set 'Machine' to something in our
YAML descriptions. Usually we use 'EM_X86_64' for 64-bit targets
and 'EM_386' for 32-bit targets. At the same time, in fact, in most
cases our tests do not need a machine type and we can use
'EM_NONE'.
This is cleaner, because avoids the need of using a particular machine.
In this patch I've made the 'Machine' key optional (the default value,
when it is not specified is `EM_NONE`) and removed it (where possible)
from yaml2obj, obj2yaml and llvm-readobj tests.
There are few tests left where I decided not to remove it, because
I didn't want to touch CHECK lines or doing anything more complex
than a removing a "Machine: *" line and formatting lines around.
Differential revision: https://reviews.llvm.org/D86202
This patch moves FixedPointSemantics and APFixedPoint
from Clang to LLVM ADT.
This will make it easier to use the fixed-point
classes in LLVM for constructing an IR builder for
fixed-point and for reusing the APFixedPoint class
for constant evaluation purposes.
RFC: http://lists.llvm.org/pipermail/llvm-dev/2020-August/144025.html
Reviewed By: leonardchan, rjmccall
Differential Revision: https://reviews.llvm.org/D85312
The `UnrollMaxBlockToAnalyze` parameter is used at the stage when we have no
information about a loop body BB cost. In some cases, e.g. for simple loop
```
for(int i=0; i<32; ++i){
D = Arr2[i*8 + C1];
Arr1[i*64 + C2] += C3 * D;
Arr1[i*64 + C2 + 2048] += C4 * D;
}
```
current default parameter value is not enough to run deeper cost analyze so the
loop is not completely unrolled.
Reviewed By: rampitec
Differential Revision: https://reviews.llvm.org/D86248
Use the stack to save and restore the link register when there is no
available register to do it.
Differential Revision: https://reviews.llvm.org/D76069
When hoisting simple values out from a loop, and an optsize attribute, a
convergent call, or an invoke instruction hindered the pass from
unswitching the loop, the pass would return an incorrect Modified
status.
This was caught using the check introduced by D80916.
Reviewed By: asbirlea
Differential Revision: https://reviews.llvm.org/D86085
Comparison against null is a common pattern that usually is followed by
error handling code and the likes. We now use AANonNull to simplify
these comparisons optimistically in order to make more code dead early
on.
Reviewed By: uenoku
Differential Revision: https://reviews.llvm.org/D86145
`AADereferenceable::getAssumedDereferenceableBytes()` is actually
deducing `dereferenceable_or_null`. We should not use that information
to deduce `nonnull`, since it doesn't imply `nonnull`.
This patch adds support for constrained scalar fp to int operations on
PowerPC. Besides, this fixes the FP exception bit of quad-precision
convert & truncate instructions.
Reviewed By: steven.zhang, uweigand
Differential Revision: https://reviews.llvm.org/D81537
This commit introduced a non-trivial compile time regression that needs
to be addressed: https://reviews.llvm.org/D70365#2227627
Given that it is unclear how long that will take, I'll revert it for
now.
This reverts commit eedf18fc1f5fc71bb896204abf41fc5a2dbf25f7.
This commits breaks certain OpenMP codes (on power) because it expanded
the Attributor scope without telling the Attributor about the SCC
extend. See: https://reviews.llvm.org/D85544#2227611
This reverts commit b0b32e649011d9a60165b9b53eb2764b7da9c8ca.
- Rename AMDGPU SCC DWARF register to STATUS since the scalar
condition code is a bit within the STATUS register.
- Correct bit size of the VCC_64 register to 64 which is the size in
wave64 mode.
Differential Revision: https://reviews.llvm.org/D86259
We don't need a std::string for a literal string, we can use a
StringRef.
The addition of StringRefs produces a Twine that we can just call
str() without converting to a SmallString ourselves. Twine will
do that internally.
-force-attribute adds an attribute to function via command-line.
However, there was no counter-part to remove an attribute. This patch
adds -force-remove-attribute that removes an attribute from function.
Differential Revision: https://reviews.llvm.org/D85586
There's a potential motivating case to increase this limit in PR47191:
http://bugs.llvm.org/PR47191
But first we should make it less hacky. The limit in InstCombine is directly tied
to this value because an increase there can cause asserts in the underlying value
tracking calls if not changed together. The usage in VectorUtils is independent,
but the comment suggests that we should use the same value unless there's a known
reason to diverge. There are similar limits in codegen analysis, but I think we
should leave those independent in case we intentionally want the optimization
power/cost to be different there.
Differential Revision: https://reviews.llvm.org/D86113
This patch was reverted in 7c182663a857fc87 due to some failures
observed on PCC based machines. Failures were due to Endianness issue and
long double representation issues.
Patch is revised to address Endianness issue. Furthermore, support
for emission of `DW_OP_implicit_value` for `long double` has been removed
(since it was unclean at the moment). Planning to handle this in
a clean way soon!
For more context, please refer to following review link.
Reviewed By: aprantl
Differential Revision: https://reviews.llvm.org/D83560
TargetRegisterInfo::getMinimalPhysRegClass() returns rtcGPR64RegClassID for X16
and X17, as it's the last matching class. This in turn gets passed to
AArch64RegisterBankInfo::getRegBankFromRegClass(), which hits an unreachable.
It seems sensible to handle this case, so copies from X16 and X17 work.
Copying from X17 is used in inline assembly in libunwind for pointer
authentication.
Differential Revision: https://reviews.llvm.org/D85720
llvm is missing support for DW_OP_implicit_value operation.
DW_OP_implicit_value op is indispensable for cases such as
optimized out long double variables.
For intro refer: DWARFv5 Spec Pg: 40 2.6.1.1.4 Implicit Location Descriptions
Consider the following example:
```
int main() {
long double ld = 3.14;
printf("dummy\n");
ld *= ld;
return 0;
}
```
when compiled with tunk `clang` as
`clang test.c -g -O1` produces following location description
of variable `ld`:
```
DW_AT_location (0x00000000:
[0x0000000000201691, 0x000000000020169b): DW_OP_constu 0xc8f5c28f5c28f800, DW_OP_stack_value, DW_OP_piece 0x8, DW_OP_constu 0x4000, DW_OP_stack_value, DW_OP_bit_piece 0x10 0x40, DW_OP_stack_value)
DW_AT_name ("ld")
```
Here one may notice that this representation is incorrect(DWARF4
stack could only hold integers(and only up to the size of address)).
Here the variable size itself is `128` bit.
GDB and LLDB confirms this:
```
(gdb) p ld
$1 = <invalid float value>
(lldb) frame variable ld
(long double) ld = <extracting data from value failed>
```
GCC represents/uses DW_OP_implicit_value in these sort of situations.
Based on the discussion with Jakub Jelinek regarding GCC's motivation
for using this, I concluded that DW_OP_implicit_value is most appropriate
in this case.
Link: https://gcc.gnu.org/pipermail/gcc/2020-July/233057.html
GDB seems happy after this patch:(LLDB doesn't have support
for DW_OP_implicit_value)
```
(gdb) p ld
p ld
$1 = 3.14000000000000012434
```
Reviewed By: aprantl
Differential Revision: https://reviews.llvm.org/D83560
D81345 appears to accidentally disables vectorization when explicitly
enabled. As PGSO isn't currently accessible from LoopAccessInfo, revert back to
the vectorization with versioning-for-unit-stride for PGSO.
Differential Revision: https://reviews.llvm.org/D85784
Previously we weren't adding the LegalizerInfo to the post-legalizer
combiner. Since that's fixed, we don't need to try to filter out the
one case that was breaking.
D85820 introduced a full path in the LLVM_SYSTEM_LIBS property of the
LLVMSupport target, which made the OCaml bindings fail to build, since
they use -l [system_lib] flags for every lib in LLVM_SYSTEM_LIBS, which
cannot work with absolute paths.
This patch solves the issue in a similar vain as ZLIB does it: it adds
the full library path to imported_libs, and adds a stripped down version
without directories, lib prefix and lib suffix to system_libs
In the future we should probably make some changes to LLVM_SYSTEM_LIBS,
since both zlib and ncurses do not necessarily have to be system libs
anymore due to the find_package / find_library bits introduced in
D85820 and D79219.
Patch By: haampie
Differential Revision: https://reviews.llvm.org/D86134
D85820 introduced a bug where LLVM_ENABLE_TERMINFO was set to true when
the library was found, even when the user had set
-DLLVM_ENABLE_TERMINFO=OFF.
Patch By: haampie
Differential Revision: https://reviews.llvm.org/D86173
If we have a mask, and a value x, where (x & mask) == x, we can drop the AND
and just use x.
This is about a 0.4% geomean code size improvement on CTMark at -O3 for AArch64.
In AArch64, this is most useful post-legalization. Patterns like this often
show up when legalizing s1s, which must be extended to larger types.
e.g.
```
%cmp:_(s32) = G_ICMP ...
%and:_(s32) = G_AND %cmp, 1
```
Since G_ICMP only produces a single bit, there's no reason to mask it with the
G_AND.
Differential Revision: https://reviews.llvm.org/D85463
canBeMovedDownwards checks if the "wait" counterpart of __tgt_target_data_begin_mapper can be moved downwards, returning a pointer to the instruction that might require/modify the data transferred, and returning null it the movement is not possible or not worth it. The function splitTargetDataBeginRTC receives that returned instruction and instead of moving the "wait" it creates it at that point.
Differential Revision: https://reviews.llvm.org/D86155