We have a special pattern for
(mul (and X, 0xffffffff), (and Y, 0xffffffff)), to optimize the
ANDs to shift. But if a sext_inreg coms first, we'll form a MULW
and limit the effectiveness of the special match. So this patch
adds a larger pattern to suppress the MULW formation by emitting
a sext.w and then the same output we use for the
(mul (and X, 0xffffffff), (and Y, 0xffffffff)). This should all
get CSEd.
This is the issue I was trying to fix with D99029, but that affected
many more tests.
The current linear expression decomposition handles zext/sext by
decomposing the casted operand, and then checking NUW/NSW flags
to determine whether the extension can be distributed. This has
some disadvantages:
First, it is not possible to perform a partial decomposition. If
we have zext((x + C1) +<nuw> C2) then we will fail to decompose
the expression entirely, even though it would be safe and
profitable to decompose it to zext(x + C1) +<nuw> zext(C2)
Second, we may end up performing unnecessary decompositions,
which will later be discarded because they lack nowrap flags
necessary for extensions.
Third, correctness of the code is not entirely obvious: At a high
level, we encounter zext(x -<nuw> C) in the form of a zext on the
linear expression x + (-C) with nuw flag set. Notably, this case
must be treated as zext(x) + -zext(C) rather than zext(x) + zext(-C).
The code handles this correctly by speculatively zexting constants
to the final bitwidth, and performing additional fixup if the
actual extension turns out to be an sext. This was not immediately
obvious to me.
This patch inverts the approach: An ExtendedValue represents a
zext(sext(V)), and linear expression decomposition will try to
decompose V further, either by absorbing another sext/zext into the
ExtendedValue, or by distributing zext(sext(x op C)) over a binary
operator with appropriate nsw/nuw flags. At each step we can
determine whether distribution is legal and abort with a partial
decomposition if not. We also know which extensions we need to
apply to constants, and don't need to speculate or fixup.
Use the CMake 3.13 features of CMakeConfigPackageHelpers to generate
LLVMConfigVersion.cmake with proper architecture detection, major+minor
version matching, etc.
Differential Revision: https://reviews.llvm.org/D99451
Remove VBROADCAST/MOVDDUP/splat-shuffle handling from foldShuffleOfHorizOp
This can all be handled by canonicalizeShuffleMaskWithHorizOp along as we check that the HADD/SUB are only used once (to prevent infinite loops on slow-horizop targets which will try to reuse the nodes again followed by a post-hop shuffle).
For example,
<https://lab.llvm.org/buildbot/#/builders/132/builds/3929>
has this diagnostic:
```
/opt/gcc/9.3.0/snos/include/g++/bits/stl_tree.h:780:8: error: static assertion failed: comparison object must be invocable as const
780 | is_invocable_v<const _Compare&, const _Key&, const _Key&>,
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
```
In input dump annotations, `check:2'1` indicates diagnostic 1 for the
`CHECK` directive on check file line 2. Without this patch,
`-dump-input` computes the diagnostic index with the assumption that
FileCheck *consecutively* produces all diagnostics for the same
pattern. Already, that can be a false assumption, as in the examples
below. Moreover, it seems like a brittle assumption as FileCheck
evolves. Finally, it actually complicates the implementation even if
it makes it slightly more efficient.
This patch avoids that assumption. Examples below show results after
applying this patch. Before applying this patch, `'N` is omitted
throughout these examples because the implementation doesn't notice
there's more than one diagnostic per pattern.
First, `CHECK-LABEL` violates the assumption because `CHECK-LABEL`
tries to match twice, and other directives can match in between:
```
$ cat check
CHECK: foobar
CHECK-LABEL: foobar
$ FileCheck -vv check < input |& tail -8
<<<<<<
1: text
2: foobar
label:2'0 ^~~~~~
check:1 ^~~~~~
label:2'1 X error: no match found
3: text
>>>>>>
```
Second, `--implicit-check-not` is obviously processed many times among
other directives:
```
$ cat check
CHECK: foo
CHECK: foo
$ FileCheck -vv -dump-input=always -implicit-check-not=foo \
check < input |& tail -16
<<<<<<
1: text
not:imp1'0 X~~~~
2: foo
check:1 ^~~
not:imp1'1 X
3: text
not:imp1'1 ~~~~~
4: foo
check:2 ^~~
not:imp1'2 X
5: text
not:imp1'2 ~~~~~
6:
eof:2 ^
>>>>>>
```
Reviewed By: thopre, jhenderson
Differential Revision: https://reviews.llvm.org/D97813
While explicit sext instructions were handled correctly, the
implicit sext that occurs if the offset is smaller than the
pointer size blindly assumed that sext(X * Scale + Offset) is the
same as sext(X) * Scale + Offset, which is obviously not correct.
Fix this by extracting the code that handles linear expression
extension and reusing it for the implicit sext as well.
A number of variables need to be correctly initialized on entry
to GetLinearExpression() for the implementation to behave reasonably.
The fact that SExtBits can currenlty be non-zero on entry is a bug,
as demonstrated by the added test: For implicit sexts by the GEP,
we do currently skip legality checks.
Currently, we'd produce an incorrect decomposition, because we
already recursively called GetLinearExpression(), so the Scale=1,
Offset=0 will not necessarily be relative to the shl itself.
Now, this doesn't actually matter for functional correctness,
because such a shift is poison anyway, so its okay to return
an incorrect decomposition. It's still unnecessarily confusing
though, and we can easily avoid this by checking the bitwidth
earlier.
Nowrap flags between mul and shl differ in that mul nsw allows
multiplication of 1 * INT_MIN, while shl nsw does not. This means
that it is always fine to transfer shl nowrap flags to muls, but
not necessarily the other way around. In this case the NUW/NSW
results refer to mul/add operations, so it's fine to retain the
flags from the shl.
See if the combined shuffle mask is equivalent to an identity shuffle, typically this is due to repeated LHS/RHS ops in horiz-ops, but isTargetShuffleEquivalent might see other patterns as well.
This is another small step towards getting rid of foldShuffleOfHorizOp and relying on canonicalizeShuffleMaskWithHorizOp and generic shuffle combining.
This is a small patch to make FoldBranchToCommonDest poison-safe by default.
After fc3f0c9c, only two syntactic changes are needed to fix unit tests.
This does not cause any assembly difference in testsuite as well (-O3, X86-64 Manjaro).
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D99452
This adds me as a Google representative for the LLVM security group.
This was proposed, discussed, and voted on in the differential revision
linked below; please see it for more information.
Differential Revision: https://reviews.llvm.org/D99232
It's unlikely that FMADD and FMSUB would have different scheduling
information so merge them.
Reviewed By: HsiangKai
Differential Revision: https://reviews.llvm.org/D99140
During context promotion, intermediate nodes that are on a call path but do not come with a profile can be promoted together with their parent nodes. Do not print sample context string for such nodes since they do not have profile.
Reviewed By: wenlei
Differential Revision: https://reviews.llvm.org/D99441
I've used IALU for the simplest operations from Zbb:
min, minu, max, maxu, sext.b, sext.h, zext.h, andn, orn, xnor
I've put add.uw in IALU32 and slli.uw in ShiftImm32.
Remaining instructions have received new classes.
All 3 sh*add are grouped together. sh*add.uw are grouped together.
Rotate left and right are together. Everything else got their own
class containing one instruction.
I think what I have here is the minimum granularity we need. I
could be convinced that we need more classes.
Reviewed By: evandro
Differential Revision: https://reviews.llvm.org/D99040
Handle (x << s) != (y << s) where x != y and the shifts are
non-wrapping. Once again, this establishes parity with the
corresponing mul fold that already exists. The shift case is
more powerful because we don't need to guard against multiplies
by zero.
This handles the pattern X != X << C for non-zero X and C and a
non-overflowing shift. This establishes parity with the corresponing
fold for multiplies.
IR values convert to check prefix FileCheck variables for IR checks. For example, nameless values, e.g., %0, convert to check prefix TMP FileCheck variables, e.g., [[TMP0:%.*]]. This check prefix may clash with named values that have the same name and that causes auto-generated tests to fail. Currently a warning is emitted to change the names of the IR values but this is not always possible, if for example they are generated by clang. Manual intervention to fix the FileCheck variable names is too tedious. This patch add a parameter to prefix conflicting FileCheck variable names with a user-provided string to automate the process.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D99415
This is mainly for clarity: It doesn't make sense to do any
negative/positive checks when dealing with a nuw add/mul. These
only make sense to nsw add/mul.
In the process of adding the tests, several bugs were
found in the implementation and interface of the API
and they were fixed.
Some utilities from the core tests (core.ml) were moved
into a separate file for reuse.
The following new functions have been added:
`dibuild_create_global_variable_expression`,
`dibuild_create_constant_value_expression` and
`llmetadata_null`. The third one already existed but
is now exposed publicly.
Differential Revision: https://reviews.llvm.org/D99403
If the result of an atomic operation is not used then it can be more
efficient to build a reduction across all lanes instead of a scan. Do
this for GFX10, where the permlanex16 instruction makes it viable. For
wave64 this saves a couple of dpp operations. For wave32 it saves one
readlane (which are generally bad for performance) and one dpp
operation.
Differential Revision: https://reviews.llvm.org/D98953
This patch adds a few test cases where currently NoAlias is returned,
but the pointers can alias if the multiply overflows while computing
a GEP index value.
This was originally just an XFAIL test, but I modified it
to check output. To make that bot-friendly, I'm moving it
to the x86 dir since it specified an x86 target.
Add the constraint when destination EEW not equals the source EEW for
correctness.
The RVV spec has three register overlap rules and I implement the first
stricter constraint because the others are difficult to enforce.
Reviewed By: frasercrmck, craig.topper
Differential Revision: https://reviews.llvm.org/D98920
This reverts commit 3c8473ba534daa3 and includes test diffs to
maintain testing status.
There's at least 1 place that was not updated with 7202f47508 ,
so we can crash mismatching select and intrinsics as shown in
PR49730.
The tests, test/Transforms/InstCombine/AArch64/sve-*,
have been shown to not be AArch64 specific. These tests
have been renamed and moved to reflect this.
Differential Revision: https://reviews.llvm.org/D99253