This should probably be implied for all the speculatable ones. I think
the only ones where this plausibly doesn't apply is s_sendmsghalt and
maybe kill.
It is possible that we can try to negate the same value multiple times.
For example, PHI nodes may happen to have multiple incoming values
(all of which must be the same value) for the same incoming basic block.
It may happen that we try to negate such a PHI node, and succeed,
and that might result in having now-different incoming values..
To avoid that, and in general to reduce the amount of duplicated
work we might be doing, let's introduce a cache where
we'll track results of negating each value.
The added test was previously failing -verify after -instcombine.
Fixes https://bugs.llvm.org/show_bug.cgi?id=46362
I originally reverted the patch because it was causing performance
issues, but now I think it's just enabling simplify-cfg to do
something that I don't want instead :)
Sorry for the noise.
This reverts commit 3e39760f8eaad4770efa05824768e67237915cf5.
(a[0] + a[1] + a[2] + a[3]) - (b[0] + b[1] + b[2] +b[3]) -->
(a[0] - b[0]) + (a[1] - b[1]) + (a[2] - b[2]) + (a[3] - b[3])
This should be the last step in solving PR43953:
https://bugs.llvm.org/show_bug.cgi?id=43953
We started emitting reduction intrinsics with:
D80867/ rGe50059f6b6b3
So it's a relatively easy pattern match now to re-order those ops.
Also, I have not seen any complaints for the switch to intrinsics
yet, so I'll propose to remove the "experimental" tag from the
intrinsics soon.
Differential Revision: https://reviews.llvm.org/D81491
This is a hacky, but low-risk fix to avoid the infinite loop in PR46271:
https://bugs.llvm.org/show_bug.cgi?id=46271
As discussed there, the problem is that FoldOpIntoSelect() can get into a conflict
with a transform that wants to pull a 'not' op through min/max via
SimplifyDemandedVectorElts(). We need to relax our matching of min/max to include
undefined elements in vector constants to avoid that. Alternatively, we could
improve or cripple the demanded elements analysis, but that could create even
more problems.
The likely better, safer alternative will be to create min/max intrinsics, so
we can remove all of the hacks related to min/max matching in instcombine.
Differential Revision: https://reviews.llvm.org/D81698
Summary:
"X % C == 0" is optimized to "X & C-1 == 0" (where C is a power-of-two)
However, "X % Y" can also be represented as "X - (X / Y) * Y" so if I rewrite the initial expression:
"X - (X / C) * C == 0" it's not currently optimized to "X & C-1 == 0", see godbolt: https://godbolt.org/z/KzuXUj
This is my first contribution to LLVM so I hope I didn't mess things up
Reviewers: lebedev.ri, spatel
Reviewed By: lebedev.ri
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D79369
Having the input dumped on failure seems like a better
default: I debugged FileCheck tests for a while without knowing
about this option, which really helps to understand failures.
Remove `-dump-input-on-failure` and the environment variable
FILECHECK_DUMP_INPUT_ON_FAILURE which are now obsolete.
Differential Revision: https://reviews.llvm.org/D81422
This is intended to preserve the logic of the existing transform,
but remove unnecessary restrictions on uses and types.
https://rise4fun.com/Alive/pYfR
Pre: C1 <= width(C1) - 8
%B = sext i8 %A
%C = lshr %B, C1
%r = trunc %C to i8
=>
%r = ashr i8 %A, trunc(umin(C1, 7))
Summary:
This transformation is correct for a builtin call to 'free(p)', but not
for 'operator delete(p)'. There is no guarantee that a user replacement
'operator delete' has no effect when called on a null pointer.
However, the principle behind the transformation *is* correct, and can
be applied more broadly: a 'delete p' expression is permitted to
unconditionally call 'operator delete(p)'. So do that in Clang under
-Oz where possible. We do this whether or not 'p' has trivial
destruction, since the destruction might turn out to be trivial after
inlining, and even for a class-specific (but non-virtual,
non-destroying, non-array) 'operator delete'.
Reviewers: davide, dnsampaio, rjmccall
Reviewed By: dnsampaio
Subscribers: hiraditya, cfe-commits, llvm-commits
Tags: #clang, #llvm
Differential Revision: https://reviews.llvm.org/D79378
We can simplify
```
icmp <pred> phi(C1, C2, ...), C
```
with
```
phi(icmp(C1, C), icmp(C2, C), ...)
```
provided that all comparison of constants are constants themselves.
Differential Revision: https://reviews.llvm.org/D81151
Reviewed By: lebedev.ri
As mentioned in the post-commit comments of D81013 -
the mask check API has to assume the shuffle is
not length-changing, but we have not ruled that out
in this code. Use the ShuffleVectorInst call instead.
Remove the function Instruction::setProfWeight() and make
use of Instruction::copyMetadata(.., {LLVMContext::MD_prof}).
This is correct for all use cases of setProfWeight() as it
is applied to CallBase instructions only.
This change results in prof metadata copied intact even if
the source has "VP". The old pair of calls
extractProfTotalWeight() + setProfWeight() resulted in
setting branch_weights if the source had "VP" data.
Reviewers: yamauchi, davidxl
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D80987
Whilst trying to compile this test to assembly:
CodeGen/aarch64-sve-intrinsics/acle_sve_reinterpret.c
I discovered some warnings were firing in InstCombiner::visitBitCast
due to calls to getNumElements() for scalable vector types. These
calls only really made sense for fixed width vectors so I have fixed
up the code appropriately.
Differential Revision: https://reviews.llvm.org/D80559
We've started (D80598) the process of migrating away from the inline operand lists in statepoints to using explicit operand bundles. Update a few tests to reflect the new preference. More to come, these were simply the ones outside any obvious grouping.
The -reassociate pass tends to transform this kind of pattern into
something that is worse for vectorization and codegen. See PR43953:
https://bugs.llvm.org/show_bug.cgi?id=43953
Follows-up the FP version of the same transform:
rGa0ce2338a083
The -reassociate pass tends to transform this kind of pattern into
something that is worse for vectorization and codegen. See PR43953:
https://bugs.llvm.org/show_bug.cgi?id=43953
This intrinsic implements IEEE-754 operation roundToIntegralTiesToEven,
and performs rounding to the nearest integer value, rounding halfway
cases to even. The intrinsic represents the missed case of IEEE-754
rounding operations and now llvm provides full support of the rounding
operations defined by the standard.
Differential Revision: https://reviews.llvm.org/D75670
We have to assume undef could be an snan, which would need quieting so
returning qnan is safer than undef. Also consider strictfp, and don't
care if the result rounded.
This eliminates a use of 'B', so it can enable follow-on transforms
as well as improve analysis/codegen.
The PhaseOrdering test was added for D61726, and that shows
the limits of instcombine vs. real reassociation. We would
need to run some form of CSE to collapse that further.
The intermediate variable naming here is intentional because
there's a test at llvm/test/Bitcode/value-with-long-name.ll
that would break with the usual nameless value. I'm not sure
how to improve that test to be more robust.
The naming may also be helpful to debug regressions if this
change exposes weaknesses in the reassociation pass for example.
This really belongs in InstructionSimplify since it doesn't introduce
new instructions. Put it in instcombine to avoid increasing the number
of passes considering target intrinsics.
I also noticed that we seem to now be interpreting strictfp attributes
on call sites, so try to handle that.