Fixes a bug introduced by D91589.
When folding `(sext (not i1 x)) -> (add (zext i1 x), -1)`, we try to replace the not first when possible. If the not is replaced in-visit, the now-invalidated node is returned, and we subsequently return an invalid sext. In cases where the not is replaced in-visit, we can simply return an empty SDValue, as the not in the current sext should already have been replaced.
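As a sanity check on the fold itself (not on the DAGCombiner fix), here is a minimal sketch that models an i1 as a `bool` and the extended type as `int32_t`. The helper names `sextI1`, `zextI1`, `foldLHS` and `foldRHS` are made up for illustration and are not LLVM APIs.

```cpp
#include <cstdint>

// Model an i1 value as a bool and the extended type as int32_t.
// sext(i1 v): 0 -> 0, 1 -> -1 (all bits set).
constexpr int32_t sextI1(bool v) { return v ? -1 : 0; }
// zext(i1 v): 0 -> 0, 1 -> 1.
constexpr int32_t zextI1(bool v) { return v ? 1 : 0; }

// Left-hand side of the fold: sext(not x).
constexpr int32_t foldLHS(bool x) { return sextI1(!x); }
// Right-hand side of the fold: add(zext x, -1).
constexpr int32_t foldRHS(bool x) { return zextI1(x) + (-1); }

// An i1 has only two values, so the equivalence can be checked exhaustively.
static_assert(foldLHS(false) == foldRHS(false), "x = 0: both sides are -1");
static_assert(foldLHS(true) == foldRHS(true), "x = 1: both sides are 0");
```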
Thanks @jgorbe, for finding the below reproducer.
The following reduced test case crashes clang when built with `clang -O1 -frounding-math`:
```
template <class> class a {
int b() { return c == 0.0 ? 0 : -1; }
int c;
};
template class a<long>;
```
A debug build of clang produces this "assertion failed" error:
```
clang: /home/jgorbe/code/llvm/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp:264: void {anonymous}::DAGCombiner::AddToWorklist(llvm::
SDNode*): Assertion `N->getOpcode() != ISD::DELETED_NODE && "Deleted Node added to Worklist"' failed.
```
Reviewed By: spatel
Differential Revision: https://reviews.llvm.org/D93274
If we happen to extract a non-dword subreg, that breaks the
logic of the function, and it may shrink the dmask because
it does not recognize the use of one or more lanes.
This bug is next to impossible to trigger with the current
lowering in the BE, but it breaks in one of my future patches.
Differential Revision: https://reviews.llvm.org/D93782
Some predicates can be considered the same as long as the operands are
flipped. For example, a > b gives the same result as b < a. This maps
instructions in a greater-than form to their appropriate less-than
form, swapping the operands in the IRInstructionData only, allowing for
more flexible matching.
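A minimal sketch of the idea, using a hypothetical `Pred` enum and `Cmp` struct rather than LLVM's `CmpInst` and `IRInstructionData`: greater-than forms are rewritten to less-than forms with swapped operands, so `a > b` and `b < a` end up with the same representation.

```cpp
// Hypothetical predicate set; illustrative names, not LLVM's CmpInst enum.
enum class Pred { GT, GE, LT, LE, EQ, NE };

// A comparison with concrete integer operands, for illustration only.
struct Cmp {
  Pred pred;
  int lhs;
  int rhs;
};

// Rewrite greater-than forms into less-than forms by swapping the operands.
// All other predicates are left untouched.
Cmp canonicalize(Cmp c) {
  switch (c.pred) {
  case Pred::GT:
    return Cmp{Pred::LT, c.rhs, c.lhs};
  case Pred::GE:
    return Cmp{Pred::LE, c.rhs, c.lhs};
  default:
    return c;
  }
}

// Evaluate a comparison, used to check that canonicalization keeps the result.
bool evaluate(Cmp c) {
  switch (c.pred) {
  case Pred::GT: return c.lhs > c.rhs;
  case Pred::GE: return c.lhs >= c.rhs;
  case Pred::LT: return c.lhs < c.rhs;
  case Pred::LE: return c.lhs <= c.rhs;
  case Pred::EQ: return c.lhs == c.rhs;
  case Pred::NE: return c.lhs != c.rhs;
  }
  return false;
}
```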
Tests:
llvm/test/Transforms/IROutliner/outlining-isomorphic-predicates.ll
llvm/unittests/Analysis/IRSimilarityIdentifierTest.cpp
Reviewers: jroelofs, paquette
Differential Revision: https://reviews.llvm.org/D87310
Certain instructions, such as adds and multiplies, can have the operands
flipped and still be considered the same. When we are analyzing
structure, this gives slightly more flexibility to create a mapping from
one region to another. We can add both operands in a corresponding
instruction to an operand rather than just the exact match. We then try
to eliminate items from the set, until there is only one valid mapping
between the regions of code.
We do this for adds, multiplies, and equality checking. However, this is
not done for floating point instructions, since the order can still
matter in some cases.
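The candidate-set narrowing described above can be sketched roughly as follows. This is an illustration under assumptions, not the IRSimilarityIdentifier implementation: values are plain integers, and `Candidates`, `narrow`, `matchOperands` and `propagateUnique` are hypothetical names.

```cpp
#include <map>
#include <set>

using Value = int; // hypothetical value number, not an llvm::Value
using Candidates = std::map<Value, std::set<Value>>;

// Intersect the candidates of `from` with `allowed`; a first sighting just
// records `allowed`. Returns false when no candidate survives.
bool narrow(Candidates &m, Value from, const std::set<Value> &allowed) {
  auto it = m.find(from);
  if (it == m.end()) {
    m[from] = allowed;
    return !allowed.empty();
  }
  std::set<Value> kept;
  for (Value v : it->second)
    if (allowed.count(v))
      kept.insert(v);
  it->second = kept;
  return !kept.empty();
}

// Match operands (a0, a1) of an instruction in region A against (b0, b1) of
// the corresponding instruction in region B. A commutative instruction lets
// each operand map to either operand on the other side.
bool matchOperands(Candidates &m, Value a0, Value a1, Value b0, Value b1,
                   bool commutative) {
  if (commutative)
    return narrow(m, a0, {b0, b1}) && narrow(m, a1, {b0, b1});
  return narrow(m, a0, {b0}) && narrow(m, a1, {b1});
}

// Once a value has exactly one candidate, remove that candidate from every
// other still-ambiguous set, and repeat until nothing changes.
void propagateUnique(Candidates &m) {
  bool changed = true;
  while (changed) {
    changed = false;
    for (auto &entry : m) {
      if (entry.second.size() != 1)
        continue;
      Value fixed = *entry.second.begin();
      for (auto &other : m)
        if (other.first != entry.first && other.second.size() > 1 &&
            other.second.erase(fixed))
          changed = true;
    }
  }
}
```

For example, matching `add a, b` against `add d, c` leaves both `a` and `b` ambiguous; a later non-commutative `sub a, k` against `sub c, k'` pins `a` to `c`, which in turn pins `b` to `d`.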
Tests:
llvm/test/Transforms/IROutliner/outlining-commutative-fp.ll
llvm/test/Transforms/IROutliner/outlining-commutative.ll
llvm/unittests/Analysis/IRSimilarityIdentifierTest.cpp
Reviewers: jroelofs, paquette
Differential Revision: https://reviews.llvm.org/D87311
The source pointer type is not necessarily the same as the result
pointer type, so we can't simply return the original null pointer;
the result might be a different one.
Effectively, this is what we were previously already doing when
the GEP was used in conjunction with a load or store, but this
fold can also be applied more generally:
> The only in bounds address for a null pointer in the default
> address-space is the null pointer itself.
This patch extends the SDNode ISel support for RVV from only the
vector/vector instructions to include the vector/scalar and
vector/immediate forms.
It uses splat_vector to carry the scalar in each case, except when
XLEN<SEW (RV32 SEW=64) when a custom node `SPLAT_VECTOR_I64` is used for
type-legalization and to encode the fact that the value is sign-extended
to SEW. When the scalar is a full 64-bit value we use a sequence to
materialize the constant into the vector register.
The non-intrinsic ISel patterns have also been split into their own
file.
Authored-by: Roger Ferrer Ibanez <rofirrim@gmail.com>
Co-Authored-by: Fraser Cormack <fraser@codeplay.com>
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D93312
If the GEP isn't inbounds, then accessing a GEP of null location
is generally not UB.
While this is a minimal fix, the GEP of null handling should
probably be its own fold.
Every basic block section symbol created by -fbasic-block-sections will contain
".__part." to indicate that this symbol corresponds to a basic block fragment of
the function.
This patch solves two problems:
a) Like D89617, we want function symbols with suffixes to be properly qualified
so that external tools like profile aggregators know exactly what this
symbol corresponds to.
b) The current basic block naming just adds a ".N" to the symbol name where N is
some integer. This collides with how clang creates __cxx_global_var_init.N.
clang creates these symbol names to call constructor functions and basic
block symbol naming should not use the same style.
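To illustrate problem (b), here is a small sketch of the two naming styles. The helpers `oldSectionSymbol` and `newSectionSymbol` are hypothetical, and the exact placement of the index after ".__part." is an assumption; the text above only states that the symbol contains ".__part.".

```cpp
#include <string>

// Old style (illustrative): function name plus ".N".
std::string oldSectionSymbol(const std::string &func, unsigned n) {
  return func + "." + std::to_string(n);
}

// New style (assumed layout): function name plus ".__part." plus N.
std::string newSectionSymbol(const std::string &func, unsigned n) {
  return func + ".__part." + std::to_string(n);
}

// True when the symbol carries the basic-block-fragment marker.
bool isSectionSymbol(const std::string &symbol) {
  return symbol.find(".__part.") != std::string::npos;
}
```

With the old style, fragment 1 of a function named `__cxx_global_var_init` would be spelled `__cxx_global_var_init.1`, which is exactly the name clang already uses for another initializer function. The marker removes that ambiguity.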
Fixed all the test cases and added an extra test for __cxx_global_var_init
breakage.
Differential Revision: https://reviews.llvm.org/D93082
The current state of the transform is still not enough to support
my motivational pattern, because it has one more "induction variable".
I have delayed posting this patch, because originally even just rewriting
the loop as countable wasn't enough to nicely transform my motivational pattern,
because i expected that extra IV to be rewritten afterwards,
but it wasn't happening until i fixed that in D91800.
So, this patch allows the 'left-shift until bittest' loop idiom
as long as the inserted ops are cheap,
and lifts any and all extra use checks on the instructions.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D92754
If the bitmask is for sign bit, instcombine would have canonicalized
the pattern into a proper sign bit check. Supporting that is still
simple, but requires a bit of a roundtrip - we first have to use
`decomposeBitTestICmp()`, and the rest again just works.
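The equivalence behind that roundtrip, sketched for a 32-bit value: testing the sign-bit mask gives the same answer as a signed comparison against zero. The function names are illustrative only.

```cpp
#include <cstdint>

// Bit test against the sign-bit mask of a 32-bit value.
constexpr bool signBitTest(int32_t x) {
  return (static_cast<uint32_t>(x) & 0x80000000u) != 0;
}

// The canonical form: a signed comparison with zero.
constexpr bool signCompare(int32_t x) { return x < 0; }

static_assert(signBitTest(-1) == signCompare(-1), "negative value");
static_assert(signBitTest(0) == signCompare(0), "zero");
static_assert(signBitTest(INT32_MAX) == signCompare(INT32_MAX), "max positive");
static_assert(signBitTest(INT32_MIN) == signCompare(INT32_MIN), "min negative");
```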
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D91726
The handling of the case where the mask is a constant is trivial:
if said constant is a power of two, the bit in question is log2(mask),
and the rest just works.
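A minimal sketch of that arithmetic, with hypothetical helper names (LLVM has its own utilities for this, which are not reproduced here):

```cpp
#include <cstdint>

// A mask is a power of two when exactly one bit is set.
bool isPowerOfTwo(uint32_t mask) {
  return mask != 0 && (mask & (mask - 1)) == 0;
}

// log2 of a power-of-two mask, i.e. the index of its single set bit.
// Precondition: isPowerOfTwo(mask).
unsigned bitPosition(uint32_t mask) {
  unsigned pos = 0;
  while ((mask >> pos) != 1u)
    ++pos;
  return pos;
}
```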
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D91725
The motivation here is the following inner loop in fp16/fp24 -> fp32 expander,
that runs as part of the floating-point DNG decompression in RawSpeed library:
cd380bb9a2/src/librawspeed/decompressors/DeflateDecompressor.cpp (L112-L115)
```
while (!(fp32_fraction & (1 << 23))) {
fp32_exponent -= 1;
fp32_fraction <<= 1;
}
```
(https://godbolt.org/z/r13YMh)
As one might notice, that loop is currently uncountable, and that whole code stays scalar.
Yet, it is rather trivial to make that loop countable:
https://godbolt.org/z/do8WMz
and we can prove that via alive2:
https://alive2.llvm.org/ce/z/7vQnji (ha nice, isn't it?)
... and that allows for the whole fp16->fp32 code to vectorize:
https://godbolt.org/z/7hYr13
Now, while i'd love to get there, i feel like i should take it in steps.
For now, this introduces support for the most basic case,
where the bit position is known as a variable,
and the loop *will* go away (has no live-outs other than the recurrence,
no extra instructions in the loop).
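For reference, a sketch of what "making the loop countable" means for the snippet above, written at the source level rather than as the IR the transform emits. The `Fp` struct, the function names, and the portable leading-zero helper are assumptions for illustration; the precondition (some bit at or below bit 23 is set) is what keeps the original loop from running forever.

```cpp
#include <cassert>
#include <cstdint>

struct Fp {
  uint32_t fraction;
  int32_t exponent;
};

// The original, uncountable form: shift left until bit 23 is set.
Fp normalizeLoop(Fp v) {
  while (!(v.fraction & (1u << 23))) {
    v.exponent -= 1;
    v.fraction <<= 1;
  }
  return v;
}

// Portable count of leading zero bits in a 32-bit value (32 for zero).
unsigned countLeadingZeros(uint32_t x) {
  unsigned n = 0;
  for (uint32_t bit = 0x80000000u; bit != 0 && !(x & bit); bit >>= 1)
    ++n;
  return n;
}

// The countable form: compute the trip count up front, then apply it once.
Fp normalizeCountable(Fp v) {
  // Bits above bit 23 can never reach bit 23 by shifting left, so only the
  // low 24 bits decide how many iterations the loop would run.
  uint32_t low = v.fraction & ((1u << 24) - 1);
  assert(low != 0 && "the original loop would not terminate");
  // 8 is the number of leading zeros of (1u << 23).
  unsigned tripCount = countLeadingZeros(low) - 8;
  v.exponent -= static_cast<int32_t>(tripCount);
  v.fraction <<= tripCount;
  return v;
}
```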
I have added sufficient (i believe) test coverage,
and alive2 is happy with those transforms.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D91038
This should've been in 7ad666798f12456d9 but wasn't.
Squashes these two commits:
Revert "[clang][cli] Let denormalizer decide how to render the option based on the option class"
This reverts commit 70410a264949101ced3ce3458f37dd4cc2f5af85.
Revert "[clang][cli] Implement `getAllArgValues` marshalling"
This reverts commit 63a24816f561a5d8e28ca7054892bd8602618be4.
When there are constants that have the same structural location, but not
the same value, between different regions, we cannot simply outline the
region. Instead, we find the constants that are not the same in each
location, and promote them to arguments to be passed into the respective
functions. At each call site, we pass the constant in as an argument
regardless of type.
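A source-level analogy of the transformation, with made-up function names: two regions that differ only in one constant are outlined into a single function, and the differing constant becomes an argument supplied at each call site.

```cpp
// Before outlining: the two regions differ only in the constant added.
int regionA(int x) {
  int t = x * 2;
  return t + 3;
}

int regionB(int y) {
  int t = y * 2;
  return t + 7;
}

// After outlining: the differing constant is promoted to the argument `c`.
int outlined(int v, int c) {
  int t = v * 2;
  return t + c;
}

// Each call site passes its own constant.
int regionAOutlined(int x) { return outlined(x, 3); }
int regionBOutlined(int y) { return outlined(y, 7); }
```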
Added/Edited Tests:
llvm/test/Transforms/IROutliner/outlining-constants-vs-registers.ll
llvm/test/Transforms/IROutliner/outlining-different-constants.ll
llvm/test/Transforms/IROutliner/outlining-different-globals.ll
Reviewers: paquette, jroelofs
Differential Revision: https://reviews.llvm.org/D87294
Also include a special case pattern to use vmv.v.x vd, zero when
the argument is 0.0.
Reviewed By: khchen
Differential Revision: https://reviews.llvm.org/D93672
741978d727 made clang produce output that's 2x as large at least in
sanitizer builds. https://reviews.llvm.org/D83892#2470185 has a
standalone repro.
This reverts the following commits:
Revert "[clang][cli] Port CodeGenOpts simple string flags to new option parsing system"
This reverts commit 95d3cc67caac04668ef808f65c30ced60ed14f5d.
Revert "[clang][cli] Port LangOpts simple string based options to new option parsing system"
This reverts commit aec2991d083a9c5b92f94d84a7b3a7bbed405af8.
Revert "[clang][cli] Streamline MarshallingInfoFlag description"
This reverts commit 27b7d646886d499c70dec3481dfc3c82dfc43dd7.
Revert "[clang][cli] Port LangOpts option flags to new option parsing system"
This reverts commit 383778e2171b4993f555433745466e211e713548.
Revert "[clang][cli] Port CodeGen option flags to new option parsing system"
This reverts commit 741978d727a445fa279d5952a86ea634adb7dc52.
Update the documentation and add a test.
Build failed: Change SIZE_MAX to std::numeric_limits<int64_t>::max().
Differential Revision: https://reviews.llvm.org/D93419
The current approach doesn't work well when multiple paths are predicted to be "cold". By "cold" paths I mean those containing an "unreachable" instruction, a call marked with the 'cold' attribute, or the 'unwind' handler of an 'invoke' instruction. The issue is that heuristics are applied one by one until the first match, essentially ignoring the relative hotness/coldness of other paths.
The new approach unifies the processing of "cold" paths by assigning a predefined absolute weight to each block estimated to be "cold". We then propagate these weights up/down the IR, similarly to the existing approach, and finally set up edge probabilities based on the estimated block weights.
One important difference is how we propagate weight up. The existing approach propagates the same weight to all blocks that are post-dominated by a block with some "known" weight. This is useless, if only because it always gives a 50/50 distribution, which is assumed by default anyway. Worse, it causes the algorithm to skip further heuristics and can miss setting a more accurate probability. The new algorithm propagates the weight up only to the blocks that dominate and are post-dominated by a block with some "known" weight. In other words, to those blocks that are always either executed together or not executed together.
In addition, the new approach processes loops in a uniform way as well. Essentially, loop exit edges are estimated as "cold" paths relative to back edges and should be considered uniformly with other coldness/hotness markers.
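One plausible way to derive edge probabilities from estimated successor block weights is simple normalization, sketched below. This is an assumption about the final step, not a copy of BranchProbabilityInfo; the function name and the fallback for an all-zero total are illustrative.

```cpp
#include <cstdint>
#include <vector>

// Turn estimated successor block weights into edge probabilities by
// normalizing them. If every weight is zero, fall back to a uniform split.
std::vector<double>
edgeProbabilities(const std::vector<uint32_t> &succWeights) {
  uint64_t total = 0;
  for (uint32_t w : succWeights)
    total += w;

  std::vector<double> probs;
  probs.reserve(succWeights.size());
  for (uint32_t w : succWeights) {
    if (total == 0)
      probs.push_back(1.0 / static_cast<double>(succWeights.size()));
    else
      probs.push_back(static_cast<double>(w) / static_cast<double>(total));
  }
  return probs;
}
```

Because the weights are absolute, two "cold" successors with different weights get different probabilities instead of whichever heuristic matched first deciding the split.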
Reviewed By: yrouban
Differential Revision: https://reviews.llvm.org/D79485
Adds ARMBankConflictHazardRecognizer. This hazard recognizer
looks for a few situations where the same base pointer is used and
then checks whether the offsets lead to a bank conflict. Two
parameters are also added to permit overriding of the target
assumptions:
arm-data-bank-mask=<int> - Mask of bits which are to be checked for
conflicts. If all these bits are equal in the offsets, there is a
conflict.
arm-assume-itcm-bankconflict=<bool> - Assume that there will be bank
conflicts on any loads to a constant pool.
This hazard recognizer is enabled for Cortex-M7, where the Technical
Reference Manual states that there are two DTCM banks banked using bit
2 and one ITCM bank.
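The conflict rule stated for `arm-data-bank-mask` can be sketched as a single bit comparison. The function name is hypothetical, and the mask value `0x4` in the example follows from "banked using bit 2" above.

```cpp
#include <cstdint>

// Two accesses off the same base pointer may conflict when all the bits
// selected by bankMask are equal in both offsets, i.e. both offsets fall
// into the same bank.
bool mayBankConflict(int64_t offset0, int64_t offset1, uint64_t bankMask) {
  uint64_t differing =
      static_cast<uint64_t>(offset0) ^ static_cast<uint64_t>(offset1);
  return (differing & bankMask) == 0;
}
```

With a mask of `0x4`, offsets 0 and 8 land in the same bank and are reported as a potential conflict, while offsets 0 and 4 land in different banks.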
Differential Revision: https://reviews.llvm.org/D93054
Some member functions of class TargetTransformInfoImplBase in
TargetTransformInfoImpl.h are marked const while others are not. Yet all
of them should be marked const since they are just providing default TTI
values. This patch fixes the inconsistency.
Authored-by: Jinzheng Tu <b1f6c1c4@gmail.com>
Reviewed By: simoll
Differential revision: https://reviews.llvm.org/D93573
This patch defines the vfwmacc, vfwnmacc, vfwmsac, and vfwnmsac intrinsics
and lowers them to V instructions.
We worked with @rogfer01 from BSC to come up with this patch.
Authored-by: Roger Ferrer Ibanez <rofirrim@gmail.com>
Co-Authored-by: ShihPo Hung <shihpo.hung@sifive.com>
Differential Revision: https://reviews.llvm.org/D93693
Currently llvm-readelf might print "OS Specific/Processor Specific/<unknown>"
hint when dumping the ELF file type. The patch teaches llvm-readobj to do the same.
This fixes https://bugs.llvm.org/show_bug.cgi?id=40868
I am removing `Object/elf-unknown-type.test` test because it is not in the right place,
it is outdated and very limited.
The `readobj/ELF/file-types.test` checks the functionality much better.
Differential revision: https://reviews.llvm.org/D93689
Define vmerge/vfmerge intrinsics and lower to V instructions.
Include support for vector-vector vfmerge by vmerge.vvm.
We worked with @rogfer01 from BSC to come up with this patch.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D93674
Define the vfmin, vfmax IR intrinsics for the respective V instructions.
Authored-by: Roger Ferrer Ibanez <rofirrim@gmail.com>
Co-Authored-by: Evandro Menezes <evandro.menezes@sifive.com>
Differential Revision: https://reviews.llvm.org/D93673
Introduce `Vec` records, each bundling all information related to a single SIMD
lane interpretation. This lets TableGen definitions take a single Vec parameter
from which they can extract information rather than taking multiple redundant
parameters. This commit refactors all of the SIMD load and store instruction
definitions to use the new `Vec`s. Subsequent commits will similarly refactor
additional instruction definitions.
Differential Revision: https://reviews.llvm.org/D93660
Returning int64_t was arbitrarily limiting for wide integer types, and
the functions should handle the full generality of the IR.
Also changes the full form which returns the originally defined
vreg. Add another wrapper for the common case of just immediately
converting to int64_t (arguably this would be useful for the full
return value case as well).
One possible issue with this change is some of the existing uses did
break without conversion to getConstantVRegSExtVal, and it's possible
some without adequate test coverage are now broken.
This patch defines the vfmacc/vfnmacc, vfmsac/vfnmsac, vfmadd/vfnmadd,
and vfmsub/vfnmsub intrinsics and lowers them to V instructions.
Authored-by: Roger Ferrer Ibanez <rofirrim@gmail.com>
Co-Authored-by: ShihPo Hung <shihpo.hung@sifive.com>
Differential Revision: https://reviews.llvm.org/D93691
This patch defines the vwmacc[u|su|us] intrinsics and lowers them to V instructions.
We worked with @rogfer01 from BSC to come up with this patch.
Authored-by: Roger Ferrer Ibanez <rofirrim@gmail.com>
Co-Authored-by: ShihPo Hung <shihpo.hung@sifive.com>
Differential Revision: https://reviews.llvm.org/D93675
It does not seem to fold offsets but this is not specific
to the flat scratch as getPtrBaseWithConstantOffset() does
not return the split for these tests unlike its SDag
counterpart.
Differential Revision: https://reviews.llvm.org/D93670
Adjust SITargetLowering::allowsMisalignedMemoryAccessesImpl for
unaligned flat scratch support. Mostly needed for global isel.
Differential Revision: https://reviews.llvm.org/D93669