Complete support for fast8:
- amend shadow size and mapping in runtime
- remove fast16 mode and -dfsan-fast-16-labels flag
- remove legacy mode and make fast8 mode the default
- remove dfsan-fast-8-labels flag
- remove functions in dfsan interface only applicable to legacy
- remove legacy-related instrumentation code and tests
- update documentation.
Reviewed By: stephan.yichao.zhao, browneee
Differential Revision: https://reviews.llvm.org/D103745
This patch optimizes (and r i) to
(BCLRI (BCLRI r, i0), i1) in which i = ~((1<<i0) | (1<<i1)).
or
(BCLRI (ANDI r, i0), i1) in which i = i0 & ~(1<<i1).
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D103743
Needs to be discussed more.
This reverts commit 255a5c1baa6020c009934b4fa342f9f6dbbcc46
This reverts commit df2056ff3730316f376f29d9986c9913b95ceb1
This reverts commit faff79b7ca144e505da6bc74aa2b2f7cffbbf23
This reverts commit d2a9020785c6e02afebc876aa2778fa64c5cafd
This uses 3 bits of data instead of 7. I'm wondering if we can use
bitfields for the lookup table key where this would matter.
I also name the shift_amount template to log2 since it is used
with more than just an srl now.
This allows to lower an LDS variable into a kernel structure
even if there is a constant expression used from different
kernels.
Differential Revision: https://reviews.llvm.org/D103655
In the situation where we need to replace a constant operand C from a constant expression CE
by an instruction NI, it not possible without converting CE itself into an instruction. This
utility helps to convert the given set of constant expression operands from an instruction I
into a corresponding set of instructions.
The current use-case for this utility is from the patches - https://reviews.llvm.org/D103225
and https://reviews.llvm.org/D103655.
Reviewed By: rampitec
Differential Revision: https://reviews.llvm.org/D103661
Noticed via code inspection. We changed the semantics of the IR when we added mustprogress, and we appear to have not updated this location.
Differential Revision: https://reviews.llvm.org/D103834
Fix a bug introduced by f6f6f6375d1a4bced8a6e79a78726ab32b8dd879.
Now for empty PHIs, instead of crashing on assert(hasVal()) in
Optional's internals, we'll return NoAlias, as we did before that patch.
Differential Revision: https://reviews.llvm.org/D103831
getRelocatedSection interface should not check that the object file is
relocatable, as executable files may have relocations preserved with
`--emit-relocs` linker flag. The relocations are useful in context of post-link
binary analysis for function reference identification. For example, BOLT relies
on relocations to perform function reordering.
Reviewed By: MaskRay, jhenderson
Differential Revision: https://reviews.llvm.org/D102296
So far, support for x86_64-linux-gnux32 has been handled by explicit
comparisons of Triple.getEnvironment() to GNUX32. This worked as long as
x86_64-linux-gnux32 was the only X32 environment to worry about, but we
now have x86_64-linux-muslx32 as well. To support this, this change adds
an isX32() function and uses it. It replaces all checks for GNUX32 or
MuslX32 by isX32(), except for the following:
- Triple::isGNUEnvironment() and Triple::isMusl() are supposed to treat
GNUX32 and MuslX32 differently.
- computeTargetTriple() needs to be able to transform triples to add or
remove X32 from the environment and needs to map GNU to GNUX32, and
Musl to MuslX32.
- getMultiarchTriple() completely lacks any Musl support and retains the
explicit check for GNUX32 as it can only return x86_64-linux-gnux32.
Reviewed By: MaskRay
Differential Revision: https://reviews.llvm.org/D103777
Unrolling with more iterations than MaxTripCount is pointless, as
those iterations can never be executed. As such, we clamp ULO.Count
to MaxTripCount if it is known. This means we no longer need to
consider iterations after MaxTripCount for exit folding, and the
CompletelyUnroll flag becomes independent of ULO.TripCount.
Differential Revision: https://reviews.llvm.org/D103748
The motivation here is simple loops with unsigned induction variables w/non-one steps. A toy example would be:
for (unsigned i = 0; i < N; i += 2) { body; }
Given C/C++ semantics, we do not get the nuw flag on the induction variable. Given that lack, we currently can't compute a bound for this loop. We can do better for many cases, depending on the contents of "body".
The basic intuition behind this patch is as follows:
* A step which evenly divides the iteration space must wrap through the same numbers repeatedly. And thus, we can ignore potential cornercases where we exit after the n-th wrap through uint32_max.
* Per C++ rules, infinite loops without side effects are UB. We already have code in SCEV which relies on this. In LLVM, this is tied to the mustprogress attribute.
Together, these let us conclude that the trip count of this loop must come before unsigned overflow unless the body would form a well defined infinite loop.
A couple notes for those reading along:
* I reused the loop properties code which is overly conservative for this case. I may follow up in another patch to generalize it for the actual UB rules.
* We could cache the n(s/u)w facts. I left that out because doing a pre-patch which cached existing inference showed a lot of diffs I had trouble fully explaining. I plan to get back to this, but I don't want it on the critical path.
Differential Revision: https://reviews.llvm.org/D103118
Include known bits support so we know we don't need to zext the
output if the input was already zero extended.
Reviewed By: luismarques
Differential Revision: https://reviews.llvm.org/D103757
This is a modified version of a patch by tolziplohu with a style change, and most importantly, a revised commit message.
inttoptr for a non-integral address space is currently ill defined in the LangRef. Figuring out exactly what the dynamic semantics of such a cast would be is hard, and not yet settled. Despite that, we still need to go ahead and implement something in RS4GC for a couple of reasons.
First, as a simple consistency argument. We're apparently added support for constexpr inttoptrs a while back, and even have tests which exercised them. Having a lack of constant folding trigger a crash during lowering is non-ideal.
Second, and more fundementally, the optimizer is allowed to insert undefined constructs in unreachable code. At the same time, we can't assume that dynamically dead code is always pruned before lowering. As a result, we must assume that inttoptrs can occur (even if completely ill defined) along dead paths. We need the lowering to not crash. The stackmaps produced can be garbage (as the assumption is the code is dynamically dead), but the lowering itself can't crash.
Differential Revision: https://reviews.llvm.org/D103492
Add in the ability of parsing symbol table for 64 bit object.
Reviewed By: jhenderson, DiggerLin
Differential Revision: https://reviews.llvm.org/D85774
We need to adjust the FMF propagation on at least
one of these transforms as discussed in:
https://llvm.org/PR49654
...so this should make it easier to intersect flags.
This can cause the vectorizer to generate interleaved scalar
code which might be ok for some CPUs, but definitely not all.
Disable it to restore the previous scalar behavior.
Differential Revision: https://reviews.llvm.org/D103787
The non-DOT printing does not include the successors of VPregionBlocks.
This patch use the same style for printing successors as for
VPBasicBlock.
I think the printing of successors could be a bit improved further, as
at the moment it is hard to ensure a check line matches all successors.
But that can be done as follow-up.
Reviewed By: a.elovikov
Differential Revision: https://reviews.llvm.org/D103515
* Merged some functions into a single function, to make the costs more obvious.
* Moved scalable-mem-op-cost-model.ll -> sve-ldst.ll to be more consistent with other filenames.
This fixes an issue in BasicTTIImpl.h where it tries to do a
cast<FixedVectorType> on a scalable vector type in order to get the
scalarization cost. Because scalarization of scalable vectors is not
supported, we return Invalid instead.
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D103798
This patch is an extension of D103421. It allows the InstCombiner to
generate the negated form of integer scalable-vector splats. It can
technically handle fixed-length vectors too but those are completely
covered by the preceding logic.
This enables extra combining opportunities for scalable vector types.
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D103801
This allows to convert the add instruction to s_addk_i32 and
v_add_nc_u32 instead of needing v_add_co_u32 when converting to a VALU
instruction.
Differential Revision: https://reviews.llvm.org/D103322
This testcase is failing on z/OS because the regex doesn't match the spelling. This patch modifies the testcase to use the error substitution so it will pass on all platforms.
Reviewed By: jhenderson
Differential Revision: https://reviews.llvm.org/D103804
This patch extends the various "isXXX" functions of the `Constant` class
to include scalable-vector splats.
In several "isXXX" functions, code that was separately inspecting
`ConstantVector` and `ConstantDataVector` was unified to use
`getSplatValue`, which already includes support for said splats.
In the varous "isNotXXX" functions, code was added to check whether the
scalar splat value -- if any -- satisfies the predicate.
An extra fix for `isNotMinSignedValue` was included, as it previously
crashed when passed a scalable-vector type because it unconditionally
cast to `FixedVectorType`
These changes address numerous missed optimizations, a compiler crash
mentioned above and -- perhaps most egregiously -- an infinite loop in
InstCombine due to the compiler breaking canonical form when it failed
to pick up on a splat in a select instruction.
Test cases have been added to cover as many of these functions as
possible, though existing coverage is slim; it doesn't appear that there
are any in-tree uses of `Constant::isNegativeZeroValue`, for example.
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D103421
Before packing LDS globals into a sorted structure, make sure that
their alignment is properly updated based on their size. This will make
sure that the members of sorted structure are properly aligned, and
hence it will further reduce the probability of unaligned LDS access.
Reviewed By: rampitec
Differential Revision: https://reviews.llvm.org/D103261
Use llvm.experimental.vector.insert instead of storing into an alloca
when generating code for these intrinsics. This defers the codegen of
the generated vector to instruction selection, allowing existing
shufflevector style optimizations to apply.
Additionally, introduce a new target transform that can recognise fixed
predicate patterns in the svbool variants of these intrinsics.
Differential Revision: https://reviews.llvm.org/D103082
This patch abstract Calls in Inliner:run() to InlineOrder.
With this patch, it's possible to customize the inlining order,
e.g. use queue or priority queue.
Reviewed By: kazu
Differential Revision: https://reviews.llvm.org/D103315
This pass transforms loops that contain a conditional branch with induction
variable. For example, it transforms left code to right code:
newbound = min(n, c)
while (iv < n) { while(iv < newbound) {
A A
if (iv < c) B
B C
C }
} if (iv != n) {
while (iv < n) {
A
C
}
}
Differential Revision: https://reviews.llvm.org/D102234
This patch marks the induction increment of the main induction variable
of the vector loop as NUW when not folding the tail.
If the tail is not folded, we know that End - Start >= Step (either
statically or through the minimum iteration checks). We also know that both
Start % Step == 0 and End % Step == 0. We exit the vector loop if %IV +
%Step == %End. Hence we must exit the loop before %IV + %Step unsigned
overflows and we can mark the induction increment as NUW.
This should make SCEV return more precise bounds for the created vector
loops, used by later optimizations, like late unrolling.
At the moment quite a few tests still need to be updated, but before
doing so I'd like to get initial feedback to make sure I am not missing
anything.
Note that this could probably be further improved by using information
from the original IV.
Attempt of modeling of the assumption in Alive2:
https://alive2.llvm.org/ce/z/H_DL_g
Part of a set of fixes required for PR50412.
Reviewed By: mkazantsev
Differential Revision: https://reviews.llvm.org/D103255