Need to fix several cost-related problems. The final type may be defined
incorrectly because of to early definition (we may end up with the wider
type), the CommonCost should not be redefined in ExtractElements
cost related calculations and the shuffle of the final insertelements
vectors should be calculated as a cost of single vector permutations
+ costs of two vector permutations for other n-1 incoming vectors.
Differential Revision: https://reviews.llvm.org/D106578
The current implementation of displaying .stack_size information
presumes that each entry represents a single function but this is not
always the case. For example with the use of ICF multiple functions can
be represented with the same code, meaning that the address found in a
.stack_size entry corresponds to multiple function symbols.
This change allows multiple function names to be displayed when
appropriate.
Differential Revision: https://reviews.llvm.org/D105884
Codegen for the raw/struct buffer access intrinsics would update the
offset in the MMO to reflect the combined offset, if it was known to be
constant. If the combined offset was not known to be constant, or if
there was an index, it would set the offset in the MMO to 0. This is
unsafe because it makes it look like the access does not alias with
another access with a fixed non-zero offset.
Fix these cases by setting the pointer in the MMO to null, to reflect
the fact that we do not have any known IR value pointer + constant
offset for the access.
D106284 did this for SelectionDAG. This is the corresponding fix for
GlobalISel.
Differential Revision: https://reviews.llvm.org/D106451
Codegen for the raw/struct buffer access intrinsics would update the
offset in the MMO to reflect the combined offset, if it was known to be
constant. If the combined offset was not known to be constant, or if
there was an index, it would set the offset in the MMO to 0. This is
unsafe because it makes it look like the access does not alias with
another access with a fixed non-zero offset.
Fix these cases by setting the pointer in the MMO to null, to reflect
the fact that we do not have any known IR value pointer + constant
offset for the access.
Differential Revision: https://reviews.llvm.org/D106284
The register class required for some MVE loads/stores is more
constrained than the register we use when creating postinc. Make sure we
constrain the register class to keep the code correct.
isSafeToSpeculateStore() looks for a preceding store to the same
location to make sure that introducing a new store of the same
value is safe. It currently bails on intervening mayHaveSideEffect()
instructions. However, I believe just checking mayWriteToMemory()
is sufficient there -- we just need to make sure that we know which
value was stored, we don't care if we can unwind in the meantime.
While looking into this, I started having some doubts about the
correctness of the transform with regard to thread safety. While
we don't try to hoist non-simple stores, I believe we also need
to make sure that the preceding store is simple as well. Otherwise
we could introduce a spurious non-atomic write after an atomic write
-- under our memory model this would result in a subsequent undef
atomic read, even if the second write stores the same value as the
first.
Example: https://alive2.llvm.org/ce/z/q_3YAL
Differential Revision: https://reviews.llvm.org/D106742
Fixes more casts to `<FixedVectorType>` for the cases where the
instruction is a Insert/ExtractElementInst.
For fixed-width, this part of truncateToMinimalBitWidths is tested by
AArch64/type-shrinkage-insertelt.ll. I attempted to write a test case for this part
of truncateToMinimalBitWidths which uses scalable vectors, but was unable to add
one. The tests in type-shrinkage-insertelt.ll rely on scalarization to create extract
element instructions for instance, which is not possible for scalable vectors.
Reviewed By: david-arm
Differential Revision: https://reviews.llvm.org/D106163
The pattern for vector_splice with Index equal or bigger than
zero was misplaced in the AddedComplexity = 1 pattern in the AArch64
tablegen file. This patch fixes it by removing vector_splice pattern
from inside AddedComplexity = 1.
Need to fix several cost-related problems. The final type may be defined
incorrectly because of to early definition (we may end up with the wider
type), the CommonCost should not be redefined in ExtractElements
cost related calculations and the shuffle of the final insertelements
vectors should be calculated as a cost of single vector permutations
+ costs of two vector permutations for other n-1 incoming vectors.
Differential Revision: https://reviews.llvm.org/D106578
Tests with multiple benchmarks, like Embench [1], showed that the
CallPenalty magic number has the most influence on inlining decisions
when optimizing for size.
On the other hand, there was no good default value for this parameter.
Some benchmarks profited strongly from a reduced call penalty. On
example is the picojpeg benchmark compiled for RISC-V, which got 6%
smaller with a CallPenalty of 10 instead of 12. Other benchmarks
increased in size, like matmult.
This commit makes the compromise of turning the magic number constant of
CallPenalty into a configurable value. This introduces the flag
`--inline-call-penalty`. With that flag users can fine tune the inliner
to their needs.
The CallPenalty constant was also used for loops. This commit replaces
the CallPenalty constant with a new LoopPenalty constant that is now
used instead.
This is a slimmed down version of https://reviews.llvm.org/D30899
[1]: https://github.com/embench/embench-iot
Differential Revision: https://reviews.llvm.org/D105976
Instead of getting the VPValue for the stored IR values through the
current plan, use the stored value of the recipes directly.
This way, the correct VPValues are used if the store recipes have been
modified in the VPlan and the IR value is not correct any longer. This
can happen, e.g. due to D105008.
Add folds to instcombine to support the removal of select instruction when the masked_load is guaranteed to zero the same lanes, i.e. select(mask, mload(,,mask,0), 0) -> mload(,,mask,0).
Patch originally authored by @paulwalker-arm
Reviewed By: david-arm
Differential Revision: https://reviews.llvm.org/D106376
This patch implements vector_splice in tablegen for all cases when the
Immediate is positive and lower than the known minimum value of
a scalable vector.
Vector_splice can be implemented using SVE instruction EXT.
For instance :
@llvm.experimental.vector.splice(Vector_1, Vector_2, Imm)
@llvm.experimental.vector.splice(<A,B,C,D>, <E,F,G,H>, 1) ==> <B, C, D, E>
EXT Vector_1, Vector_2, Imm // Vector_1 = B, C, D + Vector_2 = E
Depends on D105633
Differential Revision: https://reviews.llvm.org/D106273
This patch implements vector_splice in tablegen for:
a) when the immediate is equal to -1 (Imm==1) and uses:
INSR + LASTB
For instance :
@llvm.experimental.vector.splice(Vector_1, Vector_2, -1)
@llvm.experimental.vector.splice(<A,B,C,D>, <E,F,G,H>, 1) ==> <D, E, F, G>
LAST RegLast, Vector_1 // RegLast = D
INSR Res, (Vector_1 >> 1), RegLast // Res = D + E, F, G
Differential Revision: https://reviews.llvm.org/D105633
I have added a new FastMathFlags parameter to getArithmeticReductionCost
to indicate what type of reduction we are performing:
1. Tree-wise. This is the typical fast-math reduction that involves
continually splitting a vector up into halves and adding each
half together until we get a scalar result. This is the default
behaviour for integers, whereas for floating point we only do this
if reassociation is allowed.
2. Ordered. This now allows us to estimate the cost of performing
a strict vector reduction by treating it as a series of scalar
operations in lane order. This is the case when FP reassociation
is not permitted. For scalable vectors this is more difficult
because at compile time we do not know how many lanes there are,
and so we use the worst case maximum vscale value.
I have also fixed getTypeBasedIntrinsicInstrCost to pass in the
FastMathFlags, which meant fixing up some X86 tests where we always
assumed the vector.reduce.fadd/mul intrinsics were 'fast'.
New tests have been added here:
Analysis/CostModel/AArch64/reduce-fadd.ll
Analysis/CostModel/AArch64/sve-intrinsics.ll
Transforms/LoopVectorize/AArch64/strict-fadd-cost.ll
Transforms/LoopVectorize/AArch64/sve-strict-fadd-cost.ll
Differential Revision: https://reviews.llvm.org/D105432
This patch extends support for (scalable-vector) splats in the
DAGCombiner via the `ISD::matchBinaryPredicate` function, which enable a
variety of simple combines of constants.
Users of this function may now have to distinguish between
`BUILD_VECTOR` and `SPLAT_VECTOR` vector operands. The way of dealing
with this in-tree follows the approach added for
`ISD::matchUnaryPredicate` implemented in D94501.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D106575
to encode the constants for DW_AT_data_member_location.
Summary: In DWARF v3, DW_FORM_data4/8 in
DW_AT_data_member_location are interpreted as location
list pointers. Interpreting constants as pointers is
not expected, so we use DW_FORM_udata to encode the
constants.
Reviewed By: probinson
Differential Revision: https://reviews.llvm.org/D105687
Ensure that libSupport does not carry any static global initializer.
libSupport can be embedded in use cases where we don't want to load all
cl::opt unless we want to parse the command line.
ManagedStatic can be used to enable lazy-initialization of globals.
Summary: yaml2obj shouldn't create the string table that isn't needed
- doing so wastes time and disk space.
Reviewed By: jhenderson
Differential Revision: https://reviews.llvm.org/D106420
Ensure that libSupport does not carry any static global initializer.
libSupport can be embedded in use cases where we don't want to load all
cl::opt unless we want to parse the command line.
ManagedStatic can be used to enable lazy-initialization of globals.
The motivation for this caching wasn't clear, remove it in an effort to
simplify the code and make libSupport free of global dynamic constructor.
Reviewed By: dexonsmith
Differential Revision: https://reviews.llvm.org/D106206
If the branch isn't `unpredictable`, and it is predicted to *not* branch
to the block we are considering speculatively executing,
then it seems counter-productive to execute the code that is predicted not to be executed.
Reviewed By: spatel
Differential Revision: https://reviews.llvm.org/D106650
Makes clang crash: https://reviews.llvm.org/D105008#2903350
This reverts commit d2a73fb44ea0b8c981e4b923f811f18793fc4770.
Also revert a minor formatting follow-up:
This reverts commit 82834a673246f27a541ffcc57e0eb65b008102ef.
Begin replacing individual getMemIntrinsicNode calls and setup (for X86ISD::VBROADCAST_LOAD + X86ISD::SUBV_BROADCAST_LOAD opcodes) with this getBROADCAST_LOAD helper.
This patch introduces a new RAII struct that will temporarily make an OpenMP
RTL function have external linkage. This is done before the attributor is
invoked to prevent it from incorrectly removing some function definitions that
we will use later. For example, if we determine all calls to one function are
dead, because it has internal linkage it can safely be removed. Later when we
try to get an instance to that function to modify the source using
`getOrCreateRuntimeFunction` we will then get an empty declaration for that
function that won't be defined anywhere. This patch prevents this from
occurring.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D106707
If the rotation amount is a known splat, perform the modulo on the splat source, and then perform the splat. That way the amount-extension performed later by LowerScalarVariableShift can fold the splats away without any multiple-use issues.
Fixes one of the concerns raised on D104156
Rather than adding methods for dropping these attributes in
various places, add a function that returns an AttrBuilder with
these attributes, which can then be used with existing methods
for dropping attributes. This is with an eye on D104641, which
also needs to drop them from returns, not just parameters.
Also be more explicit about the semantics of the method in the
documentation. Refer to UB rather than Undef, which is what this
is actually about.
From LangRef:
> if the parameter or return pointer is null, poison value is
> returned or passed instead. The nonnull attribute should be
> combined with the noundef attribute to ensure a pointer is not
> null or otherwise the behavior is undefined.
Dropping noundef is sufficient to prevent UB. Including nonnull
in this method just muddies the semantics.