The constant trunc/ext may not be the optimal pre-condition,
but I think it handles the common cases.
Example of an Alive2 proof:
https://alive2.llvm.org/ce/z/sREeLC
This is another step towards canonicalizing to the intrinsics.
Narrowing was identified as a source of potential regression for
abs(), so we need to handle this for min/max as well - see:
https://llvm.org/PR48816
If this is not enough, we could process intrinsics in
the trunc-driven matching in canEvaluateTruncated().
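For illustration, a minimal IR sketch of the kind of narrowing this
enables (the operand names and the constant 42 are made up for the
example; the Alive2 link above has a verified version):

declare i32 @llvm.smax.i32(i32, i32)
declare i8 @llvm.smax.i8(i8, i8)

; Before: the max is computed in the wide type.
define i32 @src(i8 %x) {
  %ext = sext i8 %x to i32
  %max = call i32 @llvm.smax.i32(i32 %ext, i32 42)
  ret i32 %max
}

; After: the constant fits in i8, so the max can be narrowed.
define i32 @tgt(i8 %x) {
  %max = call i8 @llvm.smax.i8(i8 %x, i8 42)
  %ext = sext i8 %max to i32
  ret i32 %ext
}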
This patch is a follow-up to D94821 to ensure the correct behavior of the
general directive structure checker.
This patch adds the generation of the Enter function declarations for clauses in
the TableGen backend.
This helps ensure that each clause declared in the TableGen file has at least
a basic check.
Reviewed By: kiranchandramohan
Differential Revision: https://reviews.llvm.org/D95108
When we have a zeroext parameter, emit G_ASSERT_ZEXT.
Add a check that we actually emit it.
This is a 0.1% code size win on CTMark/7zip and CTMark/consumer-typeset at -Os.
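For example (a hand-written sketch, not the actual test), a function
like the following would now get the assertion on its incoming
argument register:

; The zeroext i8 argument arrives with its upper bits zeroed, so call
; lowering can mark the incoming vreg with G_ASSERT_ZEXT instead of
; dropping that knowledge.
define i32 @pass_zeroext(i8 zeroext %x) {
  %ext = zext i8 %x to i32
  ret i32 %ext
}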
Differential Revision: https://reviews.llvm.org/D95567
Part of the gold test added in 1487747e990ce9f8851f3d92c3006a74134d7518
relies on more recent gold fixes to the plugin behavior with
--export-dynamic-symbol and --dynamic-list. Extract those parts of the
new test into a v1.16 test.
A follow-up patch will add support for commuting operands or
changing opcode to vfmacc and friends.
Reviewed By: frasercrmck
Differential Revision: https://reviews.llvm.org/D95662
When replacing the dst reg with the src reg, we need to make sure that we
propagate the dst reg's register class through to the src.
Otherwise, we aren't meeting the requirements for G_ASSERT_ZEXT, and so the
verifier will fail.
Differential Revision: https://reviews.llvm.org/D95708
Rather than materializing the 0xffff immediate for the AND, use
a shift left to remove the upper bits and then shift in zeros
from the right.
This pattern occurs when type legalizing an i16 right shift.
I've implemented this with custom selection code for a number of
reasons: I've limited it to the AND having a single use, we need
to compensate for SimplifyDemandedBits altering the AND mask, and I'm
using *W opcodes on RV64. We may want to generalize this in the
future. For all these reasons it seemed easiest to do it this way.
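The equivalence, written as IR for clarity (the actual change is in
instruction selection; the i32 type and the shift amount 3 are
illustrative):

; Before: requires materializing the 0xffff mask.
define i32 @src(i32 %x) {
  %and = and i32 %x, 65535
  %shr = lshr i32 %and, 3
  ret i32 %shr
}

; After: shift the upper 16 bits out to the left, then shift zeros
; back in from the right (16 + 3 positions).
define i32 @tgt(i32 %x) {
  %shl = shl i32 %x, 16
  %shr = lshr i32 %shl, 19
  ret i32 %shr
}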
Reviewed By: luismarques
Differential Revision: https://reviews.llvm.org/D95774
Instead of using ConstraintSystem::negate when adding new constraints,
flip the condition in IR.
The main advantage is that EQ predicates can be represented by 2
constraints, which makes negating based on the constraint tricky. The IR
condition can easily be negated.
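A sketch of the difference (hypothetical IR; the constraint forms are
illustrative):

define i1 @cmp(i32 %a, i32 %b) {
  ; icmp eq is modeled by two constraints: a - b <= 0 and b - a <= 0.
  ; Negating that pair inside the ConstraintSystem is awkward; flipping
  ; the predicate to icmp ne before adding constraints is trivial.
  %c = icmp eq i32 %a, %b
  ret i1 %c
}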
This would assert with amdgpu-spill-sgpr-to-vgpr disabled when trying to
spill the FP.
Fixes: SWDEV-262704
Reviewed By: RamNalamothu
Differential Revision: https://reviews.llvm.org/D95768
Under the softfp calling convention, we are often left with
VMOVRRD(extract(bitcast(build_vector(a, b, c, d)))) for the return value
of the function. These can be simplified to a,b or c,d directly,
depending on the value of the extract.
Big endian is a little different because the bitcast switches the lanes
around, meaning we end up with b,a or d,c.
Differential Revision: https://reviews.llvm.org/D94989
If we instantiate self-referenced anonymous records in foreach and
multiclass, the NAME value will point to an incorrect record. This is
because the anonymous name is resolved too early.
This patch adds AnonymousNameInit to represent an anonymous record name.
When instantiating an anonymous record, it will update the referred name.
Differential Revision: https://reviews.llvm.org/D95309
The AArch64 DAG combine added by D90945 & D91433 extends the index
of a scalable masked gather or scatter to i32 if necessary.
This patch removes the combine and instead adds shouldExtendGSIndex, which
is used by visitMaskedGather/Scatter in SelectionDAGBuilder to query whether
the index should be extended before calling getMaskedGather/Scatter.
Reviewed By: david-arm
Differential Revision: https://reviews.llvm.org/D94525
D90687 introduced a crash:
llvm::LoopVectorizationCostModel::computeMaxVF(llvm::ElementCount, unsigned int):
Assertion `WideningDecisions.empty() && Uniforms.empty() && Scalars.empty() &&
"No decisions should have been taken at this point"' failed.
when compiling the following C code:
typedef struct {
  char a;
} b;
b *c;
int d, e;
int f() {
  int g = 0;
  for (; d; d++) {
    e = 0;
    for (; e < c[d].a; e++)
      g++;
  }
  return g;
}
with:
clang -Os -target hexagon -mhvx -fvectorize -mv67 testcase.c -S -o -
This occurred because, prior to D90687, computeFeasibleMaxVF would only be
called in computeMaxVF when a scalar epilogue was allowed, but now it's
always called. This triggers the assert above, since computeFeasibleMaxVF
collects all viable VFs larger than the default MaxVF and, for each VF,
calculates the register usage, which performs exactly the analysis the
assert guards against. This path is only taken in computeFeasibleMaxVF when
TTI.shouldMaximizeVectorBandwidth returns true, and the hexagon backend
implements this target hook to always return true.
Reported by @iajbar.
Reviewed By: fhahn
Differential Revision: https://reviews.llvm.org/D94869
This adds a DAG combine for converting sext_inreg of VGetLaneu into
VGetLanes, provided the types match correctly.
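The combine is at the DAG level, but the scalar identity behind it can
be written in IR (a hand-written analogue, not the actual test):

; Sign-extending the low 16 bits of a zero-extended lane read is the
; same as sign-extending the lane directly.
define i32 @src(<8 x i16> %v) {
  %lane = extractelement <8 x i16> %v, i32 0
  %zext = zext i16 %lane to i32
  %trunc = trunc i32 %zext to i16   ; sext_inreg written as trunc+sext
  %sext = sext i16 %trunc to i32
  ret i32 %sext
}

define i32 @tgt(<8 x i16> %v) {
  %lane = extractelement <8 x i16> %v, i32 0
  %sext = sext i16 %lane to i32
  ret i32 %sext
}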
Differential Revision: https://reviews.llvm.org/D95073
Now that Loop Peeling has been fixed (80cdd30eb90c3509bf315f1fa1369483e2448bbd),
enable the dominance check by default.
This reverts commit 3b5d36ece21f9baf96d82944b0165cb352443bee.
We can only legally extract from the lowest 128-bit subvector, so extract the correct subvector to allow us to handle 256/512-bit vector element extracts.
Under SoftFP calling conventions, we can be left with
extract(bitcast(BUILD_VECTOR(VMOVDRR(a, b), ..))) patterns that can
simplify to a or b, depending on the extract lane.
Differential Revision: https://reviews.llvm.org/D94990
Correct integer constants like `1UL << 63` to `UINT64_C(1) << 63` in
order to make them work on 32-bit machines. Tested on both i386
and x86_64 machines.
Reviewed By: mgorny
Differential Revision: https://reviews.llvm.org/D95724
If we determine that the invariant path through the loop has no effects,
we can directly branch to the exit block, instead of unswitching first.
Besides avoiding some extra work (unswitching first, then deleting the
loop again), this allows us to be more aggressive than regular unswitching
with respect to cost-modeling. This approach should always be
desirable.
This is similar in spirit to D93734, except that it uses the previously
added checks for loop unswitching.
I tried to add the required no-op checks from scratch, as we only check
a subset of the loop. There is potential to unify the checks with
LoopDeletion, at the cost of adding a predicate for whether a block should
be considered.
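A sketch of the situation (hypothetical IR): when %inv is true, the
loop below has no observable effects, so the runtime check can branch
straight to the exit instead of first unswitching into a separate
no-op copy of the loop:

declare void @effect()

define void @f(i1 %inv, i32 %n) mustprogress {
entry:
  br label %loop
loop:
  %iv = phi i32 [ 0, %entry ], [ %iv.next, %latch ]
  br i1 %inv, label %latch, label %work   ; loop-invariant condition
work:
  call void @effect()                     ; the only effect in the loop
  br label %latch
latch:
  %iv.next = add nsw i32 %iv, 1
  %cmp = icmp slt i32 %iv.next, %n
  br i1 %cmp, label %loop, label %exit
exit:
  ret void
}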
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D95468
The reduction of a sanitizer build failure when enabling the dominance check (D95335) showed that loop peeling also needs to take care of scope duplication, just like loop unrolling (D92887).
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D95544
This primarily occurs with isel patterns using vnot. This reduces
the number of variants in the isel tables.
We generally canonicalize build_vectors of constants to the RHS. I think
we might fail if there is a bitcast on the build_vector, but that
should be easy to fix if we can find a case. Usually the
bitcast is introduced by type legalization or lowering. It's
likely canonicalization would have already occurred.
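For reference, vnot is just an xor with an all-ones vector; with the
constant canonicalized to the RHS, a single pattern suffices
(illustrative IR):

define <4 x i32> @vnot(<4 x i32> %x) {
  ; Canonical form: the all-ones build_vector is the second operand.
  %not = xor <4 x i32> %x, <i32 -1, i32 -1, i32 -1, i32 -1>
  ret <4 x i32> %not
}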
To set a non-default rounding mode, the user usually calls the function
'fesetround' from the standard C library. This approach has some
disadvantages:
* It creates an unnecessary dependency on libc. On the other hand, setting
the rounding mode requires only a few instructions and could be done by the
compiler itself. Sometimes the standard C library is not even available,
as in the case of GPUs or AI cores that execute small kernels.
* The compiler could generate more efficient code if it knows that a
particular call just sets the rounding mode.
This change introduces a new IR intrinsic, 'llvm.set.rounding', which
sets the current rounding mode, similar to 'fesetround'. It differs
from the latter, however, because it is a lower-level facility:
* 'llvm.set.rounding' does not return any value, whereas 'fesetround'
returns a non-zero value in case of failure. In glibc, 'fesetround'
reports failure if its argument is invalid or unsupported, or if floating
point operations are unavailable on the hardware. The compiler usually knows
what core it generates code for and can validate arguments in many
cases.
* The rounding mode is specified for 'fesetround' using constants like
'FE_TONEAREST', which are target-dependent. It is inconvenient to work
with such constants at the IR level.
The C standard provides a target-independent way to specify the rounding
mode; it is used in FLT_ROUNDS. However, it does not define a standard way
to set the rounding mode using this encoding.
This change implements only the IR intrinsic. Lowering it to machine code is
target-specific and will be implemented later. Mapping 'fesetround'
to 'llvm.set.rounding' is also not implemented here.
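A sketch of the intended use, assuming the FLT_ROUNDS encoding
(0 = toward zero, 1 = to nearest, 2 = upward, 3 = downward):

declare void @llvm.set.rounding(i32)

define void @round_upward() {
  ; Comparable in intent to fesetround(FE_UPWARD), but returns nothing
  ; and uses a target-independent mode encoding.
  call void @llvm.set.rounding(i32 2)
  ret void
}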
Differential Revision: https://reviews.llvm.org/D74729
A couple of patterns used bitconvert on the immAllOnesV, but
the isel matching uses ISD::isBuildVectorAllOnes, which
is able to look through bitcasts, so the isel patterns don't need
to do it explicitly.
This reverts commit 6e58539659aea0ee621c7e267d825aa82d4e7e96.
This failed in http://lab.llvm.org:8011/#/builders/123/builds/2676. I guess
we're still missing some symbols, but unfortunately the specific error is
masked by a bug in python/lit that hides stderr. This test will have to remain
disabled on Windows until I can get help to debug it further.
We need to add a mask to the shift amount for these operations
to use the FSR/FSL instructions. We were previously doing this
in isel patterns, but custom lowering will make the mask
visible to optimizations earlier.
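These correspond to the IR funnel-shift intrinsics; the point is to
make the modulo-width semantics of the shift amount visible
(illustrative IR; the exact mask the backend uses is its own detail):

declare i32 @llvm.fshl.i32(i32, i32, i32)

define i32 @fsl(i32 %a, i32 %b, i32 %c) {
  ; fshl only uses the shift amount modulo the bit width (%c mod 32
  ; here); custom lowering inserts that mask explicitly so earlier
  ; optimizations can see it.
  %r = call i32 @llvm.fshl.i32(i32 %a, i32 %b, i32 %c)
  ret i32 %r
}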
This testcase was failing on Windows due to missing definitions. This commit
adds definitions of the missing symbols (as absolute symbols) to eliminate the
errors.
If we're going to end up expanding anyway, we should do it early
so we don't create extra operations to handle the bytes added by
promotion.
This is helpful on RISCV where we might have to promote i16 all
the way to i64.
Differential Revision: https://reviews.llvm.org/D95756
This patch removes some options that have been duplicated in
LTOCodeGenerator and instead uses lto::Config directly to manage the
options.
This is a cleanup after 6a59f0560648.
Reviewed By: tejohnson
Differential Revision: https://reviews.llvm.org/D95738