This patch teaches SCEVExpander to directly preserve LCSSA.
As it is currently, SCEV does not look through PHI nodes in loops,
as it might break LCSSA form. Once SCEVExpander can preserve
LCSSA form, it should be safe for SCEV to look through PHIs.
To preserve LCSSA form, this patch uses formLCSSAForInstructions
on operands of newly created instructions, if the definition is inside
a different loop than the new instruction.
The final value we return from expandCodeFor may also need LCSSA
phis, depending on the insert point. As no user for it exists there yet,
create a temporary instruction at the insert point, which can be passed
to formLCSSAForInstructions. This temporary instruction is removed
after LCSSA construction.
Reviewed By: mkazantsev
Differential Revision: https://reviews.llvm.org/D71538
There's a slight difference in functionality with the new CHECK lines:
before, we allowed either -0.0 or 0.0 for maxnum/minnum. That matches
the definition, but we should always get a deterministic result from
constant folding within the compiler, so now we assert that we got
the single expected result in all cases.
A list of target features is disabled when there is no hardware
floating-point support. This is the case when one of the following
options is passed to clang:
- -mfloat-abi=soft
- -mfpu=none
This option list is missing, however, the extension "+nofp" that can be
specified in -march flags, such as "-march=armv8-a+nofp".
This patch also disables unsupported target features when nofp is passed
to -march.
Differential Revision: https://reviews.llvm.org/D82948
This patch uses the feature added in D79162 to fix the cost of a
sext/zext of a masked load, or a trunc for a masked store.
Previously, those were considered cheap or even free, but it's
not the case as we cannot split the load in the same way we would for
normal loads.
This updates the costs to better reflect reality, and adds a test for it
in test/Analysis/CostModel/ARM/cast.ll.
It also adds a vectorizer test that showcases the improvement: in some
cases, the vectorizer will now choose a smaller VF when
tail-predication is enabled, which results in better codegen. (Because
if it were to use a higher VF in those cases, the code we see above
would be generated, and the vmovs would block tail-predication later in
the process, resulting in very poor codegen overall)
Original Patch by Pierre van Houtryve
Differential Revision: https://reviews.llvm.org/D79163
Currently, getCastInstrCost has limited information about the cast it's
rating, often just the opcode and types. Sometimes there is a context
instruction as well, but it isn't trustworthy: for instance, when the
vectorizer is rating a plan, it calls getCastInstrCost with the old
instructions when, in fact, it's trying to evaluate the cost of the
instruction post-vectorization. Thus, the current system can get the
cost of certain casts incorrect as the correct cost can vary greatly
based on the context in which it's used.
For example, if the vectorizer queries getCastInstrCost to evaluate the
cost of a sext(load) with tail predication enabled, getCastInstrCost
will think it's free most of the time, but it's not always free. On ARM
MVE, a VLD2 group cannot be extended like a normal VLDR can. Similar
situations can come up with how masked loads can be extended when being
split.
To fix that, this path adds a new parameter to getCastInstrCost to give
it a hint about the context of the cast. It adds a CastContextHint enum
which contains the type of the load/store being created by the
vectorizer - one for each of the types it can produce.
Original patch by Pierre van Houtryve
Differential Revision: https://reviews.llvm.org/D79162
I have added tests to:
CodeGen/AArch64/sve-intrinsics-int-arith.ll
for doing simple integer add operations on tuple types. Since these
tests introduced new warnings due to incorrect use of
getVectorNumElements() I have also fixed up these warnings in the
same patch. These fixes are:
1. In narrowExtractedVectorBinOp I have changed the code to bail out
early for scalable vector types, since we've not yet hit a case that
proves the optimisations are profitable for scalable vectors.
2. In DAGTypeLegalizer::WidenVecRes_CONCAT_VECTORS I have replaced
calls to getVectorNumElements with getVectorMinNumElements in cases
that work with scalable vectors. For the other cases I have added
asserts that the vector is not scalable because we should not be
using shuffle vectors and build vectors in such cases.
Differential revision: https://reviews.llvm.org/D84016
Optimize some specific immediates selection by materializing them with sub/mvn
instructions as opposed to loading them from the constant pool.
Patch by Ben Shi, powerman1st@163.com.
Differential Revision: https://reviews.llvm.org/D83745
Previous patches fixed up all the warnings in this test:
llvm/test/CodeGen/AArch64/sve-sext-zext.ll
and this change simply checks that no new warnings are added in future.
Differential revision: https://reviews.llvm.org/D83205
In DAGTypeLegalizer::SplitVecOp_EXTRACT_SUBVECTOR I have replaced
calls to getVectorNumElements with getVectorMinNumElements, since
this code path works for both fixed and scalable vector types. For
scalable vectors the index will be multiplied by VSCALE.
Fixes warnings in this test:
sve-sext-zext.ll
Differential revision: https://reviews.llvm.org/D83198
In addition to removing phi nodes this patch removes any
landing pad that the dead exit block might have. Without
this fix Verifier complains about a new switch instruction
jumps to a block with a landing pad.
Differential Revision: https://reviews.llvm.org/D84320
cmake was still considering the empty value of ${fake_version_inc}
even if it was not defined.
Reviewed By: vsapsai
Differential Revision: https://reviews.llvm.org/D82847
This introduces the printRelocationsHelper() which now contains the common
code used by both GNU and LLVM output styles.
Differential revision: https://reviews.llvm.org/D83935
If the mask input to getV4X86ShuffleImm8 only refers to a single source element (+ undefs) then canonicalize to a full broadcast.
getV4X86ShuffleImm8 defaults to inline values for undefs, which can be useful for shuffle widening/narrowing but does leave SimplifyDemanded* calls thinking the shuffle depends on unnecessary elements.
I'm still investigating what we should do more generally to avoid these undemanded elements, but broadcast cases was a simpler win.
The test output files whose atime is altered in the test were getting
accessed by Spotlight indexing on macOS, causing them to get an updated
atime and leading to the test not behaving as expected.
Reviewed By: jhenderson, steven_wu
Differential Revision: https://reviews.llvm.org/D84700
We can implement find_first_unset_in() in the same function
if every BitWord we use is first flipped.
Differential Revision: https://reviews.llvm.org/D84717
This introduces the same bug llvm.amdgcn.s.setreg has where if the
user specified an immediate outside of the valid 16-bit range, it will
select into a verifier error.
Instead, pattern match extends of extract_subvectors to generate
widening operations. Since extract_subvector is not a legal node, this
is implemented via a custom combine that recognizes extract_subvector
nodes before they are legalized. The combine produces custom ISD nodes
that are later pattern matched directly, just like the intrinsic was.
Also removes the clang builtins for these operations since the
instructions can now be generated from portable code sequences.
Differential Revision: https://reviews.llvm.org/D84556
We already had CMPXCH8B feature on this CPU for the frontend so
this doesn't have much effect.
The FeatureSlowUAMem16 only matters if someone compiles with
-march=lakemont -msse which doesn't make sense, but is consistent
with all our pre-sse4.2 CPUs. Maybe the feature flag should be
FeatureFastUAMem16 and set on the newer CPUs instead.
Currently GlobalISel doesn't force all VGPR phi operands to VGPRs, so
this hit a case where it was queried with a VGPR and SGPR. This could
arguably be a verifier error, but it's currently not.
Add wrapper classes to to access record's fields. This makes it easier to
pass record information to the diverse functions for code generation.
Reviewed By: jdenny
Differential Revision: https://reviews.llvm.org/D84612
Rather than expanding truncating stores so that vectors are stored one
lane at a time, lower them to a sequence of instructions using
narrowing operations instead, when possible. Since the narrowing
operations have saturating semantics, but truncating stores require
truncation, mask the stored value to manually truncate it before
narrowing. Also, since narrowing is a binary operation, pass in the
original vector as the unused second argument.
Differential Revision: https://reviews.llvm.org/D84377
GlobalISel let through a call to null, which would then fold into the
source operand like any other inline immediate. The SelectionDAG
lowering deletes calls to null and undef as a workaround from before
calls were supported. We should probably drop the special handling
case in the DAG lowering now, since the middle end optimizers delete
null calls anyway.
This needs an implicit def of the super-register in case one of the
lanes isn't defined, similar to copyPhysReg (or the not-VGPR spill
case below). This showed up in GlobalISel testing since it currently
doesn't fold out many undef instructions.
These should probably be inferred from the function on parse, but the
target specific infrastructure currently does not give you a way to do
this. SILowerSGPRSpills early exits without this reporting spills,
which makes it difficult to write a MIR test for.
Substitutions are already reported in the diagnostics appearing before
the input dump in the case of failed directives, and they're reported
in traces (produced by `-vv -dump-input=never`) in the case of
successful directives. However, those reports are not always
convenient to view while investigating the input dump, so this patch
adds the substitution report to the input dump too. For example:
```
$ cat check
CHECK: hello [[WHAT:[a-z]+]]
CHECK: [[VERB]] [[WHAT]]
$ FileCheck -vv -DVERB=goodbye check < input |& tail -8
<<<<<<
1: hello world
check:1 ^~~~~~~~~~~
2: goodbye word
check:2'0 X~~~~~~~~~~~ error: no match found
check:2'1 with "VERB" equal to "goodbye"
check:2'2 with "WHAT" equal to "world"
>>>>>>
```
Without this patch, the location reported for a substitution for a
directive match is the directive's full match range. This location is
misleading as it implies the substitution itself matches that range.
This patch changes the reported location to just the match range start
to suggest the substitution is known at the start of the match. (As
in the above example, input dumps don't mark any range for
substitutions. The location info in that case simply identifies the
right line for the annotation.)
Reviewed By: mehdi_amini, thopre
Differential Revision: https://reviews.llvm.org/D83650
Summary:
Simplify ChildrenGetter to a simple wrapper around a GraphDiff call.
GraphDiff already handles nullptr in children, so the special casing in
clang can also be removed.
Reviewers: kuhar, dblaikie
Subscribers: llvm-commits, cfe-commits
Tags: #clang, #llvm
Differential Revision: https://reviews.llvm.org/D84713