CommandLine.h is indirectly included in ~50% of TUs when building
clang, and VirtualFileSystem.h is large.
(Already remarked by jhenderson on D70769.)
No behavior change.
Differential Revision: https://reviews.llvm.org/D100957
This patch changes the lowering of SELECT_CC from Legal to Expand for scalable
vector and adds support for scalable vectors in performSelectCombine.
When selecting the nodes to lower in visitSELECT it checks if it is possible to
use SELECT_CC in cases where SETCC is followed by SELECT. visistSELECT checks
if SELECT_CC is legal or custom to replace SELECT by SELECT_CC.
SELECT_CC used to be legal for scalable vector, so the node changes to
SELECT_CC. This used to crash the compiler as there is no support for SELECT_CC
with scalable vectors. So now the compiler lowers to VSELECT instead of
SELECT_CC.
Differential Revision: https://reviews.llvm.org/D100485
If you gave clang the options `--target=arm-pc-windows-msvc` and
`-march=armv8-a+crypto` together, the crypto extension would not be
enabled in the compilation, and you'd see the following warning
message suggesting that the 'armv8-a' had been ignored:
clang: warning: ignoring extension 'crypto' because the 'armv7-a' architecture does not support it [-Winvalid-command-line-argument]
This happens because Triple::getARMCPUForArch(), for the Win32 OS,
unconditionally returns "cortex-a9" (an Armv7 CPU) regardless of
MArch, which overrides the architecture setting on the command line.
I don't think that the combination of Windows and AArch32 _should_
unconditionally outlaw the use of the crypto extension. MSVC itself
doesn't think so: you can perfectly well compile Thumb crypto code
using its AArch32-targeted compiler.
All the other default CPUs in the same switch statement are
conditional on a particular MArch setting; this is the only one that
returns a particular CPU _regardless_ of MArch. So I've fixed this one
by adding a condition, so that if you ask for an architecture *above*
v7, the default of Cortex-A9 no longer overrides it.
Reviewed By: mstorsjo
Differential Revision: https://reviews.llvm.org/D100937
This patch adds incrementally-better support for SPLAT_VECTOR in a
handful of vector combines by changing a few more
isBuildVectorAllOnes/isBuildVectorAllZeros to the equivalent
isConstantSplatVectorAllOnes/Zeros calls.
Reviewed By: paulwalker-arm
Differential Revision: https://reviews.llvm.org/D100851
This patch fixes a case missed out by D100574, in which RVV scalable
stack offset computations may require three live registers in the case
where the offset's fixed component is 12 bits or larger and has a
scalable component.
Instead of adding an additional emergency spill slot, this patch further
optimizes the scalable stack offset computation sequences to reduce
register usage.
By emitting the sequence to compute the scalable component before the
fixed component, we can free up one scratch register to be reallocated
by the sequence for the fixed component. Doing this saves one register
and thus one additional emergency spill slot.
Compare:
$x5 = LUI 1
$x1 = ADDIW killed $x5, -1896
$x1 = ADD $x2, killed $x1
$x5 = PseudoReadVLENB
$x6 = ADDI $x0, 50
$x5 = MUL killed $x5, killed $x6
$x1 = ADD killed $x1, killed $x5
versus:
$x5 = PseudoReadVLENB
$x1 = ADDI $x0, 50
$x5 = MUL killed $x5, killed $x1
$x1 = LUI 1
$x1 = ADDIW killed $x1, -1896
$x1 = ADD $x2, killed $x1
$x1 = ADD killed $x1, killed $x5
Reviewed By: HsiangKai
Differential Revision: https://reviews.llvm.org/D100847
When llvm-rc invokes clang for preprocessing, it uses a target
triple derived from the default target. The test verifies that
e.g. _WIN32 is defined when preprocessing.
If running clang with e.g. -target ppc64le-windows-msvc, that
particular arch/OS combination isn't hooked up, so _WIN32 doesn't
get defined in that configuration. Therefore, the preprocessing
test fails.
Instead make llvm-rc inspect the architecture of the default target.
If it's one of the known supported architectures, use it as such,
otherwise set a default one (x86_64). (Clang can run preprocessing
with an x86_64 target triple, even if the x86 backend isn't
enabled.)
Also remove superfluous llvm:: specifications on enums in llvm-rc.cpp.
Allow opting out from preprocessing with a command line argument.
Update tests to pass -no-preprocess to make it not try to use clang
(which isn't a build level dependency of llvm-rc), but add a test that
does preprocessing under clang/test/Preprocessor.
Update a few options to allow them both joined (as -DFOO) and separate
(-D BR), as rc.exe allows both forms of them.
With the verbose flag set, this prints the preprocessing command
used (which differs from what rc.exe does).
Tests under llvm/test/tools/llvm-rc only test constructing the
preprocessor commands, while tests under clang/test/Preprocessor test
actually running the preprocessor.
Differential Revision: https://reviews.llvm.org/D100755
Don't use createBinary() but call the WindowsResource class directly.
The createBinary() function references all supported object file
types and ends up pulling way more from all the underlying libraries
than what is necessary.
This shrinks a stripped llvm-cvtres from 4.6 MB to 463 KB.
Differential Revision: https://reviews.llvm.org/D100833
We were missing some instruction costs when converting vectors of
floating point half types into integers, so I've added those here.
I also manually generated assembly code for each FP->int case and
looked at the number of instructions generated, which meant
adjusting some of the existing costs too.
I've updated an existing test to reflect the new costs:
Analysis/CostModel/AArch64/sve-fptoi.ll
Differential Revision: https://reviews.llvm.org/D99935
This reverts commit ea1a0d7c9ae3e5232a4163fc67efad4aabd51f2b.
While this is strictly more powerful, it is also strictly slower.
InstSimplify intentionally does not perform many folds that it
is allowed to perform, if doing so requires a KnownBits calculation
that will be repeated in InstCombine.
Maybe it's worthwhile to do this here, but that needs a more
explicitly stated motivation, evaluated in a review.
New registers FRM, FFLAGS and FCSR was defined. They represent
corresponding system registers. The new registers are necessary to
properly order floating point instructions in non-default modes.
Differential Revision: https://reviews.llvm.org/D99083
This is a follow-up to https://reviews.llvm.org/D100387.
std::vector is not the best storage container here. My local benchmark (counting
the number of instruction when compiling the sqlite3 amalgamation) yields the
following:
- std::vector<BitVector> -> 5,860,885,896
- SmallVector<BitWord, 0> -> 5,858,991,997
- SmallVector<BitWord> -> 5,817,679,224
Differential Revision: https://reviews.llvm.org/D100744
PHIElimination may insert copy instructions in multiple basic
blocks. Moving debug locations across basic block boundaries would be
misleading as illustrated by the test case.
rdar://75463656
Differential Revision: https://reviews.llvm.org/D100886
This is just a cleanup of the very high level stuff. I'm sure there is
more to update here but I'll leave that to others and/or a followup.
Differential Revision: https://reviews.llvm.org/D100888
FunctionAnalysisManagerCGSCCProxy should not be preserved if any of its
keys may be invalid. Since we are not removing/adding functions in
FuncAttrs, it's fine to preserve it.
Reviewed By: asbirlea
Differential Revision: https://reviews.llvm.org/D100893
Changing global flags can break builds of projects that include/build
llvm as a sub-project, as the effect is global. Ideally we would
disable this warning at the directory level instead, but the obvious
way (disabling warning D9025) isn't supported. At least we can limit
the effect to only MSVC.
Patch by Jim Radford.
Differential Revision: https://reviews.llvm.org/D100900
This reverts commit 13ec913bdf500e2354cc55bf29e2f5d99e0c709e.
This commit introduces new uses of the overflow checking intrinsics that
depend on implementations in compiler-rt, which Windows users generally
do not link against. I filed an issue (somewhere) to make clang
auto-link the builtins library to resolve this situation, but until that
happens, it isn't reasonable for the optimizer to introduce new link
time dependencies.
It used to be that all of our intrinsics were call instructions, but over time, we've added more and more invokable intrinsics. According to the verifier, we're up to 8 right now. As IntrinsicInst is a sub-class of CallInst, this puts us in an awkward spot where the idiomatic means to check for intrinsic has a false negative if the intrinsic is invoked.
This change switches IntrinsicInst from being a sub-class of CallInst to being a subclass of CallBase. This allows invoked intrinsics to be instances of IntrinsicInst, at the cost of requiring a few more casts to CallInst in places where the intrinsic really is known to be a call, not an invoke.
After this lands and has baked for a couple days, planned cleanups:
Make GCStatepointInst a IntrinsicInst subclass.
Merge intrinsic handling in InstCombine and use idiomatic visitIntrinsicInst entry point for InstVisitor.
Do the same in SelectionDAG.
Do the same in FastISEL.
Differential Revision: https://reviews.llvm.org/D99976
This is a more convoluted form of the same pattern "sext of NSW trunc",
but in this case the operand of trunc was a right-shift,
and the truncation chops off just the zero bits that were shifted-in.
We already special-cased a few interesting patterns,
but that is strictly less powerful than using KnownBits.
So instead get the known bits for the operand of `and`,
and iff all the unset bits of the `and`-mask are known to be zeros
in the operand, we can omit said `and`.
Introduced the cost of thre reverse shuffles for AArch64, currently just
copied the costs for PermuteSingleSrc.
Differential Revision: https://reviews.llvm.org/D100871
I'd reverted this in commit 3b6acb179708ea2f3caf95ace0f134fcbc460333 due to buildbot failures. This patch contains the fix for said issue. I'd forgotten to handle the case where two phis in the same block have different operand order. We canonicalize away from this, but it's still valid IR. The tests included in this change (as opposed to simply having test output changed), crashed without the fix.
Original commit message follows...
This extends the phi handling in isKnownNonEqual with a special case based on invertible recurrences. If we can prove the recurrence is invertible (which many common ones are), we can recurse through the start operands of the recurrence skipping the phi cycle.
(Side note: Instcombine currently does not push back through these cases. I will implement that in a follow up change w/separate review.)
Differential Revision: https://reviews.llvm.org/D99912
af7925b4dd65 added a custom DAG combine for recognizing fp-to-ints of
extract_subvectors that could be lowered to f64x2.convert_low_i32x4_{s,u}
instructions. This commit extends the combines to recognize equivalent
extract_subvectors of fp-to-ints as well.
Differential Revision: https://reviews.llvm.org/D100790