The first source has the same EEW as the destination, but we're
using earlyclobber which prevents them from ever being the same
register.
To workaround this, add a special TIED pseudo to use whenever the
first source and merge operand are the same value. This allows
us to use a single operand for the merge operand and first source
which we can then tie to the destination. A tied source disables
earlyclobber for that operand.
Reviewed By: arcbbb
Differential Revision: https://reviews.llvm.org/D103211
This patch uses the `getSymbolIndexForFunctionAddress` helper function to print function names for BB address map entries.
Reviewed By: jhenderson
Differential Revision: https://reviews.llvm.org/D102900
These tests will show how (and r i) will be optimized to
(BCLRI (BCLRI r, i0), i1) or (BCLRI (ANDI r, i0), i1) by future
commits.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D103359
Clang writes object files by first writing to a .tmp file and then
renaming to the final .obj name. On Windows, if a compile is killed
partway through the .tmp files don't get deleted.
Currently it seems like RemoveFileOnSignal takes care of deleting the
tmp files on Linux, but on Windows we need to call
setDeleteDisposition on tmp files so that they are deleted when
closed.
This patch switches to using TempFile to create the .tmp files we write
when creating object files, since it uses setDeleteDisposition on Windows.
This change applies to both Linux and Windows for consistency.
Differential Revision: https://reviews.llvm.org/D102876
Some existing places use getPointerElementType() to create a copy of a
pointer type with some new address space.
Reviewed By: dblaikie
Differential Revision: https://reviews.llvm.org/D103429
We can look through invariant group intrinsics for the purposes of
simplifying the result of a load.
Since intrinsics can't be constants, but we also don't want to
completely rewrite load constant folding, we convert the load operand to
a constant. For GEPs and bitcasts we just treat them as constants. For
invariant group intrinsics, we treat them as a bitcast.
Reviewed By: lebedev.ri
Differential Revision: https://reviews.llvm.org/D101103
It's still in use in a few places so we can't delete it yet but there's not
many at this point.
Differential Revision: https://reviews.llvm.org/D103352
This gives a nice message about the location of errors in a large
tablegen file, which is much more useful for users
Differential Revision: https://reviews.llvm.org/D102740
WindowsSupport.h is a public header, however if it gets included, will cause a compile error indicating that llvm/Config/config.h cannot be found, because config.h is a private header. However there is no actual dependency on the private things in this header, so it can be changed to the public config header.
Reviewed By: amccarth
Differential Revision: https://reviews.llvm.org/D103370
- A lot of lit tests simply specify the arch minus the triple. On z/OS, this could result in a scenario of some-other-triple-unknown-ibm-zos. This points to an incorrect triple + arch combo.
- To prevent this, isOSzOS change is switched in favour of isOSBinFormatGOFF.
- This is because, the GOFF format is set only if the triple is systemz and if the operating system is GOFF. And currently, there are no other architectures/os's using the GOFF file format.
- An argument could be made that the problematic tests be fixed to explicitly specify the arch-vendor-triple string, but there's a large number of these tests, and adding this stricter scope ensures that we aren't instantiating the incorrect instance of the AsmParser for other platforms when run on z/OS.
Reviewed By: uweigand
Differential Revision: https://reviews.llvm.org/D103343
As the existing test unreachable.ll shows, we should be doing more
work to avoid entering unreachable blocks: we should not stop
vectorization just because a PHI incoming value from an unreachable
block cannot be vectorized. We know that particular value will never
be used so we can just replace it with poison.
This patch transforms the sequence
lea (reg1, reg2), reg3
sub reg3, reg4
to two sub instructions
sub reg1, reg4
sub reg2, reg4
Similar optimization can also be applied to LEA/ADD sequence.
The modifications to TwoAddressInstructionPass is to ensure the operands of ADD
instruction has expected order (the dest register of LEA should be src register of ADD).
Differential Revision: https://reviews.llvm.org/D101970
When we're remapping an AddRec, the AddRec constructed by a partial
rewrite might not make sense. This triggers an assertion complaining
it's not loop-invariant.
Instead of constructing the partially rewritten AddRec, just skip
straight to calling evaluateAtIteration.
Testcase was automatically reduced using llvm-reduce, so it's a little
messy, but hopefully makes sense.
Differential Revision: https://reviews.llvm.org/D102959
As suggested in https://bugs.llvm.org/show_bug.cgi?id=50527, this
moves the DenseMapInfo for APInt and APSInt into the respective
headers, removing the need to include APInt.h and APSInt.h from
DenseMapInfo.h.
We could probably do the same from StringRef and ArrayRef as well.
Differential Revision: https://reviews.llvm.org/D103422
This guarantees they meet this overlap exception:
"The destination EEW is smaller than the source EEW and the overlap
is in the lowest-numbered part of the source register group"
Being a single register guarantees the overlap is always in the
lowerst-number part of the group.
Reviewed By: frasercrmck, khchen
Differential Revision: https://reviews.llvm.org/D103351
Compares are considered a narrowing operation for register overlap.
I believe for LMUL<=1 they meet this exception to allow overlap
"The destination EEW is smaller than the source EEW and the overlap is in the
lowest-numbered part of the source register group"
Both the result and the sources will occupy a single register for
LMUL<=1 so the overlap would always be in the "lowest-numbered part".
Reviewed By: frasercrmck, HsiangKai
Differential Revision: https://reviews.llvm.org/D103336
D101841 added this test. It appears to generate different outcome on different platforms.
Make it to only call -coro-split instead of entire O2 pipeline to simplify the test flow.
Hope this will make the test more robust.
Reviewed By: djtodoro
Differential Revision: https://reviews.llvm.org/D103418
Implemented better scheme for perfect/shuffled matches of the gather
nodes which allows to fix the performance regressions introduced by
earlier patches. Starting detecting matches for broadcast nodes and
extractelement gathering.
Differential Revision: https://reviews.llvm.org/D102920
This change adds tests specifically for --parent-recurse-depth, --quiet
and -o. The test for -o found a typo in an error message which is also
fixed in this change.
Differential Revision: https://reviews.llvm.org/D103250
InstCombine didn't perform the transformations when fmul's operands were
the same instruction because it required to have one use for each of them
which is false in the case. This patch fixes this + adds tests for them
and introduces a new function isOnlyUserOfAnyOperand to check these cases
in a single place.
This patch is a result of discussion in D102574.
Differential Revision: https://reviews.llvm.org/D102698
The current loop or any of its sub-loops may be infinite. Unless the
function or the loops are marked as mustprogress, this in itself makes
the loop *not* dead.
This patch moves the logic to check whether the current loop is finite
or mustprogress to `isLoopDead` and also extends it to check the
sub-loops. This should fix PR50511.
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D103382
If the index itself is already poison, the poison propagates through
instructions clamping the index to a valid range. This still causes
introducing a load of poison, as flagged by Alive2 and pointed out
at 575e2aff5574.
This patch updates the code to freeze the index, unless it is proven to
not be poison.
Reviewed By: nlopes
Differential Revision: https://reviews.llvm.org/D103378
This patch extends the RISC-V lowering of the 'fastcc' calling
convention to vector types, both fixed-length and scalable. Without this
patch, any function passing or returning vector types by value would
throw a compiler error.
Vectors are handled in 'fastcc' much as they are in the default calling
convention, the noticeable difference being the extended set of scalar
GPR registers that can be used to pass vectors indirectly.
Reviewed By: HsiangKai
Differential Revision: https://reviews.llvm.org/D102505
This patch adds TargetStackID::WasmLocal. This stack holds locations of
values that are only addressable by name -- not via a pointer to memory.
For the WebAssembly target, these objects are lowered to WebAssembly
local variables, which are managed by the WebAssembly run-time and are
not addressable by linear memory.
For the WebAssembly target IR indicates that an AllocaInst should be put
on TargetStackID::WasmLocal by putting it in the non-integral address
space WASM_ADDRESS_SPACE_WASM_VAR, with value 1. SROA will mostly lift
these allocations to SSA locals, but any alloca that reaches instruction
selection (usually in non-optimized builds) will be assigned the new
TargetStackID there. Loads and stores to those values are transformed
to new WebAssemblyISD::LOCAL_GET / WebAssemblyISD::LOCAL_SET nodes,
which then lower to the type-specific LOCAL_GET_I32 etc instructions via
tablegen patterns.
Differential Revision: https://reviews.llvm.org/D101140
Currently, X86 backend only has a global one-size-fits-all `FeatureFastVariableShuffle` feature,
which controls profitability of both the cross-lane and per-lane variable shuffles.
I guess, this has been fine so far.
But at least on AMD Zen 3, while per-line variable shuffles (e.g. `VPSHUFB`)
are as fast as as shuffles with fixed/immediate mask,
while lane-crossing shuffles, e.g. `VPERMPS` is performing worse.
So to get the benefits of variable-mask shuffles, but not the drawbacks of lane-crossing shuffles,
as suggested by @RKSimon, split the feature flag into two.
Differential Revision: https://reviews.llvm.org/D103274
The pipes.quote function quotes using single quotes, the same goes
for the newer shlex.quote (which is the preferred form in Python 3).
This isn't suitable for quoting in command lines on Windows (and the
documentation for shlex.quote even says it's only usable for Unix
shells).
In general, the python subprocess.list2cmdline function should do
proper quoting for the platform's current shell. However, it doesn't
quote the ';' char, which we pass within some arguments to run.py.
Therefore use the custom reimplementation from lit.TestRunner which
is amended to quote ';' too.
The fact that arguemnts were quoted with single quotes didn't matter
for command lines that were executed by either bash or the lit internal
shell, but if executing things directly using subprocess.call, as in
_supportsVerify, the quoted path to %{cxx} fails to be resolved by the
Windows shell.
This unlocks 114 tests that previously were skipped on Windows.
Differential Revision: https://reviews.llvm.org/D103310
The test CodeGen/PowerPC/vector-constrained-fp-intrinsics.ll checks code
generation for constrained floating point intrinsics. Many test cases in
it were implemented using operations on constants. Constant folding of
constrained intrinsics would make these test cases almost useless,
because they would check only constant loading.
To keep the tests useful, operations on constants were replaced with
operations on function parameters.
Differential Revision: https://reviews.llvm.org/D103259
This reverts commit 4f2fd3818b0eb26806f366bc37369349aeedcaf9.
The Linux kernel fails to build after this commit. See
https://reviews.llvm.org/D99481 for a reproducer.
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
The code gen for f32 to i32 bitcast is not currently the most efficient;
this patch removes some unneccessary instructions gerneated.
Differential revision: https://reviews.llvm.org/D100782