If we cannot otherwise use a VMOVimm/VMOVFPimm/VMVNimm, fall back to
producing a VDUP(const) as opposed to a constant pool load. This will at
least be smaller codesize and can allow the VDUP to be folded into other
instructions.
Differential Revision: https://reviews.llvm.org/D103808
- Add `-enable-ocl-mangling-mismatch-workaround` to work around the
mismatch on OCL name mangling so far.
Reviewed By: yaxunl, rampitec
Differential Revision: https://reviews.llvm.org/D103920
This patch https://reviews.llvm.org/D102876 caused some lit regressions on z/OS because tmp files were no longer being opened based on binary/text mode. This patch passes OpenFlags when creating tmp files so we can open files in different modes.
Reviewed By: amccarth
Differential Revision: https://reviews.llvm.org/D103806
G_INSERT legalization is incomplete and doesn't work very
well. Instead try to use sequences of G_MERGE_VALUES/G_UNMERGE_VALUES
padding with undef values (although this can get pretty large).
For the case of load/store narrowing, this is still performing the
load/stores in irregularly sized pieces. It might be cleaner to split
this down into equal sized pieces, and rely on load/store merging to
optimize it.
Introduce a new cl::opt to hide "cold" blocks from CFG DOT graphs.
Use BFI to get block relative frequency. Hide the block if the
frequency is below the threshold set by the command line option value.
Reviewed By: davidxl, hoy
Differential Revision: https://reviews.llvm.org/D103640
The comment states the following, for calculating the Line variable:
> Draw a line starting from when we only have 1k left and increasing
> linearly to double the current weight.
However, the value was not calculated as described. Instead, it would
result in a negative value, which resulted in the function always
returning 0 afterwards.
```
// Invariant: CurrentSize <= MaxSize - 200
// Invariant: CurrentWeight >= 0
int Line = (-2 * CurrentWeight) * (MaxSize - CurrentSize + 1000);
// {Line <= 0}
```
This commit fixes the issue and linearly interpolates as described.
Patch by Loris Reiff. Thanks!
Differential Revision: https://reviews.llvm.org/D96207
When narrowing G_ADD and G_SUB, handle types that aren't a multiple of
the type we're narrowing to. This allows us to handle types like s96
on 64 bit targets.
Note that the test here has a couple of dead instructions because of
the way the setup legalizes. I wasn't able to come up with a way to
write this test that avoids that easily.
Differential Revision: https://reviews.llvm.org/D97811
When narrowing G_INSERT, handle types that aren't a multiple of the
type we're narrowing to. This comes up if we're narrowing something
like an s96 to fit in 64 bit registers and also for non-byte multiple
packed types if they come up.
This implementation handles these cases by extending the extra bits to
the narrow size and truncating the result back to the destination
size.
Differential Revision: https://reviews.llvm.org/D97791
This reverts commit e772216e708937988c039420d2c559568f91ae27
(and fixup 7f6c878a2c035eb6325ab228d9bc2d257509d959).
The build is broken with gcc5 host compiler:
In file included from
from mlir/lib/Dialect/Utils/StructuredOpsUtils.cpp:9:
tools/mlir/include/mlir/IR/BuiltinAttributes.h.inc:424:57: error: type/value mismatch at argument 1 in template parameter list for 'template<class ItTy, class FuncTy, class FuncReturnTy> class llvm::mapped_iterator'
std::function<T(ptrdiff_t)>>;
^
tools/mlir/include/mlir/IR/BuiltinAttributes.h.inc:424:57: note: expected a type, got 'decltype (seq<ptrdiff_t>(0, 0))::const_iterator'
As suggested on rG937c4cffd024, use llvm_unreachable for unhandled integer types (which shouldn't be possible) instead of breaking and dropping down to the existing fatal error handler.
Helps silence static analyzer warnings.
The first source has the same EEW as the destination, but we're
using earlyclobber which prevents them from ever being the same
register. This patch attempts to work around this.
-For unmasked .wv, add a special TIED pseudo that pretends like
the first operand and the destination must be the same register. This
disables the earlyclobber for that source. Mark the instruction
as convertible to 3 address form which will switch it to the
original untied pseudo when the TwoAddressInstructionPass decides
that keeping them tied would require an extra copy. This uses
code in RISCVInstrInfo.cpp to do the conversion to the untied
opcode.
The untie test case show that we can generate the untied version.
Not sure it was profitable to do it in this case, but they have
really simple IR.
Reviewed By: arcbbb
Differential Revision: https://reviews.llvm.org/D103552
In 0.9 these were defined to leave elements other than 0 in the
destination unmodified. They were changed to use the tail policy
in 0.10. I missed that update.
I assume no one has noticed because in order cores treat tail
agnostic the same as tail undisturbed. I believe Spike and QEMU do
the same.
Reviewed By: arcbbb, frasercrmck
Differential Revision: https://reviews.llvm.org/D103736
In the interests of disabling misc-no-recursion across LLVM (this seems
like a stylistic choice that is not consistent with LLVM's
style/development approach) this NFC preliminary change adjusts all the
.clang-tidy files to inherit from their parents as much as possible.
This change specifically preserves all the quirks of the current configs
in order to make it easier to review as NFC.
I validatad the change is NFC as follows:
for X in `cat ../files.txt`;
do
mkdir -p ../tmp/$(dirname $X)
touch $(dirname $X)/blaikie.cpp
clang-tidy -dump-config $(dirname $X)/blaikie.cpp > ../tmp/$(dirname $X)/after
rm $(dirname $X)/blaikie.cpp
done
(similarly for the "before" state, without this patch applied)
for X in `cat ../files.txt`;
do
echo $X
diff \
../tmp/$(dirname $X)/before \
<(cat ../tmp/$(dirname $X)/after \
| sed -e "s/,readability-identifier-naming\(.*\),-readability-identifier-naming/\1/" \
| sed -e "s/,-llvm-include-order\(.*\),llvm-include-order/\1/" \
| sed -e "s/,-misc-no-recursion\(.*\),misc-no-recursion/\1/" \
| sed -e "s/,-clang-diagnostic-\*\(.*\),clang-diagnostic-\*/\1/")
done
(using sed to strip some add/remove pairs to reduce the diff and make it easier to read)
The resulting report is:
.clang-tidy
clang/.clang-tidy
2c2
< Checks: 'clang-diagnostic-*,clang-analyzer-*,-*,clang-diagnostic-*,llvm-*,misc-*,-misc-unused-parameters,-misc-non-private-member-variables-in-classes,-readability-identifier-naming,-misc-no-recursion'
---
> Checks: 'clang-diagnostic-*,clang-analyzer-*,-*,clang-diagnostic-*,llvm-*,misc-*,-misc-unused-parameters,-misc-non-private-member-variables-in-classes,-misc-no-recursion'
compiler-rt/.clang-tidy
2c2
< Checks: 'clang-diagnostic-*,clang-analyzer-*,-*,clang-diagnostic-*,llvm-*,-llvm-header-guard,misc-*,-misc-unused-parameters,-misc-non-private-member-variables-in-classes'
---
> Checks: 'clang-diagnostic-*,clang-analyzer-*,-*,clang-diagnostic-*,llvm-*,misc-*,-misc-unused-parameters,-misc-non-private-member-variables-in-classes,-llvm-header-guard'
flang/.clang-tidy
2c2
< Checks: 'clang-diagnostic-*,clang-analyzer-*,-*,llvm-*,-llvm-include-order,misc-*,-misc-no-recursion,-misc-unused-parameters,-misc-non-private-member-variables-in-classes'
---
> Checks: 'clang-diagnostic-*,clang-analyzer-*,-*,llvm-*,misc-*,-misc-unused-parameters,-misc-non-private-member-variables-in-classes,-llvm-include-order,-misc-no-recursion'
flang/include/flang/Lower/.clang-tidy
flang/include/flang/Optimizer/.clang-tidy
flang/lib/Lower/.clang-tidy
flang/lib/Optimizer/.clang-tidy
lld/.clang-tidy
lldb/.clang-tidy
llvm/tools/split-file/.clang-tidy
mlir/.clang-tidy
The `clang/.clang-tidy` change is a no-op, disabling an option that was never enabled.
The compiler-rt and flang changes are no-op reorderings of the same flags.
(side note, the .clang-tidy file in parallel-libs is broken and crashes
clang-tidy because it uses "lowerCase" as the style instead of "lower_case" -
so I'll deal with that separately)
Differential Revision: https://reviews.llvm.org/D103842
shuffle(concat(x,undef),concat(y,undef)) -> concat(shuffle(x,y),shuffle(x,y))
If the original shuffle references any of the upper (undef) subvector elements, ensure the split shuffle masks uses undef instead of an out-of-bounds value.
Fixes PR50609
error: definition of implicit copy constructor for 'LoopNest' is
deprecated because it has a user-declared copy assignment operator
[-Werror,-Wdeprecated-copy]
LoopNest &operator=(const LoopNest &) = delete;
Reviewed By: Meinersbur
Differential Revision: https://reviews.llvm.org/D103752
This patch simplifies the implementation of Sequence and makes it compatible with llvm::reverse.
It exposes the reverse iterators through rbegin/rend which prevents a dangling reference in std::reverse_iterator::operator++().
Differential Revision: https://reviews.llvm.org/D102679
> This reapplies c0f3dfb9, which was reverted following the discovery of
> crashes on linux kernel and chromium builds - these issues have since
> been fixed, allowing this patch to re-land.
This reverts commit 36ec97f76ac0d8be76fb16ac521f55126766267d.
The change caused non-determinism in the compiler, see comments on the code
review at https://reviews.llvm.org/D91722.
Reverting to unbreak people's builds until that can be addressed.
This also reverts the follow-up "[DebugInfo] Limit the number of values
that may be referenced by a dbg.value" in
a0bd6105d80698c53ceaa64bbe6e3b7e7bbf99ee.
`VPIntrinsic::getDeclarationForParams` creates a vp intrinsic
declaration for parameters you want to call it with. This is in
preparation of a new builder class that makes emitting vp intrinsic code
nearly as convenient as using a plain ir builder (aka `VectorBuilder`,
to be used by D99750).
Reviewed By: frasercrmck, craig.topper, vkmr
Differential Revision: https://reviews.llvm.org/D102686
This patch changes LoopUnrollAndJamPass from FunctionPass to LoopNest pass.
The next patch will utilize LoopNest to effectively handle loop nests.
Also, a crash problem on legacy pass manager is fixed.
Reviewed By: Whitney
Differential Revision: https://reviews.llvm.org/D99149
Fixes getTypeConversion to return `TypeScalarizeScalableVector` when a scalable vector
type cannot be legalized by widening/splitting. When this is the method of legalization
found, getTypeLegalizationCost will return an Invalid cost.
The getMemoryOpCost, getMaskedMemoryOpCost & getGatherScatterOpCost functions already call
getTypeLegalizationCost and will now also return an Invalid cost for unsupported types.
Reviewed By: sdesmalen, david-arm
Differential Revision: https://reviews.llvm.org/D102515
This patch allows that scalable vector can also use the fold that already
exists for fixed vector, only when the lane index is lower than the minimum
number of elements of the vector.
Differential Revision: https://reviews.llvm.org/D102404
Based off the worse case numbers generated by D103695, we were overestimating the cost of a number of vector truncations:
AVX2: v2i32->v2i8, v2i64->v2i16 + v4i64->v4i32
AVX1: v2i32->v2i8, v4i64->v4i16 + v16i16->v16i8
Once we have a working set of conversion costs, the intention is to cleanup the tables and use legalized types a lot more to reduce the number of entries we currently have.
If the `-enable-strict-reductions` flag is set to true, then currently we will
always choose to vectorize the loop with strict in-order reductions. This is
not necessary where we allow the reordering of FP operations, such as
when loop hints are passed via metadata.
This patch moves useOrderedReductions so that we can also check whether
loop hints allow reordering, in which case we should use the default
behaviour of vectorizing with unordered reductions.
Reviewed By: sdesmalen
Differential Revision: https://reviews.llvm.org/D103814
This sets the AllowTruncation flag on isConstOrConstSplat in
isNullOrNullSplat, allowing it to see truncated constant zeroes on
architectures such as AArch64, where only a i32.i64 are legal. As a
truncation of 0 is always 0, this should always be valid, allowing some
extra folding to happen including some of the cases from D103755.
Differential Revision: https://reviews.llvm.org/D103756