This change replaces the InitialLength of pub-tables with Format and
Length. All the InitialLength fields have been removed.
Reviewed By: jhenderson
Differential Revision: https://reviews.llvm.org/D85880
In DAGTypeLegalizer::GenWidenVectorStores the algorithm assumes it only
ever deals with fixed width types, hence the offsets for each individual
store never take 'vscale' into account. I've changed the main loop in
that function to use TypeSize instead of unsigned for tracking the
remaining store amount and offset increment. In addition, I've changed
the loop to use the new IncrementPointer helper function for updating
the addresses in each iteration, since this handles scalable vector
types.
Whilst fixing this function I also fixed a minor issue in
IncrementPointer whereby we were not adding the no-unsigned-wrap flag
for the add instruction in the same way as the fixed width case does.
Also, I've added a report_fatal_error in GenWidenVectorTruncStores,
since this code currently uses a sequence of element-by-element scalar
stores.
I've added new tests in
CodeGen/AArch64/sve-intrinsics-stores.ll
CodeGen/AArch64/sve-st1-addressing-mode-reg-imm.ll
for the changes in GenWidenVectorStores.
Differential Revision: https://reviews.llvm.org/D84937
In narrowExtractedVectorLoad there is an optimisation that tries to
combine extract_subvector with a narrowing vector load. At the moment
this produces warnings due to the incorrect calls to
getVectorNumElements() for scalable vector types. I've got this
working for scalable vectors too when the extract subvector index
is a multiple of the minimum number of elements. I have added a
new variant of the function:
MachineFunction::getMachineMemOperand
that copies an existing MachineMemOperand, but replaces the pointer
info with a null version since we cannot currently represent scaled
offsets.
I've added a new test for this particular case in:
CodeGen/AArch64/sve-extract-subvector.ll
Differential Revision: https://reviews.llvm.org/D83950
Currently only two test failures remain on Sparc, both
`sparcv9-sun-solaris2.11` and `sparc64-unknown-linux-gnu`:
LLVM :: DebugInfo/Generic/debug-label-inline.ll
LLVM :: Linker/subprogram-linkonce-weak.ll
They seem related in that debug info isn't generated for instruction
bundles (like `retl+add` in the delay slot).
I've filed separate bugs for both files (Bug 47129 and 47131), though it's
probably the same issue.
This patch `XFAIL`s the tests.
Tested on `sparcv9-sun-solaris2.11` and `amd64-pc-solaris2.11`.
Differential Revision: https://reviews.llvm.org/D85827
This reverts commit e441b7a7a0a72c28daf5a8e594559c667e5b4534.
This patch causes a compile error in tensorflow opensource project. The stack trace looks like:
Point of crash:
llvm/include/llvm/Analysis/LoopInfoImpl.h : line 35
(gdb) ptype *this
type = const class llvm::LoopBase<llvm::BasicBlock, llvm::Loop> [with BlockT = llvm::BasicBlock, LoopT = llvm::Loop]
(gdb) p *this
$1 = {ParentLoop = 0x0, SubLoops = std::vector of length 0, capacity 0, Blocks = std::vector of length 0, capacity 1,
DenseBlockSet = {<llvm::SmallPtrSetImpl<llvm::BasicBlock const*>> = {<llvm::SmallPtrSetImplBase> = {<llvm::DebugEpochBase> = {Epoch = 3}, SmallArray = 0x1b2bf6c8, CurArray = 0x1b2bf6c8,
CurArraySize = 8, NumNonEmpty = 0, NumTombstones = 0}, <No data fields>}, SmallStorage = {0xfffffffffffffffe, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}}, IsInvalid = true}
(gdb) p *this->DenseBlockSet->CurArray
$2 = (const void *) 0xfffffffffffffffe
I will try to get a case from tensorflow or use creduce to get a small case.
The method optionMatches() constructs 9865 std::string instances when comparing different
options. Many of these instances exceed the size of the internal storage and force memory
allocations. This patch adds an early exit check that eliminates most of the string allocations
while keeping the code simple.
Example inputs:
Prefix: /, Name: Fr
Prefix: -, Name: Fr
Prefix: -, Name: fsanitize-address-field-padding=
Prefix: -, Name: fsanitize-address-globals-dead-stripping
Prefix: -, Name: fsanitize-address-poison-custom-array-cookie
Prefix: -, Name: fsanitize-address-use-after-scope
Prefix: -, Name: fsanitize-address-use-odr-indicator
Prefix: -, Name: fsanitize-blacklist=
Differential Revision: D85538
From the code after the 'break', they are processing 64bit scalar and
vector bitcast. So I think the break-condition should be (cond1 || cond2)
This means we only execute following code if (64bit and dest-is-vector).
Also remove a previous fix which is not needed with this new fix.
(introduced in: 1349a04ef5f594dda705ec80474dda4837f26dba)
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D85804
This patch implements the builtins for the vector shifts (shl, srl, sra), and
adds the appropriate test cases for these builtins. The builtins utilize the
vector shift instructions introduced within ISA 3.1.
Differential Revision: https://reviews.llvm.org/D83338
This is a retry of rL300977 which was reverted because of infinite loops.
We have fixed all of the known places where that would happen, but there's
still a chance that this patch will cause infinite loops.
This matches the demanded bits behavior in the DAG and should fix:
https://bugs.llvm.org/show_bug.cgi?id=32706
Differential Revision: https://reviews.llvm.org/D32255
These are not correctness issues.
In visitUDivOperand(), if the (potential) divisor is undef, then udiv is
already UB, so it is not incorrect to keep undef as shift amount.
But, that is suboptimal.
We could instead simply drop that select, picking the other operand.
Afterwards, getLogBase2() could assert that there is no undef in divisor.
While x*undef is undef, shift-by-undef is poison,
which we must avoid introducing.
Also log2(iN undef) is *NOT* iN undef, because log2(iN undef) u< N.
See https://bugs.llvm.org/show_bug.cgi?id=47133
Testing is performed when targeting 128, 256 and 512-bit wide vectors.
For 128-bit vectors, the original behavior of using NEON instructions is
preserved.
Differential Revision: https://reviews.llvm.org/D85479
This is mostly a straight port from SelectionDAG. We re-use the actual bit-test
analysis part from SwitchLoweringUtils, which was factored out earlier to
support jump-tables.
Differential Revision: https://reviews.llvm.org/D85233
This recommits the following patches now that D85684 has landed
1cf6f210a2e [IR] Disable select ? C : undef -> C fold in ConstantFoldSelectInstruction unless we know C isn't poison.
469da663f2d [InstSimplify] Re-enable select ?, undef, X -> X transform when X is provably not poison
122b0640fc9 [InstSimplify] Don't fold vectors of partial undef in SimplifySelectInst if the non-undef element value might produce poison
ac0af12ed2f [InstSimplify] Add test cases for opportunities to fold select ?, X, undef -> X when we can prove X isn't poison
9b1e95329af [InstSimplify] Remove select ?, undef, X -> X and select ?, X, undef -> X transforms
Similar to the Two op + select patterns that were added recently, this
adds some patterns for select + fma to turn them into predicated
operations.
Differential Revision: https://reviews.llvm.org/D85824
Pull out element equivalence code from isShuffleEquivalent/isTargetShuffleEquivalent, I've also removed many of the index modulos where possible.
First step toward simply adding some additional equivalence tests.
We need to produce a setcc instruction which has an 8-bit result.
This gets rid of a bunch of cases that were using the s1->s8/s16/s32/s64
handling in selectZExt.
I'm not very familiar with GlobalISel yet so I'm not yet sure
the best way to do things. I'd especially like feedback on the
best way to handle the currently split 32-bit and 64-bit mode
handling.
Differential Revision: https://reviews.llvm.org/D85814
Commit 9385aaa84851 ("[sancov] Fix PR33732") added zeroext to
__sanitizer_cov_trace(_const)?_cmp[1248] parameters for x86_64 only,
however, it is useful on other targets, in particular, on SystemZ: it
fixes swap-cmp.test.
Therefore, use it on all targets. This is safe: if target ABI does not
require zero extension for a particular parameter, zeroext is simply
ignored. A similar change has been implemeted as part of commit
3bc439bdff8b ("[MSan] Add instrumentation for SystemZ"), and there were
no problems with it.
Reviewed By: morehouse
Differential Revision: https://reviews.llvm.org/D85689
Widen the scope of memory operations that are allowed to be tail predicated
to include gathers and scatters, such that loops that are auto-vectorized
with the option -enable-arm-maskedgatscat (and actually end up containing
an MVE gather or scatter) can be tail predicated.
Differential Revision: https://reviews.llvm.org/D85138
This patch makes the 'AddrSize' field optional. If the address size is
missing, yaml2obj will infer it from the object file.
Reviewed By: jhenderson
Differential Revision: https://reviews.llvm.org/D85805
On z/OS, the information is stored in the Common System Data Area
(CSD). It is the number of CPs allocated to the current LPAR.
Reviewers: aganea, hubert.reinterpertcast, MaskRay
Reviewed By: hubert.reinterpertcast
Differential Revision: https://reviews.llvm.org/D85531
When TTI was updated to use an explicit cost, TCK_CodeSize was used
although the default implicit cost would have been the hand-wavey
cost of size and latency. So, revert back to this behaviour. This is
not expected to have (much) impact on targets since most (all?) of
them return the same value for SizeAndLatency and CodeSize.
When optimising for size, the logic has been changed to query
CodeSize costs instead of SizeAndLatency.
This patch also adds a testing option in the unroller so that
OptSize thresholds can be specified.
Differential Revision: https://reviews.llvm.org/D85723