For the following case:
t8: i32 = or t7, t4
t10: i32 = ORRWrs t8, t8, TargetConstant:i32<73>
Current code wrongly returns (t8 >> shiftConstant) as the
UsefulBits of t8, which in fact is (t8 | (t8 >> shiftConstant)).
Reviewed by: sdesmalen, mdchen
Differential Revision: https://reviews.llvm.org/D102759
- Background here is that that these sets of tests are "invalid" to be run on z/OS
- The reason is because these test constructs that HLASM never supports (HLASM doesn't support GNU style directives)
- Usually tests are geared towards a particular target via the use of a triple that targets just that platform, but these tests require the use of a "default triple"
- Thus, we mark these tests as "UNSUPPORTED" for z/OS since we don't want to run these for z/OS
Reviewed By: yusra.syeda, abhina.sreeskantharajan
Differential Revision: https://reviews.llvm.org/D105204
This follows up patches for the unsigned siblings:
0c400e895306
c7b658aeb526
We are translating an offset signed compare to its
unsigned equivalent when one end of the range is
at the limit (zero or unsigned max).
(X + C2) >s C --> X <u (SMAX - C) (if C == C2 - 1)
(X + C2) <s C --> X >u (C ^ SMAX) (if C == C2)
This probably does not show up much in IR derived
from C/C++ source because that would likely have
'nsw', and we have folds for that already.
As with the previous unsigned transforms, the folds
could be generalized to handle non-constant patterns:
https://alive2.llvm.org/ce/z/Y8Xrrm
; sgt
define i1 @src(i8 %a, i8 %c) {
%c2 = add i8 %c, 1
%t = add i8 %a, %c2
%ov = icmp sgt i8 %t, %c
ret i1 %ov
}
define i1 @tgt(i8 %a, i8 %c) {
%c_off = sub i8 127, %c ; SMAX
%ov = icmp ult i8 %a, %c_off
ret i1 %ov
}
https://alive2.llvm.org/ce/z/c8uhnk
; slt
define i1 @src(i8 %a, i8 %c) {
%t = add i8 %a, %c
%ov = icmp slt i8 %t, %c
ret i1 %ov
}
define i1 @tgt(i8 %a, i8 %c) {
%c_offnot = xor i8 %c, 127 ; SMAX
%ov = icmp ugt i8 %a, %c_offnot
ret i1 %ov
}
This patch adds a new ShuffleKind SK_Splice and then handle the cost in
getShuffleCost, as in experimental.vector.reverse.
Differential Revision: https://reviews.llvm.org/D104630
This patch fixes PR50823.
The shuffle mask should be twisted twice before gotten the correct one due to the difference between inner HOP and outer.
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D104903
We already have a fold for variable index with constant vector,
but if we can determine a scalar splat value, then it does not
matter whether that value is constant or not.
We overlooked this fold in D102404 and earlier patches,
but the fixed vector variant is shown in:
https://llvm.org/PR50817
Alive2 agrees on that:
https://alive2.llvm.org/ce/z/HpijPC
The same logic applies to scalable vectors.
Differential Revision: https://reviews.llvm.org/D104867
The function vectorizeChainsInBlock does not support scalable vector,
because function like canReuseExtract and isCommutative in the code
path assert with scalable vectors.
This patch avoids vectorizing blocks that have extract instructions with scalable
vector..
Differential Revision: https://reviews.llvm.org/D104809
Improve codegen when lowering the common vector shuffle case from the
vectorizer (op1[last]:op2[0:last-1]). This patch only handles this
common case as it is difficult to handle this more generally when using
fixed length vectors, due to being unable to use the SVE ext instruction.
Differential Revision: https://reviews.llvm.org/D105289
Loads of <4 x i8> vectors were modeled as extremely expensive. And while we
don't have a load instruction that supports this, it isn't that expensive to
create a vector of i8 elements. The codegen for this was fixed/optimised in
D105110. This now tweaks the cost model and enables SLP vectorisation of my
motivating case loadi8.ll.
Differential Revision: https://reviews.llvm.org/D103629
This patch fixes an issue which occurred in CodeGenPrepare and
HWAddressSanitizer, which both at some point create a map of Old->New
instructions and update dbg.value uses of these. They did this by
iterating over the dbg.value's location operands, and if an instance of
the old instruction was found, replaceVariableLocationOp would be
called on that dbg.value. This would cause an error if the same operand
appeared multiple times as a location operand, as the first call to
replaceVariableLocationOp would update all uses of the old instruction,
invalidating the old iterator and eventually hitting an assertion.
This has been fixed by no longer iterating over the dbg.value's location
operands directly, but by first collecting them into a set and then
iterating over that, ensuring that we never attempt to replace a
duplicated operand multiple times.
Differential Revision: https://reviews.llvm.org/D105129
Added support to check if architecture supports s_mulhi which is used as part of
the decision whether or not to use valu 24 bit mul (if the mulhi gets
transformed to a valu op anyway, then may as well use it).
This is an extension of the work in D97063
Differential Revision: https://reviews.llvm.org/D103321
Change-Id: I80b1323de640a52623d69ac005a97d06a5d42a14
FeatureBitset is 4 64-bit values in an array. It's better passed by
reference rather than copying it.
I may be adding FeatureBitset as an argument to another function
and noticed this while working on that.
Summary: The patch adds the StringTable dumping to
llvm-readobj. Currently only XCOFF is supported.
Reviewed By: jhenderson
Differential Revision: https://reviews.llvm.org/D104613
This API is not compatible with opaque pointers, the method
accepting an explicit pointer element type should be used instead.
Thankfully there were few in-tree users. The BPF case still ends
up using the pointer element type for now and needs something like
D105407 to avoid doing so.
Same as other CreateLoad-style APIs, these need an explicit type
argument to support opaque pointers.
Differential Revision: https://reviews.llvm.org/D105395
Compiling LLVM with Clang modules and libc++ identified that
`Support/Printable.h` and `ADL/SmallVector.h` were using features that
live in these headers.
Differential Revision: https://reviews.llvm.org/D105402
Specifically the CreateMaskedStore and CreateMaskedScatter APIs.
The CreateMaskedLoad and CreateMaskedGather APIs will need an
additional type argument.
This replaces the current ad-hoc implementation,
by syncing the code from InstCombine's implementation in `InstCombinerImpl::visitUnreachableInst()`,
with one exception that here in SimplifyCFG we are allowed to remove EH instructions.
Effectively, this now allows SimplifyCFG to remove calls (iff they won't throw and will return),
arithmetic/logic operations, etc.
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D105374
This adds simple patterns for signed and unsigned saturating extract
narrow instructions. They combine a min/max/truncate into a single
instruction, providing that the immediates on the min/max are correct
for the saturation type. This is just handled in tablegen with some
extra patterns.
v2i64->v2i32 is not handled here as the min/max nodes are not legal,
making the lowering quite different.
Differential Revision: https://reviews.llvm.org/D103263
Allocate non-volatile registers in order to be compatible with ABI, regarding gpr_save.
Quoted from https://www.ibm.com/docs/en/ssw_aix_72/assembler/assembler_pdf.pdf page55,
> The preferred method of using GPRs is to use the volatile registers first. Next, use the nonvolatile registers
> in descending order, starting with GPR31.
This patch is based on @jsji 's initial draft.
Tested on test-suite and SPEC, found no degradation.
Reviewed By: jsji, ZarkoCA, xingxue
Differential Revision: https://reviews.llvm.org/D100167
D74751 added `ClearDSOLocalOnDeclarations` and dropped dso_local for
isDeclarationForLinker `GlobalValue`s. It missed a case for imported
declarations (`doImportAsDefinition` is false while `isPerformingImport` is
true). This can lead to a linker error for a default visibility symbol in
`ld.lld -shared`.
When `ClearDSOLocalOnDeclarations` is true, we check
`isPerformingImport() && !doImportAsDefinition(&GV)` along with
`GV.isDeclarationForLinker()`. The new condition checks an imported declaration.
This patch fixes a `LLVMPolly.so` link error using a trunk clang -DLLVM_ENABLE_LTO=Thin.
Reviewed By: tejohnson
Differential Revision: https://reviews.llvm.org/D104986
This reverts commit 8cd35ad854ab4458fd509447359066ea3578b494.
It breaks `TestMembersAndLocalsWithSameName.py` on GreenDragon and
Mikael Holmén points out in D104827 that bitcode files created with the
patch cannot be parsed with binaries built before it.
We're trying to match a few pointer computation patterns here for
re-association opportunities.
1) Isolating a constant operand to be on the RHS, e.g.:
G_PTR_ADD(BASE, G_ADD(X, C)) -> G_PTR_ADD(G_PTR_ADD(BASE, X), C)
2) Folding two constants in each sub-tree as long as such folding
doesn't break a legal addressing mode.
G_PTR_ADD(G_PTR_ADD(BASE, C1), C2) -> G_PTR_ADD(BASE, C1+C2)
AArch64 code size improvements on CTMark with -Os:
Program before after diff
pairlocalalign 251048 251044 -0.0%
consumer-typeset 421820 421812 -0.0%
kc 431348 431320 -0.0%
SPASS 413404 413300 -0.0%
clamscan 384396 384220 -0.0%
tramp3d-v4 370640 370412 -0.1%
lencod 432096 431772 -0.1%
bullet 479400 478796 -0.1%
sqlite3 288504 288072 -0.1%
7zip-benchmark 573796 570768 -0.5%
Geomean difference -0.1%
Differential Revision: https://reviews.llvm.org/D105069
Mimics similar change for InstCombine:
ce192ced2b901be67444c481ab5ca0d731e6d982 / D104602
All these uses are in blocks that aren't reachable from function's entry,
and said blocks are removed by SimplifyCFG itself,
so we can't really test this change.
This tries to bail out if the PHI is in a `catchswitch` BB in
InstCombine. A PHI cannot be combined into a non-PHI instruction if it
is in a `catchswitch` BB, because `catchswitch` BB cannot have any
non-PHI instruction other than `catchswitch` itself.
The given test case started crashing after D98058.
Reviewed By: lebedev.ri, rnk
Differential Revision: https://reviews.llvm.org/D105309
Somewhat related to D105338.
While it is up for discussion whether or not volatile store traps,
so far there has been no complaints that volatile load/cmpxchg/atomicrmw also may trap.
And even if simplifycfg currently concervatively believes that to be the case,
instcombine does not: https://godbolt.org/z/5vhv4K5b8
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D105343
In MASM, the ifdef family of directives treats its argument literally, without expanding it as a text macro. Add support for this, and also replace the special handling that was previously used for echo.
Reviewed By: thakis
Differential Revision: https://reviews.llvm.org/D104196