binop i1 (cmp Pred (ext X, Index0), C0), (cmp Pred (ext X, Index1), C1)
-->
vcmp = cmp Pred X, VecC
ext (binop vNi1 vcmp, (shuffle vcmp, Index1)), Index0
This is a larger pattern than the existing extractelement folds because we can't
reasonably vectorize the sub-patterns with constants based on cost model calcs
(it doesn't usually make sense to replace a single extracted scalar op with
constant operand with a vector op).
I salvaged as much of the existing logic as I could, but there might be better
ways to share and reduce code.
The motivating case from PR43745:
https://bugs.llvm.org/show_bug.cgi?id=43745
...is the special case of a 2-way reduction. We tried to get SLP to handle that
particular pattern in D59710, but that caused crashing and regressions.
This patch is more general, but hopefully safer.
The v2f64 test with SSE2 surprised me - the cost model accounting looks like this:
OldCost = 0 (free extract of f64 at index 0) + 1 (extract of f64 at index 1) + 2 (scalar fcmps) + 1 (and of bools) = 4
NewCost = 2 (vector fcmp) + 1 (shuffle) + 1 (vector 'and') + 1 (extract of bool) = 5
Differential Revision: https://reviews.llvm.org/D82474
Also fix an SSA violation in a test the MIRParser/verifier fails to
catch. It's illegal to define a subregister in SSA. For the purpose of
the test, it just needs to define the super-register to use the
subregister in the use operand.
Extracts the atomic pseudo-instructions' splitting from `riscv-expand-pseudo`
/ `RISCVExpandPseudo` into its own pass, `riscv-expand-atomic-pseudo` /
`RISCVExpandAtomicPseudo`. This allows for the expansion of atomic operations
to continue to happen late (the new pass is added in `addPreEmitPass2`, so
those expansions continue to happen in the same place), while the remaining
pseudo-instructions can now be expanded earlier and benefit from more
optimization passes. The nonatomics pass is now added in `addPreSched2`.
Differential Revision: https://reviews.llvm.org/D79635
Context:
--------
There are places in LLVM where we need to pack typed fields into opaque values.
For instance, the `XXXInst` classes in `llvm/include/llvm/IR/Instructions.h` that extract informations from `Value::SubclassData` via `getSubclassDataFromInstruction()`.
The bit twiddling is done manually: this impairs readability and prevent consistent handling of out of range values (e.g. 435b458ad0/llvm/include/llvm/IR/Instructions.h (L564))
More importantly, the bit pattern is scattered throughout the implementation making it hard to pack additionnal fields or check for overlapping bits.
Design decisions:
-----------------
The Bitfield structs are to be declared together so it is clear which bits are used or not.
The code is designed with simplicity in mind, hence a few limitations:
- Storage is limited to a single integer,
- Enum values have to be `unsigned`,
- Storage type has to be `unsigned`,
- There are no automatic detection of overlapping fields (packed bitfield declaration should help though),
- The interface is C like so `storage` needs to be passed in everytime (code is simpler and lifetime considerations more obvious)
RFC: http://lists.llvm.org/pipermail/llvm-dev/2020-June/142196.html
Differential Revision: https://reviews.llvm.org/D81580
This patch proposes a naming convention for operations that take
a general predicate (and are thus predicated) that specifies
what happens to the false lanes.
Currently the _PRED suffix is used, which doesn't really say much other
than that it takes a predicate. In some instances this means it has
merging predication and in other cases it means zeroing-predication.
This patch also changes the order of operands to
AArch64ISD::DUP_MERGE_PASSTHRU, to pass the predicate as the first
operand, which is in line with all other predicates nodes. It takes the
passthru value as an explicit passthru value, which is always passed as
the last operand.
Reviewers: paulwalker-arm, cameron.mcinally, eli.friedman, dancgr, efriedma
Reviewed By: paulwalker-arm
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D81850
Commit a945037e8fd0c30e250a62211469eea6765a36ae moved the printing of the
"PLEASE submit a bug report" message to the crash handler, but that means we
don't print it when forcing a crash using FORCE_CLANG_DIAGNOSTICS_CRASH. Fix
this by adding a function to get the bug report message and printing it when
forcing a crash.
Differential Revision: https://reviews.llvm.org/D81672
If a constant is only allsignbits in the demanded/active bits, then sign extend it to an allsignbits bool pattern for OR/XOR ops.
This also requires SimplifyDemandedBits XOR handling to be modified to call ShrinkDemandedConstant on any (non-NOT) XOR pattern to account for non-splat cases.
Next step towards fixing PR45808 - with this patch we now get a <-1,-1,0,0> v4i64 constant instead of <1,1,0,0>.
Differential Revision: https://reviews.llvm.org/D82257
Summary:
performPostLD1Combine will introduce either a LD1LANEpost
or LD1DUPpost node, which will cause selection failure if the
return type is a scalable vector.
Reviewers: sdesmalen, c-rhodes, efriedma
Reviewed By: efriedma
Subscribers: tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, danielkiss, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D82670
Pre-commit for D82257, this adds a DemandedElts arg to ShrinkDemandedConstant/targetShrinkDemandedConstant which will allow future patches to (optionally) add vector support.
SHT_GROUP sections contain a reference to a symbol indicating their
"signature" symbol. The symbol table containing this symbol is referred
to by the group section's sh_link field. If llvm-objcopy is instructed
to remove the symbol table, it will emit an error.
This fixes https://bugs.llvm.org/show_bug.cgi?id=46153.
Reviewed By: jhenderson, Higuoxing
Differential Revision: https://reviews.llvm.org/D82274
Before this patch, the diagnostic message is printed to `errs()` directly, which makes it difficult to use `FailedWithMessage()` in unit testing.
In this patch, we add a custom error handler for YAMLParser, which helps collect diagnostic messages and make it easy to use `FailedWithMessage()` to check error messages.
Reviewed By: jhenderson, MaskRay
Differential Revision: https://reviews.llvm.org/D82630
Summary:
If we ever assign co_await to a temporary variable, such as foo(co_await expr),
we generate AST that looks like this: MaterializedTemporaryExpr(CoawaitExpr(...)).
MaterializedTemporaryExpr would emit an intrinsics that marks the lifetime start of the
temporary storage. However such temporary storage will not be used until co_await is ready
to write the result. Marking the lifetime start way too early causes extra storage to be
put in the coroutine frame instead of the stack.
As you can see from https://godbolt.org/z/zVx_eB, the frame generated for get_big_object2 is 12K, which contains a big_object object unnecessarily.
After this patch, the frame size for get_big_object2 is now only 8K. There are still room for improvements, in particular, GCC has a 4K frame for this function. But that's a separate problem and not addressed in this patch.
The basic idea of this patch is during CoroSplit, look for every local variable in the coroutine created through AllocaInst, identify all the lifetime start/end markers and the use of the variables, and sink the lifetime.start maker to the places as close to the first-ever use as possible.
Reviewers: lewissbaker, modocache, junparser
Reviewed By: junparser
Subscribers: hiraditya, llvm-commits, rsmith, ChuanqiXu, cfe-commits
Tags: #clang, #llvm
Differential Revision: https://reviews.llvm.org/D82314
We already fold (v2i64 scalar_to_vector(aext)) -> (v2i64 bitcast(v4i32 scalar_to_vector(x))), this adds support for similar aextload cases and also handles v2f64 cases that wrap the i64 extension behind bitcasts.
Fixes the remaining issue with PR39016
Generalize the vector operand extraction code for shuffle/pack ops - we can assume that the vector operands are the same width as the result, and any non-vector values can be reused directly in the smaller width op.
Assemble/disassemble RISC-V V extension instructions according to
latest version spec in https://github.com/riscv/riscv-v-spec/.
I have tested this patch using GNU toolchain. The encoding is aligned
to GNU assembler output. In this patch, there is a test case for each
instruction at least.
The V register definition is just for assemble/disassemble. Its type
is not important in this stage. I think it will be reviewed and modified
as we want to do codegen for scalable vector types.
This patch does not include Zvamo, Zvlsseg, and Zvediv.
Differential revision: https://reviews.llvm.org/D69987