The vrgather.vv instruction uses a vector of indices with the same
SEW as operand 0. The vrgather.vx instructions use a scalar index
operand of XLen bits.
By splitting this into 2 intrinsics we are able to use LLVMatchType
in the definition to avoid specifying the type for the index operand
when creating the IR for the intrinsic. For .vv it will match the
operand 0 type. And for .vx it will match the type of the vl operand
we already needed to specify a type for.
I'm considering splitting more intrinsics. This was a somewhat
odd one because the .vx doesn't use the element type, it always
use XLen.
Reviewed By: HsiangKai
Differential Revision: https://reviews.llvm.org/D95979
loadRegFromStackSlot()/storeRegToStackSlot() can generate aligned access
instructions for stack slots even if the stack is unaligned, based on the
assumption that the stack can be realigned.
However, this doesn't work for fixed slots, which are e.g. used for
spilling XMM registers in a non-leaf function with
`__attribute__((preserve_all))`.
When compiling such code with `-mstack-alignment=8`, this causes general
protection faults.
Fix it by only considering stack realignment for non-fixed slots.
Note that this changes the output of three existing tests which spill AVX
registers, since AVX requires higher alignment than the ABI provides on
stack frame entry.
Reviewed By: rnk, jyknight
Differential Revision: https://reviews.llvm.org/D73126
As mentioned in TODO comment, casting double to float causes NaNs to change bits.
To avoid the change, this patch adds support for single-floating-point immediate value on MachineCode.
Patch by Yuta Saito.
Differential Revision: https://reviews.llvm.org/D77384
`-flto -gsplit-dwarf -g -O[123]` may create .debug_gnu_pubnames with 0 DIE
offset entries. llvm-dwarfdump -debug-gnu-pubnames/ld.lld --gdb-index errors for that.
```
.section .debug_gnu_pubnames,"",@progbits
.long .LpubNames_end2-.LpubNames_begin2 # Length of Public Names Info
.LpubNames_begin2:
.short 2 # DWARF Version
.long .Lcu_begin2 # Offset of Compilation Unit Info
.long 57 # Compilation Unit Length
.long 0 # DIE offset
.byte 16 # Attributes: TYPE, EXTERNAL
.asciz "absl" # External Name
.long 0 # DIE offset
.byte 16 # Attributes: TYPE, EXTERNAL
.asciz "absl::base_internal" # External Name
.long 0 # End Mark
```
If we know that the scalar epilogue is required to run, modify the CFG to end the middle block with an unconditional branch to scalar preheader. This is instead of a conditional branch to either the preheader or the exit block.
The motivation to do this is to support multiple exit blocks. Specifically, the current structure forces us to identify immediate dominators and *which* exit block to branch from in the middle terminator. For the multiple exit case - where we know require scalar will hold - these questions are ill formed.
This is the last change needed to support multiple exit loops, but since the diffs are already large enough, I'm going to land this, and then enable separately. You can think of this as being NFCI-ish prep work, but the changes are a bit too involved for me to feel comfortable tagging the change that way.
Differential Revision: https://reviews.llvm.org/D94892
These attributes were all incorrect or inappropriate for LLVM to infer:
- inaccessiblememonly is generally wrong; user replacement operator new
can access memory that's visible to the caller, as can a new_handler
function.
- willreturn is generally wrong; a custom new_handler is not guaranteed
to terminate.
- noalias is inappropriate: Clang has a flag to determine whether this
attribute should be present and adds it itself when appropriate.
- noundef and nonnull on the return value should be specified by the
frontend on all 'operator new' functions if we want them, not here.
In any case, inferring attributes on functions declared 'nobuiltin' (as
these are when Clang emits them) seems questionable.
Several of the new attributes here were incorrect, and even the ones
that are generally correct were being added even to nobuiltin calls.
This reverts commit bb3f169b59e1c8bd7fd70097532220bbd11e9967.
MemorySSA currently treats lifetime.end intrinsics as not aliasing
anything. This breaks MemorySSA-based MemCpyOpt, because we'll happily
move a read of a pointer below a lifetime.end intrinsic, as no clobber
is reported.
I think the MemorySSA modelling here isn't correct: lifetime.end(p)
has approximately the same effect as doing a memcpy(p, undef), and
should be treated as a clobber.
This patch removes the special handling of lifetime.end, leaving
alias analysis to handle it appropriately.
Differential Revision: https://reviews.llvm.org/D95763
Based on the comments in the code, the idea is that AsmPrinter is
unable to produce entry value blocks of arbitrary length, such as
DW_OP_entry_value [DW_OP_reg5 DW_OP_lit1 DW_OP_plus]. But the way the
Verifier check is written it also disallows DW_OP_entry_value
[DW_OP_reg5] DW_OP_lit1 DW_OP_plus which seems to overshoot the
target.
Note that this patch does not change any of the safety guards in
LiveDebugValues — there is zero behavior change for clang. It just
allows us to legalize more complex expressions in future patches.
rdar://73907559
Differential Revision: https://reviews.llvm.org/D95990
The upstream callers (the vectorizers) were fixed with:
bbed5f2f8a04 ( D95690 )
77adbe6a8c71
We should remove this pass entirely now that reduction
legalization/lowering is expected to work just as well,
but we need to confirm that the shuffle ops do not
regress (for x86 in particular).
This should be the last step needed to close:
https://llvm.org/PR23116
When widening, each half of the v2s16 operands needs to be sign extended
for G_ASHR or zero extended for G_LSHR.
Differential Revision: https://reviews.llvm.org/D96048
SALU min/max s32 instructions exist so use them. This means that
regbankselect can handle min/max much like add/sub/mul/shifts.
Differential Revision: https://reviews.llvm.org/D96047
This patch extends the condition collection logic to allow adding
conditions from pre-headers to loop headers, by allowing cases where the
target block dominates some of its predecessors.
If amdgpu-unsafe-fp-atomics is specified, allow {flat|global}_atomic_add_f32 even if atomic modes don't match.
Differential Revision: https://reviews.llvm.org/D95391
It was discussed a few years ago and agreed that it makes sense to
remove this assertion as other targets do not perform similar register
size checking in inline assembly constraint logic, so the check just
adds a needless barrier on AVR.
This patch removes the assertion and removes 'XFAIL' from two Generic
CodeGen tests for AVR as a result.
This modified patch avoids redirecting the unit in which a subprogram is
created if type units are enabled -- DIEs were getting children allocated
from different units memory pools. Original commit message:
[DWARF] Create subprogram's DIE in DISubprogram's unit
This is a fix for PR48790. Over in D70350, subprogram DIEs were permitted
to be shared between CUs. However, the creation of a subprogram DIE can be
triggered early, from other CUs. The subprogram definition is then created
in one CU, and when the function is actually emitted children are attached
to the subprogram that expect to be in another CU. This breaks internal CU
references in the children.
Fix this by redirecting the creation of subprogram DIEs in
getOrCreateContextDIE to the CU specified by it's DISubprogram definition.
This ensures that the subprogram DIE is always created in the correct CU.
Differential Revision: https://reviews.llvm.org/D94976
This new f16 shuffle under Neon would hit an assert in
GeneratePerfectShuffle as it would try to treat a f16 vector as an i8.
Add f16 handling, treating them like an i16.
Differential Revision: https://reviews.llvm.org/D95446
This patch implements generation of remaining header search arguments.
It's done manually in C++ as opposed to TableGen, because we need the flexibility and don't anticipate reuse.
This patch also tests the generation of header search options via a round-trip. This way, the code gets exercised whenever Clang is built and tested in asserts mode. All `check-clang` tests pass.
Reviewed By: dexonsmith
Differential Revision: https://reviews.llvm.org/D94472
As noted in https://reviews.llvm.org/D93459, the formatting of
multi-line descriptions of clEnumValN and the likes is unfavorable.
Thus this patch adds support for correctly indenting these.
Reviewed By: serge-sans-paille
Differential Revision: https://reviews.llvm.org/D93494
When SGPRs are spilled to VGPRs, they can overwrite any lane. We need
to preserve the value of inactive lanes in function calls, so we save
the register even if it is marked as caller saved.
Also, teach buildPrologSpill to work when no registers are free like in
CodeGen/AMDGPU/pei-scavenge-vgpr-spill.mir and update the comment on
findScratchNonCalleeSaveRegister as it is not used anymore to realign
the stack pointer since D95865.
Differential Revision: https://reviews.llvm.org/D95946
The collapseLoops method implements a transformations facilitating the implementation of the collapse-clause. It takes a list of loops from a loop nest and reduces it to a single loop that can be used by other methods that are implemented on just a single loop, such as createStaticWorkshareLoop.
This patch shares some changes with D92974 (such as adding some getters to CanonicalLoopNest), used by both patches.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D93268
Similar to the G_PTR_ADD + G_LOAD twiddling we do in `preISelLower`.
The imported patterns expect scalars only, so they can't handle things like
```
G_STORE %ptr1, %ptr2
```
To get around this, use s64 instead.
(This probably makes a good portion of the manual selection code for G_STORE
dead.)
This is a 0.2% geomean code size improvement on CTMark at -Os.
(Best is consumer-typeset @ -0.7%)
Differential Revision: https://reviews.llvm.org/D95908
When we have a zeroext parameter coming in on the stack, build
```
%x = G_LOAD ...
%x_assert_zext = G_ASSERT_ZEXT %x, narrow_size
%trunc = G_TRUNC %x_assert_zext
```
Rather than just loading into the truncated type.
This allows us to optimize cases like this: https://godbolt.org/z/vfjhW8
Differential Revision: https://reviews.llvm.org/D95805
This patch detaches SampleProfileLoader from class
SampleCoverageTracker. We plan to move SampleProfileLoader
to a template class. This would remain SampleCoverageTracker
as a class.
Also make callsiteIsHot() as a file static function.
Differential Revision: https://reviews.llvm.org/D95823
These two cases have identical implementations other than an
unreachable part of `G_ADD` that checks if the scalar we're narrowing
is a vector. Combining them to avoid unnecessary divergence.
This was only adding undef to the use if the copy itself had a
subregister index. It did not consider the subrange liveness if the
use had a subreg index to begin with.
If we had a pair of copies inside a loop which introduced new liveness
to a subregister which was undef before the loop, we would have a
dummy phi-only segment remaining across the loop body. Later, this
false segment would confuse RenameIndependentSubregs causing it to
introduce IMPLICIT_DEFs with broken value numbering.
It seems always adding the lanes to ShrinkMask is OK, so any
conditions should be purely a compile time filter.