This patch updates LowerMatrixIntrinsics to preserve the alignment
specified at the original load/stores and the align attribute for the
pointer argument of the column.major.load/store intrinsics.
We can always use the specified alignment for the load of the first
column. For subsequent columns, the alignment may need to be reduced.
For ConstantInt strides, compute the offset for the start of the column in
bytes and use commonAlignment to get the largest valid alignment.
For non-ConstantInt strides, we need to take the common alignment of the
initial alignment and the element size in bytes.
Reviewers: anemet, Gerolf, hfinkel, andrew.w.kaylor, LuoYuanke, rjmccall
Reviewed By: rjmccall
Differential Revision: https://reviews.llvm.org/D81960
There are now quite a few SVE tests in LLVM and Clang that do not
emit warnings related to invalid use of EVT::getVectorNumElements()
and VectorType::getNumElements(). For these tests I have added
additional checks that there are no warnings in order to prevent
any future regressions.
Differential Revision: https://reviews.llvm.org/D80712
Summary:
As half-precision floating point arguments and returns were previously
coerced to either float or int32 by clang's codegen, the CMSE handling
of those was also performed in clang's side by zeroing the unused MSBs
of the coercer values.
This patch moves this handling to the backend's calling convention
lowering, making sure the high bits of the registers used by
half-precision arguments and returns are zeroed.
Reviewers: chill, rjmccall, ostannard
Reviewed By: ostannard
Subscribers: kristof.beyls, hiraditya, danielkiss, cfe-commits, llvm-commits
Tags: #clang, #llvm
Differential Revision: https://reviews.llvm.org/D81428
Summary:
Half-precision floating point arguments and returns are currently
promoted to either float or int32 in clang's CodeGen and there's
no existing support for the lowering of `half` arguments and returns
from IR in AArch32's backend.
Such frontend coercions, implemented as coercion through memory
in clang, can cause a series of issues in argument lowering, as causing
arguments to be stored on the wrong bits on big-endian architectures
and incurring in missing overflow detections in the return of certain
functions.
This patch introduces the handling of half-precision arguments and returns in
the backend using the actual "half" type on the IR. Using the "half"
type the backend is able to properly enforce the AAPCS' directions for
those arguments, making sure they are stored on the proper bits of the
registers and performing the necessary floating point convertions.
Reviewers: rjmccall, olista01, asl, efriedma, ostannard, SjoerdMeijer
Reviewed By: ostannard
Subscribers: stuij, hiraditya, dmgreen, llvm-commits, chill, dnsampaio, danielkiss, kristof.beyls, cfe-commits
Tags: #clang, #llvm
Differential Revision: https://reviews.llvm.org/D75169
Adds aarch64-sve-vector-bits-{min,max} to allow the size of SVE
data registers (in bits) to be specified. This allows the code
generator to make assumptions it normally couldn't. As a starting
point this information is used to mark fixed length vector types
that can fit within the specified size as legal.
Reviewers: rengolin, efriedma
Subscribers: tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D80384
For a loop, a join block is a block that is reachable along multiple
disjoint paths from the exiting block of a loop. If the exit condition
of the loop is divergent, then such join blocks must also be marked
divergent. This currently fails in some cases because not all join
blocks are identified correctly.
The workaround is to conservatively mark every join block of any
branch (not necessarily the exiting block of a loop) as divergent.
https://bugs.llvm.org/show_bug.cgi?id=46372
Reviewed By: simoll
Differential Revision: https://reviews.llvm.org/D81806
Currently the matrix lowering turns volatile loads/stores into
non-volatile ones. This patch updates the lowering to preserve the
volatile bit.
Reviewers: anemet, Gerolf, hfinkel, andrew.w.kaylor, LuoYuanke, nicolasvasilache
Reviewed By: anemet
Differential Revision: https://reviews.llvm.org/D81498
We're missing a plain English explanation of how this pass is supposed
to operate -- add one to the file comment.
Differential Revision: https://reviews.llvm.org/D80929
This needed two fixes:
* 32-bit instructions were read in the wrong order. The machine code
swaps the two 16-bit instruction words, which wasn't undone when
decoding instructions.
* Jump and call instructions don't encode the lowest address bit,
which is always zero. Therefore, the address needed to be shifted by
one to fix that.
Differential Revision: https://reviews.llvm.org/D81961
Added NextPowerOf2() routine to TypeSize and rewritten the code
in getVectorTypeBreakdown to avoid warnings being generated.
Differential Revision: https://reviews.llvm.org/D81578
This patch adjust the load/store matrix intrinsics, formerly known as
llvm.matrix.columnwise.load/store, to improve the naming and allow
passing of extra information (volatile).
The patch performs the following changes:
* Rename columnwise.load/store to column.major.load/store. This is more
expressive and also more in line with the naming in Clang.
* Changes the stride arguments from i32 to i64. The stride can be
larger than i32 and this makes things more uniform with the way
things are handled in Clang.
* A new boolean argument is added to indicate whether the load/store
is volatile. The lowering respects that when emitting vector
load/store instructions
* MatrixBuilder is updated to require both Alignment and IsVolatile
arguments, which are passed through to the generated intrinsic. The
alignment is set using the `align` attribute.
The changes are grouped together in a single patch, to have a single
commit that breaks the compatibility. We probably should be fine with
updating the intrinsics, as we did not yet officially support them in
the last stable release. If there are any concerns, we can add
auto-upgrade rules for the columnwise intrinsics though.
Reviewers: anemet, Gerolf, hfinkel, andrew.w.kaylor, LuoYuanke, nicolasvasilache, rjmccall, ftynse
Reviewed By: anemet, nicolasvasilache
Differential Revision: https://reviews.llvm.org/D81472
Instead of asserting the number of elements is the same, we should be
comparing the element counts instead. In addition, when looking at
concats of extract_subvectors it's fine to use getVectorMinNumElements()
for scalable vectors.
I discovered these warnings when compiling the structured loads tests in
this file:
test/CodeGen/AArch64/sve-intrinsics-loads.ll
Differential Revision: https://reviews.llvm.org/D81936
The rearranges PerformANDCombine and PerformORCombine to try and make
sure we don't call isConstantSplat on any i1 vectors. As pointed out in
D81860 it may not be very well defined in those cases.
In EVT::getVectorElementCount() when the type is not simple we
should return getExtendedVectorElementCount() from the function
instead of constructing the ElementCount object manually.
I discovered this warning in an existing test:
test/CodeGen/AArch64/sve-intrinsics-loads.ll
Differential Revision: https://reviews.llvm.org/D81927
There are now quite a few SVE tests in LLVM and Clang that do not
emit warnings related to invalid use of EVT::getVectorNumElements()
and VectorType::getNumElements(). For these tests I have added
additional checks that there are no warnings in order to prevent
any future regressions.
Differential Revision: https://reviews.llvm.org/D80712
This also enables running the AArch64 SLSHardening pass with GlobalISel,
so add a test for that.
Differential Revision: https://reviews.llvm.org/D81403
The enum values for AArch64 registers are not all consecutive.
Therefore, the computation
"__llvm_slsblr_thunk_x" + utostr(Reg - AArch64::X0)
is not always correct. utostr(Reg - AArch64::X0) will not generate the
expected string for the registers that do not have consecutive values in
the enum.
This happened to work for most registers, but does not for AArch64::FP
(i.e. register X29).
This can get triggered when the X29 is not used as a frame pointer.
Differential Revision: https://reviews.llvm.org/D81997
This is fixing warning from clang:
warning: private field 'ModuleSlice' is not used [-Wunused-private-field]
SmallPtrSetImpl<Function *> &ModuleSlice;
^
Differential Revision: https://reviews.llvm.org/D82027
Summary:
For PPC BinaryOperator of fp128 will become libcall, we shouldn't
convert loop to CTR loop if the loop contain libCall.
But currently, in the PPCTTIImpl::mightUseCTR() function, we only deal
with BinaryOperator for ppc_fp128, don't deal with the fp128.
Reviewed By: shchenz
Differential Revision: https://reviews.llvm.org/D81353
This patch enables yaml2elf emit the .debug_abbrev section.
The generated .debug_abbrev is verified using `llvm-dwarfdump`.
Known issues that will be addressed later:
- Current implementation doesn't support generating multiple abbreviation tables in one .debug_abbrev section.
Reviewed By: jhenderson, grimar
Differential Revision: https://reviews.llvm.org/D81820
Summary: A bug is reported in bugzilla-45628, where the swap_with_shift case can’t be matched to a single HW instruction xxswapd as expected.
In fact the case matches the idiom of rotate. We have MatchRotate to handle an ‘or’ of two operands and generate a rot[lr] if the case matches the idiom of rotate. While PPC doesn’t support ROTL v1i128. We can custom lower ROTL v1i128 to the vector_shuffle. The vector_shuffle will be matched to a single HW instruction during the phase of instruction selection.
Reviewed By: steven.zhang
Differential Revision: https://reviews.llvm.org/D81076
It seems to be a hardware defect that the half inline constants do not
work as expected for the 16-bit integer operations (the inverse does
work correctly). Experimentation seems to show these are really
reading the 32-bit inline constants, which can be observed by writing
inline asm using op_sel to see what's in the high half of the
constant. Theoretically we could fold the high halves of the 32-bit
constants using op_sel.
The *_asm_all.s MC tests are broken, and I don't know where the script
to autogenerate these are. I started manually fixing it, but there's
just too many cases to fix. This also does break the
assembler/disassembler support for these values, and I'm not sure what
to do about it. These are still valid encodings, so it seems like you
should be able to use them in some way. If you wrote assembly using
them, you could have really meant it (perhaps to read the high bits
with op_sel?). The disassembler will print the invalid literal
constant which will fail to re-assemble. The behavior is also
different depending on the use context. Consider this example, which
was previously accepted and encoded using the inline constant:
v_mad_i16 v5, v1, -4.0, v3
; encoding: [0x05,0x00,0xec,0xd1,0x01,0xef,0x0d,0x04]
In contexts where an inline immediate is required (such as on gfx8/9),
this will now be rejected. For gfx10, this will produce the literal
encoding and change the printed format:
v_mad_i16 v5, v1, 0xc400, v3
; encoding: [0x05,0x00,0x5e,0xd7,0x01,0xff,0x0d,0x04,0x00,0xc4,0x00,0x00]
This is just another variation of the issue that we don't perfectly
handle round trip assembly/disassembly due to not tracking how
immediates were encoded. This doesn't matter much in practice, since
compilers don't emit the suboptimal encoding. I doubt any users are
relying on this behavior (although I did make use of the old behavior
to figure out what was wrong).
Fixes bug 46302.
In BTF, pointee type pruning is used to reduce cluttering
too many unused types into prog BTF. For example,
struct task_struct {
...
struct mm_struct *mm;
...
}
If bpf program does not access members of "struct mm_struct",
there is no need to bring types for "struct mm_struct" to BTF.
This patch fixed a bug where an incorrect pruning happened.
The test case like below:
struct t;
typedef struct t _t;
struct s1 { _t *c; };
int test1(struct s1 *arg) { ... }
struct t { int a; int b; };
struct s2 { _t c; }
int test2(struct s2 *arg) { ... }
After processing test1(), among others, BPF backend generates BTF types for
"struct s1", "_t" and a placeholder for "struct t".
Note that "struct t" is not really generated. If later a direct access
to "struct t" member happened, "struct t" BTF type will be generated
properly.
During processing test2(), when processing member type "_t c",
BPF backend sees type "_t" already generated, so returned.
This caused the problem that "struct t" BTF type is never generated and
eventually causing incorrect type definition for "struct s2".
To fix the issue, during DebugInfo type traversal, even if a
typedef/const/volatile/restrict derived type has been recorded in BTF,
if it is not a type pruning candidate, type traversal of its base type continues.
Differential Revision: https://reviews.llvm.org/D82041
Summary:
Assume all usages of this function are explicitly fixed-width operations
and cast to FixedVectorType
Reviewers: efriedma, sdesmalen, c-rhodes, majnemer, dblaikie
Reviewed By: sdesmalen
Subscribers: tschuett, hiraditya, rkruppe, psnobl, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D80262
Summary:
Fix invalid usages of getNumElements identified by test case
LLVM.Transforms/InstCombine::vscale_extractelement.ll.
changesLength: Since the length of the llvm::SmallVector shufflemask
is related to the minimum number of elements in a scalable vector, it is
fine to just get the Min field of the ElementCount
isIdentityWithExtract: Since it is not possible to express the mask
needed for this pattern for scalable vectors, we can just bail before
calling getNumElements()
Reviewers: efriedma, sdesmalen, fpetrogalli, gchatelet, yrouban, craig.topper
Reviewed By: sdesmalen
Subscribers: tschuett, hiraditya, rkruppe, psnobl, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D81969
I don't have any testcases showing it happening,
and i haven't succeeded in creating one,
but i'm also not positive it can't ever happen,
and i recall having something that looked like
that in the very beginning of Negator creation.
But since we now already have a negation cache,
we can now detect such cases practically for free.
Let's do so instead of "relying" on stack overflow :D
It is possible that we can try to negate the same value multiple times.
For example, PHI nodes may happen to have multiple incoming values
(all of which must be the same value) for the same incoming basic block.
It may happen that we try to negate such a PHI node, and succeed,
and that might result in having now-different incoming values..
To avoid that, and in general to reduce the amount of duplicated
work we might be doing, let's introduce a cache where
we'll track results of negating each value.
The added test was previously failing -verify after -instcombine.
Fixes https://bugs.llvm.org/show_bug.cgi?id=46362
Summary:
This commit fixes a bug in the FixBrTables pass in which an
unconditional branch from the switch header block to the jump table
block was not removed before the blocks were combined. The result was
an invalid CFG in the MachineFunction. This commit also switches from
using bespoke branch analysis and deletion code to using the standard
utilities for the same.
Reviewers: aheejin, dschuff
Subscribers: sbc100, jgravelle-google, hiraditya, sunfish, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D81909
Summary:
This invariant is being violated in the test case
https://reviews.llvm.org/D77849, related to the use of the relatively
new ability for callbr to have return values, and MachineBasicBlocks
with INLINEASM_BR terminators to emit live out register defs.
As noted in the comment, this triggers invariant violations in
MachineVerifier via `llc -verify-machineinstrs` or
`llc -verify-regalloc`, since only MachineInstrs that are terminators
are allowed to follow the first terminator.
https://reviews.llvm.org/D75098 may rework this very assertion if we're
spilling via a (proposed) TCOPY MachineInstr.
Reviewers: void, efriedma, arsenm
Reviewed By: efriedma
Subscribers: qcolombet, wdng, hiraditya, llvm-commits, srhines
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D78166
Summary:
llvm::SplitEdge was failing an assertion that the BasicBlock only had
one successor (for BasicBlocks terminated by CallBrInst, we typically
have multiple successors). It was surprising that the earlier call to
SplitCriticalEdge did not handle the critical edge (there was an early
return). Removing that triggered another assertion relating to creating
a BlockAddress for a BasicBlock that did not (yet) have a parent, which
is a simple order of operations issue in llvm::SplitCriticalEdge (a
freshly constructed BasicBlock must be inserted into a Function's basic
block list to have a parent).
Thanks to @nathanchance for the report.
Fixes: https://github.com/ClangBuiltLinux/linux/issues/1018
Reviewers: craig.topper, jyknight, void, fhahn, efriedma
Reviewed By: efriedma
Subscribers: eli.friedman, rnk, efriedma, fhahn, hiraditya, llvm-commits, nathanchance, srhines
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D81607
Let's have one canonical place to define ResultCode instances and their
labels.
Also make ResultCode's `__init__` function self-registering to better
support custom ResultCodes.
Summary: Rename --elf-cg-profile to --cg-profile and keep --elf-cg-profile as an alias of --cg-profile.
Reviewers: jhenderson, MaskRay, espindola, hans
Reviewed By: jhenderson, MaskRay
Subscribers: emaste, rupprecht, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D81855
When the zext gets promoted, it used to retain the original location,
which pessimizes the debugging experience causing an unexpected
jump in stepping at -Og.
Fixes https://bugs.llvm.org/show_bug.cgi?id=46120 (which also
contains a full C repro).
Differential Revision: https://reviews.llvm.org/D81437
Summary:
Add a flag to omit the xray_fn_idx to cut size overhead and relocations
roughly in half at the cost of reduced performance for single function
patching. Minor additions to compiler-rt support per-function patching
without the index.
Reviewers: dberris, MaskRay, johnislarry
Subscribers: hiraditya, arphaman, cfe-commits, #sanitizers, llvm-commits
Tags: #clang, #sanitizers, #llvm
Differential Revision: https://reviews.llvm.org/D81995
In order to support hot-patching, we need to make sure the first emitted instruction in a function is a two-byte+ op. This is already the case on x86_64, which seems to always emit two-byte+ ops. However on 32-bit targets this wasn't the case.
PATCHABLE_OP now lowers to a XCHG AX, AX, (66 90) like MSVC does. However when targetting pentium3 (/arch:SSE) or i386 (/arch:IA32) targets, we generate MOV EDI,EDI (8B FF) like MSVC does. This is for compatiblity reasons with older tools that rely on this two byte pattern.
Differential Revision: https://reviews.llvm.org/D81301