1. Better sorting of scalars to be gathered. Trying to insert
constants/arguments/instructions-out-of-loop at first and only then
the instructions which are inside the loop. It improves hoisting of
invariant insertelements instructions.
2. Better detection of shuffle candidates in gathering function.
3. The cost of insertelement for constants is 0.
Part of D57059.
Differential Revision: https://reviews.llvm.org/D103458
Our initial motivating case was memcpy's with alignments > 16. The
loads/stores, to which small memcpy's expand, are kept together in
several places so that we get a sequence like this for a 64 bit copy:
LD w0
LD w1
ST w0
ST w1
The load/store optimiser can generate a LDP/STP w0, w1 from this because
the registers read/written are consecutive. In our case however, the
sequence is optimised during ISel, resulting in:
LD w0
ST w0
LD w0
ST w0
This instruction reordering allows reuse of registers. Since the registers
are no longer consecutive (i.e. they are the same), it inhibits LDP/STP
creation. The approach here is to perform renaming:
LD w0
ST w0
LD w1
ST w1
to enable the folding of the stores into a STP. We do not yet generate
the LDP due to a limitation in the renaming implementation, but plan to
look at that in a follow-up so that we fully support this case. While
this was initially motivated by certain memcpy's, this is a general
approach and thus is beneficial for other cases too, as can be seen
in some test changes.
Differential Revision: https://reviews.llvm.org/D103597
This patch changes RVV's policy for its supported list of fixed-length
vector types by capping by vector size rather than element count. Now
all 1024-byte vectors (of supported element types) are supported, rather
than all 256-element vectors.
This is a more natural fit for the architecture, and allows us to, for
example, improve the support for vector bitcasts.
This change necessitated the adding of some new simple types to avoid
"regressing" on the number of currently-supported vectors. We round out
the 1024-byte types by adding `v512i8`, `v1024i8`, `v512i16` and
`v512f16`.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D103884
Upon encountering loads/stores on types whose size is not a multiple of 8 bits the SROA pass would either trip an assertion or use logic that was not meant to work with such irregularly-sized types.
Reviewed By: aeubanks
Differential Revision: https://reviews.llvm.org/D99435
Loads in the first half of the chapter are missing the type argument.
Patched By: klao (Mihaly Barasz)
Reviewed By: Jim
Differential Revision: https://reviews.llvm.org/D90326
The C-string section splitting support added in f9649d123db triggered an assert
("Duplicate canonical symbol at address") when multiple symbols were defined at
the the same offset within a C-string block (this triggered on arm64, where we
always add a block start symbol). The bug was caused by a failure to update the
record of the last canonical symbol address. The fix was to maintain this record
correctly, and move the auto-generation of the block-start symbol above the
handling for symbols defined in the object itself so that all symbols
(auto-generated and defined) are processed in address order.
This patch adds initial support for using the new pass manager when
doing ThinLTO via libLTO.
Reviewed By: steven_wu
Differential Revision: https://reviews.llvm.org/D102627
These types are (presumably) never used in the generated TableGen files.
The `default` switch case silences any compiler warnings for these
missing types so it's easy to miss.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D103883
This patch is a simple fix which registers CONCAT_VECTORS as
custom-lowered for scalable mask vectors. This follows the pattern of
all other scalable-vector types, as the default expansion of
CONCAT_VECTORS cannot handle scalable types, and even if it did it'd go
through the stack and generate worse code.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D103896
Command to see the differences:
diff -u <(sed -n 's#^HANDLE_DI_FLAG *([^,]*, *\([^()]*\)) *\(//.*\)\?$#\1#p' <llvm/include/llvm/IR/DebugInfoFlags.def | grep -vw Largest) <(sed -n 's#^ *LLVMDIFlag\([^ ]*\) *= (\?[0-9].*$#\1#p' <llvm/include/llvm-c/DebugInfo.h)
OCaml binding is more seriously out of sync but I have not tried to sync it.
Reviewed By: aprantl
Differential Revision: https://reviews.llvm.org/D103910
When `-fstack-clash-protection` is enabled and stack has to be realigned, some parts of redzone is written prior the probe, so probe might overwrite content already written in redzone. To avoid it, we have to make sure the first probe is at full probe size or is the last probe so that we can skip redzone.
It also fixes violation of ABI under PPC where `r1` isn't updated atomically.
This fixes https://bugs.llvm.org/show_bug.cgi?id=49903.
Reviewed By: jsji
Differential Revision: https://reviews.llvm.org/D100290
With Twine now ubiquitous after rG92a79dbe91413f685ab19295fc7a6297dbd6c824,
it needs support for string_view when building clang with newer C++ standards.
This is similar to how StringRef is handled.
Differential Revision: https://reviews.llvm.org/D103935
In most of cases, it has a single space after comma in assembly operands.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D103790
Summary: This a minor patch to refactor the test file,
llvm-objdump/XCOFF/section-headers.test, to use yaml2obj
for this testing rather than a canned binary.
Reviewed By: jhenderson
Differential Revision: https://reviews.llvm.org/D103146
According to ELF V2 ABI, `0` should be the dwarf number of `r0`. Currently MMA's register also uses `0` as its dwarf number, this confuses `RegisterInfoEmitter` and generates wrong dwarf -> llvm mapping.
```
extern const MCRegisterInfo::DwarfLLVMRegPair PPCDwarfFlavour1Dwarf2L[] = {
{ 0U, PPC::VSRp31 },
```
This leads to wrong cfi output in https://reviews.llvm.org/D100290.
Reviewed By: jsji
Differential Revision: https://reviews.llvm.org/D103761
MachO C-string literal sections should be split on null-terminator boundaries,
rather than the usual symbol boundaries. This patch updates
MachOLinkGraphBuilder to do that.
LowerTypeTests pass adds functions with a non-canonical jump table
to cfiFunctionDecls instead of cfiFunctionDefs. As the jump table
is in the regular LTO object, these functions will also need to be
exported. This change fixes the non-canonical jump table case and
adds a test similar to the existing one for canonical jump tables.
Reviewed By: pcc
Differential Revision: https://reviews.llvm.org/D103120
As discussed in the post-commit comments for:
3cdd05e519dd
It seems to be safe to propagate all flags from the final fneg
except for 'nsz' to the new select:
https://alive2.llvm.org/ce/z/J_APDc
nsz has unique FMF semantics: it is not poison, it is only
"insignificant" in the calculation according to the LangRef.
If we cannot otherwise use a VMOVimm/VMOVFPimm/VMVNimm, fall back to
producing a VDUP(const) as opposed to a constant pool load. This will at
least be smaller codesize and can allow the VDUP to be folded into other
instructions.
Differential Revision: https://reviews.llvm.org/D103808
- Add `-enable-ocl-mangling-mismatch-workaround` to work around the
mismatch on OCL name mangling so far.
Reviewed By: yaxunl, rampitec
Differential Revision: https://reviews.llvm.org/D103920
This patch https://reviews.llvm.org/D102876 caused some lit regressions on z/OS because tmp files were no longer being opened based on binary/text mode. This patch passes OpenFlags when creating tmp files so we can open files in different modes.
Reviewed By: amccarth
Differential Revision: https://reviews.llvm.org/D103806
G_INSERT legalization is incomplete and doesn't work very
well. Instead try to use sequences of G_MERGE_VALUES/G_UNMERGE_VALUES
padding with undef values (although this can get pretty large).
For the case of load/store narrowing, this is still performing the
load/stores in irregularly sized pieces. It might be cleaner to split
this down into equal sized pieces, and rely on load/store merging to
optimize it.
Introduce a new cl::opt to hide "cold" blocks from CFG DOT graphs.
Use BFI to get block relative frequency. Hide the block if the
frequency is below the threshold set by the command line option value.
Reviewed By: davidxl, hoy
Differential Revision: https://reviews.llvm.org/D103640
The comment states the following, for calculating the Line variable:
> Draw a line starting from when we only have 1k left and increasing
> linearly to double the current weight.
However, the value was not calculated as described. Instead, it would
result in a negative value, which resulted in the function always
returning 0 afterwards.
```
// Invariant: CurrentSize <= MaxSize - 200
// Invariant: CurrentWeight >= 0
int Line = (-2 * CurrentWeight) * (MaxSize - CurrentSize + 1000);
// {Line <= 0}
```
This commit fixes the issue and linearly interpolates as described.
Patch by Loris Reiff. Thanks!
Differential Revision: https://reviews.llvm.org/D96207