This attempts to move all tools over to using `add_llvm_library` for
better consistency. After doing this, I noticed it ended up as nearly a
reimplementation of https://reviews.llvm.org/rL342148, which later got
reverted in r342336 (b09a8c9bd9b819741b38071a7ccd95042ef2643a).
With ccache and ninja on a large core machine (40), I haven't run into
build errors, so I'm hopeful it's better now, though it doesn't seem to
be any different / new.
Reviewed By: stephenneuendorffer
Differential Revision: https://reviews.llvm.org/D90970
Define an option -riscv-vector-bits-max to specify the maximum vector
bits for vectorizer. Loop vectorizer will use the value to check if it
is safe to use the whole vector registers to vectorize the loop.
It is not the optimum solution for loop vectorizing for scalable vector.
It assumed the whole vector registers will be used to vectorize the code.
If it is possible, we should configure vl to do vectorize instead of
using whole vector registers.
We only consider LMUL = 1 in this patch.
This patch just an initial work for loop vectorizer for RISC-V Vector.
Differential Revision: https://reviews.llvm.org/D95659
The generated calling convention code shouldn't see these types since
we split large types into 32-bit chunks before the calling convention
code is triggered.
GlobalISel ends up directly calls the generated CC code before
checking for the register count breakdown. Arguably this difference is
a bug, but this was dead code for the DAG anyway.
Link: https://lists.llvm.org/pipermail/llvm-dev/2020-October/146162.html "[RFC] FileCheck: (dis)allowing unused prefixes"
If a downstream project using lit needs time for transition,
add the following to `lit.local.cfg`:
```
from lit.llvm.subst import ToolSubst
fc = ToolSubst('FileCheck', unresolved='fatal')
config.substitutions.insert(0, (fc.regex, 'FileCheck --allow-unused-prefixes'))
```
Differential Revision: https://reviews.llvm.org/D95849
`clang/lib/CodeGen/CGOpenMPRuntime.cpp` synthesized union
(`distinct !DICompositeType(tag: DW_TAG_union_type, name: "kmp_cmplrdata_t", size: 64, elements: <0x62b690>)`)
does not have meaningful filename/line number.
D94735 dropped the previously arbitrary and untested filename/line from the union and caused a verifier error here.
This fixes `check-libarcher` failures.
Differential Revision: https://reviews.llvm.org/D96212
A One-Off Identity mask is a shuffle that is mostly an identity mask
from as single source but contains a single element out-of-place, either
from a different vector or from another position in the same vector. As
opposed to lowering this via a ARMISD::BUILD_VECTOR we can generate an
extract/insert pair directly. Under ARM with individually accessible
lane elements this often becomes a simple lane move.
This also alters the LowerVECTOR_SHUFFLEUsingMovs code to use v4f32 (not
v4i32), a more natural type for lane moves.
Differential Revision: https://reviews.llvm.org/D95551
Currently using LLVM_USE_SANITIZER with a MinGW target leads to a fatal
configuration error due to an unsupported platform. MinGW targets on
clang however implement a few sanitizers, currently ASAN and UBSAN.
This patch enables LLVM_USE_SANITIZER in a MinGW environment as well.
Differential Revision: https://reviews.llvm.org/D95750
On AArch64 (which seems to be the only target that supports it), this
attribute allows codegen to avoid saving/restoring the value in x0
across a call.
Gives a 0.1% geomean -Os code size improvement on CTMark.
Differential Revision: https://reviews.llvm.org/D96099
This reverts commit 0fc1738eb75d613b9e16143b83e7cb80512e84eb.
The test passes (unexpectedly, due to the XFAIL: *) when x86 isn't
the default triple (such as on an arm machine).
As the actual MSVC toolset doesn't use the GAS-style assembly that
Clang/LLVM produces and consumes, there's no reference for what
string to use for e.g. comments when building with a MSVC triple.
This frees up the use of semicolon as separator string, just like
was done for GNU targets in 23413195649d0cf6f3860ae8b5fb115b35032075.
(Previously, both the separator and comment strings were set to
the same, a semicolon.)
Compiler-rt extensively uses separator chars in its assembly,
and that assembly should be buildable with clang-cl for MSVC too.
Differential Revision: https://reviews.llvm.org/D96259
In assembly files, omitting `.type foo,@function` is common. Such functions have
type `STT_NOTYPE` and llvm-symbolizer reports `??` for them.
An ifunc symbol usually has an associated resolver symbol which is defined at
the same address. Returning either one is fine for symbolization. The resolver
symbol may not end up in the symbol table if (object file) `.L` is used (linked
image) .symtab is stripped while .dynsym is retained.
This patch allows ELF STT_NOTYPE/STT_GNU_IFUNC symbols for .symtab symbolization.
I have left TODO in the test files for an unimplemented STT_FILE heuristic.
Differential Revision: https://reviews.llvm.org/D95916
Building on the fixed vector support from D95705
I've added ISD nodes for vmv.v.x and vfmv.v.f and switched to
lowering the intrinsics to it. This allows us to share the same
isel patterns for both.
This doesn't handle splats of i64 on RV32 yet. The build_vector
gets converted to a vXi32 build_vector+bitcast during type
legalization. Not sure the best way to handle this at the moment.
Differential Revision: https://reviews.llvm.org/D96108
This is an alternative to D95563.
This is modeled after a similar feature for AArch64's SVE that uses
predicated scalable vector instructions.a
Rather than use predication, this patch uses an explicit VL operand.
I've limited it to always use LMUL=1 for now, but we can improve this
in the future.
This requires a bunch of new ISD opcodes to carry the VL operand.
I think we can probably lower intrinsics to these ISD opcodes to
cut down on the size of the isel table. Which is why I've added
patterns for all integer/float types and not just LMUL=1.
I'm only testing one vector width right now, but the width is
programmable via the command line.
Reviewed By: frasercrmck
Differential Revision: https://reviews.llvm.org/D95705
This adds support for commuting operands and converting between
vfmadd and vfmacc to avoid register copies.
To avoid messing up intrinsic behavior, I've added new pseudo
instructions that have the isCommutable flag set. These pseudos also
force a tail agnostic policy. The intrinsic version still use
the tail undisturbed policy.
For best results it looks like we need to start with fmadd and only
pick fmacc if its beneficial. MachineCSE commutes without contraining
the operands and then commutes back if it didn't help with CSE. So
I've made sure that when the operand choice isn't constrained, we
will keep fmadd for MachineCSE and when it does the second commute,
we get back the original instruction.
Reviewed By: frasercrmck
Differential Revision: https://reviews.llvm.org/D95800
This ensures that we'll match immediates consistently regardless
of whether we match them as a standalone splat or as part of
another operation.
While I was there I added complexities to the simm5/uimm5 patterns so
we didn't have to assume that the 1 on the non-immediate was lower
than what tablegen inferred.
I had to make a minor tweak to tablegen to fix one place that
didn't expect to see a ComplexPattern that wasn't a "leaf".
Reviewed By: frasercrmck
Differential Revision: https://reviews.llvm.org/D96199
Making VectorIndex an `int` instead of `unsigned`, silences the warning:
comparison of unsigned expression in ‘>= 0’ is always true
in:
template <int Min, int Max>
DiagnosticPredicate isVectorIndex() const {
...
if (VectorIndex.Val >= Min && VectorIndex.Val <= Max)
return DiagnosticPredicateTy::Match;
...
}
when Min is 0.
More MachO madness for everyone. MachO relocations are only 32-bits, which
means the ARM64_RELOC_ADDEND one only actually has 24 (signed) bits for the
actual addend. This is a problem when calculating the address of a basic block;
because it has no symbol of its own, the sequence
adrp x0, Ltmp0@PAGE
add x0, x0, x0 Ltmp0@PAGEOFF
is represented by relocation with an addend that contains the offset from the
function start to Ltmp, and so the largest function where this is guaranteed to
work is 8MB. That's not quite big enough that we can call it user error (IMO).
So this patch puts the any blockaddress into a constant-pool, where the addend
is instead stored in the (x)word being relocated, which is obviously big enough
for any function.
Summary:
Introduce base classes that hold a textual represent of the IR
based on basic blocks and a base class for comparing this
representation. A new change printer is introduced that uses these
classes to save and compare representations of the IR before and after
each pass. It only reports when changes are made by a pass (similar to
-print-changed) except that the changes are shown in a patch-like format
with those lines that are removed shown in red prefixed with '-' and those
added shown in green with '+'. This functionality was introduced in my
tutorial at the 2020 virtual developer's meeting.
Author: Jamie Schmeiser <schmeise@ca.ibm.com>
Reviewed By: aeubanks (Arthur Eubanks)
Differential Revision: https://reviews.llvm.org/D91890
Different targets might handle branch performance differently, so this patch allows for
targets to specify the TailDuplicateSize threshold. Said threshold defines how small a branch
can be and still be duplicated to generate straight-line code instead.
This patch also specifies said override values for the AArch64 subtarget.
Differential Revision: https://reviews.llvm.org/D95631
This patch improves the index management during constraint building.
Previously, the code rejected constraints which used values that were not
part of Value2Index, but after combining the coefficients of the new
indices were 0 (if ShouldAdd was 0).
In those cases, no new indices need to be added. Instead of adding to
Value2Index directly, add new indices to the NewIndices map. The caller
can then check if it needs to add any new indices.
This enables checking constraints like `a + x <= a + n` to `x <= n`,
even if there is no constraint for `a` directly.
Maskray has reported a fault with .debug_gnu_pubnames in the comments on
D94976, caused by this patch, reverting to investigate.
This reverts commit 8998f5843503773c2f51fd475e2c77c687a65ee6.
Backing out this workaround to focus on fixing whatever's wrong with
.debug_gnu_pubnames, I'll revert the cause, (8998f584) in the next commit.
This reverts commit 56fa34ae3570a34fd0f4c2cf1bfaf095da01a959.
When running the tests on PowerPC and x86, the lit test GlobalISel/trunc.ll fails at the memory sanitize step. This seems to be due to wrong invalid logic (which matches even if it shouldn't) and likely missing variable initialisation."
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D95878
Because we mark all operations as expand for v2f64, scalar_to_vector
would end up lowering through a stack store/reload. But it is pretty
simple to implement, only inserting a D reg into an undef vector. This
helps clear up some inefficient codegen from soft calling conventions.
Differential Revision: https://reviews.llvm.org/D96153
In the LLVM-IR for this test, the inlined argument "b" in the "a" function
is optimized out on certain architectures, not on others. This hasn't been
reported as a test failure since 93faeecd8fa and ff2073a51 because we would
create a variable that looks like this:
DW_TAG_formal_parameter
DW_AT_abstract_origin
With no further information (and no location). With D95617 however, we
stop emitting such variables.
Prior to landing D95617: make this test stricter by checking that the
variable mentioned above has a location. We have to accept that on certain
architectures this goes missing, so add those to the XFail list.
I've run a few experiments, and right now it looks likely only powerpc64
still drops the variable location.
This patch adds support for both the fadd reduction intrinsic, in both
the ordered and unordered modes.
The fmin and fmax intrinsics are not currently supported due to a
discrepancy between the LLVM semantics and the RVV ISA behaviour with
regards to signaling NaNs. This behaviour is likely fixed in version 2.3
of the RISC-V F/D/Q extension, but until then the intrinsics can be left
unsupported.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D95870
Fixes TableGen parser errors reported by D95874 due to incompatible types being used on multiclass templates.
Differential Revision: https://reviews.llvm.org/D96205