Do not break down local loads and stores so ds_read/write_b96/b128 in
ISelLowering can be selected on subtargets that support them and if align
requirements allow them.
Differential Revision: https://reviews.llvm.org/D84403
Fix local ds_read/write_b96/b128 so they can be selected if the alignment
allows. Otherwise, either pick appropriate ds_read2/write2 instructions or break
them down.
Differential Revision: https://reviews.llvm.org/D81638
Features UnalignedBufferAccess and UnalignedDSAccess are now used to determine
whether hardware supports such access.
UnalignedAccessMode should be used to enable them.
hasUnalignedBufferAccessEnabled() and hasUnalignedDSAccessEnabled() can be
now used to quickly check both.
Differential Revision: https://reviews.llvm.org/D84522
Adjust alignment requirements for ds_read/write_b96/b128.
GFX9 and onwards allow misaligned access for reads and writes but only if
SH_MEM_CONFIG.alignment_mode allows it.
UnalignedDSAccess is set on GCN subtargets from GFX9 onward to let us know if we
can relax alignment requirements.
UnalignedAccessMode acts similary to UnalignedBufferAccess for DS instructions
but only from GFX9 onward and is supposed to match alignment_mode. By default
alignment of 4 is required.
Differential Revision: https://reviews.llvm.org/D82788
In SelectionDAGBuilder always translate the fshl and fshr intrinsics to
FSHL and FSHR (or ROTL and ROTR) instead of lowering them to shifts and
ORs. Improve the legalization of FSHL and FSHR to avoid code quality
regressions.
Differential Revision: https://reviews.llvm.org/D77152
Both AfterPass and AfterPassInvalidated pass instrumentation
callbacks get additional parameter of type PreservedAnalyses.
This patch was created by @fedor.sergeev. I have just slightly
changed it.
Reviewers: fedor.sergeev
Differential Revision: https://reviews.llvm.org/D81555
Before we speculatively execute a basic block, query the cost of
inserting the necessary select instructions against the phi folding
threshold. For non-trivial insertions, a more accurate decision can
probably be made during machine if-conversion. With minsize we query
the CodeSize cost, otherwise we use SizeAndLatency.
Differential Revision: https://reviews.llvm.org/D82438
Currently we have `checkDRI` and two `createDRIFrom` methods which
are used to create `DynRegionInfo` objects.
And we have an issue: constructions like:
`ObjF->getELFFile()->base() + P->p_offset`
that are used in `createDRIFrom` functions might overflow.
I had to revert `D85519` which triggered such UBSan failure.
This NFC, simplifies and generalizes how we create `DynRegionInfo` objects.
It will allow us to introduce more/better validation checks in a single place.
It also will allow to change `createDRI` to return `Expected<>` so
that we will be able to stop using the `reportError`, which
is used inside currently, and have a warning instead.
Differential revision: https://reviews.llvm.org/D86297
Modify the ARM getCmpSelInstrCost implementation for the code size
costs of selects. Now consider the legalization cost and increase
the cost of i1 because those values wouldn't live in a general purpose
register. We also make selects +1 more expensive to account for the IT
instruction.
Differential Revision: https://reviews.llvm.org/D82091
As part of D84741, this adds a target hook for the
preferPredicatedReductionSelect option and makes use
of it under MVE, allowing us to tail predicate most
reduction loops.
Differential Revision: https://reviews.llvm.org/D85980
Commit dbcfbffc adds ppc.readflm and ppc.setflm intrinsics to read or
write FPSCR register. This patch adds them to Clang.
Reviewed By: steven.zhang
Differential Revision: https://reviews.llvm.org/D85874
In e99dee82b0, the "out_of_memory_new_handler" was changed to be
explicitly initialized instead of relying on a global static
constructor.
However before this change, install_out_of_memory_new_handler could be
called multiple times while it asserts right now.
We can be more tolerant to calling multiple time InitLLVM without
reintroducing a global constructor for this handler.
Differential Revision: https://reviews.llvm.org/D86330
If the type T is incomplete then sizeof(T) results in C++ compilation error at line:
static constexpr bool value = sizeof(T) <= (2 * sizeof(void *));
This patch allows incomplete types in parameters of function. Example:
using SomeFunc = void(SomeIncompleteType &);
llvm::unique_function<SomeFuncType> SomeFunc;
Reviewers: DaniilSuchkov, vvereschaka
Differential Revision: https://reviews.llvm.org/D81554
This patch adds support for referencing different abbrev tables. We use
'ID' to distinguish abbrev tables and use 'AbbrevTableID' to explicitly
assign an abbrev table to compilation units.
The syntax is:
```
debug_abbrev:
- ID: 0
Table:
...
- ID: 1
Table:
...
debug_info:
- ...
AbbrevTableID: 1 ## Reference the second abbrev table.
- ...
AbbrevTableID: 0 ## Reference the first abbrev table.
```
Reviewed By: jhenderson
Differential Revision: https://reviews.llvm.org/D83116
This patch adds support for emitting multiple abbrev tables. Currently,
compilation units will always reference the first abbrev table.
Reviewed By: jhenderson, labath
Differential Revision: https://reviews.llvm.org/D86194
ld.lld is an ELF linker. We can switch to the new LLD for Mach-O port
when it's more complete, but for now, assume the user will have set
CMAKE_LINKER correctly themselves when targeting Darwin.
This patch adds support for emitting multiple abbrev tables. Currently,
compilation units will always reference the first abbrev table.
Reviewed By: jhenderson, labath
Differential Revision: https://reviews.llvm.org/D86194
Summary:
- HIP uses an unsized extern array `extern __shared__ T s[]` to declare
the dynamic shared memory, which size is not known at the
compile time.
Reviewers: arsenm, yaxunl, kpyzhov, b-sumner
Subscribers: kzhuravl, jvesely, wdng, nhaehnle, dstuttard, tpr, t-tye, hiraditya, kerbowa, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D82496
We have two ways of using the runtimes build setup to build the
builtins. You can either have an empty LLVM_BUILTIN_TARGETS (or have it
include the "default" target), in which case builtin_default_target is
called to set up the default target, or you can have actual triples in
LLVM_BUILTIN_TARGETS, in which case builtin_register_target is called
for each triple. builtin_default_target lets you build the builtins for
Darwin (assuming your default triple is Darwin); builtin_register_target
does not.
I don't understand the reason for this distinction. The Darwin builtins
build is special in that a single CMake configure handles building the
builtins for multiple platforms (e.g. macOS, iPhoneSimulator, and iOS)
and architectures (e.g. arm64, armv7, and x86_64). Consequently, if you
specify multiple Darwin triples in LLVM_BUILTIN_TARGETS, expecting each
configure to only build for that particular triple, it won't work.
However, if you specify a *single* x86_64-apple-darwin triple in
LLVM_BUILTIN_TARGETS, that single configure will build the builtins for
all Darwin targets, exactly the same way that the default target would.
The only difference between the configuration for the default target and
the x86_64-apple-darwin triple is that the latter runs the configuration
with `-DCOMPILER_RT_DEFAULT_TARGET_ONLY=ON`, but that makes no
difference for Apple targets (none of the CMake codepaths which have
different behavior based on that variable are run for Apple targets).
I tested this by running two builtins builds on my Mac, one with the
default target and one with the x86_64-apple-darwin19.5.0 target (which
is the default target triple for my clang). The only relevant
CMakeCache.txt difference was the following, and as discussed above, it
has no effect on the actual build for Apple targets:
```
-//Default triple for which compiler-rt runtimes will be built.
-COMPILER_RT_DEFAULT_TARGET_TRIPLE:STRING=x86_64-apple-darwin19.5.0
+//No help, variable specified on the command line.
+COMPILER_RT_DEFAULT_TARGET_ONLY:UNINITIALIZED=ON
```
Furthermore, when I add the `-D` flag to compiler-rt's libtool
invocations, the libraries produced by the two builds are *identical*.
If anything, I would expect builtin_register_target to complain if you
tried specifying a triple for a particular Apple platform triple (e.g.
macosx), since that's the scenario in which it won't work as you want.
The generic darwin triple should be fine though, as best as I can tell.
I'm happy to add the error for specific Apple platform triples, either
in this diff or in a follow-up.
Reviewed By: phosek
Differential Revision: https://reviews.llvm.org/D86313
Known bits for G_ANYEXT was incorrectly using KnownBits::zext, causing
us to treat the high bits as zero even though they're (by definition)
unknown.
Differential Revision: https://reviews.llvm.org/D86323
Assuming this is used to split a memory access into smaller pieces,
the new access should still have the same aliasing properties as the
original memory access. As far as I can tell, this wasn't
intentionally dropped. It may be necessary to drop this if you are
moving the operand outside of the bounds of the original object in
such a way that it may alias another IR object, but I don't think any
of the existing users are doing this. Some of the uses widen into
unused alignment padding, which I think is OK.
Custom lower and widen odd sized loads up to the alignment. The
default set of legalization actions doesn't have a way to represent
this. This fixes naturally aligned <3 x s8> and <3 x s16> loads.
This also starts moving towards eliminating the buggy and
overcomplicated legalization rules for narrowing. All the memory size
changes should be done in the lower or custom action, not NarrowScalar
/ FewerElements. These currently have redundant and ambiguous code
with the lower action.
The SGPR spills happen in SILowerSGPRSpills() and allSGPRSpillsAreDead()
make sure there are no SGPR spills pending during PEI. But the FP/BP
spills happen during PEI and are exceptions.
Use actual frame indices of FP/BP in allSGPRSpillsAreDead() to
accommodate the exceptions.
Differential Revision: https://reviews.llvm.org/D86291
This patch is the initial support for the General Dynamic Thread Local
Local Storage model to produce code sequence and relocations correct
to the ABI for the model when using PC relative memory operations.
Patch by: NeHuang
Reviewed By: stefanp
Differential Revision: https://reviews.llvm.org/D82315
Move fixed length SDIV tests from sve-fixed-length-int-arith.ll to sve-fixed-length-int-div.ll. The former uses CHECK lines that verify legalization decisions. That's overkill for the i8/i16 SDIV tests, since they have a tricky legalization.
Then it is trivial to make the output indented (the second parameter of
json::OStream::OStream specifies the indentation).
Reviewed By: jhenderson, echristo
Differential Revision: https://reviews.llvm.org/D86045
There are no nxv16i8/nxv8i16 SDIV instructions, so these fixed width operations must be promoted to nxv4i32.
Differential Revision: https://reviews.llvm.org/D86114
This ensures that we never encode an instruction which is unavailable,
such as if we explicitly insert a forbidden instruction when lowering.
This is particularly important on RISC-V given its high degree of
modularity, and will become increasingly important as new standard
extensions appear.
Reviewed By: asb, lenary
Differential Revision: https://reviews.llvm.org/D85015
Currently we don't do anything about these,
neither in InstCombine, nor in SimplifyCFG's sinking.
These happen exceedingly rarely, but i've seen them in the cases where
PHI-aware aggregate reconstruction would have fired if not for them.
This exposes the module optimization pipeline as a pass that can be
applied stand-alone when using 'opt'. This helps ml inliner training
scenarios, where we start with IR captured right before inlining,
perform the inlining (-scc-oz-module-inliner) and then want to continue
and observe the final IR (where this patch comes into play). We can then
apply llc on the resulting IR to continue compilation down to native.
Differential Revision: https://reviews.llvm.org/D86224
The normal scheme for tail folding reductions is to use:
loop:
p = phi(0, a)
mask = ...
x = masked_load(..., mask)
a = add(x, p)
s = select(mask, a, p)
This means we need to keep the register p and a alive out of the loop, plus
the mask. On a target with predicated operations we can instead generate
the phi as p = phi(0, s). This ensures the select in the loop and we can
fold select(m, add(a, b), c) to something like a vaddt c, a, b using the
m predicate. This in turn allows us to tail predicate the entire loop.
Differential Revision: https://reviews.llvm.org/D84741
The getSrcFromCopy helper nowadays return a MachineOperand pointer,
so talking about zero_reg was incorrect as it nowadays return
a nullptr when not finding a copy like instruction.
Currently, although we handle `CallBase` case in updateImpl, we give up in initialize in the case.
That is problematic when we propagate a range from call site returned position to floating position.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D86196