This adds support for functions outlined by the IR Outliner to be
recognized by the debugger. The expected behavior is that the debugger
will skip over the instructions in the outlined section, because we
cannot say which of the original locations those instructions came from.
These functions will show up in the call stack, but you cannot step
through them.
Reviewers: paquette, vsk, djtodoro
Differential Revision: https://reviews.llvm.org/D87302
This adjusts how the gather/scatter lowering pass passes data around
and where certain gathers/scatters are created. It should not
affect code generation on its own, but allows other patches to more
clearly reason about the code.
A number of extra test cases were also added for smaller gathers/
scatters that can be extended, and some of the test comments were
updated.
As was reported on PR50620, the X86LbrCounter destructor was double-closing the file descriptor and not unmapping the buffer.
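As a rough sketch of the cleanup ordering the fix implies (member names and the POSIX calls below are assumptions for illustration, not the actual X86LbrCounter code):
```cpp
#include <cstddef>
#include <sys/mman.h>
#include <unistd.h>

// Hypothetical sketch: release the mapping and the descriptor exactly once,
// and reset the members so repeated cleanup cannot double-close the fd or
// re-unmap the buffer.
struct CounterResources {
  void *Buffer = nullptr;
  size_t BufferSize = 0;
  int FD = -1;

  ~CounterResources() {
    if (Buffer && Buffer != MAP_FAILED) {
      munmap(Buffer, BufferSize);
      Buffer = nullptr;
    }
    if (FD >= 0) {
      close(FD);
      FD = -1;
    }
  }
};
```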
Differential Revision: https://reviews.llvm.org/D104201
Loops with irreducible cycles may loop infinitely. Those cannot be
removed unless the loop/function is marked as mustprogress.
Also discussed in D103382.
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D104238
The const qualifier was a hangover from an earlier iteration that allowed
wrapper functions to return pointers to const memory. This feature has
been removed, so there's no reason for this to be const any more, and
removing it eliminates const-cast warnings.
Replace the existing WrapperFunctionResult type in
llvm/include/ExecutionEngine/Orc/Shared/TargetProcessControlTypes.h with a
version adapted from the ORC runtime's implementation.
Also introduce the SimplePackedSerialization scheme (also adapted from the ORC
runtime's implementation) for wrapper functions to avoid manual serialization
and deserialization for calls to runtime functions involving common types.
This commit mostly just replaces bad uses of `NDEBUG` with uses of
`LLVM_ENABLE_ABI_BREAKING_CHANGES` - the safe way to include ABI
breaking changes (normally extra struct elements in headers).
Differential Revision: https://reviews.llvm.org/D104216
Much like `mulx`'s `WriteIMulH`, there are two outputs of
AVX2 GATHER instructions. This was changed back in rL160110,
but the sched model change wasn't present.
So right now, for sched models that are marked as complete
(`znver3` only now), codegen'ning `GATHER` results in a crash:
```
DefIdx 1 exceeds machine model writes for early-clobber renamable $ymm3, dead early-clobber renamable $ymm2 = VPGATHERDDYrm killed renamable $ymm3(tied-def 0), undef renamable $rax, 4, renamable $ymm0, 0, $noreg, killed renamable $ymm2(tied-def 1) :: (load 32, align 1)
```
https://godbolt.org/z/Ks7zW7WGh
I'm guessing we need to deal with this like we deal with `WriteIMulH`.
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D104205
This patch fixes the logic that checks for variadic register definitions.
Before llvm-svn 348114 (commit 4cf35b4ab0b), it was not possible to explicitly
mark variadic operands as definitions. By default, variadic operands of an
MCInst were always assumed to be uses. A number of ad-hoc checks were
introduced in the InstrBuilder to fix the processing of variadic register
operands of ARM ldm/stm variants.
This patch simply replaces those old (and buggy) checks with a much simpler (and
correct) check for MCID::Flag::VariadicOpsAreDefs.
This has been unnecessary since r352353 removed GraphTraits
specializations for Type, except that a couple of other headers were
accidentally relying on this declaration.
Differential Revision: https://reviews.llvm.org/D104119
This patch is to address https://bugs.llvm.org/show_bug.cgi?id=50459.
YAML:455:28: error: GUID strings are 38 characters long
The valid format for a GUID is {XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX}
where X is a hex digit (0,1,2,3,4,5,6,7,8,9,A,B,C,D,E,F).
The length of the individual components must be: 8, 4, 4, 4, 12.
For some cases, the converted string generated by obj2yaml does not
comply with those lengths. yaml2obj checks that the GUID string must
be 38 characters including the dashes and braces.
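A minimal sketch of the length/format check being described (illustrative only, not the actual yaml2obj implementation):
```cpp
#include <cctype>
#include <string>

// Hypothetical check: "{XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX}" is exactly
// 38 characters, with braces, dashes after the 8/4/4/4 groups, and hex
// digits everywhere else.
static bool isValidGUIDString(const std::string &S) {
  if (S.size() != 38 || S.front() != '{' || S.back() != '}')
    return false;
  for (size_t I = 1; I + 1 < S.size(); ++I) {
    bool ExpectDash = (I == 9 || I == 14 || I == 19 || I == 24);
    if (ExpectDash ? S[I] != '-'
                   : !std::isxdigit(static_cast<unsigned char>(S[I])))
      return false;
  }
  return true;
}
```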
Reviewed By: amccarth
Differential Revision: https://reviews.llvm.org/D103089
Changing vector element type doesn't work for v6i32->v6i16 now
that v6i32 is an MVT and v6i16 is not.
I would like to fix this in changeVectorElementType, but you
need an LLVMContext to call getVectorVT, which we can't get from
an MVT.
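A small sketch of the constraint being described (assuming the standard EVT API; this helper is hypothetical and not part of the patch):
```cpp
#include "llvm/CodeGen/ValueTypes.h"
using namespace llvm;

// Hypothetical helper: producing a type like v6i16, which has no MVT, must go
// through EVT::getVectorVT, and that requires an LLVMContext - something a
// bare MVT cannot supply on its own.
static EVT changeElementTypeViaEVT(LLVMContext &Ctx, EVT VT, EVT NewEltVT) {
  return EVT::getVectorVT(Ctx, NewEltVT, VT.getVectorElementCount());
}
```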
Fixes PR50709.
Export `lq`, `stq`, `lqarx` and `stqcx.` in preparation for implementing 16-byte lock-free atomic operations on AIX.
Add a new register class `g8prc` for these instructions, since they require an even-odd register pair.
Reviewed By: nemanjai, jsji, #powerpc
Differential Revision: https://reviews.llvm.org/D103010
If the flag is not set, the script saved_model_aot_compile.py in tensorflow will
default it to the correct value. However, in TF 2.5, the way the value is set in
the TensorFlowCompile.cmake file triggers a build error.
Reviewed By: mtrofin
Differential Revision: https://reviews.llvm.org/D103972
One nice feature of the os_signpost API is that format string
substitutions happen in the consumer, not the logging
application. LLVM's current Signpost class doesn't take advantage of
this though and instead always uses a static "Begin/End %s" format
string.
This patch uses variadic macros to allow the API to be used as
intended. Unfortunately, the primary use-case I had in mind (the
LLDB_SCOPED_TIMER() macro) does not get much better from this, because
__PRETTY_FUNCTION__ is *not* a macro, but a static string, so
signposts created by LLDB_SCOPED_TIMER() still use a static "%s"
format string. At least LLDB_SCOPED_TIMERF() works as intended.
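A generic illustration of the macro mechanics involved (this is not the actual Signpost or LLDB_SCOPED_TIMERF definition, just the shape of it):
```cpp
#include <cstdio>

// Hypothetical sketch: a variadic macro can forward the real format string
// and its arguments, so the consumer sees "processing %d items" plus the raw
// value. __PRETTY_FUNCTION__ is a variable, not a macro, so it can only be
// passed as a runtime argument to a fixed "%s" format.
#define SCOPED_TIMERF(FMT, ...) std::printf("begin: " FMT "\n", ##__VA_ARGS__)

void example(int N) {
  SCOPED_TIMERF("processing %d items", N);
  SCOPED_TIMERF("%s", __PRETTY_FUNCTION__);
}
```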
This reapplies the previously reverted patch with additional include
order fixes for non-modular builds of LLDB.
Differential Revision: https://reviews.llvm.org/D103575
Currently, loop strength reduction does not handle loops with a scalable stride very well.
Take a loop vectorized with the scalable vector type <vscale x 8 x i16>, for instance
(refer to test/CodeGen/AArch64/sve-lsr-scaled-index-addressing-mode.ll added).
Memory accesses are incremented by "16*vscale", while the induction variable is incremented
by "8*vscale". The scaling factor "2" needs to be extracted to build the candidate formula,
i.e. "reg(%in) + 2*reg({0,+,(8 * %vscale)})", so that the addrec register reg({0,+,(8*vscale)})
can be reused among Address and ICmpZero LSRUses to enable optimal solution selection.
This patch allows LSR's getExactSDiv to recognize special cases like "C1*X*Y /s C2*X*Y"
and to pull out "C1 /s C2" as the scaling factor whenever possible. Without this change, LSR
is missing the candidate formula with the proper scale factor to leverage the target's
scaled-index addressing mode.
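Schematically, the new special case behaves like the following factor cancellation (plain integers and strings stand in for SCEV constants and symbolic factors; purely illustrative):
```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical sketch of "C1*X*Y /s C2*X*Y -> C1 /s C2": cancel the symbolic
// factors shared by dividend and divisor, then do an exact signed division of
// the remaining constants, e.g. (16 * vscale) /s (8 * vscale) == 2, the scale
// factor in the example above. Real LSR performs this on SCEV expressions.
static bool exactDivOfProducts(long C1, std::vector<std::string> LHS,
                               long C2, std::vector<std::string> RHS,
                               long &Quotient) {
  for (const std::string &F : RHS) {
    auto It = std::find(LHS.begin(), LHS.end(), F);
    if (It == LHS.end())
      return false; // a divisor factor is missing from the dividend
    LHS.erase(It);
  }
  if (!LHS.empty() || C2 == 0 || C1 % C2 != 0)
    return false; // not an exact constant quotient
  Quotient = C1 / C2;
  return true;
}
```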
Note: This patch doesn't fully fix AArch64 isLegalAddressingMode for scalable
vectors, but it allows simple valid scales to pass through.
Reviewed By: sdesmalen
Differential Revision: https://reviews.llvm.org/D103939
One nice feature of the os_signpost API is that format string
substitutions happen in the consumer, not the logging
application. LLVM's current Signpost class doesn't take advantage of
this though and instead always uses a static "Begin/End %s" format
string.
This patch uses variadic macros to allow the API to be used as
intended. Unfortunately, the primary use-case I had in mind (the
LLDB_SCOPED_TIMER() macro) does not get much better from this, because
__PRETTY_FUNCTION__ is *not* a macro, but a static string, so
signposts created by LLDB_SCOPED_TIMER() still use a static "%s"
format string. At least LLDB_SCOPED_TIMERF() works as intended.
This reapplies the previously reverted patch with additional MachO.h
macro #undefs.
Differential Revision: https://reviews.llvm.org/D103575
Iff we have `SCALAR_TO_VECTOR` (and we demand its only defined 0'th element),
and said scalar was produced by `EXTRACT_VECTOR_ELT` from the 0'th element
of some vector, then we can just continue traversal into said source vector.
This comes up in X86 vector uniform shift lowering.
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D104250
The code in fixLdsBranchVmemWARHazard looks for patterns of a vmem/lds
access followed by a branch, followed by an lds/vmem access.
Handling the hazard requires processing an arbitrary number of instructions.
In the worst case, where a function has a vmem access but no lds
accesses, all instructions are examined only to conclude that the hazard
cannot occur.
Add a pre-processing stage that detects whether both lds and vmem accesses are
present in the function, and only then do the more costly search.
This patch significantly improves compilation time in cases where the hazard
cannot happen. In one pathological case I looked at, IsHazardInst was needlessly
called 88.6 million times.
The numbers could also be improved by introducing a map around the
inner calls to ::getWaitStatesSince in fixLdsBranchVmemWARHazard, but
nothing will beat not running fixLdsBranchVmemWARHazard at all in the cases
detected by shouldRunLdsBranchVmemWARHazardFixup().
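The shape of the pre-check is roughly as follows (a sketch only; the in-tree shouldRunLdsBranchVmemWARHazardFixup() may use different instruction predicates):
```cpp
#include "SIInstrInfo.h"
#include "llvm/CodeGen/MachineBasicBlock.h"
#include "llvm/CodeGen/MachineFunction.h"
using namespace llvm;

// Hypothetical sketch: one linear scan that stops as soon as both an LDS (DS)
// access and a VMEM access have been seen. If the function lacks either kind,
// the WAR hazard cannot occur and the expensive search is skipped entirely.
static bool hasLdsAndVmemAccess(const MachineFunction &MF) {
  bool HasLds = false, HasVmem = false;
  for (const MachineBasicBlock &MBB : MF) {
    for (const MachineInstr &MI : MBB) {
      HasLds |= SIInstrInfo::isDS(MI);
      HasVmem |= SIInstrInfo::isVMEM(MI);
      if (HasLds && HasVmem)
        return true;
    }
  }
  return false;
}
```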
Differential Revision: https://reviews.llvm.org/D104219
Emphasize that this is basically an attempt to remove
``PointerType::getElementType`` and ``Type::getPointerElementType()``.
Add a couple more subtasks.
Differential Revision: https://reviews.llvm.org/D104151
It assumes that PointerType will keep having an optional pointee type,
but we'd like to remove the pointee type in PointerType at some point.
I feel like the current implementation could be simplified anyway,
although perhaps I'm underestimating the amount of work needed
throughout BitcodeReader.
We will still need a side table to keep track of pointee types. This
will be reimplemented at some point.
This is essentially a revert of a4771e9d (which doesn't look like it was
reviewed anyway).
Reviewed By: dblaikie
Differential Revision: https://reviews.llvm.org/D103135
This change provides the option to merge and aggregate cold contexts by their last K frames instead of by the context-less name. The default K = 1 matches the existing context-less behavior.
This is for better perf tuning. The more selective merging and trimming will rely on llvm-profgen's preinliner.
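A toy sketch of what "merge by the last K frames" means (the frame lists and map used here are assumptions for illustration, not llvm-profgen's actual representation):
```cpp
#include <algorithm>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Hypothetical sketch: aggregate cold-context counts keyed by their trailing
// K frames; K == 1 degenerates to merging by the leaf frame alone, i.e. the
// context-less behavior described above.
static std::map<std::vector<std::string>, std::uint64_t>
mergeColdContexts(
    const std::map<std::vector<std::string>, std::uint64_t> &Contexts,
    size_t K) {
  std::map<std::vector<std::string>, std::uint64_t> Merged;
  for (const auto &[Frames, Count] : Contexts) {
    size_t Keep = std::min(K, Frames.size());
    std::vector<std::string> Key(Frames.end() - Keep, Frames.end());
    Merged[Key] += Count;
  }
  return Merged;
}
```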
Reviewed By: wenlei, hoy
Differential Revision: https://reviews.llvm.org/D104131
This patch adds support for loading and storing unaligned vectors via an
equivalently-sized i8 vector type, which has support in the RVV
specification for byte-aligned access.
This offers a more optimal path for handling of unaligned fixed-length
vector accesses, which are currently scalarized. It also prevents
crashing when `LegalizeDAG` sees an unaligned scalable-vector load/store
operation.
Future work could be to investigate loading/storing via the largest
vector element type for the given alignment, in case that would be more
optimal on hardware. For instance, a 4-byte-aligned nxv2i64 vector load
could be loaded as nxv4i32 instead of as nxv16i8.
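A sketch of the heuristic that future work describes, picking the widest element width the known alignment supports (illustrative only, not part of this patch):
```cpp
#include <cstdint>

// Hypothetical helper: given the access alignment in bytes, return the widest
// element width (in bits) usable for the access, e.g. 4-byte alignment yields
// 32-bit elements, so an nxv2i64 load could be done as nxv4i32 rather than
// falling all the way back to nxv16i8.
static unsigned widestElementBitsForAlignment(uint64_t AlignBytes,
                                              unsigned MaxElementBits = 64) {
  unsigned Bits = 8;
  while (Bits * 2 <= MaxElementBits && AlignBytes * 8 >= Bits * 2)
    Bits *= 2;
  return Bits;
}
```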
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D104032
We canonicalized to these select patterns (poison-safe logic)
with D101191, so we need to reduce 'not' ops when possible
as we would with 'and'/'or' instructions.
This is shown in a secondary example in:
https://llvm.org/PR50389
https://alive2.llvm.org/ce/z/BvsESh
Currently, irreducible cycles in loops are ignored. The irreducible
cycle in irreducible_subloop_no_mustprogress may loop infinitely; this is
allowed, so the loop should not be removed.
Discussed in D103382.
As Eli mentioned post-commit in D103378, the result of the freeze may
still be out-of-range according to Alive2. So for now, just limit the
transform to indices that are non-poison.
6e5628354e22f3ca40b04295bac540843b8e6482 regressed the Windows build as
the return type no longer matched in both branches for the return value
type deduction. This uses a bit more compiler magic to deal with that.
Given a vecreduce_add node, detect the pattern below and convert it to a node
sequence with UABDL, [S|U]ABD and UADDLP.
i32 vecreduce_add(
  v16i32 abs(
    v16i32 sub(
      v16i32 [sign|zero]_extend(v16i8 a), v16i32 [sign|zero]_extend(v16i8 b))))
=================>
i32 vecreduce_add(
  v4i32 UADDLP(
    v8i16 add(
      v8i16 zext(
        v8i8 [S|U]ABD low8:v16i8 a, low8:v16i8 b),
      v8i16 zext(
        v8i8 [S|U]ABD high8:v16i8 a, high8:v16i8 b))))
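For reference, this is roughly the kind of source loop that the vectorizer turns into the input pattern above (an assumed example, shown only to illustrate what the combine targets):
```cpp
#include <cstdint>
#include <cstdlib>

// Hypothetical example: a sum of absolute differences over two byte arrays.
// After vectorization, the loop body becomes
// vecreduce_add(abs(sub(zext(a), zext(b)))), which this combine re-expresses
// with ABD/UADDLP-style nodes.
int32_t sumAbsDiff(const uint8_t *A, const uint8_t *B, int N) {
  int32_t Sum = 0;
  for (int I = 0; I < N; ++I)
    Sum += std::abs((int32_t)A[I] - (int32_t)B[I]);
  return Sum;
}
```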
Differential Revision: https://reviews.llvm.org/D104042
The sorting, obviously, must be stable, otherwise we will have random
assembly fluctuations. Apparently there was no test coverage that would
benefit from that, so I've added one test.
The sorting consists of two parts - just sort the input vectors,
and recompute the shuffle mask -> input vector mapping.
I don't believe we need to do anything else.
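A stripped-down sketch of those two steps, stable-sorting the operands and then remapping mask entries to the new operand order (the container types here are placeholders, not the actual SLP data structures):
```cpp
#include <algorithm>
#include <numeric>
#include <utility>
#include <vector>

// Hypothetical sketch: sort operands by some key with std::stable_sort so
// equal keys keep their original relative order (no random assembly
// fluctuations), then rewrite each shuffle-mask entry, encoded as
// (operand index, lane), to point at the operand's new position.
static void sortOperandsAndRemapMask(std::vector<int> &OperandKeys,
                                     std::vector<std::pair<int, int>> &Mask) {
  const int NumOps = (int)OperandKeys.size();
  std::vector<int> Order(NumOps);
  std::iota(Order.begin(), Order.end(), 0);
  std::stable_sort(Order.begin(), Order.end(), [&](int A, int B) {
    return OperandKeys[A] < OperandKeys[B];
  });

  // NewPos[old operand index] = new operand index after sorting.
  std::vector<int> NewPos(NumOps);
  for (int I = 0; I < NumOps; ++I)
    NewPos[Order[I]] = I;

  std::vector<int> SortedKeys(NumOps);
  for (int I = 0; I < NumOps; ++I)
    SortedKeys[I] = OperandKeys[Order[I]];
  OperandKeys = std::move(SortedKeys);

  for (auto &Entry : Mask)
    Entry.first = NewPos[Entry.first];
}
```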
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D104187
Ensure that we provide a `Module` when checking if a rename of an intrinsic is necessary.
This fixes the issue that was detected by https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=32288
(as mentioned by @fhahn), after committing D91250.
Note that `LLVMIntrinsicCopyOverloadedName` is being deprecated in favor of `LLVMIntrinsicCopyOverloadedName2`.
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D99173