Fix a bug in MachOObjectFile::getSymbolType() that it is not checking if
the iterator is end() before deference the iterator. Instead, return
`Other` type, which aligns with the behavior of `llvm-nm`.
rdar://75291638
Reviewed By: davide, ab
Differential Revision: https://reviews.llvm.org/D98739
This uses the shuffle mask cost from D98206 to give a better cost of MVE
VREV instructions. This helps especially in VectorCombine where the cost
of shuffles is used to reorder bitcasts, which this helps keep the phase
ordering test for fp16 reductions producing optimal code. The isVREVMask
has been moved to a header file to allow it to be used across target
transform and isel lowering.
Differential Revision: https://reviews.llvm.org/D98210
All loop passes should preserve all analyses in LoopAnalysisResults. Add
checks for those.
Note that due to PR44815, we don't check LAR's ScalarEvolution.
Apparently calling SE.verify() can change its results.
Reviewed By: asbirlea
Differential Revision: https://reviews.llvm.org/D98805
At the moment `getMCInstrBeads` is forward-declared in a few places,
bring this together into a single header file.
This was done as part of the disassembler work, since the disassembler
would otherwise add one more forward declaration.
Differential Revision: https://reviews.llvm.org/D98533
This is required because empty strings are not allowed when generating
the assembly parser tables.
Differential Revision: https://reviews.llvm.org/D98532
Maybe there's a way to make them work, but until I've investigated
if tools can consume large PDBs, erroring out is better than slowly
and silently consuming all available ram due to internal invariants
being violated.
(Patch to make writing larger files work at
https://bugs.chromium.org/p/chromium/issues/detail?id=1179085#c25
but I haven't had time to check if windbg & co can consume these
large PDBs. llvm-pdbutil can't, but we can fix that one at least :) )
Differential Revision: https://reviews.llvm.org/D98788
It is not legal to form a phi node with token type. The generic LCSSA construction code handles this correctly - by not forming LCSSA for such cases - but the adhoc fixup implementation in LICM did not.
This was noticed in the context of PR49607, but can be demonstrated on ToT with the tweaked test case. This is not specific to gc.relocate btw, it also applies to usage of the preallocated family of intrinsics as well.
Differential Revision: https://reviews.llvm.org/D98728
This adds an Mask ArrayRef to getShuffleCost, so that if an exact mask
can be provided a more accurate cost can be provided by the backend.
For example VREV costs could be returned by the ARM backend. This should
be an NFC until then, laying the groundwork for that to be added.
Differential Revision: https://reviews.llvm.org/D98206
Fixed section of code that iterated through a SmallDenseMap and added
instructions in each iteration, causing non-deterministic code; replaced
SmallDenseMap with MapVector to prevent non-determinism.
This reverts commit 01ac6d1587e8613ba4278786e8341f8b492ac941.
The `hasIrregularType` predicate checks whether an array of N values of type Ty is "bitcast-compatible" with a <N x Ty> vector.
The previous check returned invalid results in some cases where there's some padding between the array elements: eg. a 4-element array of u7 values is considered as compatible with <4 x u7>, even though the vector is only loading/storing 28 bits instead of 32.
The problem causes LLVM to generate incorrect code for some targets: for AArch64 the vector loads/stores are lowered in terms of ubfx/bfi, effectively losing the top (N * padding bits).
Reviewed By: lebedev.ri
Differential Revision: https://reviews.llvm.org/D97465
Given a sextload i16, we can usually generate "ldrsh [rn. rm]". If we
don't naturally have a rn, rm addressing mode, we can either generate
"ldrh [rn, #0]; sxth" or "mov rm, #0; ldrsh [rn. rm]".
We currently generate the first, always creating a sxth. They are both
the same number of instructions, but if we generate the second then the
mov #0 will likely be CSE'd or pulled out of a loop, etc.
This adjusts the ISel patterns to do that, creating a mov instead of a
sxth.
Differential Revision: https://reviews.llvm.org/D98693
The D93881 added functionality which preserve ownership for output file
if llvm-objcopy is called under root. That code was added into the place
where output file is created. The llvm-objcopy already has a function which
sets/restores rights/permissions for the output file.
That is the restoreStatOnFile() function. This patch moves code
(preserving ownershipping) into the restoreStatOnFile() function.
Differential Revision: https://reviews.llvm.org/D98511
This caused non-deterministic compiler output; see comment on the
code review.
> This patch updates the various IR passes to correctly handle dbg.values with a
> DIArgList location. This patch does not actually allow DIArgLists to be produced
> by salvageDebugInfo, and it does not affect any pass after codegen-prepare.
> Other than that, it should cover every IR pass.
>
> Most of the changes simply extend code that operated on a single debug value to
> operate on the list of debug values in the style of any_of, all_of, for_each,
> etc. Instances of setOperand(0, ...) have been replaced with with
> replaceVariableLocationOp, which takes the value that is being replaced as an
> additional argument. In places where this value isn't readily available, we have
> to track the old value through to the point where it gets replaced.
>
> Differential Revision: https://reviews.llvm.org/D88232
This reverts commit df69c69427dea7f5b3b3a4d4564bc77b0926ec88.
Previously NEON used a target specific intrinsic for frintn, given that
the FROUNDEVEN ISD node now exists, move over to that instead and add
codegen support for that node for both NEON and fixed length SVE.
Differential Revision: https://reviews.llvm.org/D98487
This adds the cost of an i1 extract and a branch to the cost in
getMemInstScalarizationCost when the instruction is predicated. These
predicated loads/store would generate blocks of something like:
%c1 = extractelement <4 x i1> %C, i32 1
br i1 %c1, label %if, label %else
if:
%sa = extractelement <4 x i32> %a, i32 1
%sb = getelementptr inbounds float, float* %pg, i32 %sa
%sv = extractelement <4 x float> %x, i32 1
store float %sa, float* %sb, align 4
else:
So this increases the cost by the extract and branch. This is probably
still too low in many cases due to the cost of all that branching, but
there is already an existing hack increasing the cost using
useEmulatedMaskMemRefHack. It will increase the cost of a memop if it is
a load or there are more than one store. This patch improves the cost
for when there is only a single store, and hopefully at some point in
the future the hack can be removed.
Differential Revision: https://reviews.llvm.org/D98243
Current SLP pass has this piece of code that inserts a trunc instruction
after the vectorized instruction. In the case that the vectorized instruction
is a phi node and not the last phi node in the BB, the trunc instruction
will be inserted between two phi nodes, which will trigger verify problem
in debug version or unpredictable error in another pass.
This patch changes the algorithm to 'if the last vectorized instruction
is a phi, insert it after the last phi node in current BB' to fix this problem.
This patch adds an optimization path for BUILD_VECTOR nodes where the
majority of the elements are identical. These can be splatted, with the
remaining elements patched up with INSERT_VECTOR_ELTs. The threshold can
be tweaked as required - it is currently conservative. Undef elements
are disregarded when judging the dominance of a particular element. This
allows them to be covered by the splat value.
In addition, vectors of 2 elements are always optimized to a splat (for
the upper element) and an insert at element zero.
This optimization is disabled when optimizing for size.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D98700
Split out some of the instructions predicated on the dot2-insts target
feature into a new dot7-insts, in preparation for subtargets that have
some but not all of these instructions. NFCI.
Differential Revision: https://reviews.llvm.org/D98717
BasicAA stores a reference to LoopInfo inside. This imposes an implicit
requirement of keeping it up to date whenever we modify the IR (in particular,
whenever we modify terminators of blocks that belong to loops). Failing
to do so leads to incorrect state of the LoopInfo.
Because general AA does not require loop info updates and provides to API to
update it properly, the users of AA reasonably assume that there is no need to
update the loop info. It may be a reason of bugs, as example in PR43276 shows.
This patch drops dependence of BasicAA on LoopInfo to avoid this problem.
This may potentially pessimize the result of queries to BasicAA.
Differential Revision: https://reviews.llvm.org/D98627
Reviewed By: nikic
The motivation is to handle integer min/max reductions independently
of whether they are in the current cmp+sel form or the planned intrinsic
form.
We assumed that min/max included a select instruction, but we can
decouple that implementation detail by checking the instructions
themselves rather than relying on the recurrence (reduction) type.
- Previously, https://reviews.llvm.org/D97703 was [[ https://reviews.llvm.org/D98543 | reverted ]] as it broke when building the unit tests when shared libs on.
- This patch reverts the "revert" and makes two minor changes
- The first is it also links in the MCParser lib when building the unittest. This should resolve the issue when building with with shared libs on and off
- The second renames the name of the unit test from `SystemZAsmLexer` to `SystemZAsmLexerTests` since the convention for unittest binaries is to suffix the name of the unit test with "Tests"
Reviewed By: Kai
Differential Revision: https://reviews.llvm.org/D98666
This change adds an operand class for each addressing mode, which can then
be used as part of the assembler to match instructions.
Differential Revision: https://reviews.llvm.org/D98535
Previously we created a new node, then filled in the pieces. Now, we clone the existing node, then change the respective fields. The only change in handling is with phis since we have to handle multiple incoming edges from the same block a bit differently.
Differential Revision: https://reviews.llvm.org/D98316
A broadcast is a shufflevector where only one input is used. Because of the way we handle constants (undef is a constant), the canonical shuffle sees a meet of (some value) and (nullptr). Given this, every broadcast gets treated as a conflict and a new base pointer computation is added.
The other way to tackle this would be to change constant handling specifically for undefs, but this seems easier.
Differential Revision: https://reviews.llvm.org/D98315
RS4GC needs to rewrite the IR to ensure that every relocated pointer has an associated base pointer. The existing code isn't particularly smart about avoiding duplication of existing IR when it turns out the original pointer we were asked to materialize a base pointer for is itself a base pointer.
This patch adds a stage to the algorithm which prunes nodes proven (with a simple forward dataflow fixed point) to be base pointers from the list of nodes considered for duplication. This does require changing some of the later invariants slightly, that's probably the riskiest part of the change.
Differential Revision: D98122
Add MemorySSAWrapperPass as a dependency to MemCpyOptLegacyPass,
since MemCpyOpt now uses MemorySSA by default.
Differential Revision: https://reviews.llvm.org/D98484
The main part of the patch is the change in RegAllocGreedy.cpp: Q.collectInterferringVregs()
needs to be called before iterating the interfering live ranges.
The rest of the patch offers support that is the case: instead of clearing the query's
InterferingVRegs field, we invalidate it. The clearing happens when the live reg matrix
is invalidated (existing triggering mechanism).
Without the change in RegAllocGreedy.cpp, the compiler ices.
This patch should make it more easily discoverable by developers that
collectInterferringVregs needs to be called before iterating.
I will follow up with a subsequent patch to improve the usability and maintainability of Query.
Differential Revision: https://reviews.llvm.org/D98232
If llvm so lib is dlopened and dlclosed several times, then memory leak can be observed, reported by Valgrind.
This patch fixes the issue.
Reviewed By: lattner, dblaikie
Differential Revision: https://reviews.llvm.org/D83372
This was (partially) reverted in cfe8f8e0 because the conversion from readonly to readnone in Intrinsics.td exposed a couple of problems. This change has been reworked to not need that change (via some explicit checks in client code). This is being done to address the original optimization issue and simplify the testing of the readonly changes. I'm working on that piece under 49607.
Original commit message follows:
The last two operands to a gc.relocate represent indices into the associated gc.statepoint's gc bundle list. (Effectively, gc.relocates are projections from the gc.statepoints multiple return values.)
We can use this to recognize when two gc.relocates are equivalent (and can be CSEd), even when the indices are non-equal. This is particular useful when considering a chain of multiple statepoints as it lets us eliminate all duplicate gc.relocates in a single pass.
Differential Revision: https://reviews.llvm.org/D97974
Instead of maintaining a separate map from predicated instructions to
recipes, we can instead directly look at the VP operands. If the operand
comes from a predicated instruction, the operand will be a
VPPredInstPHIRecipe with a VPReplicateRecipe as its operand.
This is a follow-up to D98588, and fixes the inline `FIXME` about a GEP-related simplification not
preserving the provenance.
https://alive2.llvm.org/ce/z/qbQoAY
Additional tests were added in {rGf125f28afdb59eba29d2491dac0dfc0a7bf1b60b}
Depends on D98672
Reviewed By: lebedev.ri
Differential Revision: https://reviews.llvm.org/D98611
This commit folds sxtw'd or uxtw'd offsets into gather loads where
possible with a DAGCombine optimization.
As an example, the following code:
1 #include <arm_sve.h>
2
3 svuint64_t func(svbool_t pred, const int32_t *base, svint64_t offsets) {
4 return svld1sw_gather_s64offset_u64(
5 pred, base, svextw_s64_x(pred, offsets)
6 );
7 }
would previously lower to the following assembly:
sxtw z0.d, p0/m, z0.d
ld1sw { z0.d }, p0/z, [x0, z0.d]
ret
but now lowers to:
ld1sw { z0.d }, p0/z, [x0, z0.d, sxtw]
ret
Differential Revision: https://reviews.llvm.org/D97858
One of (and primary) callers of isBasicBlockEntryGuardedByCond is
isKnownPredicateAt, which makes isKnownPredicate check before it.
It already makes non-recursive check inside. So, on this execution
path this check is made twice. The only other caller is
isLoopEntryGuardedByCond. Moving the check there should save some
compile time.
The InstrEmitter can sometimes insert a copy after an IMPLICIT_DEF
before connecting it to the vector instruction. This occurs when
constrainRegClass reduces to a class with less than 4 registers.
I believe LMUL8 on masked instructions triggers this since the
result can only use the v8, v16, or v24 register group as the mask
is using v0.
Reviewed By: frasercrmck
Differential Revision: https://reviews.llvm.org/D98567
This commit implements an IR-level optimization to eliminate idempotent
SVE mul/fmul intrinsic calls. Currently, the following patterns are
captured:
fmul pg (dup_x 1.0) V => V
mul pg (dup_x 1) V => V
fmul pg V (dup_x 1.0) => V
mul pg V (dup_x 1) => V
fmul pg V (dup v pg 1.0) => V
mul pg V (dup v pg 1) => V
The result of this commit is that code such as:
1 #include <arm_sve.h>
2
3 svfloat64_t foo(svfloat64_t a) {
4 svbool_t t = svptrue_b64();
5 svfloat64_t b = svdup_f64(1.0);
6 return svmul_m(t, a, b);
7 }
will lower to a nop.
This commit does not capture all possibilities; only the simple cases
described above. There is still room for further optimisation.
Differential Revision: https://reviews.llvm.org/D98033
The default promotion uses zero extends that become shifts. We
cam use sign extend instead which is better for RISCV.
I've used two different implementations based on whether we
have minu/maxu instructions.
Differential Revision: https://reviews.llvm.org/D98683