InstCombine canonicalizes X>C && X<C' style comparisons into
(X+C1)<C2. This type of expression is recognized by some analyses
like LVI, but currently not when used inside assumptions, because
AssumptionCache does not track affected values for it.
This extends the existing x86-64-varargs test by passing enough
arguments that they need to be passed in memory, and by passing them in
reverse order, using va_arg for each argument to retrieve them and
restoring them to the correct order, and by using va_copy to have two
va_lists to use with va_arg.
If a faux shuffle uses smaller shuffle inputs, try to recursively combine with those inputs directly instead of widening them immediately. Then widen all smaller inputs at the bottom of the recursion.
This will still mean we're generating nodes on the fly (PR45974) even if we don't combine to a new shuffle but it does help AVX2+ targets combine across xmm/ymm/zmm types, mainly as variable shuffles.
This recommits a87fccb3ff9c with a fix to mark the destination operand
of the marker instruction as def, to fix a machine verifier failure.
This reverts the revert commit c0f2cea7c0afc7c9688e1633f2a9b25c8ea4a9bd.
Default expansion leads to repeated extensions/truncations to/from vXi16 which shuffle combining and demanded elts can't completely unravel.
Better just to promote (any_extend) the input and perform a vXi16 reduction.
We'll be able to remove a lot of this if we ever get decent legalization support for reduction intrinsics in SelectionDAG.
BasicAA currently handles cases like Scale*V0 + (-Scale)*V1 where
V0 != V1, but does not handle the simpler case of Scale*V with
V != 0. Add it based on an isKnownNonZero() call.
I'm not passing a context instruction for now, because the existing
approach of always using GEP1 for context could result in symmetry
issues.
Differential Revision: https://reviews.llvm.org/D93162
On macOS/arm, signature verification has kill semantics by default.
Signature verification is cached with a file's inode (actually, vnode),
and if a new executable is copied over an existing file (which reuses
the inode), the cache isn't invalidated. So when the new executable
is executed, the kernel still has the old content's signature cached
and the kills the executable because the old signatue doesn't match
the new contents (https://openradar.appspot.com/FB8914243).
As workaround, rm the desitnation files first, to ensure they have
a fresh vnode (and hence no stale cached signature) after the copy.
Part of PR46647. See also e0e334a9c1ac for a similar change.
In particular, if the successor block, which is about to get a new
predecessor block, currently only has a single predecessor,
then the bonus instructions will be directly used within said successor,
which is fine, since the block with bonus instructions dominates that
successor. But once there's a new predecessor, the IR is no longer valid,
and we don't fix it, because we only update PHI nodes.
Which means, the live-out bonus instructions must be exclusively used
by the PHI nodes in successor blocks. So we have to form trivial PHI nodes.
which will then be successfully updated to recieve cloned bonus instns.
This all works fine, except for the fact that we don't have access to
the dominator tree, and we don't ignore unreachable code,
so we sometimes do end up having to deal with some weird IR.
Fixes https://bugs.llvm.org/show_bug.cgi?id=48450
Some of the pattern matching in PPCInstrVSX.td and node lowering involving vectors assumes 64bit mode. This patch disables some of the unsafe pattern matching and lowering of BUILD_VECTOR in 32bit mode.
Reviewed By: Xiangling_L
Differential Revision: https://reviews.llvm.org/D92789
CVP currently handles switches by checking an equality predicate
on all edges from predecessor blocks. Of course, this can only
work if the value being switched over is defined in a different block.
Replace this implementation with a call to getPredicateAt(), which
also does the predecessor edge predicate check (if not defined in
the same block), but can also do quite a bit more: It can reason
about phi-nodes by checking edge predicates for incoming values,
it can reason about assumes, and it can reason about block values.
As such, this makes the implementation both simpler and more
powerful. The compile-time impact on CTMark is in the noise.
These cover cases handled by getPredicateAt(), but not by the
current implementation:
* Assumes based on context instruction.
* Value from phi node in same block (using per-pred reasoning).
* Value from non-phi node in same block (using block-val reasoning).
The getPayload/getMask/getPassThrough functions should return values
that could be composed into a masked load/store without any additional
type casts. The previous fix violated that.
Instead, convert scalar mask to a vector right before rescaling.
- Document which processors are supported by which runtimes.
- Add missing mappings for code object V2 note records
Differential Revision: https://reviews.llvm.org/D93016
The last use of isLoop was removed on Apr 29, 2002 in commit
09bbb5c015c6e40b3d45da057f955ddb7c8f8485 as part of an effort to
remove "old induction varaible cannonicalization pass built on top of
interval analysis".
AlignVectors treats all loaded/stored values as vectors of bytes,
and masks as corresponding vectors of booleans, so make getMask
produce a 1-element vector for scalars from the start.
This makes it possible to use update_llc_test_checks to manage tests
that check for incorrect x86 stack offsets. It does not yet modify any
test to make use of this new option.
The ABI demands a data16 prefix for lea in 64-bit LP64 mode, but not in
64-bit ILP32 mode. In both modes this prefix would ordinarily be
ignored, but the instructions may be changed by the linker to
instructions that are affected by the prefix.
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D93157
This adds some basic MVE masked load/store costs, notably changing the
cost of legal loads/stores to the MVECostFactor and the cost of
scalarized instructions to 8*NumElts.
Differential Revision: https://reviews.llvm.org/D86538
When it comes to the scalar cost of any predicated block, the loop
vectorizer by default regards this predication as a sign that it is
looking at an if-conversion and divides the scalar cost of the block by
2, assuming it would only be executed half the time. This however makes
no sense if the predication has been introduced to tail predicate the
loop.
Original patch by Anna Welker
Differential Revision: https://reviews.llvm.org/D86452
This introduces more flexible multiclass for declaring two flags controlling the same boolean keypath.
Compared to existing Opt{In,Out}FFlag multiclasses, the new syntax makes it easier to read option declarations and reason about the keypath.
This also makes specifying common properties of both flags possible.
I'm open to suggestions on the class names. Not 100% sure the benefits are worth the added complexity.
Depends on D92774.
Reviewed By: dexonsmith
Differential Revision: https://reviews.llvm.org/D92775
We don't need to always generate `-f[no-]experimental-new-pass-manager`.
This patch does not change the behavior of any other command line flag. (For example `-triple` is still being always generated.)
Reviewed By: dexonsmith, Bigcheese
Differential Revision: https://reviews.llvm.org/D92857
gcov computes the line execution count as the sum of (a) counts from
predecessors on other lines and (b) the sum of loop execution counts of blocks
on the same line (think of loops on one line).
For (b), we use Donald B. Johnson's cycle enumeration algorithm and perform
cycle cancelling for each cycle. This number of candidate cycles were
exponential and D93036 made it polynomial by skipping zero count cycles. The
time complexity is high (O(V*E^2) (it could be O(E^2) but the linear `Blocks`
check made it higher) and the implementation is complex.
We could just identify loops and sum all back edges. However, this requires a
dominator tree construction which is more complex. The time complexity can be
decreased to almost linear, though.
This patch just performs cycle cancelling iteratively. Add two members
`traversable` and `incoming` to GCOVArc. There are 3 states:
* `!traversable`: blocks not on this line or explored blocks
* `traversable && incoming == nullptr`: unexplored blocks
* `traversable && incoming != nullptr`: blocks which are being explored (on the stack)
If an arc points to a block being explored, a cycle has been found.
Let E be the number of arcs. Every time a cycle is found, at least one arc is
saturated (`edgeCount` reduced to 0), so there are at most E cycles. Finding one
cycle takes O(E) time, so the overall time complexity is O(E^2). Note that we
always augment through a back edge and never need to augment its reverse edge so
reverse edges in traditional flow networks are not needed.
Reviewed By: xinhaoyuan
Differential Revision: https://reviews.llvm.org/D93073
The performance improvement on LBM previously achieved with improved software
prefetching (36d4421) have gone lost recently with e00f189. There now is one
memory access in the loop that LoopDataPrefetch cannot handle (while before
there was none) which the heuristic rejects.
This patch adds a small margin by allowing 1 non-prefetched memory access for
every 32 prefetched ones, so that the heuristic doesn't bail in this type of
case.
Review: Ulrich Weigand
Differential Revision: https://reviews.llvm.org/D92985
Summary:
"Speculative fix for link failure on bots" with a mention of "the clang-ppc64le-rhel bot fails on link: http://lab.llvm.org:8011/#/builders/57/builds/2307/steps/6/logs/stdio".
PPCAsmPrinter.cpp:(.text._ZN12_GLOBAL__N_116PPCAIXAsmPrinter19emitFunctionBodyEndEv+0x2f8): undefined reference to `llvm::XCOFF::getNameForTracebackTableLanguageId(llvm::XCOFF::TracebackTable::LanguageID)'
PPCAsmPrinter.cpp:(.text._ZN12_GLOBAL__N_116PPCAIXAsmPrinter19emitFunctionBodyEndEv+0x2170): undefined reference to `llvm::XCOFF::parseParmsType(unsigned int, unsigned int)'