On z/OS, the error message "EDC5111I Permission denied." is not matched correctly in lit tests. This patch updates the check expression to match successfully.
Differential Revision: https://reviews.llvm.org/D94432
We have no lowering for VSELECT vXi1, vXi1, vXi1, so mark them as
expanded to turn them into a series of logical operations.
Differential Revision: https://reviews.llvm.org/D94946
Split impliesPoison into two recursive walks, one over V, the
other over ValAssumedPoison. This allows us to reason about poison
implications in a number of additional cases that are important
in practice. This is a generalized form of D94859, which handles
the cmp to cmp implication in particular.
Differential Revision: https://reviews.llvm.org/D94866
This patch factors out the "VLMax" operand passed to most
scalable-vector ISel patterns into a property of each VType.
This is seen as a preparatory change to allow RVV in the future to
more easily support fixed-length vector types with constrained vector
lengths, with the AVL operand set to the length of the fixed-length
vector. It has no effect on the scalable code generation path.
Reviewed By: HsiangKai
Differential Revision: https://reviews.llvm.org/D94594
This adds some basic MVE sadd_sat/ssub_sat/uadd_sat/usub_sat costs,
based on when the instruction is legal. With smaller than legal types
that are promoted we generate shr(qadd(shl, shl)), so the cost is 4
appropriately.
Differential Revision: https://reviews.llvm.org/D94958
The TableGen emitter for directives has two slots for flangClass information and this was mainly
to be able to keep up with the legacy openmp parser at the time. Now that all clauses are encapsulated in
AccClause or OmpClause, these two strings are not necessary anymore and were the the source of couple
of problem while working with the generic structure checker for OpenMP.
This patch remove the flangClassValue string from DirectiveBase.td and use the string flangClass as the
placeholder for the encapsulated class.
Reviewed By: sameeranjoshi
Differential Revision: https://reviews.llvm.org/D94821
When performing peephole optimization to simplify the code, after removing
passed FPSP/XSRSP instruction we will set any uses of that FRSP/XSRSP to the
source of the FRSP/XSRSP.
We are finding the machine instruction using virtual register holding FRSP/XSRSP
results by searching all following instructions and encountering an issue
that the first use of the virtual register is a debug MI causing:
1. virtual register in the debug MI removed unexpectedly.
2. virtual register used in non-debug MI not replaced with the source of
FRSP/XSRSP. which stays in a undef status.
This patch fix the issue by only searching non-debug machine instruction using
virtual register holding FRSP/XSRSP results when the vr only has one non debug
usage.
Differential Revisien: https://reviews.llvm.org/D94711
Reviewed by: nemanjai
cmake_minimum_required(VERSION) calls cmake_policy(VERSION),
which sets all policies up to VERSION to NEW.
LLVM started requiring CMake 3.13 last year, so we can remove
a bunch of code setting policies prior to 3.13 to NEW as it
no longer has any effect.
Reviewed By: phosek, #libunwind, #libc, #libc_abi, ldionne
Differential Revision: https://reviews.llvm.org/D94374
83daa49758a1 made loop-rotate more conservative in the presence of
function calls in the prepare-for-lto stage. The code did not properly
account for calls that are no actual function calls, like calls to
intrinsics. This patch updates the code to ensure only calls that are
lowered to actual calls are considered inline candidates.
This CPU supports all v8.5a features except BTI, and so identifies as v8.5a to
Clang. A bit weird, but the best way for things like xnu to detect the new
features it cares about.
Such files (Thin-%%%%%%.tmp.o) are supposed to be deleted immediately
after they're used (either by renaming or deletion). However, we've seen
instances on Windows where this doesn't happen, probably due to the
filesystem being flaky. This is effectively a resource leak which has
prevented us from using the ThinLTO cache on Windows.
Since those temporary files are in the thinlto cache directory which we
prune periodically anyway, allowing them to be pruned too seems like a
tidy way to solve the problem.
Differential revision: https://reviews.llvm.org/D94962
This patch updates the llvm module map to reflect changes made in
`24672ddea3c97fd1eca3e905b23c0116d7759ab8` and fixes the module builds
(`-DLLVM_ENABLE_MODULES=On`).
Signed-off-by: Med Ismail Bennani <medismail.bennani@gmail.com>
This patch computes the cost for vector.reduce<operand> for scalable vectors.
The cost is split into two parts: the legalization cost and the horizontal
reduction.
Differential Revision: https://reviews.llvm.org/D93639
If a srl doesn't introduce any sign bits into the truncated result, then replace with a sra to let us use a PACKSS truncation - fixes a regression noticed in D56387 on pre-SSE41 targets that don't have PACKUSDW.
This caused a miscompile in Chromium, see comments on the codereview for
discussion and pointer to a reproducer.
> InstCombine already performs a fold where X == Y ? f(X) : Z is
> transformed to X == Y ? f(Y) : Z if f(Y) simplifies. However,
> if f(X) only has one use, then we can always directly replace the
> use inside the instruction. To actually be profitable, limit it to
> the case where Y is a non-expr constant.
>
> This could be further extended to replace uses further up a one-use
> instruction chain, but for now this only looks one level up.
>
> Among other things, this also subsumes D94860.
>
> Differential Revision: https://reviews.llvm.org/D94862
This also reverts the follow-up
a003f26539cf4db744655e76c41f4c4a8913f116:
> [llvm] Prevent infinite loop in InstCombine of select statements
>
> This fixes an issue where the RHS and LHS the comparison operation
> creating the predicate were swapped back and forth forever.
>
> Differential Revision: https://reviews.llvm.org/D94934
D84108 exposed a bad interaction between inlining and loop-rotation
during regular LTO, which is causing notable regressions in at least
CINT2006/473.astar.
The problem boils down to: we now rotate a loop just before the vectorizer
which requires duplicating a function call in the preheader when compiling
the individual files ('prepare for LTO'). But this then prevents further
inlining of the function during LTO.
This patch tries to resolve this issue by making LoopRotate more
conservative with respect to rotating loops that have inline-able calls
during the 'prepare for LTO' stage.
I think this change intuitively improves the current situation in
general. Loop-rotate tries hard to avoid creating headers that are 'too
big'. At the moment, it assumes all inlining already happened and the
cost of duplicating a call is equal to just doing the call. But with LTO,
inlining also happens during full LTO and it is possible that a previously
duplicated call is actually a huge function which gets inlined
during LTO.
From the perspective of LV, not much should change overall. Most loops
calling user-provided functions won't get vectorized to start with
(unless we can infer that the function does not touch memory, has no
other side effects). If we do not inline the 'inline-able' call during
the LTO stage, we merely delayed loop-rotation & vectorization. If we
inline during LTO, chances should be very high that the inlined code is
itself vectorizable or the user call was not vectorizable to start with.
There could of course be scenarios where we inline a sufficiently large
function with code not profitable to vectorize, which would have be
vectorized earlier (by scalarzing the call). But even in that case,
there probably is no big performance impact, because it should be mostly
down to the cost-model to reject vectorization in that case. And then
the version with scalarized calls should also not be beneficial. In a way,
LV should have strictly more information after inlining and make more
accurate decisions (barring cost-model issues).
There is of course plenty of room for things to go wrong unexpectedly,
so we need to keep a close look at actual performance and address any
follow-up issues.
I took a look at the impact on statistics for
MultiSource/SPEC2000/SPEC2006. There are a few benchmarks with fewer
loops rotated, but no change to the number of loops vectorized.
Reviewed By: sanwou01
Differential Revision: https://reviews.llvm.org/D94232
This patch handles cases where we have to save/restore the link register
into the stack and and load/store instruction which use the stack are
part of the outlined region. It checks that there will be no overflow
introduced by the new offset and fixup these instructions accordingly.
Differential Revision: https://reviews.llvm.org/D92934
This fixes an issue where the RHS and LHS the comparison operation
creating the predicate were swapped back and forth forever.
Differential Revision: https://reviews.llvm.org/D94934
In addition to consistency, we'll hit a wall when 11.1.0 gets released, because
we cannot represent it with lit versioning scheme.
Differential Revision: https://reviews.llvm.org/D94157
Previously uniqueCallSite could have race conditions between different
threads. Now it is accessed with an atomic RMW and will be unique
between different threads.
Differential Revision: https://reviews.llvm.org/D94784
A previous patch has already changed getInstructionCost to return
an InstructionCost type. This patch changes the other various
getXXXCost functions to return an InstructionCost too. This is a
non-functional change - I've added a few asserts that the costs
are valid in places where we're selecting between vector call
and intrinsic costs. However, since we don't yet return invalid
costs from any of the TTI implementations these asserts should
not fire.
See this patch for the introduction of the type: https://reviews.llvm.org/D91174
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2020-November/146408.html
Differential Revision: https://reviews.llvm.org/D94065
Element sections will also need flags, so we shouldn't squat the
WASM_SEGMENT namespace.
Depends on D90948.
Differential Revision: https://reviews.llvm.org/D92315
This patch changes to make call_indirect explicitly refer to the
corresponding function table, residualizing TABLE_NUMBER relocs against
it.
With this change, wasm-ld now sees all references to tables, and can
link multiple tables.
Differential Revision: https://reviews.llvm.org/D90948
Add nounwind attribute to avoid generating cfi instructions. Also make
global buffer 64 bytes align in lit test case.
Differential Revision: https://reviews.llvm.org/D94910
As of 8dacca943af8a53a23b1caf3142d10fb4a77b645, we sign extend the atomic loaded
operand for signed subword comparisons. However, the assumption that the other
operand is correctly sign extended doesn't always hold. This patch sign extends
the other operand if it needs to be sign extended.
This is a second fix for https://bugs.llvm.org/show_bug.cgi?id=30451
Differential revision: https://reviews.llvm.org/D94058
This is a additional bug fix for c5be0e0cc0. The distance for
the spill instructions is wrong in previous patch.
Differential Revision: https://reviews.llvm.org/D94772
This patch teaches SimplifyCFG::SimplifyBranchOnICmpChain to understand select form of
(x == C1 || x == C2 || ...) / (x != C1 && x != C2 && ...) and optimize them into switch if possible.
D93065 has more context about the transition, including links to the list of optimizations being updated.
Differential Revision: https://reviews.llvm.org/D93943
The code here is checking to see if two sets are identical.
OtherBlocksSet should point to OtherL->getBlocksSet() instead.
Differential Revision: https://reviews.llvm.org/D94926
After much refactoring over the last 2 weeks to the reduction
matching code, I think this change is finally ready.
We effectively broke fmax/fmin vector reduction optimization
when we started canonicalizing to intrinsics in instcombine,
so this should restore that functionality for SLP.
There are still FMF problems here as noted in the code comments,
but we should be avoiding miscompiles on those for fmax/fmin by
restricting to full 'fast' ops (negative tests are included).
Fixing FMF propagation is a planned follow-up.
Differential Revision: https://reviews.llvm.org/D94913
This recommits 2c51bef76cbf0149101b9e7c7c658b4a58657929.
I've fixed the broken check line from when I renamed the test function.
Original commit message:
This builds on D94142 where scalable vectors are allowed in structs.
I did have to fix one scalable vector issue in the vector type
creation for these intrinsics where we used getVectorNumElements
instead of ElementCount.
This builds on D94142 where scalable vectors are allowed in structs.
I did have to fix one scalable vector issue in the vector type
creation for these intrinsics where we used getVectorNumElements
instead of ElementCount.
Differential Revision: https://reviews.llvm.org/D94149
This patch adds the default value of 1 to drop_begin.
In the llvm codebase, 70% of calls to drop_begin have 1 as the second
argument. The interface similar to with std::next should improve
readability.
This patch converts a couple of calls to drop_begin as examples.
Differential Revision: https://reviews.llvm.org/D94858