Local values are constants or addresses that can't be folded into
the instruction that uses them. FastISel materializes these in a
"local value" area that always dominates the current insertion
point, to try to avoid materializing these values more than once
(per block).
https://reviews.llvm.org/D43093 added code to sink these local
value instructions to their first use, which has two beneficial
effects. One, it is likely to avoid some unnecessary spills and
reloads; two, it allows us to attach the debug location of the
user to the local value instruction. The latter effect can
improve the debugging experience for debuggers with a "set next
statement" feature, such as the Visual Studio debugger and PS4
debugger, because instructions to set up constants for a given
statement will be associated with the appropriate source line.
There are also some constants (primarily addresses) that could be
produced by no-op casts or GEP instructions; the main difference
from "local value" instructions is that these are values from
separate IR instructions, and therefore could have multiple users
across multiple basic blocks. D43093 avoided sinking these, even
though they were emitted to the same "local value" area as the
other instructions. The patch comment for D43093 states:
Local values may also be used by no-op casts, which adds the
register to the RegFixups table. Without reversing the RegFixups
map direction, we don't have enough information to sink these
instructions.
This patch undoes most of D43093, and instead flushes the local
value map after(*) every IR instruction, using that instruction's
debug location. This avoids sometimes incorrect locations used
previously, and emits instructions in a more natural order.
In addition, constants materialized due to PHI instructions are
not assigned a debug location immediately; instead, when the
local value map is flushed, if the first local value instruction
has no debug location, it is given the same location as the
first non-local-value-map instruction. This prevents PHIs
from introducing unattributed instructions, which would either
be implicitly attributed to the location for the preceding IR
instruction, or given line 0 if they are at the beginning of
a machine basic block. Neither of those consequences is good
for debugging.
This does mean materialized values are not re-used across IR
instruction boundaries; however, only about 5% of those values
were reused in an experimental self-build of clang.
(*) Actually, just prior to the next instruction. It seems like
it would be cleaner the other way, but I was having trouble
getting that to work.
This reapplies commits cf1c774d and dc35368c, and adds the
modification to PHI handling, which should avoid problems
with debugging under gdb.
Differential Revision: https://reviews.llvm.org/D91734
There were a number of tests needing updates for D91734, and I added a
bunch of LABEL directives to help track down where those had to go.
These directives are an improvement independent of the functional
patch, so I'm committing them as their own separate patch.
The existing implementation of parallel region merging applies only to
consecutive parallel regions that have speculatable sequential
instructions in-between. This patch lifts this limitation to expand
merging with any sequential instructions in-between, except calls to
unmergable OpenMP runtime functions. In-between sequential instructions
in the merged region are sequentialized in a "master" region and any
output values are broadcasted to the following parallel regions and the
sequential region continuation of the merged region.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D90909
This patch unifies the way recipes and VPValues are printed after the
transition to VPDef.
VPSlotTracker has been updated to iterate over all recipes and all
their defined values to number those. There is no need to number
values in Value2VPValue.
It also updates a few places that only used slot numbers for
VPInstruction. All recipes now can produce numbered VPValues.
This adds uses for locals introduced for new debug messages for the load store optimizer. Those locals are only used on debug statements and otherwise create unused variable warnings.
Differential Revision: https://reviews.llvm.org/D94398
These are often generated when building a vector from the reduction sums of independent vectors.
I've implemented some typical patterns from various v4f32/v4i32 based off current codegen emitted from the vectorizers, although these tests are more about tweaking some hadd style backend folds to handle whatever the vectorizers/vectorcombine throws at us...
It has multiple issues fixed by this patch:
1) It shouldn't test how llvm-readelf/yaml2obj works.
2) It should use "-NEXT" prefix for check lines.
3) It can use YAML macros, that allows to use a single YAML.
4) It should probably test the case when a group member is a null section.
Differential revision: https://reviews.llvm.org/D93753
As was mentioned in comments here:
https://reviews.llvm.org/D92636#inline-864967
we are not consistent and sometimes index things from 0, but sometimes
from 1 in warnings.
This patch fixes 2 places: messages reported for
program headers and messages reported for relocations.
Differential revision: https://reviews.llvm.org/D93805
`getUniquedSectionName(const Elf_Shdr *Sec)` assumes that
`Sec` is not `nullptr`.
I've found one place in `getUniquedSymbolName` where it is
not true (because of that we crash when trying to dump
unnamed null section symbols).
Patch fixes the crash and changes the signature of the
`getUniquedSectionName` section to accept a reference.
Differential revision: https://reviews.llvm.org/D93754
This reapplies commit rG80dee7965dffdfb866afa9d74f3a4a97453708b2.
[X86][SSE] Fold unpack(hop(),hop()) -> permute(hop())
UNPCKL/UNPCKH only uses one op from each hop, so we can merge the hops and then permute the result.
REAPPLIED with a fix for unary unpacks of HOP.
Changes in this patch:
- When lowering floating-point masked gathers, cast the result of the
gather back to the original type with reinterpret_cast before returning.
- Added patterns for reinterpret_casts from integer to floating point, and
concat_vector patterns for bfloat16.
- Tests for various legalisation scenarios with floating point types.
Reviewed By: sdesmalen, david-arm
Differential Revision: https://reviews.llvm.org/D94171
This patch fixes a bug that could result in miscompiles (at least
in an OOT target). The problem could be seen by adding checks that
the DominatorTree used in BasicAliasAnalysis and ValueTracking was
valid (e.g. by adding DT->verify() call before every DT dereference
and then running all tests in test/CodeGen).
Problem was that the LegacyPassManager calculated "last user"
incorrectly for passes such as the DominatorTree when not telling
the pass manager that there was a transitive dependency between
the different analyses. And then it could happen that an incorrect
dominator tree was used when doing alias analysis (which was a pretty
serious bug as the alias analysis result could be invalid).
Fixes: https://bugs.llvm.org/show_bug.cgi?id=48709
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D94138
The tile register spill need 2 instructions.
%46:gr64_nosp = MOV64ri 64
TILESTORED %stack.2, 1, killed %46:gr64_nosp, 0, $noreg, %43:tile
The first instruction load the stride to a GPR, and the second
instruction store tile register to stack slot. The optimization of merge
spill instruction is done after register allocation. And spill tile
register need create a new virtual register to for stride, so we can't
hoist tile spill instruction in postOptimization() of register
allocation. We can't hoist TILESTORED alone and we can't hoist the 2
instuctions together because MOV64ri will clobber some GPR. This patch
is to disble the spill merge for any spill which need 2 instructions.
Differential Revision: https://reviews.llvm.org/D93898
This patch is part of a series of patches that migrate integer
instruction costs to use InstructionCost. In the function
selectVectorizationFactor I have simply asserted that the cost
is valid and extracted the value as is. In future we expect
to encounter invalid costs, but we should filter out those
vectorization factors that lead to such invalid costs.
See this patch for the introduction of the type: https://reviews.llvm.org/D91174
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2020-November/146408.html
Differential Revision: https://reviews.llvm.org/D92178
This reverts commit 8e3e148c
This commit fixes two issues with the original patch:
* The sanitizer build bot reported an uninitialized value. This was caused by normalizeStringIntegral not returning None on failure.
* Some build bots complained about inaccessible keypaths. To mitigate that, "this->" was added back to the keypath to restore the previous behavior.
We did not have specific costs for larger than legal truncates that were
not otherwise cheap (where they were next to stores, for example). As
MVE does not have a dedicated instruction for them (and we do not use
loads/stores yet), they should be expensive as they get expanded to a
series of lane moves.
Differential Revision: https://reviews.llvm.org/D94260
The Pseudo class sets isCodeGenOnly=1 which causes the asm strings
in the pseudos to be ignored. I think this is why the aliases are
needed at all.
Reviewed By: frasercrmck
Differential Revision: https://reviews.llvm.org/D94024
PreFixupPasses better reflects when these passes will run.
A future patch will (re)introduce a PostAllocationPasses list that will run
after allocation, but before JITLinkContext::notifyResolved is called to notify
the rest of the JIT about the resolved symbol addresses.
The size of spill/reload may be unknown for scalable vector types.
When the size is unknown, print it as "Unknown-size" instead of a very
large number.
Differential Revision: https://reviews.llvm.org/D94299
Loop peeling as a last step triggers loop simplification and this
can change the loop structure. As a result all cashed values like
latch branch becomes invalid.
Patch re-structure the code to take into account the possible
changes caused by peeling.
Reviewers: dmgreen, Meinersbur, etiotto, fhahn, efriedma, bmahjour
Reviewed By: Meinersbur, fhahn
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D93686
This patch moves all but the BaseInstr to bits in TSFlags.
For the index fields, we can just use a bit to indicate their presence.
The locations of the operands are well defined.
This reduces the llc binary by about 32K on my build. It also
removes the binary search of the table from the custom inserter.
Instead we just check that the SEW op is present.
Reviewed By: rogfer01
Differential Revision: https://reviews.llvm.org/D94375
We are checking the unsafe-fp-math for sqrt but not for fpow, which behaves inconsistent.
As the direction is to remove this global option, we need to remove the unsafe-fp-math
check for sqrt and update the test with afn fast-math flags.
Reviewed By: Spatel
Differential Revision: https://reviews.llvm.org/D93891
This is a resubmit of dd6bb367 (which was reverted due to stage2 build failures in 7c63aac), with the additional restriction added to the transform to only consider outer most loops.
As shown in the added test case, ensuring LCSSA is up to date when deleting an inner loop is tricky as we may actually need to remove blocks from any outer loops, thus changing the exit block set. For the moment, just avoid transforming this case. I plan to return to this case in a follow up patch and see if we can do better.
Original commit message follows...
The basic idea is that if SCEV can prove the backedge isn't taken, we can go ahead and get rid of the backedge (and thus the loop) while leaving the rest of the control in place. This nicely handles cases with dispatch between multiple exits and internal side effects.
Differential Revision: https://reviews.llvm.org/D93906
This patch introduces a helper class SubsequentDelim to simplify loops
that generate a comma-separated lists.
For example, consider the following loop, taken from
llvm/lib/CodeGen/MachineBasicBlock.cpp:
for (auto I = pred_begin(), E = pred_end(); I != E; ++I) {
if (I != pred_begin())
OS << ", ";
OS << printMBBReference(**I);
}
The new class allows us to rewrite the loop as:
SubsequentDelim SD;
for (auto I = pred_begin(), E = pred_end(); I != E; ++I)
OS << SD << printMBBReference(**I);
where SD evaluates to the empty string for the first time and ", " for
subsequent iterations.
Unlike interleaveComma, defined in llvm/include/llvm/ADT/STLExtras.h,
SubsequentDelim can accommodate a wider variety of loops, including:
- those that conditionally skip certain items,
- those that need iterators to call getSuccProbability(I), and
- those that iterate over integer ranges.
As an example, this patch cleans up MachineBasicBlock::print.
Differential Revision: https://reviews.llvm.org/D94377
For now, `*_STANDALONE_BUILD` is set to ON even if they're built along
with LLVM because of issues mentioned in the comments. This can cause some issues.
For example, if we build OpenMP along with LLVM, we'd like to copy those OpenMP
headers to `<prefix>/lib/clang/<version>/include` such that `clang` can find
those headers without using `-I <prefix>/include` because those headers will be
copied to `<prefix>/include` if it is built standalone.
In this patch, we fixed the dependence issue in OpenMP such that it can be built
correctly even with `OPENMP_STANDALONE_BUILD=OFF`. The issue is in the call to
`add_lit_testsuite`, where `clang` and `clang-resource-headers` are passed as
`DEPENDS`. Since we're building OpenMP along with LLVM, `clang` is set by CMake
to be the C/C++ compiler, therefore these two dependences are no longer needed,
where caused the dependence issue.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D93738
This patch added `openmp` to `LLVM_ALL_RUNTIMES` so that when the CMake argument `LLVM_ENABLE_RUNTIMES=all`, OpenMP can also be built.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D94369
A severe compile-time slowdown from this call is noted in:
https://llvm.org/PR48689
My naive fix was to put it under LLVM_DEBUG ( 267ff79 ),
but that's not limiting in the way we want.
This is a quick fix (or we could just remove the call completely
and rely on some later pass to discover potentially wrong IR?).
A bigger/better fix would be to improve/limit verifyFunction()
as noted in:
https://llvm.org/PR47712
Differential Revision: https://reviews.llvm.org/D94328