For most tables, we already use commas in headers. This set of patches
unifies dumping the remaining ones.
Differential Revision: https://reviews.llvm.org/D80806
For most tables, we already use commas in headers. This set of patches
unifies dumping the remaining ones.
Differential Revision: https://reviews.llvm.org/D80806
For most tables, we already use commas in headers. This set of patches
unifies dumping the remaining ones.
Differential Revision: https://reviews.llvm.org/D80806
Partially reverts feee98645dde4be31a70cc6660d2fc4d4b9d32d8.
Add explicit braces to a different place to fix
"error: add explicit braces to avoid dangling else [-Werror,-Wdangling-else]"
This is a reimplementation of the `orderNodes` function, as the old
implementation didn't take into account all cases.
The new implementation uses SCCs instead of Loops to take account of
irreducible loops.
Fix PR41509
Differential Revision: https://reviews.llvm.org/D79037
This improves the next points for broken hash tables:
1) Use reportUniqueWarning to prevent duplication when
--hash-table and --elf-hash-histogram are used together.
2) Dump nbuckets and nchain fields. It is often possible
to dump them even when the table itself goes past the EOF etc.
Differential revision: https://reviews.llvm.org/D80373
When a stack offset was too big to materialize in a single instruction, we were
trying to do it in stages:
adds xD, sp, #imm
adds xD, xD, #imm
Unfortunately, if xD is xzr then the second instruction doesn't exist and
wouldn't do what was needed if it did. Instead we can use a temporary register
for all but the last addition.
Relying on the find method implies a roundtrip to the iterator world, which is
not costless because iterator creation involves a few check to ensure the
iterator is in a valid position (through the SmallPtrSetIteratorImpl::AdvanceIfNotValid
method). It turns out that the result of SmallPtrSetImpl::find_imp is either
valid or the EndPointer, so there's no need to go through that abstraction,
and the compiler cannot guess it.
Differential Revision: https://reviews.llvm.org/D80708
Summary: Exploit vabsd* for for absolute difference of vectors on P9,
for example:
void foo (char *restrict p, char *restrict q, char *restrict t)
{
for (int i = 0; i < 16; i++)
t[i] = abs (p[i] - q[i]);
}
this case should be matched to the HW instruction vabsdub.
Reviewed By: steven.zhang
Differential Revision: https://reviews.llvm.org/D80271
Previously we walked the users of any vector binop looking for
more binops with the same opcode or phis that eventually ended up
in a reduction. While this is simple it also means visiting the
same nodes many times since we'll do a forward walk for each
BinaryOperator in the chain. It was also far more general than what
we have tests for or expect to see.
This patch replaces the algorithm with a new method that starts at
extract elements looking for a horizontal reduction. Once we find
a reduction we walk through backwards through phis and adds to
collect leaves that we can consider for rewriting.
We only consider single use adds and phis. Except for a special
case if the Add is used by a phi that forms a loop back to the
Add. Including other single use Adds to support unrolled loops.
Ultimately, I want to narrow the Adds, Phis, and final reduction
based on the partial reduction we're doing. I still haven't
figured out exactly what that looks like yet. But restricting
the types of graphs we expect to handle seemed like a good first
step. As does having all the leaves and the reduction at once.
Differential Revision: https://reviews.llvm.org/D79971
This matches what we do for the full sized vector ops at the start of combineX86ShufflesRecursively, and helps getFauxShuffleMask extract more INSERT_SUBVECTOR patterns.
I inverted the mask when I ported to the new form of G_PTRMASK in
8bc03d2168241f7b12265e9cd7e4eb7655709f34.
I don't think this really broke anything, since G_VASTART isn't
handled for types with an alignment higher than the stack alignment.
As discussed in PR45951:
https://bugs.llvm.org/show_bug.cgi?id=45951
There's a potential name collision between update_test_checks.py and -instnamer
and/or manually-generated IR test files because all of them try to use the
variable name that should never be used: "tmp".
This patch proposes to reduce the odds of collision and adds a warning if we
detect the problem. This will cause regression test churn when regenerating
CHECK lines on existing files.
Differential Revision: https://reviews.llvm.org/D80584
Goes with proposal in D80885.
This is adapted from the InstCombine tests that were added for
D50992
But these should be adjusted further to provide more interesting
scenarios for x86-specific codegen. Eg, vector types/sizes will
have different costs depending on ISA attributes.
We also need to add tests that include a load of the scalar
variable and add tests that include extra uses of the insert
to further exercise the cost model.
Try to prevent future node creation issues (as detailed in PR45974) by making the SelectionDAG reference const, so it can still be used for analysis, but not node creation.
As detailed on PR45974 and D79987, getFauxShuffleMask is creating nodes on the fly to create shuffles with inputs the same size as the result, causing problems for hasOneUse() checks in later simplification stages.
Currently only combineX86ShufflesRecursively benefits from these widened inputs so I've begun moving the functionality there, and out of getFauxShuffleMask. This allows us to remove the widening from VBROADCAST and *EXTEND* faux shuffle cases.
This just leaves the INSERT_SUBVECTOR case in getFauxShuffleMask still creating nodes, which will require more extensive refactoring.
In some cases ScheduleDAGRRList has to add new nodes to resolve problems
with interfering physical registers. When new nodes are added, it
completely re-computes the topological order, which can take a long
time, but is unnecessary. We only add nodes one by one, and initially
they do not have any predecessors. So we can just insert them at the end
of the vector. Later we add predecessors, but the helper function
properly updates the topological order much more efficiently. With this
change, the compile time for the program below drops from 300s to 30s on
my machine.
define i11129 @test1() {
%L1 = load i11129, i11129* undef
%B30 = ashr i11129 %L1, %L1
store i11129 %B30, i11129* undef
ret i11129 %L1
}
This should be generally beneficial, as we can skip a large amount of
work. Theoretically there are some scenarios where we might not safe
much, e.g. when we add a dependency between the first and last node.
Then we would have to shift all nodes. But we still do not have to spend
the time re-computing the initial order.
Reviewers: MatzeB, atrick, efriedma, niravd, paquette
Reviewed By: paquette
Differential Revision: https://reviews.llvm.org/D59722
We already had a DAG combine for (mmx (bitconvert (i64 (extractelement v2i64))))
to MOVDQ2Q.
Remove patterns for MMX_MOVQ2DQrr/MMX_MOVDQ2Qrr that use
scalar_to_vector/extractelement involving i64 scalar type with
v2i64 and x86mmx.
This code was repeated in two callers of CommitTargetLoweringOpt.
But CommitTargetLoweringOpt is also called from TargetLowering.
We should print a message for those calls to. So sink the
repeated code into CommitTargetLoweringOpt to catch those calls.
As noticed by dblaikie.
I don't know what code paths using reportError can cause stdout output
to be interleaved with stderr, so no test is added now.
Also drop an unneeded use of errs().fflush() in reportWarning().
I requested this in D64165.
The instruction is defined to only produce high result if both
destinations are the same. We can exploit this to avoid
unnecessarily clobbering a register.
In order to hide this from register allocation we use a pseudo
instruction and expand the result during MCInst creation.
Differential Revision: https://reviews.llvm.org/D80500
-Replace some ifs that should be impossible with asserts.
-Use X86::AddrDisp and X86::AddrNumOperands to make code more readable
-Use X86II::isKMasked/isKMergeMasked to do some operand skipping to remove or simplify switches
rG7873376bb36b fixes a build failure for allyesconfig.
The problem happened when the single exiting block doesn't dominate the
loop latch, then the immediate dominator of the exit block should not be
the exiting block after unrolling. As the exiting block of
different unrolled iteration can branch to the exit block, and the ith
exiting block doesn't dominate (i+1)th exiting block, the immediate
dominator of the exit block should not the nearest common dominator of
the exiting block and the loop latch of the same iteration.
Differential Revision: https://reviews.llvm.org/D80477