This is a simple patch that adds constant folding for freeze
instruction.
IIUC, it isn't needed to update ConstantFold.cpp because there is no freeze
constexpr.
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D84597
This is a simple patch that makes canCreateUndefOrPoison use
Instruction::isBinaryOp because BinaryOperator inherits Instruction.
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D84596
It doesn't really need to know where Timings are stored, it just needs
to be able to sort them, so MutableArrayRef is enough.
That uncovers an interesting quirk that it relied on
implicit double->int conversion for calculating percentiles.
When running multiple shards, don't include skipped tests in the xunit
output since merging the files will result in duplicates.
In our CHERI Jenkins CI, I configured the libc++ tests to run using sharding
(since we are testing using a single-CPU QEMU). We then merge the generated
XUnit xml files to produce a final result, but if the individual XMLs
report tests excluded due to sharding each test is included N times in the
final result. This also makes it difficult to find the tests that were
skipped due to missing REQUIRES: etc.
Reviewed By: yln
Differential Revision: https://reviews.llvm.org/D84235
The switch in emitNop uses 64-bit registers for nops exceeding
2 bytes. This isn't valid outside 64-bit mode. We could fix this
easily enough, but there are no users that ask for more than 2
bytes outside 64-bit mode.
Inlining the method to make the coupling between the two methods
more explicit.
Summary:
In parallelizeChainedStores, a TokenFactor was created with the size greater than 3000.
We found that DAGCombiner::visitTokenFactor will consume a huge amount of time on
such nodes. Since the number of operands already exceeds TokenFactorInlineLimit, we propose
to give up simplification with the consideration of compile time.
Reviewers:
@spatel, @arsenm
Differential Revision:
https://reviews.llvm.org/D84204
I mixed up the precedence of operators in the assert and thought I
had it right since there was no compiler warning. This just
adds the parentheses in the expression as needed.
We can happily turn function definitions into declarations,
thus obscuring their argument from being elided by this pass.
I don't believe there is a good reason to just ignore declarations.
likely even proper llvm intrinsics ones,
at worst the input becomes uninteresting.
The other question here is that all these transforms are all-or-nothing.
In some cases, should we be treating each use separately?
The main blocker here seemed to be that llvm::CloneFunctionInto()
does `&OldFunc->front()`, which inserts a nullptr into a densemap,
which is not happy about it and asserts.
replaceFunctionCalls() is very non-exhaustive, it only handles
CallInst's. Which means, by the time we drop old function,
there may still be uses of it lurking around.
Let's instead whack-a-mole them by all by replacing with undef.
I'm not sure this is the best handling, especially for calls, but IMO
poorly reduced input is much better than crashing reduction tool.
A (previously-crashing!) test added.
Fixes https://bugs.llvm.org/show_bug.cgi?id=46819
Terminator may have returned value, so we need to replace uses,
and in general handle invoke as a branch inst.
I'm not sure this is the best handling, but IMO poorly reduced
input is much better than crashing reduction tool.
A (previously-crashing!) test added.
Fixes https://bugs.llvm.org/show_bug.cgi?id=46818
Subclasses will commonly gather that information from a remote during
construction, in which case they won't have meaningful values to pass to
TargetProcessControl's constructor.
(Disabled under flag for the moment)
This is part of a larger project wherein we are finally integrating lowering of gc live operands with the register allocator. Today, we force spill all operands in SelectionDAG. The code to do so is distinctly non-optimal. The approach this patch is working towards is to instead lower the relocations directly into the MI form, and let the register allocator pick which ones get spilled and which stack slots they get spilled to. In terms of performance, the later part is actually more important as it avoids redundant shuffling of values between stack slots.
This particular change adds ISEL support to produce the variadic def STATEPOINT form required by the above. In particular, the first N are lowered to variadic tied def/use pairs. So new statepoint looks like this:
reloc1,reloc2,... = STATEPOINT ..., base1, derived1<tied-def0>, base2, derived2<tied-def1>, ...
N is limited by the maximal number of tied registers machine instruction can have (15 at the moment).
The current patch is restricted to handling relocations within a single basic block. Cross block relocations (e.g. invokes) are handled via the legacy mechanism. This restriction will be relaxed in future patches.
Patch By: dantrushin
Differential Revision: https://reviews.llvm.org/D81648
ReduceFunctions could do it, but it also replaces *all* calls with undef,
so if any of undef replacements makes reduction uninteresting,
it won't work.
ReduceBasicBlocks also could do it, but well, it may take many guesses
for all the blocks of a function to happen to be out-of-chunk,
which is not a very efficient way to go about it.
So let's just do this first.
Reapply with DTU update moved after CFG update, which is a
requirement of the API.
-----
Non-feasible control-flow edges are currently removed by replacing
the branch condition with a constant and then calling
ConstantFoldTerminator. This happens in a rather roundabout manner,
by inspecting the users (effectively: predecessors) of unreachable
blocks, and further complicated by the need to explicitly materialize
the condition for "forced" edges. I would like to extend SCCP to
discard switch conditions that are non-feasible based on range
information, but this is incompatible with the current approach
(as there is no single constant we could use.)
Instead, this patch explicitly removes non-feasible edges. It
currently only needs to handle the case where there is a single
feasible edge. The llvm_unreachable() branch will need to be
implemented for the aforementioned switch improvement.
Differential Revision: https://reviews.llvm.org/D84264
This patch updates IPSCCP to drop argmemonly and
inaccessiblemem_or_argmemonly if it replaces a pointer argument.
Fixes PR46717.
Reviewers: efriedma, davide, nikic, jdoerfert
Reviewed By: efriedma, jdoerfert
Differential Revision: https://reviews.llvm.org/D84432
If we don't care about an entire LHS/RHS of the PACK op, then can just treat it the same as undef (we don't care if it saturates) and is safe to treat as a shuffle.
This can happen if we attempt to decode as a faux shuffle before SimplifyDemandedVectorElts has been called on the PACK which should replace the source with UNDEF entirely.
Adds a range-based version of `std::move`, the version that moves a range, not the one that creates r-value references.
Reviewed By: dblaikie, gamesh411
Differential Revision: https://reviews.llvm.org/D83902
Very minor code size improvements (hits 8 times in Bullet at -O3), but still
something.
Also very minor NFC change to make sure we only search for a 0 constant when
selecting a store. Before, we'd do this for loads as well.
Differential Revision: https://reviews.llvm.org/D84573
Function entry count might be zero after the profile counts reset and
before reentry to the function.
Zero profile entry count is very bad as the profile count from BFI will
be wrong.
A simple fix is to set the profile entry count to 1 if there are
non-zero profile counts in this function.
Differential Revision: https://reviews.llvm.org/D84378
Skip profile count promotion if any of the ExitBlocks contains a ret
instruction. This is to prevent dumping of incomplete profile -- if the
the loop is a long running loop and dump is called in the middle
of the loop, the result profile is incomplete.
ExitBlocks containing a ret instruction is an indication of a long running
loop -- early exit to error handling code.
Differential Revision: https://reviews.llvm.org/D84379