findIndirectCallFunctionSamples will leave Sum uninitialized if it returns an empty vector, we don't really use Sum in this case (but we do make a copy that isn't used either) - so ensure we initialize the value to zero to at least silence the static analysis warning.
These checks are not specific to the instruction based variant of
isPotentiallyReachable(), they are equally valid for the basic
block based variant. Move them there, to make sure that switching
between the instruction and basic block variants cannot introduce
regressions.
Match whats documented in the Intel AOM (and Agner/instlatx64 agree) - vector integer multiplies are pipelined - all Port0, throughput = 2 @ 128bits, 1 @ 64bits.
Noticed while checking reduction costs - now that we can use in-order models in llvm-mca, the atom model is the "worst case scenario" we have in x86.
All the uses that we have for collectBitParts revolve around us matching down to an operation with a single root value - I don't think we're intending to change that (and a lot of collectBitParts assumes it).
The binops cases (OR/FSHL/FSHR) already check if the providers are the same, but that would still mean we waste time collecting through unaryops before getting to them.
Currently we only match bswap intrinsics from or(shl(),lshr()) style patterns when we could often match bitreverse intrinsics almost as cheaply.
Differential Revision: https://reviews.llvm.org/D90170
Reapply rG5ed56a821c06 (after reverted by rG7aa89c4a22fd) - don't take reference from struct that will be erased in X86FrameLowering::eliminateCallFramePseudoInstr
I'm also adding an explicit data layout, so we can
confirm that alignment requirements/prefs are met.
I tried to use complete/scripted CHECK lines here,
but that fails with 1 of the globals, and not sure why.
Use comesBefore() instead of performing an instruction walk. In
line with the previous implementation, instructions are considered
to reach themselves.
The system's network API is in libnetwork.so, so we explicitly need to link to
them on Haiku. This patch is similar to https://reviews.llvm.org/D97633.
Patch by Niels Reedijk. Thanks Niels!
Reviewed By: lhames
Differential Revision: https://reviews.llvm.org/D98405
Adds the ability to pass MCRegisterInfo to dump_pretty and to the print functions,
so that if present, target specific enums names are printed instead of enum values.
GlobalVariables are Constants, yet should not unconditionally be
considered true for __builtin_constant_p.
Via the LangRef
https://llvm.org/docs/LangRef.html#llvm-is-constant-intrinsic:
This intrinsic generates no code. If its argument is known to be a
manifest compile-time constant value, then the intrinsic will be
converted to a constant true value. Otherwise, it will be converted
to a constant false value.
In particular, note that if the argument is a constant expression
which refers to a global (the address of which _is_ a constant, but
not manifest during the compile), then the intrinsic evaluates to
false.
Move isManifestConstant from ConstantFolding to be a method of
Constant so that we can reuse the same logic in
LowerConstantIntrinsics.
pr/41459
Reviewed By: rsmith, george.burgess.iv
Differential Revision: https://reviews.llvm.org/D102367
The FixSGPRCopies pass converts instructions to VALU when
removing illegal VGPR to SGPR copies. Instructions that use SCC
are changed to use VCC instead. When that happens, the pass must
also change instructions that define SCC to define VCC.
The pass was not changing the SCC definition when an ADDC is
converted due to a input that is a VGPR to SGPR copy. But, the
initial ADD insruction, which define SCC, is not converted.
This causes a compilation failure due to a use of an undefined
physical register.
This patch adds code that inserts the SCC definition in the
MoveToVALU worklist when a SCC use is converted to a VCC use.
Differential Revision: https://reviews.llvm.org/D102111
Currently we didn't support multiple return type, we work around to use error_code to represent:
1) The dangling probe.
2) Ignore the weight of non-probe instruction
While merging the instructions' weight for the whole BB, it will filter out the error code. But If all instructions of the BB give error_code, the outside logic will mark it as a BB requiring the inference algorithm to infer its weight. This is different from the zero value which will be treated as a cold block.
Fix one place that if we can't find the FunctionSamples in the profile data which indicates the BB is cold, we choose to return zero.
Also refine the comments.
Reviewed By: hoy, wenlei
Differential Revision: https://reviews.llvm.org/D102007
Previously, we already used BatchAA for individual simple pointer
dependency queries. This extends BatchAA usage for the non-local
case, so that only one BatchAA instance is used for all blocks,
instead of one instance per block.
Use of BatchAA is safe as IR cannot be modified during a MemDep
query.
This patch adds the abstract class SystemZCallingConventionRegisters
which is a SystemZ-specific class detailing special registers used
by calling conventions on the target. SystemZELFRegisters and
SystemZXPLINK64Registers implement this class for ELF and XPLINK64
respectively.
Reviewed By: uweigand
Differential Revision: https://reviews.llvm.org/D102370
It's easy to hit 2**16 limit with i686 GNU toolchains these days.
Clang does it automagically, so it's not needed there, and the option
causes warnings about being unused when linking.
Differential Revision: https://reviews.llvm.org/D102419
This is not expected to have any practical compile-time effect,
as the alias() calls inside callCapturesBefore() are rare. This
should still be supported for API completeness, and might be
useful for reachability caching.
This patch updates the cost model for full unrolling to discount the cost of a loop invariant expression on all but one iteration. The reasoning here is that such an expression (as determined by SCEV) will be CSEd or DSEd once the loop is unrolled. Note that SCEVs reasoning will find things which could be invariant, not simply those outside the loop.
Differential Revision: https://reviews.llvm.org/D102506
As with other transforms in demanded bits, we must be careful not to
wrongly propagate nsw/nuw if we are reducing values leading up to the shift.
This bug was introduced with 1b24f35f843c and leads to the miscompile
shown in:
https://llvm.org/PR50341
Recommitting after addressing a missed review comment, and updating an aarch64 test I'd missed.
LoopUnroll does a limited DCE pass after unrolling, but if you have a chain of dead instructions, it only deletes the last one. Improve the code to recursively delete all trivially dead instructions.
Differential Revision: https://reviews.llvm.org/D102511
moveOperands does not handle moving tied operands since it would
generally have to fixup the tied operand references. Avoid the assert
by untying and retying after the modification. These in place
modifications really aren't managable.
While btver2 model states that this pattern is a zero-cycle zero-idiom
on Jaguar, it does not appear to be the case on Znver3,
here it measures as not being recognized as dep-breaking zero-idiom,
let alone a zero-cycle one.