TypePromotion is meant to be a generic pass and doesn't reference
any ARM intrinsics so it shouldn't include IntrinsicsARM.h.
The other Intrinsic related headers appear to be unneeded as well.
This is a more general alternative/extension to D102635. Rather than
handling the special case of "header exit with non-exiting latch",
this unrolls against the smallest exact trip count from any exit.
The latch exit is no longer treated as priviledged when it comes to
full unrolling.
The motivating case is in full-unroll-one-unpredictable-exit.ll.
Here the header exit is an IV-based exit, while the latch exit is
a data comparison. This kind of loop does not get rotated, because
the latch is already exiting, and loop rotation doesn't try to
distinguish IV-based/analyzable latches.
Differential Revision: https://reviews.llvm.org/D102982
DSE will currently only remove stores in the same block unless they can
be guaranteed to be loop invariant. This expands that to any stores that
are in the same Loop, at the same loop level. This should still account
for where AA/MSSA will not handle aliasing between loops, but allow the
dead stores to be removed where they overlap in the same loop iteration.
It requires adding loop info to DSE, but that looks fairly harmless.
The test case this helps is from code like this, which can come up in
certain matrix operations:
for(i=..)
dst[i] = 0;
for(j=..)
dst[i] += src[i*n+j];
After LICM, this becomes:
for(i=..)
dst[i] = 0;
sum = 0;
for(j=..)
sum += src[i*n+j];
dst[i] = sum;
The first store is dead, and with this patch is now removed.
Differntial Revision: https://reviews.llvm.org/D100464
These all (and some others) are being affected by D104597,
but they are manually-written, which rather complicates
checking the effect that change has on them.
Now that FileCheck eagerly complains when prefixes are unused,
the update script does the same, and is becoming very common
to need to drop some prefixes, yet figuring out the file
it complains about isn't obvious unless it actually tells us.
This problem is exposed by D104598, after it tail-merges `ret` in
`@test_inline_constraint_S_label`, the verifier would start complaining
`invalid operand for inline asm constraint 'S'`.
Essentially, taking address of a block is mismodelled in IR.
It should probably be an explicit instruction, a first one in block,
that isn't identical to any other instruction of the same type,
so that it can't be hoisted.
If the outer add has an simm12 immediate operand we should prefer
it instead of materializing it in a register. This would guarantee
and extra instruction and temporary register. Since we don't check
one use on the shl or zext we might generate more instructions if
there is an additional user.
- Distinct metadata needs generating in the codegen to attach correct
AAInfo on the loads/stores after lowering, merging, and other relevant
transformations.
- This patch adds 'MachhineModuleSlotTracker' to help assign slot
numbers to these newly generated unnamed metadata nodes.
- To help 'MachhineModuleSlotTracker' track machine metadata, the
original 'SlotTracker' is rebased from 'AbstractSlotTrackerStorage',
which provides basic interfaces to create/retrive metadata slots. In
addition, once LLVM IR is processsed, additional hooks are also
introduced to help collect machine metadata and assign them slot
numbers.
- Finally, if there is any such machine metadata, 'MIRPrinter' outputs
an additional 'machineMetadataNodes' field containing all the
definition of those nodes.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D103205
- Take the same principle as the conversion from f64 to i64 with extra
necessary pre- and post-processing. It helps to reduce that conversion
sequence by half compared to legacy one.
Reviewed By: foad
Differential Revision: https://reviews.llvm.org/D104427
D97225 moved LazyCallGraph verify() calls behind EXPENSIVE_CHECKS,
but verity() is defined for debug builds only so this had the unintended
effect of breaking release builds with EXPENSIVE_CHECKS.
Fix by enabling verify() for both debug and EXPENSIVE_CHECKS.
Differential Revision: https://reviews.llvm.org/D104514
getFramePointerReg only depends on information in ARMSubtarget,
so move it in there so it can be accessed from more places.
Make use of ARMSubtarget::getFramePointerReg to remove duplicated code.
The main use of useR7AsFramePointer is getFramePointerReg, so inline it.
Differential Revision: https://reviews.llvm.org/D104476
Currently, UnrollLoop() is passed an AllowRuntime flag and decides
itself whether runtime unrolling should be used or not. This patch
pushes the decision into the caller and allows us to eliminate the
ULO.TripCount and ULO.TripMultiple parameters.
Differential Revision: https://reviews.llvm.org/D104487
This patch adds an optional PriorityInlineOrder, which uses the heap to order inlining.
The callsite which size is smaller would have a higher priority.
Reviewed By: mtrofin
Differential Revision: https://reviews.llvm.org/D104028
This patch was derived from Valentin Churavy's work in
https://reviews.llvm.org/D104480. It adds support for setting the transform on
an IRTransformLayer, and for accessing the IRTransformLayer in LLJIT. It also
adds access to the ThreadSafeModule::withModuleDo method for thread-safe
access to modules.
A new example has been added to show how to use these APIs to optimize a module
during materialization.
Thanks Valentin!
Reviewed By: lhames
Differential Revision: https://reviews.llvm.org/D103855
On ELF, the D1003372 optimization can apply to more cases. There are two
prerequisites for making `__profd_` private:
* `__profc_` keeps `__profd_` live under compiler/linker GC
* `__profd_` is not referenced by code
The first is satisfied because all counters/data are in a section group (either
`comdat any` or `comdat noduplicates`). The second requires that the function
does not use value profiling.
Regarding the second point: `__profd_` may be referenced by other text sections
due to inlining. There will be a linker error if a prevailing text section
references the non-prevailing local symbol.
With this change, a stage 2 (`-DLLVM_TARGETS_TO_BUILD=X86 -DLLVM_BUILD_INSTRUMENTED=IR`)
clang is 4.2% smaller (1-169620032/177066968).
`stat -c %s **/*.o | awk '{s+=$1}END{print s}' is 2.5% smaller.
Reviewed By: davidxl, rnk
Differential Revision: https://reviews.llvm.org/D103717
We were not reporting isFNegFree for v2f32, although it is effectively
free after legalization. The generic combine was pulling fneg out of
the fma source operands, and the AMDGPU combine was doing the
opposite.
As a follow-up to https://reviews.llvm.org/D104129, I'm cleaning up the danling probe related code in both the compiler and llvm-profgen.
I'm seeing a 5% size win for the pseudo_probe section for SPEC2017 and 10% for Ciner. Certain benchmark such as 602.gcc has a 20% size win. No obvious difference seen on build time for SPEC2017 and Cinder.
Reviewed By: wenlei
Differential Revision: https://reviews.llvm.org/D104477
Remove dependence on ULO.TripCount/ULO.TripMultiple from ORE and
debug code. For debug code, print information about all exits.
For optimization remarks, only include the unroll count and the
type of unroll (complete, partial or runtime), but omit detailed
information about exit folding, now that more than one exit may
be folded.
Differential Revision: https://reviews.llvm.org/D104482
We were using 0 as an indicator of invalid offset when computing disjoint ranges. In reality, 0 can be an valid code offset which stands for the first function in .text section. I'm using UINT64_MAX as an invalid code offset instead.
Reviewed By: wenlei
Differential Revision: https://reviews.llvm.org/D104497
Previously we went directly to unknown state on VTYPE mismatch.
If we instead remember the partial match, we can use this to
still use X0, X0 vsetvli in successors if AVL and needed SEW/LMUL
ratio match.
Reviewed By: frasercrmck
Differential Revision: https://reviews.llvm.org/D104069
If we have seen an inwards transition from external code to internal code, but not a following outwards transition, the inwards transition is likely due to interrupt which is usually unpaired. Ignore current and subsequent entries since they are likely from an unrelated pre-interrupt context.
LBR records from different interrupt context are unrelated and they should not be mixed together. Currenlty the OS does this for task-scheduling interrupt but not for all interrupts.
Reviewed By: wenlei, wlei
Differential Revision: https://reviews.llvm.org/D104276
Implemented the transformation of xor (llvm.amdgcn.class x, mask), -1 into
llvm.amdgcn.class(x, ~mask). Added LIT tests as well.
Differential Revision: https://reviews.llvm.org/D104049
We were using a `StringMap` object to store all profiles to be emitted. The object is basically an unordered hash table, therefore updating it in the process of trasvering it may cause issue since the underlying bucket array could change.
I'm also moving the `csspgo-preinliner` switch around so that no context tri will be constructed (by the constructor of `CSPreInliner`) when the switch is off.
Reviewed By: wenlei
Differential Revision: https://reviews.llvm.org/D104267
These other platforms are unsupported and untested.
They could be re-added later based on MSan code.
Reviewed By: gbalats, stephan.yichao.zhao
Differential Revision: https://reviews.llvm.org/D104481