to Pass.h.
In some compiler passes like SampleProfileLoaderPass, we want to know which
LTO/ThinLTO phase the pass is in. Currently the phase is represented in enum
class PassBuilder::ThinLTOPhase, so it is only available in PassBuilder and
it also cannot represent phase in full LTO. The patch extends it to include
full LTO phases and move it from PassBuilder.h to Pass.h, then it is much
easier for PassBuilder to communiate with each pass about current LTO phase.
Differential Revision: https://reviews.llvm.org/D94613
Current code breaks this version of MSVC due to a mismatch between `std::is_trivially_copyable` and `llvm::is_trivially_copyable` for `std::pair` instantiations. Hence I was attempting to use `std::is_trivially_copyable` to set `llvm::is_trivially_copyable<T>::value`.
I spent some time root causing an `llvm::Optional` build error on MSVC 16.8.3 related to the change described above:
```
62>C:\src\ocg_llvm\llvm-project\llvm\include\llvm/ADT/BreadthFirstIterator.h(96,12): error C2280: 'llvm::Optional<std::pair<std::pair<unsigned int,llvm::Graph<4>::NodeSubset> *,llvm::Optional<llvm::Graph<4>::ChildIterator>>> &llvm::Optional<std::pair<std::pair<unsigned int,llvm::Graph<4>::NodeSubset> *,llvm::Optional<llvm::Graph<4>::ChildIterator>>>::operator =(const llvm::Optional<std::pair<std::pair<unsigned int,llvm::Graph<4>::NodeSubset> *,llvm::Optional<llvm::Graph<4>::ChildIterator>>> &)': attempting to reference a deleted function (compiling source file C:\src\ocg_llvm\llvm-project\llvm\unittests\ADT\BreadthFirstIteratorTest.cpp)
...
```
The "trivial" specialization of `optional_detail::OptionalStorage` assumes that the value type is trivially copy constructible and trivially copy assignable. The specialization is invoked based on a check of `is_trivially_copyable` alone, which does not imply both `is_trivially_copy_assignable` and `is_trivially_copy_constructible` are true.
[[ https://en.cppreference.com/w/cpp/named_req/TriviallyCopyable | According to the spec ]], a deleted assignment operator does not make `is_trivially_copyable` false. So I think all these properties need to be checked explicitly in order to specialize `OptionalStorage` to the "trivial" version:
```
/// Storage for any type.
template <typename T, bool = std::is_trivially_copy_constructible<T>::value
&& std::is_trivially_copy_assignable<T>::value>
class OptionalStorage {
```
Above fixed my build break in MSVC, but I think we need to explicitly check `is_trivially_copy_constructible` too since it might be possible the copy constructor is deleted. Also would be ideal to move over to `std::is_trivially_copyable` instead of the `llvm` namespace verson.
Reviewed By: dblaikie
Differential Revision: https://reviews.llvm.org/D93510
The generated code for the split fp128 load/stores was missing a small yet important adjustment to the pointer metadata being fed into `getStore` and `getLoad`, making it out of sync with the effective memory address.
This problem often resulted in instructions being scheduled in the wrong order.
I also took this chance to clean up some "wrong" uses of `getAlignment` as done in D77687.
Thanks @jrtc27 for finding the problem and providing a patch.
Patch by LemonBoy and Jessica Clarke(jrtc27)
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D94345
This patch pre-commits test cases with dead stores of
existing values for D90328. It also updates a few tests that had such
stores by accident, to preserve the original spirit of those tests.
This change extends virtual unwinder to support pseudo probe in llvm-profgen. Please refer https://groups.google.com/g/llvm-dev/c/1p1rdYbL93s and https://reviews.llvm.org/D89707 for more context about CSSPGO and llvm-profgen.
**Implementation**
- Added `ProbeBasedCtxKey` derived from `ContextKey` for sample counter aggregation. As we need string splitting to infer the profile for callee function, string based context introduces more string handling overhead, here we just use probe pointer based context.
- For linear unwinding, as inline context is encoded in each pseudo probe, we don't need to go through each instruction to extract range sharing same inliner. So just record the range for the context.
- For probe based context, we should ignore the top frame probe since it will be extracted from the address range. we defer the extraction in `ProfileGeneration`.
- Added `PseudoProbeProfileGenerator` for pseudo probe based profile generation.
- Some helper function to get pseduo probe info(call probe, inline context) from profiled binary.
- Added regression test for unwinder's output
The pseudo probe based profile generation will be in the upcoming patch.
Test Plan:
ninja & ninja check-llvm
Differential Revision: https://reviews.llvm.org/D92896
As we plan to support both CSSPGO and AutoFDO for llvm-profgen, we will have different kinds of perf sample and different kinds of sample counter(cs/non-cs, with/without pseudo probe) which both need to do aggregation in hash map. This change implements the hashable interface(`Hashable`) and the unified base class for them to have better extensibility and reusability.
Currently perf trace sample and sample counter with context implemented this `Hashable` and the class hierarchy is like:
```
| Hashable
| PerfSample
| HybridSample
| LBRSample
| ContextKey
| StringBasedCtxKey
| ProbeBasedCtxKey
| CallsiteBasedCtxKey
| ...
```
- Class specifying `Hashable` should implement `getHashCode` and `isEqual`. Here we make `getHashCode` a non-virtual function to avoid vtable overhead, so derived class should calculate and assign the base class's HashCode manually. This also provides the flexibility for calculating the hash code incrementally(like rolling hash) during frame stack unwinding
- `isEqual` is a virtual function, which will have perf overhead. In the future, if we redesign a better hash function, then we can just skip this or switch to non-virtual function.
- Added `PerfSample` and `ContextKey` as base class for perf sample and counter context key, leveraging llvm-style RTTI for this.
- Added `StringBasedCtxKey` class extending `ContextKey` to use string as context id.
- Refactor `AggregationCounter` to take all kinds of `PerfSample` as key
- Refactor `ContextSampleCounter` to take all kinds of `ContextKey` as key
- Other refactoring work:
- Create a wrapper class `SampleCounter` to wrap `RangeCounter` and `BranchCounter`
- Hoist `ContextId` and `FunctionProfile` out of `populateFunctionBodySamples` and `populateFunctionBoundarySamples` to reuse them in ProfileGenerator
Differential Revision: https://reviews.llvm.org/D92584
This change implements pseudo probe decoding and disassembling for llvm-profgen/CSSPGO. Please see https://groups.google.com/g/llvm-dev/c/1p1rdYbL93s and https://reviews.llvm.org/D89707 for more context about CSSPGO and llvm-profgen.
**ELF section format**
Please see the encoding patch(https://reviews.llvm.org/D91878) for more details of the format, just copy the example here:
Two section(`.pseudo_probe_desc` and `.pseudoprobe` ) is emitted in ELF to support pseudo probe.
The format of `.pseudo_probe_desc` section looks like:
```
.section .pseudo_probe_desc,"",@progbits
.quad 6309742469962978389 // Func GUID
.quad 4294967295 // Func Hash
.byte 9 // Length of func name
.ascii "_Z5funcAi" // Func name
.quad 7102633082150537521
.quad 138828622701
.byte 12
.ascii "_Z8funcLeafi"
.quad 446061515086924981
.quad 4294967295
.byte 9
.ascii "_Z5funcBi"
.quad -2016976694713209516
.quad 72617220756
.byte 7
.ascii "_Z3fibi"
```
For each `.pseudoprobe` section, the encoded binary data consists of a single function record corresponding to an outlined function (i.e, a function with a code entry in the `.text` section). A function record has the following format :
```
FUNCTION BODY (one for each outlined function present in the text section)
GUID (uint64)
GUID of the function
NPROBES (ULEB128)
Number of probes originating from this function.
NUM_INLINED_FUNCTIONS (ULEB128)
Number of callees inlined into this function, aka number of
first-level inlinees
PROBE RECORDS
A list of NPROBES entries. Each entry contains:
INDEX (ULEB128)
TYPE (uint4)
0 - block probe, 1 - indirect call, 2 - direct call
ATTRIBUTE (uint3)
reserved
ADDRESS_TYPE (uint1)
0 - code address, 1 - address delta
CODE_ADDRESS (uint64 or ULEB128)
code address or address delta, depending on ADDRESS_TYPE
INLINED FUNCTION RECORDS
A list of NUM_INLINED_FUNCTIONS entries describing each of the inlined
callees. Each record contains:
INLINE SITE
GUID of the inlinee (uint64)
ID of the callsite probe (ULEB128)
FUNCTION BODY
A FUNCTION BODY entry describing the inlined function.
```
**Disassembling**
A switch `--show-pseudo-probe` is added to use along with `--show-disassembly` to print disassembly code with pseudo probe directives.
For example:
```
00000000002011a0 <foo2>:
2011a0: 50 push rax
2011a1: 85 ff test edi,edi
[Probe]: FUNC: foo2 Index: 1 Type: Block
2011a3: 74 02 je 2011a7 <foo2+0x7>
[Probe]: FUNC: foo2 Index: 3 Type: Block
[Probe]: FUNC: foo2 Index: 4 Type: Block
[Probe]: FUNC: foo Index: 1 Type: Block Inlined: @ foo2:6
2011a5: 58 pop rax
2011a6: c3 ret
[Probe]: FUNC: foo2 Index: 2 Type: Block
2011a7: bf 01 00 00 00 mov edi,0x1
[Probe]: FUNC: foo2 Index: 5 Type: IndirectCall
2011ac: ff d6 call rsi
[Probe]: FUNC: foo2 Index: 4 Type: Block
2011ae: 58 pop rax
2011af: c3 ret
```
**Implementation**
- `PseudoProbeDecoder` is added in ProfiledBinary as an infra for the decoding. It decoded the two section and generate two map: `GUIDProbeFunctionMap` stores all the `PseudoProbeFunction` which is the abstraction of a general function. `AddressProbesMap` stores all the pseudo probe info indexed by its address.
- All the inline info is encoded into binary as a trie(`PseudoProbeInlineTree`) and will be constructed from the decoding. Each pseudo probe can get its inline context(`getInlineContext`) by traversing its inline tree node backwards.
Test Plan:
ninja & ninja check-llvm
Differential Revision: https://reviews.llvm.org/D92334
Blocks can be laid out such that a t2WhileLoopStart branches backwards. This is forbidden by the architecture and so it fails to be converted into a low-overhead loop. This new pass checks for these cases and moves the target block, fixing any fall-through that would then be broken.
Differential Revision: https://reviews.llvm.org/D92385
See if we can remove the shuffle by resorting a HOP chain so that the HOP args are pre-shuffled.
This initial version just handles (the most common) v4i32/v4f32 hadd/hsub reduction patterns - future work can extend this to v8i16 types plus PACK chains (2f64 HADD/HSUB should already be handled in the half-lane combine code later on).
This re-lands e5553b9a6ab9 with two small fixes to the tests:
- Don't touch the source directory in debug-map-parsing.test but
instead copy everything over in a temporary directory in
timestamp-mismatch.test.
- Don't redirect stderr to stdout to avoid the output getting
intertwined in extern-alias.test.
In commit 700d2417d8281ea56dfd7ac72d1a1473d03d2d59 the CodeExtractor
was updated so that bitcasts that have lifetime markers that beginning
outside of the region are deduplicated outside the region and are not
used as an output. This caused a discrepancy in the IROutliner, where
in these cases there were arguments added to the aggregate function
that were not needed causing assertion errors.
The IROutliner queries the CodeExtractor twice to determine the inputs
and outputs, before and after `findAllocas` is called with the same
ValueSet for the outputs causing the duplication. This has been fixed
with a dummy ValueSet for the first call.
However, the additional bitcasts prevent us from using the same
similarity relationships that were previously defined by the
IR Similarity Analysis Pass. In these cases, we check whether the
initial version of the region being analyzed for outlining is still the
same as it was previously. If it is not, i.e. because of the additional
bitcast instructions from the CodeExtractor, we discard the region.
Reviewers: yroux
Differential Revision: https://reviews.llvm.org/D94303
We can fold a ? b : false to a & b if is_poison(b) implies that
is_poison(a), at which point we're able to reuse all the usual fold
on ands. In particular, this covers the very common case of
icmp X, C && icmp X, C'. The same applies to ors.
This currently only has an effect if the
-instcombine-unsafe-select-transform=0 option is set.
Differential Revision: https://reviews.llvm.org/D94550
Add support for G_FCONSTANT of FP128 (Quadruple precision) type.
It replaces the constant by emitting a load with a constant pool entry.
Reviewed By: aemerson
Differential Revision: https://reviews.llvm.org/D94437
This fixes double printing of insertion debug messages in the
legalizer.
Try to cleanup usage of observers. Currently the use of observers is
pretty hard to follow and it's not clear what is responsible for
them. Observers are referenced in 3 places:
1. In the MachineFunction
2. In the MachineIRBuilder
3. In the LegalizerHelper
The observers in the MachineFunction and MachineIRBuilder are both
called only on insertions, and are redundant with each other. The
source of the double printing was the same observer was added to both
the MachineFunction, and the MachineIRBuilder. One of these references
needs to be removed. Arguably observers in general should be fully
removed from one or the other, but it may be useful to have a local
observer in the MachineIRBuilder that is not added to the function's
observers. Alternatively, the wrapper observer could manage a local
observer in one place.
The LegalizerHelper only ever calls the observer on changing/changed
instructions, and never insertions. Logically these are two different
types of observers, for changes and for insertions.
Additionally, some places used the GISelObserverWrapper when they only
needed a single observer they could use directly.
Setting the observer in the LegalizerHelper constructor is not
flexible enough if the LegalizerHelper is constructed anywhere outside
the one used by the legalizer. AMDGPU calls the LegalizerHelper in
RegBankSelect, and needs to use a local observer to apply the regbank
to newly created instructions. Currently it accomplishes this by
constructing a local MachineIRBuilder. I'm trying to move the
MachineIRBuilder to be owned/maintained by the RegBankSelect pass
itself, but the locally constructed LegalizerHelper would reset the
observer.
Mips also has a special case use of the LegalizationArtifactCombiner
in applyMappingImpl; I think we do need to run the artifact combiner
during RegBankSelect, but in a more consistent way outside of
applyMappingImpl.
Following on from D91255, this patch is responsible for sinking relevant mul
operands to the same block so that umull/smull instructions can be correctly
generated by the mul combine implemented in the aforementioned patch.
Differential revision: https://reviews.llvm.org/D91271
canonicalizeShuffleMaskWithHorizOp currently only supports shuffles with 1 or 2 sources, but PR41813 will require us to support higher numbers of sources.
This patch just generalizes the initial setup stages to ensure all src ops are the same type and opcode and then will continue to early out if we have more than 2 sources.
This reverts commit e5553b9a6ab9f02f382a31cc5117b52c3bfaf77a.
Tests are not allowed to modify the source. Please figure out a way to
use %t rather than dynamically modifying the inputs.
For some reason some builds dont like the arrow operator access. using the deref then access should fix the issue.
/home/buildbots/ppc64le-flang-mlir-rhel-test/ppc64le-flang-rhel-clang-build/llvm-project/llvm/include/llvm/ADT/iterator.h:171:34: error: taking the address of a temporary object of type 'llvm::StringRef' [-Waddress-of-temporary]
PointerT operator->() { return &static_cast<DerivedT *>(this)->operator*(); }
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/home/buildbots/ppc64le-flang-mlir-rhel-test/ppc64le-flang-rhel-clang-build/llvm-project/llvm/include/llvm/ADT/StringExtras.h:387:13: note: in instantiation of member function 'llvm::iterator_facade_base<llvm::mapped_iterator<mlir::tblgen::TypeParameter *, (lambda at /home/buildbots/ppc64le-flang-mlir-rhel-test/ppc64le-flang-rhel-clang-build/llvm-project/mlir/tools/mlir-tblgen/TypeDefGen.cpp:414:19), llvm::StringRef>, std::random_access_iterator_tag, llvm::StringRef, long, llvm::StringRef *, llvm::StringRef &>::operator->' requested here
Len += I->size();
rG73a44f437bf1 result in 256-bit packss/packus ops with additional shuffles that shuffle combining can sometimes try to convert back into a truncation.
This commit extends SVEIntrinsicOpts::optimizeConvertFromSVBool to
identify and remove longer chains of redundant SVE reintepret
intrinsics. For example, the following chain of redundant SVE
reinterprets is now recognised as redundant:
%a = <vscale x 2 x i1>
%1 = <vscale x 16 x i1> @llvm.aarch64.sve.convert.to.svbool(<vscale x 2 x i1> %a)
%2 = <vscale x 4 x i1> @llvm.aarch64.sve.convert.from.svbool(<vscale x 16 x i1> %1)
%3 = <vscale x 16 x i1> @llvm.aarch64.sve.convert.to.svbool(<vscale x 4 x i1> %2)
%4 = <vscale x 4 x i1> @llvm.aarch64.sve.convert.from.svbool(<vscale x 16 x i1> %3)
%5 = <vscale x 16 x i1> @llvm.aarch64.sve.convert.to.svbool(<vscale x 4 x i1> %4)
%6 = <vscale x 2 x i1> @llvm.aarch64.sve.convert.from.svbool(<vscale x 16 x i1> %5)
ret <vscale x 2 x i1> %6
and will be replaced with:
ret <vscale x 2 x i1> %a
Eliminating these can sometimes mean emitting fewer unnecessary
loads/stores when lowering to assembly.
Differential Revision: https://reviews.llvm.org/D94074
In places where we calculate costs using TTI.getXXXCost() interfaces
I have changed the code to use InstructionCost instead of unsigned.
The change is non functional since InstructionCost behaves in the
same way as an integer for valid costs. Currently the getXXXCost()
functions used in this file do not return invalid costs.
See this patch for the introduction of the type: https://reviews.llvm.org/D91174
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2020-November/146408.html
Differential revision: https://reviews.llvm.org/D94484
Only class declarations should be inside anonymous namespaces
(https://llvm.org/docs/CodingStandards.html#anonymous-namespaces)
Instead of using a anonymous namespace, just mark the functions in it as
static (some of them already were).
This simplifies the diff for D94486.
This reuses the code from yaml2obj (moves it to ELFYAML.h).
With it we can set the `sh_entsize` in a single place in `obj2yaml`.
Note that it also fixes a bug of `yaml2obj`: we do not
set the `sh_entsize` field for the `SHT_ARM_EXIDX` section properly.
Differential revision: https://reviews.llvm.org/D93858
The isVMOVNOriginalMask was previously only checking for two input
shuffles that could be better expanded as vmovn nodes. This expands that
to single input shuffles that will later be legalized to multiple
vectors.
Differential Revision: https://reviews.llvm.org/D94189
Currently we don't support multiple SHT_SYMTAB_SHNDX sections
and the DT_SYMTAB_SHNDX tag currently.
This patch implements it and fixes the
https://bugs.llvm.org/show_bug.cgi?id=43991.
I had to introduce the `struct DataRegion` to ELF.h,
it is used to represent a region that might have no known size.
It is needed, because we don't know the size of the extended
section indices table when it is located via DT_SYMTAB_SHNDX.
In this case we still want to validate that we don't read
past the end of the file.
Differential revision: https://reviews.llvm.org/D92923