llvm-mirror

mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-11-22 02:33:06 +01:00

Roman Lebedev 5d534d8259 [llvm-exegesis] Loop unrolling for loop snippet repetitor mode

I really needed this, like, factually, yesterday, when verifying dependency breaking idioms for AMD Zen 3 scheduler model.

Consider the following example:

```
$ ./bin/llvm-exegesis --mode=inverse_throughput --snippets-file=/tmp/snippet.s --num-repetitions=1000000 --repetition-mode=duplicate
Check generated assembly with: /usr/bin/objdump -d /tmp/snippet-4a7e50.o
---
mode:            inverse_throughput
key:
  instructions:
    - 'VPXORYrr YMM0 YMM0 YMM0'
  config:          ''
  register_initial_values: []
cpu_name:        znver3
llvm_triple:     x86_64-unknown-linux-gnu
num_repetitions: 1000000
measurements:
  - { key: inverse_throughput, value: 0.31025, per_snippet_value: 0.31025 }
error:           ''
info:            ''
assembled_snippet: C5FDEFC0C5FDEFC0C5FDEFC0C5FDEFC0C5FDEFC0C5FDEFC0C5FDEFC0C5FDEFC0C5FDEFC0C5FDEFC0C5FDEFC0C5FDEFC0C5FDEFC0C5FDEFC0C5FDEFC0C5FDEFC0C3
...
```

What does it tell us? So wait, it can only execute ~3 x86 AVX YMM PXOR zero-idioms per cycle? That doesn't seem right. That's even less than there are pipes supporting this type of op.

Now, second example:

```
$ ./bin/llvm-exegesis --mode=inverse_throughput --snippets-file=/tmp/snippet.s --num-repetitions=1000000 --repetition-mode=loop
Check generated assembly with: /usr/bin/objdump -d /tmp/snippet-2418b5.o
---
mode:            inverse_throughput
key:
  instructions:
    - 'VPXORYrr YMM0 YMM0 YMM0'
  config:          ''
  register_initial_values: []
cpu_name:        znver3
llvm_triple:     x86_64-unknown-linux-gnu
num_repetitions: 1000000
measurements:
  - { key: inverse_throughput, value: 1.00011, per_snippet_value: 1.00011 }
error:           ''
info:            ''
assembled_snippet: 49B80800000000000000C5FDEFC0C5FDEFC04983C0FF75F2C3
...
```

Now that's just worse. Due to the looping, the throughput completely plummeted, and now we can only do a single instruction/cycle!? That's not great.

And final example:

```
$ ./bin/llvm-exegesis --mode=inverse_throughput --snippets-file=/tmp/snippet.s --num-repetitions=1000000 --repetition-mode=loop --loop-body-size=1000
Check generated assembly with: /usr/bin/objdump -d /tmp/snippet-c402e2.o
---
mode:            inverse_throughput
key:
  instructions:
    - 'VPXORYrr YMM0 YMM0 YMM0'
  config:          ''
  register_initial_values: []
cpu_name:        znver3
llvm_triple:     x86_64-unknown-linux-gnu
num_repetitions: 1000000
measurements:
  - { key: inverse_throughput, value: 0.167087, per_snippet_value: 0.167087 }
error:           ''
info:            ''
assembled_snippet: 49B80800000000000000C5FDEFC0C5FDEFC04983C0FF75F2C3
...
```

So if we merge the previous two approaches, duplicate this single-instruction snippet 1000x (loop-body-size/instruction count in snippet), and run a loop with 1000 iterations over that duplicated/unrolled snippet, the measured throughput goes through the roof, up to 5.9 instructions/cycle, which finally tells us that this idiom is zero-cycle!

Reviewed By: courbet

Differential Revision: https://reviews.llvm.org/D102522

2021-05-25 12:08:27 +03:00
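The arithmetic behind the three measurements in the commit message above can be checked with a short Python sketch. It is not part of llvm-exegesis: the function names `instructions_per_cycle` and `unroll_plan` are hypothetical, and the iteration count follows the commit message's own description (num-repetitions divided by loop-body-size) rather than the tool's actual implementation.

```python
# Illustrative arithmetic only; these helpers are hypothetical and are not
# part of llvm-exegesis.

def instructions_per_cycle(inverse_throughput: float) -> float:
    """Inverse throughput is cycles per instruction, so take the reciprocal."""
    return 1.0 / inverse_throughput


def unroll_plan(num_repetitions: int, loop_body_size: int,
                snippet_instructions: int) -> tuple[int, int]:
    """Return (copies of the snippet per loop body, loop iterations).

    Assumption: mirrors the commit message's description, i.e. the snippet is
    duplicated loop_body_size / snippet_instructions times and the loop runs
    num_repetitions / loop_body_size times.
    """
    copies = loop_body_size // snippet_instructions
    iterations = num_repetitions // loop_body_size
    return copies, iterations


# inverse_throughput values copied from the three runs in the commit message.
measured = {
    "duplicate": 0.31025,
    "loop": 1.00011,
    "loop, loop-body-size=1000": 0.167087,
}

ipc = {mode: instructions_per_cycle(value) for mode, value in measured.items()}
copies, iterations = unroll_plan(num_repetitions=1_000_000,
                                 loop_body_size=1000,
                                 snippet_instructions=1)

for mode, value in ipc.items():
    print(f"{mode}: {value:.2f} instructions/cycle")
print(f"unrolled copies per loop body: {copies}, loop iterations: {iterations}")
```

The reciprocals come out at roughly 3.2, 1.0 and just under 6 instructions per cycle, in line with the "~3", "a single instruction/cycle" and "5.9" figures quoted in the commit message.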
ADT	Add a range-based wrapper for std::unique(begin, end, binary_predicate)	2021-05-24 17:26:46 -07:00
Analysis	Revert "[NPM] Do not run function simplification pipeline unnecessarily"	2021-05-21 16:38:02 -07:00
AsmParser	[SVE] Remove calls to VectorType::getNumElements from AsmParserTest	2020-07-07 14:55:42 -07:00
BinaryFormat	[BinaryFormat] Add formatv support for DW_OP constants	2020-06-08 15:27:44 +02:00
Bitcode	[AMDGPU] Set the default globals address space to 1	2020-11-20 15:46:53 +00:00
Bitstream	Switch from llvm::is_trivially_copyable to std::is_trivially_copyable	2020-12-02 22:02:48 -08:00
CodeGen	[MC] Refactor MCObjectFileInfo initialization and allow targets to create MCObjectFileInfo	2021-05-23 14:15:23 -07:00
DebugInfo	[MC] Refactor MCObjectFileInfo initialization and allow targets to create MCObjectFileInfo	2021-05-23 14:15:23 -07:00
Demangle	[demangler] Initial support for the new Rust mangling scheme	2021-05-03 16:44:30 -07:00
ExecutionEngine	[JITLink] Suppress expect-death test in release mode.	2021-05-24 22:57:10 -07:00
FileCheck	Bump googletest to 1.10.0	2021-05-14 19:16:31 +02:00
Frontend	[OpenMP][OMPIRBuilder]Adding support for `omp atomic`	2021-05-23 17:44:09 -04:00
FuzzMutate	[FuzzMutate] Add mutator to modify instruction flags.	2021-01-23 19:05:20 +00:00
InterfaceStub	[llvm] Fix ODRViolations for VersionTuple YAML specializations NFC	2020-10-20 18:29:15 -07:00
IR	[VP] make getFunctionalOpcode return an Optional	2021-05-19 17:08:34 +02:00
LineEditor
Linker	[RGT] Recode more unreachable assertions and tautologies	2021-03-19 09:17:22 -07:00
MC	[MC] Refactor MCObjectFileInfo initialization and allow targets to create MCObjectFileInfo	2021-05-23 14:15:23 -07:00
MI	[AMDGPU] Rename amdgcn_wwm to amdgcn_strict_wwm	2021-03-03 09:33:57 +01:00
Object	[llvm-readelf] Support dumping the BB address map section with --bb-addr-map.	2021-03-08 16:20:11 -08:00
ObjectYAML	Add -Wno-error=unknown flag to clang-format.	2020-09-19 10:17:57 +02:00
Option	[clang][cli] Accept strings instead of options in ImpliedByAnyOf	2021-01-26 09:30:36 +01:00
Passes	[NFC] Format PassesBindingsTests CMake like other unittests	2021-05-18 10:40:07 -07:00
ProfileData	[CoverageMapping] Handle gaps in counter IDs for source-based coverage	2021-05-19 10:46:38 -07:00
Remarks	[Remarks] Fix error message check in unit test	2019-10-31 15:51:36 -07:00
Support	[clang][ARM] Remove non-existent arm9312 CPU	2021-05-25 08:58:24 +00:00
TableGen	Revert "Make TableGenGlobalISel an object library"	2021-03-31 13:27:00 -07:00
Target	[ARM] Remove new ARMSelectionDAGTest unittest.	2021-03-04 10:14:35 +00:00
TextAPI	[llvm][TextAPI] add mapping from OS string to Platform	2021-05-06 16:25:56 -07:00
tools	[llvm-exegesis] Loop unrolling for loop snippet repetitor mode	2021-05-25 12:08:27 +03:00
Transforms	[VPlan] Add mayReadOrWriteMemory & friends.	2021-05-24 13:11:32 +01:00
XRay	Put back the trailing commas on TYPED_TEST_SUITE	2021-05-17 14:14:13 +02:00
CMakeLists.txt	Reland [FileCheck] Move FileCheck implementation out of LLVMSupport into its own library	2020-09-01 14:59:28 +02:00
unittest.cfg.in	Add support for unittest inputs.	2018-09-05 23:30:17 +00:00