1
0
mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-11-22 02:33:06 +01:00
llvm-mirror/unittests
Roman Lebedev 5d534d8259 [llvm-exegesis] Loop unrolling for loop snippet repetitor mode
I really needed this, like, factually, yesterday,
when verifying dependency breaking idioms for AMD Zen 3 scheduler model.

Consider the following example:
```
$ ./bin/llvm-exegesis --mode=inverse_throughput --snippets-file=/tmp/snippet.s --num-repetitions=1000000 --repetition-mode=duplicate
Check generated assembly with: /usr/bin/objdump -d /tmp/snippet-4a7e50.o
---
mode:            inverse_throughput
key:
  instructions:
    - 'VPXORYrr YMM0 YMM0 YMM0'
  config:          ''
  register_initial_values: []
cpu_name:        znver3
llvm_triple:     x86_64-unknown-linux-gnu
num_repetitions: 1000000
measurements:
  - { key: inverse_throughput, value: 0.31025, per_snippet_value: 0.31025 }
error:           ''
info:            ''
assembled_snippet: C5FDEFC0C5FDEFC0C5FDEFC0C5FDEFC0C5FDEFC0C5FDEFC0C5FDEFC0C5FDEFC0C5FDEFC0C5FDEFC0C5FDEFC0C5FDEFC0C5FDEFC0C5FDEFC0C5FDEFC0C5FDEFC0C3
...

```
What does it tell us?
So wait, it can only execute ~3 x86 AVX YMM PXOR zero-idioms per cycle?
That doesn't seem right. That's even less than there are pipes supporting this type of op.

Now, second example:
```
$ ./bin/llvm-exegesis --mode=inverse_throughput --snippets-file=/tmp/snippet.s --num-repetitions=1000000 --repetition-mode=loop
Check generated assembly with: /usr/bin/objdump -d /tmp/snippet-2418b5.o
---
mode:            inverse_throughput
key:
  instructions:
    - 'VPXORYrr YMM0 YMM0 YMM0'
  config:          ''
  register_initial_values: []
cpu_name:        znver3
llvm_triple:     x86_64-unknown-linux-gnu
num_repetitions: 1000000
measurements:
  - { key: inverse_throughput, value: 1.00011, per_snippet_value: 1.00011 }
error:           ''
info:            ''
assembled_snippet: 49B80800000000000000C5FDEFC0C5FDEFC04983C0FF75F2C3
...
```
Now that's just worse. Due to the looping, the throughput completely plummeted,
and now we can only do a single instruction/cycle!?

That's not great.
And final example:
```
$ ./bin/llvm-exegesis --mode=inverse_throughput --snippets-file=/tmp/snippet.s --num-repetitions=1000000 --repetition-mode=loop --loop-body-size=1000
Check generated assembly with: /usr/bin/objdump -d /tmp/snippet-c402e2.o
---
mode:            inverse_throughput
key:
  instructions:
    - 'VPXORYrr YMM0 YMM0 YMM0'
  config:          ''
  register_initial_values: []
cpu_name:        znver3
llvm_triple:     x86_64-unknown-linux-gnu
num_repetitions: 1000000
measurements:
  - { key: inverse_throughput, value: 0.167087, per_snippet_value: 0.167087 }
error:           ''
info:            ''
assembled_snippet: 49B80800000000000000C5FDEFC0C5FDEFC04983C0FF75F2C3
...
```

So if we merge the previous two approaches, do duplicate this single-instruction snippet 1000x
(loop-body-size/instruction count in snippet), and run a loop with 1000 iterations
over that duplicated/unrolled snippet, the measured throughput goes through the roof,
up to 5.9 instructions/cycle, which finally tells us that this idiom is zero-cycle!

Reviewed By: courbet

Differential Revision: https://reviews.llvm.org/D102522
2021-05-25 12:08:27 +03:00
..
ADT Add a range-based wrapper for std::unique(begin, end, binary_predicate) 2021-05-24 17:26:46 -07:00
Analysis Revert "[NPM] Do not run function simplification pipeline unnecessarily" 2021-05-21 16:38:02 -07:00
AsmParser [SVE] Remove calls to VectorType::getNumElements from AsmParserTest 2020-07-07 14:55:42 -07:00
BinaryFormat [BinaryFormat] Add formatv support for DW_OP constants 2020-06-08 15:27:44 +02:00
Bitcode [AMDGPU] Set the default globals address space to 1 2020-11-20 15:46:53 +00:00
Bitstream Switch from llvm::is_trivially_copyable to std::is_trivially_copyable 2020-12-02 22:02:48 -08:00
CodeGen [MC] Refactor MCObjectFileInfo initialization and allow targets to create MCObjectFileInfo 2021-05-23 14:15:23 -07:00
DebugInfo [MC] Refactor MCObjectFileInfo initialization and allow targets to create MCObjectFileInfo 2021-05-23 14:15:23 -07:00
Demangle [demangler] Initial support for the new Rust mangling scheme 2021-05-03 16:44:30 -07:00
ExecutionEngine [JITLink] Suppress expect-death test in release mode. 2021-05-24 22:57:10 -07:00
FileCheck Bump googletest to 1.10.0 2021-05-14 19:16:31 +02:00
Frontend [OpenMP][OMPIRBuilder]Adding support for omp atomic 2021-05-23 17:44:09 -04:00
FuzzMutate [FuzzMutate] Add mutator to modify instruction flags. 2021-01-23 19:05:20 +00:00
InterfaceStub [llvm] Fix ODRViolations for VersionTuple YAML specializations NFC 2020-10-20 18:29:15 -07:00
IR [VP] make getFunctionalOpcode return an Optional 2021-05-19 17:08:34 +02:00
LineEditor
Linker [RGT] Recode more unreachable assertions and tautologies 2021-03-19 09:17:22 -07:00
MC [MC] Refactor MCObjectFileInfo initialization and allow targets to create MCObjectFileInfo 2021-05-23 14:15:23 -07:00
MI [AMDGPU] Rename amdgcn_wwm to amdgcn_strict_wwm 2021-03-03 09:33:57 +01:00
Object [llvm-readelf] Support dumping the BB address map section with --bb-addr-map. 2021-03-08 16:20:11 -08:00
ObjectYAML Add -Wno-error=unknown flag to clang-format. 2020-09-19 10:17:57 +02:00
Option [clang][cli] Accept strings instead of options in ImpliedByAnyOf 2021-01-26 09:30:36 +01:00
Passes [NFC] Format PassesBindingsTests CMake like other unittests 2021-05-18 10:40:07 -07:00
ProfileData [CoverageMapping] Handle gaps in counter IDs for source-based coverage 2021-05-19 10:46:38 -07:00
Remarks [Remarks] Fix error message check in unit test 2019-10-31 15:51:36 -07:00
Support [clang][ARM] Remove non-existent arm9312 CPU 2021-05-25 08:58:24 +00:00
TableGen Revert "Make TableGenGlobalISel an object library" 2021-03-31 13:27:00 -07:00
Target [ARM] Remove new ARMSelectionDAGTest unittest. 2021-03-04 10:14:35 +00:00
TextAPI [llvm][TextAPI] add mapping from OS string to Platform 2021-05-06 16:25:56 -07:00
tools [llvm-exegesis] Loop unrolling for loop snippet repetitor mode 2021-05-25 12:08:27 +03:00
Transforms [VPlan] Add mayReadOrWriteMemory & friends. 2021-05-24 13:11:32 +01:00
XRay Put back the trailing commas on TYPED_TEST_SUITE 2021-05-17 14:14:13 +02:00
CMakeLists.txt Reland [FileCheck] Move FileCheck implementation out of LLVMSupport into its own library 2020-09-01 14:59:28 +02:00
unittest.cfg.in Add support for unittest inputs. 2018-09-05 23:30:17 +00:00