mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-11-25 20:23:11 +01:00
llvm-mirror/tools
Roman Lebedev 5d534d8259 [llvm-exegesis] Loop unrolling for loop snippet repetitor mode
I really needed this, as in yesterday, while verifying
dependency-breaking idioms for the AMD Zen 3 scheduler model.

Consider the following example:
```
$ ./bin/llvm-exegesis --mode=inverse_throughput --snippets-file=/tmp/snippet.s --num-repetitions=1000000 --repetition-mode=duplicate
Check generated assembly with: /usr/bin/objdump -d /tmp/snippet-4a7e50.o
---
mode:            inverse_throughput
key:
  instructions:
    - 'VPXORYrr YMM0 YMM0 YMM0'
  config:          ''
  register_initial_values: []
cpu_name:        znver3
llvm_triple:     x86_64-unknown-linux-gnu
num_repetitions: 1000000
measurements:
  - { key: inverse_throughput, value: 0.31025, per_snippet_value: 0.31025 }
error:           ''
info:            ''
assembled_snippet: C5FDEFC0C5FDEFC0C5FDEFC0C5FDEFC0C5FDEFC0C5FDEFC0C5FDEFC0C5FDEFC0C5FDEFC0C5FDEFC0C5FDEFC0C5FDEFC0C5FDEFC0C5FDEFC0C5FDEFC0C5FDEFC0C3
...

```
What does this tell us?
An inverse throughput of ~0.31 cycles per instruction means it can only execute ~3 x86 AVX YMM VPXOR zero-idioms per cycle?
That doesn't seem right. That's even fewer than the number of pipes supporting this type of op.
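
The "~3 per cycle" figure is just the reciprocal of the reported `inverse_throughput` value (cycles per instruction). A minimal sketch of that conversion, where the helper name `instructions_per_cycle` is made up for illustration and is not part of llvm-exegesis:

```python
def instructions_per_cycle(inverse_throughput: float) -> float:
    """Convert cycles-per-instruction into instructions-per-cycle."""
    if inverse_throughput <= 0:
        raise ValueError("inverse throughput must be positive")
    return 1.0 / inverse_throughput


# Value taken from the `measurements` entry of the duplicate-mode run above.
duplicate_ipc = instructions_per_cycle(0.31025)
print(f"{duplicate_ipc:.2f} instructions/cycle")  # prints "3.22 instructions/cycle"
```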

Now, the second example:
```
$ ./bin/llvm-exegesis --mode=inverse_throughput --snippets-file=/tmp/snippet.s --num-repetitions=1000000 --repetition-mode=loop
Check generated assembly with: /usr/bin/objdump -d /tmp/snippet-2418b5.o
---
mode:            inverse_throughput
key:
  instructions:
    - 'VPXORYrr YMM0 YMM0 YMM0'
  config:          ''
  register_initial_values: []
cpu_name:        znver3
llvm_triple:     x86_64-unknown-linux-gnu
num_repetitions: 1000000
measurements:
  - { key: inverse_throughput, value: 1.00011, per_snippet_value: 1.00011 }
error:           ''
info:            ''
assembled_snippet: 49B80800000000000000C5FDEFC0C5FDEFC04983C0FF75F2C3
...
```
Now that's even worse. Due to the looping, the throughput completely plummeted,
and now we can only do a single instruction per cycle!?

That's not great.
And the final example:
```
$ ./bin/llvm-exegesis --mode=inverse_throughput --snippets-file=/tmp/snippet.s --num-repetitions=1000000 --repetition-mode=loop --loop-body-size=1000
Check generated assembly with: /usr/bin/objdump -d /tmp/snippet-c402e2.o
---
mode:            inverse_throughput
key:
  instructions:
    - 'VPXORYrr YMM0 YMM0 YMM0'
  config:          ''
  register_initial_values: []
cpu_name:        znver3
llvm_triple:     x86_64-unknown-linux-gnu
num_repetitions: 1000000
measurements:
  - { key: inverse_throughput, value: 0.167087, per_snippet_value: 0.167087 }
error:           ''
info:            ''
assembled_snippet: 49B80800000000000000C5FDEFC0C5FDEFC04983C0FF75F2C3
...
```

So if we merge the previous two approaches, i.e. duplicate this single-instruction snippet 1000x
(loop-body-size / instruction count in snippet) and run a loop with 1000 iterations
(num-repetitions / loop-body-size) over that duplicated/unrolled snippet, the measured throughput
goes through the roof, up to ~5.98 instructions/cycle (1 / 0.167087), which finally tells us
that this idiom is zero-cycle!
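
The unroll arithmetic described above can be sketched as follows. This only illustrates the two divisions named in this message; the function `loop_layout` and its rounding behaviour are assumptions for the example, not the actual llvm-exegesis implementation:

```python
from typing import Tuple


def loop_layout(num_repetitions: int, loop_body_size: int,
                snippet_instructions: int) -> Tuple[int, int]:
    """Return (snippet copies per loop body, loop iterations).

    Assumption: integer division with a floor of 1; the real tool may
    round differently.
    """
    if min(num_repetitions, loop_body_size, snippet_instructions) <= 0:
        raise ValueError("all arguments must be positive")
    # How many times the snippet is duplicated inside one loop body.
    copies = max(1, loop_body_size // snippet_instructions)
    # How many times the unrolled body runs to reach the repetition target.
    iterations = max(1, num_repetitions // (copies * snippet_instructions))
    return copies, iterations


# --num-repetitions=1000000 --loop-body-size=1000, one instruction in the snippet
copies, iterations = loop_layout(1_000_000, 1000, 1)
print(copies, iterations)  # prints "1000 1000"
```

With these numbers the loop-counter update and branch are paid once per 1000 copies of the snippet instead of once per small loop body, which fits the much lower per-instruction cost measured in the last run.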

Reviewed By: courbet

Differential Revision: https://reviews.llvm.org/D102522
2021-05-25 12:08:27 +03:00
bugpoint Avoid shuffle self-assignment in EXPENSIVE_CHECKS builds 2021-03-10 11:17:34 +00:00
bugpoint-passes
dsymutil [dsymutil] Emit an error when the Mach-O exceeds the 4GB limit. 2021-05-24 16:29:06 -07:00
gold [gold] Match lld WPD behavior for shared library symbols and add test 2021-02-17 15:28:49 -08:00
llc Recommit "[VP,Integer,#2] ExpandVectorPredication pass" 2021-05-04 11:47:52 +02:00
lli [lli] Honor the --entry-function flag in orc and orc-lazy modes. 2021-04-13 11:33:24 -07:00
llvm-ar [NFC] Reordering parameters in getFile and getFileOrSTDIN 2021-03-25 09:47:49 -04:00
llvm-as llvmbuildectomy - replace llvm-build by plain cmake 2020-11-13 10:35:24 +01:00
llvm-as-fuzzer
llvm-bcanalyzer llvmbuildectomy - replace llvm-build by plain cmake 2020-11-13 10:35:24 +01:00
llvm-c-test LLVM-C: Allow LLVM{Get/Set}Alignment on an atomicrmw/cmpxchg instruction. 2021-02-12 18:31:18 -05:00
llvm-cat [tools] Use llvm::append_range (NFC) 2021-01-05 21:15:56 -08:00
llvm-cfi-verify [MC] Refactor MCObjectFileInfo initialization and allow targets to create MCObjectFileInfo 2021-05-23 14:15:23 -07:00
llvm-config [MinGW] Use lib prefix for libraries 2020-09-12 22:01:29 +03:00
llvm-cov [Coverage] Support overriding compilation directory 2021-05-11 15:26:45 -07:00
llvm-cvtres [llvm-cvtres] Reduce the set of dependencies of llvm-cvtres. NFC. 2021-04-21 11:50:10 +03:00
llvm-cxxdump llvmbuildectomy - replace llvm-build by plain cmake 2020-11-13 10:35:24 +01:00
llvm-cxxfilt [demangler] Initial support for the new Rust mangling scheme 2021-05-03 16:44:30 -07:00
llvm-cxxmap [Support] Don't include VirtualFileSystem.h in CommandLine.h 2021-04-21 10:19:01 -04:00
llvm-diff Switch from llvm::is_trivially_copyable to std::is_trivially_copyable 2020-12-02 22:02:48 -08:00
llvm-dis Allow llvm-dis to disassemble multiple files 2021-05-06 11:08:55 -07:00
llvm-dwarfdump [NFC][llvm-dwarfdump] Avoid passing std::string by value in collectStatsForDie() 2021-05-12 01:29:37 -07:00
llvm-dwp [MC] Refactor MCObjectFileInfo initialization and allow targets to create MCObjectFileInfo 2021-05-23 14:15:23 -07:00
llvm-elfabi [llvm-elfabi] Add flag to preserve timestamp when output is the same 2020-12-29 20:27:06 -08:00
llvm-exegesis [llvm-exegesis] Loop unrolling for loop snippet repetitor mode 2021-05-25 12:08:27 +03:00
llvm-extract llvmbuildectomy - replace llvm-build by plain cmake 2020-11-13 10:35:24 +01:00
llvm-go
llvm-gsymutil Add option to llvm-gsymutil to read addresses from stdin. 2021-05-20 06:10:35 +00:00
llvm-ifs [SystemZ][z/OS] Add IsText Argument to GetFile and GetFileOrSTDIN 2021-04-16 10:08:36 -04:00
llvm-isel-fuzzer [AIX] Turn -fdata-sections on by default in Clang 2020-10-14 15:58:31 +00:00
llvm-itanium-demangle-fuzzer
llvm-jitlink [MC] Refactor MCObjectFileInfo initialization and allow targets to create MCObjectFileInfo 2021-05-23 14:15:23 -07:00
llvm-jitlistener [MCJIT] Profile the code generated by MCJIT engine using Intel VTune profiler 2020-11-16 19:28:14 +11:00
llvm-libtool-darwin [Support] Don't include VirtualFileSystem.h in CommandLine.h 2021-04-21 10:19:01 -04:00
llvm-link NFC: Run clang-format over llvm-link. 2021-04-28 14:33:00 -07:00
llvm-lipo [TextAPI] move source code files out of subdirectory, NFC 2021-04-05 10:24:42 -07:00
llvm-lto Recommit "[LTO] Use lto::backend for code generation." 2021-02-15 10:05:42 +00:00
llvm-lto2 Don't use $ as suffix for symbol names in ThinLTOBitcodeWriter and other places 2021-03-29 13:03:52 +02:00
llvm-mc [MC] Refactor MCObjectFileInfo initialization and allow targets to create MCObjectFileInfo 2021-05-23 14:15:23 -07:00
llvm-mc-assemble-fuzzer [MC] Refactor MCObjectFileInfo initialization and allow targets to create MCObjectFileInfo 2021-05-23 14:15:23 -07:00
llvm-mc-disassemble-fuzzer
llvm-mca [MC] Refactor MCObjectFileInfo initialization and allow targets to create MCObjectFileInfo 2021-05-23 14:15:23 -07:00
llvm-microsoft-demangle-fuzzer
llvm-ml [MC] Refactor MCObjectFileInfo initialization and allow targets to create MCObjectFileInfo 2021-05-23 14:15:23 -07:00
llvm-modextract llvmbuildectomy - replace llvm-build by plain cmake 2020-11-13 10:35:24 +01:00
llvm-mt llvmbuildectomy - replace llvm-build by plain cmake 2020-11-13 10:35:24 +01:00
llvm-nm [llvm-nm] Support the -V option, print that the tool is compatible with GNU nm 2021-05-13 22:36:25 +03:00
llvm-objcopy [llvm-strip] Add support for '--' for delimiting options from input files 2021-05-20 03:33:51 -07:00
llvm-objdump [MC] Refactor MCObjectFileInfo initialization and allow targets to create MCObjectFileInfo 2021-05-23 14:15:23 -07:00
llvm-opt-fuzzer [NewPM] Hide pass manager debug logging behind -debug-pass-manager-verbose 2021-05-07 21:51:47 -07:00
llvm-opt-report [SystemZ][z/OS][Windows] Add new OF_TextWithCRLF flag and use this flag instead of OF_Text 2021-04-06 07:23:31 -04:00
llvm-pdbutil Removed redundant code. 2021-04-07 05:37:46 +04:00
llvm-profdata [CSSPGO][llvm-profdata] Support trimming cold context when merging profiles 2021-04-22 00:42:37 -07:00
llvm-profgen [NFC][CSSPGO]llvm-profge] Fix Build warning dueo to an attrbute usage. 2021-05-24 12:59:02 -07:00
llvm-rc [llvm-rc] Add a GNU windres-like frontend to llvm-rc 2021-04-26 22:04:29 +03:00
llvm-readobj [AMDGPU] Add gfx1034 target 2021-05-13 14:25:18 -04:00
llvm-reduce [llvm-reduce] Don't unset dso_local on implicitly dso_local GVs 2021-04-30 11:57:22 -07:00
llvm-rtdyld [MC] Refactor MCObjectFileInfo initialization and allow targets to create MCObjectFileInfo 2021-05-23 14:15:23 -07:00
llvm-rust-demangle-fuzzer Add fuzzer for Rust demangler 2021-05-05 12:50:50 -07:00
llvm-shlib [CMake][ELF] Link libLLVM.so and libclang-cpp.so with -Bsymbolic-functions 2021-05-13 13:44:57 -07:00
llvm-size [llvm-cov] Use is_contained (NFC) 2020-12-27 09:57:25 -08:00
llvm-special-case-list-fuzzer
llvm-split [LTO] Update splitCodeGen to take a reference to the module. (NFC) 2021-01-29 11:53:11 +00:00
llvm-stress Avoid shuffle self-assignment in EXPENSIVE_CHECKS builds 2021-03-10 11:17:34 +00:00
llvm-strings llvmbuildectomy - replace llvm-build by plain cmake 2020-11-13 10:35:24 +01:00
llvm-symbolizer [llvm-symbolizer] Place Mach-O options into the Mach-O option group. 2021-05-12 12:04:54 +01:00
llvm-undname llvmbuildectomy - replace llvm-build by plain cmake 2020-11-13 10:35:24 +01:00
llvm-xray [SystemZ][z/OS][Windows] Add new OF_TextWithCRLF flag and use this flag instead of OF_Text 2021-04-06 07:23:31 -04:00
llvm-yaml-numeric-parser-fuzzer [llvm] NFC: Cleanup llvm-yaml-numeric-parser-fuzzer 2021-02-15 14:52:53 +01:00
llvm-yaml-parser-fuzzer [llvm] Use llvm::erase_value and llvm::erase_if (NFC) 2021-01-02 09:24:15 -08:00
lto [LTO][Legacy] Decouple option parsing from LTOCodeGenerator 2021-03-31 16:43:26 +00:00
msbuild
obj2yaml Reland: "[lld][WebAssembly] Initial support merging string data" 2021-05-10 16:03:38 -07:00
opt [NewPM] Add options to PrintPassInstrumentation 2021-05-18 20:59:35 -07:00
opt-viewer
remarks-shlib [tools][remarks-shlib] Don't build libRemarks.so without PIC 2020-09-20 12:40:21 +02:00
sancov [MC] Refactor MCObjectFileInfo initialization and allow targets to create MCObjectFileInfo 2021-05-23 14:15:23 -07:00
sanstats [NFC] Reordering parameters in getFile and getFileOrSTDIN 2021-03-25 09:47:49 -04:00
split-file [Support] Don't include VirtualFileSystem.h in CommandLine.h 2021-04-21 10:19:01 -04:00
verify-uselistorder [SystemZ][z/OS][Windows] Add new OF_TextWithCRLF flag and use this flag instead of OF_Text 2021-04-06 07:23:31 -04:00
vfabi-demangle-fuzzer
xcode-toolchain
yaml2obj [llvm] Make obj2yaml and yaml2obj LLVM utilities instead of tools 2020-10-19 10:21:21 -07:00
CMakeLists.txt