mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-10-26 22:42:46 +02:00
llvm-mirror/lib
Tom Stellard c3f6130f41 AMDGPU/SI: Better handle s_wait insertion
We can wait on any of VM, EXP or LGKM.
These waits are independent of each other.

Without this patch, a wait inserted because of one counter
would also wait for all earlier operations tracked by the others.
This patch makes s_waitcnt wait only on the counters the next
instruction actually needs.
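Because the waits are independent, an s_waitcnt can name only the counters it needs, as in the `s_waitcnt lgkmcnt(0)` and `s_waitcnt vmcnt(1)` lines of the example. Here is a minimal sketch of rendering such a wait. It assumes a hypothetical `formatWaitcnt` helper, which is not LLVM's printer, and it uses -1 as its own marker for a counter that needs no wait:

```cpp
#include <array>
#include <cassert>
#include <string>

// Hypothetical helper (not LLVM's asm printer). Index 0 = VM, 1 = EXP,
// 2 = LGKM. A value of -1 means "do not wait on this counter"; any other
// value is how many operations of that type may remain outstanding.
std::string formatWaitcnt(const std::array<int, 3> &waits) {
  static const char *const names[3] = {"vmcnt", "expcnt", "lgkmcnt"};
  std::string text;
  for (std::size_t i = 0; i < 3; ++i) {
    assert(waits[i] >= -1 && "only -1 may be used as the no-wait marker");
    if (waits[i] < 0)
      continue; // this counter is not needed by the next instruction
    text += ' ';
    text += names[i];
    text += '(' + std::to_string(waits[i]) + ')';
  }
  // No counter needs a wait: emit nothing rather than an empty s_waitcnt.
  return text.empty() ? std::string() : "s_waitcnt" + text;
}
```

For example, `formatWaitcnt({1, -1, -1})` yields `s_waitcnt vmcnt(1)`, and an input of all -1 yields no instruction at all.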

Here is an example of a subtle performance loss that this patch fixes:

This is without the patch:

buffer_load_format_xyzw v[8:11], v0, s[44:47], 0 idxen
buffer_load_format_xyzw v[12:15], v0, s[48:51], 0 idxen
s_load_dwordx4 s[44:47], s[8:9], 0xc
s_waitcnt lgkmcnt(0)
buffer_load_format_xyzw v[16:19], v0, s[52:55], 0 idxen
s_load_dwordx4 s[48:51], s[8:9], 0x10
s_waitcnt vmcnt(1)
buffer_load_format_xyzw v[20:23], v0, s[44:47], 0 idxen

The s_waitcnt vmcnt(1) is unnecessary.
It is inserted because the last buffer_load_format_xyzw needs
s[44:47], which is loaded by the first s_load_dwordx4, and the
counters recorded for s[44:47] also require every VM operation
issued before that s_load_dwordx4 to have finished.

Internally, three counters (VM, EXP and LGKM) are updated after
every instruction. For example, buffer_load_format_xyzw increments
the VM counter, and s_load_dwordx4 increments the LGKM one.
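The counter updates can be sketched as a running tally over the example above. This is a simplified model. It assumes each instruction bumps at most one counter and classifies only the two opcodes from the example. `incrementFor` and `trackIssued` are hypothetical names, not LLVM code:

```cpp
#include <array>
#include <cassert>
#include <string>
#include <vector>

// Index 0 = VM, 1 = EXP, 2 = LGKM.
enum Counter { VM = 0, EXP = 1, LGKM = 2 };
using Counters = std::array<unsigned, 3>;

// Hypothetical classification covering only the opcodes in the example.
Counters incrementFor(const std::string &opcode) {
  if (opcode == "buffer_load_format_xyzw")
    return {1, 0, 0}; // vector memory load: VM counter
  if (opcode == "s_load_dwordx4")
    return {0, 0, 1}; // scalar memory load: LGKM counter
  return {0, 0, 0};   // e.g. s_waitcnt itself issues nothing
}

// Running totals of issued operations, recorded after each instruction.
std::vector<Counters> trackIssued(const std::vector<std::string> &program) {
  Counters issued = {0, 0, 0};
  std::vector<Counters> history;
  for (const std::string &op : program) {
    Counters inc = incrementFor(op);
    assert(inc[VM] + inc[EXP] + inc[LGKM] <= 1 &&
           "model assumes one counter per instruction");
    for (std::size_t i = 0; i < 3; ++i)
      issued[i] += inc[i];
    history.push_back(issued);
  }
  return history;
}
```

When it is fed the opcodes of the eight lines in the example, the tally after the first s_load_dwordx4 is VM=2, LGKM=1. After the last buffer_load_format_xyzw it is VM=4, LGKM=2.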

Without the patch, the current values of all three counters are
stored for every defined register. They are used to determine how
long to wait when an instruction needs that register.

Because of that, the counters stored for s[44:47] say that using the
register also requires waiting for the preceding buffer_load_format_xyzw
instructions.

Instead, this patch stores only the counters that matter for the
register and puts zero for the other ones, since no wait is needed
for them.
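Both recording schemes can be sketched as follows, together with the wait each one produces for the last buffer_load_format_xyzw. This is a simplified model, not the LLVM implementation. `recordDef` and `waitsFor` are hypothetical names. The model assumes that operations of one type complete in issue order. `waitedOn` holds how many operations of each type an earlier s_waitcnt has already guaranteed to be complete:

```cpp
#include <array>
#include <cassert>

// Index 0 = VM, 1 = EXP, 2 = LGKM.
using Counters = std::array<unsigned, 3>;

// What is recorded for a register when an instruction defines it.
// Old scheme (onlyRelevant == false): a snapshot of all issued counters.
// New scheme (onlyRelevant == true): counters the defining instruction
// does not bump are zeroed, meaning "no wait needed".
Counters recordDef(const Counters &issued, const Counters &increment,
                   bool onlyRelevant) {
  Counters rec = issued;
  if (onlyRelevant)
    for (std::size_t i = 0; i < 3; ++i)
      if (increment[i] == 0)
        rec[i] = 0;
  return rec;
}

// Wait needed before reading a register whose record is `rec`.
// -1 means no wait on that counter; otherwise the value is how many
// operations of that type may stay outstanding.
std::array<int, 3> waitsFor(const Counters &rec, const Counters &issued,
                            const Counters &waitedOn) {
  std::array<int, 3> waits = {-1, -1, -1};
  for (std::size_t i = 0; i < 3; ++i) {
    assert(rec[i] <= issued[i] && "cannot depend on unissued operations");
    if (rec[i] > waitedOn[i]) // not already covered by an earlier wait
      waits[i] = static_cast<int>(issued[i] - rec[i]);
  }
  return waits;
}
```

The example supplies these numbers. s[44:47] is defined when VM=2 and LGKM=1. It is used when VM=3 and LGKM=2, and by then the s_waitcnt lgkmcnt(0) has already covered one LGKM operation. With these inputs, the old scheme yields a VM wait of 1, i.e. s_waitcnt vmcnt(1). The new scheme yields no wait at all.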

Patch by: Axel Davy

Differential Revision: http://reviews.llvm.org/D11883

llvm-svn: 245755
2015-08-21 22:47:27 +00:00
Analysis [LVI] Use a SmallVector instead of SmallPtrSet. NFC 2015-08-21 21:18:26 +00:00
AsmParser AsmParser: Save and restore the parsing state for types using SlotMapping. 2015-08-21 21:32:39 +00:00
Bitcode [IR] Give catchret an optional 'return value' operand 2015-08-15 02:46:08 +00:00
CodeGen Range-for-ify some things in GlobalMerge 2015-08-21 22:19:06 +00:00
DebugInfo Fix some comment typos. 2015-08-08 18:27:36 +00:00
ExecutionEngine [RuntimeDyld] Make sure code-sections aren't under-aligned. 2015-08-14 06:26:42 +00:00
Fuzzer Fix missing space in libfuzzer's help text. 2015-08-12 20:00:10 +00:00
IR [opaque pointer types] Push the passing of value types up from Function/GlobalVariable to GlobalObject 2015-08-21 21:35:28 +00:00
IRReader
LibDriver There is only one saver of strings. 2015-08-13 01:07:02 +00:00
LineEditor
Linker Linker: Remove empty destructor. 2015-08-21 04:51:24 +00:00
LTO LTO: Simplify ownership of LTOCodeGenerator::TargetMach. 2015-08-21 04:45:57 +00:00
MC Fix symbol value computation when part of the expression is weak. 2015-08-20 16:18:30 +00:00
Object Convert getSymbolSection to return an ErrorOr. 2015-08-07 23:27:14 +00:00
Option Add an ArgList::AddAllArgs that accepts a vector of OptSpecifier. 2015-07-29 17:34:41 +00:00
Passes [PM/AA] Remove the last relics of the separate IPA library from LLVM, 2015-08-18 17:51:53 +00:00
ProfileData
Support [ARM] Fix MachO CPU Subtype selection 2015-08-21 21:52:48 +00:00
TableGen TableGen: Support folding casts from bits to int 2015-07-31 01:12:06 +00:00
Target AMDGPU/SI: Better handle s_wait insertion 2015-08-21 22:47:27 +00:00
Transforms Re-apply r245635, "[InstCombine] Transform A & (L - 1) u< L --> L != 0" 2015-08-21 22:22:37 +00:00
CMakeLists.txt
LLVMBuild.txt
Makefile