Matt Arsenault
dd9ab77318
AMDGPU: Mark all unspecified CC functions in tests as amdgpu_kernel
...
Currently, functions with the default C calling convention are
treated the same as compute kernels. Make this explicit so the
default calling convention can be changed to a non-kernel one.
Converted with perl -pi -e 's/define void/define amdgpu_kernel void/'
on the relevant test directories (and undoing in one place that actually
wanted a non-kernel).
llvm-svn: 298444
2017-03-21 21:39:51 +00:00
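The perl one-liner in the commit above can be sketched in Python as follows. This is only an illustration of the substitution, not the script that was actually run; the helper name `mark_as_kernel` and the sample IR are hypothetical.

```python
import re

# Hypothetical helper; mirrors
#   perl -pi -e 's/define void/define amdgpu_kernel void/'
# applied to the text of one test file.
def mark_as_kernel(ir_text: str) -> str:
    out = []
    for line in ir_text.splitlines(keepends=True):
        # Without the /g flag, perl's s/// rewrites only the first
        # match on each line, hence count=1.
        out.append(re.sub(r"define void", "define amdgpu_kernel void",
                          line, count=1))
    return "".join(out)

# Made-up sample input, not taken from the LLVM test suite.
sample = "define void @foo() {\n  ret void\n}\n"
print(mark_as_kernel(sample), end="")
```

Because the rewritten text no longer contains the string `define void`, running the substitution a second time leaves the file unchanged. Functions that return something other than `void` are not matched by this pattern.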
Matt Arsenault
3a9a1ac61b
AMDGPU: Use unsigned compare for eq/ne
...
For some reason both signed and unsigned variants are available,
except for scalar 64-bit compares, which only have u64. I'm not sure
why there are both (I'm guessing it's for the one-bit inputs we
don't use), but for consistency, always use the
unsigned one.
llvm-svn: 282832
2016-09-30 01:50:20 +00:00
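As background for the commit above: equality and inequality depend only on the bit pattern, so a signed and an unsigned eq/ne compare of the same operands give the same answer. A minimal sketch that checks this exhaustively for 8-bit values is below; the helper names are hypothetical and nothing here models the actual AMDGPU instructions.

```python
# Hypothetical helpers that reinterpret a bit pattern of a given width.
def to_unsigned(bits: int, width: int) -> int:
    return bits & ((1 << width) - 1)

def to_signed(bits: int, width: int) -> int:
    value = to_unsigned(bits, width)
    # Two's complement: a set top bit means the value is negative.
    if value >> (width - 1):
        value -= 1 << width
    return value

def eq_ne_agree(width: int) -> bool:
    """Check that signed and unsigned equality agree for every pair."""
    for a in range(1 << width):
        for b in range(1 << width):
            signed_eq = to_signed(a, width) == to_signed(b, width)
            unsigned_eq = to_unsigned(a, width) == to_unsigned(b, width)
            if signed_eq != unsigned_eq:
                return False
    return True

print(eq_ne_agree(8))
# Ordered compares do differ: 0xFF is -1 when signed, 255 when unsigned.
print(to_signed(0xFF, 8) < to_signed(0x01, 8))
print(to_unsigned(0xFF, 8) < to_unsigned(0x01, 8))
```

Only the ordered compares (lt, le, gt, ge) distinguish signedness, which is why picking one form for eq/ne does not change results.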
Matt Arsenault
2dfb6d03c5
AMDGPU: Run SIFoldOperands after PeepholeOptimizer
...
PeepholeOptimizer cleans up redundant copies, which makes
the operand folding more effective.
shader-db stats:

Totals:
  SGPRS: 34200 -> 34336 (0.40 %)
  VGPRS: 22118 -> 21655 (-2.09 %)
  Code Size: 632144 -> 633460 (0.21 %) bytes
  LDS: 11 -> 11 (0.00 %) blocks
  Scratch: 10240 -> 11264 (10.00 %) bytes per wave
  Max Waves: 8822 -> 8918 (1.09 %)
  Wait states: 0 -> 0 (0.00 %)

Totals from affected shaders:
  SGPRS: 7704 -> 7840 (1.77 %)
  VGPRS: 5169 -> 4706 (-8.96 %)
  Code Size: 234444 -> 235760 (0.56 %) bytes
  LDS: 2 -> 2 (0.00 %) blocks
  Scratch: 0 -> 1024 (0.00 %) bytes per wave
  Max Waves: 1188 -> 1284 (8.08 %)
  Wait states: 0 -> 0 (0.00 %)

Increases:
  SGPRS: 35 (0.01 %)
  VGPRS: 1 (0.00 %)
  Code Size: 59 (0.02 %)
  LDS: 0 (0.00 %)
  Scratch: 1 (0.00 %)
  Max Waves: 48 (0.02 %)
  Wait states: 0 (0.00 %)

Decreases:
  SGPRS: 26 (0.01 %)
  VGPRS: 54 (0.02 %)
  Code Size: 68 (0.03 %)
  LDS: 0 (0.00 %)
  Scratch: 0 (0.00 %)
  Max Waves: 4 (0.00 %)
  Wait states: 0 (0.00 %)
llvm-svn: 266378
2016-04-14 21:58:24 +00:00
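The percentages in the "Totals" rows of the shader-db report above are relative changes from the old value to the new one. A minimal sketch that reproduces them is below. The function names are hypothetical, and reporting 0.00 % for a zero baseline is an assumption inferred from the "Scratch: 0 -> 1024 (0.00 %)" row, not taken from the shader-db source.

```python
def pct_change(old: int, new: int) -> float:
    """Relative change from old to new, in percent."""
    if old == 0:
        # Assumption: a zero baseline is reported as 0.00 %, matching
        # the "Scratch: 0 -> 1024 (0.00 %)" row in the report.
        return 0.0
    return (new - old) / old * 100.0

def format_row(name: str, old: int, new: int) -> str:
    """Format one row in the same layout as the report's totals."""
    return f"{name}: {old} -> {new} ({pct_change(old, new):.2f} %)"

# Values copied from the report in the commit message.
print(format_row("SGPRS", 34200, 34336))
print(format_row("VGPRS", 22118, 21655))
print(format_row("Scratch", 0, 1024))
```

Read this way, the affected shaders drop from 5169 to 4706 VGPRs in total (-8.96 %) while SGPR use rises by 1.77 %, which is the trade-off the totals summarize.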
Matt Arsenault
e73cb153a7
AMDGPU: Fold bitcasts of scalar constants to vectors
...
This cleans up some messes since the individual scalar components
can be CSEed.
llvm-svn: 266376
2016-04-14 21:58:07 +00:00
Matt Arsenault
e676b40286
AMDGPU: Remove some old intrinsic uses from tests
...
llvm-svn: 260493
2016-02-11 06:02:01 +00:00
Matt Arsenault
3784a7252a
AMDGPU: Improve accuracy of instruction rates for some FP instructions
...
llvm-svn: 245774
2015-08-22 00:50:41 +00:00
Tom Stellard
a3220fa789
AMDGPU/SI: Add support for shrinking v_cndmask_b32_e32 instructions
...
Reviewers: arsenm
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D11061
llvm-svn: 242146
2015-07-14 14:15:03 +00:00
Tom Stellard
3f1708598e
R600 -> AMDGPU rename
...
llvm-svn: 239657
2015-06-13 03:28:10 +00:00