1
0
mirror of https://github.com/RPCS3/llvm-mirror.git synced 2025-02-01 13:11:39 +01:00

12 Commits

Author SHA1 Message Date
Matt Arsenault
40d3e7765c AMDGPU: Remove denormal subtarget features
Switch to using the denormal-fp-math/denormal-fp-math-f32 attributes.
2020-04-02 17:17:12 -04:00
Changpeng Fang
d65c44efb7 AMDGPU: Implement FDIV optimizations in AMDGPUCodeGenPrepare
Summary:
      RCP has the accuracy limit. If FDIV fpmath require high accuracy rcp may not
    meet the requirement. However, in DAG lowering, fpmath information gets lost,
    and thus we may generate either inaccurate rcp related computation or slow code
    for fdiv.

    In patch implements fdiv optimizations in the AMDGPUCodeGenPrepare, which could
    exactly know !fpmath.

     FastUnsafeRcpLegal: We determine whether it is legal to use rcp based on
                         unsafe-fp-math, fast math flags, denormals and fpmath
                         accuracy request.

     RCP Optimizations:
       1/x -> rcp(x) when fast unsafe rcp is legal or fpmath >= 2.5ULP with
                                                      denormals flushed.
       a/b -> a*rcp(b) when fast unsafe rcp is legal.

     Use fdiv.fast:
       a/b -> fdiv.fast(a, b) when RCP optimization is not performed and
                              fpmath >= 2.5ULP with denormals flushed.

       1/x -> fdiv.fast(1,x)  when RCP optimization is not performed and
                              fpmath >= 2.5ULP with denormals.

    Reviewers:
      arsenm

    Differential Revision:
      https://reviews.llvm.org/D71293
2020-01-23 16:57:43 -08:00
Qiu Chaofan
3d8ed5a845 [DAGCombiner] Improve division estimation of floating points.
Current implementation of estimating divisions loses precision since it
estimates reciprocal first and does multiplication.  This patch is to re-order
arithmetic operations in the last iteration in DAGCombiner to improve the
accuracy.

Reviewed By: Sanjay Patel, Jinsong Ji

Differential Revision: https://reviews.llvm.org/D66050

llvm-svn: 371713
2019-09-12 07:51:24 +00:00
Matt Arsenault
7ead0f3ed6 AMDGPU: Allow SIShrinkInstructions to work in non-SSA
Immediates can be folded as long as the immediate is a vreg.

Also undo commuting instructions if it didn't fold an immediate.

llvm-svn: 307575
2017-07-10 19:53:57 +00:00
Alexander Timofeev
71cf453d98 [AMDGPU] Switch scalarize global loads ON by default
Differential revision: https://reviews.llvm.org/D34407

llvm-svn: 307097
2017-07-04 17:32:00 +00:00
NAKAMURA Takumi
e6c7524092 Revert r307026, "[AMDGPU] Switch scalarize global loads ON by default"
It broke a testcase.

  Failing Tests (1):
      LLVM :: CodeGen/AMDGPU/alignbit-pat.ll

llvm-svn: 307054
2017-07-04 02:14:18 +00:00
Alexander Timofeev
ef2bf78d2f [AMDGPU] Switch scalarize global loads ON by default
Differential revision: https://reviews.llvm.org/D34407

llvm-svn: 307026
2017-07-03 14:54:11 +00:00
Matt Arsenault
dd9ab77318 AMDGPU: Mark all unspecified CC functions in tests as amdgpu_kernel
Currently the default C calling convention functions are treated
the same as compute kernels. Make this explicit so the default
calling convention can be changed to a non-kernel.

Converted with perl -pi -e 's/define void/define amdgpu_kernel void/'
on the relevant test directories (and undoing in one place that actually
wanted a non-kernel).

llvm-svn: 298444
2017-03-21 21:39:51 +00:00
Matt Arsenault
6944286a7c AMDGPU: fdiv -1, x -> rcp -x
llvm-svn: 277535
2016-08-02 22:25:04 +00:00
Matt Arsenault
7193bf43c5 AMDGPU: Add volatile to test loads and stores
When the memory vectorizer is enabled, these tests break.
These tests don't really care about the memory instructions,
and it's easier to write check lines with the unmerged loads.

llvm-svn: 266071
2016-04-12 13:38:18 +00:00
Matt Arsenault
e676b40286 AMDGPU: Remove some old intrinsic uses from tests
llvm-svn: 260493
2016-02-11 06:02:01 +00:00
Tom Stellard
3f1708598e R600 -> AMDGPU rename
llvm-svn: 239657
2015-06-13 03:28:10 +00:00