1
0
mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-11-24 03:33:20 +01:00
llvm-mirror/lib
Sanjay Patel 7477a9a155 Fast-math fold: x / (y * sqrt(z)) -> x * (rsqrt(z) / y)
The motivation is to recognize code such as this from /llvm/projects/test-suite/SingleSource/Benchmarks/BenchmarkGame/n-body.c:

float distance = sqrt(dx * dx + dy * dy + dz * dz);
float mag = dt / (distance * distance * distance);

Without this patch, we don't match the sqrt as a reciprocal sqrt, so for PPC the new testcase in this patch produces:

   addis 3, 2, .LCPI4_2@toc@ha
   lfs 4, .LCPI4_2@toc@l(3)
   addis 3, 2, .LCPI4_1@toc@ha
   lfs 0, .LCPI4_1@toc@l(3)
   fcmpu 0, 1, 4
   beq 0, .LBB4_2
# BB#1:
   frsqrtes 4, 1
   addis 3, 2, .LCPI4_0@toc@ha
   lfs 5, .LCPI4_0@toc@l(3)
   fnmsubs 13, 1, 5, 1
   fmuls 6, 4, 4
   fmadds 1, 13, 6, 5
   fmuls 1, 4, 1
   fres 4, 1                <--- reciprocal of reciprocal square root
   fnmsubs 1, 1, 4, 0
   fmadds 4, 4, 1, 4
.LBB4_2:
   fmuls 1, 4, 2
   fres 2, 1
   fnmsubs 0, 1, 2, 0
   fmadds 0, 2, 0, 2
   fmuls 1, 3, 0
   blr

After the patch, this simplifies to:

frsqrtes 0, 1
addis 3, 2, .LCPI4_1@toc@ha
fres 5, 2
lfs 4, .LCPI4_1@toc@l(3)
addis 3, 2, .LCPI4_0@toc@ha
lfs 7, .LCPI4_0@toc@l(3)
fnmsubs 13, 1, 4, 1
fmuls 6, 0, 0
fnmsubs 2, 2, 5, 7
fmadds 1, 13, 6, 4
fmadds 2, 5, 2, 5
fmuls 0, 0, 1
fmuls 0, 0, 2
fmuls 1, 3, 0
blr

Differential Revision: http://reviews.llvm.org/D5628

llvm-svn: 219139
2014-10-06 19:31:18 +00:00
..
Analysis [BasicAA] Revert "Revert r218714 - Make better use of zext and sign information." 2014-10-06 18:37:59 +00:00
AsmParser Make CallingConv::ID an alias of "unsigned". 2014-09-10 18:00:17 +00:00
Bitcode Do not destroy external linkage when deleting function body 2014-09-23 12:54:19 +00:00
CodeGen Fast-math fold: x / (y * sqrt(z)) -> x * (rsqrt(z) / y) 2014-10-06 19:31:18 +00:00
DebugInfo Refactor RelocVisitor to take an object. This removes some 2014-10-06 06:55:55 +00:00
ExecutionEngine [MCJIT] Don't crash in debugging output for sections that aren't emitted. 2014-10-01 21:57:47 +00:00
IR [PM] Remove an unused and rather expensive mapping from an analysis 2014-10-06 00:30:59 +00:00
IRReader Pass a && to getLazyBitcodeModule. 2014-09-03 17:31:46 +00:00
LineEditor
Linker Merge alignment of common GlobalValue. 2014-09-09 17:48:18 +00:00
LTO LTO: Document the Boolean argument from r218784 2014-10-02 21:11:04 +00:00
MC MachObjectWriter: optimize the string table for common suffices 2014-10-06 17:05:19 +00:00
Object llvm-readobj: print out the fields of the COFF delay-import table 2014-10-03 18:07:18 +00:00
Option Add an overload of getLastArgNoClaim taking two OptSpecifiers. 2014-09-12 19:42:53 +00:00
ProfileData Eliminate some deep std::vector copies. NFC. 2014-10-03 18:33:16 +00:00
Support Make the MD5 result name consistent between functions, header and source. 2014-10-06 13:48:07 +00:00
TableGen Eliminate some deep std::vector copies. NFC. 2014-10-03 18:33:16 +00:00
Target ARM: silence unused variable warning 2014-10-06 17:26:36 +00:00
Transforms Give the Reassociate pass a bit more flexibility and autonomy when optimizing expressions. 2014-10-05 23:41:26 +00:00
CMakeLists.txt
LLVMBuild.txt
Makefile