1
0
mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-10-20 19:42:54 +02:00
llvm-mirror/lib/Target/AMDGPU
Nikolai Bozhenov 2540ce6c57 [X86] Heuristic to selectively build Newton-Raphson SQRT estimation
On modern Intel processors hardware SQRT in many cases is faster than RSQRT
followed by Newton-Raphson refinement. The patch introduces a simple heuristic
to choose between hardware SQRT instruction and Newton-Raphson software
estimation.

The patch treats scalars and vectors differently. The heuristic is that for
scalars the compiler should optimize for latency while for vectors it should
optimize for throughput. It is based on the assumption that throughput bound
code is likely to be vectorized.

Basically, the patch disables scalar NR for big cores and disables NR completely
for Skylake. Firstly, scalar SQRT has shorter latency than NR code in big cores.
Secondly, vector SQRT has been greatly improved in Skylake and has better
throughput compared to NR.

Differential Revision: https://reviews.llvm.org/D21379

llvm-svn: 277725
2016-08-04 12:47:28 +00:00
..
AsmParser [AMDGPU] Assembler: fix row_bcast parsing 2016-07-14 14:50:35 +00:00
Disassembler AMDGPU: Expand register indexing pseudos in custom inserter 2016-07-19 00:35:03 +00:00
InstPrinter AMDGPU: Remove unnecessary string usage in AsmPrinter 2016-07-05 22:06:56 +00:00
MCTargetDesc MC] Provide an MCTargetOptions to implementors of MCAsmBackendCtorTy, NFC 2016-07-25 17:18:28 +00:00
TargetInfo Remove autoconf support 2016-01-26 21:29:08 +00:00
Utils [AMDGPU] Enable absolute expression initializer for amd_kernel_code_t fields. 2016-06-23 14:13:06 +00:00
AMDGPU.h AMDGPU: Change fdiv lowering based on !fpmath metadata 2016-07-19 23:16:53 +00:00
AMDGPU.td AMDGPU: Add feature for unaligned access 2016-07-01 23:03:44 +00:00
AMDGPUAlwaysInlinePass.cpp Cloning: Clean up the interface to the CloneFunction function. 2016-05-10 20:23:24 +00:00
AMDGPUAnnotateKernelFeatures.cpp AMDGPU: Add HSA dispatch id intrinsic 2016-07-22 17:01:30 +00:00
AMDGPUAnnotateUniformValues.cpp Add optimization bisect opt-in calls for AMDGPU passes 2016-04-25 22:23:44 +00:00
AMDGPUAsmPrinter.cpp MachineFunction: Return reference for getFrameInfo(); NFC 2016-07-28 18:40:00 +00:00
AMDGPUAsmPrinter.h AMDGPU: Delete more dead code 2016-07-22 17:01:25 +00:00
AMDGPUCallingConv.td AMDGPU: Fix kernel argument alignment impacting stack size 2016-06-18 05:15:53 +00:00
AMDGPUCallLowering.cpp AMDGPU: Add skeleton GlobalIsel implementation 2016-04-14 19:09:28 +00:00
AMDGPUCallLowering.h AMDGPU: Add skeleton GlobalIsel implementation 2016-04-14 19:09:28 +00:00
AMDGPUCodeGenPrepare.cpp AMDGPU: Use rcp for fdiv 1, x with fpmath metadata 2016-07-26 23:25:44 +00:00
AMDGPUFrameLowering.cpp MachineFunction: Return reference for getFrameInfo(); NFC 2016-07-28 18:40:00 +00:00
AMDGPUFrameLowering.h AMDGPU: Move R600 only pieces into R600 classes 2016-07-09 18:11:15 +00:00
AMDGPUInstrInfo.cpp AMDGPU: Move R600 only pieces into R600 classes 2016-07-09 18:11:15 +00:00
AMDGPUInstrInfo.h AMDGPU: Move R600 only pieces into R600 classes 2016-07-09 18:11:15 +00:00
AMDGPUInstrInfo.td AMDGPU : Add intrinsics for compare with the full wavefront result 2016-07-28 16:42:13 +00:00
AMDGPUInstructions.td AMDGPU: Fix i1 fp_to_int 2016-07-22 17:01:21 +00:00
AMDGPUIntrinsicInfo.cpp AMDGPU: Change fdiv lowering based on !fpmath metadata 2016-07-19 23:16:53 +00:00
AMDGPUIntrinsicInfo.h AMDGPU: Change fdiv lowering based on !fpmath metadata 2016-07-19 23:16:53 +00:00
AMDGPUIntrinsics.td AMDGPU: Remove read_workdim intrinsic 2016-07-25 20:17:02 +00:00
AMDGPUISelDAGToDAG.cpp MachineFunction: Return reference for getFrameInfo(); NFC 2016-07-28 18:40:00 +00:00
AMDGPUISelLowering.cpp [X86] Heuristic to selectively build Newton-Raphson SQRT estimation 2016-08-04 12:47:28 +00:00
AMDGPUISelLowering.h [X86] Heuristic to selectively build Newton-Raphson SQRT estimation 2016-08-04 12:47:28 +00:00
AMDGPUMachineFunction.cpp AMDGPU: Make AMDGPUMachineFunction fields private 2016-07-26 16:45:58 +00:00
AMDGPUMachineFunction.h AMDGPU: Make AMDGPUMachineFunction fields private 2016-07-26 16:45:58 +00:00
AMDGPUMCInstLower.cpp AMDGPU/SI: Add support for R_AMDGPU_GOTPCREL 2016-07-13 14:23:33 +00:00
AMDGPUMCInstLower.h AMDGPU: R600 code splitting cleanup 2016-03-11 08:00:27 +00:00
AMDGPUOpenCLImageTypeLoweringPass.cpp [NFC] Header cleanup 2016-04-18 09:17:29 +00:00
AMDGPUPromoteAlloca.cpp AMDGPU: Remove pointless dyn_cast_or_null 2016-07-18 19:00:07 +00:00
AMDGPURegisterInfo.cpp AMDGPU: Move R600 only pieces into R600 classes 2016-07-09 18:11:15 +00:00
AMDGPURegisterInfo.h AMDGPU: Move R600 only pieces into R600 classes 2016-07-09 18:11:15 +00:00
AMDGPURegisterInfo.td
AMDGPURuntimeMetadata.h Re-commit [AMDGPU] Add metadata for runtime 2016-07-16 05:09:21 +00:00
AMDGPUSubtarget.cpp AMDGPU: Delete dead code 2016-07-25 19:06:25 +00:00
AMDGPUSubtarget.h AMDGPU: Delete dead code 2016-07-25 19:06:25 +00:00
AMDGPUTargetMachine.cpp [GlobalISel] Introduce an instruction selector. 2016-07-27 14:31:55 +00:00
AMDGPUTargetMachine.h AMDGPU: Delete more dead code 2016-07-22 17:01:25 +00:00
AMDGPUTargetObjectFile.cpp Revert "[AMDGPU] Emit read-only data to .rodata for hsa" 2016-07-22 23:46:40 +00:00
AMDGPUTargetObjectFile.h AMDGPU/SI: Add support for AMD code object version 2. 2016-05-05 17:03:33 +00:00
AMDGPUTargetTransformInfo.cpp AMDGPU: Implement getLoadStoreVecRegBitWidth 2016-07-01 00:56:27 +00:00
AMDGPUTargetTransformInfo.h [TTI] The cost model should not assume vector casts get completely scalarized 2016-07-06 17:30:56 +00:00
AMDILCFGStructurizer.cpp AMDGPU: Remove implicit iterator conversions, NFC 2016-07-08 19:16:05 +00:00
AMDKernelCodeT.h [AMDGPU] fix amd_kernel_code_t bit field position as per spec (added missing reserved fields) 2016-02-24 10:54:25 +00:00
CaymanInstructions.td AMDGPU/R600: Add PatFrags for selecting the correct vtx id for loads 2016-07-05 00:12:51 +00:00
CIInstructions.td [AMDGPU] refactor DS instruction definitions. NFC. 2016-08-01 14:21:30 +00:00
CMakeLists.txt AMDGPU/R600: Delete/rename intrinsics no longer used by mesa 2016-07-14 05:47:17 +00:00
DSInstructions.td [AMDGPU] refactor DS instruction definitions. NFC. 2016-08-01 14:21:30 +00:00
EvergreenInstructions.td AMDGPU/R600: Replace barrier intrinsics 2016-07-18 18:34:59 +00:00
GCNHazardRecognizer.cpp AMDGPU: Cleanup subtarget handling. 2016-06-24 06:30:11 +00:00
GCNHazardRecognizer.h AMDGPU: Cleanup subtarget handling. 2016-06-24 06:30:11 +00:00
LLVMBuild.txt AMDGPU: Prune AMDGPUAsmParser in libdeps. 2016-07-09 07:54:27 +00:00
Processors.td AMDGPU: Fix crashes on unknown processor name 2016-06-02 18:37:16 +00:00
R600ClauseMergePass.cpp AMDGPU: Remove implicit iterator conversions, NFC 2016-07-08 19:16:05 +00:00
R600ControlFlowFinalizer.cpp AMDGPU: Delete more dead code 2016-07-22 17:01:25 +00:00
R600Defines.h AMDGPU: R600 code splitting cleanup 2016-03-11 08:00:27 +00:00
R600EmitClauseMarkers.cpp AMDGPU: Remove implicit iterator conversions, NFC 2016-07-08 19:16:05 +00:00
R600ExpandSpecialInstrs.cpp AMDGPU: Delete more dead code 2016-07-22 17:01:25 +00:00
R600FrameLowering.cpp AMDGPU: Cleanup subtarget handling. 2016-06-24 06:30:11 +00:00
R600FrameLowering.h AMDGPU: Cleanup subtarget handling. 2016-06-24 06:30:11 +00:00
R600InstrFormats.td
R600InstrInfo.cpp [AMDGPU] Fix lifetime of SmallVector temporaries. 2016-07-30 11:31:16 +00:00
R600InstrInfo.h AMDGPU/R600: Delete dead code. 2016-07-15 21:26:46 +00:00
R600Instructions.td AMDGPU: Fix TargetPrefix for remaining r600 intrinsics 2016-07-15 21:27:08 +00:00
R600Intrinsics.td AMDGPU: Fix TargetPrefix for remaining r600 intrinsics 2016-07-15 21:27:08 +00:00
R600ISelLowering.cpp AMDGPU/R600: Remove dead custom inserters 2016-07-26 21:03:38 +00:00
R600ISelLowering.h AMDGPU: Fix i1 fp_to_int 2016-07-22 17:01:21 +00:00
R600MachineFunctionInfo.cpp AMDGPU: Delete more dead code 2016-07-22 17:01:25 +00:00
R600MachineFunctionInfo.h AMDGPU: Delete more dead code 2016-07-22 17:01:25 +00:00
R600MachineScheduler.cpp CodeGen: Use MachineInstr& in TargetInstrInfo, NFC 2016-06-30 00:01:54 +00:00
R600MachineScheduler.h AMDGPU: Cleanup subtarget handling. 2016-06-24 06:30:11 +00:00
R600OptimizeVectorRegisters.cpp AMDGPU: Remove implicit iterator conversions, NFC 2016-07-08 19:16:05 +00:00
R600Packetizer.cpp CodeGen: Use MachineInstr& in TargetInstrInfo, NFC 2016-06-30 00:01:54 +00:00
R600RegisterInfo.cpp AMDGPU: Move R600 only pieces into R600 classes 2016-07-09 18:11:15 +00:00
R600RegisterInfo.h AMDGPU: Move R600 only pieces into R600 classes 2016-07-09 18:11:15 +00:00
R600RegisterInfo.td
R600Schedule.td AMDGPU: Fix trailing whitespace 2016-06-10 02:18:02 +00:00
R700Instructions.td
SIAnnotateControlFlow.cpp AMDGPU/SI: Don't handle a loop if there is no loop at all for a terminator BB. 2016-07-28 23:01:45 +00:00
SIDebuggerInsertNops.cpp AMDGPU: Cleanup subtarget handling. 2016-06-24 06:30:11 +00:00
SIDefines.h AMDGPU: Stay in WQM for non-intrinsic stores 2016-08-02 19:31:14 +00:00
SIFixControlFlowLiveIntervals.cpp Revert "AMDGPU: Remove unused control flow intrinsic" 2016-07-09 17:18:39 +00:00
SIFixSGPRCopies.cpp Revert "AMDGPU: Remove unused control flow intrinsic" 2016-07-09 17:18:39 +00:00
SIFoldOperands.cpp CodeGen: Use MachineInstr& in TargetInstrInfo, NFC 2016-06-30 00:01:54 +00:00
SIFrameLowering.cpp MachineFunction: Return reference for getFrameInfo(); NFC 2016-07-28 18:40:00 +00:00
SIFrameLowering.h [AMDGPU] Emit debugger prologue and emit the rest of the debugger fields in the kernel code header 2016-06-25 03:11:28 +00:00
SIInsertWaits.cpp AMDGPU: Remove implicit iterator conversions, NFC 2016-07-08 19:16:05 +00:00
SIInstrFormats.td AMDGPU: Stay in WQM for non-intrinsic stores 2016-08-02 19:31:14 +00:00
SIInstrInfo.cpp MachineFunction: Return reference for getFrameInfo(); NFC 2016-07-28 18:40:00 +00:00
SIInstrInfo.h AMDGPU: Stay in WQM for non-intrinsic stores 2016-08-02 19:31:14 +00:00
SIInstrInfo.td AMDGPU: Stay in WQM for non-intrinsic stores 2016-08-02 19:31:14 +00:00
SIInstructions.td AMDGPU: Stay in WQM for non-intrinsic stores 2016-08-02 19:31:14 +00:00
SIIntrinsics.td AMDGPU: Change fdiv lowering based on !fpmath metadata 2016-07-19 23:16:53 +00:00
SIISelLowering.cpp AMDGPU: fdiv -1, x -> rcp -x 2016-08-02 22:25:04 +00:00
SIISelLowering.h AMDGPU: Remove analyzeImmediate 2016-07-28 00:32:02 +00:00
SILoadStoreOptimizer.cpp AMDGPU: Move subtarget feature checks into passes 2016-06-27 20:32:13 +00:00
SILowerControlFlow.cpp AMDGPU: add execfix flag to SI_ELSE 2016-07-28 11:39:24 +00:00
SILowerI1Copies.cpp AMDGPU: Cleanup subtarget handling. 2016-06-24 06:30:11 +00:00
SIMachineFunctionInfo.cpp MachineFunction: Return reference for getFrameInfo(); NFC 2016-07-28 18:40:00 +00:00
SIMachineFunctionInfo.h AMDGPU: Make AMDGPUMachineFunction fields private 2016-07-26 16:45:58 +00:00
SIMachineScheduler.cpp AMDGPU/SI: Fix SI scheduler refcount issue 2016-07-19 00:35:22 +00:00
SIMachineScheduler.h AMDGPU: R600 code splitting cleanup 2016-03-11 08:00:27 +00:00
SIRegisterInfo.cpp MachineFunction: Return reference for getFrameInfo(); NFC 2016-07-28 18:40:00 +00:00
SIRegisterInfo.h AMDGPU/SI: Don't use reserved VGPRs for SGPR spilling 2016-07-28 14:30:43 +00:00
SIRegisterInfo.td [AMDGPU] Some code cleaning in SIRegisterInfo.td 2016-07-21 13:29:57 +00:00
SISchedule.td AMDGPU: Define a schedule class for COPY. 2016-06-24 23:52:11 +00:00
SIShrinkInstructions.cpp AMDGPU: Expand register indexing pseudos in custom inserter 2016-07-19 00:35:03 +00:00
SITypeRewriter.cpp AMDGPU: Add a shader calling convention 2016-04-06 19:40:20 +00:00
SIWholeQuadMode.cpp AMDGPU: Stay in WQM for non-intrinsic stores 2016-08-02 19:31:14 +00:00
VIInstrFormats.td [AMDGPU] refactor DS instruction definitions. NFC. 2016-08-01 14:21:30 +00:00
VIInstructions.td [AMDGPU] refactor DS instruction definitions. NFC. 2016-08-01 14:21:30 +00:00