1
0
mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-10-23 13:02:52 +02:00
llvm-mirror/lib/Target/AMDGPU
Nirav Dave afe2eccae3 In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled.
Retrying after fixing after removing load-store factoring through
token factors in favor of improved token factor operand pruning

Simplify Consecutive Merge Store Candidate Search

Now that address aliasing is much less conservative, push through
simplified store merging search which only checks for parallel stores
through the chain subgraph. This is cleaner as the separation of
non-interfering loads/stores from the store-merging logic.

Whem merging stores, search up the chain through a single load, and
finds all possible stores by looking down from through a load and a
TokenFactor to all stores visited. This improves the quality of the
output SelectionDAG and generally the output CodeGen (with some
exceptions).

Additional Minor Changes:

   1. Finishes removing unused AliasLoad code
   2. Unifies the the chain aggregation in the merged stores across
      code paths
   3. Re-add the Store node to the worklist after calling
      SimplifyDemandedBits.
   4. Increase GatherAllAliasesMaxDepth from 6 to 18. That number is
      arbitrary, but seemed sufficient to not cause regressions in
      tests.

This finishes the change Matt Arsenault started in r246307 and
jyknight's original patch.

Many tests required some changes as memory operations are now
reorderable. Some tests relying on the order were changed to use
volatile memory operations

Noteworthy tests:

    CodeGen/AArch64/argument-blocks.ll -
      It's not entirely clear what the test_varargs_stackalign test is
      supposed to be asserting, but the new code looks right.

    CodeGen/AArch64/arm64-memset-inline.lli -
    CodeGen/AArch64/arm64-stur.ll -
    CodeGen/ARM/memset-inline.ll -

      The backend now generates *worse* code due to store merging
      succeeding, as we do do a 16-byte constant-zero store efficiently.

    CodeGen/AArch64/merge-store.ll -
      Improved, but there still seems to be an extraneous vector insert
      from an element to itself?

    CodeGen/PowerPC/ppc64-align-long-double.ll -
      Worse code emitted in this case, due to the improved store->load
      forwarding.

    CodeGen/X86/dag-merge-fast-accesses.ll -
    CodeGen/X86/MergeConsecutiveStores.ll -
    CodeGen/X86/stores-merging.ll -
    CodeGen/Mips/load-store-left-right.ll -
      Restored correct merging of non-aligned stores

    CodeGen/AMDGPU/promote-alloca-stored-pointer-value.ll -
      Improved. Correctly merges buffer_store_dword calls

    CodeGen/AMDGPU/si-triv-disjoint-mem-access.ll -
      Improved. Sidesteps loading a stored value and
      merges two stores

    CodeGen/X86/pr18023.ll -
      This test has been removed, as it was asserting incorrect
      behavior. Non-volatile stores *CAN* be moved past volatile loads,
      and now are.

    CodeGen/X86/vector-idiv.ll -
    CodeGen/X86/vector-lzcnt-128.ll -
      It's basically impossible to tell what these tests are actually
      testing. But, looks like the code got better due to the memory
      operations being recognized as non-aliasing.

    CodeGen/X86/win32-eh.ll -
      Both loads of the securitycookie are now merged.

Reviewers: arsenm, hfinkel, tstellarAMD, jyknight, nhaehnle

Subscribers: wdng, nhaehnle, nemanjai, arsenm, weimingz, niravd, RKSimon, aemerson, qcolombet, dsanders, resistor, tstellarAMD, t.p.northover, spatel

Differential Revision: https://reviews.llvm.org/D14834

llvm-svn: 289659
2016-12-14 15:44:26 +00:00
..
AsmParser Replace APFloatBase static fltSemantics data members with getter functions 2016-12-14 11:57:17 +00:00
Disassembler AMDGPU: Fix handling of 16-bit immediates 2016-12-10 00:39:12 +00:00
InstPrinter [AMDGPU, PowerPC, TableGen] Fix some Clang-tidy modernize and Include What You Use warnings; other minor fixes (NFC). 2016-12-12 22:23:53 +00:00
MCTargetDesc [AMDGPU, PowerPC, TableGen] Fix some Clang-tidy modernize and Include What You Use warnings; other minor fixes (NFC). 2016-12-12 22:23:53 +00:00
TargetInfo Move the global variables representing each Target behind accessor function 2016-10-09 23:00:34 +00:00
Utils AMDGPU: Fix handling of 16-bit immediates 2016-12-10 00:39:12 +00:00
AMDGPU.h [AMDGPU] Add amdgpu-unify-metadata pass 2016-12-08 19:46:04 +00:00
AMDGPU.td AMDGPU/SI: Remove XNACK feature from CI 2016-12-09 19:49:58 +00:00
AMDGPUAlwaysInlinePass.cpp Use StringRef in Pass/PassManager APIs (NFC) 2016-10-01 02:56:57 +00:00
AMDGPUAnnotateKernelFeatures.cpp Use StringRef in Pass/PassManager APIs (NFC) 2016-10-01 02:56:57 +00:00
AMDGPUAnnotateUniformValues.cpp [AMDGPU] Scalarization of global uniform loads. 2016-12-08 17:28:47 +00:00
AMDGPUAsmPrinter.cpp AMDGPU/SI: Don't reserve FLAT_SCR on non-HSA targets & without stack objects 2016-12-09 19:49:48 +00:00
AMDGPUAsmPrinter.h AMDGPU: Emit runtime metadata as a note element in .note section 2016-11-10 21:18:49 +00:00
AMDGPUCallingConv.td
AMDGPUCallLowering.cpp GlobalISel: pass Function to lowerFormalArguments directly (NFC). 2016-09-21 12:57:35 +00:00
AMDGPUCallLowering.h GlobalISel: pass Function to lowerFormalArguments directly (NFC). 2016-09-21 12:57:35 +00:00
AMDGPUCodeGenPrepare.cpp AMDGPU: Fix crash on i16 constant expression 2016-12-06 23:18:06 +00:00
AMDGPUFrameLowering.cpp [AMDGPU, PowerPC, TableGen] Fix some Clang-tidy modernize and Include What You Use warnings; other minor fixes (NFC). 2016-12-09 22:06:55 +00:00
AMDGPUFrameLowering.h [AMDGPU, PowerPC, TableGen] Fix some Clang-tidy modernize and Include What You Use warnings; other minor fixes (NFC). 2016-12-09 22:06:55 +00:00
AMDGPUInstrInfo.cpp MachineScheduler: Export function to construct "default" scheduler. 2016-11-28 20:11:54 +00:00
AMDGPUInstrInfo.h MachineScheduler: Export function to construct "default" scheduler. 2016-11-28 20:11:54 +00:00
AMDGPUInstrInfo.td AMDGPU : Add S_SETREG instructions to fix fdiv precision issues. 2016-12-07 02:42:15 +00:00
AMDGPUInstructions.td [AMDGPU] Add f16 support (VI+) 2016-11-13 07:01:11 +00:00
AMDGPUIntrinsicInfo.cpp
AMDGPUIntrinsicInfo.h
AMDGPUIntrinsics.td
AMDGPUISelDAGToDAG.cpp [AMDGPU, PowerPC, TableGen] Fix some Clang-tidy modernize and Include What You Use warnings; other minor fixes (NFC). 2016-12-09 22:06:55 +00:00
AMDGPUISelLowering.cpp In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled. 2016-12-14 15:44:26 +00:00
AMDGPUISelLowering.h AMDGPU : Add S_SETREG instructions to fix fdiv precision issues. 2016-12-07 02:42:15 +00:00
AMDGPUMachineFunction.cpp
AMDGPUMachineFunction.h AMDGPU/SI: Set correct value for amd_kernel_code_t::kernarg_segment_alignment 2016-12-06 21:53:10 +00:00
AMDGPUMCInstLower.cpp [AMDGPU] Add wave barrier builtin 2016-11-15 19:00:15 +00:00
AMDGPUMCInstLower.h Reapply "AMDGPU: Support using tablegened MC pseudo expansions" 2016-10-06 17:19:11 +00:00
AMDGPUOpenCLImageTypeLoweringPass.cpp Use StringRef in Pass/PassManager APIs (NFC) 2016-10-01 02:56:57 +00:00
AMDGPUPromoteAlloca.cpp AMDGPU: Fix AMDGPUPromoteAlloca breaking addrspacecasts 2016-12-10 00:52:50 +00:00
AMDGPUPTNote.h AMDGPU: Emit runtime metadata as a note element in .note section 2016-11-10 21:18:49 +00:00
AMDGPURegisterInfo.cpp
AMDGPURegisterInfo.h
AMDGPURegisterInfo.td
AMDGPURuntimeMetadata.h AMDGPU: Attempt to fix build failure on x86-64 selfhost build 2016-11-11 02:48:50 +00:00
AMDGPUSubtarget.cpp [AMDGPU, PowerPC, TableGen] Fix some Clang-tidy modernize and Include What You Use warnings; other minor fixes (NFC). 2016-12-12 22:23:53 +00:00
AMDGPUSubtarget.h [AMDGPU, PowerPC, TableGen] Fix some Clang-tidy modernize and Include What You Use warnings; other minor fixes (NFC). 2016-12-09 22:06:55 +00:00
AMDGPUTargetMachine.cpp [AMDGPU, PowerPC, TableGen] Fix some Clang-tidy modernize and Include What You Use warnings; other minor fixes (NFC). 2016-12-12 22:23:53 +00:00
AMDGPUTargetMachine.h [AMDGPU, PowerPC, TableGen] Fix some Clang-tidy modernize and Include What You Use warnings; other minor fixes (NFC). 2016-12-09 22:06:55 +00:00
AMDGPUTargetObjectFile.cpp Target: Change various section classifiers in TargetLoweringObjectFile to take a GlobalObject. 2016-10-24 19:23:39 +00:00
AMDGPUTargetObjectFile.h Target: Change various section classifiers in TargetLoweringObjectFile to take a GlobalObject. 2016-10-24 19:23:39 +00:00
AMDGPUTargetTransformInfo.cpp AMDGPU: llvm.amdgcn.interp.mov is a source of divergence 2016-12-12 16:52:19 +00:00
AMDGPUTargetTransformInfo.h Do a sweep over move ctors and remove those that are identical to the default. 2016-10-20 12:20:28 +00:00
AMDGPUUnifyMetadata.cpp [AMDGPU] Add amdgpu-unify-metadata pass 2016-12-08 19:46:04 +00:00
AMDILCFGStructurizer.cpp Use StringRef in Pass/PassManager APIs (NFC) 2016-10-01 02:56:57 +00:00
AMDKernelCodeT.h
BUFInstructions.td AMDGPU: Add VI i16 support 2016-11-10 16:02:37 +00:00
CaymanInstructions.td
CIInstructions.td [AMDGPU] Refactor VOP1 and VOP2 instruction TD definitions 2016-09-23 09:08:07 +00:00
CMakeLists.txt [AMDGPU] Add amdgpu-unify-metadata pass 2016-12-08 19:46:04 +00:00
DSInstructions.td AMDGPU: Add VI i16 support 2016-11-10 16:02:37 +00:00
EvergreenInstructions.td
FLATInstructions.td AMDGPU: Rename flat operands to match mubuf 2016-11-29 19:30:44 +00:00
GCNHazardRecognizer.cpp AMDGPU: Rename flat operands to match mubuf 2016-11-29 19:30:44 +00:00
GCNHazardRecognizer.h AMDGPU/SI: Handle hazard with s_rfe_b64 2016-10-27 23:50:21 +00:00
GCNSchedStrategy.cpp AMDGPU/SI: Allow using SGPRs 96-101 on VI 2016-12-09 19:49:40 +00:00
GCNSchedStrategy.h
LLVMBuild.txt
MIMGInstructions.td [AMDGPU] TableGen: change individual instruction flags to bit type from bits<1> 2016-11-15 13:39:07 +00:00
Processors.td AMDGPU: Refactor processor definition to use ISA version features 2016-10-26 16:37:56 +00:00
R600ClauseMergePass.cpp Use StringRef in Pass/PassManager APIs (NFC) 2016-10-01 02:56:57 +00:00
R600ControlFlowFinalizer.cpp Use StringRef in Pass/PassManager APIs (NFC) 2016-10-01 02:56:57 +00:00
R600Defines.h
R600EmitClauseMarkers.cpp Use StringRef in Pass/PassManager APIs (NFC) 2016-10-01 02:56:57 +00:00
R600ExpandSpecialInstrs.cpp Use StringRef in Pass/PassManager APIs (NFC) 2016-10-01 02:56:57 +00:00
R600FrameLowering.cpp [AMDGPU, PowerPC, TableGen] Fix some Clang-tidy modernize and Include What You Use warnings; other minor fixes (NFC). 2016-12-12 22:23:53 +00:00
R600FrameLowering.h [AMDGPU, PowerPC, TableGen] Fix some Clang-tidy modernize and Include What You Use warnings; other minor fixes (NFC). 2016-12-09 22:06:55 +00:00
R600InstrFormats.td
R600InstrInfo.cpp Finish renaming remaining analyzeBranch functions 2016-09-14 20:43:16 +00:00
R600InstrInfo.h Finish renaming remaining analyzeBranch functions 2016-09-14 20:43:16 +00:00
R600Instructions.td AMDGPU: Refactor exp instructions 2016-12-05 20:23:10 +00:00
R600Intrinsics.td
R600ISelLowering.cpp [AMDGPU, PowerPC, TableGen] Fix some Clang-tidy modernize and Include What You Use warnings; other minor fixes (NFC). 2016-12-09 22:06:55 +00:00
R600ISelLowering.h
R600MachineFunctionInfo.cpp
R600MachineFunctionInfo.h
R600MachineScheduler.cpp
R600MachineScheduler.h [AMDGPU, PowerPC, TableGen] Fix some Clang-tidy modernize and Include What You Use warnings; other minor fixes (NFC). 2016-12-09 22:06:55 +00:00
R600OptimizeVectorRegisters.cpp [AMDGPU, PowerPC, TableGen] Fix some Clang-tidy modernize and Include What You Use warnings; other minor fixes (NFC). 2016-12-12 22:23:53 +00:00
R600Packetizer.cpp Fix spelling mistakes in AMDGPU target comments. NFC. 2016-11-18 11:04:02 +00:00
R600RegisterInfo.cpp
R600RegisterInfo.h
R600RegisterInfo.td
R600Schedule.td
R700Instructions.td
SIAnnotateControlFlow.cpp Use StringRef in Pass/PassManager APIs (NFC) 2016-10-01 02:56:57 +00:00
SIDebuggerInsertNops.cpp Use StringRef in Pass/PassManager APIs (NFC) 2016-10-01 02:56:57 +00:00
SIDefines.h AMDGPU: Fix handling of 16-bit immediates 2016-12-10 00:39:12 +00:00
SIFixControlFlowLiveIntervals.cpp Use StringRef in Pass/PassManager APIs (NFC) 2016-10-01 02:56:57 +00:00
SIFixSGPRCopies.cpp AMDGPU/SI: Don't move copies of immediates to the VALU 2016-12-06 21:13:30 +00:00
SIFoldOperands.cpp AMDGPU: Fix handling of 16-bit immediates 2016-12-10 00:39:12 +00:00
SIFrameLowering.cpp AMDGPU: Fix using incorrect private resource with no allocation 2016-10-28 19:43:31 +00:00
SIFrameLowering.h [AMDGPU, PowerPC, TableGen] Fix some Clang-tidy modernize and Include What You Use warnings; other minor fixes (NFC). 2016-12-09 22:06:55 +00:00
SIInsertSkips.cpp AMDGPU: Refactor exp instructions 2016-12-05 20:23:10 +00:00
SIInsertWaits.cpp AMDGPU: Refactor exp instructions 2016-12-05 20:23:10 +00:00
SIInstrFormats.td AMDGPU: Fix vintrp disassembly 2016-12-10 00:29:55 +00:00
SIInstrInfo.cpp AMDGPU: Fix handling of 16-bit immediates 2016-12-10 00:39:12 +00:00
SIInstrInfo.h AMDGPU: Fix asan errors when folding operands 2016-12-10 19:58:00 +00:00
SIInstrInfo.td AMDGPU: Fix handling of 16-bit immediates 2016-12-10 00:39:12 +00:00
SIInstructions.td AMDGPU: Fix handling of 16-bit immediates 2016-12-10 00:39:12 +00:00
SIIntrinsics.td AMDGPU: Refactor exp instructions 2016-12-05 20:23:10 +00:00
SIISelLowering.cpp AMDGPU: Fix isTypeDesirableForOp for i16 2016-12-09 17:57:43 +00:00
SIISelLowering.h AMDGPU: Make f16 ConstantFP legal 2016-12-08 20:14:46 +00:00
SILoadStoreOptimizer.cpp [AMDGPU][CodeGen] To improve CGEMM performance: combine LDS reads. 2016-11-03 14:37:13 +00:00
SILowerControlFlow.cpp [AMDGPU] Allow hoisting of comparisons out of a loop and eliminate condition copies 2016-11-28 18:58:49 +00:00
SILowerI1Copies.cpp [AMDGPU] Allow hoisting of comparisons out of a loop and eliminate condition copies 2016-11-28 18:58:49 +00:00
SIMachineFunctionInfo.cpp AMDGPU/SI: Add support for triples with the mesa3d operating system 2016-09-16 21:34:26 +00:00
SIMachineFunctionInfo.h [AMDGPU] Wave and register controls 2016-09-06 20:22:28 +00:00
SIMachineScheduler.cpp [AMDGPU, PowerPC, TableGen] Fix some Clang-tidy modernize and Include What You Use warnings; other minor fixes (NFC). 2016-12-12 22:23:53 +00:00
SIMachineScheduler.h [AMDGPU, PowerPC, TableGen] Fix some Clang-tidy modernize and Include What You Use warnings; other minor fixes (NFC). 2016-12-09 22:06:55 +00:00
SIOptimizeExecMasking.cpp AMDGPU: Fix use-after-free in SIOptimizeExecMasking 2016-10-07 08:40:14 +00:00
SIRegisterInfo.cpp AMDGPU: Fix handling of 16-bit immediates 2016-12-10 00:39:12 +00:00
SIRegisterInfo.h AMDGPU: Fix handling of 16-bit immediates 2016-12-10 00:39:12 +00:00
SIRegisterInfo.td AMDGPU: Fix handling of 16-bit immediates 2016-12-10 00:39:12 +00:00
SISchedule.td
SIShrinkInstructions.cpp AMDGPU: Fix handling of 16-bit immediates 2016-12-10 00:39:12 +00:00
SITypeRewriter.cpp Use StringRef in Pass/PassManager APIs (NFC) 2016-10-01 02:56:57 +00:00
SIWholeQuadMode.cpp [AMDGPU, PowerPC, TableGen] Fix some Clang-tidy modernize and Include What You Use warnings; other minor fixes (NFC). 2016-12-09 22:06:55 +00:00
SMInstructions.td [AMDGPU] Scalarization of global uniform loads. 2016-12-08 17:28:47 +00:00
SOPInstructions.td AMDGPU : Add S_SETREG instructions to fix fdiv precision issues. 2016-12-07 02:42:15 +00:00
VIInstrFormats.td [AMDGPU] Refactor VOP1 and VOP2 instruction TD definitions 2016-09-23 09:08:07 +00:00
VIInstructions.td AMDGPU: Add VI i16 support 2016-11-10 16:02:37 +00:00
VOP1Instructions.td Check that emitted instructions meet their predicates on all targets except ARM, Mips, and X86. 2016-11-19 13:05:44 +00:00
VOP2Instructions.td AMDGPU: Fix handling of 16-bit immediates 2016-12-10 00:39:12 +00:00
VOP3Instructions.td [AMDGPU] Handle f16 select{_cc} 2016-11-16 03:16:26 +00:00
VOPCInstructions.td [AMDGPU] Add f16 support (VI+) 2016-11-13 07:01:11 +00:00
VOPInstructions.td Fix spelling mistakes in AMDGPU target comments. NFC. 2016-11-18 11:04:02 +00:00