llvm-mirror

mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-10-24 05:23:45 +02:00

Author	SHA1	Message	Date
Matt Arsenault	4452689e07	AMDGPU: Add replacement bfe intrinsics llvm-svn: 295899	2017-02-22 23:04:58 +00:00
Matt Arsenault	a601343521	AMDGPU: Don't add emergency stack slot if all spills are SGPR->VGPR This should avoid reporting any stack needs to be allocated in the case where no stack is truly used. An unused stack slot is still left around in other cases where there are real stack objects but no spilling occurs. llvm-svn: 295891	2017-02-22 22:23:32 +00:00
Matt Arsenault	d3f55eae6c	AMDGPU: Don't look at chain users when adjusting writemask Fixes not adjusting using new intrinsics with chains. llvm-svn: 295878	2017-02-22 21:16:41 +00:00
Matt Arsenault	d47230b13f	AMDGPU: Always allocate emergency stack slot at offset 0 This allows us to ensure that 0 is never a valid pointer to a user object, and ensures that the offset is always legal without needing a register to access it. This comes at the cost of usable offsets and wasted stack space. llvm-svn: 295877	2017-02-22 21:05:25 +00:00
Matt Arsenault	9071e134d4	AMDGPU: Change exp with compr bit printing llvm-svn: 295873	2017-02-22 20:37:12 +00:00
Wei Ding	4029bfd5f8	Revert "AMDGPU : Update TrapCode based on Trap Handler ABI." This reverts commit r295867. llvm-svn: 295871	2017-02-22 20:29:22 +00:00
Wei Ding	2c0b4a37cf	AMDGPU : Update TrapCode based on Trap Handler ABI. Differential Revision: http://reviews.llvm.org/D30232 llvm-svn: 295867	2017-02-22 20:05:06 +00:00
Bill Seurer	2628d17171	[DAGCombiner] revert r295336 r295336 causes a bootstrapped clang to fail for many compilations on powerpc BE. See http://lab.llvm.org:8011/builders/clang-ppc64be-linux-multistage/builds/2315 for example. Reverting as per the developer's request. llvm-svn: 295849	2017-02-22 16:27:33 +00:00
Matt Arsenault	d2e2dba6a0	AMDGPU: Add cvt.pkrtz intrinsic Convert llvm.SI.packf16 test uses llvm-svn: 295797	2017-02-22 00:27:34 +00:00
Matt Arsenault	3320e649a3	AMDGPU: Remove some uses of llvm.SI.export in tests Merge some of the old, smaller tests into more complete versions. llvm-svn: 295792	2017-02-22 00:02:21 +00:00
Matt Arsenault	65d8dccee7	AMDGPU: Remove llvm.AMDGPU.clamp intrinsic llvm-svn: 295789	2017-02-21 23:46:04 +00:00
Matt Arsenault	73b8eb1cc6	AMDGPU: Redefine clamp node as clamp 0.0-1.0 Change implementation to use max instead of add. min/max/med3 do not flush denormals regardless of the mode, so it is OK to use it whether or not they are enabled. Also allow using clamp with f16, and use knowledge of dx10_clamp. llvm-svn: 295788	2017-02-21 23:35:48 +00:00
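The clamp redefinition in the commit above (clamp to 0.0-1.0, built on max rather than add) can be illustrated with a small sketch. This is not the AMDGPU lowering itself: `clamp01` and its `dx10_clamp` parameter are hypothetical names, and the NaN behaviour (NaN becomes 0.0 when dx10_clamp is set, otherwise it passes through) is an assumption about the hardware mode, not something the commit message states.

```python
import math


def clamp01(x: float, dx10_clamp: bool = True) -> float:
    """Clamp x into [0.0, 1.0] using max followed by min.

    Assumption (not from the commit message): with dx10_clamp set,
    a NaN input is clamped to 0.0; without it, NaN is passed through.
    """
    if math.isnan(x):
        return 0.0 if dx10_clamp else x
    # max pulls values below 0.0 up, min pulls values above 1.0 down.
    return min(max(x, 0.0), 1.0)


if __name__ == "__main__":
    for value in (-2.5, 0.25, 3.0, math.nan):
        print(value, "->", clamp01(value))
```

Knowing the NaN behaviour up front is what would let a compiler skip a separate NaN check around the clamp, which is one reading of "use knowledge of dx10_clamp".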
Matt Arsenault	b94f4fd9d0	AMDGPU: Remove dead declarations in tests llvm-svn: 295757	2017-02-21 19:31:33 +00:00
Matt Arsenault	0f8d55acef	AMDGPU: Remove llvm.AMDGPU.flbit intrinsic llvm-svn: 295754	2017-02-21 19:27:33 +00:00
Matt Arsenault	85a1bec778	AMDGPU: Don't use stack space for SGPR->VGPR spills Before frame offsets are calculated, try to eliminate the frame indexes used by SGPR spills. Then we can delete them after. I think for now we can be sure that no other instruction will be re-using the same frame indexes. It should be easy to notice if this assumption ever breaks since everything asserts if it tries to use a dead frame index later. The unused emergency stack slot seems to still be left behind, so an additional 4 bytes is still wasted. llvm-svn: 295753	2017-02-21 19:12:08 +00:00
NAKAMURA Takumi	a8741d2ac3	llvm/test/CodeGen/AMDGPU/r600.alu-limits.ll should require +Asserts. This would run into infinite loop anyways with -Asserts. llvm-svn: 295591	2017-02-19 02:31:06 +00:00
Matt Arsenault	a207e31c14	AMDGPU: Merge initial gfx9 support llvm-svn: 295554	2017-02-18 18:29:53 +00:00
Jan Vesely	ea180c8076	AMDGPU/R600: Assert on infinite loop in EmitClauseMarkers Differential Revision: https://reviews.llvm.org/D29792 llvm-svn: 295539	2017-02-18 04:24:10 +00:00
Justin Bogner	bc00df62ba	Verifier: Disallow a line number without a file in DISubprogram A line number doesn't make much sense if you don't say where it's from. Add a verifier check for this and update some tests that had bogus debug info. llvm-svn: 295516	2017-02-17 23:57:42 +00:00
Matt Arsenault	b210903892	AMDGPU: Fix crashes on invalid icmp/fcmp intrinsics llvm-svn: 295489	2017-02-17 19:49:10 +00:00
Matt Arsenault	240b1c3d6d	AMDGPU: Remove llvm.AMDGPU.cube intrinsic llvm-svn: 295359	2017-02-16 19:09:04 +00:00
Matt Arsenault	6075ccd031	AMDGPU: Remove llvm.AMDGPU.rsq intrinsic llvm-svn: 295358	2017-02-16 19:08:58 +00:00
Artur Pilipenko	17b23dc26c	[DAGCombiner] Support {a|s}ext, {a|z|s}ext load nodes in load combine Resubmit -r295314 with PowerPC and AMDGPU tests updated. Support {a|s}ext, {a|z|s}ext load nodes as a part of load combine patterns. Reviewed By: filcab Differential Revision: https://reviews.llvm.org/D29591 llvm-svn: 295336	2017-02-16 17:07:27 +00:00
Matt Arsenault	97a1843703	AMDGPU: Remove llvm.SI.sendmsg llvm-svn: 295270	2017-02-16 02:01:17 +00:00
Matt Arsenault	daf5e675f7	AMDGPU: Remove SI_fs_constant and SI_fs_interp intrinsics Update test uses with expansion in terms of new intrinsics. llvm-svn: 295269	2017-02-16 02:01:13 +00:00
Matt Arsenault	d0625484c6	AMDGPU: Consolidate sendmsg/sendmsghalt handling and tests llvm-svn: 295244	2017-02-15 22:17:09 +00:00
Kyle Butt	96c1e7e4f0	Codegen: Make chains from trellis-shaped CFGs

Lay out trellis-shaped CFGs optimally. A trellis of the shape below:

      A     B
      |\   /|
      | \ / |
      |  X  |
      | / \ |
      |/   \|
      C     D

would be laid out A; B->C; D by the current layout algorithm. Now we identify trellises and lay them out either A->C; B->D or A->D; B->C. This scales with an increasing number of predecessors. A trellis is a group of 2 or more predecessor blocks that all have the same successors.

Because of this we can tail duplicate to extend existing trellises. As an example consider the following CFG:

        B   D   F   H
       / \ / \ / \ / \
      A---C---E---G---Ret

Where A,C,E,G are all small (currently 2 instructions). The CFG preserving layout is then A,B,C,D,E,F,G,H,Ret. The current code will copy C into B, E into D and G into F and yield the layout A,C,B(C),E,D(E),F(G),G,H,ret.

    define void @straight_test(i32 %tag) {
    entry:
      br label %test1
    test1: ; A
      %tagbit1 = and i32 %tag, 1
      %tagbit1eq0 = icmp eq i32 %tagbit1, 0
      br i1 %tagbit1eq0, label %test2, label %optional1
    optional1: ; B
      call void @a()
      br label %test2
    test2: ; C
      %tagbit2 = and i32 %tag, 2
      %tagbit2eq0 = icmp eq i32 %tagbit2, 0
      br i1 %tagbit2eq0, label %test3, label %optional2
    optional2: ; D
      call void @b()
      br label %test3
    test3: ; E
      %tagbit3 = and i32 %tag, 4
      %tagbit3eq0 = icmp eq i32 %tagbit3, 0
      br i1 %tagbit3eq0, label %test4, label %optional3
    optional3: ; F
      call void @c()
      br label %test4
    test4: ; G
      %tagbit4 = and i32 %tag, 8
      %tagbit4eq0 = icmp eq i32 %tagbit4, 0
      br i1 %tagbit4eq0, label %exit, label %optional4
    optional4: ; H
      call void @d()
      br label %exit
    exit:
      ret void
    }

Here is the layout after D27742:

    straight_test: # @straight_test
    ; ... Prologue elided
    ; BB#0: # %entry ; A (merged with test1)
    ; ... More prologue elided
      mr 30, 3
      andi. 3, 30, 1
      bc 12, 1, .LBB0_2
    ; BB#1: # %test2 ; C
      rlwinm. 3, 30, 0, 30, 30
      beq 0, .LBB0_3
      b .LBB0_4
    .LBB0_2: # %optional1 ; B (copy of C)
      bl a
      nop
      rlwinm. 3, 30, 0, 30, 30
      bne 0, .LBB0_4
    .LBB0_3: # %test3 ; E
      rlwinm. 3, 30, 0, 29, 29
      beq 0, .LBB0_5
      b .LBB0_6
    .LBB0_4: # %optional2 ; D (copy of E)
      bl b
      nop
      rlwinm. 3, 30, 0, 29, 29
      bne 0, .LBB0_6
    .LBB0_5: # %test4 ; G
      rlwinm. 3, 30, 0, 28, 28
      beq 0, .LBB0_8
      b .LBB0_7
    .LBB0_6: # %optional3 ; F (copy of G)
      bl c
      nop
      rlwinm. 3, 30, 0, 28, 28
      beq 0, .LBB0_8
    .LBB0_7: # %optional4 ; H
      bl d
      nop
    .LBB0_8: # %exit ; Ret
      ld 30, 96(1) # 8-byte Folded Reload
      addi 1, 1, 112
      ld 0, 16(1)
      mtlr 0
      blr

The tail-duplication has produced some benefit, but it has also produced a trellis which is not laid out optimally. With this patch, we improve the layouts of such trellises, and decrease the cost calculation for tail-duplication accordingly.

This patch produces the layout A,C,E,G,B,D,F,H,Ret. This layout does have back edges, which is a negative, but it has a bigger compensating positive, which is that it handles the case where there are long strings of skipped blocks much better than the original layout. Both layouts handle runs of executed blocks equally well. Branch prediction also improves if there is any correlation between subsequent optional blocks.

Here is the resulting concrete layout:

    straight_test: # @straight_test
    ; BB#0: # %entry ; A (merged with test1)
      mr 30, 3
      andi. 3, 30, 1
      bc 12, 1, .LBB0_4
    ; BB#1: # %test2 ; C
      rlwinm. 3, 30, 0, 30, 30
      bne 0, .LBB0_5
    .LBB0_2: # %test3 ; E
      rlwinm. 3, 30, 0, 29, 29
      bne 0, .LBB0_6
    .LBB0_3: # %test4 ; G
      rlwinm. 3, 30, 0, 28, 28
      bne 0, .LBB0_7
      b .LBB0_8
    .LBB0_4: # %optional1 ; B (Copy of C)
      bl a
      nop
      rlwinm. 3, 30, 0, 30, 30
      beq 0, .LBB0_2
    .LBB0_5: # %optional2 ; D (Copy of E)
      bl b
      nop
      rlwinm. 3, 30, 0, 29, 29
      beq 0, .LBB0_3
    .LBB0_6: # %optional3 ; F (Copy of G)
      bl c
      nop
      rlwinm. 3, 30, 0, 28, 28
      beq 0, .LBB0_8
    .LBB0_7: # %optional4 ; H
      bl d
      nop
    .LBB0_8: # %exit

Differential Revision: https://reviews.llvm.org/D28522 llvm-svn: 295223	2017-02-15 19:49:14 +00:00
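The trellis definition in the commit above (a group of 2 or more predecessor blocks that all have the same successors) can be sketched as a simple grouping over a successor map. This is only an illustration of the definition, not the block placement code from the patch. The function name `find_trellises`, the dictionary CFG encoding, and the requirement of at least two shared successors are assumptions made for the example.

```python
from collections import defaultdict


def find_trellises(successors):
    """Return (predecessors, shared_successors) pairs that form a trellis.

    `successors` maps a block name to the list of its successor blocks.
    Blocks are grouped by their exact successor set; a group counts as a
    trellis when at least two blocks share the same two-or-more successors.
    """
    groups = defaultdict(list)
    for block, targets in successors.items():
        if len(targets) >= 2:  # assumption: a single shared successor is not a trellis
            groups[frozenset(targets)].append(block)
    return [
        (sorted(preds), sorted(targets))
        for targets, preds in groups.items()
        if len(preds) >= 2
    ]


# The A/B -> C/D trellis from the commit message.
SIMPLE_CFG = {"A": ["C", "D"], "B": ["C", "D"], "C": [], "D": []}

# The straight_test CFG after tail duplication: B carries a copy of C,
# D a copy of E, and F a copy of G, so each pair shares its successors.
TAIL_DUPLICATED_CFG = {
    "A": ["B", "C"],
    "B": ["D", "E"],
    "C": ["D", "E"],
    "D": ["F", "G"],
    "E": ["F", "G"],
    "F": ["H", "Ret"],
    "G": ["H", "Ret"],
    "H": ["Ret"],
    "Ret": [],
}

if __name__ == "__main__":
    print(find_trellises(SIMPLE_CFG))
    print(find_trellises(TAIL_DUPLICATED_CFG))
```

On the tail-duplicated graph the sketch reports three chained trellises, which matches the message's remark that tail duplication can extend existing trellises.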
Stanislav Mekhanoshin	b83595fd3c	[AMDGPU] Revert failed scheduling This patch reverts a region's scheduling to the original untouched state if we have decreased occupancy. In addition, it switches to using the TargetRegisterInfo occupancy callback for pressure limits instead of gradually increasing limits which were just passed by. We are going to stay with the best schedule so we do not need to tolerate worsened scheduling anymore. Differential Revision: https://reviews.llvm.org/D29971 llvm-svn: 295206	2017-02-15 17:19:50 +00:00
Stanislav Mekhanoshin	479d45f82d	[AMDGPU] Fix MaxWorkGroupsPerCU for large workgroups This patch corrects the maximum workgroups per CU if we have big workgroups (more than 128). This calculation contributes to the occupancy calculation in respect to LDS size. Differential Revision: https://reviews.llvm.org/D29974 llvm-svn: 295134	2017-02-15 01:03:59 +00:00
Alexander Timofeev	ede4a3e0f8	Revert "[AMDGPU] Fix for SIMachineScheduler crash. SI Scheduler should track" This reverts commit ce06d9cb99298eb844b66e117f5108a06747c907. llvm-svn: 295054	2017-02-14 14:29:05 +00:00
Wei Ding	3609e1230f	AMDGPU : Add trap handler support. Differential Revision: http://reviews.llvm.org/D26010 llvm-svn: 294692	2017-02-10 02:15:29 +00:00
Stanislav Mekhanoshin	99ec2d0f0b	[AMDGPU] Override PSet for M0 This change returns empty PSet list for M0 register. Otherwise its PSet as defined by tablegen is SReg_32. This results in incorrect register pressure calculation every time an instruction uses M0. Such uses count as SReg_32 PSet and inadequately increase pressure on SGPRs. Differential Revision: https://reviews.llvm.org/D29798 llvm-svn: 294691	2017-02-10 02:07:58 +00:00
Matt Arsenault	478a09d3d1	AMDGPU: Add pass to expand memcpy/memmove/memset llvm-svn: 294635	2017-02-09 22:00:42 +00:00
Konstantin Zhuravlyov	f4d4c79e96	[AMDGPU] Calculate number of min/max SGPRs/VGPRs for WavesPerEU instead of using switch statement Differential Revision: https://reviews.llvm.org/D29741 llvm-svn: 294627	2017-02-09 21:33:23 +00:00
Konstantin Zhuravlyov	12928e55f8	[AMDGPU] Add target information that is required by tools to metadata Differential Revision: https://reviews.llvm.org/D28760#fb670e28 llvm-svn: 294449	2017-02-08 14:05:23 +00:00
Matt Arsenault	15bde0a3d5	AMDGPU: Enable InferAddressSpaces llvm-svn: 294408	2017-02-08 06:16:04 +00:00
Alexander Timofeev	325b448b53	[AMDGPU] Fix for SIMachineScheduler crash. SI Scheduler should track lane masks. Differential revision: https://reviews.llvm.org/D29442 llvm-svn: 294324	2017-02-07 17:57:48 +00:00
Yaxun Liu	1fc4bd34db	[AMDGPU] Lower null pointers in static variable initializer For amdgcn target Clang generates addrspacecast to represent null pointers in private and local address spaces. In LLVM codegen, the static variable initializer is lowered by virtual function AsmPrinter::lowerConstant which is target generic. Since addrspacecast is target specific, AsmPrinter::lowerConst This patch overrides AsmPrinter::lowerConstant with AMDGPUAsmPrinter::lowerConstant, which is able to lower the target-specific addrspacecast in the null pointer representation so that -1 is co Differential Revision: https://reviews.llvm.org/D29284 llvm-svn: 294265	2017-02-07 00:43:21 +00:00
Brendon Cahoon	e92b0e4047	[RegisterCoalescer] Do not call getInstructionIndex with DBG_VALUE An assert occurs when calling SlotIndexes::getInstructionIndex with a DBG_VALUE instruction because the function expects an instruction with a slot index. However, there is no slot index for a DBG_VALUE instruction. Differential Revision: https://reviews.llvm.org/D29048 llvm-svn: 294070	2017-02-04 00:10:22 +00:00
Matt Arsenault	b66d05f0e5	AMDGPU: Cleanup scalar_to_vector test llvm-svn: 294038	2017-02-03 20:49:48 +00:00
Matt Arsenault	8ba2864a71	AMDGPU: Set MCAsmInfo::PointerSize llvm-svn: 294031	2017-02-03 20:02:23 +00:00
Matt Arsenault	b1837f2207	AMDGPU: Fold fneg into fmin/fmax_legacy llvm-svn: 293972	2017-02-03 00:51:50 +00:00
Matt Arsenault	a7a8104cc4	AMDGPU: Fold fneg into fminnum/fmaxnum llvm-svn: 293968	2017-02-03 00:23:15 +00:00
Konstantin Zhuravlyov	2d394fc1ab	llvm-readobj: fix next note entry calculation and print unknown note types Differential Revision: https://reviews.llvm.org/D29131 llvm-svn: 293964	2017-02-02 23:44:49 +00:00
Matt Arsenault	fb406eaa7d	AMDGPU: Check if users of fneg can fold mods In multi-use cases this can save a few instructions. llvm-svn: 293962	2017-02-02 23:21:23 +00:00
Nirav Dave	63300d8c5e	Revert "In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled." This reverts commit r293893 which is miscompiling lua on ARM and bootstrapping for x86-windows. llvm-svn: 293915	2017-02-02 18:24:55 +00:00
Nirav Dave	d4909b474b	In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled. Recommitting after fixing X86 inc/dec chain bug.

* Simplify Consecutive Merge Store Candidate Search

Now that address aliasing is much less conservative, push through simplified store merging search and chain alias analysis which only checks for parallel stores through the chain subgraph. This is cleaner, as it separates non-interfering loads/stores from the store-merging logic.

When merging stores, search up the chain through a single load, and find all possible stores by looking down through a load and a TokenFactor to all stores visited.

This improves the quality of the output SelectionDAG and the output Codegen (save perhaps for some ARM cases where we correctly construct wider loads, but then promote them to float operations which appear but require more expensive constant generation).

Some minor peephole optimizations to deal with improved SubDAG shapes (listed below).

Additional Minor Changes:

1. Finishes removing unused AliasLoad code
2. Unifies the chain aggregation in the merged stores across code paths
3. Re-add the Store node to the worklist after calling SimplifyDemandedBits.
4. Increase GatherAllAliasesMaxDepth from 6 to 18. That number is arbitrary, but seems sufficient to not cause regressions in tests.
5. Remove Chain dependencies of Memory operations on CopyfromReg nodes as these are captured by data dependence
6. Forward loads-store values through tokenfactors containing {CopyToReg,CopyFromReg} Values.
7. Peephole to convert buildvector of extract_vector_elt to extract_subvector if possible (see CodeGen/AArch64/store-merge.ll)
8. Store merging for the ARM target is restricted to 32-bit as in some contexts invalid 64-bit operations are being generated. This can be removed once appropriate checks are added.

This finishes the change Matt Arsenault started in r246307 and jyknight's original patch.

Many tests required some changes as memory operations are now reorderable, improving load-store forwarding. One test in particular is worth noting: CodeGen/PowerPC/ppc64-align-long-double.ll - Improved load-store forwarding converts a load-store pair into a parallel store and a memory-realized bitcast of the same value. However, because we lose the sharing of the explicit and implicit store values we must create another local store. A similar transformation happens before SelectionDAG as well.

Reviewers: arsenm, hfinkel, tstellarAMD, jyknight, nhaehnle llvm-svn: 293893	2017-02-02 14:39:42 +00:00
Matt Arsenault	93d42a5136	AMDGPU: Use source modifiers with f16->f32 conversions The operand types were defined to fit the fp16_to_fp node, which has the half as an integer type. v_cvt_f32_f16 does support source modifiers, so change this to have an FP type and modifiers. For targets without legal f16, this requires recognizing the bit operations and trying to produce them. llvm-svn: 293857	2017-02-02 02:27:04 +00:00
Stanislav Mekhanoshin	19e9bdea6f	[AMDGPU] Account workgroup size in LDS occupancy limits Functions matching LDS use to occupancy return results for a workgroup of 64 workitems. The numbers have to be adjusted for bigger workgroups. For example, a workgroup of size 256 already occupies 4 waves just by itself. Given that all numbers of LDS use in the compiler are per workgroup, occupancy shall be multiplied by 4 in this case. Each 64 workitems are still limited by the same number, but 4 subgroups of 64 workitems each can afford 4 times more LDS to get the same occupancy. In addition, this change initializes LDS size in the subtarget to a real value for SI+ targets. This is required since LDS size is a variable in these calculations. Differential Revision: https://reviews.llvm.org/D29423 llvm-svn: 293837	2017-02-01 22:59:50 +00:00
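The arithmetic in the commit above (a 256-workitem workgroup already occupies 4 waves, so the per-64-workitem occupancy figure is multiplied by 4) can be written out as a short sketch. The names `waves_per_workgroup` and `scale_occupancy` are hypothetical, rounding up for sizes that are not a multiple of 64 is an assumption, and any hardware cap on the number of waves is deliberately not modelled.

```python
WAVE_SIZE = 64  # workitems per wave, as used in the commit message


def waves_per_workgroup(workgroup_size: int) -> int:
    """Number of 64-wide waves a workgroup occupies (assumed to round up)."""
    return -(-workgroup_size // WAVE_SIZE)  # ceiling division


def scale_occupancy(occupancy_for_64: int, workgroup_size: int) -> int:
    """Scale an occupancy computed for a 64-workitem workgroup.

    The LDS figure is per workgroup, so a workgroup made of N waves
    gets N times the occupancy that the 64-workitem table would report.
    """
    return occupancy_for_64 * waves_per_workgroup(workgroup_size)


if __name__ == "__main__":
    # Example from the commit message: size 256 means 4 waves.
    print(waves_per_workgroup(256), scale_occupancy(2, 256))
```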
Matt Arsenault	bec3ec8cca	AMDGPU: Improve nsw/nuw/exact when promoting uniform i16 ops These were simply preserving the flags of the original operation, which was too conservative in most cases and incorrect for mul. nsw/nuw may be needed for some combines to cleanup messes when intermediate sext_inregs are introduced later. Tested valid combinations with alive. llvm-svn: 293776	2017-02-01 16:25:23 +00:00
Kyle Butt	9386601a22	CodeGen: Allow small copyable blocks to "break" the CFG. When choosing the best successor for a block, ordinarily we would have preferred a block that preserves the CFG unless there is a strong probability in the other direction. For small blocks that can be duplicated we now skip that requirement as well, subject to some simple frequency calculations. Differential Revision: https://reviews.llvm.org/D28583 llvm-svn: 293716	2017-01-31 23:48:32 +00:00
Matt Arsenault	18c6a375b8	AMDGPU: Use source mods with fcanonicalize llvm-svn: 293654	2017-01-31 17:28:40 +00:00
Tom Stellard	6191b5bef0	AMDGPU/SI: Fix inst-select-load-smrd.mir on some builds Summary: For some reason instructions are being inserted in the wrong order with some builds. I'm not sure why this is happening. Reviewers: arsenm Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, tony-tye, tpr, llvm-commits Differential Revision: https://reviews.llvm.org/D29325 llvm-svn: 293639	2017-01-31 15:24:11 +00:00
Nicolai Haehnle	ebbe5ba42e	[DAGCombine] require UnsafeFPMath for re-association of addition Summary: The affected transforms all implicitly use associativity of addition, for which we usually require unsafe math to be enabled. The "Aggressive" flag is only meant to convey information about the performance of the fused ops relative to a fmul+fadd sequence. Fixes Bug 31626. Reviewers: spatel, hfinkel, mehdi_amini, arsenm, tstellarAMD Subscribers: jholewinski, nemanjai, wdng, llvm-commits Differential Revision: https://reviews.llvm.org/D28675 llvm-svn: 293635	2017-01-31 14:35:37 +00:00
Matt Arsenault	c230fcbb58	AMDGPU: Generalize matching of v_med3_f32 I think this is safe as long as no inputs are known to ever be nans. Also add an intrinsic for fmed3 to be able to handle all safe math cases. llvm-svn: 293598	2017-01-31 03:07:46 +00:00
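To make the med3 commit above concrete: a median-of-three operation can stand in for a min/max clamp when the bounds are ordered and no input is a NaN. The commit message does not spell out which patterns are matched, so the min(max(x, lo), hi) shape used here is an assumption, and `med3` and `clamp_min_max` are hypothetical helper names.

```python
def med3(a: float, b: float, c: float) -> float:
    """Median of three values (NaN inputs are not handled)."""
    return sorted((a, b, c))[1]


def clamp_min_max(x: float, lo: float, hi: float) -> float:
    """Clamp x to [lo, hi] with max followed by min."""
    return min(max(x, lo), hi)


def agree_on(values, lo: float, hi: float) -> bool:
    """Check that both forms give the same result for every value."""
    assert lo <= hi, "the equivalence relies on ordered bounds"
    return all(clamp_min_max(v, lo, hi) == med3(v, lo, hi) for v in values)


if __name__ == "__main__":
    print(agree_on([-3.0, 0.0, 0.5, 2.0, 10.0], 0.0, 2.0))
```

With a NaN input the two forms are not guaranteed to agree, which is consistent with the message's caveat about inputs never being NaNs.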
Tom Stellard	f2ec17e0e6	Re-commit AMDGPU/GlobalISel: Add support for simple shaders Fix build when global-isel is disabled and fix a warning. Summary: We can select constant/global G_LOAD, global G_STORE, and G_GEP. Reviewers: qcolombet, MatzeB, t.p.northover, ab, arsenm Subscribers: mehdi_amini, vkalintiris, kzhuravl, wdng, nhaehnle, mgorny, yaxunl, tony-tye, modocache, llvm-commits, dberris Differential Revision: https://reviews.llvm.org/D26730 llvm-svn: 293551	2017-01-30 21:56:46 +00:00
Stanislav Mekhanoshin	0a8e20606c	[AMDGPU] Internalize non-kernel symbols Since we have no call support and late linking we can produce code only for used symbols. This saves compilation time, size of the final executable, and size of any intermediate dumps. Run Internalize pass early in the opt pipeline followed by global DCE pass. To enable it RT can pass -amdgpu-internalize-symbols option. Differential Revision: https://reviews.llvm.org/D29214 llvm-svn: 293549	2017-01-30 21:05:18 +00:00
Matt Arsenault	293a680e93	AMDGPU: Undo sub x, c -> add x, -c canonicalization This is worse if the original constant is an inline immediate. This should also be done for 64-bit adds, but requires fixing operand folding bugs first. llvm-svn: 293540	2017-01-30 19:30:24 +00:00
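The reasoning in the commit above (turning `sub x, c` into `add x, -c` is worse when `c` is an inline immediate) can be sketched as a choice between the two forms. The inline integer range of -16..64 is an assumption about the target, not a fact from the commit message, and `is_inline_immediate` and `choose_form` are hypothetical names.

```python
# Assumed range of integer constants that can be encoded inline.
INLINE_INT_MIN, INLINE_INT_MAX = -16, 64


def is_inline_immediate(value: int) -> bool:
    """True if the constant fits the assumed inline-immediate range."""
    return INLINE_INT_MIN <= value <= INLINE_INT_MAX


def choose_form(add_constant: int):
    """Pick an opcode for the canonical `add x, add_constant`.

    If the constant needs a literal but its negation is inline,
    undo the canonicalization and emit `sub x, -add_constant`.
    """
    if not is_inline_immediate(add_constant) and is_inline_immediate(-add_constant):
        return ("sub", -add_constant)
    return ("add", add_constant)


if __name__ == "__main__":
    print(choose_form(-64))  # 64 is inline under the assumed range, -64 is not
    print(choose_form(-5))   # -5 is already inline, keep the add
```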
Matt Arsenault	58721b2662	AMDGPU: Make i32 uaddo/usubo legal llvm-svn: 293514	2017-01-30 18:11:38 +00:00
Matt Arsenault	6748c8f28b	DAG: Fold fneg into compare with constant into the constant fcmp (fneg x), c, pred -> fcmp x, -c, (swap pred) InstCombine already does this. llvm-svn: 293512	2017-01-30 17:57:28 +00:00
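The fold in the commit above, fcmp (fneg x), c, pred -> fcmp x, -c, (swap pred), can be checked numerically. The sketch below models the predicates with Python comparison operators, which is an approximation of the fcmp predicate set, not LLVM's own representation. `cmp_neg_lhs`, `cmp_folded`, and `check_all` are hypothetical names.

```python
import itertools
import math
import operator

PREDICATES = {
    "lt": operator.lt, "le": operator.le,
    "gt": operator.gt, "ge": operator.ge,
    "eq": operator.eq, "ne": operator.ne,
}

# Swapping a predicate mirrors the comparison; eq and ne are symmetric.
SWAPPED = {"lt": "gt", "le": "ge", "gt": "lt", "ge": "le", "eq": "eq", "ne": "ne"}


def cmp_neg_lhs(x: float, c: float, pred: str) -> bool:
    """Original form: compare the negated value against the constant."""
    return PREDICATES[pred](-x, c)


def cmp_folded(x: float, c: float, pred: str) -> bool:
    """Folded form: negate the constant and swap the predicate."""
    return PREDICATES[SWAPPED[pred]](x, -c)


def check_all() -> bool:
    """Compare both forms over a small grid that includes zeros, infinities and NaN."""
    values = [0.0, -0.0, 1.5, -1.5, math.inf, -math.inf, math.nan]
    return all(
        cmp_neg_lhs(x, c, pred) == cmp_folded(x, c, pred)
        for x, c, pred in itertools.product(values, values, PREDICATES)
    )


if __name__ == "__main__":
    print(check_all())
```

The fold is exact because floating-point negation only flips the sign bit, so negating both sides of a comparison mirrors the ordering without rounding.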
Tom Stellard	d839aa304c	Revert "AMDGPU/GlobalISel: Add support for simple shaders" This reverts commit r293503. Revert while I investigate some of the buildbot failures. llvm-svn: 293509	2017-01-30 17:42:41 +00:00
Tom Stellard	ca8f087f31	AMDGPU/GlobalISel: Add support for simple shaders Summary: We can select constant/global G_LOAD, global G_STORE, and G_GEP. Reviewers: qcolombet, MatzeB, t.p.northover, ab, arsenm Subscribers: mehdi_amini, vkalintiris, kzhuravl, wdng, nhaehnle, mgorny, yaxunl, tony-tye, modocache, llvm-commits, dberris Differential Revision: https://reviews.llvm.org/D26730 llvm-svn: 293503	2017-01-30 17:09:15 +00:00
Matt Arsenault	c4ccc9b791	DAG: Constant fold fp16_to_fp/fp16_to_fp This fixes emitting conversions of constants on targets without legal f16 that need to use these for legalization. llvm-svn: 293499	2017-01-30 16:57:41 +00:00
Matt Arsenault	9317a1de75	AMDGPU: Enable FeatureFlatForGlobal on Volcanic Islands Accomplishes what r292982 was supposed to, which ended up only really making the necessary test changes. This should be applied to the 4.0 branch. Patch by Vedran Miletić <vedran@miletic.net> llvm-svn: 293310	2017-01-27 17:42:26 +00:00
Stanislav Mekhanoshin	32e634e989	[AMDGPU] Turn AMDGPUUnifyMetadata back into module pass With the adjustPassManager interface it is now possible to use custom early module passes. Differential Revision: https://reviews.llvm.org/D29189 llvm-svn: 293300	2017-01-27 16:38:10 +00:00
Nirav Dave	2a565d7a4e	Revert "In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled." This reverts commit r293184 which is failing in LTO builds llvm-svn: 293188	2017-01-26 16:46:13 +00:00
Nirav Dave	c7f26fe4ae	In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled.

* Simplify Consecutive Merge Store Candidate Search

Now that address aliasing is much less conservative, push through simplified store merging search and chain alias analysis which only checks for parallel stores through the chain subgraph. This is cleaner, as it separates non-interfering loads/stores from the store-merging logic.

When merging stores, search up the chain through a single load, and find all possible stores by looking down through a load and a TokenFactor to all stores visited.

This improves the quality of the output SelectionDAG and the output Codegen (save perhaps for some ARM cases where we correctly construct wider loads, but then promote them to float operations which appear but require more expensive constant generation).

Some minor peephole optimizations to deal with improved SubDAG shapes (listed below).

Additional Minor Changes:

1. Finishes removing unused AliasLoad code
2. Unifies the chain aggregation in the merged stores across code paths
3. Re-add the Store node to the worklist after calling SimplifyDemandedBits.
4. Increase GatherAllAliasesMaxDepth from 6 to 18. That number is arbitrary, but seems sufficient to not cause regressions in tests.
5. Remove Chain dependencies of Memory operations on CopyfromReg nodes as these are captured by data dependence
6. Forward loads-store values through tokenfactors containing {CopyToReg,CopyFromReg} Values.
7. Peephole to convert buildvector of extract_vector_elt to extract_subvector if possible (see CodeGen/AArch64/store-merge.ll)
8. Store merging for the ARM target is restricted to 32-bit as in some contexts invalid 64-bit operations are being generated. This can be removed once appropriate checks are added.

This finishes the change Matt Arsenault started in r246307 and jyknight's original patch.

Many tests required some changes as memory operations are now reorderable, improving load-store forwarding. One test in particular is worth noting: CodeGen/PowerPC/ppc64-align-long-double.ll - Improved load-store forwarding converts a load-store pair into a parallel store and a memory-realized bitcast of the same value. However, because we lose the sharing of the explicit and implicit store values we must create another local store. A similar transformation happens before SelectionDAG as well.

Reviewers: arsenm, hfinkel, tstellarAMD, jyknight, nhaehnle llvm-svn: 293184	2017-01-26 16:02:24 +00:00
Valery Pykhtin	ab0bb6ac0c	[AMDGPU] Fix typo in GCNSchedStrategy Differential revision: https://reviews.llvm.org/D28980 llvm-svn: 293171	2017-01-26 10:51:47 +00:00
Matt Arsenault	a3987fa2ed	AMDGPU: Fold fneg into round instructions llvm-svn: 293127	2017-01-26 01:25:36 +00:00
Matt Arsenault	6b7ba8493b	AMDGPU: Set call_convention bit in kernel_code_t According to the documentation this is supposed to be -1 if indirect calls are not supported. llvm-svn: 293081	2017-01-25 20:21:57 +00:00
Matt Arsenault	9f4074601e	AMDGPU: Check nsz instead of unsafe math llvm-svn: 293028	2017-01-25 06:27:02 +00:00
Matt Arsenault	093401c700	DAG: Recognize no-signed-zeros-fp-math attribute clang already emits this with -cl-no-signed-zeros, but codegen doesn't do anything with it. Treat it like the other fast math attributes, and change one place to use it. llvm-svn: 293024	2017-01-25 06:08:42 +00:00
Matt Arsenault	b570b62964	DAGCombiner: Allow negating ConstantFP after legalize llvm-svn: 293019	2017-01-25 04:54:34 +00:00
Matt Arsenault	ec49368879	AMDGPU: Implement early ifcvt target hooks. Leave early ifcvt disabled for now since there are some shader-db regressions. This causes some immediate improvements, but could be better. The cost checking that the pass does is based on critical path length for out of order CPUs which we do not want so it skips out on many cases we want. llvm-svn: 293016	2017-01-25 04:25:02 +00:00
Matt Arsenault	8060661ae3	AMDGPU: Remove spurious out branches after a kill A sequence like this: v_cmpx_le_f32_e32 vcc, 0, v0 s_branch BB0_30 s_cbranch_execnz BB0_30 ; BB#29: exp null off, off, off, off done vm s_endpgm BB0_30: ; %endif110 is likely wrong. The s_branch instruction will unconditionally jump to BB0_30 and the skip block (exp done + endpgm) inserted for performing the kill instruction will never be executed. This results in a GPU hang with Star Ruler 2. The s_branch instruction is added during the "Control Flow Optimizer" pass which seems to re-organize the basic blocks, and we assume that SI_KILL_TERMINATOR is always the last instruction inside a basic block. Thus, after inserting a skip block we just go to the next BB without looking at the subsequent instructions after the kill, and the s_branch op is never removed. Instead, we should remove the unconditional out branches and skip the two instructions if the exec mask is non-zero. This patch fixes the GPU hang and doesn't introduce any regressions with "make check". Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=99019 Patch by Samuel Pitoiset <samuel.pitoiset@gmail.com> llvm-svn: 292985	2017-01-24 22:18:39 +00:00
Matt Arsenault	81a9bfe915	Enable FeatureFlatForGlobal on Volcanic Islands This switches to the workaround that HSA defaults to for the mesa path. This should be applied to the 4.0 branch. Patch by Vedran Miletić <vedran@miletic.net> llvm-svn: 292982	2017-01-24 22:02:15 +00:00
Changpeng Fang	69ca91bb54	AMDGPU/SI: Give up in promote alloca when a pointer may be captured. Differential Revision: http://reviews.llvm.org/D28970 Reviewer: Matt llvm-svn: 292966	2017-01-24 19:06:28 +00:00
Stanislav Mekhanoshin	324d0de803	[AMDGPU] Add VGPR copies post regalloc fix pass Regalloc creates COPY instructions which do not formally use VALU. That results in v_mov instructions displaced after exec mask modification. One pass which does this is SIOptimizeExecMasking, but potentially it can be done by other passes too. This patch adds a pass immediately after regalloc to add an implicit exec use operand to all VGPR copy instructions. Differential Revision: https://reviews.llvm.org/D28874 llvm-svn: 292956	2017-01-24 17:46:17 +00:00
Wei Ding	e779f33d36	AMDGPU : Add trap handler support. llvm-svn: 292893	2017-01-24 06:41:21 +00:00
Matt Arsenault	1595f1e4ce	AMDGPU: Custom lower more vector operations This avoids stack usage. llvm-svn: 292846	2017-01-23 23:09:58 +00:00
Matt Arsenault	b7e8aad4f5	DAG: Don't fold vector extract into load if target doesn't want to Fixes turning a 32-bit scalar load into an extending vector load for AMDGPU when dynamically indexing a vector. llvm-svn: 292842	2017-01-23 22:48:53 +00:00
Matt Arsenault	bd33194651	AMDGPU: Combine fp16/fp64 subtarget features The same control register controls both, and are set to the same defaults. Keep the old names around as aliases. llvm-svn: 292837	2017-01-23 22:31:03 +00:00
Matt Arsenault	bb6aab2eaf	DAG: Allow legalization of fcanonicalize vector types llvm-svn: 292814	2017-01-23 18:52:26 +00:00
Benjamin Kramer	91e0605f7f	Fix some broken CHECK lines. The colon is important. llvm-svn: 292761	2017-01-22 20:28:56 +00:00
Jan Vesely	dc9e7b7339	AMDGPU/R600: Serialize vector trunc stores to private AS Add DUMMY_CHAIN SDNode to denote stores of interest Bugzilla: https://llvm.org/bugs/show_bug.cgi?id=28915 Bugzilla: https://llvm.org/bugs/show_bug.cgi?id=30411 Differential Revision: https://reviews.llvm.org/D27964 llvm-svn: 292651	2017-01-20 21:24:26 +00:00
Greg Parker	8a5271f313	[test] Remove an unwanted match for `XFAIL:`. llvm-svn: 292567	2017-01-20 02:01:04 +00:00
Stanislav Mekhanoshin	e89f0cb8a8	[AMDGPU] Prevent spills before exec mask is restored Inline spiller can decide to move a spill as early as possible in the basic block. It will skip phis and labels, but we also need to make sure it skips instructions in the basic block prologue which restore exec mask. Added isPositionLike callback in TargetInstrInfo to detect instructions which shall be skipped in addition to common phis, labels etc. Differential Revision: https://reviews.llvm.org/D27997 llvm-svn: 292554	2017-01-20 00:44:31 +00:00
Matt Arsenault	ff7ad6a7f3	AMDGPU: Disable some fneg combines unless nsz For -(x + y) -> (-x) + (-y), if x == -y, this would change the result from -0.0 to 0.0. Since the fma/fmad combine is an extension of this problem it also applies there. fmul should be fine, and I don't think any of the unary operators or conversions should be a problem either. llvm-svn: 292473	2017-01-19 06:35:27 +00:00
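The signed-zero hazard described in the commit above, -(x + y) -> (-x) + (-y) with x == -y, can be reproduced with ordinary IEEE doubles. This is a sketch assuming the default round-to-nearest mode, and the helper names are hypothetical.

```python
import math


def neg_of_sum(x: float, y: float) -> float:
    """Original expression: negate the sum."""
    return -(x + y)


def sum_of_negs(x: float, y: float) -> float:
    """Rewritten expression: add the negated operands."""
    return (-x) + (-y)


def sign_of(value: float) -> float:
    """Return 1.0 or -1.0 according to the sign bit, so zeros can be told apart."""
    return math.copysign(1.0, value)


if __name__ == "__main__":
    x, y = 1.0, -1.0  # x == -y, so the sum is an exact zero
    print(neg_of_sum(x, y), sum_of_negs(x, y))
```

The two results compare equal with `==`, yet their sign bits differ, which is why the combine is only safe when signed zeros can be ignored (nsz).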
Matt Arsenault	a73166cb45	AMDGPU: Remove modifiers from v_div_scale_* They seem to produce nonsense results when used. This should be applied to the release branch. llvm-svn: 292472	2017-01-19 06:04:12 +00:00
Stanislav Mekhanoshin	3a97f30b01	[AMDGPU] Do not allow register coalescer to create big superregs Limit register coalescer by not allowing it to artificially increase size of registers beyond dword. Such super-registers are in fact register sequences and not distinct HW registers. With more super-regs we would need to allocate adjacent registers and constrain regalloc more than needed. Moreover, our super registers are overlapping. For instance we have VGPR0_VGPR1_VGPR2, VGPR1_VGPR2_VGPR3, VGPR2_VGPR3_VGPR4 etc, which complicates register allocation even more, resulting in excessive spilling. Differential Revision: https://reviews.llvm.org/D28782 llvm-svn: 292413	2017-01-18 17:30:05 +00:00
Matt Arsenault	15def6a25a	DAG: Consider nnan in isKnownNeverNaN llvm-svn: 292328	2017-01-18 02:10:08 +00:00
Matt Arsenault	fddb6ed60b	AMDGPU: Add replacement export intrinsics llvm-svn: 292205	2017-01-17 07:26:53 +00:00
Jan Vesely	5574abaf58	AMDGPU/EG,CM: Implement _noret global atomics _RTN versions will be a lot more complicated Differential Revision: https://reviews.llvm.org/D28067 llvm-svn: 292162	2017-01-16 21:20:13 +00:00
Konstantin Zhuravlyov	fb2ba32d19	[AMDGPU] Implement f16 fcopysign and fcopysign(f32, f64) Differential Revision: https://reviews.llvm.org/D28496 llvm-svn: 291954	2017-01-13 19:49:25 +00:00
Matt Arsenault	e493fa036b	AMDGPU: Skip fneg/select combine if it can fold into other llvm-svn: 291792	2017-01-12 18:58:15 +00:00
Matt Arsenault	9901525f48	AMDGPU: Fold free fneg into sin llvm-svn: 291790	2017-01-12 18:48:09 +00:00
Matt Arsenault	c660bfe147	AMDGPU: Fold fneg into fmul_legacy llvm-svn: 291784	2017-01-12 18:26:30 +00:00
Matt Arsenault	29ff5d7995	AMDGPU: Fold fneg into rcp llvm-svn: 291779	2017-01-12 17:46:35 +00:00
Matt Arsenault	a1e740ea09	AMDGPU: Fold fneg into fp_round llvm-svn: 291778	2017-01-12 17:46:33 +00:00
Matt Arsenault	e4f0519d94	AMDGPU: Fold fneg into fp_extend llvm-svn: 291777	2017-01-12 17:46:28 +00:00
Matt Arsenault	595d567731	AMDGPU: Fold fneg into fma or fmad Patch mostly by Fiona Glaser llvm-svn: 291733	2017-01-12 00:32:16 +00:00
Matt Arsenault	9b4e6e8e41	AMDGPU: Fold fneg into fmul Patch mostly by Fiona Glaser llvm-svn: 291732	2017-01-12 00:23:20 +00:00
Matt Arsenault	e25ed6e79f	AMDGPU: Fold fneg into fadd Patch mostly by Fiona Glaser llvm-svn: 291731	2017-01-12 00:09:34 +00:00
Matt Arsenault	36b1afa3ef	AMDGPU: Pull fneg/fabs out of a select Allows better source modifier usage. llvm-svn: 291729	2017-01-11 23:57:38 +00:00
Matt Arsenault	fc7a8bb178	AMDGPU: Fix shrinking of addc/subb. To shrink to VOP2 the input carry must also be VCC. llvm-svn: 291720	2017-01-11 22:58:12 +00:00
Matt Arsenault	8582abf465	AMDGPU: Fix sext_inreg for i1 in i16 This produces worse code when i16 is legal, mostly due to combines getting confused by conversions inserted for uniform 16-bit operations. llvm-svn: 291717	2017-01-11 22:35:22 +00:00
Matt Arsenault	e4f8f3be16	AMDGPU: Fix breaking VOP3 v_add_i32s This was shrinking the instruction even though the carry output register was a virtual register, not known VCC. llvm-svn: 291716	2017-01-11 22:35:17 +00:00
Matt Arsenault	f781954dd4	AMDGPU: Fix folding immediates into mac src2 Whether it is legal or not needs to check for the instruction it will be replaced with. llvm-svn: 291711	2017-01-11 22:00:02 +00:00
Kyle Butt	6035a9f12a	Revert "CodeGen: Allow small copyable blocks to "break" the CFG." This reverts commit ada6595a526d71df04988eb0a4b4fe84df398ded. This needs a simple probability check because there are some cases where it is not profitable. llvm-svn: 291695	2017-01-11 19:55:19 +00:00
Matt Arsenault	2c00794e15	DAGCombiner: Add hasOneUse checks to fadd/fma combine Even with aggressive fusion enabled, this requires duplicating the fmul, or increases an fadd to another fma which is not an improvement. llvm-svn: 291642	2017-01-11 02:02:12 +00:00
Jan Vesely	87ecb0edfb	AMDGPU/EG,CM: Add fp16 conversion instructions Differential Revision: https://reviews.llvm.org/D28164 llvm-svn: 291622	2017-01-11 00:12:39 +00:00
Matt Arsenault	6b917afcf9	AMDGPU: Constant fold when immediate is materialized In future commits these patterns will appear after moveToVALU changes. llvm-svn: 291615	2017-01-10 23:32:04 +00:00
Kyle Butt	af32417840	CodeGen: Allow small copyable blocks to "break" the CFG. When choosing the best successor for a block, ordinarily we would have preferred a block that preserves the CFG unless there is a strong probability in the other direction. For small blocks that can be duplicated we now skip that requirement as well. Differential revision: https://reviews.llvm.org/D27742 llvm-svn: 291609	2017-01-10 23:04:30 +00:00
Matt Arsenault	f633d550ca	DAG: Avoid OOB when legalizing vector indexing If a vector index is out of bounds, the result is supposed to be undefined but is not undefined behavior. Change the legalization for indexing the vector on the stack so that an out of bounds index does not create an out of bounds memory access. llvm-svn: 291604	2017-01-10 22:02:30 +00:00
Matt Arsenault	9cbcb1f0dc	AMDGPU: Add tests for HasMultipleConditionRegisters This was enabled without many specific tests or the comment. llvm-svn: 291586	2017-01-10 19:08:15 +00:00
Matt Arsenault	4872e0b42c	AMDGPU: Add Assert[SZ]Ext during argument load creation For i16 zeroext arguments when i16 was a legal type, the known bits information from the truncate was lost. Insert a zeroext so the known bits optimizations work with the 32-bit loads. Fixes code quality regressions vs. SI in min.ll test. llvm-svn: 291461	2017-01-09 18:52:39 +00:00
Bjorn Pettersson	b5889ab1a8	[SelectionDAG] Fix in legalization of UMAX/SMAX/UMIN/SMIN. Solves PR31486. Summary: Originally i64 = umax t8, Constant:i64<4> was expanded into i32,i32 = umax Constant:i32<0>, Constant:i32<0> i32,i32 = umax t7, Constant:i32<4> Now instead the two produced umax:es return i32 instead of i32, i32. Thanks to Jan Vesely for help with the test case. Patch by mikael.holmen at ericsson.com Reviewers: bogner, jvesely, tstellarAMD, arsenm Subscribers: test, wdng, RKSimon, arsenm, nhaehnle, llvm-commits Differential Revision: https://reviews.llvm.org/D28135 llvm-svn: 291441	2017-01-09 12:03:50 +00:00
Jan Vesely	523782f6c1	AMDGPU/R600: Don't use REGISTER_{LOAD,STORE} ISD nodes This will make transition to SCRATCH_MEMORY easier Differential Revision: https://reviews.llvm.org/D24746 llvm-svn: 291279	2017-01-06 21:00:46 +00:00
Konstantin Zhuravlyov	260cc2dc04	[AMDGPU] Do not emit .AMDGPU.config section for amdhsa Differential Revision: https://reviews.llvm.org/D27732 llvm-svn: 291245	2017-01-06 17:02:10 +00:00
Jan Vesely	25eec05837	AMDGPU/SI: Implement sendmsghalt intrinsic v2: expose using amdgcn prefix Differential Revision: https://reviews.llvm.org/D23511 llvm-svn: 290977	2017-01-04 18:06:55 +00:00
Matt Arsenault	1edd642a1d	AMDGPU: Invert cmp + select with constant Canonicalize a select with a constant to the false side. This enables more instruction shrinking opportunities since an inline immediate can be used for the false side of v_cndmask_b32_e32. This seems to usually be better but causes some code size regressions in some tests. llvm-svn: 290372	2016-12-22 21:40:08 +00:00
Matt Arsenault	66ebaecd36	AMDGPU: Use i16 for i16 shift amount llvm-svn: 290351	2016-12-22 16:36:25 +00:00
Matt Arsenault	9419a1ea69	AMDGPU: Use i16 comparison instructions llvm-svn: 290348	2016-12-22 16:27:11 +00:00
Matt Arsenault	ee5d8d2da0	AMDGPU: Swap order of operands in fadd/fsub combine FMA is canonicalized to constant in the middle operand. Do the same so fmad matches and avoid an extra combine step. llvm-svn: 290313	2016-12-22 04:03:40 +00:00
Matt Arsenault	9d4a891569	AMDGPU: Check fast math flags in fadd/fsub combines llvm-svn: 290312	2016-12-22 04:03:35 +00:00
Matt Arsenault	dd6b858bbf	AMDGPU: Form more FMAs if fusion is allowed Extend the existing fadd/fsub->fmad combines to produce FMA if allowed. llvm-svn: 290311	2016-12-22 03:55:35 +00:00
Matt Arsenault	5ba9667c15	AMDGPU: Enable some f32 fadd/fsub combines for f16 llvm-svn: 290308	2016-12-22 03:40:39 +00:00
Matt Arsenault	d7ec3d5ba4	AMDGPU: Implement isFMAFasterThanFMulAndFAdd for f16 llvm-svn: 290307	2016-12-22 03:21:48 +00:00
Matt Arsenault	808557202a	AMDGPU: setcc test cleanup llvm-svn: 290306	2016-12-22 03:21:45 +00:00
Matt Arsenault	5ecf306700	AMDGPU: Allow rcp and rsq usage with f16 llvm-svn: 290302	2016-12-22 03:05:44 +00:00
Matt Arsenault	a844bf67ff	AMDGPU: Custom lower f16 fdiv llvm-svn: 290301	2016-12-22 03:05:41 +00:00
Matt Arsenault	263e20ee06	AMDGPU: Implement f16 fcanonicalize llvm-svn: 290300	2016-12-22 03:05:37 +00:00
Matt Arsenault	ea409cc20d	AMDGPU: Allow 16-bit types in inline asm constraints llvm-svn: 290193	2016-12-20 19:06:12 +00:00
Matt Arsenault	084fcad6fd	AMDGPU: Run fp combine tests on VI llvm-svn: 290192	2016-12-20 18:55:11 +00:00
Matt Arsenault	63d92e4ebb	AMDGPU: Don't add same instruction multiple times to worklist When the instruction is processed the first time, it may be deleted resulting in crashes. While the new test adds the same user to the worklist twice, this particular case doesn't crash but I'm not sure why. llvm-svn: 290191	2016-12-20 18:55:06 +00:00
Tom Stellard	2c0dd4ec69	AMDGPU/SI: Add a MachineMemOperand when lowering llvm.amdgcn.buffer.load.* Reviewers: arsenm, nhaehnle, mareko Subscribers: kzhuravl, wdng, yaxunl, llvm-commits, tony-tye Differential Revision: https://reviews.llvm.org/D27834 llvm-svn: 290184	2016-12-20 17:19:44 +00:00
Tom Stellard	ec757acb99	AMDGPU/SI: Add a MachineMemOperand to MIMG instructions Summary: Without a MachineMemOperand, the scheduler was assuming MIMG instructions were ordered memory references, so no loads or stores could be reordered across them. Reviewers: arsenm Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, tony-tye Differential Revision: https://reviews.llvm.org/D27536 llvm-svn: 290179	2016-12-20 15:52:17 +00:00
Konstantin Zhuravlyov	3febbc8b1e	[AMDGPU] When unifying metadata, add operands to named metadata individually Differential Revision: https://reviews.llvm.org/D27725 llvm-svn: 290114	2016-12-19 16:54:24 +00:00
Matt Arsenault	805345dac1	AMDGPU: Fix broken check prefix in test llvm-svn: 290050	2016-12-17 20:03:59 +00:00
Matt Arsenault	b1034a224d	AMDGPU: Select branch on undef to uniform scc branch llvm-svn: 289877	2016-12-15 21:57:11 +00:00
Matt Arsenault	feb1ec1fb2	AMDGPU: Fix asserting on returned tail calls llvm-svn: 289868	2016-12-15 20:50:12 +00:00
Alexander Timofeev	aa7ea574e9	Fix for regression after Global Load Scalarization patch llvm-svn: 289822	2016-12-15 15:17:19 +00:00
Justin Lebar	50eaf54428	[AMDGPU] Fix runtime-metadata.ll test so it doesn't leave an object file in the source tree. llvm-svn: 289742	2016-12-14 23:24:43 +00:00
Yaxun Liu	98de4b3c84	AMDGPU: Emit runtime metadata version 2 as YAML Differential Revision: https://reviews.llvm.org/D25046 llvm-svn: 289674	2016-12-14 17:16:52 +00:00
Nirav Dave	9fd3ae9cf9	Revert "In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled." Reverting due to ARM MCJIT and MIPS LLD error. This reverts commit r289659. llvm-svn: 289667	2016-12-14 16:43:44 +00:00
Matt Arsenault	c74ace61e1	AMDGPU: Change vintrp printing llvm-svn: 289664	2016-12-14 16:36:12 +00:00
Nirav Dave	afe2eccae3	In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled. Retrying after fixing after removing load-store factoring through token factors in favor of improved token factor operand pruning Simplify Consecutive Merge Store Candidate Search Now that address aliasing is much less conservative, push through simplified store merging search which only checks for parallel stores through the chain subgraph. This is cleaner as the separation of non-interfering loads/stores from the store-merging logic. When merging stores, search up the chain through a single load, and find all possible stores by looking down through a load and a TokenFactor to all stores visited. This improves the quality of the output SelectionDAG and generally the output CodeGen (with some exceptions). Additional Minor Changes: 1. Finishes removing unused AliasLoad code 2. Unifies the chain aggregation in the merged stores across code paths 3. Re-add the Store node to the worklist after calling SimplifyDemandedBits. 4. Increase GatherAllAliasesMaxDepth from 6 to 18. That number is arbitrary, but seemed sufficient to not cause regressions in tests. This finishes the change Matt Arsenault started in r246307 and jyknight's original patch. Many tests required some changes as memory operations are now reorderable. Some tests relying on the order were changed to use volatile memory operations Noteworthy tests: CodeGen/AArch64/argument-blocks.ll - It's not entirely clear what the test_varargs_stackalign test is supposed to be asserting, but the new code looks right. CodeGen/AArch64/arm64-memset-inline.lli - CodeGen/AArch64/arm64-stur.ll - CodeGen/ARM/memset-inline.ll - The backend now generates worse code due to store merging succeeding, as we do do a 16-byte constant-zero store efficiently. CodeGen/AArch64/merge-store.ll - Improved, but there still seems to be an extraneous vector insert from an element to itself? 
CodeGen/PowerPC/ppc64-align-long-double.ll - Worse code emitted in this case, due to the improved store->load forwarding. CodeGen/X86/dag-merge-fast-accesses.ll - CodeGen/X86/MergeConsecutiveStores.ll - CodeGen/X86/stores-merging.ll - CodeGen/Mips/load-store-left-right.ll - Restored correct merging of non-aligned stores CodeGen/AMDGPU/promote-alloca-stored-pointer-value.ll - Improved. Correctly merges buffer_store_dword calls CodeGen/AMDGPU/si-triv-disjoint-mem-access.ll - Improved. Sidesteps loading a stored value and merges two stores CodeGen/X86/pr18023.ll - This test has been removed, as it was asserting incorrect behavior. Non-volatile stores CAN be moved past volatile loads, and now are. CodeGen/X86/vector-idiv.ll - CodeGen/X86/vector-lzcnt-128.ll - It's basically impossible to tell what these tests are actually testing. But, looks like the code got better due to the memory operations being recognized as non-aliasing. CodeGen/X86/win32-eh.ll - Both loads of the securitycookie are now merged. Reviewers: arsenm, hfinkel, tstellarAMD, jyknight, nhaehnle Subscribers: wdng, nhaehnle, nemanjai, arsenm, weimingz, niravd, RKSimon, aemerson, qcolombet, dsanders, resistor, tstellarAMD, t.p.northover, spatel Differential Revision: https://reviews.llvm.org/D14834 llvm-svn: 289659	2016-12-14 15:44:26 +00:00
Sanjoy Das	a0e8011216	[Verifier] Add verification for TBAA metadata Summary: This change adds some verification in the IR verifier around struct path TBAA metadata. Other than some basic sanity checks (e.g. we get constant integers where we expect constant integers), this checks: - That by the time a struct access tuple `(base-type, offset)` is "reduced" to a scalar base type, the offset is `0`. For instance, in C++ you can't start from, say `("struct-a", 16)`, and end up with `("int", 4)` -- by the time the base type is `"int"`, the offset better be zero. In particular, a variant of this invariant is needed for `llvm::getMostGenericTBAA` to be correct. - That there are no cycles in a struct path. - That struct type nodes have their offsets listed in an ascending order. - That when generating the struct access path, you eventually reach the access type listed in the tbaa tag node. Reviewers: dexonsmith, chandlerc, reames, mehdi_amini, manmanren Subscribers: mcrosier, llvm-commits Differential Revision: https://reviews.llvm.org/D26438 llvm-svn: 289402	2016-12-11 20:07:15 +00:00
Matt Arsenault	78957bd5fc	AMDGPU: Fix AMDGPUPromoteAlloca breaking addrspacecasts The users of the addrspacecast were having their types incorrectly changed, producing invalid bitcasts between address spaces. llvm-svn: 289307	2016-12-10 00:52:50 +00:00
Matt Arsenault	c2c2a10170	AMDGPU: Fix handling of 16-bit immediates Since 32-bit instructions with 32-bit input immediate behavior are used to materialize 16-bit constants in 32-bit registers for 16-bit instructions, determining the legality based on the size is incorrect. Change operands to have the size specified in the type. Also adds a workaround for a disassembler bug that produces an immediate MCOperand for an operand that is supposed to be OPERAND_REGISTER. The assembler appears to accept out of bounds immediates and truncates them, but this seems to be an issue for 32-bit already. llvm-svn: 289306	2016-12-10 00:39:12 +00:00
Matt Arsenault	61a1b18506	AMDGPU: Change vintrp printing to better match sc Some of the immediates need to be printed differently eventually. llvm-svn: 289291	2016-12-10 00:23:12 +00:00
Matt Arsenault	358c1d6f7e	AMDGPU: Cleanup checks in sext_inreg test llvm-svn: 289272	2016-12-09 21:10:41 +00:00
Marek Olsak	191936bbf2	AMDGPU/SI: Don't reserve XNACK when it's disabled Summary: This frees 2 additional scalar registers. These are results from all of my 3 patches combined: Polaris: Spilled SGPRs: 2231 -> 1517 (-32.00 %) Tonga: Spilled SGPRs: 3829 -> 2608 (-31.89 %) Spilled VGPRs: 100 -> 84 (-16.00 %) Tonga even spills SGPRs via VGPRs to scratch. That's a compute shader limited to 64 VGPRs. Reviewers: tstellarAMD Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, tony-tye Differential Revision: https://reviews.llvm.org/D27151 llvm-svn: 289262	2016-12-09 19:49:54 +00:00
Marek Olsak	bb1829874b	AMDGPU/SI: Don't reserve FLAT_SCR on non-HSA targets & without stack objects Summary: This frees 2 scalar registers. Reviewers: tstellarAMD Subscribers: qcolombet, arsenm, kzhuravl, wdng, nhaehnle, yaxunl, tony-tye Differential Revision: https://reviews.llvm.org/D27150 llvm-svn: 289261	2016-12-09 19:49:48 +00:00
Marek Olsak	ed98e90e56	AMDGPU/SI: Allow using SGPRs 96-101 on VI Summary: There is no point in setting SGPRS=104, because VI allocates SGPRs in multiples of 16, so 104 -> 112. That enables us to use all 102 SGPRs for general purposes. Reviewers: tstellarAMD Subscribers: qcolombet, arsenm, kzhuravl, wdng, nhaehnle, yaxunl, tony-tye Differential Revision: https://reviews.llvm.org/D27149 llvm-svn: 289260	2016-12-09 19:49:40 +00:00
Matthias Braun	12c61bab00	Move .mir tests to appropriate directories test/CodeGen/MIR should contain tests that intend to test the MIR printing or parsing. Tests that test something else should be in test/CodeGen/TargetName even when they are written in .mir. As a rule of thumb, only tests using "llc -run-pass none" should be in test/CodeGen/MIR. llvm-svn: 289254	2016-12-09 19:08:15 +00:00
Matt Arsenault	eb15d80d21	AMDGPU: Fix i128 mul llvm-svn: 289231	2016-12-09 17:49:14 +00:00
Nirav Dave	0b475dec77	Revert "In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled." This reverts commit r289221 which appears to be triggering an assertion llvm-svn: 289226	2016-12-09 17:18:24 +00:00
Nirav Dave	075ae0197d	In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled. Retrying after fixing overly aggressive load-store forwarding optimization. Simplify Consecutive Merge Store Candidate Search Now that address aliasing is much less conservative, push through simplified store merging search which only checks for parallel stores through the chain subgraph. This is cleaner as the separation of non-interfering loads/stores from the store-merging logic. When merging stores, search up the chain through a single load, and find all possible stores by looking down through a load and a TokenFactor to all stores visited. This improves the quality of the output SelectionDAG and generally the output CodeGen (with some exceptions). Additional Minor Changes: 1. Finishes removing unused AliasLoad code 2. Unifies the chain aggregation in the merged stores across code paths 3. Re-add the Store node to the worklist after calling SimplifyDemandedBits. 4. Increase GatherAllAliasesMaxDepth from 6 to 18. That number is arbitrary, but seemed sufficient to not cause regressions in tests. This finishes the change Matt Arsenault started in r246307 and jyknight's original patch. Many tests required some changes as memory operations are now reorderable. Some tests relying on the order were changed to use volatile memory operations Noteworthy tests: CodeGen/AArch64/argument-blocks.ll - It's not entirely clear what the test_varargs_stackalign test is supposed to be asserting, but the new code looks right. CodeGen/AArch64/arm64-memset-inline.lli - CodeGen/AArch64/arm64-stur.ll - CodeGen/ARM/memset-inline.ll - The backend now generates worse code due to store merging succeeding, as we do do a 16-byte constant-zero store efficiently. CodeGen/AArch64/merge-store.ll - Improved, but there still seems to be an extraneous vector insert from an element to itself? 
CodeGen/PowerPC/ppc64-align-long-double.ll - Worse code emitted in this case, due to the improved store->load forwarding. CodeGen/X86/dag-merge-fast-accesses.ll - CodeGen/X86/MergeConsecutiveStores.ll - CodeGen/X86/stores-merging.ll - CodeGen/Mips/load-store-left-right.ll - Restored correct merging of non-aligned stores CodeGen/AMDGPU/promote-alloca-stored-pointer-value.ll - Improved. Correctly merges buffer_store_dword calls CodeGen/AMDGPU/si-triv-disjoint-mem-access.ll - Improved. Sidesteps loading a stored value and merges two stores CodeGen/X86/pr18023.ll - This test has been removed, as it was asserting incorrect behavior. Non-volatile stores CAN be moved past volatile loads, and now are. CodeGen/X86/vector-idiv.ll - CodeGen/X86/vector-lzcnt-128.ll - It's basically impossible to tell what these tests are actually testing. But, looks like the code got better due to the memory operations being recognized as non-aliasing. CodeGen/X86/win32-eh.ll - Both loads of the securitycookie are now merged. Reviewers: arsenm, hfinkel, tstellarAMD, jyknight, nhaehnle Subscribers: wdng, nhaehnle, nemanjai, arsenm, weimingz, niravd, RKSimon, aemerson, qcolombet, dsanders, resistor, tstellarAMD, t.p.northover, spatel Differential Revision: https://reviews.llvm.org/D14834 llvm-svn: 289221	2016-12-09 16:15:12 +00:00
Tom Stellard	5569b3eb62	AMDGPU/SI: Don't mark VINTRP instructions as mayLoad Summary: These instructions technically do read from memory, but the memory is considered to be out of bounds for normal load/store instructions. shader-db stats: SGPRS: 1416075 -> 1413323 (-0.19 %) VGPRS: 867413 -> 863935 (-0.40 %) Spilled SGPRs: 1409 -> 1354 (-3.90 %) Spilled VGPRs: 63 -> 63 (0.00 %) Private memory VGPRs: 880 -> 880 (0.00 %) Scratch size: 2648 -> 2632 (-0.60 %) dwords per thread Code Size: 37889052 -> 37897340 (0.02 %) bytes LDS: 2147 -> 2147 (0.00 %) blocks Max Waves: 279243 -> 280369 (0.40 %) Wait states: 0 -> 0 (0.00 %) Reviewers: nhaehnle, mareko, arsenm Subscribers: kzhuravl, wdng, yaxunl, tony-tye Differential Revision: https://reviews.llvm.org/D27593 llvm-svn: 289219	2016-12-09 15:57:15 +00:00
Matt Arsenault	7729bf075d	AMDGPU: Select i16 instructions to VOP3 forms These were selecting directly to the VOP2 form instead of VOP3 like the i32 instructions. Fixes regressions in future commits where an immediate isn't folded because it was initially used for the second operand. Because uniform 16-bit operations are promoted to i32, it's difficult to get a simple testcase where this matters. Fold failures in SIFoldOperands here tend to be hidden by commute and fold in SIShrinkInstructions. llvm-svn: 289189	2016-12-09 06:19:12 +00:00
Matt Arsenault	2c46312910	AMDGPU: Make f16 ConstantFP legal Not having this legal led to combine failures, resulting in dumb things like bitcasts of constants not being folded away. The only reason I'm leaving the v_mov_b32 hack that f32 already uses is to avoid madak formation test regressions. PeepholeOptimizer has an ordering issue where the immediate fold attempt is into the sgpr->vgpr copy instead of the actual use. Running it twice avoids that problem. llvm-svn: 289096	2016-12-08 20:14:46 +00:00
Matt Arsenault	c57e18e32f	AMDGPU: Fix commuting v_sub_u16 The correct commutable opcode was set to itself, so this was simply swapping the operands to commute instead of also changing the opcode to v_subrev_u16. llvm-svn: 289093	2016-12-08 19:52:38 +00:00
Stanislav Mekhanoshin	71ed6f04a8	[AMDGPU] Add amdgpu-unify-metadata pass Multiple metadata values for records such as opencl.ocl.version, llvm.ident and similar are created after linking several modules. For some of them, notably opencl.ocl.version, this creates a semantic problem because we cannot tell which version of OpenCL the composite module conforms to. Moreover, such repetitions of identical values often create a huge list of unneeded metadata, which grows bitcode size both in memory and stored on disk. It can go up to several Mb when linked against our OpenCL library. Lastly, such long lists obscure reading of dumped IR. The pass unifies metadata after linking. Differential Revision: https://reviews.llvm.org/D25381 llvm-svn: 289092	2016-12-08 19:46:04 +00:00
Alexander Timofeev	3a9e77fc0f	[AMDGPU] Scalarization of global uniform loads. Summary: LC can currently select scalar load for uniform memory access based on readonly memory address space only. This restriction originated from the fact that in HW prior to VI vector and scalar caches are not coherent. With MemoryDependenceAnalysis we can check that the memory location corresponding to the memory operand of the LOAD is not clobbered along all paths from the function entry. Reviewers: rampitec, tstellarAMD, arsenm Subscribers: wdng, arsenm, nhaehnle Differential Revision: https://reviews.llvm.org/D26917 llvm-svn: 289076	2016-12-08 17:28:47 +00:00
Nicolai Haehnle	2a19d502fb	AMDGPU: Properly implement SIRegisterInfo::isFrameOffsetLegal and needsFrameBaseReg Summary: Without the fix to isFrameOffsetLegal to consider the instruction's immediate offset, the new test case hits the corresponding assertion in resolveFrameIndex, because the LocalStackSlotAllocation pass re-uses a different base register. With only the fix to isFrameOffsetLegal, code quality reduces in a bunch of places because frame base registers are added where they're not needed. This is addressed by properly implementing needsFrameBaseReg, which also helps to avoid unnecessary zero frame indices in a bunch of other places. Fixes piglit glsl-1.50/execution/variable-indexing/gs-output-array-vec4-index-wr.shader_test Reviewers: arsenm, tstellarAMD Subscribers: qcolombet, kzhuravl, wdng, yaxunl, tony-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D27344 llvm-svn: 289048	2016-12-08 14:08:02 +00:00
Tom Stellard	42afa7429f	AMDGPU : Add S_SETREG instructions to fix fdiv precision issues. Patch By: Wei Ding Summary: This patch fixes the fdiv precision issues. Reviewers: b-sumner, cfang, wdng, arsenm Subscribers: kzhuravl, nhaehnle, yaxunl, tony-tye Differential Revision: https://reviews.llvm.org/D26424 llvm-svn: 288879	2016-12-07 02:42:15 +00:00
Tom Stellard	4f6c2a6b37	AMDGPU: Add llvm.amdgcn.interp.mov intrinsic Reviewers: arsenm, nhaehnle Subscribers: kzhuravl, wdng, yaxunl, tony-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D26725 llvm-svn: 288865	2016-12-06 23:52:13 +00:00
Matt Arsenault	432b06cd5e	AMDGPU: Fix crash on i16 constant expression llvm-svn: 288861	2016-12-06 23:18:06 +00:00
Tom Stellard	a983d5f77b	AMDGPU/SI: Set correct value for amd_kernel_code_t::kernarg_segment_alignment Reviewers: arsenm Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, llvm-commits, tony-tye Differential Revision: https://reviews.llvm.org/D27416 llvm-svn: 288852	2016-12-06 21:53:10 +00:00
Tom Stellard	3c0e23d86d	AMDGPU/SI: Don't move copies of immediates to the VALU Summary: If we write an immediate to a VGPR and then copy the VGPR to an SGPR, we can replace the copy with a S_MOV_B32 sgpr, imm, rather than moving the copy to the SALU. Reviewers: arsenm Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, llvm-commits, tony-tye Differential Revision: https://reviews.llvm.org/D27272 llvm-svn: 288849	2016-12-06 21:13:30 +00:00
Matt Arsenault	03686f0b49	AMDGPU: Don't require structured CFG The structured CFG is just an aid to inserting exec mask modification instructions, once that is done we don't really need it anymore. We also do not analyze blocks with terminators that modify exec, so this should only be impacting true branches. llvm-svn: 288744	2016-12-06 01:02:51 +00:00
Matt Arsenault	20c475682e	AMDGPU: Change how exp is printed This is an improvement over a long list of unreadable numbers. A follow up patch will try to match how sc formats these. llvm-svn: 288697	2016-12-05 20:31:49 +00:00
Matt Arsenault	fdf7e5830b	AMDGPU: Refactor exp instructions Structure the definitions a bit more like the other classes. The main change here is to split EXP with the done bit set to a separate opcode, so we can set mayLoad = 1 so that it won't be reordered before the other exp stores, since this has the special constraint that if the done bit is set then this should be the last exp in the shader. Previously all exp instructions were inferred to have unmodeled side effects. llvm-svn: 288695	2016-12-05 20:23:10 +00:00
Nicolai Haehnle	65cddb3988	[DAGCombiner] do not fold (fmul (fadd X, 1), Y) -> (fmad X, Y, Y) by default Summary: When X = 0 and Y = inf, the original code produces inf, but the transformed code produces nan. So this transform (and its relatives) should only be used when the no-infs-fp-math flag is explicitly enabled. Also disable the transform using fmad (intermediate rounding) when unsafe-math is not enabled, since it can reduce the precision of the result; consider this example with binary floating point numbers with two bits of mantissa: x = 1.01 y = 111 x * (y + 1) = 1.01 * 1000 = 1010 (this is the exact result; no rounding occurs at any step) x * y + x = 1000.11 + 1.01 =r 1000 + 1.01 = 1001.01 =r 1000 (with rounding towards zero) The example relies on rounding towards zero at least in the second step. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=98578 Reviewers: RKSimon, tstellarAMD, spatel, arsenm Subscribers: wdng, llvm-commits Differential Revision: https://reviews.llvm.org/D26602 llvm-svn: 288506	2016-12-02 16:06:18 +00:00
Matt Arsenault	f24fcd4ad7	AMDGPU: Use wider scalar spills for SGPR spilling Since the spill is for the whole wave, these don't have the swizzling problems that vector stores do and a single 4-byte allocation is enough to spill a 64 element register. This should reduce the number of spill instructions and put all the spills for a register in the same cacheline. This should save allocated private size, but for now it doesn't. The extra slots are allocated for each component, but never used because the frame layout is essentially finalized before frame indices are replaced. For always using the scalar store path, this should probably be moved into processFunctionBeforeFrameFinalized. llvm-svn: 288445	2016-12-02 00:54:45 +00:00
Matthias Braun	d9b9212e23	RegisterCoalescer: Only coalesce complete reserved registers. The coalescer eliminates copies from reserved registers of the form: %vregX = COPY %rY in the case where %rY is a reserved register. However this turns out to be invalid if only some of the subregisters are reserved (see also https://reviews.llvm.org/D26648). Differential Revision: https://reviews.llvm.org/D26687 llvm-svn: 288428	2016-12-01 22:39:51 +00:00
Matt Arsenault	ba1685158e	AMDGPU: Move mir tests into mir test directory llvm-svn: 288262	2016-11-30 18:50:26 +00:00
Matt Arsenault	30967b5c23	AMDGPU: Disallow exec as SMEM instruction operand This is not in the list of valid inputs for the encoding. When spilling, copies from exec can be folded directly into the spill instruction which results in broken stores. This only fixes the operand constraints, more codegen work is required to avoid emitting the invalid spills. This sort of breaks the dbg.value test. Because the register class of the s_load_dwordx2 changes, there is a copy to SReg_64, and the copy is the operand of dbg_value. The copy is later dead, and removed from the dbg_value. llvm-svn: 288191	2016-11-29 19:39:53 +00:00
Matt Arsenault	3c5076a42d	AMDGPU: Materialize frame index before add It isn't generally safe to fold the frame index directly into the operand since it will possibly not be an inline immediate after it is expanded. This surprisingly seems to produce better code, since the FI doesn't prevent folding other immediate operands. llvm-svn: 288185	2016-11-29 19:20:48 +00:00
Tom Stellard	4f879c1dc0	AMDGPU/SI: Avoid moving PHIs to VALU when phi values are defined in scalar branches Reviewers: arsenm Subscribers: arsenm, llvm-commits, kzhuravl Differential Revision: https://reviews.llvm.org/D23417 llvm-svn: 288095	2016-11-29 00:46:46 +00:00
Stanislav Mekhanoshin	92eac85076	[AMDGPU] Allow hoisting of comparisons out of a loop and eliminate condition copies Codegen prepare sinks comparisons close to a user if we have only one register for conditions. For AMDGPU we have many SGPRs capable of holding vector conditions. Changed BE to report we have many condition registers. That way IR LICM pass would hoist an invariant comparison out of a loop and codegen prepare will not sink it. With that done a condition is calculated in one block and used in another. Current behavior is to store workitem's condition in a VGPR using v_cndmask_b32 and then restore it with yet another v_cmp instruction from that v_cndmask's result. To mitigate the issue a propagation of source SGPR pair in place of v_cmp is implemented. Additional side effect of this is that we may consume fewer VGPRs at a cost of more SGPRs if multiple conditions need to be held, and that is a clear win in most cases. Differential Revision: https://reviews.llvm.org/D26114 llvm-svn: 288053	2016-11-28 18:58:49 +00:00
Tom Stellard	f3e7f685e9	AMDGPU/SI: Use float as the operand type for amdgcn.interp intrinsics Reviewers: arsenm, nhaehnle Subscribers: kzhuravl, wdng, yaxunl, llvm-commits, tony-tye Differential Revision: https://reviews.llvm.org/D26724 llvm-svn: 287962	2016-11-26 02:26:04 +00:00
Marek Olsak	30b976334f	AMDGPU/SI: Add back reverted SGPR spilling code, but disable it suggested as a better solution by Matt llvm-svn: 287942	2016-11-25 17:37:09 +00:00
Marek Olsak	9d8f0b805a	Revert "AMDGPU: Implement SGPR spilling with scalar stores" This reverts commit 4404d0d6e354e80dd7f8f0a0e12d8ad809cf007e. llvm-svn: 287936	2016-11-25 16:03:34 +00:00
Marek Olsak	35ac58863e	Revert "AMDGPU: Fix MMO when splitting spill" This reverts commit 79d4f8b8b1ce430c3d5dac4fc72a9eebaed24fe1. llvm-svn: 287935	2016-11-25 16:03:27 +00:00
Marek Olsak	c530d56272	Revert "AMDGPU: Make m0 unallocatable" This reverts commit 124ad83dae04514f943902446520c859adee0e96. llvm-svn: 287932	2016-11-25 16:03:15 +00:00
Marek Olsak	55154098e4	Revert "AMDGPU: Preserve m0 value when spilling" This reverts commit a5a179ffd94fd4136df461ec76fb30f04afa87ce. llvm-svn: 287930	2016-11-25 16:03:02 +00:00
Matt Arsenault	4a06c5b78a	AMDGPU: Preserve m0 value when spilling llvm-svn: 287844	2016-11-24 00:26:50 +00:00
Matt Arsenault	9a257a9a17	AMDGPU: Make m0 unallocatable m0 may need to be written for spill code, so we don't want general code uses relying on the value stored in it. This introduces a few code quality regressions where copies from m0 are not coalesced into copies of a copy of m0. llvm-svn: 287841	2016-11-24 00:26:40 +00:00
Matt Arsenault	dae776c6dc	AMDGPU: Fix MMO when splitting spill The size and offset were wrong. The size of the object was being used for the size of the access, when here it is really being split into 4-byte accesses. The underlying object size is set in the MachinePointerInfo, which also didn't have the offset set. llvm-svn: 287806	2016-11-23 20:52:53 +00:00
Stanislav Mekhanoshin	e6e37f3c8a	[AMDGPU] Fix multiple vreg definitions in si-lower-control-flow Differential Revision: https://reviews.llvm.org/D26939 llvm-svn: 287608	2016-11-22 01:42:34 +00:00
Matt Arsenault	768b88b4bd	DAG: Ignore call site attributes when emitting target intrinsic A target intrinsic may be defined as possibly reading memory, but the call site may have additional knowledge that it doesn't read memory. The intrinsic lowering will expect the pessimistic assumption of the intrinsic definition, so the chain should still be used. llvm-svn: 287593	2016-11-21 22:56:42 +00:00
Konstantin Zhuravlyov	aaff08fa3a	[AMDGPU] Change frexp.exp intrinsic to return i16 for f16 input Differential Revision: https://reviews.llvm.org/D26862 llvm-svn: 287389	2016-11-18 22:31:08 +00:00
Tom Stellard	a3b8754644	AMDGPU/SI: Remove zero_extend patterns for i16 ops selected to 32-bit insts Summary: The 32-bit instructions don't zero the high 16-bits like the 16-bit instructions do. Reviewers: arsenm Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, llvm-commits, tony-tye Differential Revision: https://reviews.llvm.org/D26828 llvm-svn: 287342	2016-11-18 13:53:34 +00:00
Nicolai Haehnle	49cca90a3f	AMDGPU: Fix legalization of MUBUF instructions in shaders Summary: The addr64-based legalization is incorrect for MUBUF instructions with idxen set as well as for BUFFER_LOAD/STORE_FORMAT_* instructions. This affects e.g. shaders that access buffer textures. Since we never actually need the addr64-legalization in shaders, this patch takes the easy route and keys off the calling convention. If this ever affects (non-OpenGL) compute, the type of legalization needs to be chosen based on some TSFlag. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=98664 Reviewers: arsenm, tstellarAMD Subscribers: kzhuravl, wdng, yaxunl, tony-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D26747 llvm-svn: 287339	2016-11-18 11:55:52 +00:00
Matt Arsenault	0fe623be4f	AMDGPU: Fix crash on illegal type for inlineasm There are still crashes on non-MVT types in other places. llvm-svn: 287310	2016-11-18 04:42:57 +00:00
Konstantin Zhuravlyov	66cc77bb5b	Revert "AMDGPU: Enable ConstrainCopy DAG mutation" This reverts commit r287146. This breaks a few conformance tests. llvm-svn: 287233	2016-11-17 16:41:49 +00:00
Konstantin Zhuravlyov	0a87e18c32	[AMDGPU] Add missing test for rL287203 llvm-svn: 287204	2016-11-17 04:33:20 +00:00
Konstantin Zhuravlyov	950d9c18e0	[AMDGPU] Promote f16/i16 conversions to f32/i32 llvm-svn: 287201	2016-11-17 04:00:46 +00:00

1017 Commits