llvm-mirror

mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-10-20 19:42:54 +02:00

Author	SHA1	Message	Date
Craig Topper	bff032a8e3	[X86] Split VRNDSCALE/VREDUCE/VGETMANT/VRANGE ISD nodes into versions with and without the rounding operand. NFCI I want to reuse the VRNDSCALE node for the legacy SSE rounding intrinsics so that those intrinsics can use EVEX instructions. All of these nodes share tablegen multiclasses so I split them all so that they all remain similar in their implementations. llvm-svn: 318007	2017-11-13 02:02:58 +00:00
Matt Arsenault	eaea634604	AMDGPU: Select d16 loads into low component of register llvm-svn: 318005	2017-11-13 00:22:09 +00:00
Craig Topper	45210b7de6	[X86] Add an X86ISD::RANGES opcode to use for the scalar intrinsics. This fixes a bug where we selected packed instructions for scalar intrinsics. llvm-svn: 317999	2017-11-12 18:51:09 +00:00
Craig Topper	bc3b5e9708	[X86] Remove some no longer needed intrinsic lowering code. llvm-svn: 317997	2017-11-12 18:51:06 +00:00
Mandeep Singh Grang	32947f7b72	[llvm] Remove redundant return [NFC] Reviewers: davidxl, olista01, Eugene.Zelenko Reviewed By: Eugene.Zelenko Subscribers: sdardis, javed.absar, llvm-commits Differential Revision: https://reviews.llvm.org/D39917 llvm-svn: 317995	2017-11-12 03:47:50 +00:00
Craig Topper	d9e3d1df1d	[X86] Use vrndscaleps/pd for 128/256 ffloor/ftrunc/fceil/fnearbyint/frint when avx512vl is enabled. This matches what we do for scalar and 512-bit types. llvm-svn: 317991	2017-11-11 21:44:51 +00:00
Simon Pilgrim	ca9ad6eb90	[X86] Attempt to match multiple binary reduction ops at once. NFCI matchBinOpReduction currently matches against a single opcode, but we already have a case where we repeat calls to try to match against AND/OR and I'll be shortly adding another case for SMAX/SMIN/UMAX/UMIN (D39729). This NFCI patch alters matchBinOpReduction to try and pattern match against any of the provided list of candidate bin ops at once to save time. Differential Revision: https://reviews.llvm.org/D39726 llvm-svn: 317985	2017-11-11 18:16:55 +00:00
Craig Topper	1561471af3	[X86] Add scalar register class versions of VRNDSCALE instructions and rename the existing versions to _Int. This is consistent with out normal implementation of scalar instructions. While there disable load folding for the patterns with IMPLICIT_DEF unless optimizing for size which is also our standard practice. llvm-svn: 317977	2017-11-11 08:24:15 +00:00
Craig Topper	b00f777f2d	[X86] Inline some SDNode operand multiclass operands that don't vary. NFC llvm-svn: 317975	2017-11-11 08:24:12 +00:00
Craig Topper	1828639ac4	[X86] Set the execution domain for VFPCLASS to SSEPackedSingle/Double. llvm-svn: 317974	2017-11-11 06:57:44 +00:00
Craig Topper	afb6899bc5	[X86] Set the execution domain for vptest instruction to the integer domain. llvm-svn: 317973	2017-11-11 06:19:12 +00:00
Craig Topper	eea0adbe11	[X86] Correct the execution domain on ROUND/VROUND instructions. llvm-svn: 317968	2017-11-11 02:26:05 +00:00
Craig Topper	ce5298b157	[X86] Remove the default for one of the arguments to some tablegen multiclasses. NFC No one ever uses this default and probably shouldn't since it sets the execution domain to generic. llvm-svn: 317967	2017-11-11 02:26:02 +00:00
Krzysztof Parzyszek	1a64d3b827	Recommit r317904: [Hexagon] Create HexagonISelDAGToDAG.h, NFC The Windows builder did not reconstruct the HexagonGenDAGISel.inc file after the TableGen binary has changed. llvm-svn: 317921	2017-11-10 20:09:46 +00:00
Konstantin Zhuravlyov	17f057eda8	AMDGPU/NFC: Split Processors.td into GCNProcessors.td and R600Processors.td Differential Revision: https://reviews.llvm.org/D39880 llvm-svn: 317920	2017-11-10 20:01:58 +00:00
Krzysztof Parzyszek	66b30dd685	Revert "[Hexagon] Create HexagonISelDAGToDAG.h, NFC" This reverts r317904: broke Windows build. llvm-svn: 317916	2017-11-10 19:27:18 +00:00
Craig Topper	1ab15f68c8	[X86] Merge the template method selectAddrOfGatherScatterNode into selectVectorAddr. NFCI Just need to initialize a couple variables differently based on the node type. No need for a whole separate template method. llvm-svn: 317915	2017-11-10 19:26:04 +00:00
Mandeep Singh Grang	f387fa649a	[RISCV] Silence an unused variable warning in release builds [NFC] Summary: Also minor cleanups: 1. Avoided multiple calls to Fixup.getKind() 2. Avoided multiple calls to getFixupKindInfo() 3. Removed a redundant return. Reviewers: asb, apazos Reviewed By: asb Subscribers: rbar, johnrusso, llvm-commits Differential Revision: https://reviews.llvm.org/D39881 llvm-svn: 317908	2017-11-10 19:09:28 +00:00
Krzysztof Parzyszek	fd31de271f	[Hexagon] Create HexagonISelDAGToDAG.h, NFC llvm-svn: 317904	2017-11-10 18:39:45 +00:00
Jonas Paulsson	1b514619dd	[RegAlloc, SystemZ] Increase number of LOCRs by passing "hard" regalloc hints. * The method getRegAllocationHints() is now of bool type instead of void. If true is returned, regalloc (AllocationOrder) will only try to allocate the hints, as opposed to merely trying them before non-hinted registers. * TargetRegisterInfo::getRegAllocationHints() is implemented for SystemZ with an increase in number of LOCRs. In this case, it is desired to force the hints even though there is a slight increase in spilling, because if a non-hinted register would be allocated, the LOCRMux pseudo would have to be expanded with a jump sequence. The LOCR (Load On Condition) SystemZ instruction must have both operands in either the low or high part of the 64 bit register. Reviewers: Quentin Colombet and Ulrich Weigand https://reviews.llvm.org/D36795 llvm-svn: 317879	2017-11-10 08:46:26 +00:00
Craig Topper	de1d072d04	[X86] Add support for combining FMADDSUB(A, B, FNEG(C))->FMSUBADD(A, B, C) Support the opposite direction as well. Also add a TODO for not being able to combine FMSUB/FNMADD/FNMSUB with FNEG. llvm-svn: 317878	2017-11-10 08:22:37 +00:00
Yaxun Liu	f2debeba3c	[AMDGPU] Fix pointer info for lowering load/store for r600 for amdgiz environment r600 uses dummy pointer info for lowering load/store. Since dummy pointer info assumes address space 0, this causes isel failure when temporary load/store SDNodes are generated for amdgiz environment. Since the offest is not constant, FixedStack pseudo source value cannot be used to create the pointer info. This patch creates pointer info using llvm undef value. At least this provides correct address space so that isel can be done correctly. Differential Revision: https://reviews.llvm.org/D39698 llvm-svn: 317862	2017-11-10 02:03:28 +00:00
Yaxun Liu	a483fa88c1	[AMDGPU] Fix pointer info for pseudo source for r600 The pointer info for pseudo source for r600 is not correct when alloca addr space is not 0, which causes invalid SDNode for r600---amdgiz. This patch fixes that. Differential Revision: https://reviews.llvm.org/D39670 llvm-svn: 317861	2017-11-10 01:53:24 +00:00
Ulrich Weigand	c3b7f3ba61	[SystemZ] Add support for the "o" inline asm constraint We don't really need any special handling of "offsettable" memory addresses, but since some existing code uses inline asm statements with the "o" constraint, add support for this constraint for compatibility purposes. llvm-svn: 317807	2017-11-09 16:31:57 +00:00
Simon Dardis	49cc1d1c05	[mips] Correct microMIP's jump and add unconditional branch pseudo Correct the definition of 'j' as being unavailable for microMIPS32R6 and provide the 'b' assembly idiom for codegen purposes for microMIPS32r3. Provide the necessary 'br' pattern for microMIPS32R6 as it now longer incorrectly uses the 'j' instruction. Reviewers: atanasyan Differential Revision: https://reviews.llvm.org/D39741 llvm-svn: 317801	2017-11-09 16:02:18 +00:00
Alex Bradbury	3b5341d83a	[RISCV] MC layer support for the standard RV32A instruction set extension llvm-svn: 317791	2017-11-09 15:00:03 +00:00
Alex Bradbury	284dc625a5	[RISCV] MC layer support for the standard RV32M instruction set extension llvm-svn: 317788	2017-11-09 14:46:30 +00:00
Andrew V. Tischenko	c2204901e8	Sched model improving on btver2: JFPU01 resource, vtestp* for xmm. Differential Revision: https://reviews.llvm.org/D39802 llvm-svn: 317785	2017-11-09 14:19:59 +00:00
Andrew V. Tischenko	1297e99623	Add -print-schedule scheduling comments to inline asm. Differential Revision: https://reviews.llvm.org/D39728 llvm-svn: 317782	2017-11-09 12:45:40 +00:00
Craig Topper	4f63d7a01e	[X86] Give priority to EVEX FMA instructions over FMA4 instructions. No existing processor has both so it doesn't really matter what we do here. But we were previously just relying on pattern order which gave FMA4 priority. llvm-svn: 317775	2017-11-09 08:26:26 +00:00
Vitaly Buka	c723f0c225	Fix "default label in switch which covers all enumeration values" warning llvm-svn: 317771	2017-11-09 07:46:13 +00:00
Craig Topper	6753f35627	[X86] Make X86ISD::FMADDS3 isel patterns commutable. This was missed when FMADDS3 was split from X86ISD::FMADDS3_RND. llvm-svn: 317769	2017-11-09 06:17:05 +00:00
Marek Olsak	72631b54b3	AMDGPU: Merge BUFFER_STORE_DWORD_OFFEN/OFFSET into x2, x4 Summary: Only 56 shaders (out of 48486) are affected. Totals from affected shaders (changed stats only): SGPRS: 2420 -> 2460 (1.65 %) Spilled VGPRs: 94 -> 112 (19.15 %) Scratch size: 524 -> 528 (0.76 %) dwords per thread Code Size: 187400 -> 184992 (-1.28 %) bytes One DiRT Showdown shader spills 6 more VGPRs. One Grid Autosport shader spills 12 more VGPRs. The other 54 shaders only have a decrease in code size. (I'm ignoring the SGPR noise) Reviewers: arsenm, nhaehnle Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, llvm-commits, t-tye Differential Revision: https://reviews.llvm.org/D39012 llvm-svn: 317755	2017-11-09 01:52:55 +00:00
Marek Olsak	3170d60e41	AMDGPU: Lower buffer store and atomic intrinsics manually Summary: Without this, SIMemoryLegalizer inserts s_waitcnt vmcnt(0) before every buffer store and atomic instruction. Reviewers: arsenm, nhaehnle Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, llvm-commits, t-tye Differential Revision: https://reviews.llvm.org/D39060 llvm-svn: 317754	2017-11-09 01:52:48 +00:00
Marek Olsak	fd39065f1d	AMDGPU: Merge BUFFER_LOAD_DWORD_OFFSET into x2, x4 Summary: Only 3 (out of 48486) shaders are affected. Reviewers: arsenm, nhaehnle Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D38951 llvm-svn: 317753	2017-11-09 01:52:36 +00:00
Marek Olsak	2bb24b05c7	AMDGPU: Merge BUFFER_LOAD_DWORD_OFFEN into x2, x4 Summary: -9.9% code size decrease in affected shaders. Totals (changed stats only): SGPRS: 2151462 -> 2170646 (0.89 %) VGPRS: 1634612 -> 1640288 (0.35 %) Spilled SGPRs: 8942 -> 8940 (-0.02 %) Code Size: 52940672 -> 51727288 (-2.29 %) bytes Max Waves: 373066 -> 371718 (-0.36 %) Totals from affected shaders: SGPRS: 283520 -> 302704 (6.77 %) VGPRS: 227632 -> 233308 (2.49 %) Spilled SGPRs: 3966 -> 3964 (-0.05 %) Code Size: 12203080 -> 10989696 (-9.94 %) bytes Max Waves: 44070 -> 42722 (-3.06 %) Reviewers: arsenm, nhaehnle Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, llvm-commits, t-tye Differential Revision: https://reviews.llvm.org/D38950 llvm-svn: 317752	2017-11-09 01:52:30 +00:00
Marek Olsak	fcc32542ea	AMDGPU: Merge S_BUFFER_LOAD_DWORD_IMM into x2, x4 Summary: Only constant offsets (*_IMM opcodes) are merged. It reuses code for LDS load/store merging. It relies on the scheduler to group loads. The results are mixed, I think they are mostly positive. Most shaders are affected, so here are total stats only: SGPRS: 2072198 -> 2151462 (3.83 %) VGPRS: 1628024 -> 1634612 (0.40 %) Spilled SGPRs: 7883 -> 8942 (13.43 %) Spilled VGPRs: 97 -> 101 (4.12 %) Scratch size: 1488 -> 1492 (0.27 %) dwords per thread Code Size: 60222620 -> 52940672 (-12.09 %) bytes Max Waves: 374337 -> 373066 (-0.34 %) There is 13.4% increase in SGPR spilling, DiRT Showdown spills a few more VGPRs (now 37), but 12% decrease in code size. These are the new stats for SGPR spilling. We already spill a lot SGPRs, so it's uncertain whether more spilling will make any difference since SGPRs are always spilled to VGPRs: SGPR SPILLING APPS Shaders SpillSGPR AvgPerSh alien_isolation 2938 100 0.0 batman_arkham_origins 589 6 0.0 bioshock-infinite 1769 4 0.0 borderlands2 3968 22 0.0 counter_strike_glob.. 1142 60 0.1 deus_ex_mankind_div.. 1410 79 0.1 dirt-showdown 533 4 0.0 dirt_rally 364 1163 3.2 divinity 1052 2 0.0 dota2 1747 7 0.0 f1-2015 776 1515 2.0 grid_autosport 1767 1505 0.9 hitman 1413 273 0.2 left_4_dead_2 1762 4 0.0 life_is_strange 1296 26 0.0 mad_max 358 96 0.3 metro_2033_redux 2670 60 0.0 payday2 1362 22 0.0 portal 474 3 0.0 saints_row_iv 1704 8 0.0 serious_sam_3_bfe 392 1348 3.4 shadow_of_mordor 1418 12 0.0 shadow_warrior 3956 239 0.1 talos_principle 324 1735 5.4 thea 172 17 0.1 tomb_raider 1449 215 0.1 total_war_warhammer 242 56 0.2 ue4_effects_cave 295 55 0.2 ue4_elemental 572 12 0.0 unigine_tropics 210 56 0.3 unigine_valley 278 152 0.5 victor_vran 1262 84 0.1 yofrankie 82 2 0.0 Reviewers: arsenm, nhaehnle Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, llvm-commits, t-tye Differential Revision: https://reviews.llvm.org/D38949 llvm-svn: 317751	2017-11-09 01:52:23 +00:00
Marek Olsak	438fa3e3f9	AMDGPU: Fold immediate offset into BUFFER_LOAD_DWORD lowered from SMEM Summary: -5.3% code size in affected shaders. Changed stats only: 48486 shaders in 30489 tests Totals: SGPRS: 2086406 -> 2072430 (-0.67 %) VGPRS: 1626872 -> 1627960 (0.07 %) Spilled SGPRs: 7865 -> 7912 (0.60 %) Code Size: 60978060 -> 60188764 (-1.29 %) bytes Max Waves: 374530 -> 374342 (-0.05 %) Totals from affected shaders: SGPRS: 299664 -> 285688 (-4.66 %) VGPRS: 233844 -> 234932 (0.47 %) Spilled SGPRs: 3959 -> 4006 (1.19 %) Code Size: 14905272 -> 14115976 (-5.30 %) bytes Max Waves: 46202 -> 46014 (-0.41 %) Reviewers: arsenm, nhaehnle Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, llvm-commits, t-tye Differential Revision: https://reviews.llvm.org/D38915 llvm-svn: 317750	2017-11-09 01:52:17 +00:00
Craig Topper	f7a8beb9b6	[X86] Make sure we don't read too many operands from X86ISD::FMADDS1/FMADDS3 nodes when doing FNEG combine. r317453 added new ISD nodes without rounding modes that were added to an existing if/else chain. But all the previous nodes handled there included a rounding mode. The final code after this if/else chain expected an extra operand that isn't present for the new nodes. llvm-svn: 317748	2017-11-09 01:06:47 +00:00
Craig Topper	4a6c481dbb	[X86] X86MaskedGatherSDNode shouldn't inherit from MaskedGatherScatterSDNode The classof implementation in MaskedGatherScatterSDNode doesn't consider X86MaskedGatherSDNode so its misleading. llvm-svn: 317733	2017-11-08 22:26:41 +00:00
Craig Topper	d42b52a9a8	[X86] Preserve memory refs when folding loads into divides. This is similar to what we already do for multiplies. Without this we can't unfold and hoist an invariant load. llvm-svn: 317732	2017-11-08 22:26:39 +00:00
Craig Topper	69f177c5fb	[X86] Remove an if check on the result of a cast. NFC cast takes a non-null input and produces a non-null output. So this if can never fail. llvm-svn: 317731	2017-11-08 22:26:37 +00:00
Reid Kleckner	c37cd8d78d	Revert "Correct dwarf unwind information in function epilogue for X86" This reverts r317579, originally committed as r317100. There is a design issue with marking CFI instructions duplicatable. Not all targets support the CFIInstrInserter pass, and targets like Darwin can't cope with duplicated prologue setup CFI instructions. The compact unwind info emission fails. When the following code is compiled for arm64 on Mac at -O3, the CFI instructions end up getting tail duplicated, which causes compact unwind info emission to fail: int a, c, d, e, f, g, h, i, j, k, l, m; void n(int o, int b) { if (g) f = 0; for (; f < o; f++) { m = a; if (l > j k > i) j = i = k = d; h = b[c] - e; } } We get assembly that looks like this: ; BB#1: ; %if.then Lloh3: adrp x9, _f@GOTPAGE Lloh4: ldr x9, [x9, _f@GOTPAGEOFF] mov w8, wzr Lloh5: str wzr, [x9] stp x20, x19, [sp, #-16]! ; 8-byte Folded Spill .cfi_def_cfa_offset 16 .cfi_offset w19, -8 .cfi_offset w20, -16 cmp w8, w0 b.lt LBB0_3 b LBB0_7 LBB0_2: ; %entry.if.end_crit_edge Lloh6: adrp x8, _f@GOTPAGE Lloh7: ldr x8, [x8, _f@GOTPAGEOFF] Lloh8: ldr w8, [x8] stp x20, x19, [sp, #-16]! ; 8-byte Folded Spill .cfi_def_cfa_offset 16 .cfi_offset w19, -8 .cfi_offset w20, -16 cmp w8, w0 b.ge LBB0_7 LBB0_3: ; %for.body.lr.ph Note the multiple .cfi_def* directives. Compact unwind info emission can't handle that. llvm-svn: 317726	2017-11-08 21:31:14 +00:00
Alex Bradbury	1e5ce63ec5	Set hasSideEffects=0 for PHI and fix affected passes Previously, hasSideEffects was ? for TargetOpcode::PHI and would be inferred as 1. D37065 sets the previously inferred properties explicitly. This patch sets hasSideEffects=0 for PHI, as it is for G_PHI. MachineInstr::isSafeToMove has been updated so it still returns false for PHI. Additionally, HexagonBitSimplify relied on a PHI node having the hasUnmodeledSideEffects property. This patch fixes that assumption. Differential Revision: https://reviews.llvm.org/D37097 llvm-svn: 317721	2017-11-08 20:19:16 +00:00
Craig Topper	3176a2296d	[X86] Correct the implementation of BEXTR load folding to use the shift as the parent node and pass a separate root. We were calling tryFoldLoad with the 'and' node was the root and parent node of the load. But the parent of the load should be the shift that proceeds the and. While the and node is correctly the root node. To fix this I had to make tryFoldLoad take a separate use and root input. I've added a convenience version with the old signature to avoid updating the other call sites. llvm-svn: 317720	2017-11-08 20:17:33 +00:00
Sam Clegg	e1fc40d384	[WebAssembly] Update test expectations I believe these were fixed in rL317707 Differential Revision: https://reviews.llvm.org/D39813 llvm-svn: 317718	2017-11-08 20:14:06 +00:00
Craig Topper	83a529c5a2	[X86] Don't call validateInstruction from MatchAndEmitInstruction when MatchingInlineAsm is set. The MCInst won't be populated. Without this we can't parse gather instructions in ms inline asm blocks. The validateInstruction function was introduced in r316700 to check gather constraints. llvm-svn: 317713	2017-11-08 19:38:48 +00:00
Dan Gohman	31b2bacc07	[WebAssembly] Call signExtend to get sign extended register Patch by Jatin Bhateja! Differential Revision: https://reviews.llvm.org/D39529 llvm-svn: 317710	2017-11-08 19:24:21 +00:00
Dan Gohman	f1ee202e98	[WebAssembly] Revise the strategy for inline asm. Previously, an "r" constraint would mean the compiler provides a value on WebAssembly's operand stack. This was tricky to use properly, particularly since it isn't possible to declare a new local from within an inline asm string. With this patch, "r" provides the value in a WebAssembly local, and the local index is provided to the inline asm string. This requires inline asm to use get_local and set_local to read the register. This does potentially result in larger code size, however inline asm should hopefully be quite rare in WebAssembly. This also means that the "m" constraint can no longer be supported, as WebAssembly has nothing like a "memory operand" that includes an implicit get_local. This fixes PR34599 for the wasm32-unknown-unknown-wasm target (though not for the ELF target). llvm-svn: 317707	2017-11-08 19:18:08 +00:00
Alex Bradbury	58664434e8	[RISCV] Initial support for function calls Note that this is just enough for simple function call examples to generate working code. Support for varargs etc follows in future patches. Differential Revision: https://reviews.llvm.org/D29936 llvm-svn: 317691	2017-11-08 13:41:21 +00:00

1 2 3 4 5 ...

44691 Commits