llvm-mirror

mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-10-23 21:13:02 +02:00

Author	SHA1	Message	Date
Eugene Zelenko	c816ae3436	[AMDGPU, PowerPC, TableGen] Fix some Clang-tidy modernize and Include What You Use warnings; other minor fixes (NFC). llvm-svn: 289475	2016-12-12 22:23:53 +00:00
Guozhi Wei	d15522bb98	[PPC] Prefer direct move on power8 if loading 1 or 2 bytes to VSR Power8 has MTVSRWZ but no LXSIBZX/LXSIHZX, so moving 1 or 2 bytes to a VSR through MTVSRWZ is much faster than storing the extended value to the stack and loading it with LXSIWZX. This patch fixes pr31144. Differential Revision: https://reviews.llvm.org/D27287 llvm-svn: 289473	2016-12-12 22:09:02 +00:00
Simon Atanasyan	1d31d89061	[mips] For PIC code convert unconditional jump to unconditional branch Unconditional branch uses relative addressing which is the right choice in case of position independent code. This is a fix for the bug: https://dmz-portal.mips.com/bugz/show_bug.cgi?id=2445 Differential revision: https://reviews.llvm.org/D27483 llvm-svn: 289448	2016-12-12 17:40:26 +00:00
Nicolai Haehnle	baedb7e4bc	AMDGPU: llvm.amdgcn.interp.mov is a source of divergence Summary: While the result is constant across a single primitive, each pixel shader wave can have pixels from multiple primitives. Reviewers: tstellarAMD, arsenm Subscribers: kzhuravl, wdng, yaxunl, llvm-commits, tony-tye Differential Revision: https://reviews.llvm.org/D27572 llvm-svn: 289447	2016-12-12 16:52:19 +00:00
Simon Pilgrim	db13b6b628	Update inline argument comment. NFCI. combineX86ShufflesRecursively 'HasPSHUFB' flag has been the more generic 'HasVariableMask' flag for some time. llvm-svn: 289430	2016-12-12 13:43:15 +00:00
Simon Pilgrim	fd2305ff08	[X86][SSE] Add support for combining SSE VSHLI/VSRLI uniform constant shifts. Fixes some missed constant folding opportunities and allows us to combine shuffles that end with a logical bit shift. llvm-svn: 289429	2016-12-12 13:33:58 +00:00
Simon Pilgrim	30c3d9afaf	[X86][SSE] Lower suitably sign-extended mul vXi64 using PMULDQ PMULDQ returns the 64-bit result of the signed multiplication of the lower 32 bits of vXi64 vector inputs; we can lower with this if the sign bits stretch that far. Differential Revision: https://reviews.llvm.org/D27657 llvm-svn: 289426	2016-12-12 10:49:15 +00:00
Craig Topper	0fa1305f4c	[X86] Teach selectScalarSSELoad to accept full 128-bit vector loads and the X86ISD::VZEXT_LOAD opcode. Disable peephole on some of the tests that no longer require it to properly fold scalar intrinsics. llvm-svn: 289424	2016-12-12 07:57:24 +00:00
Craig Topper	fe4ee3f999	[X86] Change CMPSS/CMPSD intrinsic instructions to use sse_load_f32/f64 as their memory pattern instead of a full vector load. These intrinsics only load a single element. We should use sse_load_f32/f64 to give more options of what loads they can match. Currently these instructions are often only getting their load folded thanks to the load folding in the peephole pass. I plan to add more types of loads to sse_load_f32/f64 so we can match without the peephole. llvm-svn: 289423	2016-12-12 07:57:21 +00:00
Craig Topper	194b0d60e7	[X86] Remove some intrinsic instructions from hasPartialRegUpdate Summary: These intrinsic instructions are all selected from intrinsics that have well defined behavior for where the upper bits come from. It's not the same place as the lower bits. As you can see we were suppressing load folding for these instructions in some cases. In none of the cases was the separate load helping avoid a partial dependency on the destination register. So we should just go ahead and allow the load to be folded. Only foldMemoryOperand was suppressing folding for these. They all have patterns for folding sse_load_f32/f64 that aren't gated with OptForSize, but sse_load_f32/f64 doesn't allow 128-bit vector loads. It only allows scalar_to_vector and vzmovl of scalar loads to match. There's no reason we can't allow a 128-bit vector load to be narrowed so I would like to fix sse_load_f32/f64 to allow that. And if I do that it changes some of these same test cases to fold the load too. Reviewers: spatel, zvi, RKSimon Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D27611 llvm-svn: 289419	2016-12-12 05:07:17 +00:00
Simon Pilgrim	6462737f18	[X86][SSE] Add support for combining target shuffles to SHUFPD. llvm-svn: 289407	2016-12-11 21:26:25 +00:00
Ayman Musa	e2419bba6b	[X86][AVX512] Add missing patterns for broadcast fallback in case load node has multiple uses (for v4i64 and v4f64). When the load node which the broadcast instruction broadcasts has multiple uses, it cannot be folded. A fallback pattern is added to catch these cases and provide another solution. Differential Revision: https://reviews.llvm.org/D27661 llvm-svn: 289404	2016-12-11 20:11:17 +00:00
Oren Ben Simhon	32831df6b9	[X86] Regcall - Adding support for mask types The Regcall calling convention passes mask-type arguments in x86 GPRs. The review includes the changes required in order to support v32i1, v16i1 and v8i1. Differential Revision: https://reviews.llvm.org/D27148 llvm-svn: 289383	2016-12-11 14:10:52 +00:00
Craig Topper	dc2cf85382	[X86] Fix a comment to say 'an FMA' instead of 'a FMA'. NFC llvm-svn: 289352	2016-12-11 01:28:08 +00:00
Craig Topper	083cc0960f	[X86] Remove masking from 512-bit VPERMIL intrinsics in preparation for being able to constant fold them in InstCombineCalls like we do for 128/256-bit. llvm-svn: 289350	2016-12-11 01:26:44 +00:00
Dylan McKay	c20881ecea	[AVR] Fix a signed vs unsigned compiler warning llvm-svn: 289349	2016-12-11 00:24:13 +00:00
Dylan McKay	30fd32d024	[AVR] Remove incorrect comment This should've been removed in r289323. llvm-svn: 289346	2016-12-10 23:50:30 +00:00
Craig Topper	a134317d69	[X86] Remove masking from 512-bit PSHUFB intrinsics in preparation for being able to constant fold it in InstCombineCalls like we do for 128/256-bit. llvm-svn: 289344	2016-12-10 23:09:43 +00:00
Simon Pilgrim	c46193553e	[X86][SSE] Ensure UNPCK inputs are a consistent value type in LowerHorizontalByteSum llvm-svn: 289341	2016-12-10 21:16:45 +00:00
Craig Topper	10f7668110	[AVX-512] Remove 128/256 masked vpermil intrinsics and autoupgrade to a select around the unmasked avx1 intrinsics. llvm-svn: 289340	2016-12-10 21:15:52 +00:00
Matt Arsenault	82492d300b	AMDGPU: Fix asan errors when folding operands This was failing when trying to fold immediates into operand 1 of a phi, which only has one statically known operand. llvm-svn: 289337	2016-12-10 19:58:00 +00:00
Simon Pilgrim	e8ba184783	[X86][SSE] Move ZeroVector creation into the shuffle pattern case where it's actually used. Also fix the ZeroVector's type - I've no idea how this hasn't caused problems... llvm-svn: 289336	2016-12-10 19:49:55 +00:00
Craig Topper	44e876e81e	[AVX-512] Add support for lowering (v2i64 (fp_to_sint (v2f32))) to vcvttps2uqq when AVX512DQ and AVX512VL are available. llvm-svn: 289335	2016-12-10 19:35:39 +00:00
Craig Topper	920ac15741	[X86] Clarify indentation. NFC llvm-svn: 289334	2016-12-10 19:35:36 +00:00
Craig Topper	59cf5de700	[X86] Combine LowerFP_TO_SINT and LowerFP_TO_UINT. They only differ by a single boolean flag passed to a helper function. Just check the opcode and create the flag. llvm-svn: 289333	2016-12-10 19:35:33 +00:00
Simon Atanasyan	8f15ea8025	[mips] Eliminate else-after-return. NFC llvm-svn: 289331	2016-12-10 17:30:09 +00:00
Dylan McKay	ff4fb450d4	[AVR] Add a stub README file llvm-svn: 289326	2016-12-10 12:08:19 +00:00
Dylan McKay	c9ba38b704	[AVR] Fix and clean up the inline assembly tests There was a bug where we would hit an assertion if 'Q' was used as a constraint. I also removed hardcoded register names to prefer regexes so the tests don't break when the register allocator changes. llvm-svn: 289325	2016-12-10 11:49:07 +00:00
Dylan McKay	6fa33b0ae0	[AVR] Fix an inline asm assertion which would always trigger It looks like some time in the past, constraint codes were changed from chars being passed around to enums. llvm-svn: 289323	2016-12-10 11:18:37 +00:00
Dylan McKay	c328ab2ffc	[AVR] Use the register scavenger when expanding 'LDDW' instructions Summary: This gets rid of the hardcoded 'r0' that was used previously. Reviewers: asl Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D27567 llvm-svn: 289322	2016-12-10 10:51:55 +00:00
Dylan McKay	12bab3b378	[AVR] Support stores to undefined pointers This would previously trigger an assertion error in AVRISelDAGToDAG. llvm-svn: 289321	2016-12-10 10:16:13 +00:00
Craig Topper	3ab949c081	[X86] Use X86ISD::CVTTP2SI and X86ISD::CVTTP2UI for lowering 128-bit cvttps2qq and cvttps2uqq intrinsics since there is a mismatch between number of input and output elements. Ideally ISD::FP_TO_SINT and ISD::FP_TO_UINT would only be used for cases with the same number of input and output elements. Similar things have already been done for other convert intrinsics. llvm-svn: 289316	2016-12-10 06:02:48 +00:00
Dylan McKay	3c4aaa37af	[AVR] Fix a bunch of incorrect assertion messages These should've been checking whether the immediate is a 6-bit unsigned integer. If the immediate was '63', this would cause an assertion error which shouldn't have occurred. llvm-svn: 289315	2016-12-10 05:48:48 +00:00
Matt Arsenault	78957bd5fc	AMDGPU: Fix AMDGPUPromoteAlloca breaking addrspacecasts The users of the addrspacecast were having their types incorrectly changed, producing invalid bitcasts between address spaces. llvm-svn: 289307	2016-12-10 00:52:50 +00:00
Matt Arsenault	c2c2a10170	AMDGPU: Fix handling of 16-bit immediates Since 32-bit instructions with 32-bit input immediate behavior are used to materialize 16-bit constants in 32-bit registers for 16-bit instructions, determining the legality based on the size is incorrect. Change operands to have the size specified in the type. Also adds a workaround for a disassembler bug that produces an immediate MCOperand for an operand that is supposed to be OPERAND_REGISTER. The assembler appears to accept out of bounds immediates and truncates them, but this seems to be an issue for 32-bit already. llvm-svn: 289306	2016-12-10 00:39:12 +00:00
Matt Arsenault	365f8ab107	AMDGPU: Fix vintrp disassembly llvm-svn: 289292	2016-12-10 00:29:55 +00:00
Matt Arsenault	61a1b18506	AMDGPU: Change vintrp printing to better match sc Some of the immediates need to be printed differently eventually. llvm-svn: 289291	2016-12-10 00:23:12 +00:00
Eugene Zelenko	796f37f3bb	[AMDGPU, PowerPC, TableGen] Fix some Clang-tidy modernize and Include What You Use warnings; other minor fixes (NFC). llvm-svn: 289282	2016-12-09 22:06:55 +00:00
Marek Olsak	68f46f5c8e	AMDGPU/SI: Remove XNACK feature from CI Summary: CI doesn't have XNACK. Reviewers: tstellarAMD Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, tony-tye Differential Revision: https://reviews.llvm.org/D27175 llvm-svn: 289263	2016-12-09 19:49:58 +00:00
Marek Olsak	191936bbf2	AMDGPU/SI: Don't reserve XNACK when it's disabled Summary: This frees 2 additional scalar registers. These are results from all of my 3 patches combined: Polaris: Spilled SGPRs: 2231 -> 1517 (-32.00 %) Tonga: Spilled SGPRs: 3829 -> 2608 (-31.89 %) Spilled VGPRs: 100 -> 84 (-16.00 %) Tonga even spills SGPRs via VGPRs to scratch. That's a compute shader limited to 64 VGPRs. Reviewers: tstellarAMD Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, tony-tye Differential Revision: https://reviews.llvm.org/D27151 llvm-svn: 289262	2016-12-09 19:49:54 +00:00
Marek Olsak	bb1829874b	AMDGPU/SI: Don't reserve FLAT_SCR on non-HSA targets & without stack objects Summary: This frees 2 scalar registers. Reviewers: tstellarAMD Subscribers: qcolombet, arsenm, kzhuravl, wdng, nhaehnle, yaxunl, tony-tye Differential Revision: https://reviews.llvm.org/D27150 llvm-svn: 289261	2016-12-09 19:49:48 +00:00
Marek Olsak	ed98e90e56	AMDGPU/SI: Allow using SGPRs 96-101 on VI Summary: There is no point in setting SGPRS=104, because VI allocates SGPRs in multiples of 16, so 104 -> 112. That enables us to use all 102 SGPRs for general purposes. Reviewers: tstellarAMD Subscribers: qcolombet, arsenm, kzhuravl, wdng, nhaehnle, yaxunl, tony-tye Differential Revision: https://reviews.llvm.org/D27149 llvm-svn: 289260	2016-12-09 19:49:40 +00:00
Matt Arsenault	af9df38755	AMDGPU: Fix isTypeDesirableForOp for i16 This should do nothing for targets without i16. llvm-svn: 289235	2016-12-09 17:57:43 +00:00
Simon Pilgrim	62680ef29e	[SelectionDAG] Add knownbits support for EXTRACT_VECTOR_ELT opcodes (REAPPLIED) Reapplied with fix for PR31323 - X86 SSE2 vXi16 multiplies for illegal types were creating CONCAT_VECTORS nodes with vector inputs that might not total the number of elements in the result type. llvm-svn: 289232	2016-12-09 17:53:11 +00:00
Matt Arsenault	eb15d80d21	AMDGPU: Fix i128 mul llvm-svn: 289231	2016-12-09 17:49:14 +00:00
Matt Arsenault	501bdb609f	AMDGPU: Allow TBA, TMA, TTMP* registers with SMEM instructions Fixes assembler regressions. llvm-svn: 289230	2016-12-09 17:49:11 +00:00
Matt Arsenault	1a2773fd0e	AMDGPU: Clean up instruction bits Sort the instruction bits by type and make sure there is one for each format. Also clean up namespaces. llvm-svn: 289229	2016-12-09 17:49:08 +00:00
Sean Fertile	3b3a29b196	[PPC] Add intrinsics for vector extract word and vector insert word. Revision: https://reviews.llvm.org/D26547 llvm-svn: 289227	2016-12-09 17:21:42 +00:00
Nirav Dave	0b475dec77	Revert "In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled." This reverts commit r289221 which appears to be triggering an assertion llvm-svn: 289226	2016-12-09 17:18:24 +00:00
Nirav Dave	075ae0197d	In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled. Retrying after fixing overly aggressive load-store forwarding optimization. Simplify Consecutive Merge Store Candidate Search Now that address aliasing is much less conservative, push through simplified store merging search which only checks for parallel stores through the chain subgraph. This is cleaner, as it separates non-interfering loads/stores from the store-merging logic. When merging stores, search up the chain through a single load, and find all possible stores by looking down through a load and a TokenFactor to all stores visited. This improves the quality of the output SelectionDAG and generally the output CodeGen (with some exceptions). Additional Minor Changes: 1. Finishes removing unused AliasLoad code 2. Unifies the chain aggregation in the merged stores across code paths 3. Re-add the Store node to the worklist after calling SimplifyDemandedBits. 4. Increase GatherAllAliasesMaxDepth from 6 to 18. That number is arbitrary, but seemed sufficient to not cause regressions in tests. This finishes the change Matt Arsenault started in r246307 and jyknight's original patch. Many tests required some changes as memory operations are now reorderable. Some tests relying on the order were changed to use volatile memory operations. Noteworthy tests: CodeGen/AArch64/argument-blocks.ll - It's not entirely clear what the test_varargs_stackalign test is supposed to be asserting, but the new code looks right. CodeGen/AArch64/arm64-memset-inline.ll - CodeGen/AArch64/arm64-stur.ll - CodeGen/ARM/memset-inline.ll - The backend now generates worse code due to store merging succeeding, as we don't do a 16-byte constant-zero store efficiently. CodeGen/AArch64/merge-store.ll - Improved, but there still seems to be an extraneous vector insert from an element to itself? CodeGen/PowerPC/ppc64-align-long-double.ll - Worse code emitted in this case, due to the improved store->load forwarding. CodeGen/X86/dag-merge-fast-accesses.ll - CodeGen/X86/MergeConsecutiveStores.ll - CodeGen/X86/stores-merging.ll - CodeGen/Mips/load-store-left-right.ll - Restored correct merging of non-aligned stores CodeGen/AMDGPU/promote-alloca-stored-pointer-value.ll - Improved. Correctly merges buffer_store_dword calls CodeGen/AMDGPU/si-triv-disjoint-mem-access.ll - Improved. Sidesteps loading a stored value and merges two stores CodeGen/X86/pr18023.ll - This test has been removed, as it was asserting incorrect behavior. Non-volatile stores CAN be moved past volatile loads, and now are. CodeGen/X86/vector-idiv.ll - CodeGen/X86/vector-lzcnt-128.ll - It's basically impossible to tell what these tests are actually testing. But it looks like the code got better due to the memory operations being recognized as non-aliasing. CodeGen/X86/win32-eh.ll - Both loads of the securitycookie are now merged. Reviewers: arsenm, hfinkel, tstellarAMD, jyknight, nhaehnle Subscribers: wdng, nhaehnle, nemanjai, arsenm, weimingz, niravd, RKSimon, aemerson, qcolombet, dsanders, resistor, tstellarAMD, t.p.northover, spatel Differential Revision: https://reviews.llvm.org/D14834 llvm-svn: 289221	2016-12-09 16:15:12 +00:00
Tom Stellard	5569b3eb62	AMDGPU/SI: Don't mark VINTRP instructions as mayLoad Summary: These instructions technically do read from memory, but the memory is considered to be out of bounds for normal load/store instructions. shader-db stats: SGPRS: 1416075 -> 1413323 (-0.19 %) VGPRS: 867413 -> 863935 (-0.40 %) Spilled SGPRs: 1409 -> 1354 (-3.90 %) Spilled VGPRs: 63 -> 63 (0.00 %) Private memory VGPRs: 880 -> 880 (0.00 %) Scratch size: 2648 -> 2632 (-0.60 %) dwords per thread Code Size: 37889052 -> 37897340 (0.02 %) bytes LDS: 2147 -> 2147 (0.00 %) blocks Max Waves: 279243 -> 280369 (0.40 %) Wait states: 0 -> 0 (0.00 %) Reviewers: nhaehnle, mareko, arsenm Subscribers: kzhuravl, wdng, yaxunl, tony-tye Differential Revision: https://reviews.llvm.org/D27593 llvm-svn: 289219	2016-12-09 15:57:15 +00:00
Craig Topper	705cf92b52	[X86] Modify patterns for the memory form of RCP/RSQRT/SQRT intrinsics to only allow (scalar_to_vector (loadf32/loadf64)) instead of anything that sse_load_f32/f64 can match. sse_load_f32/f64 can also match loads that are zero extended to vectors. We shouldn't match that because we wouldn't be able to get the instruction to zero the upper bits like the intrinsic semantics would require for such a case. There is a test case that does depend on this behavior. llvm-svn: 289193	2016-12-09 07:57:21 +00:00
Dylan McKay	9d0600bd62	[AVR] Use a more appropriate integer type for wide IN/OUT instructions We could previously select an integer which would hit an assertion error in pseudo expansion. The new type will also generate the appropriate fixups if needed, which wasn't done beforehand. llvm-svn: 289192	2016-12-09 07:49:14 +00:00
Dylan McKay	9f49c87f3f	[AVR] Add tests for a large number of pseudo instructions This adds MIR tests for 24 pseudo instructions. llvm-svn: 289191	2016-12-09 07:49:04 +00:00
Craig Topper	8266f68c49	[AVX-512] Correctly preserve the passthru semantics of the FMA scalar intrinsics Summary: Scalar intrinsics have specific semantics about which input's upper bits are passed through to the output. The same input is also supposed to be the input we use for the lower element when the mask bit is 0 in a masked operation. We aren't currently keeping these semantics with instruction selection. This patch corrects this by introducing new scalar FMA ISD nodes that indicate whether operand 1 (one of the multiply inputs) or operand 3 (the addition/subtraction input) should pass thru its upper bits. We use this information to select 213/132 form for the operand 1 version and the 231 form for the operand 3 version. We also use this information to suppress combining FNEG operations on the passthru input since semantically the passthru bits aren't negated. This is stronger than the earlier check added for a user being SELECTS so we can remove that. This fixes PR30913. Reviewers: delena, zvi, v_klochkov Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D27144 llvm-svn: 289190	2016-12-09 06:42:28 +00:00
Matt Arsenault	7729bf075d	AMDGPU: Select i16 instructions to VOP3 forms These were selecting directly to the VOP2 form instead of VOP3 like the i32 instructions. Fixes regressions in future commits where an immediate isn't folded because it was initially used for the second operand. Because uniform 16-bit operations are promoted to i32, it's difficult to get a simple testcase where this matters. Fold failures in SIFoldOperands here tend to be hidden by commute and fold in SIShrinkInstructions. llvm-svn: 289189	2016-12-09 06:19:12 +00:00
Craig Topper	9701bb00ad	[X86] Add masked versions of VPERMT2* and VPERMI2* to load folding tables. llvm-svn: 289186	2016-12-09 05:20:11 +00:00
Craig Topper	9dd955936a	[AVX-512] Add vpermilps/pd to load folding tables. llvm-svn: 289173	2016-12-09 02:18:11 +00:00
Krzysztof Parzyszek	0e5adc0e6f	[RDF] Fix incorrect lane mask calculation This was exposed by some code that used more than one level of sub-registers. There is no testcase, because there is no such code in the Hexagon backend. llvm-svn: 289099	2016-12-08 20:33:45 +00:00
Matt Arsenault	2c46312910	AMDGPU: Make f16 ConstantFP legal Not having this legal led to combine failures, resulting in dumb things like bitcasts of constants not being folded away. The only reason I'm leaving the v_mov_b32 hack that f32 already uses is to avoid madak formation test regressions. PeepholeOptimizer has an ordering issue where the immediate fold attempt is into the sgpr->vgpr copy instead of the actual use. Running it twice avoids that problem. llvm-svn: 289096	2016-12-08 20:14:46 +00:00
Stanislav Mekhanoshin	e735965677	[AMDGPU] Fix number of reserved SGPRs on CI to reflect flat scratch use Differential Revision: https://reviews.llvm.org/D27225 llvm-svn: 289095	2016-12-08 20:07:23 +00:00
Matt Arsenault	c57e18e32f	AMDGPU: Fix commuting v_sub_u16 The correct commutable opcode was set to itself, so this was simply swapping the operands to commute instead of also changing the opcode to v_subrev_u16. llvm-svn: 289093	2016-12-08 19:52:38 +00:00
Stanislav Mekhanoshin	71ed6f04a8	[AMDGPU] Add amdgpu-unify-metadata pass Multiple metadata values for records such as opencl.ocl.version, llvm.ident and similar are created after linking several modules. For some of them, notably opencl.ocl.version, this creates a semantic problem because we cannot tell which version of OpenCL the composite module conforms to. Moreover, such repetitions of identical values often create a huge list of unneeded metadata, which grows bitcode size both in memory and stored on disk. It can go up to several MB when linked against our OpenCL library. Lastly, such long lists obscure reading of dumped IR. The pass unifies metadata after linking. Differential Revision: https://reviews.llvm.org/D25381 llvm-svn: 289092	2016-12-08 19:46:04 +00:00
Peter Collingbourne	a2d4395226	IR, X86: Understand !absolute_symbol metadata on global variables. Summary: Attaching !absolute_symbol to a global variable does two things: 1) Marks it as an absolute symbol reference. 2) Specifies the value range of that symbol's address. Teach the X86 backend to allow absolute symbols to appear in place of immediates by extending the relocImm and mov64imm32 matchers. Start using relocImm in more places where it is legal. As previously proposed on llvm-dev: http://lists.llvm.org/pipermail/llvm-dev/2016-October/105800.html Differential Revision: https://reviews.llvm.org/D25878 llvm-svn: 289087	2016-12-08 19:01:00 +00:00
Alexander Timofeev	3a9e77fc0f	[AMDGPU] Scalarization of global uniform loads. Summary: LC can currently select scalar load for uniform memory access based on readonly memory address space only. This restriction originated from the fact that in HW prior to VI vector and scalar caches are not coherent. With MemoryDependenceAnalysis we can check that the memory location corresponding to the memory operand of the LOAD is not clobbered along all paths from the function entry. Reviewers: rampitec, tstellarAMD, arsenm Subscribers: wdng, arsenm, nhaehnle Differential Revision: https://reviews.llvm.org/D26917 llvm-svn: 289076	2016-12-08 17:28:47 +00:00
NAKAMURA Takumi	a21d98f25e	LanaiInstPrinter: Prune unused libdeps. llvm-svn: 289054	2016-12-08 14:26:30 +00:00
Nicolai Haehnle	2a19d502fb	AMDGPU: Properly implement SIRegisterInfo::isFrameOffsetLegal and needsFrameBaseReg Summary: Without the fix to isFrameOffsetLegal to consider the instruction's immediate offset, the new test case hits the corresponding assertion in resolveFrameIndex, because the LocalStackSlotAllocation pass re-uses a different base register. With only the fix to isFrameOffsetLegal, code quality reduces in a bunch of places because frame base registers are added where they're not needed. This is addressed by properly implementing needsFrameBaseReg, which also helps to avoid unnecessary zero frame indices in a bunch of other places. Fixes piglit glsl-1.50/execution/variable-indexing/gs-output-array-vec4-index-wr.shader_test Reviewers: arsenm, tstellarAMD Subscribers: qcolombet, kzhuravl, wdng, yaxunl, tony-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D27344 llvm-svn: 289048	2016-12-08 14:08:02 +00:00
Dylan McKay	8557bc5be1	[AVR] Add an assertion to ensure we don't emit LPM when it's unsupported llvm-svn: 289030	2016-12-08 08:34:13 +00:00
Matthias Braun	ed9d5255b8	LivePhysReg: Use reference instead of pointer in init(); NFC llvm-svn: 289002	2016-12-08 00:15:51 +00:00
Tim Northover	9cf8f9c151	GlobalISel: simplify MachineIRBuilder interface. MachineIRBuilder had weird before/after and beginning/end flags for the insert point. Unfortunately the non-default setting means that instructions will be inserted in reverse order which is almost never what anyone wants. Really, I think we just want (like IRBuilder has) the ability to insert at any C++ iterator-style point (i.e. before any instruction or before MBB.end()). So this fixes MIRBuilders to behave like IRBuilders in this respect. llvm-svn: 288980	2016-12-07 21:05:38 +00:00
Michael Kuperstein	eb499fd2d5	[X86] Skip over DEBUG_VALUE while looking for start of call sequence If we don't skip over DEBUG_VALUEs, we get differences between -g and non-g code. This fixes PR31242. Differential Revision: https://reviews.llvm.org/D27485 llvm-svn: 288965	2016-12-07 19:31:08 +00:00
Michael Kuperstein	7a552839e3	[X86] Do not assume "ri" instructions always have an immediate operand The second operand of an "ri" instruction may be an immediate, but it may also be a globalvariable, so we shouldn't make any assumptions. This fixes PR31271. Differential Revision: https://reviews.llvm.org/D27481 llvm-svn: 288964	2016-12-07 19:29:18 +00:00
Simon Pilgrim	f5b8df1991	[X86][SSE] Remove AND -> VZEXT combine This is now performed more generally by the target shuffle combine code. Already covered by tests that were originally added in D7666/rL229480 to support combineVectorZext (or VectorZextCombine as it was known then....). Differential Revision: https://reviews.llvm.org/D27510 llvm-svn: 288918	2016-12-07 17:02:41 +00:00
Dylan McKay	60a412a963	[AVR] Expand 'SELECT_CC' nodes wherever possible llvm-svn: 288905	2016-12-07 12:34:47 +00:00
Simon Pilgrim	314df45c83	[X86][SSE] Consistently set MOVD/MOVQ load/store/move instructions to integer domain We are being inconsistent with these instructions (and all their variants.....) with a random mix of them using the default float domain. Differential Revision: https://reviews.llvm.org/D27419 llvm-svn: 288902	2016-12-07 12:10:49 +00:00
Simon Pilgrim	bb13df0fc4	[X86][XOP] Fix VPERMIL2 non-constant pool shuffle decoding (PR31296) The non-constant pool version of DecodeVPERMIL2PMask was not offsetting correctly for the second input. I've updated the code to match the implementation in the constant-pool version. Annoyingly this bug was hidden for so long as it's tricky to combine to useful variable shuffle masks that don't become constant-pool entries. llvm-svn: 288898	2016-12-07 11:19:00 +00:00
Dylan McKay	8e5826c4b7	[AVR] Allow loading from stack slots where src and dest registers are identical Fixes PR 31256 llvm-svn: 288897	2016-12-07 11:08:56 +00:00
Tom Stellard	42afa7429f	AMDGPU: Add S_SETREG instructions to fix fdiv precision issues. Patch By: Wei Ding Summary: This patch fixes the fdiv precision issues. Reviewers: b-sumner, cfang, wdng, arsenm Subscribers: kzhuravl, nhaehnle, yaxunl, tony-tye Differential Revision: https://reviews.llvm.org/D26424 llvm-svn: 288879	2016-12-07 02:42:15 +00:00
Haicheng Wu	20ce778776	[AArch64] Correct the check of signed 9-bit imm in isLegalAddressingMode() In the addressing mode, signed 9-bit imm is [-256, 255], not [-512, 511]. Differential Revision: https://reviews.llvm.org/D27480 llvm-svn: 288876	2016-12-07 01:45:04 +00:00
Tom Stellard	4f6c2a6b37	AMDGPU: Add llvm.amdgcn.interp.mov intrinsic Reviewers: arsenm, nhaehnle Subscribers: kzhuravl, wdng, yaxunl, tony-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D26725 llvm-svn: 288865	2016-12-06 23:52:13 +00:00
Matt Arsenault	432b06cd5e	AMDGPU: Fix crash on i16 constant expression llvm-svn: 288861	2016-12-06 23:18:06 +00:00
Matt Arsenault	e91ea67fb7	AMDGPU: Fix operand name for v_interp_* Other VOP instructions call the output vdst llvm-svn: 288856	2016-12-06 22:29:43 +00:00
Tom Stellard	a983d5f77b	AMDGPU/SI: Set correct value for amd_kernel_code_t::kernarg_segment_alignment Reviewers: arsenm Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, llvm-commits, tony-tye Differential Revision: https://reviews.llvm.org/D27416 llvm-svn: 288852	2016-12-06 21:53:10 +00:00
Tom Stellard	3c0e23d86d	AMDGPU/SI: Don't move copies of immediates to the VALU Summary: If we write an immediate to a VGPR and then copy the VGPR to an SGPR, we can replace the copy with a S_MOV_B32 sgpr, imm, rather than moving the copy to the SALU. Reviewers: arsenm Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, llvm-commits, tony-tye Differential Revision: https://reviews.llvm.org/D27272 llvm-svn: 288849	2016-12-06 21:13:30 +00:00
Zvi Rackover	3eb819344b	[X86] Prefer reduced width multiplication over pmulld on Silvermont Summary: Prefer expansions such as: pmullw,pmulhw,unpacklwd,unpackhwd over pmulld. On Silvermont [source: Optimization Reference Manual]: PMULLD has a throughput of 1/11 [instruction/cycles]. PMULHUW/PMULHW/PMULLW have a throughput of 1/2 [instruction/cycles]. Fixes pr31202. Analysis of this issue was done by Fahana Aleen. Reviewers: wmi, delena, mkuper Subscribers: RKSimon, llvm-commits Differential Revision: https://reviews.llvm.org/D27203 llvm-svn: 288844	2016-12-06 19:35:20 +00:00
Tim Northover	27693beb8e	GlobalISel: handle G_SEQUENCE fallbacks gracefully. There were two problems: + AArch64 was reusing random data from its binary op tables, which is complete nonsense for G_SEQUENCE. + Even when AArch64 gave up and said it couldn't handle G_SEQUENCE, the generic code asserted. llvm-svn: 288836	2016-12-06 18:38:38 +00:00
Daniel Sanders	cb627610a3	[globalisel][aarch64] Fix unintended assumptions about PartialMappingIdx. NFC. Summary: This is NFC but prevents assertions when PartialMappingIdx is tablegen-erated. The assumptions were: 1) FirstGPR is 0 2) FirstGPR is the first of the First* enumerators. GPR32 is changed to 1 to demonstrate that assumption #1 is fixed. #2 will be covered by a subsequent patch that tablegen-erates information and swaps the order of GPR and FPR as a side effect. Depends on D27336 Reviewers: ab, t.p.northover, qcolombet Subscribers: aemerson, rengolin, vkalintiris, dberris, rovka, llvm-commits Differential Revision: https://reviews.llvm.org/D27337 llvm-svn: 288812	2016-12-06 14:39:57 +00:00
Daniel Sanders	ace3ecf713	[globalisel][aarch64] Replace magic numbers with corresponding enumerators in ValMappings. NFC Reviewers: ab, t.p.northover, qcolombet Subscribers: aemerson, rengolin, vkalintiris, dberris, llvm-commits, rovka Differential Revision: https://reviews.llvm.org/D27336 llvm-svn: 288810	2016-12-06 13:55:01 +00:00
Daniel Sanders	7de9da6900	[globalisel][aarch64] Correct argument names in comments. llvm-svn: 288809	2016-12-06 13:48:58 +00:00
Oliver Stannard	56010af119	[ARM] Better error message for invalid flag-preserving Thumb1 insts When we see a non flag-setting instruction for which only the flag-setting version is available in Thumb1, we should give a better error message than "invalid instruction". Differential Revision: https://reviews.llvm.org/D27414 llvm-svn: 288805	2016-12-06 12:59:08 +00:00
Ayman Musa	57ce5e0b1e	[X86][AVX512] Detect repeated constant patterns in BUILD_VECTOR suitable for broadcasting. Check if a build_vector node includes a repeated constant pattern and replace it with a broadcast of that pattern. For example: "build_vector <0, 1, 2, 3, 0, 1, 2, 3>" would be replaced by "broadcast <0, 1, 2, 3>" Differential Revision: https://reviews.llvm.org/D26802 llvm-svn: 288804	2016-12-06 12:24:14 +00:00
Nemanja Ivanovic	f20e005ea6	[PowerPC] Improvements for BUILD_VECTOR Vol. 4 This is the final patch in the series of patches that improves BUILD_VECTOR handling on PowerPC. This adds a few peephole optimizations to remove redundant instructions. It also adds a large test case which encompasses a large set of code patterns that build vectors - this test case was the motivator for this series of patches. Differential Revision: https://reviews.llvm.org/D26066 llvm-svn: 288800	2016-12-06 11:47:14 +00:00
Daniel Sanders	c27f3eef0a	[globalisel][aarch64] Prefix PartialMappingIdx enumerators with 'PMI_' to fit coding standards. This also stops things like 'None' polluting the llvm::AArch64 namespace. llvm-svn: 288799	2016-12-06 11:33:04 +00:00
Florian Hahn	620808c2dd	[framelowering] Improve tracking of first CS pop instruction. Summary: This patch makes sure FirstCSPop and MBBI never point to DBG_VALUE instructions, which affected the code generated. Reviewers: mkuper, aprantl, MatzeB Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D27343 llvm-svn: 288794	2016-12-06 10:24:55 +00:00
Craig Topper	eeda3519db	[X86] Remove another weird scalar sqrt/rcp/rsqrt pattern. This pattern turned a vector sqrt/rcp/rsqrt operation of sse_load_f32/f64 into the scalar instruction for the operation and put undef into the upper bits. For correctness, the resulting code should still perform the sqrt/rcp/rsqrt on the upper bits after the load is extended, since that's what the operation asked for. Particularly in the case where the upper bits are 0, we need to calculate the sqrt/rcp/rsqrt of the zeroes and keep the result in the upper bits. This implies we should still be using the packed instruction. The only test case for this pattern is one I just added, so there was no coverage of this. llvm-svn: 288784	2016-12-06 08:08:12 +00:00
Craig Topper	5cd739b585	[X86] Remove bad pattern that caused 128-bit loads being used by scalar sqrt/rcp/rsqrt intrinsics to select the memory form of the corresponding instruction and violate the semantics of the intrinsic. The intrinsics are supposed to pass the upper bits straight through to their output register. This means we need to make sure we still perform the 128-bit load to get those upper bits to pass to the instruction, since the memory form of the instruction only reads 32 or 64 bits. llvm-svn: 288781	2016-12-06 08:08:04 +00:00
Craig Topper	c5bde35592	[X86] Correct pattern for VSQRTSSr_Int, VSQRTSDr_Int, VRCPSSr_Int, and VRSQRTSSr_Int to not have an IMPLICIT_DEF on the first input. The semantics of the intrinsic are clear and not undefined. The intrinsic takes one argument, the lower bits are affected by the operation and the upper bits should be passed through. The instruction itself takes two operands, the high bits of the first operand are passed through and the low bits of the second operand are modified by the operation. To match this to the intrinsic we should pass the single intrinsic input to both operands. I had to remove the stack folding test for these instructions since they depended on the incorrect behavior. The same register is now used for both inputs so the load can't be folded. llvm-svn: 288779	2016-12-06 08:07:58 +00:00
Craig Topper	94a3ec0053	[X86] Remove scalar logical op alias instructions. Just use COPY_FROM/TO_REGCLASS and the normal packed instructions instead Summary: This patch removes the scalar logical operation alias instructions. We can just use reg class copies and use the normal packed instructions instead. This removes the need for putting these instructions in the execution domain fixing tables as was done recently. I removed the loadf64_128 and loadf32_128 patterns as DAG combine creates a narrower load for (extractelt (loadv4f32)) before we ever get to isel. I plan to add similar patterns for AVX512DQ in a future commit to allow use of the larger register class when available. Reviewers: spatel, delena, zvi, RKSimon Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D27401 llvm-svn: 288771	2016-12-06 04:58:39 +00:00
Chris Bieneman	2716b34915	[CMake] Cleanup TableGen include flags It is kinda crazy to have llvm/include and llvm/lib/Target in the include path for every tablegen invocation for every tablegen-like tool. This patch removes those flags from the tablegen function that is called everywhere by instead creating a variable LLVM_TABLEGEN_FLAGS which is set up in the LLVM source directories. This removes TableGen.cmake's dependency on LLVM_MAIN_SRC_DIR and LLVM_MAIN_INCLUDE_DIR. llvm-svn: 288770	2016-12-06 04:45:11 +00:00
Matt Arsenault	03686f0b49	AMDGPU: Don't require structured CFG The structured CFG is just an aid to inserting exec mask modification instructions; once that is done we don't really need it anymore. We also do not analyze blocks with terminators that modify exec, so this should only be impacting true branches. llvm-svn: 288744	2016-12-06 01:02:51 +00:00
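The repeated-constant detection described in the AVX512 BUILD_VECTOR commit (D26802) can be sketched in plain C++. This is a hypothetical standalone helper, not LLVM's implementation: `findRepeatedPattern` operates on a `std::vector<long long>` rather than on SelectionDAG nodes, and restricting candidate pattern lengths to powers of two is an assumption made here for illustration.

```cpp
#include <cstddef>
#include <optional>
#include <vector>

// Hypothetical helper (not an LLVM API): returns the shortest prefix of Elts
// whose repetition reproduces the whole vector, provided that prefix is
// strictly shorter than Elts. Such a prefix is what would be broadcast.
// Assumption: only power-of-two pattern lengths are considered.
std::optional<std::vector<long long>>
findRepeatedPattern(const std::vector<long long>& Elts) {
  const std::size_t N = Elts.size();
  for (std::size_t Len = 1; Len < N; Len *= 2) {
    if (N % Len != 0)
      continue; // the pattern must tile the vector exactly
    bool Repeats = true;
    // Every element must equal the element at the same position in the prefix.
    for (std::size_t I = Len; I < N && Repeats; ++I)
      Repeats = Elts[I] == Elts[I % Len];
    if (Repeats)
      return std::vector<long long>(Elts.begin(), Elts.begin() + Len);
  }
  return std::nullopt; // no shorter repeating period: nothing to broadcast
}
```

For `<0, 1, 2, 3, 0, 1, 2, 3>` the helper returns `<0, 1, 2, 3>`, matching the commit's example; a vector such as `<0, 1, 2, 3>` with no shorter period yields `std::nullopt`.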


40528 Commits