llvm-mirror

mirror of https://github.com/RPCS3/llvm-mirror.git synced 2025-02-01 05:01:59 +01:00

Author	SHA1	Message	Date
Matt Arsenault	8e07f5aa14	R600/SI: Add some mubuf testcases. I noticed some odd looking cases where addr64 wasn't set when storing to a pointer in an SGPR. This seems to be intentional, and partially tested already. The documentation seems to describe addr64 in terms of which registers addressing modifiers come from, but I would expect to always need addr64 when using 64-bit pointers. If no offset is applied, it makes sense to not need to worry about doing a 64-bit add for the final address. A small immediate offset can be applied, so is it OK to not have addr64 set if a carry is necessary when adding the base pointer in the resource to the offset? llvm-svn: 217785	2014-09-15 16:48:01 +00:00
Matt Arsenault	236d59890d	R600/SI: Add preliminary support for flat address space llvm-svn: 217777	2014-09-15 15:41:53 +00:00
Chandler Carruth	3ee73d8f19	[x86] Begin emitting PBLENDW instructions for integer blend operations when SSE4.1 is available. This removes a ton of domain crossing from blend code paths that were ending up in the floating point code path. This is just the tip of the iceberg though. The real switch is for integer blend lowering to more actively rely on this instruction being available so we don't hit shufps at all any longer. =] That will come in a follow-up patch. Another place where we need better support is for using PBLENDVB when doing so avoids the need to have two complementary PSHUFB masks. llvm-svn: 217767	2014-09-15 12:40:54 +00:00
Chandler Carruth	2b509d476d	[x86] Add an explicit SSE3 run to this test and flesh out a bunch of missing specific checks. While there is a lot of redundancy here where all-but-one mode use the same code generation, I'd rather have each variant spelled out and checked so that readers aren't misled by an omission in the test suite. llvm-svn: 217765	2014-09-15 11:40:20 +00:00
Chandler Carruth	7097776d2e	[x86] Teach the x86 DAG combiner to form UNPCKLPS and UNPCKHPS instructions from the relevant shuffle patterns. This is the last tweak I'm aware of to generate essentially perfect v4f32 and v2f64 shuffles with the new vector shuffle lowering up through SSE4.1. I'm sure I've missed some and it'd be nice to check since v4f32 is amenable to exhaustive exploration, but this is all of the tricks I'm aware of. With AVX there is a new trick to use the VPERMILPS instruction, that's coming up in a subsequent patch. llvm-svn: 217761	2014-09-15 11:26:25 +00:00
Chandler Carruth	1aad39bc7a	[x86] Teach the x86 DAG combiner to form MOVSLDUP and MOVSHDUP instructions when it finds an appropriate pattern. These are lovely instructions, and its a shame to not use them. =] They are fast, and can hand loads folded into their operands, etc. I've also plumbed the comment shuffle decoding through the various layers so that the test cases are printed nicely. llvm-svn: 217758	2014-09-15 11:15:23 +00:00
Chandler Carruth	8d1918703c	[x86] Undo a flawed transform I added to form UNPCK instructions when AVX is available, and generally tidy up things surrounding UNPCK formation. Originally, I was thinking that the only advantage of PSHUFD over UNPCK instruction variants was its free copy, and otherwise we should use the shorter encoding UNPCK instructions. This isn't right though, there is a larger advantage of being able to fold a load into the operand of a PSHUFD. For UNPCK, the operand must be in a register so it can be the second input. This removes the UNPCK formation in the target-specific DAG combine for v4i32 shuffles. It also lifts the v8 and v16 cases out of the AVX-specific check as they are potentially replacing multiple instructions with a single instruction and so should always be valuable. The floating point checks are simplified accordingly. This also adjusts the formation of PSHUFD instructions to attempt to match the shuffle mask to one which would fit an UNPCK instruction variant. This was originally motivated to allow it to match the UNPCK instructions in the combiner, but clearly won't now. Eventually, we should add a MachineCombiner pass that can form UNPCK instructions post-RA when the operand is known to be in a register and thus there is no loss. llvm-svn: 217755	2014-09-15 10:35:41 +00:00
Chandler Carruth	c0324d3763	[x86] Teach the new vector shuffle lowering to use 'punpcklwd' and 'punpckhwd' instructions when suitable rather than falling back to the generic algorithm. While we could canonicalize to these patterns late in the process, that wouldn't help when the freedom to use them is only visible during initial lowering when undef lanes are well understood. This, it turns out, is very important for matching the shuffle patterns that are used to lower sign extension. Fixes a small but relevant regression in gcc-loops with the new lowering. When I changed this I noticed that several 'pshufd' lowerings became unpck variants. This is bad because it removes the ability to freely copy in the same instruction. I've adjusted the widening test to handle undef lanes correctly and now those will correctly continue to use 'pshufd' to lower. However, this caused a bunch of churn in the test cases. No functional change, just churn. Both of these changes are part of addressing a general weakness in the new lowering -- it doesn't sufficiently leverage undef lanes. I've at least a couple of patches that will help there at least in an academic sense. llvm-svn: 217752	2014-09-15 09:02:37 +00:00
Chandler Carruth	3aaf043ab0	[x86] Teach the new vector shuffle lowering to use BLENDPS and BLENDPD. These are super simple. They even take precedence over crazy instructions like INSERTPS because they have very high throughput on modern x86 chips. I still have to teach the integer shuffle variants about this to avoid so many domain crossings. However, due to the particular instructions available, that's a touch more complex and so a separate patch. Also, the backend doesn't seem to realize it can commute blend instructions by negating the mask. That would help remove a number of copies here. Suggestions on how to do this welcome, it's an area I'm less familiar with. llvm-svn: 217744	2014-09-14 23:43:33 +00:00
NAKAMURA Takumi	ee2b59b691	llvm/test/CodeGen/X86/vec_shuffle-38.ll: Add explicit -mtriple=x86_64-unknown to avoid incompatibility of win32. llvm-svn: 217742	2014-09-14 23:39:01 +00:00
Chandler Carruth	20b263c37d	[x86] Add an SSE41 mode to this test. Nothing interesting here, its the same as SSE3. llvm-svn: 217741	2014-09-14 23:28:12 +00:00
Chandler Carruth	a6586c7127	[x86] Switch this test to use an ALL prefix with special SSE2 and SSE3 variants where significant. This will make it more obvious what is happening when we start using blends in SSE41. llvm-svn: 217740	2014-09-14 23:19:37 +00:00
Chandler Carruth	22df61c791	[x86] Add some test cases where we should emit blendpd in SSE4.1. No actual change yet though. llvm-svn: 217739	2014-09-14 23:15:52 +00:00
Chandler Carruth	31d65bc142	[x86] Teach the vector combiner that picks a canonical shuffle from to support transforming the forms from the new vector shuffle lowering to use 'movddup' when appropriate. A bunch of the cases where we actually form 'movddup' don't actually show up in the test results because something even later than DAG legalization maps them back to 'unpcklpd'. If this shows back up as a performance problem, I'll probably chase it down, but it is at least an encoded size loss. =/ To make this work, also always do this canonicalizing step for floating point vectors where the baseline shuffle instructions don't provide any free copies of their inputs. This also causes us to canonicalize unpck[hl]pd into mov{hl,lh}ps (resp.) which is a nice encoding space win. There is one test which is "regressed" by this: extractelement-load. There, the test case where the optimization it is testing fails, the exact instruction pattern which results is slightly different. This should probably be fixed by having the appropriate extract formed earlier in the DAG, but that would defeat the purpose of the test.... If this test case is critically important for anyone, please let me know and I'll try to work on it. The prior behavior was actually contrary to the comment in the test case and seems likely to have been an accident. llvm-svn: 217738	2014-09-14 22:41:37 +00:00
Matt Arsenault	89c98636b8	R600/SI: Fix broken check lines llvm-svn: 217736	2014-09-14 18:32:05 +00:00
Juergen Ributzka	e238e394d2	[FastISel][AArch64] Add support for non-native types for logical ops. Extend the logical ops selection to also support non-native types such as i1, i8, and i16. Fixes rdar://problem/18330589. llvm-svn: 217732	2014-09-13 23:46:28 +00:00
Chad Rosier	e7b10df26b	[AArch64] Enable post-RA MI scheduler. Phabricator Revision: http://reviews.llvm.org/D5278 Patch by Sanjin Sijaric! llvm-svn: 217693	2014-09-12 17:40:39 +00:00
NAKAMURA Takumi	1668088977	llvm/test/CodeGen/X86/vec_ctbits.ll: Add explicit -mtriple=x86_64-unknown. It was incompatible to Win32 x64. llvm-svn: 217683	2014-09-12 15:10:56 +00:00
Bill Schmidt	b47e7e1e1d	Address comments on r217622 llvm-svn: 217680	2014-09-12 14:26:36 +00:00
Benjamin Kramer	ed321129a0	Legalizer: Use the scalar bit width when promoting bit counting instrs on vectors. e.g. when promoting ctlz from <2 x i32> to <2 x i64> we have to fixup the result by 32 bits, not 64. PR20917. llvm-svn: 217671	2014-09-12 12:50:27 +00:00
Matt Arsenault	0d246062cd	R600/SI: Fix off by 1 error in used register count The register numbers start at 0, so if only 1 register was used, this was reported as 0. llvm-svn: 217636	2014-09-11 22:51:37 +00:00
Quentin Colombet	74d3a27ed8	[CodeGenPrepare] Teach the addressing mode matcher how to promote zext. I.e., teach it about 'sext (zext a to ty) to ty2' => zext a to ty2. llvm-svn: 217629	2014-09-11 21:22:14 +00:00
Bill Schmidt	f543639f31	Add missing colon to RUN line... llvm-svn: 217623	2014-09-11 20:13:52 +00:00
Bill Schmidt	020f302f01	[PATCH, PowerPC] Accept 'U' and 'X' constraints in inline asm Inline asm may specify 'U' and 'X' constraints to print a 'u' for an update-form memory reference, or an 'x' for an indexed-form memory reference. However, these are really only useful in GCC internal code generation. In inline asm the operand of the memory constraint is typically just a register containing the address, so 'U' and 'X' make no sense. This patch quietly accepts 'U' and 'X' in inline asm patterns, but otherwise does nothing. If we ever unexpectedly see a non-register, we'll assert and sort it out afterwards. I've added a new test for these constraints; the test case should be used for other asm-constraints changes down the road. llvm-svn: 217622	2014-09-11 20:10:03 +00:00
Matt Arsenault	fa1f8e81fb	Add triple to test to fix bots llvm-svn: 217612	2014-09-11 17:50:20 +00:00
Brad Smith	9805224039	Provide an implementation of getNoopForMachoTarget for SPARC. llvm-svn: 217611	2014-09-11 17:40:51 +00:00
Matt Arsenault	dbc82e3483	Add DAG combine for shl + add of constants. Do (shl (add x, c1), c2) -> (add (shl x, c2), c1 << c2) This is already done for multiplies, but since multiplies by powers of two are turned into shifts, we also need to handle it here. This might want checks for isLegalAddImmediate to avoid transforming an add of a legal immediate with one that isn't. llvm-svn: 217610	2014-09-11 17:34:19 +00:00
Adam Nemet	50897a4c50	[AVX512] Fix miscompile for unpack r189189 implemented AVX512 unpack by essentially performing a 256-bit unpack between the low and the high 256 bits of src1 into the low part of the destination and another unpack of the low and high 256 bits of src2 into the high part of the destination. I don't think that's how unpack works. AVX512 unpack simply has more 128-bit lanes but other than it works the same way as AVX. So in each 128-bit lane, we're always interleaving certain parts of both operands rather different parts of one of the operands. E.g. for this: __v16sf a = { 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15 }; __v16sf b = { 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31 }; __v16sf c = __builtin_shufflevector(a, b, 0, 8, 1, 9, 4, 12, 5, 13, 16, 24, 17, 25, 20, 28, 21, 29); we generated punpcklps (notice how the elements of a and b are not interleaved in the shuffle). In turn, c was set to this: 0 16 1 17 4 20 5 21 8 24 9 25 12 28 13 29 Obviously this should have just returned the mask vector of the shuffle vector. I mostly reverted this change and made sure the original AVX code worked for 512-bit vectors as well. Also updated the tests because they matched the logic from the code. llvm-svn: 217602	2014-09-11 16:51:10 +00:00
Sanjay Patel	92c3cafc18	Add triple and remove hashes to account for buildbot differences in comment strings. llvm-svn: 217601	2014-09-11 16:08:44 +00:00
Sanjay Patel	099c1958cc	Combine fmul vector FP constants when unsafe math is allowed. This is an extension of the change made with r215820: http://llvm.org/viewvc/llvm-project?view=revision&revision=215820 That patch allowed combining of splatted vector FP constants that are multiplied. This patch allows combining non-uniform vector FP constants too by relaxing the check on the type of vector. Also, canonicalize a vector fmul in the same way that we already do for scalars - if only one operand of the fmul is a constant, make it operand 1. Otherwise, we miss potential folds. This fold is also done by -instcombine, but it's possible that extra fmuls may have been generated during lowering. Differential Revision: http://reviews.llvm.org/D5254 llvm-svn: 217599	2014-09-11 15:45:27 +00:00
Aaron Watry	c2b54eada1	R600: Test local atomics for evergreen Now that the operations are all implemented, we can test this sub-arch here. Signed-off-by: Aaron Watry <awatry@gmail.com> Reviewed-by: Matt Arsenault <matthew.arsenault@amd.com> llvm-svn: 217595	2014-09-11 15:02:52 +00:00
Tilmann Scheller	9817f96600	[ARM] Add Thumb-2 code size optimization regression test for LSR (register). llvm-svn: 217582	2014-09-11 10:45:50 +00:00
Tilmann Scheller	2b9d5dcb97	[ARM] Add Thumb-2 code size optimization regression test for LSR (immediate). llvm-svn: 217581	2014-09-11 10:42:17 +00:00
Arnaud A. de Grandmaison	1101db7b14	[AArch64] Reenable the PBQP test now that the leak issue has been fixed. David Blaikie's commits r217563 & r217564, which added shared_ptr to the CostPool have fixed some memory leak issues exposed by the PBQP with coalescing constraints. The sanitizer bot was failing because of those leaks. Now that the leaks are gone, we can reenable the aarch64/pbqp test. llvm-svn: 217580	2014-09-11 10:39:52 +00:00
Tilmann Scheller	9745855d08	[ARM] Add Thumb-2 code size optimization regression test for LSL (register). llvm-svn: 217579	2014-09-11 10:33:39 +00:00
Tilmann Scheller	c86b96c835	[ARM] Add Thumb2 code size optimization regression test for LSL (immediate). llvm-svn: 217576	2014-09-11 10:29:42 +00:00
Chandler Carruth	3a087dd25f	[x86] Fixup r217565 which baked in an assumption about the function name that breaks on some platforms. This part of the test just doesn't matter... llvm-svn: 217575	2014-09-11 10:21:25 +00:00
David Xu	4d7f013423	Build correct vector filled with undef nodes llvm-svn: 217570	2014-09-11 05:10:28 +00:00
Chandler Carruth	d9125d805f	[x86] FileCheck-ize this test. llvm-svn: 217565	2014-09-11 00:13:35 +00:00
Matt Arsenault	fbad29f334	R600/SI: Fix losing chain when fixing reg class of loads. The lost chain resulting in earlier side effecting nodes being deleted. llvm-svn: 217561	2014-09-10 23:26:19 +00:00
Matt Arsenault	e6f7aa4c63	R600: Custom lower frem llvm-svn: 217553	2014-09-10 21:44:27 +00:00
Arnaud A. de Grandmaison	292a69bd57	[AArch64] Temporarily desactivate the PBQP test, while I investigate some leaks in the allocator llvm-svn: 217531	2014-09-10 18:40:18 +00:00
Arnaud A. de Grandmaison	99989cbd0c	[AArch64] Add experimental PBQP support This adds target specific support for using the PBQP register allocator on the AArch64, for the A57 cpu. By default, the PBQP allocator is not used, unless explicitely required on the command line with "-aarch64-pbqp". llvm-svn: 217504	2014-09-10 14:06:10 +00:00
Asiri Rathnayake	245674ae13	[AArch 64] Use a constant pool load for weak symbol references when using static relocation model and small code model. Summary: currently we generate GOT based relocations for weak symbol references regardless of the underlying relocation model. This should be change so that in static relocation model we use a constant pool load instead. Patch from: Keith Walker Reviewers: Renato Golin, Tim Northover llvm-svn: 217503	2014-09-10 13:54:38 +00:00
Tim Northover	c24289aa47	ARM: don't size-reduce STMs using the LR register. The only Thumb-1 multi-store capable of using LR is the PUSH instruction, which translates to STMDB, so we shouldn't convert STMIAs. Patch by Sergey Dmitrouk. llvm-svn: 217498	2014-09-10 12:53:28 +00:00
Job Noorman	23a3720448	Drop the W postfix on the 16-bit registers. This ensures the inline assembly register constraints are properly recognised in TargetLowering::getRegForInlineAsmConstraint. llvm-svn: 217479	2014-09-10 06:58:14 +00:00
Pavel Chupin	45cea6e50f	[x32] Emit callq for CALLpcrel32 Summary: In AT&T annotation for both x86_64 and x32 calls should be printed as callq in assembly. It's only a matter of correct mnemonic, object output is ok. Test Plan: trivial test added Reviewers: nadav, dschuff, craig.topper Subscribers: llvm-commits, zinovy.nis Differential Revision: http://reviews.llvm.org/D5213 llvm-svn: 217435	2014-09-09 11:54:12 +00:00
Renato Golin	024d56e8f8	ARM: Negative offset support problem This patch is to permit a negative offset usage for a non frame access. Patch by Igor Oblakov. llvm-svn: 217431	2014-09-09 09:57:59 +00:00
Bob Wilson	cc87ed123c	Set trunc store action to Expand for all X86 targets. When compiling without SSE2, isTruncStoreLegal(F64, F32) would return Legal, whereas with SSE2 it would return Expand. And since the Target doesn't seem to actually handle a truncstore for double -> float, it would just output a store of a full double in the space for a float hence overwriting other bits on the stack. Patch by Luqman Aden! llvm-svn: 217410	2014-09-09 01:13:36 +00:00
Hans Wennborg	6ebf2a8b60	Fast-ISel: Remove dead code after falling back from selecting call instructions (PR20863) Previously, fast-isel would not clean up after failing to select a call instruction, because it would have called flushLocalValueMap() which moves the insertion point, making SavedInsertPt in selectInstruction() invalid. Fixing this by making SavedInsertPt a member variable, and having flushLocalValueMap() update it. This removes some redundant code at -O0, and more importantly fixes PR20863. Differential Revision: http://reviews.llvm.org/D5249 llvm-svn: 217401	2014-09-08 20:24:10 +00:00

1 2 3 4 5 ...

10738 Commits