llvm-mirror

mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-10-22 12:33:33 +02:00

Author	SHA1	Message	Date
Chandler Carruth	4c0a2a8001	[x86] Add some tests for a common unpack pattern of vector shuffle that has a remarkably unique and efficient lowering. While we get this some of the time already, we miss a few cases and there wasn't a principled reason we got it. We should at least test this. v8 already has tests for this pattern. llvm-svn: 222607	2014-11-22 05:44:43 +00:00
Tom Stellard	3929e69978	R600/SI: Add a failing test case for offset order in ds_read2 instructions llvm-svn: 222585	2014-11-21 22:31:47 +00:00
Tom Stellard	a112fe4e40	R600/SI: Emit s_mov_b32 m0, -1 before every DS instruction This s_mov_b32 will write to a virtual register from the M0Reg class and all the ds instructions now take an extra M0Reg explicit argument. This change is necessary to prevent issues with the scheduler mixing together instructions that expect different values in the m0 registers. llvm-svn: 222583	2014-11-21 22:31:44 +00:00
Tom Stellard	484f10138e	R600/SI: Add SIFoldOperands pass This pass attempts to fold the source operands of mov and copy instructions into their uses. llvm-svn: 222581	2014-11-21 22:06:37 +00:00
Jozef Kolek	52fa965cf8	[mips][microMIPS] This patch implements functionality in MIPS delay slot filler such as if delay slot filler have to put NOP instruction into the delay slot of microMIPS BEQ or BNE instruction which uses the register $0, then instead of emitting NOP this instruction is replaced by the corresponding microMIPS compact branch instruction, i.e. BEQZC or BNEZC. Differential Revision: http://reviews.llvm.org/D3566 llvm-svn: 222580	2014-11-21 22:04:35 +00:00
Tom Stellard	267a21d6d7	R600/SI: Use hex notation for constant in test llvm-svn: 222578	2014-11-21 22:00:13 +00:00
Sanjay Patel	776e5485fb	Add a feature flag for slow 32-byte unaligned memory accesses [x86]. This patch adds a feature flag to avoid unaligned 32-byte load/store AVX codegen for Sandy Bridge and Ivy Bridge. There is no functionality change intended for those chips. Previously, the absence of AVX2 was being used as a proxy to detect this feature. But that hindered codegen for AVX-enabled AMD chips such as btver2 that do not have the 32-byte unaligned access slowdown. Performance measurements are included in PR21541 ( http://llvm.org/bugs/show_bug.cgi?id=21541 ). Differential Revision: http://reviews.llvm.org/D6355 llvm-svn: 222544	2014-11-21 17:40:04 +00:00
Chandler Carruth	5e598c0342	[x86] Restructure the checking patterns for v16 and v32 avx2 vector shuffle lowering to allow much better blend matching. Specifically, with the new structure the code seems clearer to me and we correctly can hit the cases where merging two 128-bit lanes is a clear win and can be shuffled cheaply afterward. llvm-svn: 222539	2014-11-21 14:53:03 +00:00
Chandler Carruth	7491f1f32f	[x86] Make the previous logic significantly less conservative and get a bunch more improvements. Non-lane-crossing is fine, the key is that lane merging only makes sense for single-input shuffles. Not sure why I got so turned around here. The code all works, I was just using the wrong model for it. This only updates v4 and v8 lowering. The v16 and v32 lowering requires restructuring the entire check sequence. llvm-svn: 222537	2014-11-21 14:33:24 +00:00
Andrea Di Biagio	0a8cf1ad5a	[DAG] Teach how to turn a build_vector into a shuffle if some of the operands are zero. Before this patch, the DAGCombiner only tried to convert build_vector dag nodes into shuffles if all operands were either extract_vector_elt or undef. This patch improves that logic and teaches the DAGCombiner how to deal with build_vector dag nodes where one or more operands are zero. A build_vector dag node with some zero operands is turned into a shuffle only if the resulting shuffle mask is legal for the target. llvm-svn: 222536	2014-11-21 14:32:06 +00:00
Chandler Carruth	8387bec088	[x86] Teach the x86 vector shuffle lowering to detect mergable 128-bit lanes. By special casing these we can often either reduce the total number of shuffles significantly or reduce the number of (high latency on Haswell) AVX2 shuffles that potentially cross 128-bit lanes. Even when these don't actually cross lanes, they have much higher latency to support that. Doing two of them and a blend is worse than doing a single insert across the 128-bit lanes to blend and then doing a single interleaved shuffle. While this seems like a narrow case, it kept cropping up on me and the difference is huge as you can see in many of the test cases. I first hit this trying to perfectly fix the interleaving shuffle patterns used by Halide for AVX2. llvm-svn: 222533	2014-11-21 13:56:05 +00:00
Chandler Carruth	5646862a2e	[x86] Remove more windows line endings that slipped into this file... llvm-svn: 222528	2014-11-21 12:33:46 +00:00
Chandler Carruth	2db7c4cf32	[x86] Add a bunch of test cases to 256-bit shuffles that exercise merging 128-bit subvectors and also shuffling all the elements of those subvectors. Currently we generate pretty bad code for many of these, but I'm testing a patch that should dramatically improve this in addition to making the shuffle lowering robust to other changes. llvm-svn: 222525	2014-11-21 12:17:50 +00:00
Alexey Volkov	235268b4ed	[X86] For Silvermont CPU use 16-bit division instead of 64-bit for small positive numbers Differential Revision: http://reviews.llvm.org/D5938 llvm-svn: 222521	2014-11-21 11:19:34 +00:00
Hao Liu	9cb82be410	DAGCombiner: Allow the DAGCombiner to combine multiple FDIVs with the same divisor info FMULs by the reciprocal. E.g., ( a / D; b / D ) -> ( recip = 1.0 / D; a * recip; b * recip) A hook is added to allow the target to control whether it needs to do such combine. Reviewed in http://reviews.llvm.org/D6334 llvm-svn: 222510	2014-11-21 06:39:58 +00:00
Hal Finkel	ac26448a5c	[PPC] Use SeparateConstOffsetFromGEP This mirrors r222331, which enabled SeparateConstOffsetFromGEP on AArch64, in the PowerPC backend. Yields, on a POWER7 machine, a 30% speedup on SingleSource/Benchmarks/Shootout/nestedloop (this might just be from LICM, there is a store moved out of the inner loop) and a potential speedup on MultiSource/Benchmarks/mediabench/mpeg2/mpeg2dec/mpeg2decode. Regardless, it makes some code look cleaner, and synchronizing the backends in this regard seems like a generally good thing. llvm-svn: 222504	2014-11-21 04:35:51 +00:00
Quentin Colombet	8fea50c066	[X86] Do not custom lower UINT_TO_FP when the target type does not match the custom lowering. <rdar://problem/19026326> llvm-svn: 222489	2014-11-21 00:47:19 +00:00
Saleem Abdulrasool	0de13e90eb	X86: use the correct alloca symbol for Windows Itanium Windows itanium targets the MSVCRT, and the stack probe symbol is provided by MSVCRT. This corrects the emission of stack probes on i686-windows-itanium. llvm-svn: 222439	2014-11-20 18:01:26 +00:00
Andrea Di Biagio	b770dd344e	[X86] Improved lowering of v4x32 build_vector dag nodes. This patch improves the lowering of v4f32 and v4i32 build_vector dag nodes that are known to have at least two non-zero elements. With this patch, a build_vector that performs a blend with zero is converted into a shuffle. This is done to let the shuffle legalizer expand the dag node in a optimal way. For example, if we know that a build_vector performs a blend with zero, we can try to lower it as a movq/blend instead of always selecting an insertps. This patch also improves the logic that lowers a build_vector into a insertps with zero masking. See for example the extra test cases added to test sse41.ll. Differential Revision: http://reviews.llvm.org/D6311 llvm-svn: 222375	2014-11-19 19:34:29 +00:00
Tom Stellard	271c0a936e	R600/SI: Make SIInstrInfo::isOperandLegal() more strict A register operand that has a common sub-class with its instruction's defined register class is not always legal. For example, SReg_32 and M0Reg both have a common sub-class, but we can't use an SReg_32 in instructions that expect a M0Reg. This prevents the llvm.SI.sendmsg.ll test from failing when the fold operand pass is added. llvm-svn: 222368	2014-11-19 16:58:49 +00:00
Jozef Kolek	d19675f448	[mips][microMIPS] Implement CodeGen support for 16-bit instruction ADDIUR2. Differential Revision: http://reviews.llvm.org/D5800 llvm-svn: 222352	2014-11-19 13:23:58 +00:00
Jozef Kolek	9fbf00198c	[mips][microMIPS] Implement CodeGen support for ADDIUS5 instruction. Differential Revision: http://reviews.llvm.org/D5799 llvm-svn: 222351	2014-11-19 13:11:09 +00:00
Jozef Kolek	e466cd5b54	[mips][microMIPS] Implement SDBBP and RDHWR instructions. Differential Revision: http://reviews.llvm.org/D5240 llvm-svn: 222347	2014-11-19 11:25:50 +00:00
Simon Pilgrim	e5f972f1c1	[X86][SSE] pslldq/psrldq byte shifts/rotation for SSE2 This patch builds on http://reviews.llvm.org/D5598 to perform byte rotation shuffles (lowerVectorShuffleAsByteRotate) on pre-SSSE3 (palignr) targets - pre-SSSE3 is only enabled on i8 and i16 vector targets where it is a more definite performance gain. I've also added a separate byte shift shuffle (lowerVectorShuffleAsByteShift) that makes use of the ability of the SLLDQ/SRLDQ instructions to implicitly shift in zero bytes to avoid the need to create a zero register if we had used palignr. Differential Revision: http://reviews.llvm.org/D5699 llvm-svn: 222340	2014-11-19 10:06:49 +00:00
Hao Liu	00d285aca3	[AArch64] Enable SeparateConstOffsetFromGEP, EarlyCSE and LICM passes on AArch64 backend. SeparateConstOffsetFromGEP can gives more optimizaiton opportunities related to GEPs, which benefits EarlyCSE and LICM. By enabling these passes we can have better address calculations and generate a better addressing mode. Some SPEC 2006 benchmarks (astar, gobmk, namd) have obvious improvements on Cortex-A57. Reviewed in http://reviews.llvm.org/D5864. llvm-svn: 222331	2014-11-19 06:39:53 +00:00
Weiming Zhao	c7ce2ee93f	[Aarch64] Customer lowering of CTPOP to SIMD should check for NEON availability llvm-svn: 222292	2014-11-19 00:29:14 +00:00
Matt Arsenault	73f4bd8758	R600/SI: Implement areMemAccessesTriviallyDisjoint This partially makes up for not having address spaces used for alias analysis in some simple cases. This is not yet enabled by default so shouldn't change anything yet. llvm-svn: 222286	2014-11-19 00:01:31 +00:00
Simon Pilgrim	daabed160f	[X86][AVX] 256-bit vector stack unaligned load/stores identification Under many circumstances the stack is not 32-byte aligned, resulting in the use of the vmovups/vmovupd/vmovdqu instructions when inserting ymm reloads/spills. This minor patch adds these instructions to the isFrameLoadOpcode/isFrameStoreOpcode helpers so that they can be correctly identified and not be treated as folded reloads/spills. This has also been noticed by http://llvm.org/bugs/show_bug.cgi?id=18846 where it was causing redundant spills - I've added a reduced test case at test/CodeGen/X86/pr18846.ll Differential Revision: http://reviews.llvm.org/D6252 llvm-svn: 222281	2014-11-18 23:38:19 +00:00
Chad Rosier	ccf41a5c21	[FastISel][AArch64] Also allow folding of sign-/zero-extend and arithmetic shift-right for booleans (i1). Arithmetic shift-right immediate with sign-/zero-extensions also works for boolean values. Update the assert and the test cases to reflect that fact. llvm-svn: 222272	2014-11-18 22:41:49 +00:00
Chad Rosier	7153154a79	[FastISel][AArch64] Also allow folding of sign-/zero-extend and logical shift-right for booleans (i1). Logical shift-right immediate with sign-/zero-extensions also works for boolean values. Update the assert and the test cases to reflect that fact. llvm-svn: 222270	2014-11-18 22:38:42 +00:00
Juergen Ributzka	b3791ee3a7	[FastISel][AArch64] Follow-up fix for "Fix shift-immediate emission for "zero" shifts." Shifts also perform sign-/zero-extends to larger types, which requires us to emit an integer extend instead of a simple COPY. Related to PR21594. llvm-svn: 222257	2014-11-18 21:20:17 +00:00
Matt Arsenault	84d2214a94	R600/SI: Move SIFixSGPRCopies to inst selector passes This should expose more of the actually used VALU instructions to the machine optimization passes. This also should help getting i1 handling into a better state. For not entirly understood reasons, this fixes the split-scalar-i64-add.ll test where a 64-bit add would only partially be moved to the VALU resulting in use of undefined VCC. llvm-svn: 222256	2014-11-18 21:06:58 +00:00
Tom Stellard	962ccd7f85	R600/SI: Make sure resource descriptors are always stored in SGPRs llvm-svn: 222253	2014-11-18 20:39:39 +00:00
Juergen Ributzka	a2005be2b4	[FastISel][AArch64] Fix shift-immediate emission for "zero" shifts. This change emits a COPY for a shift-immediate with a "zero" shift value. This fixes PR21594 where we emitted a shift instruction with an incorrect immediate operand. llvm-svn: 222247	2014-11-18 19:58:59 +00:00
Alexey Volkov	3cc8e8e28b	[X86] Use ADD/SUB instead of INC/DEC for Haswell and Broadwell CPUs Differential Revision: http://reviews.llvm.org/D5934 llvm-svn: 222141	2014-11-17 16:17:51 +00:00
Renato Golin	b92ad16856	Fix ARM triple parsing The triple parser should only accept existing architecture names when the triple starts with armv, armebv, thumbv or thumbebv. Patch by Gabor Ballabas. llvm-svn: 222129	2014-11-17 14:08:57 +00:00
Oliver Stannard	93a823bc7a	[Thumb1] Re-write emitThumbRegPlusImmediate This was motivated by a bug which caused code like this to be miscompiled: declare void @take_ptr(i8) define void @test() { %addr1.32 = alloca i8 %addr2.32 = alloca i32, i32 1028 call void @take_ptr(i8 %addr1) ret void } This was emitting the following assembly to get the value of %addr1: add r0, sp, #1020 add r0, r0, #8 However, "add r0, r0, #8" is not a valid Thumb1 instruction, and this could not be assembled. The generated object file contained this, resulting in r0 holding SP+8 rather tha SP+1028: add r0, sp, #1020 add r0, sp, #8 This function looked like it could have caused miscompilations for other combinations of registers and offsets (though I don't think it is currently called with these), and the heuristic it used did not match the emitted code in all cases. llvm-svn: 222125	2014-11-17 11:18:10 +00:00
Oliver Stannard	2efad103c3	Fix optimisations of SELECT_CC which assumed result is boolean Some optimisations in DAGCombiner cause miscompilations for targets that use TargetLowering::UndefinedBooleanContent, because they assume that the results of a SELECT_CC node are boolean values, and can be safely ANDed, ORed and XORed. These optimisations are only valid for targets that use ZeroOrOneBooleanContent or ZeroOrNegativeOneBooleanContent. This is a follow-up to D6210/r221693. llvm-svn: 222123	2014-11-17 10:49:31 +00:00
Bob Wilson	0774bc6c56	Fix CR/LF line endings in test case. llvm-svn: 222120	2014-11-17 08:00:45 +00:00
Andrea Di Biagio	5475bc1d1b	[DAG] Improved target independent vector shuffle folding logic. This patch teaches the DAGCombiner how to combine shuffles according to rules: shuffle(shuffle(A, Undef, M0), B, M1) -> shuffle(B, A, M2) shuffle(shuffle(A, B, M0), B, M1) -> shuffle(B, A, M2) shuffle(shuffle(A, B, M0), A, M1) -> shuffle(B, A, M2) llvm-svn: 222090	2014-11-15 22:56:25 +00:00
Simon Pilgrim	5e170d7652	[X86][SSE] Improve legal SHUFP and PSHUFD shuffle matching Updated X86TargetLowering::isShuffleMaskLegal to match SHUFP masks with commuted inputs and PSHUFD masks that reference the second input. As part of this I've refactored isPSHUFDMask to work in a more general manner and allow it to match against either the first or second input vector. Differential Revision: http://reviews.llvm.org/D6287 llvm-svn: 222087	2014-11-15 21:13:05 +00:00
Matt Arsenault	1298bf6e9c	R600: Permute operands when selecting legacy min/max This gets the correct NaN behavior based on the compare type the hardware uses. This now passes the new piglit test I have for this on SI. Add stricter tests for the operand order. llvm-svn: 222079	2014-11-15 05:02:57 +00:00
Tim Northover	f511e3a633	ARM: refactor .cfi_def_cfa_offset emission. We use to track quite a few "adjusted" offsets through the FrameLowering code to account for changes in the prologue instructions as we went and allow the emission of correct CFA annotations. However, we were missing a couple of cases and the code was almost impenetrable. It's easier to just add any stack-adjusting instruction to a list and emit them together. llvm-svn: 222057	2014-11-14 22:45:33 +00:00
Tim Northover	5eff3fb39c	ARM: correctly calculate the offset of FP in its push. When we folded the DPR alignment gap into a push, we weren't noting the extra distance from the beginning of the push to the FP, and so FP ended up pointing at an incorrect offset. The .cfi_def_cfa_offset directives are still wrong in this case, but I think that can be improved by refactoring. llvm-svn: 222056	2014-11-14 22:45:31 +00:00
Tim Northover	3327a92a74	ARM: simplify test. The test's DWARF stubs were there just to trigger the emission of .cfi directives. Fortunately, the NetBSD ABI already demands proper DWARF unwind info, so it's easier to just use that triple. llvm-svn: 222055	2014-11-14 22:45:23 +00:00
Tom Stellard	c18e457d39	R600/SI: Fix spilling of m0 register If we have spilled the value of the m0 register, then we need to restore it with v_readlane_b32 to a regular sgpr, because v_readlane_b32 can't write to m0. v_readlane_b32 can't write to m0, so llvm-svn: 222036	2014-11-14 20:43:26 +00:00
Matt Arsenault	633ce0ecd7	R600/SI: Combine min3/max3 instructions llvm-svn: 222032	2014-11-14 20:08:52 +00:00
Matt Arsenault	d4196f855a	R600/SI: Fix verifier error from a branch on IMPLICIT_DEF SIILowerI1Copies wasn't correctly handling this case. llvm-svn: 222020	2014-11-14 18:43:41 +00:00
Matt Arsenault	346bdd92b1	R600/SI: Match integer min / max instructions llvm-svn: 222015	2014-11-14 18:30:06 +00:00
Matt Arsenault	8ddb68217c	R600/SI: Use S_BFE_I64 for 64-bit sext_inreg llvm-svn: 222012	2014-11-14 18:18:16 +00:00

1 2 3 4 5 ...

11284 Commits