llvm-mirror

mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-10-29 23:12:55 +01:00

Author	SHA1	Message	Date
Benjamin Kramer	302eac4a38	Fix tests not to depend on specific regalloc or instruction order. They were failing with -mcpu=atom. llvm-svn: 192890	2013-10-17 12:41:05 +00:00
Andrea Di Biagio	f45ea088f7	Fix edge condition in DAGCombiner to improve codegen of shift sequences. When canonicalizing dags according to the rule (shl (zext (shr X, c1) ), c1) ==> (zext (shl (shr X, c1), c1)) remember to add the new shl dag to the DAGCombiner worklist of nodes. If we don't explicitly add it to the worklist of nodes to visit, we may not trigger later on the rule that folds the shift left + logical shift right into a AND instruction with bitmask. llvm-svn: 192883	2013-10-17 11:02:58 +00:00
Jim Grosbach	504d93cae7	x86: Move bitcasts outside concat_vector. Consider the following: typedef unsigned short ushort4U __attribute__((ext_vector_type(4), aligned(2))); typedef unsigned short ushort4 __attribute__((ext_vector_type(4))); typedef unsigned short ushort8 __attribute__((ext_vector_type(8))); typedef int int4 __attribute__((ext_vector_type(4))); int4 __bbase_cvt_int(ushort4 v) { ushort8 a; a.lo = v; return _mm_cvtepu16_epi32(a); } This generates the, not unreasonable, IR: define <4 x i32> @foo0(double %v.coerce) nounwind ssp { %tmp = bitcast double %v.coerce to <4 x i16> %tmp1 = shufflevector <4 x i16> %tmp, <4 x i16> undef, <8 x i32> <i32 %0, i32 1, i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef> %tmp2 = tail call <4 x i32> @llvm.x86.sse41.pmovzxwd(<8 x i16> %tmp1) ret <4 x i32> %tmp2 } The problem is when type legalization gets hold of the v4i16. It legalizes that by spilling to the stack, then doing a zero-extending load. Things go even more silly from there, ending up with something like: _foo0: movsd %xmm0, -8(%rsp) <== Spill to the stack. movq -8(%rsp), %xmm0 <== Reload it right back out. pmovzxwd %xmm0, %xmm1 <== Here's what we actually asked for. pblendw $1, %xmm1, %xmm0 <== We don't need this at all pmovzxwd %xmm0, %xmm0 <== We already did this ret The v8i8 to v8i16 zext intrinsic gives even worse results, with two table lookups via pshufb instructions(!!). To avoid all that, we can move the bitcasting until after we've formed the wider (legal) vector type. Then our normal codegen flows along nicely and we get the expected: _foo0: pmovzxwd %xmm0, %xmm0 ret rdar://15245794 llvm-svn: 192866	2013-10-17 02:58:06 +00:00
Hans Wennborg	822f0080df	Re-commit r192758 - MC: quote tricky symbol names in asm output The reason this got reverted was that the @feat.00 symbol which was emitted for every TU became quoted, and on cygwin/mingw we use the gas assembler which couldn't handle the quotes. This commit fixes the problem by only emitting @feat.00 for win32, where we use clang -cc1as to assemble. gas would just drop this symbol anyway, so there is no loss there. With @feat.00 gone, there shouldn't be quoted symbols showing up on cygwin since it uses the Itanium ABI, which doesn't put these funny characters in symbols. > Because of win32 mangling, we produce symbol and section names with > funny characters in them, most notably @ characters. > > MC would choke on trying to parse its own assembly output. This patch addresses > that by: > > - Making @ trigger quoting of symbol names > - Also quote section names in the same way > - Just parse section names like other identifiers (to allow for quotes) > - Don't assume @ signifies a symbol variant if it is in a string. llvm-svn: 192859	2013-10-17 01:13:02 +00:00
Yunzhong Gao	23e948dd2f	Enabling 3DNow! prefetch instruction for a few AMD processors: bobcat, jaguar, bulldozer and piledriver. Support for the instruction itself seems to have already been added in r178040. Differential Revision: http://llvm-reviews.chandlerc.com/D1933 llvm-svn: 192828	2013-10-16 19:04:11 +00:00
Benjamin Kramer	0d16abf302	DAGCombiner: Don't fold xor into not if getNOT would introduce an illegal constant. This happens e.g. with <2 x i64> -1 on x86_32. It cannot be generated directly because i64 is illegal. It would be nice if getNOT would handle this transparently, but I don't see a way to generate a legal constant there right now. Fixes PR17487. llvm-svn: 192795	2013-10-16 14:16:19 +00:00
NAKAMURA Takumi	ab81f8a305	Revert r192758 (and r192759), "MC: Better handling of tricky symbol and section names" GNU AS didn't like quotes in symbol names. Error: junk at end of line, first unrecognized character is `"' .def "@feat.00"; "@feat.00" = 1 Reproduced on Cygwin's 2.23.52.20130309 and mingw32's 2.20.1.20100303. llvm-svn: 192775	2013-10-16 08:22:49 +00:00
Rafael Espindola	3779f822e1	Add a triple to this test. llvm-svn: 192767	2013-10-16 02:27:33 +00:00
Rafael Espindola	c17b7cf2ed	Add support for metadata representing .ident directives. llvm-svn: 192764	2013-10-16 01:49:05 +00:00
Hans Wennborg	3b3efddc64	MC: Better handling of tricky symbol and section names Because of win32 mangling, we produce symbol and section names with funny characters in them, most notably @ characters. MC would choke on trying to parse its own assembly output. This patch addresses that by: - Making @ trigger quoting of symbol names - Also quote section names in the same way - Just parse section names like other identifiers (to allow for quotes) - Don't assume @ signifies a symbol variant if it is in a string. Differential Revision: http://llvm-reviews.chandlerc.com/D1945 llvm-svn: 192758	2013-10-16 01:20:40 +00:00
Andrew Trick	e3e67d4a0a	Enable MI Sched for x86. This changes the SelectionDAG scheduling preference to source order. Soon, the SelectionDAG scheduler can be bypassed saving a nice chunk of compile time. Performance differences that result from this change are often a consequence of register coalescing. The register coalescer is far from perfect. Bugs can be filed for deficiencies. On x86 SandyBridge/Haswell, the source order schedule is often preserved, particularly for small blocks. Register pressure is generally improved over the SD scheduler's ILP mode. However, we are still able to handle large blocks that require latency hiding, unlike the SD scheduler's BURR mode. MI scheduler also attempts to discover the critical path in single-block loops and adjust heuristics accordingly. The MI scheduler relies on the new machine model. This is currently unimplemented for AVX, so we may not be generating the best code yet. Unit tests are updated so they don't depend on SD scheduling heuristics. llvm-svn: 192750	2013-10-15 23:33:07 +00:00
Michael Liao	1081bbac6c	Fix PR17546 - Type of index used in extract_vector_elt or insert_vector_elt supposes to be TLI.getVectorIdxTy() which is pointer type on most targets. It'd better to truncate (or zero-extend in case it's changed later) it to mask element type to guarantee they are matching instead of asserting that. llvm-svn: 192722	2013-10-15 17:51:58 +00:00
Michael Liao	a94d0a900a	Fix PR16807 - Lower signed division by constant powers-of-2 to target-independent DAG operators instead of target-dependent ones to support them better on targets where vector types are legal but shift operators on that types are illegal. E.g., on AVX, PSRAW is only available on <8 x i16> though <16 x i16> is a legal type. llvm-svn: 192721	2013-10-15 17:51:02 +00:00
NAKAMURA Takumi	8c0f09fed1	llvm/test/CodeGen/X86/break-avx-dep.ll: Relax an expression to be matched to also r[89], not only rXX. llvm-svn: 192675	2013-10-15 06:36:36 +00:00
Andrew Trick	e196a05dc8	Improve on r192635, ExeDepsFix for avx, and add a test case. rdar:15221834 False AVX register dependencies cause 5x slowdown on flops-5/6 and significant slowdown on several others. This was blocking the switch to MI-Sched. llvm-svn: 192669	2013-10-15 03:39:43 +00:00
Quentin Colombet	cb4b84532c	[X86][FastISel] During X86 fastisel, the address of indirect call was resolved through bitcast, ptrtoint, and inttoptr instructions. This is valid only if the related instructions are in that same basic block, otherwise we may reference variables that were not live accross basic blocks resulting in undefined virtual registers. The bug was exposed when both SDISel and FastISel were used within the same function, i.e., one basic block is issued with FastISel and another with SDISel, as demonstrated with the testcase. <rdar://problem/15192473> llvm-svn: 192636	2013-10-14 22:32:09 +00:00
Nick Lewycky	0da8d88a82	Fix a typo, in a comment, in a test. llvm-svn: 192632	2013-10-14 22:02:53 +00:00
Eric Christopher	1a04817b81	Revert part of a fix from 2010, changes since then: a) x86-64 TLS has been documented b) the code path should use movq for the correct relocation to be generated. I've also added a fixme for the test case that we should improve the code generated, it should look something like is documented in the tls abi document. llvm-svn: 192631	2013-10-14 21:52:26 +00:00
Will Dietz	ad27c13a64	MachineSink: Fix and tweak critical-edge breaking heuristic. Per original comment, the intention of this loop is to go ahead and break the critical edge (in order to sink this instruction) if there's reason to believe doing so might "unblock" the sinking of additional instructions that define registers used by this one. The idea is that if we have a few instructions to sink "together" breaking the edge might be worthwhile. This commit makes a few small changes to help better realize this goal: First, modify the loop to ignore registers defined by this instruction. We don't sink definitions of physical registers, and sinking an SSA definition isn't going to unblock an upstream instruction. Second, ignore uses of physical registers. Instructions that define physical registers are rejected for sinking, and so moving this one won't enable moving any defining instructions. As an added bonus, while virtual register use-def chains are generally small due to SSA goodness, iteration over the uses and definitions (used by hasOneNonDBGUse) for physical registers like EFLAGS can be rather expensive in practice. (This is the original reason for looking at this) Finally, to keep things simple continue to only consider this trick for registers that have a single use (via hasOneNonDBGUse), but to avoid spuriously breaking critical edges only do so if the definition resides in the same MBB and therefore this one directly blocks it from being sunk as well. If sinking them together is meant to be, let the iterative nature of this pass sink the definition into this block first. Update tests to accomodate this change, add new testcase where sinking avoids pipeline stalls. llvm-svn: 192608	2013-10-14 16:57:17 +00:00
Elena Demikhovsky	c460e7e50a	Fixed a bug in dynamic allocation memory on stack. The alignment of allocated space was wrong, see Bugzila 17345. Done by Zvi Rackover <zvi.rackover@intel.com>. llvm-svn: 192573	2013-10-14 07:26:51 +00:00
Benjamin Kramer	3000b81f9a	Force a CPU on test so it doesn't depend on microarchitectural scheduling decisions. llvm-svn: 192532	2013-10-12 11:17:12 +00:00
Quentin Colombet	7ba3455dfc	[DAGCombiner] Load slicing test case: attempt to really fix the buildbots (used sse4.2 instead of avx!). <rdar://problem/14477220> llvm-svn: 192480	2013-10-11 18:54:49 +00:00
Quentin Colombet	c02e5604f4	[DAGCombiner] Reapply load slicing (192471) with a test that explicitly set sse4.2 support. This should fix the buildbots. Original commit message: [DAGCombiner] Slice a big load in two loads when the element are next to each other in memory and the target has paired load and performs post-isel loads combining. E.g., this optimization will transform something like this: a = load i64* addr b = trunc i64 a to i32 c = lshr i64 a, 32 d = trunc i64 c to i32 into: b = load i32* addr1 d = load i32* addr2 Where addr1 = addr2 +/- sizeof(i32), if the target supports paired load and performs post-isel loads combining. One should overload TargetLowering::hasPairedLoad to provide this information. The default is false. <rdar://problem/14477220> llvm-svn: 192476	2013-10-11 18:29:42 +00:00
Quentin Colombet	fd0097531f	[DAGCombiner] Revert load slicing (r192471), until I figure out why it fails on ubuntu. llvm-svn: 192474	2013-10-11 18:17:17 +00:00
Quentin Colombet	b60dc81c8b	[DAGCombiner] Slice a big load in two loads when the element are next to each other in memory and the target has paired load and performs post-isel loads combining. E.g., this optimization will transform something like this: a = load i64* addr b = trunc i64 a to i32 c = lshr i64 a, 32 d = trunc i64 c to i32 into: b = load i32* addr1 d = load i32* addr2 Where addr1 = addr2 +/- sizeof(i32), if the target supports paired load and performs post-isel loads combining. One should overload TargetLowering::hasPairedLoad to provide this information. The default is false. <rdar://problem/14477220> llvm-svn: 192471	2013-10-11 18:01:14 +00:00
Matthias Braun	9e9e0f5d4c	Tests: Do not unnecessarily depend on kill comments llvm-svn: 192404	2013-10-10 22:37:49 +00:00
Benjamin Kramer	11421cc493	Disable function padding to get this test to pass on atom. llvm-svn: 192348	2013-10-10 12:46:23 +00:00
Elena Demikhovsky	f24ecf7862	AVX-512: Added VRCP28 and VRSQRT28 instructions and intrinsics. llvm-svn: 192283	2013-10-09 08:16:14 +00:00
Craig Topper	d5082631e1	Add in64BitMode/in32BitMode to the MMX/SSE2/AVX maskmovq/dq instructions. This way the asm parser will pick the right one based on the mode. Instruction selection already did the right thing based on the pointer size. llvm-svn: 192266	2013-10-09 02:18:34 +00:00
Craig Topper	2d70555027	Fix a typo in the mattr part of the run line. llvm-svn: 192174	2013-10-08 06:12:26 +00:00
Craig Topper	3d7c6afb79	Explicitly disable AVX on a bunch of tests so they won't fail on AVX machines post r192171. llvm-svn: 192173	2013-10-08 06:06:57 +00:00
Craig Topper	aa1a4d51f0	Remove some instructions that existed to provide aliases to the assembler. Can be done with InstAlias instead. Unfortunately, this was causing printer to use 'vmovq' or 'vmovd' based on what was parsed. To cleanup the inconsistencies convert all 'vmovd' with 64-bit registers to 'vmovq', but provide an alias so that 'vmovd' will still parse. llvm-svn: 192171	2013-10-08 05:53:50 +00:00
Benjamin Kramer	feace9b737	X86: Fix type check. Just because an integer type is illegal doesn't mean it's i64. Fixes PR17495, where an i24 triggered this code. It's intended to optimize i64 loads on 32 bit x86. llvm-svn: 192123	2013-10-07 19:11:35 +00:00
Matt Arsenault	9c8541d286	Change objectsize intrinsic to accept different address spaces. Bitcasting everything to i8* won't work. Autoupgrade the old intrinsic declarations to use the new mangling. llvm-svn: 192117	2013-10-07 18:06:48 +00:00
Rafael Espindola	499aaf305a	Add support for aliases with linkonce_odr. This will be used to extend constructor aliases in clang. llvm-svn: 192066	2013-10-06 15:10:43 +00:00
Benjamin Kramer	44710574cb	Force a CPU that doesn't have AVX, otherwise this test fails. llvm-svn: 192065	2013-10-06 13:52:41 +00:00
Benjamin Kramer	a7e734d765	X86: Don't fold spills into SSE operations if the stack is unaligned. Regalloc can emit unaligned spills nowadays, but we can't fold the spills into SSE ops if we can't guarantee alignment. PR12250. llvm-svn: 192064	2013-10-06 13:48:22 +00:00
Elena Demikhovsky	cb8eaca2e4	AVX-512: added scalar convert instructions and intrinsics. Fixed load folding in VPERM2I instruction. llvm-svn: 192063	2013-10-06 13:11:09 +00:00
Elena Demikhovsky	0ff833ab99	AVX-512: fixed shuffle lowering in case of BLEND and added VSHUFPS patterns. llvm-svn: 192055	2013-10-06 06:11:18 +00:00
Benjamin Kramer	3a6afef4e7	Emit a better error when running out of registers on inline asm. The most likely case where this error happens is when the user specifies too many register operands. Don't make it look like an internal LLVM bug when we can see that the error is coming from an inline asm instruction. For other instructions we keep the "ran out of registers" error. llvm-svn: 192041	2013-10-05 19:33:37 +00:00
Craig Topper	0a8f3fc996	Remove unneeded TBM intrinsics. The arithmetic/logical operation patterns are sufficient. llvm-svn: 192039	2013-10-05 19:22:59 +00:00
Craig Topper	d0a63f6722	Add an additional pattern for BLCI since opt can turn (not (add x, 1)) into (sub -2, x). llvm-svn: 192037	2013-10-05 17:17:53 +00:00
Rafael Espindola	86d473eceb	Convert test to FileCheck. llvm-svn: 192025	2013-10-05 02:58:36 +00:00
Craig Topper	5b62ea95ec	Remove duplicated test cases that occurred when I applied the same patch file to my model twice. llvm-svn: 191873	2013-10-03 04:27:14 +00:00
Craig Topper	5ac188d0f2	Add patterns for selecting TBM instructions from logical operations. Patch from Yunzhong Gao. llvm-svn: 191871	2013-10-03 04:16:45 +00:00
Elena Demikhovsky	ee11e148e9	AVX-512: fixed a bug in getLoadStoreRegOpcode() for AVX-512 target llvm-svn: 191818	2013-10-02 12:20:42 +00:00
Preston Gurd	c52dfda610	Add test case for PR16785. Thanks for Dimitry Andric, Rafael Espindola, and Benjamin Kramer for providing and progressively reducing the test case! llvm-svn: 191782	2013-10-01 17:02:48 +00:00
Elena Demikhovsky	84c6cd222d	AVX-512: Added X86vzmovl patterns llvm-svn: 191733	2013-10-01 08:38:02 +00:00
Manman Ren	799fd39420	TBAA: remove !tbaa from testing cases when they are not needed. llvm-svn: 191689	2013-09-30 18:17:35 +00:00
Yunzhong Gao	e51da27a74	Adding intrinsics to the llvm backend for TBM instruction set. Phabricator code review is located here: http://llvm-reviews.chandlerc.com/D1750 llvm-svn: 191539	2013-09-27 18:38:42 +00:00

1 2 3 4 5 ...

4259 Commits