llvm-mirror

mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-11-01 00:12:50 +01:00

Author	SHA1	Message	Date
Craig Topper	448790d566	Fix 128-bit ptest intrinsics to take v2i64 instead of v4f32 since these are integer instructions. llvm-svn: 154580	2012-04-12 07:23:00 +00:00
Nadav Rotem	c922b4f2a3	Reapply 154396 after fixing a test. Original message: Modify the code that lowers shuffles to blends from using blendvXX to vblendXX. blendV uses a register for the selection while Vblend uses an immediate. On sandybridge they still have the same latency and execute on the same execution ports. llvm-svn: 154483	2012-04-11 06:40:27 +00:00
Jakob Stoklund Olesen	4f44c26f15	Fix test to be register assignment invariant. llvm-svn: 154453	2012-04-11 00:00:24 +00:00
Duncan Sands	6d360055c5	Add a comment noting that the fdiv -> fmul conversion won't generate multiplication by a denormal, and some tests checking that. llvm-svn: 154431	2012-04-10 20:35:27 +00:00
Eric Christopher	f8886e8f48	Temporarily revert this patch to see if it brings the buildbots back. llvm-svn: 154425	2012-04-10 19:33:16 +00:00
Nadav Rotem	74f87a6bd8	Modify the code that lowers shuffles to blends from using blendvXX to vblendXX. blendv uses a register for the selection while vblend uses an immediate. On sandybridge they still have the same latency and execute on the same execution ports. llvm-svn: 154396	2012-04-10 14:33:13 +00:00
Lang Hames	800642b224	Test case for PR12495. llvm-svn: 154359	2012-04-09 23:58:59 +00:00
Rafael Espindola	6b7bf4d0aa	Pattern match a setcc of boolean value with 0 as a truncate. llvm-svn: 154322	2012-04-09 16:06:03 +00:00
Nadav Rotem	9f7f17826e	Lower some x86 shuffle sequences to the vblend family of instructions. llvm-svn: 154313	2012-04-09 08:33:21 +00:00
Nadav Rotem	4499fb1d50	Fix a bug in the lowering of broadcasts: ConstantPools need to use the target pointer type. Move NormalizeVectorShuffle and LowerVectorBroadcast into X86TargetLowering. llvm-svn: 154310	2012-04-09 07:45:58 +00:00
Chandler Carruth	bb1db0e66a	Cleanup and relax a restriction on the matching of global offsets into x86 addressing modes. This allows PIE-based TLS offsets to fit directly into an addressing mode immediate offset, which is the last remaining code quality issue from PR12380. With this patch, that PR is completely fixed. To understand why this patch is correct to match these offsets into addressing mode immediates, break it down by cases: 1) 32-bit is trivially correct, and unmodified here. 2) 64-bit non-small mode is unchanged and never matches. 3) 64-bit small PIC code which is RIP-relative is handled specially in the match to try to fit RIP into the base register. If it fails, it now early exits. This behavior is unchanged by the patch. 4) 64-bit small non-PIC code which is not RIP-relative continues to work as it did before. The reason these immediates are safe is because the ABI ensures they fit in small mode. This behavior is unchanged. 5) 64-bit small PIC code which is not using RIP-relative addressing. This is the only case changed by the patch, and the primary place you see it is in TLS, either the win64 section offset TLS or Linux local-exec TLS model in a PIC compilation. Here the ABI again ensures that the immediates fit because we are in small mode, and any other operations required due to the PIC relocation model have been handled externally to the Wrapper node (extra loads etc are made around the wrapper node in ISelLowering). I've tested this as much as I can comparing it with GCC's output, and everything appears safe. I discussed this with Anton and it made sense to him at least at face value. That said, if there are issues with PIC code after this patch, yell and we can revert it. llvm-svn: 154304	2012-04-09 02:13:06 +00:00
Chandler Carruth	c29528d66b	Fold 15 tiny test cases into a single file that implements the comprehensive testing of TLS codegen for x86. Convert all of the ones that were still using grep to use FileCheck. Remove some redundancies between them. Perhaps most interestingly expand the test cases so that they actually fully list the instruction snippet being tested. TLS operations are very narrowly defined, and so these seem reasonably stable. More importantly, the existing test cases already were crazy fine grained, expecting specific registers to be allocated. This just clarifies that no other instructions are expected, and fills in some crucial gaps that weren't being tested at all. This will make any subsequent changes to TLS much more clear during review. llvm-svn: 154303	2012-04-09 01:43:17 +00:00
Duncan Sands	28b9aa998e	Only have codegen turn fdiv by a constant into fmul by the reciprocal when -ffast-math, i.e. don't just always do it if the reciprocal can be formed exactly. There is already an IR level transform that does that, and it does it more carefully. llvm-svn: 154296	2012-04-08 18:08:12 +00:00
Chandler Carruth	11c412fd2c	Teach LLVM about a PIE option which, when enabled on top of PIC, makes optimizations which are valid for position independent code being linked into a single executable, but not for such code being linked into a shared library. I discussed the design of this with Eric Christopher, and the decision was to support an optional bit rather than a completely separate relocation model. Fundamentally, this is still PIC relocation, its just that certain optimizations are only valid under a PIC relocation model when the resulting code won't be in a shared library. The simplest path to here is to expose a single bit option in the TargetOptions. If folks have different/better designs, I'm all ears. =] I've included the first optimization based upon this: changing TLS models to the *Exec models when PIE is enabled. This is the LLVM component of PR12380 and is all of the hard work. llvm-svn: 154294	2012-04-08 17:51:45 +00:00
Nadav Rotem	8957364ae5	AVX2: Build splat vectors by broadcasting a scalar from the constant pool. Previously we used three instructions to broadcast an immediate value into a vector register. On Sandybridge we continue to load the broadcasted value from the constant pool. llvm-svn: 154284	2012-04-08 12:54:54 +00:00
Nadav Rotem	37734277f0	1. Remove the part of r153848 which optimizes shuffle-of-shuffle into a new shuffle node because it could introduce new shuffle nodes that were not supported efficiently by the target. 2. Add a more restrictive shuffle-of-shuffle optimization for cases where the second shuffle reverses the transformation of the first shuffle. llvm-svn: 154266	2012-04-07 21:19:08 +00:00
Duncan Sands	cd52f3d447	Convert floating point division by a constant into multiplication by the reciprocal if converting to the reciprocal is exact. Do it even if inexact if -ffast-math. This substantially speeds up ac.f90 from the polyhedron benchmarks. llvm-svn: 154265	2012-04-07 20:04:00 +00:00
Alexis Hunt	03ad83efbd	Make the test for r154235 more platform-independent with a shorter string. llvm-svn: 154243	2012-04-07 01:33:14 +00:00
Alexis Hunt	5c14769849	Output UTF-8-encoded characters as identifier characters into assembly by default. This is a behaviour configurable in the MCAsmInfo. I've decided to turn it on by default in (possibly optimistic) hopes that most assemblers are reasonably sane. If this proves a problem, switching to default seems reasonable. I'm not sure if this is the opportune place to test, but it seemed good to make sure it was tested somewhere. llvm-svn: 154235	2012-04-07 00:37:53 +00:00
Craig Topper	40ac46c3d7	Test case for PR12413 llvm-svn: 154172	2012-04-06 14:38:25 +00:00
Craig Topper	ffae2f8986	Allow 256-bit shuffles to be split if a 128-bit lane contains elements from a single source. This is a rewrite of the 256-bit shuffle splitting code based on similar code from legalize types. Fixes PR12413. llvm-svn: 154166	2012-04-06 07:45:23 +00:00
Jakob Stoklund Olesen	28edb011c4	Don't break the IV update in TLI::SimplifySetCC(). LSR always tries to make the ICmp in the loop latch use the incremented induction variable. This allows the induction variable to be kept in a single register. When the induction variable limit is equal to the stride, SimplifySetCC() would break LSR's hard work by transforming: (icmp (add iv, stride), stride) --> (cmp iv, 0) This forced us to use lea for the IC update, preventing the simpler incl+cmp. <rdar://problem/7643606> <rdar://problem/11184260> llvm-svn: 154119	2012-04-05 20:30:20 +00:00
Nadav Rotem	d72bf636aa	Add an additional testcase which checks ops with multiple users. llvm-svn: 153939	2012-04-03 07:39:36 +00:00
Jakob Stoklund Olesen	97f47c37b6	Allocate virtual registers in ascending order. This is just the fallback tie-breaker ordering, the main allocation order is still descending size. Patch by Shamil Kurmangaleev! llvm-svn: 153904	2012-04-02 22:30:39 +00:00
Nadav Rotem	a9ec0e024f	Optimizing swizzles of complex shuffles may generate additional complex shuffles. Do not try to optimize swizzles of shuffles if the source shuffle has more than a single user, except when the source shuffle is also a swizzle. llvm-svn: 153864	2012-04-02 07:11:12 +00:00
Nadav Rotem	2729f54295	This commit contains a few changes that had to go in together. 1. Simplify xor/and/or (bitcast(A), bitcast(B)) -> bitcast(op (A,B)) (and also scalar_to_vector). 2. Xor/and/or are indifferent to the swizzle operation (shuffle of one src). Simplify xor/and/or (shuff(A), shuff(B)) -> shuff(op (A, B)) 3. Optimize swizzles of shuffles: shuff(shuff(x, y), undef) -> shuff(x, y). 4. Fix an X86ISelLowering optimization which was very bitcast-sensitive. Code which was previously compiled to this: movd (%rsi), %xmm0 movdqa .LCPI0_0(%rip), %xmm2 pshufb %xmm2, %xmm0 movd (%rdi), %xmm1 pshufb %xmm2, %xmm1 pxor %xmm0, %xmm1 pshufb .LCPI0_1(%rip), %xmm1 movd %xmm1, (%rdi) ret Now compiles to this: movl (%rsi), %eax xorl %eax, (%rdi) ret llvm-svn: 153848	2012-04-01 19:31:22 +00:00
Rafael Espindola	194491d9b0	Add a triple to the test. llvm-svn: 153818	2012-03-31 18:59:07 +00:00
Rafael Espindola	2da83dbcd4	Teach CodeGen's version of computeMaskedBits to understand the range metadata. This is the CodeGen equivalent of r153747. I tested that there is not noticeable performance difference with any combination of -O0/-O2 /-g when compiling gcc as a single compilation unit. llvm-svn: 153817	2012-03-31 18:14:00 +00:00
Bill Wendling	9746fa5b94	Testcase for r153710. llvm-svn: 153711	2012-03-30 00:26:54 +00:00
Lang Hames	20f2192d37	The shuffle scheduler is only available in asserts build - make misched-new.ll testcase require asserts. llvm-svn: 153687	2012-03-29 21:11:47 +00:00
Lang Hames	94d892c492	Make x86 REP_MOV* and REP_STO instructions use the correct operand sizes in 64-bit mode. llvm-svn: 153680	2012-03-29 19:54:28 +00:00
Joel Jones	486c38b0cf	For X86, change load/dec-or-inc/store into dec-or-inc, respectively. This is a code change to add support for changing instruction sequences of the form: load inc/dec of 8/16/32/64 bits store into the appropriate X86 inc/dec through memory instruction: inc[qlwb] / dec[qlwb] The checks that were in X86DAGToDAGISel::Select(SDNode *Node)>>ISD::STORE have been extracted to isLoadIncOrDecStore and reworked to use the better named wrappers for getOperand(unsigned) (e.g. getOffset()) and replaced Chain.getNode() with LoadNode. The comments have also been expanded. llvm-svn: 153635	2012-03-29 05:45:48 +00:00
Joel Jones	32f97db4b2	Reverted to revision 153616 to unblock build llvm-svn: 153623	2012-03-29 01:20:56 +00:00
Joel Jones	b4477ee31f	For X86, change load/dec-or-inc/store into dec-or-inc, respectively. This is a code change to add support for changing instruction sequences of the form: load inc/dec of 8/16/32/64 bits store into the appropriate X86 inc/dec through memory instruction: inc[qlwb] / dec[qlwb] The checks that were in X86DAGToDAGISel::Select(SDNode *Node)>>ISD::STORE have been extracted to isLoadIncOrDecStore and reworked to use the better named wrappers for getOperand(unsigned) (e.g. getOffset()) and replaced Chain.getNode() with LoadNode. The comments have also been expanded. llvm-svn: 153617	2012-03-29 00:37:47 +00:00
Eric Christopher	324ff39943	Add a test for the previous commit. Also, remove two tests that were testing a) the wrong behavior or b) something that I'm already testing in the new test. llvm-svn: 153525	2012-03-27 18:35:57 +00:00
Evan Cheng	2ea4c182ae	Post-ra LICM should take care not to hoist an instruction that would clobber a register that's read by the preheader terminator. rdar://11095580 llvm-svn: 153492	2012-03-27 01:50:58 +00:00
Eli Bendersky	3ef88c1833	Continue cleanup of LIT, getting rid of the remaining artifacts from dejagnu * Removed test/lib/llvm.exp - it is no longer needed * Deleted the dg.exp reading code from test/lit.cfg. There are no dg.exp files left in the test suite so this code is no longer required. test/lit.cfg is now much shorter and clearer * Removed a lot of duplicate code in lit.local.cfg files that need access to the root configuration, by adding a "root" attribute to the TestingConfig object. This attribute is dynamically computed to provide the same information as was previously provided by the custom getRoot functions. * Documented the config.root attribute in docs/CommandGuide/lit.pod llvm-svn: 153408	2012-03-25 09:02:19 +00:00
Andrew Trick	121e3e153c	Remove -enable-lsr-nested in time for 3.1. Tests cases have been removed but attached to open PR12330. llvm-svn: 153286	2012-03-22 22:42:45 +00:00
Andrew Trick	f298457c8c	misched: tag a few XFAILs that I plan to fix llvm-svn: 153222	2012-03-21 22:31:31 +00:00
Chad Rosier	17f25ea47b	[avx] Add patterns for combining vextractf128 + vmovaps/vmovups/vmobdqu to vextractf128 with 128-bit mem dest. Combines vextractf128 $0, %ymm0, %xmm0 vmovaps %xmm0, (%rdi) to vextractf128 $0, %ymm0, (%rdi) rdar://11082570 llvm-svn: 153139	2012-03-20 21:43:40 +00:00
Chad Rosier	ffd2dbd676	[avx] Move the vextractf128 patterns closer to the vextractf128 def. Remove whitespace from test case. No functional change intended. llvm-svn: 153103	2012-03-20 18:24:55 +00:00
Chad Rosier	c667f10cbb	Fix test. llvm-svn: 153095	2012-03-20 17:20:46 +00:00
Chad Rosier	143f33dc92	[avx] Adjust the VINSERTF128rm pattern to allow for unaligned loads. This results in things such as vmovups 16(%rdi), %xmm0 vinsertf128 $1, %xmm0, %ymm0, %ymm0 to be combined to vinsertf128 $1, 16(%rdi), %ymm0, %ymm0 rdar://11076953 llvm-svn: 153092	2012-03-20 17:08:51 +00:00
Bill Wendling	338ddac8f7	It's possible to have a constant expression who's size is quite big (e.g., i128). In that case, we may not be able to print out the MCExpr as an expression. For instance, we could have an MCExpr like this: 0xBEEF0000BEEF0000 \| (0xBEEF0000BEEF0000 << 64) The MCExpr printer handles sizes up to 64-bits, but this expression would require 128-bits. In this situation, try to evaluate the constant expression and emit that as the value into 64-bit chunks. <rdar://problem/11070338> llvm-svn: 153081	2012-03-20 08:56:43 +00:00
Preston Gurd	d1ae391210	This patch adds X86 instruction itineraries for non-pseudo opcodes in X86InstrCompiler.td. It also adds –mcpu-generic to the legalize-shift-64.ll test so the test will pass if run on an Intel Atom CPU, which would otherwise produce an instruction schedule which differs from that which the test expects. llvm-svn: 153033	2012-03-19 14:10:12 +00:00
Nadav Rotem	8cf9105f96	When optimizing certain BUILD_VECTOR nodes into other BUILD_VECTOR nodes, add the new node into the work list because there is a potential for further optimizations. llvm-svn: 152784	2012-03-15 08:49:06 +00:00
Chad Rosier	bd3e55d39c	[avx] Add patterns for VINSERTF128rm. This results in things such as vmovaps -96(%rbx), %xmm1 vinsertf128 $1, %xmm1, %ymm0, %ymm0 to be combined to vinsertf128 $1, -96(%rbx), %ymm0, %ymm0 rdar://10643481 llvm-svn: 152762	2012-03-15 00:45:30 +00:00
Chad Rosier	a10cf5e1b9	Fix a regression from r147481. Original commit message from r147481: DAGCombine for transforming 128->256 casts into a vmovaps, rather then a vxorps + vinsertf128 pair if the original vector came from a load. Fix: Unaligned loads need to generate a vmovups. rdar://10974078 llvm-svn: 152366	2012-03-09 02:00:48 +00:00
Jakob Stoklund Olesen	9fed2852ff	Remove a test case that no longer makes sense. This was testing the handling of sub-register coalescing followed by remat. The original problem was caused by the extra <imp-def> operands added by sub-register coalescing. Those <imp-def> operands are not added any longer, and the test case passes even when the original patch is reverted. llvm-svn: 152040	2012-03-05 19:10:13 +00:00
Bill Wendling	88f55b45b2	Do trivial CSE of dead BBs during codegen preparation. Some BBs can become dead after codegen preparation. If we delete them here, it could help enable tail-call optimizations later on. <rdar://problem/10256573> llvm-svn: 152002	2012-03-04 10:46:01 +00:00
Chad Rosier	c6fad847e9	Prevent obscure and incorrect tail-call optimization. In this instance we are generating the tail-call during legalizeDAG. The 2nd floor call can't be a tail call because it clobbers %xmm1, which is defined by the first floor call. The first floor call can't be a tail-call because it's not in the tail position. The only reasonable way I could think to fix this in a target-independent manner was to check for glue logic on the copy reg. rdar://10930395 llvm-svn: 151877	2012-03-02 02:50:46 +00:00
Preston Gurd	29cb4871db	Trivial change to make the test use Use –mcpu=generic, so that the test will not fail when run on an Intel Atom processor, due to the Atom scheduler producing an instruction sequence that is different from that which is normally expected. llvm-svn: 151832	2012-03-01 19:57:20 +00:00
Chad Rosier	3bdd700004	Revert r151816 as Jim has the appropriate fix. llvm-svn: 151818	2012-03-01 17:41:19 +00:00
Chad Rosier	89b9ecae75	Fix testcases from r151807. llvm-svn: 151816	2012-03-01 17:31:30 +00:00
Jim Grosbach	e41072bbb4	Add missing triple for tests. Make darwin bots happier. llvm-svn: 151813	2012-03-01 17:30:32 +00:00
James Molloy	1038b57cac	Fix a codegen fault in which log2 or exp2 could be dead-code eliminated even though they could have sideeffects. Only allow log2/exp2 to be converted to an intrinsic if they are declared "readnone". llvm-svn: 151807	2012-03-01 14:32:18 +00:00
Lang Hames	6cd018d0bc	Don't redundantly copy implicit operands when rematerializing. While we're at it - don't copy vreg implicit operands while rematerializing. This fixes PR12138. llvm-svn: 151779	2012-03-01 00:41:17 +00:00
Benjamin Kramer	1b5aa9f5cd	LegalizeIntegerTypes: Reorder operations in the "big shift by small amount" optimization, making the lives of later passes easier. llvm-svn: 151722	2012-02-29 13:27:00 +00:00
Benjamin Kramer	daa291f4fd	LegalizeIntegerTypes: Reenable the large shift with small amount optimization. To avoid problems with zero shifts when getting the bits that move between words we use a trick: first shift the by amount-1, then do another shift by one. When amount is 0 (and size 32) we first shift by 31, then by one, instead of by 32. Also fix a latent bug that emitted the low and high words in the wrong order when shifting right. Fixes PR12113. llvm-svn: 151637	2012-02-28 17:58:00 +00:00
Nadav Rotem	75b36e6716	Fix a bug in the code that builds SDNodes from vector GEPs. When the GEP index is a vector of pointers, the code that calculated the size of the element started from the vector type, and not the contained pointer type. As a result, instead of looking at the data element pointed by the vector, this code used the size of the vector. This works for 32bit members (on 32bit systems), but not for other types. Added code to peel the vector type and added a test. llvm-svn: 151626	2012-02-28 11:54:05 +00:00
Preston Gurd	81931b8e3b	test commit. llvm-svn: 151588	2012-02-27 23:31:51 +00:00
NAKAMURA Takumi	17b6271b41	Target/X86: Fix assertion failures and warnings caused by r151382 _ftol2 lowering for i386-*-win32 targets. Patch by Joe Groff. [Joe Groff] Hi everyone. My previous patch applied as r151382 had a few problems: Clang raised a warning, and X86 LowerOperation would assert out for fptoui f64 to i32 because it improperly lowered to an illegal BUILD_PAIR. Here's a patch that addresses these issues. Let me know if any other changes are necessary. Thanks. llvm-svn: 151432	2012-02-25 03:37:25 +00:00
Michael J. Spencer	d2f0ce2674	Add WIN_FTOL_* psudo-instructions to model the unique calling convention used by the Win32 _ftol2 runtime function. Patch by Joe Groff! llvm-svn: 151382	2012-02-24 19:01:22 +00:00
NAKAMURA Takumi	d8b4183963	test/CodeGen/X86/2012-02-23-mmx-inlineasm.ll: Fixup to add -march=x86. -mcpu does not choose arch automatically, on non-x86 hosts. llvm-svn: 151362	2012-02-24 13:29:50 +00:00
Pete Cooper	135769381b	Turn avx insert intrinsic calls into INSERT_SUBVECTOR DAG nodes and remove duplicate patterns for selecting the intrinsics llvm-svn: 151342	2012-02-24 03:51:49 +00:00
Bill Wendling	1a35321235	Allow an integer to be converted into an MMX type when it's used in an inline asm. <rdar://problem/10106006> llvm-svn: 151303	2012-02-23 23:25:25 +00:00
Anton Korobeynikov	fb863cd279	Fix to make sure that a comdat group gets generated correctly for a static member of instantiated C++ templates. Patch by Kristof Beyls! llvm-svn: 151250	2012-02-23 10:36:04 +00:00
Daniel Dunbar	cac06bf0c6	MC: Fix the MCNullStreamer which was broken in r147763. llvm-svn: 151213	2012-02-22 23:49:50 +00:00
Michael J. Spencer	24f6d49962	Properly emit _fltused with FastISel. Refactor to share code with SDAG. Patch by Joe Groff! llvm-svn: 151183	2012-02-22 19:06:13 +00:00
Aaron Ballman	a76a5b7265	Adding support for Microsoft's thiscall calling convention. LLVM side of the patch. llvm-svn: 151123	2012-02-22 03:04:40 +00:00
NAKAMURA Takumi	0fac05f8e2	test/CodeGen/X86/2012-02-20-MachineCPBug.ll: Fix on generic(non-x86) hosts to add -mattr=+sse. llvm-svn: 151053	2012-02-21 11:56:42 +00:00
Evan Cheng	3bffc22fc2	Fix machine-cp by having it to check sub-register indicies. e.g. ecx = mov eax al = mov ch The second copy is not a nop because the sub-indices of ecx,ch is not the same of that of eax/al. Re-enabled machine-cp. PR11940 llvm-svn: 151002	2012-02-20 23:28:17 +00:00
Eric Christopher	c2e76f573d	Testcase for the previous commit. llvm-svn: 150852	2012-02-18 00:05:45 +00:00
David Chisnall	86b0f069d6	It turns out that putting an 8-byte symbol in a 4-byte section makes Solaris ld sulk. GNU ld is perfectly happy with it, which is worrying for a whole other set of reasons... Thanks to Anton, Duncan and Rafael for helping me track this down. Pointy hat to Rafael for introducing the bug in the first place. llvm-svn: 150811	2012-02-17 16:05:50 +00:00
Bill Wendling	c137296347	Use –mcpu=generic, so that the test will not fail when run on an Intel Atom processor, due to the Atom scheduler producing an instruction sequence that is different from that which is expected. Patch by Michael Spencer! llvm-svn: 150736	2012-02-16 22:42:48 +00:00
Benjamin Kramer	814de25917	Disable machine copy propagation for now. It's known to be buggy (PR11940) and introduces subtle miscompiles in many places. llvm-svn: 150703	2012-02-16 17:29:50 +00:00
Eli Bendersky	4afdeeb682	Replace all instances of dg.exp file with lit.local.cfg, since all tests are run with LIT now and now Dejagnu. dg.exp is no longer needed. Patch reviewed by Daniel Dunbar. It will be followed by additional cleanup patches. llvm-svn: 150664	2012-02-16 06:28:33 +00:00
Bill Wendling	0ea63367ec	Add a test for generating Objective-C metadata from module flags. llvm-svn: 150635	2012-02-15 23:43:37 +00:00
Pete Cooper	21409dd760	Stop custom lowering forr x86 DEC64m from happening if the load in the lowered sequence has more than 1 user llvm-svn: 150537	2012-02-15 00:33:37 +00:00
Nadav Rotem	5da800572a	Fix PR12000. Some vector operations may use scalar operands with types that are greater than the vector element type. For example BUILD_VECTOR of type <1 x i1> with a constant i8 operand. This patch fixes the assertion. llvm-svn: 150477	2012-02-14 13:06:32 +00:00
Nadav Rotem	2141a8413e	Fix a bug in DAGCombine for the optimization of BUILD_VECTOR. We cant generate a shuffle node from two vectors of different types. llvm-svn: 150383	2012-02-13 12:42:26 +00:00
Craig Topper	1487726cdf	Revert accidental commit of a pruned testcase from r150360. llvm-svn: 150361	2012-02-13 04:33:33 +00:00
Craig Topper	250c8fb194	Update CanXFormVExtractWithShuffleIntoLoad to ensure bitcasts of loads only have one use. Matches DAGCombiner and prevents vector_shuffles from reaching isel. llvm-svn: 150360	2012-02-13 04:30:38 +00:00
Pete Cooper	b1229a8866	Fixed bug when custom lowering DEC64m on x86. If the DEC node had more than one user, it was doing this lowering but leaving the original DEC node around and so decrementing twice. Fixes PR11964. llvm-svn: 150356	2012-02-13 00:10:03 +00:00
Nadav Rotem	ea4aecb3e5	This patch addresses the problem of poor code generation for the zext v8i8 -> v8i32 on AVX machines. The codegen often scalarizes ANY_EXTEND nodes. The DAGCombiner has two optimizations that can mitigate the problem. First, if all of the operands of a BUILD_VECTOR node are extracted from an ZEXT/ANYEXT nodes, then it is possible to create a new simplified BUILD_VECTOR which uses UNDEFS/ZERO values to eliminate the scalar ZEXT/ANYEXT nodes. Second, another dag combine optimization lowers BUILD_VECTOR into a shuffle vector instruction. In the case of zext v8i8->v8i32 on AVX, a value in an XMM register is to be shuffled into a wide YMM register. This patch modifes the second optimization and allows the creation of shuffle vectors even when the newly generated vector and the original vector from which we extract the values are of different types. llvm-svn: 150340	2012-02-12 15:05:31 +00:00
Anton Korobeynikov	5996573d4b	Add support for implicit TLS model used with MS VC runtime. Patch by Kai Nacke! llvm-svn: 150307	2012-02-11 17:26:53 +00:00
Andrew Trick	c3cc8fa604	RegAlloc superpass: includes phi elimination, coalescing, and scheduling. Creates a configurable regalloc pipeline. Ensure specific llc options do what they say and nothing more: -reglloc=... has no effect other than selecting the allocator pass itself. This patch introduces a new umbrella flag, "-optimize-regalloc", to enable/disable the optimizing regalloc "superpass". This allows for example testing coalscing and scheduling under -O0 or vice-versa. When a CodeGen pass requires the MachineFunction to have a particular property, we need to explicitly define that property so it can be directly queried rather than naming a specific Pass. For example, to check for SSA, use MRI->isSSA, not addRequired<PHIElimination>. CodeGen transformation passes are never "required" as an analysis ProcessImplicitDefs does not require LiveVariables. We have a plan to massively simplify some of the early passes within the regalloc superpass. llvm-svn: 150226	2012-02-10 04:10:36 +00:00
NAKAMURA Takumi	81f7ad5b9b	test/CodeGen/X86/atom-lea-sp.ll: Add explicit -mtriple=i686-linux. llvm-svn: 150151	2012-02-09 05:12:58 +00:00
Evan Cheng	1be96ff50e	Commit Andy Zhang's test for the lea patch. llvm-svn: 150107	2012-02-08 22:33:17 +00:00
Elena Demikhovsky	87a6e08d3a	Fixed a bug in printing "cmp" pseudo ops. > This IR code > %res = call <8 x float> @llvm.x86.avx.cmp.ps.256(<8 x float> %a0, <8 x float> %a1, i8 14) > fails with assertion: > > llc: X86ATTInstPrinter.cpp:62: void llvm::X86ATTInstPrinter::printSSECC(const llvm::MCInst, unsigned int, llvm::raw_ostream&): Assertion `0 && "Invalid ssecc argument!"' failed. > 0 llc 0x0000000001355803 > 1 llc 0x0000000001355dc9 > 2 libpthread.so.0 0x00007f79a30575d0 > 3 libc.so.6 0x00007f79a23a1945 gsignal + 53 > 4 libc.so.6 0x00007f79a23a2f21 abort + 385 > 5 libc.so.6 0x00007f79a239a810 __assert_fail + 240 > 6 llc 0x00000000011858d5 llvm::X86ATTInstPrinter::printSSECC(llvm::MCInst const, unsigned int, llvm::raw_ostream&) + 119 I added the full testing for all possible pseudo-ops of cmp. I extended X86AsmPrinter.cpp and X86IntelInstPrinter.cpp. You'l also see lines alignments (unrelated to this fix) in X86IselLowering.cpp from my previous check-in. llvm-svn: 150068	2012-02-08 08:37:26 +00:00
Craig Topper	a8a69356e1	Add instruction selection for 256-bit VPSHUFD and 128-bit VPERMILPS/VPERMILPD. llvm-svn: 149968	2012-02-07 06:28:42 +00:00
Benjamin Kramer	8e54f21216	Testing vector code without sse doesn't make much sense. Should bring arm and ppc testers back to life (they default to -mcpu=generic) llvm-svn: 149821	2012-02-05 11:19:39 +00:00
Chris Lattner	4881c9ecb0	Add a test for the miscompilation my recent ConstantDataArray patches introduced, to make sure we don't regress on it in the future. llvm-svn: 149803	2012-02-05 02:37:36 +00:00
Craig Topper	c289726019	Remove most of the intrinsics for XOP VPCMOV instruction. They all aliased to the same instruction with different types. This would be better accomplished with casts in the not yet created xopintrin.h header file. llvm-svn: 149795	2012-02-05 00:55:56 +00:00
Nadav Rotem	5c5681cf27	The type-legalizer often scalarizes code. One of the common patterns is extract-and-truncate. In this patch we optimize this pattern and convert the sequence into extract op of a narrow type. This allows the BUILD_VECTOR dag optimizations to construct efficient shuffle operations in many cases. llvm-svn: 149692	2012-02-03 13:18:25 +00:00
Matt Beaumont-Gay	8b5dfe05f5	Unix line endings llvm-svn: 149615	2012-02-02 19:00:49 +00:00
Elena Demikhovsky	7ca11b6e3f	Optimization for SIGN_EXTEND operation on AVX. Special handling was added for v4i32 -> v4i64 and v8i16 -> v8i32 extensions. llvm-svn: 149600	2012-02-02 09:10:43 +00:00
Lang Hames	004f627ed6	Set EFLAGS correctly in EmitLoweredSelect on X86. llvm-svn: 149597	2012-02-02 07:48:37 +00:00
Andrew Trick	d09b64fc25	Instruction scheduling itinerary for Intel Atom. Adds an instruction itinerary to all x86 instructions, giving each a default latency of 1, using the InstrItinClass IIC_DEFAULT. Sets specific latencies for Atom for the instructions in files X86InstrCMovSetCC.td, X86InstrArithmetic.td, X86InstrControl.td, and X86InstrShiftRotate.td. The Atom latencies for the remainder of the x86 instructions will be set in subsequent patches. Adds a test to verify that the scheduler is working. Also changes the scheduling preference to "Hybrid" for i386 Atom, while leaving x86_64 as ILP. Patch by Preston Gurd! llvm-svn: 149558	2012-02-01 23:20:51 +00:00
Mon P Wang	7313ffe333	Avoid creating an extract element to an illegal type after LegalizeTypes has run. llvm-svn: 149548	2012-02-01 22:15:20 +00:00
NAKAMURA Takumi	0bb21fdfce	test/CodeGen/X86/avx-minmax.ll: Relax expressions for Win32 targets. YMM arguments are passed as indirect on Win32 x64. llvm-svn: 149505	2012-02-01 14:35:29 +00:00
Elena Demikhovsky	455db87d41	Passing AVX 256-bit structures in Win64 was wrong. Fixed Win64 calling conventions. llvm-svn: 149494	2012-02-01 10:46:14 +00:00
Elena Demikhovsky	da37eb48d8	Optimization for "truncate" operation on AVX. Truncating v4i64 -> v4i32 and v8i32 -> v8i16 may be done with set of shuffles. llvm-svn: 149485	2012-02-01 07:56:44 +00:00
Craig Topper	2b764de6ab	Remove pcmpgt/pcmpeq intrinsics as clang is not using them. llvm-svn: 149367	2012-01-31 06:52:44 +00:00
Bill Wendling	7761976036	Remove all references to the old EH. There was always the current EH. -- Ministry of Truth llvm-svn: 149335	2012-01-31 02:09:07 +00:00
Chandler Carruth	865317627e	Chris's constant data sequence refactoring actually enabled printing vectors of all one bits to be printed more cleverly in the AsmPrinter. Unfortunately, the byte value for all one bits is the same with -fsigned-char as the error return of '-1'. Force this to be the unsigned byte value when returning it to avoid this problem, and update the test case for the shiny new behavior. Yay for building LLVM and Clang with -funsigned-char. Chris, please review, and let me know if there is any reason to not desire this change. It seems good on the surface, and certainly intended based on the code written. llvm-svn: 149299	2012-01-30 23:47:44 +00:00
Craig Topper	9a8c6c1633	Fix pattern for memory form of PSHUFD for use with FP vectors to remove bitcast to an integer vector that normal code wouldn't have. Also remove bitcasts from code that turns splat vector loads into a shuffle as it was making the broken pattern necessary. llvm-svn: 149232	2012-01-30 07:50:31 +00:00
Rafael Espindola	7bddde2b49	Add r149110 back with a fix for when the vector and the int have the same width. llvm-svn: 149151	2012-01-27 23:33:07 +00:00
Rafael Espindola	7800e62486	Revert r149110 and add a testcase that was crashing since that revision. Unfortunately I also had to disable constant-pool-sharing.ll the code it tests has been updated to use the IL logic. llvm-svn: 149148	2012-01-27 22:42:48 +00:00
Matt Beaumont-Gay	f1e0eb546a	Unix line endings llvm-svn: 149115	2012-01-27 02:31:29 +00:00
Jakob Stoklund Olesen	c63a45ebe6	Handle call-clobbered ymm registers on Win64. The Win64 calling convention has xmm6-15 as callee-saved while still clobbering all ymm registers. Add a YMM_HI_6_15 pseudo-register that aliases the clobbered part of the ymm registers, and mark that as call-clobbered. This allows live xmm registers across calls. This hack wouldn't be necessary with RegisterMask operands representing the call clobbers, but they are not quite operational yet. llvm-svn: 149088	2012-01-26 22:59:28 +00:00
Victor Umansky	bf35274368	Fix for the following bug in AVX codegen for double-to-int conversions: . "fptosi" and "fptoui" IR instructions are defined with round-to-zero rounding mode. . Currently for AVX mode for <4xdouble> and <8xdouble> the "VCVTPD2DQ.128" and "VCVTPD2DQ.256" instructions are selected (for .fp_to_sint. DAG node operation ) by AVX codegen. However they use round-to-nearest-even rounding mode. . Consequently, the conversion produces incorrect numbers. The fix is to replace selection of VCVTPD2DQ instructions with VCVTTPD2DQ instructions. The latter use truncate (i.e. round-to-zero) rounding mode. As .fp_to_sint. DAG node operation is used only for lowering of "fptosi" and "fptoui" IR instructions, the fix in X86InstrSSE.td definition file doesn.t have an impact on other LLVM flows. The patch includes changes in the .td file, LIT test for the changes and a fix in a legacy LIT test (which produced asm code conflicting with LLVN IR spec). llvm-svn: 149056	2012-01-26 08:51:39 +00:00
Anton Korobeynikov	682b2821ce	Properly emit ctors / dtors with priorities into desired sections and let linker handle the rest. This finally fixes PR5329 llvm-svn: 148990	2012-01-25 22:24:19 +00:00
Elena Demikhovsky	ee8c87b433	ZERO_EXTEND operation is optimized for AVX. v8i16 -> v8i32, v4i32 -> v4i64 - used vpunpck* instructions. llvm-svn: 148803	2012-01-24 13:54:13 +00:00
Craig Topper	e2d3f3060d	Add support for selecting 256-bit PALIGNR. llvm-svn: 148532	2012-01-20 05:53:00 +00:00
Eli Friedman	d13443515e	Remove a low-quality test which was failing on Windows; test/CodeGen/X86/sret.ll is a better test for the relevant behavior. llvm-svn: 148526	2012-01-20 02:06:40 +00:00
Eli Friedman	ffa3b8b5f1	Support MSVC x86-32 sret convention. PR11688. Patch by Joe Groff. llvm-svn: 148513	2012-01-20 00:05:46 +00:00
Nick Lewycky	9888a01041	Space after punctuation. llvm-svn: 148451	2012-01-19 01:13:47 +00:00
Nick Lewycky	c1e7e2eaf6	Add a TargetOption for disabling tail calls. llvm-svn: 148442	2012-01-19 00:34:10 +00:00
Nadav Rotem	afc446fbcb	Fix a bug in the type-legalization of vector integers. When we bitcast one vector type to another, we must not bitcast the result if one type is widened while the other is promoted. llvm-svn: 148383	2012-01-18 08:33:18 +00:00
Nadav Rotem	c23e698b5c	Transform: (EXTRACT_VECTOR_ELT( VECTOR_SHUFFLE )) -> EXTRACT_VECTOR_ELT. llvm-svn: 148337	2012-01-17 21:44:01 +00:00
Nadav Rotem	c460c870b2	Fix 11769. In CanXFormVExtractWithShuffleIntoLoad we assumed that EXTRACT_VECTOR_ELT can be later handled by the DAGCombiner. However, in some cases on AVX, the EXTRACT_VECTOR_ELT is legalized to EXTRACT_SUBVECTOR + EXTRACT_VECTOR_ELT, which currently is not handled by the DAGCombiner. In this patch I added a check that we only extract from the XMM part. llvm-svn: 148298	2012-01-17 09:13:19 +00:00
Eli Friedman	a343d87eac	Make sure the non-SSE lowering for fences correctly clobbers EFLAGS. PR11768. llvm-svn: 148240	2012-01-16 16:42:21 +00:00
Nadav Rotem	b36c029e3b	[AVX] Optimize x86 VSELECT instructions using SimplifyDemandedBits. We know that the blend instructions only use the MSB, so if the mask is sign-extended then we can convert it into a SHL instruction. This is a common pattern because the type-legalizer sign-extends the i1 type which is used by the LLVM-IR for the condition. Added a new optimization in SimplifyDemandedBits for SIGN_EXTEND_INREG -> SHL. llvm-svn: 148225	2012-01-15 19:27:55 +00:00
Chandler Carruth	7ee4fa971b	Relax the FileCheck assertion a bit -- all we really care about is that we're loading from the global array, not how it is spelled in the asm. This should fix the MSVC bots. llvm-svn: 148214	2012-01-15 09:38:59 +00:00
Chandler Carruth	e57b0f2a8b	FileCheck-ize a test, make it more specific to directly test the shift removal desired. llvm-svn: 148213	2012-01-15 09:32:57 +00:00
Chad Rosier	b4f2ccc1e0	Cleanup test case by adding checks for test names. llvm-svn: 148166	2012-01-14 01:46:51 +00:00
Rafael Espindola	551f422d34	Add a test showing how the Leh_func_endN symbol is used. llvm-svn: 148161	2012-01-14 00:12:59 +00:00
Craig Topper	71ea42cc29	Add patterns for v16i16 and v32i8 immAllZerosV to select VPXOR to match v4i64 and v8i32. llvm-svn: 148106	2012-01-13 06:59:47 +00:00
Elena Demikhovsky	beb66de0f9	Fixed a bug in LowerVECTOR_SHUFFLE caused assertion failure lc: X86ISelLowering.cpp:6480: llvm::SDValue llvm::X86TargetLowering::LowerVECTOR_SHUFFLE(llvm::SDValue, llvm::SelectionDAG&) const: Assertion `V1.getOpcode() != ISD::UNDEF&& "Op 1 of shuffle should not be undef"' failed. Added a test. llvm-svn: 148044	2012-01-12 20:33:10 +00:00
Rafael Espindola	d079866e36	Add error-reporting tests for platforms that don't support segmented stacks. Patch by Brian Anderson. llvm-svn: 148042	2012-01-12 20:26:13 +00:00
Rafael Espindola	959adf57db	Support segmented stacks on 64-bit FreeBSD. This patch uses tcb_spare field in the tcb structure to store info. Patch by Jyun-Yan You. llvm-svn: 148041	2012-01-12 20:24:30 +00:00
Rafael Espindola	dda46f4081	Support segmented stacks on win32. Uses the pvArbitrary slot of the TIB, which is reserved for applications. We only support frames with a static size. llvm-svn: 148040	2012-01-12 20:22:08 +00:00
Nadav Rotem	618722de09	Fix a bug in the AVX 256-bit shuffle code in cases where the splat element is on the boundary of two 128-bit vectors. The attached testcase was stuck in an endless loop. llvm-svn: 148027	2012-01-12 15:31:55 +00:00
Benjamin Kramer	ae4ad5f924	X86: Generalize the x << (y & const) optimization to also catch masks with more set bits set than 31 or 63. llvm-svn: 148024	2012-01-12 12:41:34 +00:00
Nadav Rotem	18cdee3bee	On AVX, we can load v8i32 at a time. The bug happens when two uneven loads are used. When we load the v12i32 type, the GenWidenVectorLoads method generates two loads: v8i32 and v4i32 and attempts to use CONCAT_VECTORS to join them. In this fix I concat undef values to widen the smaller value. The test "widen_load-2.ll" also exposes this bug on AVX. llvm-svn: 147964	2012-01-11 20:19:17 +00:00
Bill Wendling	cab765a3d5	Check to make sure that the CFString's back store ends up in the correct section. llvm-svn: 147961	2012-01-11 19:33:37 +00:00
Rafael Espindola	cd3a456eef	Support segmented stacks on mac. This uses TLS slot 90, which actually belongs to JavaScriptCore. We only support frames with static size Patch by Brian Anderson. llvm-svn: 147960	2012-01-11 19:00:37 +00:00
Rafael Espindola	431029d375	Split segmented stacks tests into tests for static- and dynamic-size frames. Patch by Brian Anderson. llvm-svn: 147959	2012-01-11 18:51:03 +00:00
Rafael Espindola	f7e3528d92	Generate the segmented stack prologue for fastcc too. Patch by Brian Anderson. llvm-svn: 147958	2012-01-11 18:41:19 +00:00
Chandler Carruth	e3facd7860	Revert r147945 which disabled an addressing mode transformation. I had hoped this would revive one of the llvm-gcc selfhost build bots, but it didn't so it doesn't appear that my transform is the culprit. If anyone else is seeing failures, please let me know! llvm-svn: 147957	2012-01-11 18:36:12 +00:00
Rafael Espindola	12313ca273	Use unsigned comparison in segmented stack prologue. This is a comparison of two addresses, and GCC does the comparison unsigned. Patch by Brian Anderson. llvm-svn: 147954	2012-01-11 18:23:35 +00:00
Rafael Espindola	dd2f5fe210	Explicitly set the scale to 1 on some segstack prologue instrs. Patch by Brian Anderson. llvm-svn: 147952	2012-01-11 18:14:03 +00:00
Jan Sjödin	cb98434ebe	Add XOP Intrinsics and tests llvm-svn: 147949	2012-01-11 15:20:20 +00:00
Nadav Rotem	01bf090299	Fix a bug in the lowering of BUILD_VECTOR for AVX. SCALAR_TO_VECTOR does not zero untouched elements. Use INSERT_VECTOR_ELT instead. llvm-svn: 147948	2012-01-11 14:07:51 +00:00
Chandler Carruth	e443e5e55d	Disable the transformation I added in r147936 to see if it fixes some strange build bot failures that look like a miscompile into an infloop. I'll investigate this tomorrow, but I'd both like to know whether my patch is the culprit, and get the bots back to green. llvm-svn: 147945	2012-01-11 12:17:47 +00:00
Jakob Stoklund Olesen	9baf09c11a	Fix undefined code and reenable test case. I don't think the compact encoding code is right, but at least is has defined behavior now. llvm-svn: 147938	2012-01-11 09:08:04 +00:00
Chandler Carruth	b3371fa250	Teach the X86 instruction selection to do some heroic transforms to detect a pattern which can be implemented with a small 'shl' embedded in the addressing mode scale. This happens in real code as follows: unsigned x = my_accelerator_table[input >> 11]; Here we have some lookup table that we look into using the high bits of 'input'. Each entity in the table is 4-bytes, which means this implicitly gets turned into (once lowered out of a GEP): (unsigned)((char)my_accelerator_table + ((input >> 11) << 2)); The shift right followed by a shift left is canonicalized to a smaller shift right and masking off the low bits. That hides the shift right which x86 has an addressing mode designed to support. We now detect masks of this form, and produce the longer shift right followed by the proper addressing mode. In addition to saving a (rather large) instruction, this also reduces stalls in Intel chips on benchmarks I've measured. In order for all of this to work, one part of the DAG needs to be canonicalized still further* than it currently is. This involves removing pointless 'trunc' nodes between a zextload and a zext. Without that, we end up generating spurious masks and hiding the pattern. llvm-svn: 147936	2012-01-11 08:41:08 +00:00
NAKAMURA Takumi	71c80e7fe7	llvm/test/CodeGen/X86/zext-fold.ll: Relax an expression in stack offset. llvm-svn: 147928	2012-01-11 07:34:22 +00:00
NAKAMURA Takumi	83fb05f0c7	llvm/test/CodeGen/X86/sub-with-overflow.ll: Add explicit -mtriple=i686-linux. llvm-svn: 147927	2012-01-11 07:34:14 +00:00
Jakob Stoklund Olesen	77f594c37b	Disable test that seems to expose an unrelated Linux issue. llvm-svn: 147921	2012-01-11 03:42:27 +00:00
Jakob Stoklund Olesen	37e4396a06	Detect when a value is undefined on an edge to a landing pad. Consider this code: int h() { int x; try { x = f(); g(); } catch (...) { return x+1; } return x; } The variable x is undefined on the first edge to the landing pad, but it has the f() return value on the second edge to the landing pad. SplitAnalysis::getLastSplitPoint() would assume that the return value from f() was live into the landing pad when f() throws, which is of course impossible. Detect these cases, and treat them as if the landing pad wasn't there. This allows spill code to be inserted after the function call to f(). <rdar://problem/10664933> llvm-svn: 147912	2012-01-11 02:07:05 +00:00
Chad Rosier	830f1909d8	Add test case for r147881. llvm-svn: 147891	2012-01-10 23:09:53 +00:00
Joerg Sonnenberger	fccad68f65	Default stack alignment for 32bit x86 should be 4 Bytes, not 8 Bytes. Add a test that checks the stack alignment of a simple function for Darwin, Linux and NetBSD for 32bit and 64bit mode. llvm-svn: 147888	2012-01-10 22:43:53 +00:00
Nadav Rotem	969b8a6903	Fix a bug in the legalization of shuffle vectors. When we emulate shuffles using BUILD_VECTORS we may be using a BV of different type. Make sure to cast it back. llvm-svn: 147851	2012-01-10 14:28:46 +00:00
Craig Topper	01eba20904	Fix a crash in AVX2 when trying to broadcast a double into a 128-bit vector. There is no vbroadcastsd xmm, but we do need to support 64-bit integers broadcasted into xmm. Also factor the AVX check into the isVectorBroadcast function. This makes more sense since the AVX2 check was already inside. llvm-svn: 147844	2012-01-10 08:23:59 +00:00
Evan Cheng	7855c5d08f	Allow machine-cse to look across MBB boundary when cse'ing instructions that define physical registers. It's currently very restrictive, only catching cases where the CE is in an immediate (and only) predecessor. But it catches a surprising large number of cases. rdar://10660865 llvm-svn: 147827	2012-01-10 02:02:58 +00:00
Chandler Carruth	9a418a2713	Cleanup and FileCheck-ize a test. llvm-svn: 147772	2012-01-09 09:44:26 +00:00
Craig Topper	ee2dabebe3	Clean up patterns for MOVNT*. Not sure why there were floating point types on MOVNTPS and MOVNTDQ. And v4i64 was completely missing. llvm-svn: 147767	2012-01-09 06:52:46 +00:00
Rafael Espindola	7618aa1c64	Don't print an unused label before .cfi_endproc. llvm-svn: 147763	2012-01-09 00:17:29 +00:00
Craig Topper	1fdcf7071d	Don't disable MMX support when AVX is enabled. Fix predicates for MMX instructions that were added along with SSE instructions to check for AVX in addition to SSE level. llvm-svn: 147762	2012-01-09 00:11:29 +00:00
Victor Umansky	5d24f5f51a	Reverted commit #147601 upon Evan's request. llvm-svn: 147748	2012-01-08 17:20:33 +00:00
Rafael Espindola	19a13321f8	Don't print a label before .cfi_startproc when we don't need to. This makes the produce assembly when using CFI just a bit more readable. llvm-svn: 147743	2012-01-07 22:42:19 +00:00
Evan Cheng	8af07ba749	Added a late machine instruction copy propagation pass. This catches opportunities that only present themselves after late optimizations such as tail duplication .e.g. ## BB#1: movl %eax, %ecx movl %ecx, %eax ret The register allocator also leaves some of them around (due to false dep between copies from phi-elimination, etc.) This required some changes in codegen passes. Post-ra scheduler and the pseudo-instruction expansion passes have been moved after branch folding and tail merging. They were before branch folding before because it did not always update block livein's. That's fixed now. The pass change makes independently since we want to properly schedule instructions after branch folding / tail duplication. rdar://10428165 rdar://10640363 llvm-svn: 147716	2012-01-07 03:02:36 +00:00
Eric Christopher	09a41e8939	Make the 'x' constraint work for AVX registers as well. Fixes rdar://10614894 llvm-svn: 147704	2012-01-07 01:02:09 +00:00
Chandler Carruth	90f124f420	Prevent a DAGCombine from firing where there are two uses of a combined-away node and the result of the combine isn't substantially smaller than the input, it's just canonicalized. This is the first part of a significant (7%) performance gain for Snappy's hot decompression loop. llvm-svn: 147604	2012-01-05 11:05:55 +00:00
Chandler Carruth	75ff869d6a	Cleanup and FileCheck-ize a test. llvm-svn: 147603	2012-01-05 11:05:47 +00:00
Victor Umansky	87d5ada510	Peephole optimization of ptest-conditioned branch in X86 arch. Performs instruction combining of sequences generated by ptestz/ptestc intrinsics to ptest+jcc pair for SSE and AVX. Testing: passed 'make check' including LIT tests for all sequences being handled (both SSE and AVX) Reviewers: Evan Cheng, David Blaikie, Bruno Lopes, Elena Demikhovsky, Chad Rosier, Anton Korobeynikov llvm-svn: 147601	2012-01-05 08:46:19 +00:00
Benjamin Kramer	e5589bccdd	FileCheck hygiene. llvm-svn: 147580	2012-01-05 00:43:34 +00:00
NAKAMURA Takumi	6ebbc05c9d	test/CodeGen/X86/jump_sign.ll: Add -mcpu=pentiumpro for non-x86 hosts. It uses "cmov". llvm-svn: 147521	2012-01-04 03:52:23 +00:00
Evan Cheng	caba5d2fc2	For x86, canonicalize max (x > y) ? x : y => (x >= y) ? x : y So for something like (x - y) > 0 : (x - y) ? 0 It will be (x - y) >= 0 : (x - y) ? 0 This makes is possible to test sign-bit and eliminate a comparison against zero. e.g. subl %esi, %edi testl %edi, %edi movl $0, %eax cmovgl %edi, %eax => xorl %eax, %eax subl %esi, $edi cmovsl %eax, %edi rdar://10633221 llvm-svn: 147512	2012-01-04 01:41:39 +00:00
Nadav Rotem	79f1692fe0	Revert 147426 because it caused pr11696. llvm-svn: 147485	2012-01-03 22:19:42 +00:00
Nadav Rotem	cc49c4d74d	Fix incorrect widening of the bitcast sdnode in case the incoming operand is integer-promoted. llvm-svn: 147484	2012-01-03 22:12:28 +00:00
Chad Rosier	afcaa8f38a	Enhance DAGCombine for transforming 128->256 casts into a vmovaps, rather then a vxorps + vinsertf128 pair if the original vector came from a load. rdar://10594409 llvm-svn: 147481	2012-01-03 21:05:52 +00:00
Elena Demikhovsky	40f2c3077f	Fixed a bug in SelectionDAG.cpp. The failure seen on win32, when i64 type is illegal. It happens on stage of conversion VECTOR_SHUFFLE to BUILD_VECTOR. The failure message is: llc: SelectionDAG.cpp:784: void VerifyNodeCommon(llvm::SDNode*): Assertion `(I->getValueType() == EltVT \|\| (EltVT.isInteger() && I->getValueType().isInteger() && EltVT.bitsLE(I->getValueType()))) && "Wrong operand type!"' failed. I added a special test that checks vector shuffle on win32. llvm-svn: 147445	2012-01-03 11:59:04 +00:00
Nadav Rotem	6929a8868b	Optimize the sequence blend(sign_extend(x)) to blend(shl(x)) since SSE blend instructions only look at the highest bit. llvm-svn: 147426	2012-01-02 08:05:46 +00:00
Craig Topper	f7c9bf17dd	Allow CRC32 instructions to be selected when AVX is enabled. llvm-svn: 147411	2012-01-01 19:51:58 +00:00
Craig Topper	d8ae2d9f27	Fix sfence, lfence, mfence, and clflush to be able to be selected when AVX is enabled. Fix monitor and mwait to require SSE3 or AVX, previously they worked even if SSE3 was disabled. Make prefetch instructions not set the execution domain since they don't use XMM registers. llvm-svn: 147409	2012-01-01 19:40:22 +00:00
Rafael Espindola	1cb17796db	Revert 147399. It broke CodeGen/ARM/vext.ll. llvm-svn: 147400	2012-01-01 17:36:23 +00:00
Elena Demikhovsky	9b74049783	Fixed a bug in SelectionDAG.cpp. The failure seen on win32, when i64 type is illegal. It happens on stage of conversion VECTOR_SHUFFLE to BUILD_VECTOR. The failure message is: llc: SelectionDAG.cpp:784: void VerifyNodeCommon(llvm::SDNode*): Assertion `(I->getValueType() == EltVT \|\| (EltVT.isInteger() && I->getValueType().isInteger() && EltVT.bitsLE(I->getValueType()))) && "Wrong operand type!"' failed. I added a special test that checks vector shuffle on win32. llvm-svn: 147399	2012-01-01 16:22:47 +00:00
Craig Topper	0311c45aed	Add patterns for integer forms of SHUFPD/VSHUFPD with a memory load. llvm-svn: 147393	2011-12-31 23:24:49 +00:00
Craig Topper	c01ce759d7	Fix typo in a SHUFPD and VSHUFPD pattern that prevented SHUFPD/VSHUFPD with a load from being selected. llvm-svn: 147392	2011-12-31 23:15:11 +00:00
Craig Topper	33091db89a	Change FMA4 memory forms to use memopv* instead of alignedloadv*. No need to force alignment on these instructions. Add a couple testcases for memory forms. llvm-svn: 147361	2011-12-30 02:18:36 +00:00
Craig Topper	e066262284	Fix load size for FMA4 SS/SD instructions. They need to use f32 and f64 size, but with the special handling to be compatible with the intrinsic expecting a vector. Similar handling is already used elsewhere. llvm-svn: 147360	2011-12-30 01:49:53 +00:00
Eli Friedman	db54b4b68f	Fix type-checking for load transformation which is not legal on floating-point types. PR11674. llvm-svn: 147323	2011-12-28 21:24:44 +00:00
Nadav Rotem	d8c4880903	PR11662. Promotion of the mask operand needs to be done using PromoteTargetBoolean, and not padded with garbage. llvm-svn: 147309	2011-12-28 13:08:20 +00:00
Elena Demikhovsky	9b4613ff14	Fixed a bug in LowerVECTOR_SHUFFLE and LowerBUILD_VECTOR. Matching MOVLP mask for AVX (265-bit vectors) was wrong. The failure was detected by conformance tests. llvm-svn: 147308	2011-12-28 08:14:01 +00:00
Eli Friedman	064187912e	Make sure DAGCombiner doesn't introduce multiple loads from the same memory location. PR10747, part 2. llvm-svn: 147283	2011-12-26 22:49:32 +00:00
Chandler Carruth	7a5c52fadf	Use standard promotion for i8 CTTZ nodes and i8 CTLZ nodes when the LZCNT instructions are available. Force promotion to i32 to get a smaller encoding since the fix-ups necessary are just as complex for either promoted type We can't do standard promotion for CTLZ when lowering through BSR because it results in poor code surrounding the 'xor' at the end of this instruction. Essentially, if we promote the entire CTLZ node to i32, we end up doing the xor on a 32-bit CTLZ implementation, and then subtracting appropriately to get back to an i8 value. Instead, our custom logic just uses the knowledge of the incoming size to compute a perfect xor. I'd love to know of a way to fix this, but so far I'm drawing a blank. I suspect the legalizer could be more clever and/or it could collude with the DAG combiner, but how... ;] llvm-svn: 147251	2011-12-24 12:12:34 +00:00
Chandler Carruth	82b7a7478b	Add systematic testing for cttz as well, and fix the bug I spotted by inspection earlier. llvm-svn: 147250	2011-12-24 11:46:10 +00:00
Chandler Carruth	b52ba33d0a	Add i8 and i64 testing for ctlz on x86. Also simplify the i16 test. llvm-svn: 147249	2011-12-24 11:26:59 +00:00
Chandler Carruth	800a803717	Tidy up this rather crufty test. Put the declarations at the top to make my C-brain happy. Remove the unnecessary bits of pedantic IR fluff like nounwind. Remove stray uses comments. Name things semantically rather than tN so that adding a new test in the middle doesn't cause pain, and so that new tests can be grouped semantically. This exposes how little systematic testing is going on here. I noticed this by finding several bugs via inspection and wondering why this test wasn't catching any of them. =[ llvm-svn: 147248	2011-12-24 11:26:57 +00:00
Chandler Carruth	48f5be6ce0	Expand more when we have a nice 'tzcnt' instruction, to avoid generating 'bsf' instructions here. This one is actually debatable to my eyes. It's not clear that any chip implementing 'tzcnt' would have a slow 'bsf' for any reason, and unless EFLAGS or a zero input matters, 'tzcnt' is just a longer encoding. Still, this restores the old behavior with 'tzcnt' enabled for now. llvm-svn: 147246	2011-12-24 11:11:38 +00:00
Chandler Carruth	1846086903	Tidy up some of these tests. llvm-svn: 147245	2011-12-24 11:11:36 +00:00
Chandler Carruth	9ef50ef1f7	Switch the lowering of CTLZ_ZERO_UNDEF from a .td pattern back to the X86ISelLowering C++ code. Because this is lowered via an xor wrapped around a bsr, we want the dagcombine which runs after isel lowering to have a chance to clean things up. In particular, it is very common to see code which looks like: (sizeof(x)8 - 1) ^ __builtin_clz(x) Which is trying to compute the most significant bit of 'x'. That's actually the value computed directly by the 'bsr' instruction, but if we match it too late, we'll get completely redundant xor instructions. The more naive code for the above (subtracting rather than using an xor) still isn't handled correctly due to the dagcombine getting confused. Also, while here fix an issue spotted by inspection: we should have been expanding the zero-undef variants to the normal variants when there is an 'lzcnt' instruction. Do so, and test for this. We don't want to generate unnecessary 'bsr' instructions. These two changes fix some regressions in encoding and decoding benchmarks. However, there is still a lot* to be improve on in this type of code. llvm-svn: 147244	2011-12-24 10:55:54 +00:00
Chandler Carruth	514920d53b	Cleanup this test a bit, sorting things and grouping them more clearly. llvm-svn: 147243	2011-12-24 10:55:42 +00:00
Elena Demikhovsky	b37883fe87	This is the second fix related to VZEXT_MOVL node. The failure that I see in the current version is: LLVM ERROR: Cannot select: 0x18b8f70: v4i64 = X86ISD::VZEXT_MOVL 0x18beee0 [ID=14] 0x18beee0: v4i64 = insert_subvector 0x18b8c70, 0x18b9170, 0x18b9570 [ID=13] 0x18b8c70: v4i64 = insert_subvector 0x18b9870, 0x18bf4e0, 0x18b9970 [ID=12] 0x18b9870: v4i64 = undef [ID=4] 0x18bf4e0: v2i64 = bitcast 0x18bf3e0 [ID=10] 0x18bf3e0: v4i32 = BUILD_VECTOR 0x18b9770, 0x18b9770, 0x18b9770, 0x18b9770 [ID=8] 0x18b9770: i32 = TargetConstant<0> [ID=6] 0x18b9770: i32 = TargetConstant<0> [ID=6] 0x18b9770: i32 = TargetConstant<0> [ID=6] 0x18b9770: i32 = TargetConstant<0> [ID=6] 0x18b9970: i32 = Constant<0> [ID=3] 0x18b9170: v2i64 = undef [ORD=1] [ID=1] 0x18b9570: i32 = Constant<2> [ID=5] llvm-svn: 146975	2011-12-20 13:34:28 +00:00
Chandler Carruth	7564e8371a	Begin teaching the X86 target how to efficiently codegen patterns that use the zero-undefined variants of CTTZ and CTLZ. These are just simple patterns for now, there is more to be done to make real world code using these constructs be optimized and codegen'ed properly on X86. The existing tests are spiffed up to check that we no longer generate unnecessary cmov instructions, and that we generate the very important 'xor' to transform bsr which counts the index of the most significant one bit to the number of leading (most significant) zero bits. Also they now check that when the variant with defined zero result is used, the cmov is still produced. llvm-svn: 146974	2011-12-20 11:19:37 +00:00
Lang Hames	e32ef23ba8	Make sure that the lower bits on the VSELECT condition are properly set. llvm-svn: 146800	2011-12-17 01:08:46 +00:00
Craig Topper	88e2bfef0a	Don't try to match 'unpackl/h v, v' for 32xi8 and 16xi16 when only AVX1 is supported. Fix 'unpackh v, v' for 256-bit types to understand 128-bit lanes. llvm-svn: 146726	2011-12-16 08:06:31 +00:00

... 2 3 4 5 6 ...

3407 Commits