llvm-mirror

mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-10-22 12:33:33 +02:00

Author	SHA1	Message	Date
Rafael Espindola	daefe7aa54	Use movq to move 64 bits in and out of mmx registers. Fixes PR4669 llvm-svn: 77940	2009-08-03 02:45:34 +00:00
Chris Lattner	e1cc2c27b3	fix a problem Eli noticed where we would compile the attached ptrtoint to: .quad X even on a 32-bit system, where X is not 64-bits. There isn't much that we can do here, so we just print: .quad ((X) & 4294967295) instead. llvm-svn: 77818	2009-08-01 22:25:12 +00:00
Dan Gohman	512a0d9f42	Add nounwind to this test. llvm-svn: 77792	2009-08-01 19:11:04 +00:00
David Greene	543296ed84	Simplify operand padding by keying off tabs in the asm stream. If padding is disabled, tabs get replaced by spaces except in the case of the first operand, where the tab is output to line up the operands after the mnemonics. Add some better comments and eliminate redundant code. Fix some testcases to not assume tabs. llvm-svn: 77740	2009-07-31 21:57:10 +00:00
Chris Lattner	85a3632c7a	fix PR4650: we only track sizes for certain objects, so only put something into the mergable section if it is one of our special cases. This could obviously be improved, but this is the minimal fix and restores us to the previous behavior. llvm-svn: 77679	2009-07-31 16:17:13 +00:00
Evan Cheng	148032a1a2	Optimize some common usage patterns of atomic built-ins __sync_add_and_fetch() and __sync_sub_and_fetch(). When the return value is not used (i.e. we only care about the value in memory), x86 does not have to use xadd to implement these. Instead, it can use add, sub, inc, dec instructions with the "lock" prefix. This is currently implemented using a bit of an instruction selection trick. The issue is that the target independent pattern produces one output and a chain, and we want to map it into one that just outputs a chain. The current trick is to select it into a merge_values with the first definition being an implicit_def. The proper solution is to add new ISD opcodes for the no-output variant. DAG combiner can then transform the node before it gets to target node selection. Problem #2 is we are adding a whole bunch of x86 atomic instructions when in fact these instructions are identical to the non-lock versions. We need a way to add target specific information to target nodes and have this information carried over to machine instructions. Asm printer (or JIT) can use this information to add the "lock" prefix. llvm-svn: 77582	2009-07-30 08:33:02 +00:00
Dan Gohman	3c7e8160f6	Add a new register class to describe operands that can't be SP, due to x86 encoding restrictions. This is currently off by default because it may cause code quality regressions. This is for PR4572. llvm-svn: 77565	2009-07-30 01:56:29 +00:00
Chris Lattner	f8a9c2f843	fix PR4584 with a trivial patch now that the pieces are in place. llvm-svn: 77434	2009-07-29 05:20:33 +00:00
Eric Christopher	88c1b51020	Add a couple more tests for the ptest intrinsics to make sure we're grabbing them all correctly. llvm-svn: 77413	2009-07-29 00:51:15 +00:00
Eric Christopher	c7b97d1f03	Add support for gcc __builtin_ia32_ptest{z,c,nzc} intrinsics. Lower to ptest instruction plus setcc. Revamp ptest instruction. Add test. llvm-svn: 77407	2009-07-29 00:28:05 +00:00
Chris Lattner	a8faf6b1b6	fix testcase for previous patch. llvm-svn: 77338	2009-07-28 18:04:18 +00:00
Chris Lattner	986bd2bd0a	Fix PR4639, an ELF-TLS regression from some of my refactoring. llvm-svn: 77336	2009-07-28 17:57:51 +00:00
Chris Lattner	19c9914343	update testcase. llvm-svn: 77192	2009-07-27 15:52:58 +00:00
Chris Lattner	5547fd80ad	put normal data into .data instead of .data.rel on elf systems. llvm-svn: 77116	2009-07-26 03:06:11 +00:00
Chris Lattner	9cd489c7f1	finish simplifying DarwinTargetAsmInfo::SelectSectionForGlobal for now. Make the section switching directives more consistent by not including \n and including \t for them all. llvm-svn: 77107	2009-07-26 01:24:18 +00:00
Chris Lattner	b95150b65a	simplify DarwinTargetAsmInfo::SelectSectionForGlobal a bit and make it more aggressive, we now put: const int G2 __attribute__((weak)) = 42; into the text (readonly) segment like gcc, previously we put it into the data (readwrite) segment. llvm-svn: 77104	2009-07-26 00:51:36 +00:00
Chris Lattner	cf7cc0ed7d	add the most expedient hack to fix PR4619, along with a testcase. Thanks to Rafael for the great example. llvm-svn: 77083	2009-07-25 17:57:37 +00:00
Evan Cheng	12dd5c078f	I've lost my mind. PR4572 has not been fixed. llvm-svn: 77031	2009-07-25 01:11:46 +00:00
Evan Cheng	5faed6335e	Forgot this test earlier. llvm-svn: 77007	2009-07-24 22:42:45 +00:00
Eric Christopher	24a620ec3d	Move insertps tests to sse41 combo test file, convert to filecheck format and add an extract/insert test. llvm-svn: 76994	2009-07-24 19:24:26 +00:00
Chris Lattner	7fd20e69f1	merge one more sse41 test into sse41.ll llvm-svn: 76853	2009-07-23 04:49:39 +00:00
Chris Lattner	efe5b9aaf8	merge another sse41 test into sse41.ll llvm-svn: 76852	2009-07-23 04:43:48 +00:00
Chris Lattner	c59ca858c8	merge sse41-pmovx.ll into sse41.ll llvm-svn: 76850	2009-07-23 04:39:09 +00:00
Chris Lattner	313c95ea84	change a test to run in filecheck style. Rename it to be a general dumping ground of various SSE4.1 tests, since filecheck can reasonably handle them all in one file. Generalize it to check x86-64 stuff as well since it has a different ABI (a convenient way to test both the reg and mem forms of these instructions). llvm-svn: 76848	2009-07-23 04:33:02 +00:00
Eric Christopher	c9299d5756	Support insertps via the intrinsic and add a couple of simple testcases to make sure it's being generated. llvm-svn: 76843	2009-07-23 02:22:41 +00:00
Eric Christopher	36eeebc1c4	Add test for pinsrd and pinsrb instructions. llvm-svn: 76840	2009-07-23 01:58:04 +00:00
Dan Gohman	840e7f252e	x86 isel tweak: use lea (%reg,%reg) instead of lea (,%reg,2). llvm-svn: 76817	2009-07-22 23:26:55 +00:00
Dan Gohman	8b7a539e80	Make the grep line in this test more specific, to avoid unintended matches. llvm-svn: 76802	2009-07-22 22:02:42 +00:00
Evan Cheng	e0f52b2bf3	Remove a big test case. llvm-svn: 76669	2009-07-21 22:52:04 +00:00
Evan Cheng	c142755cec	Another rewriter bug exposed by recent coalescer changes. ReuseInfo::GetRegForReload() should make sure the "switched" register is in the desired register class. I'm surprised this hasn't caused more failures in the past. llvm-svn: 76558	2009-07-21 09:15:00 +00:00
Evan Cheng	0cf19155ef	Fix a dag combiner bug: avoid creating an illegal constant. Is this really a winning transformation? fold (shl (srl x, c1), c2) -> (shl (and x, (shl -1, c1)), (sub c2, c1)) or (srl (and x, (shl -1, c1)), (sub c1, c2)) llvm-svn: 76535	2009-07-21 05:40:15 +00:00
Evan Cheng	443ae1d494	Cross RC coalescing is now on by default. llvm-svn: 76519	2009-07-21 00:22:59 +00:00
Evan Cheng	0048e876c3	Fix some sub-reg coalescing bugs where the coalescer wasn't updating the resulting interval's register class. llvm-svn: 76458	2009-07-20 19:47:55 +00:00
Dan Gohman	00b05492f1	Revert the addition of hasNoPointerOverflow to GEPOperator. Getelementptrs that are defined to wrap are virtually useless to optimization, and getelementptrs that are undefined on any kind of overflow are too restrictive -- it's difficult to ensure that all intermediate addresses are within bounds. I'm going to take a different approach. Remove a few optimizations that depended on this flag. llvm-svn: 76437	2009-07-20 17:43:30 +00:00
Chris Lattner	eac40b1473	implement a new magic global "llvm.compiler.used" which is like llvm.used, but doesn't cause ".no_dead_strip" to be emitted on darwin. llvm-svn: 76399	2009-07-20 06:14:25 +00:00
Jakob Stoklund Olesen	96f9a917c8	Fix http://llvm.org/bugs/show_bug.cgi?id=4583 Inline asm instructions may have additional <imp-def,kill> register operands. These operands are not marked with a flag like the normal asm operands, so we must not assert that there is a flag. llvm-svn: 76373	2009-07-19 19:09:59 +00:00
Evan Cheng	a19ee1bb42	Catch more coalescing opportunities. llvm-svn: 76282	2009-07-18 04:52:23 +00:00
Evan Cheng	84f06f0ee6	Enable cross register class coalescing. llvm-svn: 76281	2009-07-18 02:10:10 +00:00
Evan Cheng	92fc255bb4	Fix pr4552. Stack slot coloring with register must take care not to generate illegal asm. llvm-svn: 76258	2009-07-17 22:42:51 +00:00
Evan Cheng	67ccedff04	Fix x86 inline asm 'q' constraint support. In 32-bit mode, it's just like 'Q', i.e. EAX, EDX, ECX, EBX. In 64-bit mode, it just means all the i64r registers. Yeah, that makes sense. llvm-svn: 76248	2009-07-17 22:13:25 +00:00
Chris Lattner	fb1ff2b993	rename test. llvm-svn: 76197	2009-07-17 18:05:55 +00:00
Dale Johannesen	e08bda67c2	Assume an inline asm might be a call, so we get stack alignment right when it is. This is not ideal but conservatively correct. Adjust a test to compensate for changed stack offset value. gcc.apple/asm-block-57.c llvm-svn: 76120	2009-07-16 22:34:45 +00:00
Evan Cheng	981276bb16	Changed my mind. We now allow remat of instructions whose defs have subreg indices. llvm-svn: 76100	2009-07-16 20:15:00 +00:00
Evan Cheng	39e5f6205a	With recent MC changes, RIP base register is explicitly modeled. Make sure we add it when folding x86 V_SET0 / V_SETALLONES (by transforming it into a constpool load) into the use instruction. llvm-svn: 76094	2009-07-16 18:44:05 +00:00
Evan Cheng	7a6b20df7f	Let callers decide the sub-register index on the def operand of rematerialized instructions. Avoid remat'ing instructions whose def have sub-register indices for now. It's just really really hard to get all the cases right. llvm-svn: 75900	2009-07-16 09:20:10 +00:00
Evan Cheng	83b99bb014	ShortenDeadCopySrcLiveRange needs to be more conservative in multi-kill situations. llvm-svn: 75838	2009-07-15 21:39:50 +00:00
Chris Lattner	9f8b22c8ff	convert to filecheck style, simplify RUN line, and add comment. llvm-svn: 75667	2009-07-14 19:49:11 +00:00
Chris Lattner	6ec578688d	Reapply my previous asmprinter changes now with more testing and two additional bug fixes: 1. The bug that everyone hit was a problem in the asmprinter where it would remove $stub but keep the L prefix on a name when emitting the indirect symbol. This is easy to fix by keeping the name of the stub and the name of the symbol in a StringMap instead of just keeping a StringSet and trying to reconstruct it late. 2. There was a problem printing the personality function. The current logic to print out the personality function from the DWARF information is a bit of a cesspool right now that duplicates a bunch of other logic in the asm printer. The short version of it is that it depends on emitting both the L and _ prefix for symbols (at least on darwin) and until I can untangle it, it is best to switch the mangler back to emitting both prefixes. llvm-svn: 75646	2009-07-14 18:17:16 +00:00
Daniel Dunbar	3d0c94fbbc	Revert r75610 (and r75620, which was blocking the revert), in the hopes of unbreaking llvm-gcc (on Darwin). --- Reverse-merging r75620 into '.': U include/llvm/Support/Mangler.h --- Reverse-merging r75610 into '.': U test/CodeGen/X86/loop-hoist.ll G include/llvm/Support/Mangler.h U lib/Target/X86/AsmPrinter/X86ATTAsmPrinter.cpp U lib/VMCore/Mangler.cpp llvm-svn: 75636	2009-07-14 15:57:55 +00:00
Chris Lattner	b0e0d16efb	Change the X86 asmprinter to use the mangler to apply suffixes like "$non_lazy_ptr" to symbols instead of doing it with "printSuffixedName". This gets us to the point where there is a real separation between computing a symbol name and printing it, something I need for MC printer stuff. This patch also fixes a corner case bug where unnamed private globals wouldn't get the private label prefix. Next up, rename all uses of getValueName -> getMangledName for better greppability, and then tackle the ppc/arm backends to eliminate "printSuffixedName". llvm-svn: 75610	2009-07-14 06:04:35 +00:00
Chris Lattner	2a6bb9bc51	Change the internal interface to makeNameProper to take a bool that indicates whether the label is private or not, instead of taking prefix stuff. One effect of this is that symbols will be generated with just the private prefix, instead of both the private prefix and the user-label-prefix, but this doesn't matter as long as it is consistent. For example we'll now get "Lfoo" instead of "L_foo". These are just assembler temporary labels anyway, so they never even make it into the .o file. llvm-svn: 75607	2009-07-14 04:50:12 +00:00
Bill Wendling	0751c92127	Check for the correct unnamed name. llvm-svn: 75573	2009-07-14 00:53:58 +00:00
Chris Lattner	1ab26071d4	Two changes: 1) unique globals with the existing "Count" local in Mangler, not with atomic nonsense. Using atomics will give us nondeterministic output from the compiler when using multiple threads, which is bad. 2) Do not mangle an unknown global name with a type suffix. We don't need this anymore now that llvm ir doesn't have type planes. llvm-svn: 75541	2009-07-13 22:48:46 +00:00
Chris Lattner	35e5cff316	add nounwind llvm-svn: 75407	2009-07-12 00:46:16 +00:00
Nick Lewycky	e5c1852b69	Darwin prepends an _ to internal globals, Linux doesn't. llvm-svn: 75405	2009-07-11 23:48:59 +00:00
Chris Lattner	a54c70286c	fix x86-64 static codegen to materialize the address of a global with movl instead of lea. It is better for code size (and presumably efficiency) to use: movl $foo, %eax rather than: leal foo, %eax Both give a nice zero extending "move immediate" instruction, the former is just smaller. Note that global addresses should be handled differently by the x86 backend, but I chose to follow the style already in place and add more fixme's. llvm-svn: 75403	2009-07-11 23:17:29 +00:00
Chris Lattner	7015283356	this test was incorrect for x86-64 static. It passed on darwin, because darwin doesn't have static x86-64 mode. llvm-svn: 75392	2009-07-11 22:30:05 +00:00
Chris Lattner	496f872969	Fix PR4533, which is about buggy codegen in x86-64 -static mode. Basically, using: lea symbol(%rip), %rax is not valid in -static mode, because the current RIP may not be within 32-bits of "symbol" when an app is built partially pic and partially static. The fix for this is to compile it to: lea symbol, %rax It would be better to codegen this as: movq $symbol, %rax but that will come next. The hard part of fixing this bug was fixing abi-isel, which was actively testing for the wrong behavior. Also, the RUN lines are completely impossible to understand what they are testing. To help with this, convert the -static x86-64 codegen tests to use filecheck. This is much more stable and makes it more clear what the codegen is expected to be. llvm-svn: 75382	2009-07-11 20:29:19 +00:00
Chris Lattner	6bfeeb798a	We get the P modifier wrong in a lot of cases, just add some more rigorous testing. In addition to fixing this, I still need to do some more testing on darwin. llvm-svn: 75362	2009-07-11 08:30:22 +00:00
Eli Friedman	08991c716a	Make EXTRACT_VECTOR_ELT a bit more flexible in terms of the returned value. Adjust other code to deal with that correctly. Make DAGTypeLegalizer::PromoteIntRes_EXTRACT_VECTOR_ELT take advantage of this new flexibility to simplify the code and make it deal with unusual vectors (like <4 x i1>) correctly. Fixes PR3037. llvm-svn: 75176	2009-07-09 22:01:03 +00:00
Evan Cheng	4f87295872	Targets sometimes assign fixed stack object to spill certain callee-saved registers based on dynamic conditions. For example, X86 EBP/RBP, when used as frame register has to be spilled in the first fixed object. It should inform PEI this so it doesn't get allocated another stack object. Also, it should not be spilled as other callee-saved registers but rather its spilling and restoring are being handled by emitPrologue and emitEpilogue. Avoid spilling it twice. llvm-svn: 75116	2009-07-09 06:53:48 +00:00
Chris Lattner	ee12a671e5	remove eh, convert to FileCheck style llvm-svn: 75087	2009-07-09 01:07:22 +00:00
Chris Lattner	9aa10daa2d	we have no tests for dllimport/export. Add one. llvm-svn: 75085	2009-07-09 00:53:44 +00:00
Chris Lattner	f321f1f93f	* add some assertions for sanity checking. * remove some old code that was needed when we'd put ESP in the scale instead of the base of some instructions. * Fix a bug with the P modifier in inline asm that caused us to drop it. llvm-svn: 75077	2009-07-09 00:27:29 +00:00
Chris Lattner	8747eae8e9	add a test for dale's recent change. llvm-svn: 75074	2009-07-09 00:00:16 +00:00
Chris Lattner	0122ff1be6	switch test to FileCheck-style and test the P and non-P cases. llvm-svn: 75071	2009-07-08 23:44:06 +00:00
Chris Lattner	ddf2433b7b	rename a test to make it a feature test. llvm-svn: 75070	2009-07-08 23:40:57 +00:00
Chris Lattner	1eeea0d262	add some more check for vector compares. llvm-svn: 75024	2009-07-08 18:51:25 +00:00
Chris Lattner	182817004d	convert a test to "FileCheck" style. llvm-svn: 75023	2009-07-08 18:48:24 +00:00
Chris Lattner	c153b998d9	eliminate the v[if]cmp versions of these tests, now that [if]cmp+sext works. llvm-svn: 74980	2009-07-08 00:49:35 +00:00
Chris Lattner	ea7bd9b484	dag combine sext(setcc) -> vsetcc before legalize. To make this safe, VSETCC must define all bits, which is different than it was documented to before. Since all targets that implement VSETCC already have this behavior, and we don't optimize based on this, just change the documentation. We now get nice code for vec_compare.ll llvm-svn: 74978	2009-07-08 00:31:33 +00:00
Chris Lattner	e2435c4f6f	add support for legalizing an icmp where the result is illegal (4xi1) but the input is legal (4 x i32) llvm-svn: 74964	2009-07-07 23:03:54 +00:00
Chris Lattner	1de1b3155b	add a trivial test that vector compares work. llvm-svn: 74963	2009-07-07 22:51:09 +00:00
Chris Lattner	a754f344b2	implement support for splitting and scalarizing vector setcc's. This finishes off enough support for vector compares to get the icmp/fcmp version of 2008-07-23-VSetCC.ll passing. llvm-svn: 74961	2009-07-07 22:47:46 +00:00
Chris Lattner	573a3eeda2	verify that the fcmp version of this works just as well as the vfcmp version. We actually get better code for this silly testcase. llvm-svn: 74954	2009-07-07 22:07:47 +00:00
Evan Cheng	29ce3bfbb8	Avoid adding a duplicate def. This fixes PR4478. llvm-svn: 74857	2009-07-06 21:34:05 +00:00
Chris Lattner	4cddab0f14	@GOTPCREL is also rip-relative. Fix fast-isel to do the right thing. This fixes an llvm-gcc bootstrap problem I introduced. llvm-svn: 74691	2009-07-02 04:22:01 +00:00
Chris Lattner	e703feb0ac	Fix yet-another bug I introduced into fastisel, this time handling constant pool references that weren't getting properly rip-relative. llvm-svn: 74689	2009-07-02 03:14:25 +00:00
Chris Lattner	2bbdc61f92	Fix some fast-isel problems selecting global variable addressing in pic mode. llvm-svn: 74582	2009-07-01 03:27:19 +00:00
Rafael Espindola	340632e814	Fix PR4485. Avoid unnecessary duplication of operand 0 of X86::FpSET_ST0_80. This duplication would cause one register to remain on the stack at the function return. llvm-svn: 74534	2009-06-30 16:40:03 +00:00
Rafael Espindola	33b0aa0274	Fix PR4484. This was caused by me confounding FP0 and ST(0). llvm-svn: 74523	2009-06-30 12:18:16 +00:00
Rafael Espindola	a0fdda93be	FIX PR 4459. Not sure I understand how the temp register gets used, but this fixes a bug and introduces no regressions. llvm-svn: 74446	2009-06-29 20:29:59 +00:00
Chris Lattner	e711b85035	factor some logic out into a helper function, allow remat of loads from constant globals. This implements remat-constant.ll even without aggressive-remat. llvm-svn: 74373	2009-06-27 04:38:55 +00:00
Chris Lattner	19eb0dad26	Reimplement rip-relative addressing in the X86-64 backend. The new implementation primarily differs from the former in that the asmprinter doesn't make a zillion decisions about whether or not something will be RIP relative or not. Instead, those decisions are made by isel lowering and propagated through to the asm printer. To achieve this, we: 1. Represent RIP relative addresses by setting the base of the X86 addr mode to X86::RIP. 2. When ISel Lowering decides that it is safe to use RIP, it lowers to X86ISD::WrapperRIP. When it is unsafe to use RIP, it lowers to X86ISD::Wrapper as before. 3. This removes isRIPRel from X86ISelAddressMode, representing it with a basereg of RIP instead. 4. The addressing mode matching logic in isel is greatly simplified. 5. The asmprinter is greatly simplified, notably the "NotRIPRel" predicate passed through various printoperand routines is gone now. 6. The various symbol printing routines in asmprinter now no longer infer when to emit (%rip), they just print the symbol. I think this is a big improvement over the previous situation. It does have two small caveats though: 1. I implemented a horrible "no-rip" modifier for the inline asm "P" constraint modifier. This is a short term hack, there is a much better, but more involved, solution. 2. I had to xfail an -aggressive-remat testcase because it isn't handling the use of RIP in the constant-pool reading instruction. This specific test is easy to fix without -aggressive-remat, which I intend to do next. llvm-svn: 74372	2009-06-27 04:16:01 +00:00
Chris Lattner	aef726f8b9	remove some unneeded eh info. llvm-svn: 74371	2009-06-27 04:07:31 +00:00
Chris Lattner	3e94ce2426	testcase for PR4466 llvm-svn: 74367	2009-06-27 01:33:35 +00:00
Dan Gohman	49b2ecafe7	Add some testcases for some of the recent ScalarEvolution bug fixes. llvm-svn: 74353	2009-06-26 22:54:11 +00:00
Chris Lattner	4384816259	remove unwind info, add test for asmprinting of jump table labels with (%rip) llvm-svn: 74337	2009-06-26 22:16:49 +00:00
Evan Cheng	016ed65455	Add x86 support for 'n' inline asm modifier. This will be handled target independently as part of MC work. llvm-svn: 74336	2009-06-26 22:00:19 +00:00
Chris Lattner	f035685176	down with unwind info :) llvm-svn: 74206	2009-06-25 21:48:17 +00:00
Chris Lattner	6e06dc1168	unwind info not needed. llvm-svn: 74112	2009-06-24 19:48:04 +00:00
Evan Cheng	7292cadf06	Fix support for inline asm input / output operand tying when operand spans across multiple registers (e.g. two i64 operands in 32-bit mode). llvm-svn: 74053	2009-06-24 02:05:51 +00:00
Dan Gohman	4f4bda36df	Extend ScalarEvolution's multiple-exit support to compute exact trip counts in more cases. Generalize ScalarEvolution's isLoopGuardedByCond code to recognize And and Or conditions, splitting the code out into an isNecessaryCond helper function so that it can evaluate Ands and Ors recursively, and make SCEVExpander be much more aggressive about hoisting instructions out of loops. test/CodeGen/X86/pr3495.ll has an additional instruction now, but it appears to be due to an arbitrary register allocation difference. llvm-svn: 74048	2009-06-24 01:18:18 +00:00
Rafael Espindola	373c6bdbc5	Fix PR4185. Handle FpSET_ST0_80 being used when ST0 is still alive. llvm-svn: 73850	2009-06-21 12:02:51 +00:00
Chris Lattner	580eecebbd	change TLS_ADDR lowering to lower to a real mem operand, instead of matching as a global that gets printed with the :mem modifier. All operands to lea's should be handled with the lea32mem operand kind, and this allows the TLS stuff to do this. There are several better ways to do this, but I went for the minimal change since I can't really test this (beyond make check). This also makes the use of EBX explicit in the operand list in 32-bit mode, instead of implicit in the instruction. llvm-svn: 73834	2009-06-20 20:38:48 +00:00
Chris Lattner	965cc0e45b	no need for unwind info llvm-svn: 73832	2009-06-20 19:48:26 +00:00
Chris Lattner	33d1976328	no need for unwind info here. llvm-svn: 73831	2009-06-20 19:43:09 +00:00
Dan Gohman	651faa1905	Re-apply r73718, now that the fix in r73787 is in, and add a hand-crafted testcase which demonstrates the bug that was exposed in 254.gap. llvm-svn: 73793	2009-06-19 23:23:27 +00:00
Evan Cheng	b90241ac42	Revert 73718. It's breaking 254.gap. llvm-svn: 73783	2009-06-19 21:15:06 +00:00
Eli Friedman	5cccb60bad	Fix for PR2484: add an SSE1 pattern for a shuffle we normally prefer to handle with an SSE2 instruction. llvm-svn: 73760	2009-06-19 07:00:55 +00:00
Evan Cheng	6c1c55f942	On Darwin, the asm printer should output a second label before a jump table so the linker knows it's a new atom. But this is only needed if the jump table is put in a separate section from the function body. llvm-svn: 73720	2009-06-18 20:37:15 +00:00
Dan Gohman	da82dc2ec1	Generalize LSR's OptimizeSMax to handle unsigned max tests as well as signed max tests. Along with r73717, this helps CodeGen avoid emitting code for a maximum operation for this class of loop. llvm-svn: 73718	2009-06-18 20:23:18 +00:00
Dan Gohman	fd857b0406	Remove the code from IVUsers that attempted to handle casted induction variables in cases where the cast isn't foldable. It ended up being a pessimization in many cases. This could be fixed, but it would require a bunch of complicated code in IVUsers' clients. The advantages of this approach aren't visible enough to justify it at this time. llvm-svn: 73706	2009-06-18 16:54:06 +00:00
Eli Friedman	6a984089f4	Add some generic expansion logic for SMULO and UMULO. Fixes UMULO support for x86, and UMULO/SMULO for many architectures, including PPC (PR4201), ARM, and Cell. The resulting expansion isn't perfect, but it's not bad. llvm-svn: 73477	2009-06-16 06:58:29 +00:00
Dan Gohman	255bcad466	Update this test to use fmul instead of mul. llvm-svn: 73436	2009-06-15 22:49:34 +00:00
Bill Wendling	a0a5984345	This test is failing. Revert for now. llvm-svn: 73404	2009-06-15 19:10:56 +00:00
Bill Wendling	1ea00229de	Add another testcase for r71478. llvm-svn: 73399	2009-06-15 18:36:34 +00:00
Arnold Schwaighofer	6b340f9247	CheckTailCallReturnConstraints is missing a check on the incoming chain of the RETURN node. The incoming chain must be the outgoing chain of the CALL node. This causes the backend to identify tail calls that are not tail calls. This patch fixes this. llvm-svn: 73387	2009-06-15 14:43:36 +00:00
Arnold Schwaighofer	780e3addf8	Fix Bug 4278: X86-64 with -tailcallopt calling convention out of sync with regular cc. The only difference between the tail call cc and the normal cc was that one parameter register - R9 - was reserved for calling functions through a function pointer. After time the tail call cc has gotten out of sync with the regular cc. We can use R11 which is also caller saved but not used as parameter register for potential function pointers and remove the special tail call cc on x86-64. llvm-svn: 73233	2009-06-12 16:26:57 +00:00
Eli Friedman	62028b7323	Fix the run-line for this test to work correctly outside of x86. llvm-svn: 73025	2009-06-07 09:44:19 +00:00
Eli Friedman	2964aa5a38	Tweak the expansion code for BIT_CONVERT to generate better code converting from an MMX vector to an i64. llvm-svn: 73024	2009-06-07 09:41:57 +00:00
Eli Friedman	d4b463b0dc	Slightly generalize the code that handles shuffles of consecutive loads on x86 to handle more cases. Fix a bug in said code that would cause it to read past the end of an object. Rewrite the code in SelectionDAGLegalize::ExpandBUILD_VECTOR to be a bit more general. Remove PerformBuildVectorCombine, which is no longer necessary with these changes. In addition to simplifying the code, with this change, we can now catch a few more cases of consecutive loads. llvm-svn: 73012	2009-06-07 06:52:44 +00:00
Eli Friedman	2dadbd05f9	Fix the expansion for CONCAT_VECTORS so that it doesn't create illegal types. llvm-svn: 72993	2009-06-06 07:08:26 +00:00
Eli Friedman	4395222136	Avoid crashing on a variable-index insertelement with element type i16. llvm-svn: 72991	2009-06-06 06:32:50 +00:00
Eli Friedman	e546f94ef5	Get rid of some bogus patterns for X86vzmovl. Don't create VZEXT_MOVL nodes for vectors with an i16 element type. Add an optimization for building a vector which is all zeros/undef except for the bottom element, where the bottom element is an i8 or i16. llvm-svn: 72988	2009-06-06 06:05:10 +00:00
Eli Friedman	539325c8e7	Fix an obvious typo. llvm-svn: 72987	2009-06-06 05:55:37 +00:00
Eli Friedman	1227d199be	Get rid of a bogus pattern that interferes with optimization. llvm-svn: 72985	2009-06-06 04:17:04 +00:00
Eli Friedman	05eef883e8	PR2598: make sure to expand illegal forms of integer/floating-point conversions for x86, like <2 x i32> -> <2 x float> and <4 x i16> -> <4 x float>. llvm-svn: 72983	2009-06-06 03:57:58 +00:00
Nate Begeman	058d4eeccf	Adapt the x86 build_vector dagcombine to the current state of the legalizer. build vectors with i64 elements will only appear on 32b x86 before legalize. Since vector widening occurs during legalize, and produces i64 build_vector elements, the dag combiner is never run on these before legalize splits them into 32b elements. Teach the build_vector dag combine in x86 back end to recognize consecutive loads producing the low part of the vector. Convert the two uses of TLI's consecutive load recognizer to pass LoadSDNodes since that was required implicitly. Add a testcase for the transform. Old: subl $28, %esp movl 32(%esp), %eax movl 4(%eax), %ecx movl %ecx, 4(%esp) movl (%eax), %eax movl %eax, (%esp) movaps (%esp), %xmm0 pmovzxwd %xmm0, %xmm0 movl 36(%esp), %eax movaps %xmm0, (%eax) addl $28, %esp ret New: movl 4(%esp), %eax pmovzxwd (%eax), %xmm0 movl 8(%esp), %eax movaps %xmm0, (%eax) ret llvm-svn: 72957	2009-06-05 21:37:30 +00:00
Dan Gohman	5f6f8101d5	Split the Add, Sub, and Mul instruction opcodes into separate integer and floating-point opcodes, introducing FAdd, FSub, and FMul. For now, the AsmParser, BitcodeReader, and IRBuilder all preserve backwards compatibility, and the Core LLVM APIs preserve backwards compatibility for IR producers. Most front-ends won't need to change immediately. This implements the first step of the plan outlined here: http://nondot.org/sabre/LLVMNotes/IntegerOverflow.txt llvm-svn: 72897	2009-06-04 22:49:04 +00:00
Devang Patel	9757e4f9f3	Add new function attribute - noredzone. Update code generator to use this attribute and remove DisableRedZone target option. Update llc to set this attribute when -disable-red-zone command line option is used. llvm-svn: 72894	2009-06-04 22:05:33 +00:00
Evan Cheng	dada49d18a	RALinScan::attemptTrivialCoalescing() was returning a virtual register instead of the physical register it is allocated to. This resulted in virtual register(s) being added to the live-in sets. llvm-svn: 72890	2009-06-04 20:53:36 +00:00
Dan Gohman	05fe1217c7	Check in test changes that I accidentally left out of r72872. llvm-svn: 72875	2009-06-04 18:22:31 +00:00
Eli Friedman	11070e275f	PR3739, part 2: Use an explicit store to spill XMM registers. (Previously, the code tried to use "push", which doesn't exist for XMM registers.) llvm-svn: 72836	2009-06-04 02:32:04 +00:00
Eli Friedman	fd27229206	PR3739, part 1: Disable the red zone on Win64. llvm-svn: 72830	2009-06-04 02:02:01 +00:00
Evan Cheng	b71402d6ae	For Darwin / x86_64, override -relocation-model=static to pic if the output is assembly since Darwin assembler does not really support -static codegen. I view this as a temporary workaround until the assembler / linker changes. llvm-svn: 72806	2009-06-03 21:13:54 +00:00
Evan Cheng	4e47a019ba	Fix for PR4225: When the rewriter reuses a value in a physical register, it clears the register kill operand marker and its kill ops information. However, the cleared operand may be a def of a super-register. Clear the kill ops info for the super-register's sub-registers as well. llvm-svn: 72758	2009-06-03 09:00:27 +00:00
Dan Gohman	609f627ed7	Revert r72734. The Darwin assembler doesn't support the static relocation model on x86-64. Higher level logic should override the relocation model to PIC on x86_64-apple-darwin. llvm-svn: 72746	2009-06-03 00:37:20 +00:00
Dan Gohman	f6e6588203	Fix CodeGenPrepare's address-mode sinking to handle unusual addresses, involving Base values which do not have Pointer type. This fixes PR4297. llvm-svn: 72739	2009-06-02 21:29:13 +00:00
Evan Cheng	7e66d61bec	On Darwin x86_64 small code model doesn't guarantee code address fits in 32-bit. llvm-svn: 72734	2009-06-02 20:09:31 +00:00
Evan Cheng	2d198e1bc2	(i64 (zext (srl GR32 8))) -> movzbl AH is not safe since srl 8 only clears the top 8 bits. llvm-svn: 72618	2009-05-30 08:43:27 +00:00
Evan Cheng	57f85a1529	Remove an accidental commit. llvm-svn: 72560	2009-05-29 05:28:52 +00:00
Evan Cheng	550fc9ba9f	More h-registers tricks: folding zext nodes. llvm-svn: 72558	2009-05-29 01:44:43 +00:00
Evan Cheng	a36a15ff66	Do not try to create a MVT type of width 0. llvm-svn: 72557	2009-05-28 23:52:18 +00:00
Eli Friedman	8b0b7c2d6d	Add a testcase which got fixed by recent legalization work. llvm-svn: 72517	2009-05-28 05:10:20 +00:00
Evan Cheng	40810c4d1b	Added optimization that narrow load / op / store and the 'op' is a bit twiddling instruction and its second operand is an immediate. If bits that are touched by 'op' can be done with a narrower instruction, reduce the width of the load and store as well. This happens a lot with bitfield manipulation code. e.g. orl $65536, 8(%rax) => orb $1, 10(%rax) Since narrowing is not always a win, e.g. i32 -> i16 is a loss on x86, dag combiner consults with the target before performing the optimization. llvm-svn: 72507	2009-05-28 00:35:15 +00:00
Torok Edwin	99b1003c2e	Fix PR4254. The DAGCombiner created a negative shiftamount, stored in an unsigned variable. Later the optimizer eliminated the shift entirely as being undefined. Example: (srl (shl X, 56) 48). ShiftAmt is 4294967288. Fix it by checking that the shiftamount is positive, and storing in a signed variable. llvm-svn: 72331	2009-05-23 17:29:48 +00:00
Torok Edwin	beb86bd0b4	available_externally linkage is not local; this was confusing the code generator, and it wasn't generating calls through @PLT for these functions. hasLocalLinkage() is now false for available_externally. I attempted to fix the inliner and dce to handle available_externally properly. It passed make check. llvm-svn: 72328	2009-05-23 14:06:57 +00:00
Eli Friedman	262a99ffed	Fix test to account for legalization changes; I think this ends up running an extra DAGCombine pass which improves the code a bit. llvm-svn: 72326	2009-05-23 13:15:11 +00:00
Evan Cheng	ff129ff17f	Fix test on non-darwin hosts. llvm-svn: 72161	2009-05-20 05:45:36 +00:00
Evan Cheng	e17c02e328	Try again. Allow call to immediate address for ELF or when in static relocation mode. llvm-svn: 72160	2009-05-20 04:53:57 +00:00
Evan Cheng	8a4887572e	Cannot use immediate as call absolute target in PIC mode. llvm-svn: 72154	2009-05-20 01:11:00 +00:00
Dan Gohman	904f081ce7	Add nounwind to a few tests. llvm-svn: 72002	2009-05-18 15:16:49 +00:00
Jakob Stoklund Olesen	fa57451cf5	Help DejaGnu avoid pipe-jam by producing less output from certain test cases. When a test fails with more than a pipeful of output on stdout AND stderr, one of the DejaGnu programs blocks. The problem can be avoided by redirecting stdout to a file. llvm-svn: 71919	2009-05-16 00:34:42 +00:00
Dan Gohman	a09e38894a	Add nounwind to this test. llvm-svn: 71734	2009-05-13 22:29:12 +00:00
Evan Cheng	e43bfc153e	If header of inner loop is aligned, do not align the outer loop header. We don't want to add nops in the outer loop for the sake of aligning the inner loop. llvm-svn: 71609	2009-05-12 23:58:14 +00:00
Evan Cheng	c7f7276825	Teach TransferDeadness to delete truly dead instructions if they do not produce side effects. llvm-svn: 71606	2009-05-12 23:07:00 +00:00
Evan Cheng	b0a4c44103	Add nounwind. llvm-svn: 71575	2009-05-12 18:35:43 +00:00
Evan Cheng	d6e3e4d746	Fixed a stack slot coloring with reg bug: do not update implicit use / def when doing forward / backward propagation. llvm-svn: 71574	2009-05-12 18:31:57 +00:00
Dan Gohman	d13f674130	Factor the code for collecting IV users out of LSR into an IVUsers class, and generalize it so that it can be used by IndVarSimplify. Implement the base IndVarSimplify transformation code using IVUsers. This removes TestOrigIVForWrap and associated code, as ScalarEvolution now has enough builtin overflow detection and folding logic to handle all the same cases, and more. Run "opt -iv-users -analyze -disable-output" on your favorite loop for an example of what IVUsers does. This lets IndVarSimplify eliminate IV casts and compute trip counts in more cases. Also, this happens to finally fix the remaining testcases in PR1301. Now that IndVarSimplify is being more aggressive, it occasionally runs into the problem where ScalarEvolutionExpander's code for avoiding duplicate expansions makes it difficult to ensure that all expanded instructions dominate all the instructions that will use them. As a temporary measure, IndVarSimplify now uses a FixUsesBeforeDefs function to fix up instructions inserted by SCEVExpander. Fortunately, this code is contained, and can be easily removed once a more comprehensive solution is available. llvm-svn: 71535	2009-05-12 02:17:14 +00:00
Evan Cheng	9b27f3ec42	Teach LSR to optimize more loop exit compares, i.e. change them to use postinc iv value. Previously LSR would only optimize those which are in the loop latch block. However, if LSR can prove it is safe (and profitable), it's now possible to change those not in the latch blocks to use postinc values. Also, if the compare is the only use, LSR would place the iv increment instruction before the compare instead of in the latch. llvm-svn: 71485	2009-05-11 22:33:01 +00:00
Dale Johannesen	dd32623987	Fix PR4188. TailMerging can't tolerate inexact successor info. llvm-svn: 71478	2009-05-11 21:54:13 +00:00
Dan Gohman	25ab4c185c	Make this grep line a little more specific so that it doesn't accidentally match something unrelated. llvm-svn: 71458	2009-05-11 18:49:56 +00:00
Dan Gohman	dfa39efe6d	When scalarizing a vector BITCAST, check whether the operand has vector type, rather than assume that it does. If the operand is not vector, it shouldn't be run through ScalarizeVectorOp. This fixes one of the testcases in PR3886. llvm-svn: 71453	2009-05-11 18:30:42 +00:00
Dan Gohman	0edabc8a6f	Convert a subtract into a negate and an add when it helps x86 address folding. llvm-svn: 71446	2009-05-11 18:02:53 +00:00
Dale Johannesen	f86e34065b	Reverse a loop that is counting up to a maximum to count down to 0 instead, under very restricted circumstances. Adjust 4 testcases in which this optimization fires. llvm-svn: 71439	2009-05-11 17:15:42 +00:00
Evan Cheng	06b0d3879e	Enable loop bb placement optimization. llvm-svn: 71291	2009-05-08 23:35:49 +00:00
Chris Lattner	7b2dabcac9	Fix PR4152: asm constraint validation happens before dag combine, so we need to work a bit to combine things like (x+c1+c2) into x+c3. llvm-svn: 71232	2009-05-08 18:23:14 +00:00
Evan Cheng	2a1d20b0fb	Optimize code placement in loop to eliminate unconditional branches or move unconditional branch to the outside of the loop. e.g. /// A: /// ... /// <fallthrough to B> /// /// B: --> loop header /// ... /// jcc <cond> C, [exit] /// /// C: /// ... /// jmp B /// /// ==> /// /// A: /// ... /// jmp B /// /// C: --> new loop header /// ... /// <fallthrough to B> /// /// B: /// ... /// jcc <cond> C, [exit] llvm-svn: 71209	2009-05-08 06:34:09 +00:00
Bill Wendling	864cbcfc46	This doesn't fail. llvm-svn: 71142	2009-05-07 01:41:42 +00:00
Bill Wendling	7c50dcd02e	Temporarily revert r71010. It was causing massive failures during self-hosting. llvm-svn: 71138	2009-05-07 01:27:25 +00:00
Lang Hames	fcc5ebb1d4	Renamed Spiller classes (plus uses and related files) to VirtRegRewriter. llvm-svn: 71057	2009-05-06 02:36:21 +00:00
Evan Cheng	0d781df8dc	Quotes should be printed before private prefix; some code clean up. llvm-svn: 71032	2009-05-05 22:50:29 +00:00
Dan Gohman	5e839321f2	If a MachineBasicBlock has multiple ways of reaching another block, allow it to have multiple CFG edges to that block. This is needed to allow MachineBasicBlock::isOnlyReachableByFallthrough to work correctly. This fixes PR4126. llvm-svn: 71018	2009-05-05 21:10:19 +00:00
Evan Cheng	984da04cd0	Enable stack coloring with regs at -O3. llvm-svn: 71010	2009-05-05 20:30:36 +00:00
Chris Lattner	5cc9a36d1c	Add basic support for code generation of addrspace(257) -> FS relative on x86. Patch by Zoltan Varga! llvm-svn: 70992	2009-05-05 18:52:19 +00:00
Dan Gohman	2973567a95	X86FastISel doesn't support the -tailcallopt ABI. llvm-svn: 70902	2009-05-04 19:50:33 +00:00
Dan Gohman	a79cce4aef	Previously, RecursivelyDeleteDeadInstructions provided an option of returning a list of pointers to Values that are deleted. This was unsafe, because the pointers in the list are, by nature of what RecursivelyDeleteDeadInstructions does, always dangling. Replace this with a simple callback mechanism. This may eventually be removed if all clients can reasonably be expected to use CallbackVH. Use this to factor out the dead-phi-cycle-elimination code from LSR utility function, and generalize it to use the RecursivelyDeleteTriviallyDeadInstructions utility function. This makes LSR more aggressive about eliminating dead PHI cycles; adjust tests to either be less trivial or to simply expect fewer instructions. llvm-svn: 70636	2009-05-02 18:29:22 +00:00
Evan Cheng	b7d41a6680	Mark MOV8mr_NOREX and MOV8rm_NOREX as mayStore / mayLoad respectively. llvm-svn: 70461	2009-04-30 00:58:57 +00:00
Chris Lattner	794fb5b4b3	fix a regression handling indirect results: these need to be considered memory operands otherwise the writebacks get lost when the inline asm doesn't otherwise have side effects. This fixes rdar://6839427, though clang really shouldn't generate these anymore. llvm-svn: 70455	2009-04-30 00:48:50 +00:00
Nate Begeman	b407809122	Fix infinite recursion in the C++ code which handles movddup by making it unnecessary. llvm-svn: 70425	2009-04-29 22:47:44 +00:00
Evan Cheng	62fdc300dd	spillPhysRegAroundRegDefsUses() may have invalidated iterators stored in fixed_ IntervalPtrs. Reset them. llvm-svn: 70378	2009-04-29 07:16:34 +00:00
Bill Wendling	7546bed590	Second attempt: Massive check in. This changes the "-fast" flag to "-O#" in llc. If you want to use the old behavior, the flag is -O0. This change allows for finer-grained control over which optimizations are run at different -O levels. Most of this work was pretty mechanical. The majority of the fixes came from verifying that a "fast" variable wasn't used anymore. The JIT still uses a "Fast" flag. I'll change the JIT with a follow-up patch. llvm-svn: 70343	2009-04-29 00:15:41 +00:00
Anton Korobeynikov	1799ac4b55	Properly print 'P' modifier on inline asm memory operands. This should fix PR3379 and PR4064. Patch inspired by Edwin Török! llvm-svn: 70328	2009-04-28 21:49:33 +00:00
Evan Cheng	754a0d2f9e	Fix PR4034. Bug in LiveInterval::join when it's compacting new valno's. llvm-svn: 70291	2009-04-28 06:24:09 +00:00
Evan Cheng	8a9736a26c	Fix for PR4051. When 2address pass delete an instruction, update kill info when necessary. llvm-svn: 70279	2009-04-28 02:12:36 +00:00
Bill Wendling	ef47ace92f	r70270 isn't ready yet. Back this out. Sorry for the noise. llvm-svn: 70275	2009-04-28 01:04:53 +00:00
Bill Wendling	2799e916c3	Massive check in. This changes the "-fast" flag to "-O#" in llc. If you want to use the old behavior, the flag is -O0. This change allows for finer-grained control over which optimizations are run at different -O levels. Most of this work was pretty mechanical. The majority of the fixes came from verifying that a "fast" variable wasn't used anymore. The JIT still uses a "Fast" flag. I'm not 100% sure if it's necessary to change it there... llvm-svn: 70270	2009-04-28 00:21:31 +00:00
Evan Cheng	c315cf24e3	Fix PR4076. Correctly create live interval of physical register with two-address update. llvm-svn: 70245	2009-04-27 20:42:46 +00:00
Dan Gohman	e1a532cb4f	Permit ChangeCompareStride to rewrite a comparison when the factor between the comparison's iv stride and the candidate stride is exactly -1. llvm-svn: 70244	2009-04-27 20:35:32 +00:00
Dan Gohman	ff30ebd710	Teach getZeroExtendExpr and getSignExtendExpr to use trip-count information to simplify [sz]ext({a,+,b}) to {zext(a),+,[zs]ext(b)}, as appropriate. These functions and the trip count code each call into the other, so this requires careful handling to avoid infinite recursion. During the initial trip count computation, conservative SCEVs are used, which are subsequently discarded once the trip count is actually known. Among other benefits, this change lets LSR automatically eliminate some unnecessary zext-inreg and sext-inreg operation where the operand is an induction variable. llvm-svn: 70241	2009-04-27 20:16:15 +00:00
Nate Begeman	9d121924fd	2nd attempt, fixing SSE4.1 issues and implementing feedback from duncan. PR2957 ISD::VECTOR_SHUFFLE now stores an array of integers representing the shuffle mask internal to the node, rather than taking a BUILD_VECTOR of ConstantSDNodes as the shuffle mask. A value of -1 represents UNDEF. In addition to eliminating the creation of illegal BUILD_VECTORS just to represent shuffle masks, we are better about canonicalizing the shuffle mask, resulting in substantially better code for some classes of shuffles. llvm-svn: 70225	2009-04-27 18:41:29 +00:00
Evan Cheng	43fc90ae59	Fix PR4056. It's possible a physical register def is dead if its implicit use is deleted by two-address pass. llvm-svn: 70213	2009-04-27 17:36:47 +00:00
Dan Gohman	4aeebb184b	Fix the syntax for a PR number in a test. llvm-svn: 70208	2009-04-27 15:08:34 +00:00
Dan Gohman	744f455d55	When transforming sext(trunc(load(x))) into sext(smaller load(x)), the trunc is directly replaced with the smaller load, so don't try to create a new sext node. This fixes PR4050. llvm-svn: 70179	2009-04-27 02:00:55 +00:00
Evan Cheng	696a04eba2	Do not share a single unknown val# for all the live ranges merged into a physical sub-register live interval. When the coalescer is merging a clobbered virtual register live interval into a physical register live interval, give each virtual register val# a separate val# in the physical register live interval. Otherwise, the coalescer would have lost track of the definitions information it needs to make correct coalescing decisions. llvm-svn: 70026	2009-04-25 09:25:19 +00:00
Rafael Espindola	4e7a0bf1f1	Fix PR 4004 by including the call to __tls_get_addr in X86tlsaddr. This is not very elegant, but neither is the tls specification :-( llvm-svn: 69968	2009-04-24 12:59:40 +00:00
Rafael Espindola	0b1037ad26	Revert 69952. Causes testsuite failures on linux x86-64. llvm-svn: 69967	2009-04-24 12:40:33 +00:00
Nate Begeman	c1a09c7dfa	PR2957 ISD::VECTOR_SHUFFLE now stores an array of integers representing the shuffle mask internal to the node, rather than taking a BUILD_VECTOR of ConstantSDNodes as the shuffle mask. A value of -1 represents UNDEF. In addition to eliminating the creation of illegal BUILD_VECTORS just to represent shuffle masks, we are better about canonicalizing the shuffle mask, resulting in substantially better code for some classes of shuffles. A clean up of x86 shuffle code, and some canonicalizing in DAGCombiner is next. llvm-svn: 69952	2009-04-24 03:42:54 +00:00
Dan Gohman	3499a53e1d	Explicitly pass -tailcallopt=false to these tests so that they work as intended no matter what the default setting of that option is. llvm-svn: 69911	2009-04-23 19:39:41 +00:00
Evan Cheng	a36c6c6819	It has finally happened. Spiller is now using live interval info. This fixes a very subtle bug. vr defined by an implicit_def is allowed overlap with any register since it doesn't actually modify anything. However, if it's used as a two-address use, its live range can be extended and it can be spilled. The spiller must take care not to emit a reload for the vn number that's defined by the implicit_def. This is both a correctness and performance issue. llvm-svn: 69743	2009-04-21 22:46:52 +00:00
Evan Cheng	c248188b46	Added a linearscan register allocation optimization. When the register allocator spills an interval with multiple uses in the same basic block, it creates a different virtual register for each of the reloads. e.g. %reg1498<def> = MOV32rm %reg1024, 1, %reg0, 12, %reg0, Mem:LD(4,4) [sunkaddr39 + 0] %reg1506<def> = MOV32rm %reg1024, 1, %reg0, 8, %reg0, Mem:LD(4,4) [sunkaddr42 + 0] %reg1486<def> = MOV32rr %reg1506 %reg1486<def> = XOR32rr %reg1486, %reg1498, %EFLAGS<imp-def,dead> %reg1510<def> = MOV32rm %reg1024, 1, %reg0, 4, %reg0, Mem:LD(4,4) [sunkaddr45 + 0] => %reg1498<def> = MOV32rm %reg2036, 1, %reg0, 12, %reg0, Mem:LD(4,4) [sunkaddr39 + 0] %reg1506<def> = MOV32rm %reg2037, 1, %reg0, 8, %reg0, Mem:LD(4,4) [sunkaddr42 + 0] %reg1486<def> = MOV32rr %reg1506 %reg1486<def> = XOR32rr %reg1486, %reg1498, %EFLAGS<imp-def,dead> %reg1510<def> = MOV32rm %reg2038, 1, %reg0, 4, %reg0, Mem:LD(4,4) [sunkaddr45 + 0] From linearscan's point of view, each of reg2036, 2037, and 2038 are separate registers, each is "killed" after a single use. The reloaded register is available and it's often clobbered right away. e.g. In this case reg1498 is allocated EAX while reg2036 is allocated RAX. This means we end up with multiple reloads from the same stack slot in the same basic block. Now linearscan recognizes there are other reloads from the same SS in the same BB. So it'll "downgrade" RAX (and its aliases) after reg2036 is allocated until the next reload (reg2037) is done. This greatly increases the likelihood reloads from SS are reused. This speeds up sha1 from OpenSSL by 5.8%. It is also an across the board win for SPEC2000 and 2006. llvm-svn: 69585	2009-04-20 08:01:12 +00:00
Dale Johannesen	8a4446429e	Adjust XFAIL syntax, maybe that will help. The other way worked for me... llvm-svn: 69414	2009-04-18 02:01:23 +00:00
Dale Johannesen	05d46aca49	patch 69408 breaks this by removing the opportunity for the optimization it's testing to kick in (although it improves the code, getting rid of all spills). I don't understand the optimization well enough to rescue the test, so XFAILing. llvm-svn: 69409	2009-04-18 00:11:50 +00:00
Rafael Espindola	d74132e2c5	For general dynamic TLS access we must use leaq foo@TLSGD(%rip), %rdi as part of the instruction sequence. Using a register other than %rdi and then copying it to %rdi is not valid. llvm-svn: 69350	2009-04-17 14:35:58 +00:00
Evan Cheng	2d5be54315	Teach spiller to unfold instructions which modref spill slot when a scratch register is available and when it's profitable. e.g. xorq %r12<kill>, %r13 addq %rax, -184(%rbp) addq %r13, -184(%rbp) ==> xorq %r12<kill>, %r13 movq -184(%rbp), %r12 addq %rax, %r12 addq %r13, %r12 movq %r12, -184(%rbp) Two more instructions, but fewer memory accesses. It can also open up opportunities for more optimizations. llvm-svn: 69341	2009-04-17 01:29:40 +00:00
Rafael Espindola	a07d1c3103	fix PR3995. A scale must be 1, 2, 4 or 8. llvm-svn: 69284	2009-04-16 12:34:53 +00:00
Dan Gohman	98aa1d9693	Expand GEPs in ScalarEvolution expressions. SCEV expressions can now have pointer types, though in contrast to C pointer types, SCEV addition is never implicitly scaled. This not only eliminates the need for special code like IndVars' EliminatePointerRecurrence and LSR's own GEP expansion code, it also does a better job because it lets the normal optimizations handle pointer expressions just like integer expressions. Also, since LLVM IR GEPs can't directly index into multi-dimensional VLAs, moving the GEP analysis out of client code and into the SCEV framework makes it easier for clients to handle multi-dimensional VLAs the same way as other arrays. Some existing regression tests show improved optimization. test/CodeGen/ARM/2007-03-13-InstrSched.ll in particular improved to the point where if-conversion started kicking in; I turned it off for this test to preserve the intent of the test. llvm-svn: 69258	2009-04-16 03:18:22 +00:00
Dan Gohman	e1c4d4c5be	Fix the RUN lines so that this test actually tests. llvm-svn: 69096	2009-04-14 22:50:17 +00:00
Dan Gohman	365c457893	For the h-register addressing-mode trick, use the correct value for any non-address uses of the address value. This fixes 186.crafty. llvm-svn: 69094	2009-04-14 22:45:05 +00:00
Dan Gohman	3c19cf07d9	When the result of an EXTRACT_SUBREG, INSERT_SUBREG, or SUBREG_TO_REG operator is used by a CopyToReg to export the value to a different block, don't reuse the CopyToReg's register for the subreg operation result if the register isn't precisely the right class for the subreg operation. Also, rename the h-registers.ll test, now that there are more than one. llvm-svn: 69087	2009-04-14 22:17:14 +00:00
Evan Cheng	b64f2c1b08	Some of GR8_NOREX registers are only available in 64-bit mode. llvm-svn: 69049	2009-04-14 16:57:43 +00:00
Evan Cheng	9f44d3148c	Fix PR3934 part 2. findOnlyInterestingUse() was not setting IsCopy and IsDstPhys which are returned by value and used by callee. This happened to work on the earlier test cases because of a logic error in the caller side. llvm-svn: 69006	2009-04-14 00:32:25 +00:00
Evan Cheng	fa48d5c8d0	PR3934: Fix a bogus two-address pass assertion. llvm-svn: 68979	2009-04-13 20:04:24 +00:00
Dan Gohman	be7227005f	Implement x86 h-register extract support. - Add patterns for h-register extract, which avoids a shift and mask, and in some cases a temporary register. - Add address-mode matching for turning (X>>(8-n))&(255<<n), where n is a valid address-mode scale value, into an h-register extract and a scaled-offset address. - Replace X86's MOV32to32_ and related instructions with the new target-independent COPY_TO_SUBREG instruction. On x86-64 there are complicated constraints on h registers, and CodeGen doesn't currently provide a high-level way to express all of them, so they are handled with a bunch of special code. This code currently only supports extracts where the result is used by a zero-extend or a store, though these are fairly common. These transformations are not always beneficial; since there are only 4 h registers, they sometimes require extra move instructions, and this sometimes increases register pressure because it can force out values that would otherwise be in one of those registers. However, this appears to be relatively uncommon. llvm-svn: 68962	2009-04-13 16:09:41 +00:00
Rafael Espindola	72347bffce	X86-64 TLS support for local exec and initial exec. llvm-svn: 68947	2009-04-13 13:02:49 +00:00
Rafael Espindola	ad8137187c	In X86DAGToDAGISel::MatchWrapper, if base or index are set, avoid matching only if symbolic addresses are RIP relatives. llvm-svn: 68924	2009-04-12 23:00:38 +00:00
Rafael Espindola	412b15f4ed	Add tests for the parts of X86-64 TLS that are already implemented. llvm-svn: 68901	2009-04-12 10:43:41 +00:00
Chris Lattner	6d6cf3ff4a	fix a cross-block fastisel crash handling overflow intrinsics. See comment for details. This fixes rdar://6772169 llvm-svn: 68890	2009-04-12 07:51:14 +00:00
Rafael Espindola	88986ef511	Don't fold a load if the other operand is a TLS address. With this we generate movl %gs:0, %eax leal i@NTPOFF(%eax), %eax instead of movl $i@NTPOFF, %eax addl %gs:0, %eax llvm-svn: 68778	2009-04-10 10:09:34 +00:00
Chris Lattner	301c4f39a0	reg0 references are not real registers. This fixes a crash on the attached testcase. llvm-svn: 68712	2009-04-09 16:50:43 +00:00
Dan Gohman	68de98eef3	Generalize ExtendUsesToFormExtLoad to be usable for ANY_EXTEND, in addition to ZERO_EXTEND and SIGN_EXTEND. Fix a bug in the way it checked for live-out values, and simplify the way it find users by using SDNode::use_iterator's (relatively) new features. Also, make it slightly more permissive on targets with free truncates. In SelectionDAGBuild, avoid creating ANY_EXTEND nodes that are larger than necessary. If the target's SwitchAmountTy has enough bits, use it. This exposes the truncate to optimization early, enabling more optimizations. llvm-svn: 68670	2009-04-09 03:51:29 +00:00
Rafael Espindola	7eb72dc5f2	Re-apply 68552. Tested by bootstrapping llvm-gcc and using that to build llvm. llvm-svn: 68645	2009-04-08 21:14:34 +00:00
Dan Gohman	94fde57da3	Fully escape the grep string for this test. llvm-svn: 68580	2009-04-08 00:54:40 +00:00
Dan Gohman	b979f332fd	Update this test for recent codegen improvements. CodeGen is now using an lea in place of a mov and an add for this test. llvm-svn: 68579	2009-04-08 00:51:11 +00:00
Dan Gohman	c9ce27d6b7	Implement support for modeling implicit-zero-extension on x86-64 with SUBREG_TO_REG, teach SimpleRegisterCoalescing to coalesce SUBREG_TO_REG instructions (which are similar to INSERT_SUBREG instructions), and teach the DAGCombiner to take advantage of this on targets which support it. This eliminates many redundant zero-extension operations on x86-64. This adds a new TargetLowering hook, isZExtFree. It's similar to isTruncateFree, except it only applies to actual definitions, and not no-op truncates which may not zero the high bits. Also, this adds a new optimization to SimplifyDemandedBits: transform operations like x+y into (zext (add (trunc x), (trunc y))) on targets where all the casts are no-ops. In contexts where the high part of the add is explicitly masked off, this allows the mask operation to be eliminated. Fix the DAGCombiner to avoid undoing these transformations to eliminate casts on targets where the casts are no-ops. Also, this adds a new two-address lowering heuristic. Since two-address lowering runs before coalescing, it helps to be able to look through copies when deciding whether commuting and/or three-address conversion are profitable. Also, fix a bug in LiveInterval::MergeInClobberRanges. It didn't handle the case that a clobber range extended both before and beyond an existing live range. In that case, multiple live ranges need to be added. This was exposed by the new subreg coalescing code. Remove 2008-05-06-SpillerBug.ll. It was bugpoint-reduced, and the spiller behavior it was looking for no longer occurs with the new instruction selection. llvm-svn: 68576	2009-04-08 00:15:30 +00:00
Bill Wendling	6e702cf68c	Temporarily revert r68552. This was causing a failure in the self-hosting LLVM builds. --- Reverse-merging (from foreign repository) r68552 into '.': U test/CodeGen/X86/tls8.ll U test/CodeGen/X86/tls10.ll U test/CodeGen/X86/tls2.ll U test/CodeGen/X86/tls6.ll U lib/Target/X86/X86Instr64bit.td U lib/Target/X86/X86InstrSSE.td U lib/Target/X86/X86InstrInfo.td U lib/Target/X86/X86RegisterInfo.cpp U lib/Target/X86/X86ISelLowering.cpp U lib/Target/X86/X86CodeEmitter.cpp U lib/Target/X86/X86FastISel.cpp U lib/Target/X86/X86InstrInfo.h U lib/Target/X86/X86ISelDAGToDAG.cpp U lib/Target/X86/AsmPrinter/X86ATTAsmPrinter.cpp U lib/Target/X86/AsmPrinter/X86IntelAsmPrinter.cpp U lib/Target/X86/AsmPrinter/X86ATTAsmPrinter.h U lib/Target/X86/AsmPrinter/X86IntelAsmPrinter.h U lib/Target/X86/X86ISelLowering.h U lib/Target/X86/X86InstrInfo.cpp U lib/Target/X86/X86InstrBuilder.h U lib/Target/X86/X86RegisterInfo.td llvm-svn: 68560	2009-04-07 22:35:25 +00:00
Rafael Espindola	0324937229	Reduce code duplication on the TLS implementation. This introduces a small regression on the generated code quality in the case we are just computing addresses, not loading values. Will work on it and on X86-64 support. llvm-svn: 68552	2009-04-07 21:37:46 +00:00
Dan Gohman	e98c3b1ea1	Don't attempt to handle aggregate argument values in FastISel; let SelectionDAG do those. This fixes PR3955. llvm-svn: 68546	2009-04-07 20:40:11 +00:00
Nick Lewycky	811a7377b0	Try SSE2? llvm-svn: 68423	2009-04-04 10:24:24 +00:00
Nick Lewycky	49df985ad9	Fix test on non-x86 platforms. llvm-svn: 68419	2009-04-04 07:20:43 +00:00
Dan Gohman	ea48adc739	Fix a TargetLowering optimization so that it doesn't duplicate loads when an input node has multiple uses. llvm-svn: 68398	2009-04-03 20:11:30 +00:00
Mon P Wang	f829fb5cab	Added a x86 dag combine to increase the chances to use a movq for v2i64 on x86-32. llvm-svn: 68368	2009-04-03 02:43:30 +00:00
Evan Cheng	0674a512bf	Fully general expansion of integer shift of any size. llvm-svn: 68134	2009-03-31 19:39:24 +00:00
Dan Gohman	9f7d1f2bd4	Add an explicit -asm-verbose to these tests, to make it possible to run the tests with -asm-verbose defaulting to false. llvm-svn: 68124	2009-03-31 18:20:47 +00:00
Owen Anderson	59cff6919d	Remove the "fast" cases for spill and restore point determination, as these were subtly wrong in obscure cases. Patch the testcase to account for this change. llvm-svn: 68093	2009-03-31 08:27:09 +00:00
Dan Gohman	cba99ee717	Fix live-out reg logic to not insert over-aggressive AssertZExt instructions. This fixes lua. llvm-svn: 68083	2009-03-31 01:38:29 +00:00
Evan Cheng	d7824e208a	Turn a 2-address instruction into a 3-address one when it's profitable even if the two-address operand is killed. e.g. %reg1024<def> = MOV r1 %reg1025<def> = ADD %reg1024, %reg1026 r0 = MOV %reg1025 If it's not possible / profitable to commute ADD, then turning ADD into a LEA saves a copy. llvm-svn: 68065	2009-03-30 21:34:07 +00:00
Anton Korobeynikov	497fd0e996	Tweak test for recent relro stuff llvm-svn: 68035	2009-03-30 15:28:40 +00:00
Evan Cheng	5c460dbc3d	Forgot this test. llvm-svn: 68025	2009-03-30 06:17:34 +00:00
Anton Korobeynikov	d24f576124	Testcase for recent ro/relocs stuff llvm-svn: 68008	2009-03-29 17:14:57 +00:00
Arnold Schwaighofer	76188bc8a1	Make check in CheckTailCallReturnConstraints for ignorable instructions between a CALL and a RET node more generic. Add a test for tail calls with a void return. llvm-svn: 67943	2009-03-28 12:36:29 +00:00
Arnold Schwaighofer	636127325b	Enable tail call optimization for functions that return a struct (bug 3664) and for functions that return types that need extending (e.g i1). llvm-svn: 67934	2009-03-28 08:33:27 +00:00
Evan Cheng	a15fdaa292	Optimize some 64-bit multiplication by constants into two lea's or one lea + shl since imulq is slow (latency 5). e.g. x * 40 => shlq $3, %rdi leaq (%rdi,%rdi,4), %rax This has the added benefit of allowing more multiply to be folded into addressing mode. e.g. a * 24 + b => leaq (%rdi,%rdi,2), %rax leaq (%rsi,%rax,8), %rax llvm-svn: 67917	2009-03-28 05:57:29 +00:00
Evan Cheng	86f6af35bf	Add -march=x86. llvm-svn: 67783	2009-03-26 23:03:32 +00:00
Bill Wendling	e59b8d1cad	Add -f to RUN line. llvm-svn: 67744	2009-03-26 06:17:54 +00:00
Chris Lattner	ad28e0de5f	no need for eh info llvm-svn: 67740	2009-03-26 05:51:18 +00:00
Bill Wendling	3b9278a1c5	Add testcase for r67728. llvm-svn: 67729	2009-03-26 01:52:47 +00:00
Evan Cheng	bddc7d1032	Add a test case for PR3779: when to promote the function return value. llvm-svn: 67702	2009-03-25 20:30:19 +00:00
Evan Cheng	7e4217176a	Revert 67132. This is breaking some objective-c apps. Also fixes SDISel so it does not force promote return value if the function is not marked signext / zeroext. llvm-svn: 67701	2009-03-25 20:20:11 +00:00
Evan Cheng	3a7489a4cc	CodeGen still defaults to non-verbose asm, but llc now overrides it and defaults to verbose. llvm-svn: 67668	2009-03-25 01:47:28 +00:00
Dan Gohman	edd5fa3721	Add a testcase for the scheduling heuristic introduced in r67586. llvm-svn: 67622	2009-03-24 16:38:27 +00:00
Evan Cheng	b3196f1298	Do not emit comments unless -asm-verbose. llvm-svn: 67580	2009-03-24 00:17:40 +00:00
Evan Cheng	702a8b4399	Fix a bug in spill weight computation. If the alias is a super-register, and the super-register is in the register class we are trying to allocate, then add the weight to all sub-registers of the super-register even if they are not aliases. e.g. allocating for GR32, bh is not used, updating bl spill weight. bl should get the same spill weight otherwise it will be chosen as a spill candidate since spilling bh doesn't make ebx available. This fixes PR2866. llvm-svn: 67574	2009-03-23 22:57:19 +00:00
Dale Johannesen	34123aba43	Fix internal representation of fp80 to be the same as a normal i80 {low64, high16} rather than its own {high64, low16}. A depressing number of places know about this; I think I got them all. Bitcode readers and writers convert back to the old form to avoid breaking compatibility. llvm-svn: 67562	2009-03-23 21:16:53 +00:00
Evan Cheng	e09988f66b	Update test for pr3864. llvm-svn: 67545	2009-03-23 18:27:36 +00:00
Evan Cheng	7e4a6972d6	Fix PR3391 and PR3864. Reg allocator infinite looping. llvm-svn: 67544	2009-03-23 18:24:37 +00:00
Evan Cheng	2ec94dd447	Model inline asm constraint which ties an input to an output register as machine operand TIED_TO constraint. This eliminates the need to pre-allocate registers for these. This also allows the register allocator to eliminate the unneeded copies. llvm-svn: 67512	2009-03-23 08:01:15 +00:00
Evan Cheng	4b11d96b62	Do not fold away subreg_to_reg if the source register has a sub-register index. That means the source register is taking a sub-register of a larger register. e.g. On x86 %RAX<def> = ... %RAX<def> = SUBREG_TO_REG 0, %EAX:3<kill>, 3 The first def is defining RAX, not EAX so the top bits were not zero-extended. llvm-svn: 67511	2009-03-23 07:19:58 +00:00
Rafael Espindola	4461c91700	Add -relocation-model=pic so that the test works both in Linux and Darwin. llvm-svn: 67191	2009-03-18 09:38:28 +00:00

1463 Commits