llvm-mirror

mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-10-30 23:42:52 +01:00

Author	SHA1	Message	Date
Joey Gouly	355a09f268	[ARMv8] Add CodeGen support for VSEL. This uses the ARMcmov pattern that Tim cleaned up in r188995. Thanks to Simon Tatham for his floating point help! llvm-svn: 189024	2013-08-22 15:29:11 +00:00
Joey Gouly	67edac2b5e	[ARM] Constrain some register classes in EmitAtomicBinary64 so that we pass these tests with -verify-machineinstrs. llvm-svn: 189006	2013-08-22 12:19:24 +00:00
Logan Chien	2891b0c61c	Fix ARM FastISel PIC function call. The function call to external function should come with PLT relocation type if the PIC relocation model is used. llvm-svn: 189002	2013-08-22 12:08:04 +00:00
Tim Northover	eb7a86ed88	ARM: use TableGen patterns to select CMOV operations. Back in the mists of time (2008), it seems TableGen couldn't handle the patterns necessary to match ARM's CMOV node that we convert select operations to, so we wrote a lot of fairly hairy C++ to do it for us. TableGen can deal with it now: there were a few minor differences to CodeGen (see tests), but nothing obviously worse that I could see, so we should probably address anything that does come up in a localised manner. llvm-svn: 188995	2013-08-22 09:57:11 +00:00
Tim Northover	7e34f41ef8	ARM: respect tied 64-bit inlineasm operands when printing The code for 'Q' and 'R' operand modifiers needs to look through tied operands to discover the register class. llvm-svn: 188990	2013-08-22 06:51:04 +00:00
Michael Gottesman	e5ccfcac27	[stackprotector] When finding the split point to splice off the end of a parentmbb into a successmbb, include any DBG_VALUE MI. Fix for PR16954. llvm-svn: 188987	2013-08-22 05:40:50 +00:00
Jim Grosbach	d6ff6507fa	ARM: R9 is not safe to use for tcGPR. Indirect tail-calls shouldn't use R9 for the branch destination, as it's not reliably a call-clobbered register. rdar://14793425 llvm-svn: 188967	2013-08-22 00:14:24 +00:00
Tom Stellard	721e3acccd	SelectionDAG: Make sure stores are always added to the LegalizedNodes list When truncated vector stores were being custom lowered in VectorLegalizer::LegalizeOp(), the old (illegal) and new (legal) node pair was not being added to the LegalizedNodes list. Instead of the legalized result being passed to VectorLegalizer::TranslateLegalizeResult(), the result was being passed back into VectorLegalizer::LegalizeOp(), which ended up adding a (new, new) pair to the list instead. This was causing an assertion failure when a custom lowered truncated vector store was the last instruction in a basic block and the VectorLegalizer was unable to find it in the LegalizedNodes list when updating the DAG root. llvm-svn: 188953	2013-08-21 22:42:58 +00:00
Manman Ren	1b7973fc19	TBAA: remove !tbaa from testing cases when they are not needed. This will make it easier to turn on struct-path aware TBAA since the metadata format will change. llvm-svn: 188944	2013-08-21 22:20:53 +00:00
Juergen Ributzka	d12ce0859f	Teach BaseIndexOffset::match to identify base pointers in loops. The small utility function that pattern matches Base + Index + Offset patterns for loads and stores fails to recognize the base pointer for loads/stores from/into an array at offset 0 inside a loop. As a result DAGCombiner::MergeConsecutiveStores was not able to merge all stores. This commit fixes the issue by adding an additional pattern match and also a test case. Reviewer: Nadav llvm-svn: 188936	2013-08-21 21:53:38 +00:00
Hao Liu	7962606ca8	A minor change for an obvious problem caused by r188451: def imm0_63 : Operand<i32>, ImmLeaf<i32, [{ return Imm >= 0 && Imm < 63;}]>{ It seems Imm < 63 should be Imm <= 63. ImmLeaf is used in pattern matching, but there is already a function that checks the shift amount range, so just remove ImmLeaf. Also add a test to check 63. llvm-svn: 188911	2013-08-21 17:47:53 +00:00
Joey Gouly	ec3b9aa53e	Add -mcpu to two X86 tests. These tests are failing on Haswell CPUs due to different instruction selection. llvm-svn: 188908	2013-08-21 17:14:31 +00:00
Elena Demikhovsky	44bbb2b413	AVX-512: Added SHIFT instructions. llvm-svn: 188899	2013-08-21 09:36:02 +00:00
Richard Sandiford	1dc05c13d2	[SystemZ] Define remaining *MUL_LOHI patterns The initial port used MLG(R) for i64 UMUL_LOHI but left the other three combinations as not-legal-or-custom. Although 32x32->{32,32} multiplications exist, they're not as quick as doing a normal 64-bit multiplication, so it didn't seem like i32 SMUL_LOHI and UMUL_LOHI would be useful. There's also no direct instruction for i64 SMUL_LOHI, so it needs to be implemented in terms of UMUL_LOHI. However, not defining these patterns means that we don't convert division by a constant into multiplication, so this patch fills in the other cases. The new i64 SMUL_LOHI sequence is simpler than the one that we used previously for 64x64->128 multiplication, so int-mul-08.ll now tests the full sequence. llvm-svn: 188898	2013-08-21 09:34:56 +00:00
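The commit above says i64 SMUL_LOHI has to be built from UMUL_LOHI. As a rough illustration, the Python sketch below uses the standard two's-complement correction (subtract the other operand from the high half for each negative input). This is an assumption about the general technique, not the instruction sequence the patch actually emits, and the helper names `umul_lohi` and `smul_lohi` are hypothetical.

```python
BITS = 64
MASK = (1 << BITS) - 1

def umul_lohi(a, b):
    """Unsigned 64x64 -> (low 64 bits, high 64 bits)."""
    product = (a & MASK) * (b & MASK)
    return product & MASK, product >> BITS

def smul_lohi(a, b):
    """Signed 64x64 -> (low, high) built on the unsigned multiply.

    a and b are Python ints in the signed 64-bit range. A negative operand
    reads as (value + 2**64) when treated as unsigned, so the unsigned high
    half is too large by the other operand; subtract it back out.
    """
    lo, hi = umul_lohi(a, b)
    if a < 0:
        hi -= b & MASK
    if b < 0:
        hi -= a & MASK
    return lo, hi & MASK

def reference(a, b):
    """Exact 128-bit two's-complement product, split into halves."""
    product = (a * b) & ((1 << (2 * BITS)) - 1)
    return product & MASK, product >> BITS
```

Comparing `smul_lohi` with `reference` on a few signed inputs (including the most negative 64-bit value) is a quick way to check the correction terms.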
Richard Sandiford	e6e07910e3	[SystemZ] Use FI[EDX]BRA for codegen llvm-svn: 188895	2013-08-21 09:04:20 +00:00
Akira Hatanaka	a80bdd3ab0	[mips] Add support for mfhc1 and mthc1. llvm-svn: 188848	2013-08-20 23:47:25 +00:00
Reed Kotler	3e323b240e	Add an option which permits the user to specify using a bitmask, that various functions be compiled as mips32, without having to add attributes. This is useful in certain situations where you don't want to have to edit the function attributes in the source. For now it's only an option used for the compiler developers when debugging the mips16 port. llvm-svn: 188826	2013-08-20 20:53:09 +00:00
Jim Grosbach	343f1fbc39	ARM: Fix fast-isel copy/paste-o. Update testcase to be more careful about checking register values. While regexes are general goodness for these sorts of testcases, in this example, the registers are constrained by the calling convention, so we can and should check their explicit values. rdar://14779513 llvm-svn: 188819	2013-08-20 19:12:42 +00:00
Elena Demikhovsky	f09dad5d90	AVX-512: Added more patterns for VMOVSS, VMOVSD, VMOVD, VMOVQ llvm-svn: 188786	2013-08-20 11:00:29 +00:00
Daniel Sanders	30561c36b8	[mips][msa] Removed fcge, fcgt, fsge, fsgt These instructions were present in a draft spec but were removed before publication. llvm-svn: 188782	2013-08-20 09:41:47 +00:00
Richard Sandiford	add1a68f21	[SystemZ] Use SRST to optimize memchr SystemZTargetLowering::emitStringWrapper() previously loaded the character into R0 before the loop and made R0 live on entry. I'd forgotten that allocatable registers weren't allowed to be live across blocks at this stage, and it confused LiveVariables enough to cause a miscompilation of f3 in memchr-02.ll. This patch instead loads R0 in the loop and leaves LICM to hoist it after RA. This is actually what I'd tried originally, but I went for the manual optimisation after noticing that R0 often wasn't being hoisted. This bug forced me to go back and look at why, now fixed as r188774. We should also try to optimize null checks so that they test the CC result of the SRST directly. The select between null and the SRST GPR result could then usually be deleted as dead. llvm-svn: 188779	2013-08-20 09:38:48 +00:00
Daniel Sanders	91c40d80de	[mips][msa] Added insve llvm-svn: 188777	2013-08-20 09:22:54 +00:00
Richard Sandiford	6a0b1638b4	Fix test typo and add usual "br %r14" test llvm-svn: 188775	2013-08-20 09:14:46 +00:00
Richard Sandiford	fcd54a3b89	Fix overly pessimistic shortcut in post-RA MachineLICM Post-RA LICM keeps three sets of registers: PhysRegDefs, PhysRegClobbers and TermRegs. When it sees a definition of R it adds all aliases of R to the corresponding set, so that when it needs to test for membership it only needs to test a single register, rather than worrying about aliases there too. E.g. the final candidate loop just has: unsigned Def = Candidates[i].Def; if (!PhysRegClobbers.test(Def) && ...) { to test whether register Def is multiply defined. However, there was also a shortcut in ProcessMI to make sure we didn't add candidates if we already knew that they would fail the final test. This shortcut was more pessimistic than the final one because it checked whether _any alias_ of the defined register was multiply defined. This is too conservative for targets that define register pairs. E.g. on z, R0 and R1 are sometimes used as a pair, so there is a 128-bit register that aliases both R0 and R1. If a loop used R0 and R1 independently, and the definition of R0 came first, we would be able to hoist the R0 assignment (because that used the final test quoted above) but not the R1 assignment (because that meant we had two definitions of the paired R0/R1 register and would fail the shortcut in ProcessMI). This patch just uses the same check for the ProcessMI shortcut as we use in the final candidate loop. llvm-svn: 188774	2013-08-20 09:11:13 +00:00
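The difference between the two checks in the commit above can be modelled in a few lines. The Python sketch below is a simplified reconstruction from the commit message only: the register names, the `ALIASES` table and the `hoistable_defs` function are hypothetical, and the real pass tracks far more state (uses, terminators, and so on).

```python
# Hypothetical alias sets: R0R1 stands for the 128-bit pair covering R0 and R1.
ALIASES = {
    "R0": {"R0", "R0R1"},
    "R1": {"R1", "R0R1"},
    "R0R1": {"R0", "R1", "R0R1"},
}

def hoistable_defs(defs, pessimistic):
    """Return the registers whose single definition could be hoisted.

    defs is the sequence of registers defined in the loop, in order.
    pessimistic=True models the old shortcut (any alias multiply defined);
    pessimistic=False models the patched shortcut (only the register itself).
    """
    phys_reg_defs, phys_reg_clobbers = set(), set()
    candidates = []
    for reg in defs:
        ruled_out = False
        for alias in sorted(ALIASES[reg]):
            if alias in phys_reg_defs:
                phys_reg_clobbers.add(alias)   # second definition seen
                if pessimistic:
                    ruled_out = True           # old: any alias disqualifies
            phys_reg_defs.add(alias)
        if not pessimistic and reg in phys_reg_clobbers:
            ruled_out = True                   # new: same test as the final loop
        if not ruled_out:
            candidates.append(reg)
    # Final candidate loop: only the defined register itself is tested.
    return [reg for reg in candidates if reg not in phys_reg_clobbers]
```

With `defs = ["R0", "R1"]`, the pessimistic variant keeps only R0 while the patched variant keeps both, which is the situation the commit describes.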
Tim Northover	cec1079024	ARM: implement some simple f64 materializations. Previously we used a const-pool load for virtually all 64-bit floating values. Actually, we can get quite a few common values (including 0.0, 1.0) via "vmov" instructions of one stripe or another. llvm-svn: 188773	2013-08-20 08:57:11 +00:00
Daniel Sanders	15341e9a12	[mips][msa] Added and.v, bmnz.v, bmz.v, bsel.v, nor.v, or.v, xor.v llvm-svn: 188767	2013-08-20 08:38:21 +00:00
Hal Finkel	4bb40e7c8d	Don't form PPC CTR-based loops around a copysignl call copysign/copysignf never become function calls (because the SDAG expansion code does not lower to the corresponding function call, but rather directly implements the associated logic), but copysignl almost always is lowered into a call to the requested libm function (and, thus, might clobber CTR). llvm-svn: 188727	2013-08-19 23:35:24 +00:00
Paul Redmond	404ef5af36	Improve the widening of integral binary vector operations - split WidenVecRes_Binary into WidenVecRes_Binary and WidenVecRes_BinaryCanTrap - WidenVecRes_BinaryCanTrap preserves the original behaviour for operations that can trap - WidenVecRes_Binary simply widens the operation and improves codegen for 3-element vectors by allowing widening and promotion on x86 (matches the behaviour of unary and ternary operation widening) - use WidenVecRes_Binary for operations on integers. Reviewed by: nrotem llvm-svn: 188699	2013-08-19 20:01:35 +00:00
Elena Demikhovsky	f1afd2e4db	AVX-512: added arithmetic and logical operations. ADD, SUB, MUL integer and FP types. OR, AND, XOR. Added embedded broadcast form for these instructions. llvm-svn: 188673	2013-08-19 13:26:14 +00:00
Richard Sandiford	5e32ef0acd	[SystemZ] Add negative integer absolute (load negative) For now this matches the equivalent of (neg (abs ...)), which did hit a few times in projects/test-suite. We should probably also match cases where absolute-like selects are used with reversed arguments. llvm-svn: 188671	2013-08-19 12:56:58 +00:00
Richard Sandiford	aee0958460	[SystemZ] Add integer absolute (load positive) llvm-svn: 188670	2013-08-19 12:48:54 +00:00
Richard Sandiford	841d24aa5a	[SystemZ] Add support for sibling calls This first cut is pretty conservative. The final argument register (R6) is call-saved, so we would need to make sure that the R6 argument to a sibling call is the same as the R6 argument to the calling function, which seems worth keeping as a separate patch. Saying that integer truncations are free means that we no longer use the extending instructions LGF and LLGF for spills in int-conv-09.ll and int-conv-10.ll. Instead we treat the registers as 64 bits wide and truncate them to 32-bits where necessary. I think it's unlikely we'd use LGF and LLGF for spills in other situations for the same reason, so I'm removing the tests rather than replacing them. The associated code is generic and applies to many more instructions than just LGF and LLGF, so there is no corresponding code removal. llvm-svn: 188669	2013-08-19 12:42:31 +00:00
Hal Finkel	23714b2e52	Add ExpandFloatOp_FCOPYSIGN to handle ppcf128-related expansions We had previously been asserting when faced with a FCOPYSIGN f64, ppcf128 node because there was no way to expand the FCOPYSIGN node. Because ppcf128 is the sum of two doubles, and the first double must have the larger magnitude, we can take the sign from the first double. As a result, in addition to fixing the crash, this is also an optimization. llvm-svn: 188655	2013-08-19 06:55:37 +00:00
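The reasoning in the commit above (the first double dominates, so it carries the sign of the sum) can be checked with a small model. The Python sketch below represents a ppcf128 value as a hypothetical `(hi, lo)` pair of doubles with `|hi| >= |lo|`; the function names are illustrative only and nonzero values are assumed.

```python
import math
from fractions import Fraction

def copysign_ppcf128(magnitude, hi, lo):
    """Copy the sign of the double-double value (hi, lo) onto magnitude.

    Only hi is consulted, which is the shortcut the commit describes.
    """
    return math.copysign(magnitude, hi)

def exact_sign(hi, lo):
    """Sign (-1, 0 or 1) of the exact sum hi + lo, computed with rationals."""
    total = Fraction(hi) + Fraction(lo)
    return (total > 0) - (total < 0)
```

For a pair such as `(1.0, -2.0**-60)` the low part is negative, yet the exact sum is still positive, so taking the sign from `hi` alone gives the right answer.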
Hal Finkel	9591220c33	Add the PPC fcpsgn instruction Modern PPC cores support a floating-point copysign instruction, and we can use this to lower the FCOPYSIGN node (which is created from calls to the libm copysign function). A couple of extra patterns are necessary because the operand types of FCOPYSIGN need not agree. llvm-svn: 188653	2013-08-19 05:01:02 +00:00
Tim Northover	057a4d7c26	ARM: make sure we keep inline asm operands tied. When patching inlineasm nodes to use GPRPair for 64-bit values, we were dropping the information that two operands were tied, which effectively broke the live-interval of vregs affected. llvm-svn: 188643	2013-08-18 18:06:03 +00:00
Elena Demikhovsky	406cf0ea6d	AVX-512: Added VMOVD, VMOVQ, VMOVSS, VMOVSD instructions. llvm-svn: 188637	2013-08-18 13:08:57 +00:00
Tom Stellard	ad43e88afa	R600: Expand vector FRINT ops Reviewed-by: Michel Dänzer <michel.daenzer@amd.com> llvm-svn: 188598	2013-08-16 23:51:33 +00:00
Tom Stellard	e42573d2cc	R600: Expand vector FFLOOR ops Reviewed-by: Michel Dänzer <michel.daenzer@amd.com> llvm-svn: 188597	2013-08-16 23:51:29 +00:00
Tom Stellard	0721bae8ba	R600: Expand vector float operations for both SI and R600 Reviewed-by: Michel Dänzer <michel.daenzer@amd.com> llvm-svn: 188596	2013-08-16 23:51:24 +00:00
Jim Grosbach	58bafdb9f1	ARM: Properly constrain comparison fastisel register classes. Ongoing 'make the verifier happy' improvements to ARM fast-isel. rdar://12594152 llvm-svn: 188595	2013-08-16 23:37:40 +00:00
Jim Grosbach	4829862a24	ARM: Fast-isel register class constrain for extends. Properly constrain the operand register class for instructions used in [sz]ext expansion. Update more tests to use the verifier now that we're getting the register classes correct. rdar://12594152 llvm-svn: 188594	2013-08-16 23:37:36 +00:00
Jim Grosbach	de05043e78	ARM: Fix more fast-isel verifier failures. Teach the generic instruction selection helper functions to constrain the register classes of their input operands. For non-physical register references, the generic code needs to be careful not to mess that up when replacing references to result registers. As the comment indicates for MachineRegisterInfo::replaceRegWith(), it's important to call constrainRegClass() first. rdar://12594152 llvm-svn: 188593	2013-08-16 23:37:31 +00:00
Jim Grosbach	7f992f45da	ARM: Clean up fast-isel machine verifier errors. Lots of machine verifier errors result from using a plain GPR regclass for incoming argument copies. A more restrictive rGPR class is more appropriate since it more accurately represents what's happening, plus it lines up better with isel later on so the verifier is happier. Reduces the number of ARM fast-isel tests not running with the verifier enabled by over half. rdar://12594152 llvm-svn: 188592	2013-08-16 23:37:23 +00:00
Reed Kotler	4a69818916	Fix a subtle difference between running clang vs llc for mips16. This regards how mips16 is viewed. It's not really a target type but there has always been a target for it in the td files. It's more properly -mcpu=mips32 -mattr=+mips16 . This is how clang treats it but we have always had the -mcpu=mips16 which I probably should delete now but it will require updating all the .ll test cases for mips16. In this case it changed how we decide if we have a count bits instruction and whether instruction lowering should then expand ctlz. Now that we have dual mode compilation, -mattr=+mips16 really just indicates the initial processor mode that we are compiling for. (It is also possible to have -mcpu=64 -mattr=+mips16 but as far as I know, nobody has even built such a processor, though there is an architecture manual for this). llvm-svn: 188586	2013-08-16 23:05:18 +00:00
Daniel Dunbar	e491451730	[tests] Another attempt to workaround broken misched-copy.s test on some buildbots. llvm-svn: 188567	2013-08-16 18:01:18 +00:00
Michel Danzer	65d5ad5728	R600/SI: Add pattern for xor of i1 Fixes two recent piglit regressions with radeonsi. Reviewed-by: Tom Stellard <thomas.stellard@amd.com> llvm-svn: 188559	2013-08-16 16:19:31 +00:00
Michel Danzer	acc130ec54	R600/SI: Fix broken encoding of DS_WRITE_B32 The logic in SIInsertWaits::getHwCounts() only really made sense for SMRD instructions, and trying to shoehorn it into handling DS_WRITE_B32 caused it to corrupt the encoding of that by clobbering the first operand with the second one. Undo that damage and only apply the SMRD logic to that. Fixes some derivatives-related piglit regressions with radeonsi. Reviewed-by: Tom Stellard <thomas.stellard@amd.com> llvm-svn: 188558	2013-08-16 16:19:24 +00:00
Benjamin Kramer	700f0ccf14	When initializing the PIC global base register on ARM/ELF add pc to fix the address. This unbreaks PIC with fast isel on ELF targets (PR16717). The output matches what GCC and SDag do for PIC but may not cover all of the many flavors of PIC that exist. llvm-svn: 188551	2013-08-16 12:52:08 +00:00
Richard Sandiford	06a13f49c8	[SystemZ] Use SRST to implement strlen and strnlen It would also make sense to use it for memchr; I'm working on that now. llvm-svn: 188547	2013-08-16 11:41:43 +00:00
Richard Sandiford	93a75a2a56	[SystemZ] Use MVST to implement strcpy and stpcpy llvm-svn: 188546	2013-08-16 11:29:37 +00:00
Richard Sandiford	353c7bc810	[SystemZ] Use CLST to implement strcmp llvm-svn: 188544	2013-08-16 11:21:54 +00:00
Richard Sandiford	159b694b6e	[SystemZ] Fix handling of 64-bit memcmp results Generalize r188163 to cope with return types other than MVT::i32, just as the existing visitMemCmpCall code did. I've split this out into a subroutine so that it can be used for other upcoming patches. I also noticed that I'd used the wrong API to record the out chain. It's a load that uses DAG.getRoot() rather than getRoot(), so the out chain should go on PendingLoads. I don't have a testcase for that because we don't do any interesting scheduling on z yet. llvm-svn: 188540	2013-08-16 10:55:47 +00:00
Richard Sandiford	7d2dfd7cf5	[SystemZ] Fix sign of integer memcmp result r188163 used CLC to implement memcmp. Code that compares the result directly against zero can test the CC value produced by CLC, but code that needs an integer result must use IPM. The sequence I'd used was: ipm <reg> sll <reg>, 2 sra <reg>, 30 but I'd forgotten that this inverts the order, so that CC==1 ("less") becomes an integer greater than zero, and CC==2 ("greater") becomes an integer less than zero. This sequence should only be used if the CLC arguments are reversed to compensate. The problem then is that the branch condition must also be reversed when testing the CLC result directly. Rather than do that, I went for a different sequence that works with the natural CLC order: ipm <reg> srl <reg>, 28 rll <reg>, <reg>, 31 One advantage of this is that it doesn't clobber CC. A disadvantage is that any sign extension to 64 bits must be done separately, rather than being folded into the shifts. llvm-svn: 188538	2013-08-16 10:22:54 +00:00
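The two shift sequences quoted in the commit above can be traced with a small bit-level model. The Python sketch below assumes IPM leaves the condition code in bits 29:28 of the 32-bit register with bits 31:30 cleared, and it ignores the program mask and the unchanged low bits; the function names are hypothetical.

```python
MASK32 = 0xFFFFFFFF

def to_signed32(value):
    """Interpret a 32-bit pattern as a two's-complement integer."""
    return value - (1 << 32) if value & 0x80000000 else value

def ipm(cc):
    """Model IPM under the assumed layout: CC in bits 29:28, bits 31:30 zero."""
    return (cc & 3) << 28

def old_sequence(cc):
    """ipm; sll 2; sra 30 (the sequence that inverts the order)."""
    reg = ipm(cc)
    reg = (reg << 2) & MASK32            # sll <reg>, 2
    return to_signed32(reg) >> 30        # sra <reg>, 30 (arithmetic shift)

def new_sequence(cc):
    """ipm; srl 28; rll 31 (the sequence that keeps the natural order)."""
    reg = ipm(cc)
    reg >>= 28                           # srl <reg>, 28
    reg = ((reg << 31) | (reg >> 1)) & MASK32   # rll <reg>, <reg>, 31
    return to_signed32(reg)
```

Under these assumptions, CC==1 ("less") comes out positive from `old_sequence` and negative from `new_sequence`, and CC==2 ("greater") does the opposite, matching the description in the commit message.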
Craig Topper	79189e25c8	Don't use v16i32 for load pattern matching. All 512-bit loads are cast to v8i64. llvm-svn: 188534	2013-08-16 06:07:34 +00:00
Daniel Dunbar	13fbfa523f	[tests] Add a hack to eliminate some dangling .s files on buildbots. - Benjamin fixed the emission of this file in r179937, but it still lives on a few buildbots. We should probably clean up the build dirs once in a while, eh? llvm-svn: 188527	2013-08-16 02:54:00 +00:00
Daniel Dunbar	333625bd77	[tests] Remove an out-dated failing test. llvm-svn: 188526	2013-08-16 02:53:29 +00:00
Tom Stellard	77968acef1	Revert "R600/SI: Fix incorrect encoding of DS_WRITE_B32 instructions" This reverts commit a6a39ced095c2f453624ce62c4aead25db41a18f. This is the wrong version of this fix. llvm-svn: 188523	2013-08-16 01:18:43 +00:00
Tom Stellard	25dbdabc12	R600/SI: Fix incorrect encoding of DS_WRITE_B32 instructions The SIInsertWaits pass was overwriting the first operand (gds bit) of DS_WRITE_B32 with the second operand (value to write). This meant that any time the value to write was stored in an odd number VGPR, the gds bit would be set causing the instruction to write to GDS instead of LDS. llvm-svn: 188522	2013-08-16 01:12:20 +00:00
Tom Stellard	284558892e	R600: Add support for global vector loads with element types less than 32-bits Tested-by: Aaron Watry <awatry@gmail.com> llvm-svn: 188521	2013-08-16 01:12:16 +00:00
Tom Stellard	c42a38e3ad	R600: Add support for global vector stores with elements less than 32-bits Tested-by: Aaron Watry <awatry@gmail.com> llvm-svn: 188520	2013-08-16 01:12:11 +00:00
Tom Stellard	8d9a460dad	R600: Add support for i16 and i8 global stores Tested-by: Aaron Watry <awatry@gmail.com> llvm-svn: 188519	2013-08-16 01:12:06 +00:00
Tom Stellard	f0f0f6e071	R600: Add support for v4i32 stores on Cayman Tested-by: Aaron Watry <awatry@gmail.com> llvm-svn: 188518	2013-08-16 01:12:00 +00:00
Tom Stellard	9da87c6553	R600: Enable folding of inline literals into REG_SEQUENCE instructions Tested-by: Aaron Watry <awatry@gmail.com> llvm-svn: 188517	2013-08-16 01:11:55 +00:00
Tom Stellard	8061257aaf	R600: Change the RAT instruction assembly names so they match the docs Tested-by: Aaron Watry <awatry@gmail.com> llvm-svn: 188515	2013-08-16 01:11:46 +00:00
Daniel Dunbar	a496d61c01	[tests] Cleanup initialization of test suffixes. - Instead of setting the suffixes in a bunch of places, just set one master list in the top-level config. We now only modify the suffix list in a few suites that have one particular unique suffix (.ml, .mc, .yaml, .td, .py). - Aside from removing the need for a bunch of lit.local.cfg files, this enables 4 tests that were inadvertently being skipped (one in Transforms/BranchFolding, a .s file each in DebugInfo/AArch64 and CodeGen/PowerPC, and one in CodeGen/SI which is now failing and has been XFAILED). - This commit also fixes a bunch of config files to use config.root instead of older copy-pasted code. llvm-svn: 188513	2013-08-16 00:37:11 +00:00
Jack Carter	2c2f78cead	[Mips][msa] Added the simple builtins (madd_q to xori) Includes: madd_q, maddr_q, maddv, max_[asu], maxi_[su], min_[asu], mini_[su], mod_[su], msub_q, msubr_q, msubv, mul_q, mulr_q, mulv, nloc, nlzc, nori, ori, pckev, pckod, pcnt, sat_[su], shf, sld, sldi, sll, slli, splat, splati, sr[al], sr[al]i, subs_[su], subss_u, subus_s, subv, subvi, vshf, xori Patch by Daniel Sanders llvm-svn: 188460	2013-08-15 14:22:07 +00:00
Jack Carter	8798c3bae2	[Mips][msa] Added the simple builtins (fadd to ftq) Includes: fadd, fceq, fcg[et], fclass, fcl[et], fcne, fcun, fdiv, fexdo, fexp2, fexup[lr], ffint_[su], ffql, ffqr, fill, flog2, fmadd, fmax, fmax_a, fmin, fmin_a, fmsub, fmul, frint, frcp, frsqrt, fseq, fsge, fsgt, fsle, fslt, fsne, fsqr, fsub, ftint_s, ftq Patch by Daniel Sanders llvm-svn: 188458	2013-08-15 13:45:36 +00:00
Jack Carter	80890657b3	[Mips][msa] Added the simple builtins (add_a to dpsub[su], ilvev to ldi) Includes: add_a, adds_[asu], addv, addvi, andi.b, asub_[su].[bhwd], aver?_[su]_[bhwd], bclr, bclri, bins[lr], bins[lr]i, bmnzi, bmzi, bneg, bnegi, bseli, bset, bseti, c(eq|ne), c(eq|ne)i, cl[et]_[su], cl[et]i_[su], copy_[su].[bhw], div_[su], dotp_[su], dpadd_[su], dpsub_[su], ilvev, ilvl, ilvod, ilvr, insv, insve, ldi Patch by Daniel Sanders llvm-svn: 188457	2013-08-15 12:24:57 +00:00
Craig Topper	1c614b247d	Revert r188449 as it turns out we're just missing the instructions that need the v16i32/v16f32 matching. llvm-svn: 188454	2013-08-15 08:38:25 +00:00
Hao Liu	ad6d3a3db7	Clang and AArch64 backend patches to support shll/shl and vmovl instructions and ACLE functions llvm-svn: 188451	2013-08-15 08:26:11 +00:00
Craig Topper	b1acbb9cab	Don't let isPermImmMask handle v16i32 since VPERMI doesn't match on that type. Remove 128-bit vector handling from isPermImmMask too, it's covered by isPSHUFDMask. llvm-svn: 188449	2013-08-15 07:30:51 +00:00
Tom Stellard	0f3c885b1a	R600/SI: Improve legalization of vector operations This should fix hangs in the OpenCL piglit tests. llvm-svn: 188431	2013-08-14 23:25:00 +00:00
Tom Stellard	20e208af7d	R600/SI: Replace v1i32 type with i32 in imageload and sample intrinsics llvm-svn: 188430	2013-08-14 23:24:53 +00:00
Tom Stellard	d7b0828247	R600/SI: Convert v16i8 resource descriptors to i128 Now that compute support is better on SI, we can't continue using v16i8 for descriptors since this is also a legal type in OpenCL. This patch fixes numerous hangs with the piglit OpenCL test and since we now use a target specific DAG node for LOAD_CONSTANT with the correct MemOperandFlags, this should also fix: https://bugs.freedesktop.org/show_bug.cgi?id=66805 llvm-svn: 188429	2013-08-14 23:24:45 +00:00
Tom Stellard	257882f15c	R600/SI: Use i8 types for resource descriptors in tests We switched from i32 to i8 types a while ago and the tests were never updated. llvm-svn: 188428	2013-08-14 23:24:37 +00:00
Tom Stellard	649e8ff0ee	R600/SI: Lower BUILD_VECTOR to REG_SEQUENCE v2 Using REG_SEQUENCE for BUILD_VECTOR rather than a series of INSERT_SUBREG instructions should make it easier for the register allocator to coalesce unnecessary copies. v2: - Use an SGPR register class if all the operands of BUILD_VECTOR are SGPRs. llvm-svn: 188427	2013-08-14 23:24:32 +00:00
Tom Stellard	599374cf06	R600/SI: Assign a register class to the $vaddr operand for MIMG instructions The previous code declared the operand as unknown:$vaddr, which made it possible for scalar registers to be used instead of vector registers. llvm-svn: 188425	2013-08-14 23:24:17 +00:00
Tom Stellard	6287532c80	R600/SI: Handle MSAA texture targets Patch by: Marek Olšák Signed-off-by: Marek Olšák <marek.olsak@amd.com> llvm-svn: 188421	2013-08-14 22:22:14 +00:00
Tom Stellard	5d16e4f78e	R600/SI: Allow conversion between v32i8 and v8i32 Patch by: Marek Olšák Signed-off-by: Marek Olšák <marek.olsak@amd.com> llvm-svn: 188420	2013-08-14 22:22:09 +00:00
Tom Stellard	9e2fd5271f	R600/SI: Add pattern for fp_to_uint This fixes the F2U opcode for the Mesa driver. Patch by: Marek Olšák Signed-off-by: Marek Olšák <marek.olsak@amd.com> llvm-svn: 188418	2013-08-14 22:21:57 +00:00
Hal Finkel	a89d228510	Actually fix PPC64 64-bit GPR inline asm constraint matching This is a follow-up to r187693, correcting that code to request the correct register class. The previous version, with the wrong register class, was not really correcting the constraints, but rather was removing them. Coincidentally, this fixed the failing test case in r187693, but obviously created other problems. llvm-svn: 188407	2013-08-14 20:05:04 +00:00
Renato Golin	5068d4579d	Let t2LDRBi8 and t2LDRBi12 have same Base Pointer When determining if two different loads are from the same base address, this patch allows one load to use a t2LDRi8 address mode and another to use a t2LDRi12 address mode. The current implementation is very conservative and this allows the case of differing Thumb2 byte loads to be considered. Allowing these differing modes instead of forcing the exact same opcode is useful for situations where one opcode loads from a base address+1 and a second opcode loads from a base address-1. Patch by Daniel Stewart. llvm-svn: 188385	2013-08-14 16:35:29 +00:00
NAKAMURA Takumi	98545843bd	llvm/test/CodeGen/X86/setcc-sentinals.ll: Relax expressions for x86_64-win32. llvm-svn: 188340	2013-08-14 00:46:00 +00:00
Akira Hatanaka	ff296075bc	[mips] Properly parse registers that appear in inline-asm constraints. llvm-svn: 188336	2013-08-14 00:21:25 +00:00
Jim Grosbach	a6bd8c2220	DAG: Combine (and (setne X, 0), (setne X, -1)) -> (setuge (add X, 1), 2) A common idiom is to use zero and all-ones as sentinel values and to check for both in a single conditional ("x != 0 && x != (unsigned)-1"). That generates code, for i32, like: testl %edi, %edi setne %al cmpl $-1, %edi setne %cl andb %al, %cl With this transform, we generate the simpler: incl %edi cmpl $1, %edi seta %al Similar improvements for other integer sizes and on other platforms. In general, combining the two setcc instructions into one is better. rdar://14689217 llvm-svn: 188315	2013-08-13 21:30:58 +00:00
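The combine in the commit above rests on a wraparound identity: adding 1 maps the two sentinels 0 and all-ones onto 1 and 0, so a single unsigned comparison against 2 covers both. The Python sketch below checks that identity for a given bit width; the function names are hypothetical and the masking stands in for fixed-width integer arithmetic.

```python
def two_compares(x, bits=32):
    """(setne X, 0) AND (setne X, -1) on a bits-wide unsigned value."""
    mask = (1 << bits) - 1
    x &= mask
    return x != 0 and x != mask

def one_compare(x, bits=32):
    """(setuge (add X, 1), 2) with wraparound at the given width."""
    mask = (1 << bits) - 1
    return ((x + 1) & mask) >= 2

def equivalent_for_width(bits):
    """Exhaustively compare both forms for every value of a small width."""
    return all(two_compares(x, bits) == one_compare(x, bits)
               for x in range(1 << bits))
```

An exhaustive check is cheap for 8 or 16 bits; for 32 bits, spot-checking the values around 0 and `0xFFFFFFFF` exercises the two interesting boundaries.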
Elena Demikhovsky	42b33ee116	AVX-512: Added CMP and BLEND instructions. Lowering for SETCC. llvm-svn: 188265	2013-08-13 13:24:07 +00:00
Tom Stellard	9f798c877d	R600: Set scheduling preference to Sched::Source R600 doesn't need to do any scheduling on the SelectionDAG now that it has a very good MachineScheduler. Also, using the VLIW SelectionDAG scheduler was having a major impact on compile times. For example with the phatk kernel here are the LLVM IR to machine code compile times: With Sched::VLIW Total Compile Time: 1.4890 Seconds (User + System) SelectionDAG Instruction Scheduling: 1.1670 Seconds (User + System) With Sched::Source Total Compile Time: 0.3330 Seconds (User + System) SelectionDAG Instruction Scheduling: 0.0070 Seconds (User + System) The code output was identical with both schedulers. This may not be true for all programs, but it gives me confidence that there won't be much reduction, if any, in code quality by using Sched::Source. llvm-svn: 188215	2013-08-12 22:33:21 +00:00
Tim Northover	67f2cc8ebf	Fix FileCheck --check-prefix lines. Various tests had sprung up over the years which had --check-prefix=ABC on the RUN line, but "CHECK-ABC:" later on. This happened to work before, but was strictly incorrect. FileCheck is getting stricter soon though. Patch by Ron Ofir. llvm-svn: 188173	2013-08-12 12:43:26 +00:00
Richard Sandiford	4980a32ba3	[SystemZ] Use CLC and IPM to implement memcmp For now this is restricted to fixed-length comparisons with a length in the range [1, 256], as for memcpy() and MVC. llvm-svn: 188163	2013-08-12 10:28:10 +00:00
Tim Northover	2497b9b9ba	Allow compatible extension attributes for tail calls If the tail-callee and caller give the same bits via the same signext/zeroext attribute then a tail-call should be allowed, since the extension has already been done by the callee. llvm-svn: 188159	2013-08-12 09:45:46 +00:00
Reed Kotler	31da848d63	Don't generate floating point stubs for mips16 code if the function is actually an intrinsic that will not occur in libc. This list here is not exhaustive but fixes the one place in test-suite where this occurs. I have filed a bug against myself to research the full list and add them to the array of such cases. In the future, actual stub generation will occur in a later phase and we won't need this code because we will know at that time during the compilation that in fact no helper function was even needed. llvm-svn: 188149	2013-08-11 21:30:27 +00:00
Elena Demikhovsky	afcde02b68	AVX-512: Added more tests for BROADCAST llvm-svn: 188148	2013-08-11 12:29:16 +00:00
Elena Demikhovsky	66a9e4f863	AVX-512: Added VPERM* instructions and MOV* zmm-to-zmm instructions. Added a test for shuffles using VPERM. llvm-svn: 188147	2013-08-11 07:55:09 +00:00
Niels Ole Salscheider	c023cc4f85	R600/SI: FMA is faster than fmul and fadd for f64 llvm-svn: 188136	2013-08-10 10:38:54 +00:00
Niels Ole Salscheider	fc24d0a6e6	R600/SI: Add FMA pattern llvm-svn: 188135	2013-08-10 10:38:47 +00:00
Reed Kotler	d4cb39c73a	Add another intrinsic that LLVM gives an incorrect prototype to. I need to go through all the runtime routine list and see if there are any more I need to add for mips16 floating point. Prototypes must be correct or else I don't know to add a helper function call. llvm-svn: 188106	2013-08-09 21:33:41 +00:00
Michael Gottesman	9aac3bd709	[stackprotector] Simplify SP Pass so that we emit different fail basic blocks for each fail condition. This patch decouples the stack protector pass so that we can support stack protector implementations that do not use the IR level generated stack protector fail basic block. No codesize increase is caused by this change since the MI level tail merge pass properly merges together the fail condition blocks (see the updated test). llvm-svn: 188105	2013-08-09 21:26:18 +00:00
Stephen Lin	ec70f360f9	CHECK-LABEL-ify tests llvm-svn: 188087	2013-08-09 17:50:15 +00:00
Craig Topper	ae74eb18d7	Add missing 'v' prefix in front of palignr on one of the checks. llvm-svn: 188054	2013-08-09 05:41:12 +00:00
Hal Finkel	2dc47cddf0	Set ISD::FROUND to Expand by default for all types For most libm ISD nodes, TargetLoweringBase::initActions sets the default scalar-type action to Expand, and leaves the vector-type action default as Legal. This is not appropriate for the new ISD::FROUND node (which no backend but PowerPC handles explicitly). Fixes PR16842. llvm-svn: 188048	2013-08-09 04:13:44 +00:00
Arnold Schwaighofer	ddea7f3974	Revert "Reapply r185872 now that the address sanitizer has been changed to support this." This reverts commit r187939. It broke an O0 build of a spec benchmark. llvm-svn: 188012	2013-08-08 21:04:16 +00:00
David Fang	772a101ff0	initial draft of PPCMachObjectWriter.cpp this records relocation entries in the mach-o object file for PIC code generation. tested on powerpc-darwin8, validated against darwin otool -rvV llvm-svn: 188004	2013-08-08 20:14:40 +00:00
Niels Ole Salscheider	20c4077bf5	R600/SI: Implement fp32<->fp64 conversions llvm-svn: 187988	2013-08-08 16:06:15 +00:00
Niels Ole Salscheider	74ee40da2a	R600/SI: Implement sint<->fp64 conversions llvm-svn: 187987	2013-08-08 16:06:08 +00:00
Andrea Di Biagio	36954f2af6	test commit. llvm-svn: 187974	2013-08-08 10:46:36 +00:00
Eric Christopher	e6639b8535	Make sure that if we're going to attempt to add a type to a DIE that the type exists. Fix up cases where we weren't checking for optional types and add an assert to addType to make sure we catch this in the future. Fix up a testcase that was using the tag for DW_TAG_array_type when it meant DW_TAG_enumeration_type. llvm-svn: 187963	2013-08-08 07:40:37 +00:00
Hal Finkel	e76170ce53	PPC: Map frin to round() not nearbyint() and rint() Making use of the recently-added ISD::FROUND, which allows for custom lowering of round(), the PPC backend will now map frin to round(). Previously, we had been using frin to lower nearbyint() (and rint() via some custom lowering to handle the extra fenv flags requirements), but only in fast-math mode because frin does not tie-to-even. Several users had complained about this behavior, and this new mapping of frin to round is certainly more appropriate (and does not require fast-math mode). In effect, this reverts r178362 (and part of r178337, replacing the nearbyint mapping with the round mapping). llvm-svn: 187960	2013-08-08 04:31:34 +00:00
Bill Wendling	169723f925	Reapply r185872 now that the address sanitizer has been changed to support this. Original commit message: Stop emitting weak symbols into the "coal" sections. The Mach-O linker has been able to support the weak-def bit on any symbol for quite a while now. The compiler however continued to place these symbols into a "coal" section, which required the linker to map them back to the base section name. Replace the sections like this: __TEXT/__textcoal_nt instead use __TEXT/__text; __TEXT/__const_coal instead use __TEXT/__const; __DATA/__datacoal_nt instead use __DATA/__data. <rdar://problem/14265330> llvm-svn: 187939	2013-08-07 23:42:09 +00:00
Elena Demikhovsky	ae2624a373	AVX-512 set: Added BROADCAST instructions with lowering logic and a test. llvm-svn: 187884	2013-08-07 12:34:55 +00:00
Richard Sandiford	b6323e0b21	[SystemZ] Optimize floating-point comparisons with zero This follows the same lines as the integer code. In the end it seemed easier to have a second 4-bit mask in TSFlags to specify the compare-like CC values. That eats one more TSFlags bit than adding a CCHasUnordered would have done, but it feels more concise. llvm-svn: 187883	2013-08-07 11:10:06 +00:00
Richard Sandiford	5960348422	[SystemZ] Add floating-point load-and-test instructions These instructions can also be used as comparisons with zero. llvm-svn: 187882	2013-08-07 11:03:34 +00:00
Reed Kotler	30cf33a57e	Create a pattern for the "trap" instruction. llvm-svn: 187863	2013-08-07 04:00:26 +00:00
Tom Stellard	3b9645302a	R600/SI: Use VSrc_* register classes as the default classes for types Since the VSrc_* register classes contain both VGPRs and SGPRs, copies that used to be emitted by isel like this: SGPR = COPY VGPR Will now be emitted like this: VSrC = COPY VGPR This patch also adds a pass that tries to identify and fix situations where a VGPR to SGPR copy may occur. Hopefully, these changes will make it impossible for the compiler to generate illegal VGPR to SGPR copies. llvm-svn: 187831	2013-08-06 23:08:28 +00:00
Tom Stellard	eab7c786d4	R600/SI: Add more special cases for opcodes to ensureSRegLimit() Also factor out the register class lookup to its own function. llvm-svn: 187830	2013-08-06 23:08:18 +00:00
Manman Ren	50def296e2	Debug Info Finder|Verifier: handle DbgLoc attached to instructions. Also remove checking of llvm.dbg.sp since it is not used in generating dwarf. Current state of Finder: DebugInfoFinder tries to list all debug info MDNodes used in a module. To list debug info MDNodes used by an instruction, DebugInfoFinder provides processDeclare, processValue and processLocation to handle DbgDeclareInst, DbgValueInst and DbgLoc attached to instructions. processModule will go through all DICompileUnits in llvm.dbg.cu and list debug info MDNodes used by the CUs. TODO: 1> Finder has a list of CUs, SPs, Types, Scopes and global variables. We need to add a list of variables that are used by DbgDeclareInst and DbgValueInst. 2> MDString fields should be null or isa<MDString> and MDNode fields should be null or isa<MDNode>. We currently use empty string or int 0 to represent null. 3> Go through Verify functions and make sure that they check field types. 4> Clean up existing testing cases to remove llvm.dbg.sp and make sure each testing case has a llvm.dbg.cu. Re-apply r187609 with fix to pass ocaml binding. vmcore.ml generates a debug location with scope being metadata !{}, in verifier we treat this as a null scope. llvm-svn: 187812	2013-08-06 19:38:43 +00:00
Hal Finkel	71d37e18da	Add PPC64 mulli pattern The PPC backend had been missing a pattern to generate mulli for 64-bit multiplies. We had been generating it only for 32-bit multiplies. Unfortunately, generating li + mulld unnecessarily increases register pressure. llvm-svn: 187807	2013-08-06 17:03:03 +00:00
Justin Holewinski	2fc234bf3f	[NVPTX] Add missing patterns for i1 [s,u]int_to_fp llvm-svn: 187800	2013-08-06 14:13:34 +00:00
Justin Holewinski	06563fec33	[NVPTX] Fix bug in stack code generation caused by MC conversion We do use a very small set of physical registers, so account for them in the virtual register encoding between MachineInstr and MC llvm-svn: 187799	2013-08-06 14:13:31 +00:00
Justin Holewinski	70fde80969	[NVPTX] Start conversion to MC infrastructure This change converts the NVPTX target to use the MC infrastructure instead of directly emitting MachineInstr instances. This brings the target more up-to-date with LLVM TOT, and should fix PR15175 and PR15958 (libNVPTXInstPrinter is empty) as a side-effect. llvm-svn: 187798	2013-08-06 14:13:27 +00:00
Tim Northover	d79219981f	ARM: implement allowTruncateForTailCall Now that it's in place, it seems silly not to let ARM make use of the extra tail call opportunities. llvm-svn: 187795	2013-08-06 13:58:03 +00:00
Tim Northover	29e73e0f55	Refactor isInTailCallPosition handling This change came about primarily because of two issues in the existing code. Neither of: define i64 @test1(i64 %val) { %in = trunc i64 %val to i32 tail call i32 @ret32(i32 returned %in) ret i64 %val } define i64 @test2(i64 %val) { tail call i32 @ret32(i32 returned undef) ret i32 42 } should be tail calls, and the function sameNoopInput is responsible. The main problem is that it is completely symmetric in the "tail call" and "ret" value, but in reality different things are allowed on each side. For these cases: 1. Any truncation should lead to a larger value being generated by "tail call" than needed by "ret". 2. Undef should only be allowed as a source for ret, not as a result of the call. Along the way I noticed that a mismatch between what this function treats as a valid truncation and what the backends see can lead to invalid calls as well (see x86-32 test case). This patch refactors the code so that instead of being based primarily on values which it recurses into when necessary, it starts by inspecting the type and considers each fundamental slot that the backend will see in turn. For example, given a pathological function that returned {{}, {{}, i32, {}}, i32} we would consider each "real" i32 in turn, and ask if it passes through unchanged. This is much closer to what the backend sees as a result of ComputeValueVTs. Aside from the bug fixes, this eliminates the recursion that's going on and, I believe, makes the bulk of the code significantly easier to understand. The trade-off is the nasty iterators needed to find the real types inside a returned value. llvm-svn: 187787	2013-08-06 09:12:35 +00:00
Tom Stellard	e4e3be6f50	Factor FlattenCFG out from SimplifyCFG Patch by: Mei Ye llvm-svn: 187764	2013-08-06 02:43:45 +00:00
Tom Stellard	f94818ae61	R600/SI: Add missing test for r187749 llvm-svn: 187754	2013-08-05 22:45:56 +00:00
Richard Sandiford	39f379d037	[SystemZ] Use BRCT and BRCTG to eliminate add-&-compare sequences This patch just uses a peephole test for "add; compare; branch" sequences within a single block. The IR optimizers already convert loops to decrement-and-branch-on-nonzero form in some cases, so even this simplistic test triggers many times during a clang bootstrap and projects/test-suite run. It looks like there are still cases where we need to more strongly prefer branches on nonzero though. E.g. I saw a case where a loop that started out with a check for 0 ended up with a check for -1. I'll try to look at that sometime. I ended up adding the Reference class because MachineInstr::readsRegister() doesn't check for subregisters (by design, as far as I could tell). llvm-svn: 187723	2013-08-05 11:23:46 +00:00
Richard Sandiford	eefa00392f	[SystemZ] Use LOAD AND TEST to eliminate comparisons against zero llvm-svn: 187720	2013-08-05 11:03:20 +00:00
Elena Demikhovsky	cb3f9da2e3	AVX-512 set: added mask operations, lowering BUILD_VECTOR for i1 vector types. Added intrinsics and tests. llvm-svn: 187717	2013-08-05 08:52:21 +00:00
Reed Kotler	d5b7892552	Add the saving of S2. This is needed for some of the floating point helper functions. This can be optimized out later when the remaining parts of the helper function work is moved into the Mips16HardFloat pass. For now it forces us to use the 32 bit save/restore instructions instead of the 16 bit ones. llvm-svn: 187712	2013-08-04 23:56:53 +00:00
Benjamin Kramer	c63386d01a	X86: Turn fp selects into mask operations. double test(double a, double b, double c, double d) { return a<b ? c : d; } before: _test: ucomisd %xmm0, %xmm1 ja LBB0_2 movaps %xmm3, %xmm2 LBB0_2: movaps %xmm2, %xmm0 after: _test: cmpltsd %xmm1, %xmm0 andpd %xmm0, %xmm2 andnpd %xmm3, %xmm0 orpd %xmm2, %xmm0 Small speedup on Benchmarks/SmallPT llvm-svn: 187706	2013-08-04 12:05:16 +00:00
Elena Demikhovsky	2f33e9fa89	AVX-512 set: added VEXTRACTPS instruction llvm-svn: 187705	2013-08-04 10:46:07 +00:00
Tim Northover	da32ed4814	X86: specify CPU on new test to fix atom buildbot Apparently Atoms use lea for stack adjustment, which we weren't looking for. llvm-svn: 187704	2013-08-04 10:00:45 +00:00
Tim Northover	d7e748d087	X86: correct tail return address calculation Due to the weird and wonderful usual arithmetic conversions, some calculations involving negative values were getting performed in uint32_t and then promoted to int64_t, which is really not a good idea. Patch by Katsuhiro Ueno. llvm-svn: 187703	2013-08-04 09:35:57 +00:00
Reed Kotler	338c130a3e	Clean up code for Mips16 large frame handling. llvm-svn: 187701	2013-08-04 01:13:25 +00:00
Hal Finkel	f91cfcdaed	Fix PPC64 64-bit GPR inline asm constraint matching Internally, the PowerPC backend names the 32-bit GPRs R[0-9]+, and names the 64-bit parent GPRs X[0-9]+. When matching inline assembly constraints with explicit register names, on PPC64 when an i64 MVT has been requested, we need to follow gcc's convention of using r[0-9]+ to refer to the 64-bit (parent) registers. At some point, we'll probably want to arrange things so that the generic code in TargetLowering uses the AsmName fields declared in *RegisterInfo.td in order to match these inline asm register constraints. If we do that, this change can be reverted. llvm-svn: 187693	2013-08-03 12:25:10 +00:00
Akira Hatanaka	9ecf735bdd	[mips] Expand vector truncating stores and extending loads. llvm-svn: 187667	2013-08-02 19:23:33 +00:00
Eric Christopher	973b3bf7ae	Temporarily revert "Debug Info Finder|Verifier: handle DbgLoc attached to instructions." in an attempt to bring back some bots. This reverts commit r187609. llvm-svn: 187638	2013-08-02 00:49:44 +00:00
Bill Wendling	e7b7059f1d	Use function attributes to indicate that we don't want to realign the stack. Function attributes are the future! So just query whether we want to realign the stack directly from the function instead of through a random target options structure. llvm-svn: 187618	2013-08-01 21:42:05 +00:00
Reed Kotler	e5ac0862d0	Fix some issues with Mips16 floating when certain intrinsics are present. This is actually an LLVM bug in the way it generates signatures for these when soft float is enabled. For example, floor ends up having the signature of int64(int64). The signature part is not the same as where the actual parameter types are recorded, and those ARE of course int64(int64) when soft float is enabled. (Yes, Mips16 hard float uses soft float but with different runtime routines but then has to interoperate with Mips32 using normal floating point). This logic will eventually be moved to the Mips16HardFloat pass so it's not worth sorting out these issues in LLVM since nobody but Mips16 cares about these signatures, as far as I know, and even I won't eventually either. llvm-svn: 187613	2013-08-01 21:17:53 +00:00
Manman Ren	dd35d4fb94	Debug Info Finder|Verifier: handle DbgLoc attached to instructions. Also remove checking of llvm.dbg.sp since it is not used in generating dwarf. Current state of Finder: DebugInfoFinder tries to list all debug info MDNodes used in a module. To list debug info MDNodes used by an instruction, DebugInfoFinder provides processDeclare, processValue and processLocation to handle DbgDeclareInst, DbgValueInst and DbgLoc attached to instructions. processModule will go through all DICompileUnits in llvm.dbg.cu and list debug info MDNodes used by the CUs. TODO: 1> Finder has a list of CUs, SPs, Types, Scopes and global variables. We need to add a list of variables that are used by DbgDeclareInst and DbgValueInst. 2> MDString fields should be null or isa<MDString> and MDNode fields should be null or isa<MDNode>. We currently use empty string or int 0 to represent null. 3> Go through Verify functions and make sure that they check field types. 4> Clean up existing testing cases to remove llvm.dbg.sp and make sure each testing case has a llvm.dbg.cu. llvm-svn: 187609	2013-08-01 20:52:39 +00:00
Tom Stellard	a515fb7c17	R600: Add 64-bit float load/store support * Added R600_Reg64 class * Added T#Index#.XY registers definition * Added v2i32 register reads from parameter and global space * Added f32 and i32 elements extraction from v2f32 and v2i32 * Added v2i32 -> v2f32 conversions Tom Stellard: - Mark vec2 operations as expand. The addition of a vec2 register class made them all legal. Patch by: Dmitry Cherkassov Signed-off-by: Dmitry Cherkassov <dcherkassov@gmail.com> llvm-svn: 187582	2013-08-01 15:23:42 +00:00
Tom Stellard	67b2cf4e87	R600: Use 64-bit alignment for 64-bit kernel arguments llvm-svn: 187581	2013-08-01 15:23:31 +00:00
Tom Stellard	f34661790c	R600/SI: Custom lower i64 ZERO_EXTEND llvm-svn: 187580	2013-08-01 15:23:26 +00:00
Richard Sandiford	9b9d87ef99	[SystemZ] Reuse CC results for integer comparisons with zero This also fixes a bug in the predication of LR to LOCR: I'd forgotten that with these in-place instruction builds, the implicit operands need to be added manually. I think this was latent until now, but is tested by int-cmp-45.c. It also adds a CC valid mask to STOC, again tested by int-cmp-45.c. llvm-svn: 187573	2013-08-01 10:39:40 +00:00
Richard Sandiford	6d6df38281	[SystemZ] Prefer comparisons with zero Convert >= 1 to > 0, etc. Using comparison with zero isn't a win on its own, but it exposes more opportunities for CC reuse (the next patch). llvm-svn: 187571	2013-08-01 10:29:45 +00:00
Tim Northover	dbac87d1fc	AArch64: add initial NEON support Patch by Ana Pazos. - Completed implementation of instruction formats: AdvSIMD three same AdvSIMD modified immediate AdvSIMD scalar pairwise - Completed implementation of instruction classes (some of the instructions in these classes belong to yet unfinished instruction formats): Vector Arithmetic Vector Immediate Vector Pairwise Arithmetic - Initial implementation of instruction formats: AdvSIMD scalar two-reg misc AdvSIMD scalar three same - Initial implementation of instruction class: Scalar Arithmetic - Initial clang changes to support arm v8 intrinsics. Note: no clang changes for scalar intrinsics function name mangling yet. - Comprehensive test cases for added instructions To verify auto codegen, encoding, decoding, diagnosis, intrinsics. llvm-svn: 187567	2013-08-01 09:20:35 +00:00
Robert Lytton	6063ad29ad	XCore target: Fix Vararg handling llvm-svn: 187565	2013-08-01 08:29:44 +00:00
Robert Lytton	e227132743	XCore target: Add byval handling llvm-svn: 187563	2013-08-01 08:18:55 +00:00
Robert Lytton	e1f5a5cc36	XCore target: Fix emitArrayBound() calling OutStreamer.Emit*() multiple times when trying to print a single line llvm-svn: 187562	2013-08-01 07:52:05 +00:00
Reed Kotler	9ef1a8ca7e	Fix some misc. issues with Mips16 fp stubs. 1) They should never be inlined. 2) A naming inconsistency with gcc mips16 3) Stubs should not have the global attribute llvm-svn: 187555	2013-08-01 02:26:31 +00:00
Tom Stellard	c6c9cd5b09	Revert "R600: Non vector only instruction can be scheduled on trans unit" This reverts commit 98ce62780ea7185ba710868bf83c8077e8d7f6d6. llvm-svn: 187526	2013-07-31 20:43:27 +00:00
Vincent Lejeune	5847584207	R600: Avoid more than 4 literals in the same instruction group at scheduling llvm-svn: 187515	2013-07-31 19:32:07 +00:00
Vincent Lejeune	2100f94811	R600: Non vector only instruction can be scheduled on trans unit llvm-svn: 187514	2013-07-31 19:31:56 +00:00
Richard Sandiford	5a382b8c6f	[SystemZ] Implement isLegalAddressingMode() The loop optimizers were assuming that scales > 1 were OK. I think this is actually a bug in TargetLoweringBase::isLegalAddressingMode(), since it seems to be trying to reject anything that isn't r+i or r+r, but it has no default case for scales other than 0, 1 or 2. Implementing the hook for z means that z can no longer test any change there though. llvm-svn: 187497	2013-07-31 12:58:26 +00:00
Richard Sandiford	987e271aaa	[SystemZ] Be more careful about inverting CC masks (conditional loads) Extend r187495 to conditional loads. I split this out because the easiest way seemed to be to force a particular operand order in SystemZISelDAGToDAG.cpp. llvm-svn: 187496	2013-07-31 12:38:08 +00:00
Richard Sandiford	b3ecd3b03e	[SystemZ] Be more careful about inverting CC masks System z branches have a mask to select which of the 4 CC values should cause the branch to be taken. We can invert a branch by inverting the mask. However, not all instructions can produce all 4 CC values, so inverting the branch like this can lead to some oddities. For example, integer comparisons only produce a CC of 0 (equal), 1 (less) or 2 (greater). If an integer EQ is reversed to NE before instruction selection, the branch will test for 1 or 2. If instead the branch is reversed after instruction selection (by inverting the mask), it will test for 1, 2 or 3. Both are correct, but the second isn't really canonical. This patch therefore keeps track of which CC values are possible and uses this when inverting a mask. Although this is mostly cosmetic, it fixes undefined behavior for the CIJNLH in branch-08.ll. Another fix would have been to mask out bit 0 when generating the fused compare and branch, but the point of this patch is that we shouldn't need to do that in the first place. The patch also makes it easier to reuse CC results from other instructions. llvm-svn: 187495	2013-07-31 12:30:20 +00:00
Richard Sandiford	7320349682	[SystemZ] Move compare-and-branch generation even later r187116 moved compare-and-branch generation from the instruction-selection pass to the peephole optimizer (via optimizeCompare). It turns out that even this is a bit too early. Fused compare-and-branch instructions don't interact well with predication, where a CC result is needed. They also make it harder to reuse the CC side-effects of earlier instructions (not yet implemented, but the subject of a later patch). Another problem was that the AnalyzeBranch family of routines weren't handling compares and branches, so we weren't able to reverse the fused form in cases where we would reverse a separate branch. This could have been fixed by extending AnalyzeBranch, but given the other problems, I've instead moved the fusing to the long-branch pass, which is also responsible for the opposite transformation: splitting out-of-range compares and branches into separate compares and long branches. I've added a test for the AnalyzeBranch problem. A test for the predication problem is included in the next patch, which fixes a bug in the choice of CC mask. llvm-svn: 187494	2013-07-31 12:11:07 +00:00
Richard Sandiford	32c979f9e1	[SystemZ] Postpone NI->RISBG conversion to convertToThreeAddress() r186399 aggressively used the RISBG instruction for immediate ANDs, both because it can handle some values that AND IMMEDIATE can't, and because it allows the destination register to be different from the source. I realized later while implementing the distinct-ops support that it would be better to leave the choice up to convertToThreeAddress() instead. The AND IMMEDIATE form is shorter and is less likely to be cracked. This is a problem for 32-bit ANDs because we assume that all 32-bit operations will leave the high word untouched, whereas RISBG used in this way will either clear the high word or copy it from the source register. The patch uses the z196 instruction RISBLG for this instead. This means that z10 will be restricted to NILL, NILH and NILF for 32-bit ANDs, but I think that should be OK for now. Although we're using z10 as the base architecture, the optimization work is going to be focused more on z196 and zEC12. llvm-svn: 187492	2013-07-31 11:36:35 +00:00
Elena Demikhovsky	175a2e60dd	Added INSERT and EXTRACT instructions from AVX-512 ISA. All insertf/extractf functions replaced with insert/extract since we have insertf and inserti forms. Added lowering for INSERT_VECTOR_ELT / EXTRACT_VECTOR_ELT for 512-bit vectors. Added lowering for EXTRACT/INSERT subvector for 512-bit vectors. Added a test. llvm-svn: 187491	2013-07-31 11:35:14 +00:00
Craig Topper	45e8fdfc7f	Changed register names (and pointer keywords) to be lower case when using Intel X86 assembler syntax. Patch by Richard Mitton. llvm-svn: 187476	2013-07-31 02:47:52 +00:00
Andrew Trick	c8b891d6ff	This test may have been sensitive to the ARM ABI... llvm-svn: 187442	2013-07-30 20:34:59 +00:00
Andrew Trick	bf377e8816	MI Sched fix: assert "Disconnected LRG within the scheduling region." llvm-svn: 187435	2013-07-30 19:59:08 +00:00
Tom Stellard	0009a2cbb1	R600/SI: Expand vector fp <-> int conversions llvm-svn: 187421	2013-07-30 14:31:03 +00:00
Saleem Abdulrasool	f01cc77809	[ARM] check bitwidth in PerformORCombine When simplifying a (or (and B A) (and C ~A)) to a (VBSL A B C) ensure that the bitwidth of the second operands to both ands match before comparing the negation of the values. Split the check of the value of the second operands to the ands. Move the cast and variable declaration slightly higher to make it slightly easier to follow. Bug-Id: 16700 Signed-off-by: Saleem Abdulrasool <compnerd@compnerd.org> llvm-svn: 187404	2013-07-30 04:43:08 +00:00
Quentin Colombet	dbdfe6759b	[R600] Replicate old DAGCombiner behavior in target specific DAG combine. build_vector is lowered to REG_SEQUENCE, which is something the register allocator does a good job at optimizing. llvm-svn: 187397	2013-07-30 00:27:16 +00:00
Quentin Colombet	94c7d4af34	[DAGCombiner] insert_vector_elt: Avoid building a vector twice. This patch prevents the following combine when the input vector is used more than once. insert_vector_elt (build_vector elt0, ..., eltN), NewEltIdx, idx => build_vector elt0, ..., NewEltIdx, ..., eltN The reasons are: - Building a vector may be expensive, so try to reuse the existing part of a vector instead of creating a new one (think big vectors). - elt0 to eltN now have two users instead of one. This may prevent some other optimizations. llvm-svn: 187396	2013-07-30 00:24:09 +00:00
Manman Ren	46cc5a281a	Debug Info: enable verifier for testing cases. llvm-svn: 187375	2013-07-29 20:18:19 +00:00
Manman Ren	7a31996783	Debug Info: update testing cases to pass verifier. llvm-svn: 187362	2013-07-29 18:12:58 +00:00
Nico Rieck	552fc262e0	Proper va_arg/va_copy lowering on win64 Win64 uses CharPtrBuiltinVaList instead of X86_64ABIBuiltinVaList like other 64-bit targets. llvm-svn: 187355	2013-07-29 13:07:06 +00:00
Silviu Baranga	5aac9ffdd0	Allow generation of vmla.f32 instructions when targeting Cortex-A15. The patch also adds the VFP4 feature to Cortex-A15 and fixes the DontUseFusedMAC predicate so that we can still generate vmla.f32 instructions on non-darwin targets with VFP4. llvm-svn: 187349	2013-07-29 09:25:50 +00:00
Manman Ren	d32ce58b2f	Debug Info Verifier: verify SPs in llvm.dbg.sp. Also always add DIType, DISubprogram and DIGlobalVariable to the list in DebugInfoFinder without checking them, so we can verify them later on. llvm-svn: 187285	2013-07-27 01:26:08 +00:00
Rafael Espindola	d96265e7c2	next batch of -disable-debug-info-verifier llvm-svn: 187260	2013-07-26 22:31:26 +00:00
Akira Hatanaka	329763d851	[mips] Implement llvm.trap intrinsic. Patch by Sasa Stankovic. llvm-svn: 187244	2013-07-26 20:58:55 +00:00
Manman Ren	2bd542e9c6	Debug Info Verifier: enable verification of DICompileUnit. We used to call Verify before adding DICompileUnit to the list, and now we remove the check and always add DICompileUnit to the list in DebugInfoFinder, so we can verify them later on. llvm-svn: 187237	2013-07-26 20:04:30 +00:00
Akira Hatanaka	66dcf905b6	[mips] Print instructions "beq", "bne" and "or" using assembler pseudo instructions "beqz", "bnez" and "move", when possible. beq $2, $zero, $L1 => beqz $2, $L1 bne $2, $zero, $L1 => bnez $2, $L1 or $2, $3, $zero => move $2, $3 llvm-svn: 187229	2013-07-26 18:34:25 +00:00
Justin Holewinski	d714d8ebe8	Add a target legalize hook for SplitVectorOperand (again) CustomLowerNode was not being called during SplitVectorOperand, meaning custom legalization could not be used by targets. This also adds a test case for NVPTX that depends on this custom legalization. Differential Revision: http://llvm-reviews.chandlerc.com/D1195 Attempt to fix the buildbots by making the X86 test I just added platform independent llvm-svn: 187202	2013-07-26 13:28:29 +00:00
Rafael Espindola	c3d143707e	Revert "Add a target legalize hook for SplitVectorOperand" This reverts commit 187198. It broke the bots. The soft float test probably needs a -triple because of name differences. On the hard float test I am getting a "roundss $1, %xmm0, %xmm0", instead of "vroundss $1, %xmm0, %xmm0, %xmm0". llvm-svn: 187201	2013-07-26 13:18:16 +00:00
Justin Holewinski	c729fa30b1	Add a target legalize hook for SplitVectorOperand CustomLowerNode was not being called during SplitVectorOperand, meaning custom legalization could not be used by targets. This also adds a test case for NVPTX that depends on this custom legalization. Differential Revision: http://llvm-reviews.chandlerc.com/D1195 llvm-svn: 187198	2013-07-26 12:46:39 +00:00
Roman Divacky	32a21acb65	PPC32 va_list is an actual structure so va_copy needs to copy the whole structure not just a pointer. This implements that and thus fixes va_copy on PPC32. Fixes #15286. Both bug and patch by Florian Zeitz! llvm-svn: 187158	2013-07-25 21:36:47 +00:00
Manman Ren	c20620404a	Debug Info: improve the verifier to check field types. Make sure the context field of DIType is MDNode. Fix testing cases to make them pass the verifier. llvm-svn: 187150	2013-07-25 19:33:30 +00:00
Rafael Espindola	32f9d6abe2	Remove the mblaze backend from llvm. Approval in here http://lists.cs.uiuc.edu/pipermail/llvmdev/2013-July/064169.html llvm-svn: 187145	2013-07-25 18:55:05 +00:00
Andrew Trick	7b0a985247	Evict local live ranges if they can be reassigned. The previous change to local live range allocation also suppressed eviction of local ranges. In rare cases, this could result in more expensive register choices. This commit actually revives a feature that I added long ago: check if live ranges can be reassigned before eviction. But now it only happens in rare cases of evicting a local live range because another local live range wants a cheaper register. The benefit is improved code size for some benchmarks on x86 and armv7. I measured no significant compile time increase and performance changes are noise. llvm-svn: 187140	2013-07-25 18:35:19 +00:00
Andrew Trick	b401fd4c9e	Allocate local registers in order for optimal coloring. Also avoid locals evicting locals just because they want a cheaper register. Problem: MI Sched knows exactly how many registers we have and assumes they can be colored. In cases where we have large blocks, usually from unrolled loops, greedy coloring fails. This is a source of "regressions" from the MI Scheduler on x86. I noticed this issue on x86 where we have long chains of two-address defs in the same live range. It's easy to see this in matrix multiplication benchmarks like IRSmk and even the unit test misched-matmul.ll. A fundamental difference between the LLVM register allocator and conventional graph coloring is that in our model a live range can't discover its neighbors, it can only verify its neighbors. That's why we initially went for greedy coloring and added eviction to deal with the hard cases. However, for singly defined and two-address live ranges, we can optimally color without visiting neighbors simply by processing the live ranges in instruction order. Other beneficial side effects: It is much easier to understand and debug regalloc for large blocks when the live ranges are allocated in order. Yes, global allocation is still very confusing, but it's nice to be able to comprehend what happened locally. Heuristics could be added to bias register assignment based on instruction locality (think late register pairing, banks...). Intuitively this will make some test cases that are on the threshold of register pressure more stable. llvm-svn: 187139	2013-07-25 18:35:14 +00:00
Rafael Espindola	837f0a4606	Current batch of -disable-debug-info-verifier. llvm-svn: 187130	2013-07-25 17:16:05 +00:00
Tim Northover	41d15677dc	AArch64: add llc-based tests for previous commit. Better to have tests run even on non-AArch64 platforms. llvm-svn: 187128	2013-07-25 16:23:55 +00:00
Richard Sandiford	d3155041a0	[SystemZ] Rework compare and branch support Before the patch we took advantage of the fact that the compare and branch are glued together in the selection DAG and fused them together (where possible) while emitting them. This seemed to work well in practice. However, fusing the compare so early makes it harder to remove redundant compares in cases where CC already has a suitable value. This patch therefore uses the peephole analyzeCompare/optimizeCompareInstr pair of functions instead. No behavioral change intended, but it paves the way for a later patch. llvm-svn: 187116	2013-07-25 09:34:38 +00:00
Richard Sandiford	a8ce38895a	[SystemZ] Add LOCR and LOCGR llvm-svn: 187113	2013-07-25 09:11:15 +00:00
Richard Sandiford	ef1a928c13	[SystemZ] Add LOC and LOCG As with the stores, these instructions can trap when the condition is false, so they are only used for things like (cond ? x : *ptr). llvm-svn: 187112	2013-07-25 09:04:52 +00:00
Richard Sandiford	8de9a94d5e	[SystemZ] Add STOC and STOCG These instructions are allowed to trap even if the condition is false, so for now they are only used for "ptr = (cond ? x : ptr)"-style constructs. llvm-svn: 187111	2013-07-25 08:57:02 +00:00
Manman Ren	c48140740e	Debug Info: improve the verifier to check field types. Make sure the context and type fields are MDNodes. We will generate verification errors if those fields are non-empty strings. Fix testing cases to make them pass the verifier. llvm-svn: 187106	2013-07-25 06:43:01 +00:00
Bill Wendling	a61c3eb016	Replace the "NoFramePointerElimNonLeaf" target option with a function attribute. There's no need to specify a flag to omit frame pointer elimination on non-leaf nodes...(Honestly, I can't parse that option out.) Use the function attribute stuff instead. llvm-svn: 187093	2013-07-25 00:34:29 +00:00
Manman Ren	425f3c838c	Update testing cases to pass debug info verifier. llvm-svn: 187083	2013-07-24 22:23:00 +00:00
Quentin Colombet	83197eea3a	Fix a bug in IfConverter with nested predicates. Prior to this patch, IfConverter may widen the cases where a sequence of instructions was executed because of the way it uses nested predicates. This results in incorrect execution. For instance, let A be a basic block that flows conditionally into B and B be a predicated block. B can be predicated with A.BrToBPredicate into A iff B.Predicate is less "permissive" than A.BrToBPredicate, i.e., iff A.BrToBPredicate subsumes B.Predicate. The IfConverter was checking the opposite: B.Predicate subsumes A.BrToBPredicate. <rdar://problem/14379453> llvm-svn: 187071	2013-07-24 20:20:37 +00:00
Manman Ren	2abe20d1ee	Debug Info: improve the Finder. Improve the Finder to handle context of a DIVariable used by DbgValueInst. Fix testing cases to make them pass the verifier. llvm-svn: 187052	2013-07-24 17:10:09 +00:00
Manman Ren	02d93a8ef2	Update testing cases to pass debug info verifier. llvm-svn: 187049	2013-07-24 15:55:41 +00:00
Manman Ren	9438f9191a	Update testing cases to make them pass debug info verification. llvm-svn: 187016	2013-07-24 01:26:37 +00:00
Tom Stellard	cf58aac74c	DAGCombiner: Pass the correct type to TargetLowering::isF(Abs|Neg)Free This commit also implements these functions for R600 and removes a test case that was relying on the buggy behavior. llvm-svn: 187007	2013-07-23 23:55:03 +00:00
Tom Stellard	7e06ca2a39	R600: Treat CONSTANT_ADDRESS loads like GLOBAL_ADDRESS loads when necessary These are really the same address space in hardware. The only difference is that CONSTANT_ADDRESS uses a special cache for faster access. When we are unable to use the constant kcache for some reason (e.g. smaller types or lack of indirect addressing) then the instruction selector must use GLOBAL_ADDRESS loads instead. llvm-svn: 187006	2013-07-23 23:54:56 +00:00
Manman Ren	35c74ac066	Debug Info: improve the Finder. Improve the Finder to handle context of a DIVariable. If Scope is a DICompileUnit, add it to the list of CUs. llvm-svn: 187003	2013-07-23 23:10:00 +00:00
Quentin Colombet	995760ffc1	[ARM][ISel] Improve the lowering of vector loads. When vectors are built from a single value, the ARM lowering issues a scalar_to_vector node. This node is then always morphed into a move from the general purpose unit to the vector unit. When the value comes from a load, this can be simplified into a vector load to the right lane. This patch changes the lowering of insert_vector_elt to expose a vector friendly pattern in this situation. This is a step toward fixing <rdar://problem/14170854>. llvm-svn: 186999	2013-07-23 22:34:47 +00:00
Tom Stellard	e096e3c298	R600: Add support for 24-bit MAD instructions Reviewed-by: Vincent Lejeune <vljn at ovi.com> llvm-svn: 186923	2013-07-23 01:48:49 +00:00
Tom Stellard	803a4c6e50	R600: Add support for 24-bit MUL instructions Reviewed-by: Vincent Lejeune <vljn at ovi.com> llvm-svn: 186922	2013-07-23 01:48:42 +00:00
Tom Stellard	705721da31	R600: Improve support for < 32-bit loads Reviewed-by: Vincent Lejeune <vljn at ovi.com> llvm-svn: 186921	2013-07-23 01:48:35 +00:00
Tom Stellard	bc9deba8ad	R600: Move CONST_ADDRESS folding into AMDGPUDAGToDAGISel::Select() This increases the number of opportunities we have for folding. With the previous implementation we were unable to fold into any instructions other than the first when multiple instructions were selected from a single SDNode. Reviewed-by: Vincent Lejeune <vljn at ovi.com> llvm-svn: 186919	2013-07-23 01:48:24 +00:00
Tom Stellard	e60bb5f272	R600: Use KCache for kernel arguments Reviewed-by: Vincent Lejeune <vljn at ovi.com> llvm-svn: 186918	2013-07-23 01:48:18 +00:00
Tom Stellard	6ca248c89f	R600: Use the same compute kernel calling convention for all GPUs A side-effect of this is that now the compiler expects kernel arguments to be 4-byte aligned. Reviewed-by: Vincent Lejeune <vljn at ovi.com> llvm-svn: 186916	2013-07-23 01:48:05 +00:00
Tom Stellard	ac10764020	R600: Use correct LoadExtType when lowering kernel arguments Reviewed-by: Vincent Lejeune <vljn at ovi.com> llvm-svn: 186915	2013-07-23 01:47:58 +00:00
Tom Stellard	d7dd88a3f7	R600: Clean up extended load patterns Reviewed-by: Vincent Lejeune <vljn at ovi.com> llvm-svn: 186914	2013-07-23 01:47:52 +00:00
Tom Stellard	25ce190913	R600: Expand vector FNEG llvm-svn: 186913	2013-07-23 01:47:46 +00:00
Manman Ren	3a6f508b6b	Debug Info Finder: use processDeclare and processValue to list debug info MDNodes used by DbgDeclareInst and DbgValueInst. Another 16 testing cases failed and they are disabled with -disable-debug-info-verifier. A total of 34 cases are disabled with -disable-debug-info-verifier and will be corrected. llvm-svn: 186902	2013-07-23 00:22:51 +00:00
Mihai Popa	cb970f1789	This adds range checking for "ldr Rn, [pc, #imm]" Thumb instructions. With this patch: 1. ldr.n is recognized as mnemonic for the short encoding 2. ldr.w is recognized as mnemonic for the long encoding 3. ldr will map to either short or long encodings depending on the size of the offset llvm-svn: 186831	2013-07-22 15:49:36 +00:00
Justin Holewinski	0a7b783785	[NVPTX] Use approximate FP ops when unsafe-fp-math is used, and append .ftz to instructions if the nvptx-f32ftz attribute is set to "true" llvm-svn: 186820	2013-07-22 12:18:04 +00:00
Lang Hames	2d3b19969c	Refactor AnalyzeBranch on ARM. The previous version did not always analyze indirect branches correctly. Under some circumstances, this led to the deletion of basic blocks that were the destination of indirect branches. In that case it left indirect branches to nowhere in the code. This patch replaces, and is more general than either of the previous fixes for indirect-branch-analysis issues, r181161 and r186461. For other branches (not indirect) this refactor should have almost identical behavior to the previous version. There are some corner cases where this refactor is able to analyze blocks that the previous version could not (e.g. this necessitated the update to thumb2-ifcvt2.ll). <rdar://problem/14464830> llvm-svn: 186735	2013-07-19 23:52:47 +00:00
Vincent Lejeune	946e631d0f	R600: Don't emit empty then clause and use alu_pop_after llvm-svn: 186725	2013-07-19 21:45:15 +00:00
Rafael Espindola	6178a47aa5	s/compiler_used/compiler.used/. We were incorrectly using compiler_used instead of compiler.used. Unfortunately the passes using the broken name had tests also using the broken name. llvm-svn: 186705	2013-07-19 18:44:51 +00:00
Richard Sandiford	49d61a017c	[SystemZ] Add tests for ALHSIK and ALGHSIK The insn definitions themselves crept into r186689, sorry. This should be the last of the distinct-ops instructions. llvm-svn: 186690	2013-07-19 16:44:32 +00:00
Richard Sandiford	fe2faa0d5b	[SystemZ] Add ALRK, AGLRK, SLRK and SGLRK Follows the same lines as r186686, but much more limited, since we only use ADD LOGICAL for multi-i64 additions. llvm-svn: 186689	2013-07-19 16:37:00 +00:00
Richard Sandiford	8ae3d3dbfc	[SystemZ] Add AHIK and AGHIK I did these as a separate patch because it uses a slightly different form of RIE layout. llvm-svn: 186687	2013-07-19 16:32:12 +00:00
Richard Sandiford	8dca7aa469	[SystemZ] Add ARK, AGRK, SRK and SGRK The testsuite changes follow the same lines as for r186683. llvm-svn: 186686	2013-07-19 16:26:39 +00:00
Richard Sandiford	ae52d0261f	[SystemZ] Add NGRK, OGRK and XGRK Like r186683, but for 64 bits. llvm-svn: 186685	2013-07-19 16:24:22 +00:00
Richard Sandiford	f4f67cacd7	[SystemZ] Add NRK, ORK and XRK The atomic tests assume the two-operand forms, so I've restricted them to z10. Running and-01.ll, or-01.ll and xor-01.ll for z196 as well as z10 shows why using convertToThreeAddress() is better than exposing the three-operand forms first and then converting back to two operands where possible (which is what I'd originally tried). Using the three-operand form first stops us from taking advantage of NG, OG and XG for spills. llvm-svn: 186683	2013-07-19 16:21:55 +00:00
Richard Sandiford	499b1e1400	[SystemZ] Use SLLK, SRLK and SRAK for codegen This patch uses the instructions added in r186680 for codegen. llvm-svn: 186681	2013-07-19 16:12:08 +00:00
Manman Ren	4cd42b4157	Try to appease the bots. llvm-svn: 186653	2013-07-19 04:56:51 +00:00
Andrew Trick	4e802c9cb5	MI Sched: test case fix for previous checkin. llvm-svn: 186635	2013-07-19 00:31:31 +00:00
Manman Ren	d66f7fe8a2	Debug Info: enable verifying by default and disable testing cases that fail. 1> Use DebugInfoFinder to find debug info MDNodes. 2> Add disable-debug-info-verifier to disable verifying debug info. 3> Disable verifying for testing cases that fail (will update the testing cases later on). 4> MDNodes generated by clang can have empty filename for TAG_inheritance and TAG_friend, so DIType::Verify is modified accordingly. Note that DebugInfoFinder does not list all debug info MDNode. For example, clang can generate: metadata !{i32 786468}, which will fail to verify. This MDNode is used by debug info but not included in DebugInfoFinder. This MDNode is generated as a temporary node in DIBuilder::createFunction Value *TElts[] = { GetTagConstant(VMContext, DW_TAG_base_type) }; MDNode::getTemporary(VMContext, TElts) llvm-svn: 186634	2013-07-19 00:31:03 +00:00
Stephen Lin	798e242090	Update to more CodeGen tests to use CHECK-LABEL for labels corresponding to function definitions for more informative error messages. No functionality change. All changes were made by the following bash script: find test/CodeGen -name ".ll" \| \ while read NAME; do echo "$NAME" grep -q "^; RUN: llc.debug" $NAME && continue grep -q "^; RUN:.llvm-objdump" $NAME && continue grep -q "^; RUN: opt." $NAME && continue TEMP=`mktemp -t temp` cp $NAME $TEMP sed -n "s/^define [^@]@$[A-Za-z0-9_]$(.$/\1/p" < $NAME \| \ while read FUNC; do sed -i '' "s/;$[A-Za-z0-9_-]$$[A-Za-z0-9_-]$:$ $$FUNC[:] \$/;\1\2-LABEL:\3$FUNC:/g" $TEMP done sed -i '' "s/;$.$-LABEL-LABEL:/;\1-LABEL:/" $TEMP sed -i '' "s/;$.$-NEXT-LABEL:/;\1-NEXT:/" $TEMP sed -i '' "s/;$.$-NOT-LABEL:/;\1-NOT:/" $TEMP sed -i '' "s/;$.*$-DAG-LABEL:/;\1-DAG:/" $TEMP mv $TEMP $NAME done This script catches a superset of the cases caught by the script associated with commit r186280. It initially found some false positives due to unusual constructs in a minority of tests; all such cases were disambiguated first in commit r186621. llvm-svn: 186624	2013-07-18 22:47:09 +00:00
Stephen Lin	52ebde139c	Disambiguate function names in some CodeGen tests. (Some tests were using function names that also were names of instructions and/or doing other unusual things that were making the test not amenable to otherwise scriptable pattern matching.) No functionality change. llvm-svn: 186621	2013-07-18 22:29:15 +00:00
Tom Stellard	2fd2f61532	R600/SI: Fix crash with VSELECT https://bugs.freedesktop.org/show_bug.cgi?id=66175 llvm-svn: 186616	2013-07-18 21:43:53 +00:00
Tom Stellard	3f8cd2512a	R600/SI: Add support for v2f32 loads llvm-svn: 186615	2013-07-18 21:43:48 +00:00
Tom Stellard	9268fc81d5	R600/SI: Add support for v2f32 stores llvm-svn: 186614	2013-07-18 21:43:42 +00:00
Tom Stellard	73fcab4d3a	R600: Expand VSELECT for all types llvm-svn: 186613	2013-07-18 21:43:35 +00:00
Stephen Lin	f569d23c3a	Update to CodeGen tests to use CHECK-LABEL for labels corresponding to function definitions for more informative error messages. No functionality change. llvm-svn: 186594	2013-07-18 18:35:22 +00:00
Joey Gouly	ee38b99990	Forgot 'svn add' again, sorry! Tests for r186574. llvm-svn: 186580	2013-07-18 13:17:26 +00:00
Richard Sandiford	be2932828b	[SystemZ] Use RNSBG This should be the last of the R.SBG patches for now. llvm-svn: 186573	2013-07-18 10:40:35 +00:00
Richard Sandiford	106d44afb8	[SystemZ] Generalize RxSBG SRA case The original code only folded SRA into ROTATE ... SELECTED BITS if there was no outer shift. This patch splits out that check and generalises it slightly. The extra cases aren't really that interesting, but this is paving the way for RNSBG support. llvm-svn: 186571	2013-07-18 10:14:55 +00:00
Richard Sandiford	8e6d7fea3c	[SystemZ] Use RXSBG Extend the previous R.SBG patches to handle XORs. llvm-svn: 186570	2013-07-18 10:06:15 +00:00
Craig Topper	641165b33f	Fix copy and paste bug from r186491 to make v2f64 use MOVAPD/MOVUPD as it should. llvm-svn: 186566	2013-07-18 07:16:44 +00:00
Hal Finkel	fdd124178e	PPC: Support dynamic allocas with large alignment Support for dynamic stack alignments in the PPC backend has been unfinished, in part because it depends on dynamic stack realignment (which I only just recently implemented fully). Now we can also support dynamic allocas with higher than the default target stack alignment (16 bytes). In order to round-up the requested size to the maximum requested alignment, we need an additional register to hold the rounded-up size. We're already using one scavenged register to hold the previous stack-pointer value (which needs to be stored with the signal-safe stdux update), and so when we have dynamic allocas and a large alignment, we allocate two emergency spill slots for the scavenger. llvm-svn: 186562	2013-07-18 04:28:21 +00:00
Hal Finkel	79a33a00d6	PPC: Add base-pointer support to builtin setjmp/longjmp First, this changes the base-pointer implementation to remove an unnecessary complication (and one that is incompatible with how builtin SjLj is implemented): instead of using r31 as the base pointer when it is not needed as a frame pointer, now the base pointer will always be r30 when needed. Second, we introduce another pseudo register, BP, which is used just like the FP pseudo register to refer to the base register before we know for certain what register it will be. Third, we now save BP into the jmp_buf, and restore r30 from that slot in longjmp. If the function that called setjmp did not use a base pointer, then r30 will be overwritten by the setjmp-calling-function's restore code. FP restoration (which is restored into r31) works the same way. llvm-svn: 186545	2013-07-17 23:50:51 +00:00
Joey Gouly	200e661b16	Add the tests that I forgot to 'svn add' with my previous commit (r186504). llvm-svn: 186506	2013-07-17 14:03:49 +00:00
Richard Osborne	b765390114	[XCore] Ensure implicit operands aren't lost on the return instruction. Patch by Robert Lytton. llvm-svn: 186500	2013-07-17 10:58:37 +00:00
Craig Topper	f16a718df3	Make x86 fast-isel correctly choose between aligned and unaligned operations for vector stores. Fixes PR16640. llvm-svn: 186491	2013-07-17 05:57:45 +00:00
Hal Finkel	149f358122	PPC: Add CTR-register clobber to builtin setjmp Because the builtin longjmp implementation uses a CTR-based indirect jump, when the control flow arrives at the builtin setjmp call, the CTR register has necessarily been clobbered. Correspondingly, this adds CTR to the list of implicit definitions of the builtin setjmp pseudo instruction. We don't need to add CTR to the implicit definitions of builtin longjmp because, even though it does clobber the CTR register, the control flow cannot return to inside the loop unless there is also a builtin setjmp call. llvm-svn: 186488	2013-07-17 05:35:44 +00:00
Hal Finkel	e625744d86	PPC: Implement base pointer and stack realignment This builds on some frame-lowering code that has existed since 2005 (r24224) but was disabled in 2008 (r48188) because it needed base pointer support to function correctly. This implementation follows the strategy suggested by Dale Johannesen in r48188 where the following comment was added: This does not currently work, because the delta between old and new stack pointers is added to offsets that reference incoming parameters after the prolog is generated, and the code that does that doesn't handle a variable delta. You don't want to do that anyway; a better approach is to reserve another register that retains to the incoming stack pointer, and reference parameters relative to that. And now we do exactly that. If we don't need a frame pointer, then we use r31 as a base pointer. If we do need a frame pointer, then we use r30 as a base pointer. The base pointer retains the value of the stack pointer before it was decremented in the prologue. We then use the base pointer to resolve all negative frame indices. The basic scheme follows that for base pointers in the X86 backend. We use a base pointer when we need to dynamically realign the incoming stack pointer. This currently applies only to static objects (dynamic allocas with large alignments, and base-pointer support in SjLj lowering will come in future commits). llvm-svn: 186478	2013-07-17 00:45:52 +00:00
NAKAMURA Takumi	7b93767d62	llvm/test/CodeGen/X86/vec_setcc.ll: Add explicit -mtriple=x86_64-unknown-unknown to satisfy win32-targeted configuration. llvm-svn: 186477	2013-07-17 00:42:37 +00:00
Benjamin Kramer	6e6528e46d	Finally, force the target for this test. Should unbreak non-x86 buildbots. llvm-svn: 186445	2013-07-16 19:22:07 +00:00
Benjamin Kramer	876b63a443	Label names also differ between platforms. Use a relaxed regex. llvm-svn: 186442	2013-07-16 18:54:21 +00:00
Benjamin Kramer	1459dae6ee	Fix test not to fail when the target doesn't use leading underscores on symbols. llvm-svn: 186439	2013-07-16 18:42:01 +00:00
Manman Ren	c67f77c5d6	Cleanup testing case by using a shorter name for types. llvm-svn: 186436	2013-07-16 18:26:48 +00:00
Juergen Ributzka	e612fc1230	[X86] Use min/max to optimize unsigned vector comparison on X86 Use PMIN/PMAX for UGE/ULE vector comparisons to reduce the number of required instructions. This trick also works for UGT/ULT, but there is no advantage in doing so. It wouldn't reduce the number of instructions and it would actually reduce performance. Reviewer: Ben radar:5972691 llvm-svn: 186432	2013-07-16 18:20:45 +00:00
Ulrich Weigand	c1b627a527	[APFloat] PR16573: Avoid losing mantissa bits in ppc_fp128 to double truncation When truncating to a format with fewer mantissa bits, APFloat::convert will perform a right shift of the mantissa by the difference of the precision of the two formats. Usually, this will result in just the mantissa bits needed for the target format. One special situation is if the input number is denormal. In this case, the right shift may discard significant bits. This is usually not a problem, since truncating a denormal usually results in zero (underflow) after normalization anyway, since the result format's exponent range is usually smaller than the source format's. However, there is one case where the latter property does not hold: when truncating from ppc_fp128 to double. In particular, truncating a ppc_fp128 whose first double of the pair is denormal should result in just that first double, not zero. The current code however performs an excessive right shift, resulting in lost result bits. This is then caught in the APFloat::normalize call performed by APFloat::convert and causes an assertion failure. This patch checks for the scenario of truncating a denormal, and attempts to (possibly partially) replace the initial mantissa right shift by decrementing the exponent, if doing so will still result in a valid target format exponent. 
Index: test/CodeGen/PowerPC/pr16573.ll =================================================================== --- test/CodeGen/PowerPC/pr16573.ll (revision 0) +++ test/CodeGen/PowerPC/pr16573.ll (revision 0) @@ -0,0 +1,11 @@ +; RUN: llc < %s | FileCheck %s + +target triple = "powerpc64-unknown-linux-gnu" + +define double @test() { + %1 = fptrunc ppc_fp128 0xM818F2887B9295809800000000032D000 to double + ret double %1 +} + +; CHECK: .quad -9111018957755033591 + Index: lib/Support/APFloat.cpp =================================================================== --- lib/Support/APFloat.cpp (revision 185817) +++ lib/Support/APFloat.cpp (working copy) @@ -1956,6 +1956,23 @@ X86SpecialNan = true; } + // If this is a truncation of a denormal number, and the target semantics + // has larger exponent range than the source semantics (this can happen + // when truncating from PowerPC double-double to double format), the + // right shift could lose result mantissa bits. Adjust exponent instead + // of performing excessive shift. + if (shift < 0 && isFiniteNonZero()) { + int exponentChange = significandMSB() + 1 - fromSemantics.precision; + if (exponent + exponentChange < toSemantics.minExponent) + exponentChange = toSemantics.minExponent - exponent; + if (exponentChange < shift) + exponentChange = shift; + if (exponentChange < 0) { + shift -= exponentChange; + exponent += exponentChange; + } + } + // If this is a truncation, perform the shift before we narrow the storage. if (shift < 0 && (isFiniteNonZero() || category==fcNaN)) lostFraction = shiftRight(significandParts(), oldPartCount, -shift); llvm-svn: 186409	2013-07-16 13:03:25 +00:00
Richard Osborne	e37374c506	[XCore] Fix printing of inline asm operands. Previously an asm operand with no operand modifier would give the error "invalid operand in inline asm". llvm-svn: 186407	2013-07-16 12:48:34 +00:00
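
The exponent adjustment described in c1b627a527 (r186409) above can be tried out in isolation. The following is a minimal standalone sketch, not the APFloat code itself: the struct `Adjusted`, the function `adjustForDenormalTruncation`, and its plain-integer parameters are hypothetical names introduced here for illustration, standing in for the `shift`, `exponent`, `significandMSB()`, `fromSemantics.precision`, and `toSemantics.minExponent` values used in the patch. It assumes `shift` is the target precision minus the source precision, so it is negative for a truncation.

```cpp
#include <cassert>

// Hypothetical helper type: the shift and exponent after adjustment.
struct Adjusted {
  int shift;
  int exponent;
};

// Sketch of the adjustment from the patch, on plain integers.
// shift          : target precision minus source precision (< 0 when truncating)
// exponent       : current exponent of the value
// significandMSB : index of the highest set bit of the significand
// fromPrecision  : precision of the source format, in bits
// toMinExponent  : smallest exponent the target format can represent
Adjusted adjustForDenormalTruncation(int shift, int exponent,
                                     int significandMSB, int fromPrecision,
                                     int toMinExponent) {
  if (shift >= 0)
    return {shift, exponent}; // not a truncation, nothing to adjust

  // Negative when the significand has leading zero bits (a denormal input).
  int exponentChange = significandMSB + 1 - fromPrecision;

  // Do not push the exponent below what the target format allows.
  if (exponent + exponentChange < toMinExponent)
    exponentChange = toMinExponent - exponent;

  // Do not replace more of the right shift than was requested.
  if (exponentChange < shift)
    exponentChange = shift;

  // Trade part (or all) of the right shift for a smaller exponent.
  if (exponentChange < 0) {
    shift -= exponentChange;
    exponent += exponentChange;
  }
  return {shift, exponent};
}
```

With illustrative inputs of a 106-bit source precision, a shift of -53, an exponent of -969, a highest set bit at index 60, and a target minimum exponent of -1022, the sketch returns a shift of -8 and an exponent of -1014: 45 bits of the right shift are replaced by lowering the exponent, so those mantissa bits are not discarded. A value whose highest set bit is at index 105 is left unchanged.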


8200 Commits