llvm-mirror

mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-10-23 13:02:52 +02:00

Author	SHA1	Message	Date
Bill Schmidt	b0bab996e0	[PPC64] Fix PR19893 - improve code generation for local function addresses Rafael opened http://llvm.org/bugs/show_bug.cgi?id=19893 to track non-optimal code generation for forming a function address that is local to the compile unit. The existing code was treating both local and non-local functions identically. This patch fixes the problem by properly identifying local functions and generating the proper addis/addi code. I also noticed that Rafael's earlier changes to correct the surrounding code in PPCISelLowering.cpp were also needed for fast instruction selection in PPCFastISel.cpp, so this patch fixes that code as well. The existing test/CodeGen/PowerPC/func-addr.ll is modified to test the new code generation. I've added a -O0 run line to test the fast-isel code as well. Tested on powerpc64[le]-unknown-linux-gnu with no regressions. llvm-svn: 211056	2014-06-16 21:36:02 +00:00
Tim Northover	7f8ca02cf4	ARM: implement correct atomic operations on v7M ARM v7M has ldrex/strex but not ldrexd/strexd. This means 32-bit operations should work as normal, but 64-bit ones are almost certainly doomed. Patch by Phoebe Buckheister. llvm-svn: 211042	2014-06-16 18:49:36 +00:00
Louis Gerbarg	55f89e91ff	Fix illegal relocations in X86FastISel On x86_86 the lea instruction can only use a 32 bit immediate value. When the code is compiled statically the RIP register is not used, meaning the immediate is all that can be used for the relocation, which is not sufficient in the case of targets more than +/- 2GB away. This patch bails out of fast isel in those cases and reverts to DAG which does the right thing. Test case included. llvm-svn: 211040	2014-06-16 17:35:40 +00:00
Cameron McInally	1bfa586059	Hook up vector int_ctlz for AVX512. llvm-svn: 211024	2014-06-16 14:12:28 +00:00
Daniel Sanders	495b392e19	[mips][mips64r6] cl[oz], and dcl[oz] are re-encoded in MIPS32r6/MIPS64r6 Summary: There is no change to the restrictions, just the result register is stored once in the encoding rather than twice. The rt field is zero in MIPS32r6/MIPS64r6. Depends on D4119 Reviewers: zoran.jovanovic, jkolek, vmedic Reviewed By: vmedic Differential Revision: http://reviews.llvm.org/D4120 llvm-svn: 211019	2014-06-16 13:18:59 +00:00
Daniel Sanders	2a30e4fcab	[mips][mips64r6] ll, sc, lld, and scd are re-encoded on MIPS32r6/MIPS64r6. Summary: The linked-load, store-conditional operations have been re-encoded such that have a 9-bit offset instead of the 16-bit offset they have prior to MIPS32r6/MIPS64r6. While implementing this, I noticed that the atomic load/store pseudos always emit a sign extension using sll and sra. I have improved this to use seb/seh when they are available (MIPS32r2/MIPS64r2 and above). Depends on D4118 Reviewers: jkolek, zoran.jovanovic, vmedic Reviewed By: vmedic Differential Revision: http://reviews.llvm.org/D4119 llvm-svn: 211018	2014-06-16 13:13:03 +00:00
James Molloy	d8293dd333	[AArch64] Fix a fencepost error in lowering for llvm.aarch64.neon.uqshl. Patch by Jiangning Liu! llvm-svn: 211014	2014-06-16 10:39:21 +00:00
Daniel Sanders	679bcf9838	[mips] Merge most of the big/little endian checks in atomic.ll Summary: There is very little difference between the big and little endian cases in test/CodeGen/Mips/atomic.ll. Merge them together using multiple FileCheck prefixes. Depends on D4117 Reviewers: jkolek, zoran.jovanovic, vmedic Reviewed By: vmedic Differential Revision: http://reviews.llvm.org/D4118 llvm-svn: 211013	2014-06-16 10:25:17 +00:00
Christian Pirker	219e80de72	ARMEB: Fix trunc store for vector types Reviewed at http://reviews.llvm.org/D4135 llvm-svn: 211010	2014-06-16 09:17:30 +00:00
Jingyue Wu	ae39e54823	Canonicalize addrspacecast ConstExpr between different pointer types As a follow-up to r210375 which canonicalizes addrspacecast instructions, this patch canonicalizes addrspacecast constant expressions. Given clang uses ConstantExpr::getAddrSpaceCast to emit addrspacecast cosntant expressions, this patch is also a step towards having the frontend emit canonicalized addrspacecasts. Piggyback a minor refactor in InstCombineCasts.cpp Update three affected tests in addrspacecast-alias.ll, access-non-generic.ll and constant-fold-gep.ll and added one new test in constant-fold-address-space-pointer.ll llvm-svn: 211004	2014-06-15 21:40:57 +00:00
Matt Arsenault	fafd3cb5a2	R600: Add a rotr testcase I forgot to add llvm-svn: 211002	2014-06-15 21:09:00 +00:00
Matt Arsenault	a88eef222c	R600: Remove a few more things from AMDILISelLowering Try to keep all the setOperationActions for integer ops together. llvm-svn: 211001	2014-06-15 21:08:58 +00:00
Matt Arsenault	1f47d520f5	R600: Fix assert on vector sdiv llvm-svn: 211000	2014-06-15 21:08:54 +00:00
Matt Arsenault	6f5ac69231	R600: Report that integer division is expensive. Divides by weird constants now emit much better code. llvm-svn: 210995	2014-06-15 19:48:16 +00:00
Tim Northover	9eac1de1e4	AArch64: improve handling & modelling of FP_TO_XINT nodes. There's probably no acatual change in behaviour here, just updating the LowerFP_TO_INT function to be more similar to the reverse implementation and updating costs to current CodeGen. llvm-svn: 210985	2014-06-15 09:27:15 +00:00
Tim Northover	0f6e617e90	AArch64: improve vector [su]itofp handling. This somehow got missed in the AArch64 merge, so should fix a performance regression since 3.4. llvm-svn: 210984	2014-06-15 09:27:06 +00:00
NAKAMURA Takumi	a6ff4c4e16	Don't expect tests always crashing. Add "REQUIRES:asserts". llvm-svn: 210983	2014-06-15 01:01:11 +00:00
Matt Arsenault	5f7306c2c6	R600: Add failing testcases. These are reduced from assert in the OpenCV CvtColor8u.BGR5652GRAY test. llvm-svn: 210969	2014-06-14 04:26:09 +00:00
Matt Arsenault	b2c8575d08	R600: Fix asserts related to constant initializers This would assert if a constant address space was extern and therefore didn't have an initializer. If the initializer was undef, it would hit the unreachable unhandled initializer case. An extern global should never really occur since we don't have machine linking, but bugpoint likes to remove initializers. llvm-svn: 210967	2014-06-14 04:26:05 +00:00
Jiangning Liu	c7a7a14ca1	Move GlobalMerge from Transform to CodeGen. This patch is to move GlobalMerge pass from Transform/Scalar to CodeGen, because GlobalMerge depends on TargetMachine. In the mean time, the macro INITIALIZE_TM_PASS is also moved to CodeGen/Passes.h. With this fix we can avoid making libScalarOpts depend on libCodeGen. llvm-svn: 210951	2014-06-13 22:57:59 +00:00
David Blaikie	33f6d6743b	DebugInfo: Following up to r209677, refactor local variable emission to delay the choice between emitting the definition attributes or using DW_AT_abstract_definition This doesn't fix the abstract variable handling yet, but it introduces a similar delay mechanism as was added for subprograms, causing DW_AT_location to be reordered to the beginning of the attribute list for local variables, and fixes all the test fallout for that. A subsequent commit will remove the abstract variable handling in DbgVariable and just do the abstract variable lookup at module end to ensure that abstract variables introduced after their concrete counterparts are appropriately referenced by the concrete variable. llvm-svn: 210943	2014-06-13 22:18:23 +00:00
Tim Northover	23b5546f2b	X86: lower ATOMIC_CMP_SWAP_WITH_SUCCESS directly Lowering this new node allows us to fold the almost universal comparison for success before it's even formed. Instead we can create a copy from EFLAGS and an X86ISD::SETCC operation since all "cmpxchg" instructions set the zero-flag to the correct value. rdar://problem/13201607 llvm-svn: 210923	2014-06-13 17:29:39 +00:00
Tim Northover	ad404d23ea	Atomics: make use of the "cmpxchg weak" instruction. This also simplifies the IR we create slightly: instead of working out where success & failure should go manually, it turns out we can just always jump to a success/failure block created for the purpose. Later phases will sort out the mess without much difficulty. llvm-svn: 210917	2014-06-13 16:45:52 +00:00
Tim Northover	b9ec29d7c5	IR: add "cmpxchg weak" variant to support permitted failure. This commit adds a weak variant of the cmpxchg operation, as described in C++11. A cmpxchg instruction with this modifier is permitted to fail to store, even if the comparison indicated it should. As a result, cmpxchg instructions must return a flag indicating success in addition to their original iN value loaded. Thus, for uniformity all cmpxchg instructions now return "{ iN, i1 }". The second flag is 1 when the store succeeded. At the DAG level, a new ATOMIC_CMP_SWAP_WITH_SUCCESS node has been added as the natural representation for the new cmpxchg instructions. It is a strong cmpxchg. By default this gets Expanded to the existing ATOMIC_CMP_SWAP during Legalization, so existing backends should see no change in behaviour. If they wish to deal with the enhanced node instead, they can call setOperationAction on it. Beware: as a node with 2 results, it cannot be selected from TableGen. Currently, no use is made of the extra information provided in this patch. Test updates are almost entirely adapting the input IR to the new scheme. Summary for out of tree users: ------------------------------ + Legacy Bitcode files are upgraded during read. + Legacy assembly IR files will be invalid. + Front-ends must adapt to different type for "cmpxchg". + Backends should be unaffected by default. llvm-svn: 210903	2014-06-13 14:24:07 +00:00
Cameron McInally	c8c6e33610	Fix bad copy-and-paste from r210652. AVX512 masked leading zero intrinsics. llvm-svn: 210901	2014-06-13 13:20:01 +00:00
NAKAMURA Takumi	55149aa040	llvm/test/CodeGen/X86/fast-isel-args-fail2.ll: Don't expect to fail with -Asserts. It might or might not crash. llvm-svn: 210894	2014-06-13 12:05:06 +00:00
Tim Northover	4a6737b5fc	CPP backend: set volatile property on atomic instructions. llvm-svn: 210890	2014-06-13 09:14:50 +00:00
Oliver Stannard	e34249bacf	ARM: Fix fastcc calling convention for Thumb1 When targetting Thumb1 on a processor which has a VFP unit (which is not accessible from Thumb1), we were converting the fastcc calling convention to AAPCS-VFP, which is not possible. llvm-svn: 210889	2014-06-13 08:33:03 +00:00
Matt Arsenault	e8c6185eba	R600/SI: Fix selection error on i64 rotl / rotr. Evergreen is still broken due to missing shl_parts. llvm-svn: 210885	2014-06-13 04:00:30 +00:00
Juergen Ributzka	9f0822c470	[FastISel][X86] Add support for cvttss2si/cvttsd2si intrinsics. This adds support for the cvttss2si/cvttsd2si intrinsics. Preceding insertelement instructions are folded into the conversion instruction (if possible). llvm-svn: 210870	2014-06-13 02:21:58 +00:00
Juergen Ributzka	be48c6b01a	[FastISel][X86] - Add branch weights Add branch weights to branch instructions, so that the following passes can optimize based on it (i.e. basic block ordering). llvm-svn: 210863	2014-06-13 00:45:11 +00:00
Juergen Ributzka	9dda2c5782	[FastISel][X86] Add MachineMemOperand to load/store instructions. This commit adds MachineMemOperands to load and store instructions. This allows the peephole optimizer to fold load instructions. Unfortunatelly the peephole optimizer currently doesn't run at -O0. llvm-svn: 210858	2014-06-12 23:27:57 +00:00
Juergen Ributzka	e094a4f155	Update test case to use "not" instead of "XFAIL". llvm-svn: 210829	2014-06-12 21:17:40 +00:00
Matt Arsenault	e19ddbd0dc	R600: Mostly remove remaining AMDIL intrinsics. Delete all unused ones, and add new AMDGPU named intrinsics for the ones that are. Handle the old AMDIL names for comptability (although remove their GCCBuiltin names) and add tests since there weren't any for these before. llvm-svn: 210827	2014-06-12 21:15:44 +00:00
Juergen Ributzka	cac321ed67	[FastISel][X86] Argument lowering test case This test case is supposed to xfail, because we do not handle structs or byval arguments. llvm-svn: 210816	2014-06-12 20:34:09 +00:00
Juergen Ributzka	a1c12b2a44	[FastIsel][X86] Add support for lowering the first 8 floating-point arguments. Recommit with fixed argument attribute checking code, which is required to bail out of all the cases we don't handle yet. llvm-svn: 210815	2014-06-12 20:12:34 +00:00
Saleem Abdulrasool	e89f958d6e	CodeGen: enable mov.w/mov.t pairs with minsize for WoA Windows on ARM uses COFF/PE which is intrinsically position independent. For the case of 32-bit immediates, use a pair-wise relocation as otherwise we may exceed the range of operators. This fixes a code generation crash when using -Oz when targeting Windows on ARM. llvm-svn: 210814	2014-06-12 20:06:33 +00:00
Juergen Ributzka	76f1e824ab	Revert "[FastIsel][X86] Add support for lowering the first 8 floating-point arguments." Reverting it because it breaks several tests. llvm-svn: 210810	2014-06-12 19:21:43 +00:00
Tom Stellard	5f887c9493	Revert "SelectionDAG: Enable (and (setcc x), (setcc y)) -> (setcc (and x, y)) for vectors" This reverts commit r210540, adds a testcase for the regression it caused, and marks the R600 test it was supposed to fix as XFAIL. llvm-svn: 210792	2014-06-12 16:04:47 +00:00
James Molloy	caa8dab3ef	Disable the load/store optimization pass for Thumb-1. Moritz's changes have improved codegen a lot, but further testing showed significant correctness problems. Disable by default until these have been worked out. Patch by Moritz Roth! llvm-svn: 210789	2014-06-12 15:18:33 +00:00
Daniel Sanders	03454ab1ca	[mips][mips64r6] bc1[tf] are not available on MIPS32r6/MIPS64r6 Summary: Also tightened up the acceptable condition operand for these instructions on MIPS-I to MIPS-III. Support for $fcc[1-7] was added in MIPS-IV. Prior to that only $fcc0 is acceptable. We currently don't optimize (BEQZ (NOT $a), $target) and similar. It's probably best to do this in InstCombine. Depends on D4111 Reviewers: jkolek, zoran.jovanovic, vmedic Reviewed By: vmedic Differential Revision: http://reviews.llvm.org/D4112 llvm-svn: 210787	2014-06-12 15:00:17 +00:00
Daniel Sanders	ff573c1cd3	[mips][mips64r6] [sl][duw]xc1 are not available on MIPS32r6/MIPS64r6 Summary: Folded mips64-fp-indexed-ls.ll into fp-indexed-ls.ll. To do so, the zext's in mips64-fp-indexed-ls.ll were changed to implicit sign extensions (performed by getelementptr). This does not affect the purpose of the test. Depends on D4004 Reviewers: zoran.jovanovic, jkolek, vmedic Reviewed By: vmedic Differential Revision: http://reviews.llvm.org/D4110 llvm-svn: 210784	2014-06-12 14:19:28 +00:00
Daniel Sanders	c3777d6de2	[mips][mips64r6] c.cond.fmt, mov[fntz], and mov[fntz].[ds] are not available on MIPS32r6/MIPS64r6 Summary: c.cond.fmt has been replaced by cmp.cond.fmt. Where c.cond.fmt wrote to dedicated condition registers, cmp.cond.fmt writes 1 or 0 to normal FGR's (like the GPR comparisons). mov[fntz] have been replaced by seleqz and selnez. These instructions conditionally zero a register based on a bool in a GPR. The results can then be or'd together to act as a select without, for example, requiring a third register read port. mov[fntz].[ds] have been replaced with sel.[ds] MIPS64r6 currently generates unnecessary sign-extensions for most selects. This is because the result of a SETCC is currently an i32. Bits 32-63 are undefined in i32 and the behaviour of seleqz/selnez would otherwise depend on undefined bits. Later, we will fix this by making the result of SETCC an i64 on MIPS64 targets. Depends on D3958 Reviewers: jkolek, vmedic, zoran.jovanovic Reviewed By: vmedic, zoran.jovanovic Differential Revision: http://reviews.llvm.org/D4003 llvm-svn: 210777	2014-06-12 13:39:06 +00:00
Daniel Sanders	2cb9c461b4	[mips] Use MTHC1 when it is available (MIPS32r2 and later) for both FP32 and FP64 Summary: To make this work for both AFGR64 and FGR64 register sets, I've had to make the instruction definition consistent with the white lie (that it reads the lower 32-bits of the register) when they are generated by expandBuildPairF64(). Corrected the definition of hasMips32r2() and hasMips64r2() to include MIPS32r6 and MIPS64r6. Depends on D3956 Reviewers: jkolek, zoran.jovanovic, vmedic Reviewed By: vmedic Differential Revision: http://reviews.llvm.org/D3957 llvm-svn: 210771	2014-06-12 11:55:58 +00:00
Daniel Sanders	c11a0f9244	[mips][mips64r6] madd.[ds], msub.[ds], nmadd.[ds], and nmsub.[ds] are not available on MIPS32r6/MIPS64r6 Summary: This patch updates both the assembler and the code generator. MIPS32r6/MIPS64r6 replaces them with maddf.[ds] and msubf.[ds] which are fused multiply-add/sub operations. We don't emit these yet, this patch only prevents the removed instructions from being emitted. Depends on D3955 Reviewers: jkolek, zoran.jovanovic, vmedic Reviewed By: vmedic Differential Revision: http://reviews.llvm.org/D3956 llvm-svn: 210763	2014-06-12 11:04:18 +00:00
Daniel Sanders	ffe9a8dd2d	[mips][mips64r6] madd/maddu/msub/msubu are not available on MIPS32r6/MIPS64r6 Summary: This patch disables madd/maddu/msub/msubu in both the assembler and code generator. Depends on D3896 Reviewers: jkolek, zoran.jovanovic, vmedic Reviewed By: vmedic Differential Revision: http://reviews.llvm.org/D3955 llvm-svn: 210762	2014-06-12 10:54:16 +00:00
Andrea Di Biagio	705818d2ca	[X86] Teach how to combine AVX and AVX2 horizontal binop on packed 256-bit vectors. This patch adds target combine rules to match: - [AVX] Horizontal add/sub of packed single/double precision floating point values from 256-bit vectors; - [AVX2] Horizontal add/sub of packed integer values from 256-bit vectors. llvm-svn: 210761	2014-06-12 10:53:48 +00:00
Daniel Sanders	6822f77667	[mips][mips64r6] Replace m[tf]hi, m[tf]lo, mult, multu, dmult, dmultu, div, ddiv, divu, ddivu for MIPS32r6/MIPS64. Summary: The accumulator-based (HI/LO) multiplies and divides from earlier ISA's have been removed and replaced with GPR-based equivalents. For example: div $1, $2 mflo $3 is now: div $3, $1, $2 This patch disables the accumulator-based multiplies and divides for MIPS32r6/MIPS64r6 and uses the GPR-based equivalents instead. Renamed expandPseudoDiv to insertDivByZeroTrap to better describe the behaviour of the function. MipsDelaySlotFiller now invalidates the liveness information when moving instructions to the delay slot. Without this, divrem.ll will abort since %GP ends up used before it is defined. Reviewers: vmedic, zoran.jovanovic, jkolek Reviewed By: jkolek Differential Revision: http://reviews.llvm.org/D3896 llvm-svn: 210760	2014-06-12 10:44:10 +00:00
Matt Arsenault	2c82883c7a	R600/SI: Use a register set to -1 for data0 on ds_inc/ds_dec There is not such thing as a 0-data ds instruction, and the data operand needs to be a vgpr set to something meaningful. llvm-svn: 210756	2014-06-12 08:21:54 +00:00
Juergen Ributzka	3fb59d77f5	[FastISel][x86] Add testcase for r210719. llvm-svn: 210746	2014-06-12 03:54:05 +00:00
Juergen Ributzka	15dae05820	[x86] Improve frameaddress test from r210709. llvm-svn: 210743	2014-06-12 03:29:29 +00:00
Juergen Ributzka	3c469924f6	[FastISel] Add support for the stackmap intrinsic. This implements target-independent FastISel lowering for the stackmap intrinsic. llvm-svn: 210742	2014-06-12 03:29:26 +00:00
Juergen Ributzka	f8b4c7a498	[FastISel][X86] Add support for the sqrt intrinsic. llvm-svn: 210720	2014-06-11 23:11:02 +00:00
Juergen Ributzka	99d39e4022	[FastISel][X86] Add support for the frameaddress intrinsic. llvm-svn: 210709	2014-06-11 21:44:44 +00:00
Chad Rosier	07ce4c0d5f	[AArch64] Basic Sched Model for Cortex-A57. Patch by Dave Estes<cestes@codeaurora.org> Differential Revision: http://reviews.llvm.org/D4008 llvm-svn: 210705	2014-06-11 21:06:56 +00:00
Matt Arsenault	b6b9bb5978	R600/SI: Fix bitcast between v2i32 and f64 This is the same problem fixed in r210664 for more types. The test passes without this fix. For some reason I'm only hitting this when creating selects lowered to v2i32 selects. llvm-svn: 210692	2014-06-11 19:31:13 +00:00
Matt Arsenault	33d239c3a3	R600/SI: Add common 64-bit LDS atomics llvm-svn: 210680	2014-06-11 18:08:54 +00:00
Matt Arsenault	ffa54309ef	R600/SI: Add 32-bit LDS atomic cmpxchg llvm-svn: 210678	2014-06-11 18:08:48 +00:00
Matt Arsenault	5784f403fd	R600/SI: Use LDS atomic inc / dec llvm-svn: 210677	2014-06-11 18:08:45 +00:00
Matt Arsenault	3067074b27	R600/SI: Add other LDS atomic operations llvm-svn: 210676	2014-06-11 18:08:42 +00:00
Matt Arsenault	e633c2eeb6	R600/SI: Fix backwards names for local atomic instructions. The manual lists them as _RTN_U32, not _U32_RTN, which is more consistent with how every other sized instruction is named. llvm-svn: 210674	2014-06-11 18:08:37 +00:00
Matt Arsenault	174782d2e9	R600/SI: Refactor local atomics. Use patterns that will also match the immediate offset to match the normal read / writes. llvm-svn: 210673	2014-06-11 18:08:34 +00:00
Matt Arsenault	a75d166beb	R600/SI: Use v_cvt_f32_ubyte* instructions This eliminates extra extract instructions when loading an i8 vector to a float vector. llvm-svn: 210666	2014-06-11 17:50:44 +00:00
Matt Arsenault	52e8d9b98d	R600/SI: Fix selection failure on scalar_to_vector There seem to be only 2 places that produce these, and it's kind of tricky to hit them. Also fixes failure to bitcast between i64 and v2f32, although this for some reason wasn't actually broken in the simple bitcast testcase, but did in the scalar_to_vector one. llvm-svn: 210664	2014-06-11 17:40:32 +00:00
Daniel Sanders	ff992a6f20	[mips][mips64r6] Improve tests affected by the changes to multiplies and divides Summary: MIPS32r6/MIPS64r6 support has not been added yet. inlineasm-cnstrnt-reg.ll: Explicitly specify the CPU since it will not work on MIPS32r6/MIPS64r6 when -integrated-as is the default. We can't change the mnemonic since the LO register is an implicit def of mtlo and MIPS32r6/MIPS64r6 has no instructions that use LO. 2008-08-01-AsmInline.ll: Explicitly specify the CPU since MIPS32r6/MIPS64r6 will correctly emit different code and this is a regression test. mips64instrs.ll and mips64muldiv.ll Check registers and the way the multiply is used in m1 divrem.ll Check registers and use multiple filecheck prefixes to limit redundancy Reviewers: vmedic, jkolek, zoran.jovanovic, matheusalmeida Reviewed By: matheusalmeida Subscribers: matheusalmeida Differential Revision: http://reviews.llvm.org/D3894 llvm-svn: 210656	2014-06-11 15:48:00 +00:00
Cameron McInally	a3c0d56a4d	Add AVX512 masked leadz instrinsic support. llvm-svn: 210652	2014-06-11 12:54:45 +00:00
Andrea Di Biagio	a40448e1d1	[X86] Refactor the logic to select horizontal adds/subs to a helper function. This patch moves part of the logic implemented by the target specific combine rules added at r210477 to a separate helper function. This should make easier to add more rules for matching AVX/AVX2 horizontal adds/subs. This patch also fixes a problem caused by a wrong check performed on indices of extract_vector_elt dag nodes in input to the scalar adds/subs. New tests have been added to verify that we correctly check indices of extract_vector_elt dag nodes when selecting a horizontal operation. llvm-svn: 210644	2014-06-11 07:57:50 +00:00
Jiangning Liu	4291d4247a	Global merge for global symbols. This commit is to improve global merge pass and support global symbol merge. The global symbol merge is not enabled by default. For aarch64, we need some more back-end fix to make it really benifit ADRP CSE. llvm-svn: 210640	2014-06-11 06:44:53 +00:00
Juergen Ributzka	cc0b7b7884	[FastISel][X86] Extend support for {s\|u}{add\|sub\|mul}.with.overflow intrinsics. llvm-svn: 210610	2014-06-10 23:52:44 +00:00
Matt Arsenault	4f96643a42	R600: Use BCNT_INT for evergreen llvm-svn: 210569	2014-06-10 19:18:28 +00:00
Matt Arsenault	6387e9a3dc	R600/SI: Implement i64 ctpop llvm-svn: 210568	2014-06-10 19:18:24 +00:00
Matt Arsenault	8407076508	R600/SI: Use bcnt instruction for ctpop llvm-svn: 210567	2014-06-10 19:18:21 +00:00
Matt Arsenault	d30b483e1a	R600: Handle fcopysign llvm-svn: 210564	2014-06-10 19:00:20 +00:00
Matt Arsenault	5bfef73e00	R600/SI: Handle sign_extend and zero_extend to i64 with patterns. llvm-svn: 210563	2014-06-10 18:54:59 +00:00
Reed Kotler	582410c0f5	Do Materialize Floating Point in Mips Fast-Isel Summary: Implement materialize of floating point literals in Mips Fast-Isel Reopened version of D3659 Test Plan: simplestorefp1.ll Reviewers: dsanders Reviewed By: dsanders Differential Revision: http://reviews.llvm.org/D4071 llvm-svn: 210546	2014-06-10 16:45:44 +00:00
Andrea Di Biagio	183859cf37	[X86] Improved target combine rules for selecting horizontal add/sub. This patch slightly changes the algorithm introduced at revision 210477 to fix a problem where the algorithm was producing incorrect code for the VEX.256 encoded versions of horizontal add/sub. For these cases, we now try to split the two 256-bit vectors into 128-bit chunks before emitting horizontal add/sub dag nodes. Added a new test case into haddsub-2.ll. llvm-svn: 210545	2014-06-10 16:42:57 +00:00
Adam Nemet	db983b8c6b	[X86] AVX512: Add vmovntdqa Along with the corresponding intrinsic and tests. llvm-svn: 210543	2014-06-10 16:39:53 +00:00
Renato Golin	89197528be	Fix a bug in the Thumb1 ARM Load/Store optimizer Previously, the basic block was searched for future uses of the base register, and if necessary any writeback to the base register was reset using a SUB instruction (e.g. before calling a function) just before such a use. However, this step happened before the merged LDM/STM instruction was built. So if there was (e.g.) a function call directly after the not-yet-formed LDM/STM, the pass would first insert a SUB instruction to reset the base register, and then (at the same location, incorrectly) insert the LDM/STM itself. This patch fixes PR19972. Patch by Moritz Roth. llvm-svn: 210542	2014-06-10 16:39:21 +00:00
Tom Stellard	ad2d29f10e	SelectionDAG: Don't use MVT::Other to determine legality of ISD::SELECT_CC The SelectionDAG bad a special case for ISD::SELECT_CC, where it would allow targets to specify: setOperationAction(ISD::SELECT_CC, MVT::Other, Expand); to indicate that they wanted to expand ISD::SELECT_CC for all types. This wasn't applied correctly everywhere, and it makes writing new DAG patterns with ISD::SELECT_CC difficult. llvm-svn: 210541	2014-06-10 16:01:29 +00:00
Tom Stellard	aab1db4cd9	SelectionDAG: Expand SELECT_CC to SELECT + SETCC This consolidates code from the Hexagon, R600, and XCore targets. No functionality change intended. llvm-svn: 210539	2014-06-10 16:01:22 +00:00
Bill Schmidt	6e11183ad7	[PPC64LE] Recognize shufflevector patterns for little endian Various masks on shufflevector instructions are recognizable as specific PowerPC instructions (vector pack, vector merge, etc.). There is existing code in PPCISelLowering.cpp to recognize the correct patterns for big endian code. The masks for these instructions are different for little endian code due to the big-endian numbering employed by these instructions. This patch adds the recognition code for little endian. I've added a new test case test/CodeGen/PowerPC/vec_shuffle_le.ll for this. The existing recognizer test (vec_shuffle.ll) is unnecessarily verbose and difficult to read, so I felt it was better to add a new test rather than modify the old one. llvm-svn: 210536	2014-06-10 14:35:01 +00:00
Chad Rosier	0f6d185fcf	[AArch64] Emit .ident compiler version attribute. Patch by Ana Pazos<apazos@codeaurora.org>! llvm-svn: 210535	2014-06-10 14:32:08 +00:00
Tim Northover	8d5e97704b	AArch64: disallow x30 & x29 as the destination for indirect tail calls As Ana Pazos pointed out, these have to be restored to their incoming values before a function returns; i.e. before the tail call. So they can't be used correctly as the destination register. llvm-svn: 210525	2014-06-10 10:50:24 +00:00
Tim Northover	7a0bf66207	Revert "X86: elide comparisons after cmpxchg instructions." This reverts commit r210523. It was committed prematurely without waiting for review. llvm-svn: 210524	2014-06-10 10:50:11 +00:00
Tim Northover	d8b770a0be	X86: elide comparisons after cmpxchg instructions. The C++ and C semantics of the compare_and_swap operations actually require us to return a boolean "success" value. In LLVM terms this means a second comparison of the output of "cmpxchg" against the input desired value. However, x86's "cmpxchg" instruction sets all flags for the comparison formed, so we can skip any secondary comparison. (N.b. this isn't true for cmpxchg8b/16b, which only set ZF). rdar://problem/13201607 llvm-svn: 210523	2014-06-10 10:49:07 +00:00
Tim Northover	bfac8dd607	AArch64: teach FastISel how to handle offset FrameIndices Previously we were abandonning the attempt, leading to some combination of extra work (when selection of a load/store fails completely) and inferior code (when this leads to a real memcpy call instead of inlining). rdar://problem/17187463 llvm-svn: 210520	2014-06-10 09:52:44 +00:00
Tim Northover	666d07f003	AArch64: make FastISel memcpy emission more robust. We were hitting an assert if FastISel couldn't create the load or store we requested. Currently this happens for large frame-local addresses, though CodeGen could be improved there. rdar://problem/17187463 llvm-svn: 210519	2014-06-10 09:52:40 +00:00
Alp Toker	03b6e12fae	Reduce verbiage of lit.local.cfg files We can just split targets_to_build in one place and make it immutable. llvm-svn: 210496	2014-06-09 22:42:55 +00:00
Bill Schmidt	41cd7375c8	[PPC64LE] Generate correct code for unaligned little-endian vector loads The code in PPCTargetLowering::PerformDAGCombine() that handles unaligned Altivec vector loads generates a lvsl followed by a vperm. As we've seen in numerous other places, the vperm instruction has a big-endian bias, and this is fixed for little endian by complementing the permute control vector and swapping the input operands. In this case the lvsl is providing the permute control vector. Rather than generating an lvsl and a complement operation, it is sufficient to generate an lvsr instruction instead. Thus for LE code generation we will generate an lvsr rather than an lvsl, and swap the other input arguments on the vperm. The existing test/CodeGen/PowerPC/vec_misalign.ll is updated to test the code generation for PPC64 and PPC64LE, in addition to the existing PPC32/G5 testing. llvm-svn: 210493	2014-06-09 22:00:52 +00:00
Saleem Abdulrasool	cf709958ac	ARM: add VLA extension for WoA Itanium ABI The armv7-windows-itanium environment is nearly identical to the MSVC ABI. It has a few divergences, mostly revolving around the use of the Itanium ABI for C++. VLA support is one of the extensions that are amongst the set of the extensions. This adds support for proper VLA emission for this environment. This is somewhat similar to the handling for __chkstk emission on X86 and the large stack frame emission for ARM. The invocation style for chkstk is still controlled via the -mcmodel flag to clang. Make an explicit note that this is an extension. llvm-svn: 210489	2014-06-09 20:18:42 +00:00
Andrea Di Biagio	23548cb631	[X86] Add target combine rules for horizontal add/sub. This patch adds new target specific combine rules to identify horizontal add/sub idioms from BUILD_VECTOR dag nodes. This patch also teaches the DAGCombiner how to canonicalize sequences of insert_vector_elt dag nodes according to the following rule: (insert_vector_elt (insert_vector_elt A, I0), I1) -> (insert_vecto_elt (insert_vector_elt A, I1), I0) This new canonicalization rule only triggers if the inner insert_vector dag node has exactly one use; also, both indices must be known constants, and I1 < I0. This last rule made it possible to write a simpler algorithm to identify horizontal add/sub patterns because now we don't have to worry about the ordering of insert_vector_elt dag nodes. llvm-svn: 210477	2014-06-09 16:54:41 +00:00
Matt Arsenault	c9f3bd4d6c	R600/SI: Keep 64-bit not on SALU llvm-svn: 210476	2014-06-09 16:36:31 +00:00
Matt Arsenault	a34a3c834c	R600: Fix selection failure for vector bswap llvm-svn: 210475	2014-06-09 16:20:25 +00:00
Bill Schmidt	3ff0a8eb8b	[PPC64LE] Generate correct little-endian code for v16i8 multiply The existing code in PPCTargetLowering::LowerMUL() for multiplying two v16i8 values assumes that vector elements are numbered in big-endian order. For little-endian targets, the vector element numbering is reversed, but the vmuleub, vmuloub, and vperm instructions still assume big-endian numbering. To account for this, we must adjust the permute control vector and reverse the order of the input registers on the vperm instruction. The existing test/CodeGen/PowerPC/vec_mul.ll is updated to be executed on powerpc64 and powerpc64le targets as well as the original powerpc (32-bit) target. llvm-svn: 210474	2014-06-09 16:06:29 +00:00
NAKAMURA Takumi	3491102cd9	llvm/test/CodeGen/X86/2014-05-29-factorial.ll: Relax an expression to match Win32 x64. llvm-svn: 210471	2014-06-09 14:20:23 +00:00
Andrea Di Biagio	26cba06d11	[X86] Avoid emitting unnecessary test instructions. This patch teaches the backend how to check for the 'NoSignedWrap' flag on binary operations to improve the emission of 'test' instructions. If the result of a binary operation is known not to overflow we know that resetting the Overflow flag is unnecessary and so we can avoid emitting the test instruction. Patch by Marcello Maggioni. llvm-svn: 210468	2014-06-09 12:34:50 +00:00
Andrea Di Biagio	a2490b5eaa	[DAG] Expose NoSignedWrap, NoUnsignedWrap and Exact flags to SelectionDAG. This patch modifies SelectionDAGBuilder to construct SDNodes with associated NoSignedWrap, NoUnsignedWrap and Exact flags coming from IR BinaryOperator instructions. Added a new SDNode type called 'BinaryWithFlagsSDNode' to allow accessing nsw/nuw/exact flags during codegen. Patch by Marcello Maggioni. llvm-svn: 210467	2014-06-09 12:32:53 +00:00
Matt Arsenault	232bf82e54	R600: Add more and testcases llvm-svn: 210453	2014-06-09 08:36:53 +00:00
Chad Rosier	22a15b47d4	[AArch64] Fix the ordering of the accumulate operand in SchedRW list. Patch by Dave Estes <cestes@codeaurora.org> http://reviews.llvm.org/D4037 llvm-svn: 210446	2014-06-09 01:54:00 +00:00
Chad Rosier	010594577d	[AArch64] When combining constant mul of power of 2 plus/minus 1, prefer shift plus add. The shift can be folded into the add. This only effects codegen when the constant is 3. llvm-svn: 210445	2014-06-09 01:25:51 +00:00
Alp Toker	ad1219e3fd	Revert "Do materialize for floating point" 1) The commit was made despite profound lack of understanding: "I did not understand the comment about using dyn_cast instead of isa. I will commit as is and make the update after. You can explain what you meant to me." Commit first, understand later isn't OK. 2) Review comments were simply ignored: "Can you edit the summary to describe what the patch is for? It appears to be a list of commits at the moment." 3) The patch got LGTM'd off-list without any indication of readiness. 4) The public mailing list was excluded from patch review so all of this was hidden from the community. This reverts commit r210414. llvm-svn: 210424	2014-06-08 09:13:42 +00:00
Reed Kotler	75bd26e764	Do materialize for floating point Summary: start to do simple constants finish simplestore add test case format Merge branch 'master' into 1756_8 Add basic functionality for assignment of ints. This creates a lot of core infrastructure in which to add, with little effort, quite a bit more to mips fast-isel Merge branch 'master' into 1756_8 Add basic functionality for assignment of ints. This creates a lot of core infrastructure in which to add, with little effort, quite a bit more to mips fast-isel in progress finish integer materialize test cases test cases in progress Finish up fast-isel materialize for ints. Finish materialize for ints test cases simplestorei.ll Merge branch 'master' into 1756_8 fix fp constants for fast-isel Merge branch '1758_1' of dmz-portal.mips.com:llvm into 1758_1 in progress lastest for fp materialization clean up Merge branch 'master' into 1758_1 formatting add test case finish test case Merge branch 'master' into 1758_2 Test Plan: simplestore.ll simplestore.ll Reviewers: dsanders Reviewed By: dsanders Differential Revision: http://reviews.llvm.org/D3659 llvm-svn: 210414	2014-06-08 03:30:32 +00:00
Saleem Abdulrasool	a55e9f811e	test: add test case for SVN r210406 Add missing test case for constructor section selection. Thanks David Blaikie! llvm-svn: 210409	2014-06-08 01:27:32 +00:00
Alp Toker	62946e907c	Fix typos llvm-svn: 210401	2014-06-07 21:23:09 +00:00
Saleem Abdulrasool	6fbe07452a	ARM: correct assertion for long-calls on WoA COFF/PE, so the relocation model is never static. Loosen the assertion accordingly. The relocation can still be emitted properly, as it will be converted to an IMAGE_REL_ARM_ADDR32 which will be resolved by the loader taking the base relocation into account. This is necessary to permit the emission of long calls which can be controlled via the -mlong-calls option in the driver. llvm-svn: 210399	2014-06-07 20:29:27 +00:00
Benjamin Kramer	ab2896f4aa	X86: Don't turn shifts into ands if there's another use that may not check for equality. Fixes PR19964. llvm-svn: 210371	2014-06-06 21:08:55 +00:00
Filipe Cabecinhas	bcbf7c1220	Fixed a bug in lowering shuffle_vectors to insertps Summary: We were being too strict and not accounting for undefs. Added a test case and fixed another one where we improved codegen. Reviewers: grosbach, nadav, delena Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D4039 llvm-svn: 210361	2014-06-06 18:07:06 +00:00
Bill Schmidt	647be1ef2c	[PPC64LE] Fix lowering of BUILD_VECTOR and SHUFFLE_VECTOR for little endian This patch fixes a couple of lowering issues for little endian PowerPC. The code for lowering BUILD_VECTOR contains a number of optimizations that are only valid for big endian. For now, we disable those optimizations for correctness. In the future, we will add analogous optimizations that are correct for little endian. When lowering a SHUFFLE_VECTOR to a VPERM operation, we again need to make the now-familiar transformation of swapping the input operands and complementing the permute control vector. Correctness of this transformation is tested by the accompanying test case. llvm-svn: 210336	2014-06-06 14:06:26 +00:00
Rafael Espindola	e5f71f18e0	Allow aliases to be unnamed_addr. Alias with unnamed_addr were in a strange state. It is stored in GlobalValue, the language reference talks about "unnamed_addr aliases" but the verifier was rejecting them. It seems natural to allow unnamed_addr in aliases: * It is a property of how it is accessed, not of the data itself. * It is perfectly possible to write code that depends on the address of an alias. This patch then makes unname_addr legal for aliases. One side effect is that the syntax changes for a corner case: In globals, unnamed_addr is now printed before the address space. llvm-svn: 210302	2014-06-06 01:20:28 +00:00
Bill Schmidt	bca214cf27	[PPC64LE] Add test case for r210282 commit Chandler correctly pointed out that I need an LLVM IR test for r210282, which modified the vperm -> shuffle transform for little endian PowerPC. This patch provides that test. llvm-svn: 210297	2014-06-05 22:57:38 +00:00
Tom Roeder	2e8a372526	Adding explicit triples to the ARM jumptable tests llvm-svn: 210288	2014-06-05 21:40:13 +00:00
Tom Roeder	740d86dc79	Add a new attribute called 'jumptable' that creates jump-instruction tables for functions marked with this attribute. It includes a pass that rewrites all indirect calls to jumptable functions to pass through these tables. This also adds backend support for generating the jump-instruction tables on ARM and X86. Note that since the jumptable attribute creates a second function pointer for a function, any function marked with jumptable must also be marked with unnamed_addr. llvm-svn: 210280	2014-06-05 19:29:43 +00:00
Sasa Stankovic	9817e8490e	[mips] Modify long branch for NaCl: * Move the instruction that changes sp outside of the branch delay slot. * Bundle-align the target of indirect branch. Differential Revision: http://llvm-reviews.chandlerc.com/D3928 llvm-svn: 210262	2014-06-05 13:52:08 +00:00
Sasa Stankovic	b9ef62e8e9	Prevent hoisting the instruction whose def might be clobbered by the terminator. llvm-svn: 210261	2014-06-05 13:42:48 +00:00
Matt Arsenault	700a8a731b	R600: Fix test. Using wrong check prefix. llvm-svn: 210244	2014-06-05 08:00:36 +00:00
Matt Arsenault	9e400b2e26	R600/SI: Match rsq instructions llvm-svn: 210226	2014-06-05 00:15:55 +00:00
Eric Christopher	f3e627ce2e	Revert r209381 as it isn't a local variable. Add a testcase so that we know next time this happens. llvm-svn: 210127	2014-06-03 21:01:39 +00:00
Tilmann Scheller	b619db1a71	[AArch64] Add regression tests for the load/store optimizer which cover post-index update folding with sub rather than add. The tests check that the following transform happens: (ldr\|str) X, [x20] ... sub x20, x20, #16 -> (ldr\|str) X, [x20], #-16 with X being either w0, x0, s0, d0 or q0. llvm-svn: 210113	2014-06-03 16:03:00 +00:00
Tim Northover	d56609ce6c	AArch64: mark small types (i1, i8, i16) as promoted This means the output of LowerFormalArguments returns a lowered SDValue with the correct type (expected in SelectionDAGBuilder). Without this, an assertion under a DEBUG macro triggers when those types are passed on the stack. llvm-svn: 210102	2014-06-03 13:54:53 +00:00
Jiangning Liu	531302fb19	[AArch64] Correctly deal with VPR stack parameter passing. llvm-svn: 210067	2014-06-03 03:25:09 +00:00
Rafael Espindola	87cd774844	Allow alias to point to an arbitrary ConstantExpr. This patch changes GlobalAlias to point to an arbitrary ConstantExpr and it is up to MC (or the system assembler) to decide if that expression is valid or not. This reduces our ability to diagnose invalid uses and how early we can spot them, but it also lets us do things like @test5 = alias inttoptr(i32 sub (i32 ptrtoint (i32* @test2 to i32), i32 ptrtoint (i32* @bar to i32)) to i32) An important implication of this patch is that the notion of aliased global doesn't exist any more. The alias has to encode the information needed to access it in its metadata (linkage, visibility, type, etc). Another consequence to notice is that getSection has to return a "const char ". It could return a NullTerminatedStringRef if there was such a thing, but when that was proposed the decision was to just uses "const char*" for that. llvm-svn: 210062	2014-06-03 02:41:57 +00:00
Andrea Di Biagio	3455d1a524	[X86] Fix checked arithmetic for i8 on X86. When lowering a ISD::BRCOND into a test+branch, make sure that we always use the correct condition code to emit the test operation. This fixes PR19858: "i8 checked mul is wrong on x86". Patch by Keno Fisher! llvm-svn: 210032	2014-06-02 16:00:27 +00:00
Tilmann Scheller	7206d94ee3	[AArch64] Add some more regression tests for store pre-index update folding in the load/store optimizer. Add tests for the following transform: add x8, x8, #16 ... str X, [x8] -> str X, [x8, #16]! with X being either w0, x0, s0, d0 or q0. llvm-svn: 210021	2014-06-02 12:33:33 +00:00
Tilmann Scheller	d9a7937b47	[AArch64] Add some more regression tests for load pre-index update folding in the load/store optimizer. Add tests for the following transform: add x8, x8, #16 ... ldr X, [x8] -> ldr X, [x8, #16]! with X being either w0, x0, s0, d0 or q0. llvm-svn: 210018	2014-06-02 11:57:09 +00:00
Christian Pirker	6d4cce97f1	ARMEB: Fix function return type f64 Reviewed at http://reviews.llvm.org/D3968 llvm-svn: 209990	2014-06-01 09:30:52 +00:00
Matt Arsenault	ff3cea9ab5	R600/SI: Fix [s\|u]int_to_fp for i1 llvm-svn: 209971	2014-05-31 06:47:42 +00:00
Filipe Cabecinhas	c85c3e2c02	Make blend tests more specific Following the lead set by r209324, I'm making these tests match the whole instruction, so we can be sure we're lowering them correctly. llvm-svn: 209947	2014-05-31 00:52:23 +00:00
Andrea Di Biagio	3a03708285	[X86] Add two combine rules to simplify dag nodes introduced during type legalization when promoting nodes with illegal vector type. This patch teaches the backend how to simplify/canonicalize dag node sequences normally introduced by the backend when promoting certain dag nodes with illegal vector type. This patch adds two new combine rules: 1) fold (shuffle (bitcast (BINOP A, B)), Undef, <Mask>) -> (shuffle (BINOP (bitcast A), (bitcast B)), Undef, <Mask>) 2) fold (BINOP (shuffle (A, Undef, <Mask>)), (shuffle (B, Undef, <Mask>))) -> (shuffle (BINOP A, B), Undef, <Mask>). Both rules are only triggered on the type-legalized DAG. In particular, rule 1. is a target specific combine rule that attempts to sink a bitconvert into the operands of a binary operation. Rule 2. is a target independet rule that attempts to move a shuffle immediately after a binary operation. llvm-svn: 209930	2014-05-30 23:17:53 +00:00
Filipe Cabecinhas	8abf11ea97	Convert a vselect into a concat_vector if possible Summary: If both vector args to vselect are concat_vectors and the condition is constant and picks half a vector from each argument, convert the vselect into a concat_vectors. Added a test. The ConvertSelectToConcatVector is assuming it doesn't get vselects with arguments of, for example, <undef, undef, true, true>. Those get taken care of in the checks above its call. Reviewers: nadav, delena, grosbach, hfinkel Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D3916 llvm-svn: 209929	2014-05-30 23:03:11 +00:00
Filipe Cabecinhas	89440ec19e	Separate the check for blend shuffle_vector masks Summary: Separate the check for blend shuffle_vector masks into isBlendMask. This function will also be used to check if a vector shuffle is legal. No change in functionality was intended, but we ended up improving codegen on two tests, which were being (more) optimized only if the resulting shuffle was legal. Reviewers: nadav, delena, andreadb Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D3964 llvm-svn: 209923	2014-05-30 21:31:21 +00:00
Logan Chien	79b8446257	Fix MIPS exception personality encoding. For MIPS, we have to encode the personality routine with an indirect pointer to absptr; otherwise, some link warning warning will be raised, and the program might crash in some early MIPS Android device. llvm-svn: 209907	2014-05-30 16:48:56 +00:00
Rafael Espindola	d1ec35ff7d	[pr19636] Fix known bit computation in urem instruction with power of two. Patch by Andrey Kuharev. llvm-svn: 209902	2014-05-30 15:00:45 +00:00
Tim Northover	89515a61ad	SelectionDAG: skip barriers for unordered atomic operations Unordered is strictly weaker than monotonic, so if the latter doesn't have any barriers then the former certainly shouldn't. rdar://problem/16548260 llvm-svn: 209901	2014-05-30 14:41:51 +00:00
Tim Northover	6ee9050b92	ARM: use AAPCS-style prologues for embedded MachO. Darwin prologues save their GPRs in two stages: a narrow push of r0-r7 & lr, followed by a wide push of the remaining registers if there are any. AAPCS uses a single push.w instruction. It turns out that, on average, enough registers get pushed that code is smaller in the AAPCS prologue, which is a nice property for M-class programmers. They also have other options available for back-traces, so can hopefully deal with the fact that FP & LR aren't adjacent in memory. rdar://problem/15909583 llvm-svn: 209895	2014-05-30 13:23:06 +00:00
Tim Northover	73e8ecbf7f	AArch64 & ARM: disable generic test that relies on no CFG changes. llvm-svn: 209885	2014-05-30 10:56:12 +00:00
Tim Northover	3bb84c9bcc	ARM & AArch64: make use of common cmpxchg idioms after expansion The C and C++ semantics for compare_exchange require it to return a bool indicating success. This gets mapped to LLVM IR which follows each cmpxchg with an icmp of the value loaded against the desired value. When lowered to ldxr/stxr loops, this extra comparison is redundant: its results are implicit in the control-flow of the function. This commit makes two changes: it replaces that icmp with appropriate PHI nodes, and then makes sure earlyCSE is called after expansion to actually make use of the opportunities revealed. I've also added -{arm,aarch64}-enable-atomic-tidy options, so that existing fragile tests aren't perturbed too much by the change. Many of them either rely on undef/unreachable too pervasively to be restored to something well-defined (particularly while making sure they test the same obscure assert from many years ago), or depend on a particular CFG shape, which is disrupted by SimplifyCFG. rdar://problem/16227836 llvm-svn: 209883	2014-05-30 10:09:59 +00:00
Tim Northover	6f54bcc408	AArch64 & ARM: remove undefined behaviour from some tests. llvm-svn: 209880	2014-05-30 08:59:55 +00:00
Hao Liu	f3d89d6049	Test cases named with dates is a legacy rule not used now. Rename several test cases. llvm-svn: 209877	2014-05-30 05:58:19 +00:00
Adam Nemet	a1d842ca80	[X86] Move test from r209863 to CodeGen/X86 We should only run this if X86 is in the targets. llvm-svn: 209866	2014-05-29 23:52:53 +00:00
Adam Nemet	94b6f19596	[X86] Remove AVX1 vbroadcast intrinsics The corresponding CFE patch replaces these intrinsics with vector initializers in avxintrin.h. This patch removes the LLVM intrinsics from the backend. We now stop lowering at X86ISD::VBROADCAST custom node rather than lowering that further to the intrinsics. The patch only changes VBROADCASTS* and leaves VBROADCAST[FI]128 to continue to use intrinsics. As explained in the CFE patch, the reason is that we currently don't generate as good code for them without the intrinsics. CodeGen/X86/avx-vbroadcast.ll already provides coverage for this change. It checks that for a series of insertelements we generate the appropriate vbroadcast instruction. Also verified that there was no assembly change in the test-suite before and after this patch. llvm-svn: 209864	2014-05-29 23:35:36 +00:00
Filipe Cabecinhas	0213510318	Added tests for shufflevector lowering to blend instrs. These tests ensure that a change I will propose in clang works as expected. Summary: Added tests for the generation of blend+immediate instructions from a shufflevector. These tests were proposed along with a patch that was dropped. I'm committing the tests anyway to protect against possible regressions in codegen. Reviewers: nadav, bkramer Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D3600 llvm-svn: 209853	2014-05-29 22:04:42 +00:00
Rafael Espindola	34f4870951	[PPC] Use alias symbols in address computation. This seems to match what gcc does for ppc and what every other llvm backend does. This is a fixed version of r209638. The difference is to avoid any change in behavior for functions. The logic for using constant pools for function addresseses is spread over a few places and we have to keep them in sync. llvm-svn: 209821	2014-05-29 15:41:38 +00:00
Rafael Espindola	c22e74a901	Add a test showing the ppc code sequence for getting a function pointer. This would have found the miscompile in r209638. llvm-svn: 209820	2014-05-29 15:13:23 +00:00
Hao Liu	60bfd53df1	Rename a test case to contain correct date info. llvm-svn: 209799	2014-05-29 09:21:23 +00:00
Hao Liu	0e99724daa	Fix an assertion failure caused by v1i64 in DAGCombiner Shrink. llvm-svn: 209798	2014-05-29 09:19:07 +00:00
Michael J. Spencer	836291b5ff	[x86] Fold extract_vector_elt of a load into the Load's address computation. An address only use of an extract element of a load can be simplified to a load. Without this the result of the extract element is spilled to the stack so that an address is available. llvm-svn: 209788	2014-05-29 01:42:45 +00:00
Rafael Espindola	acdb307db3	[pr19844] Add thread local mode to aliases. This matches gcc's behavior. It also seems natural given that aliases contain other properties that govern how it is accessed (linkage, visibility, dll storage). Clang still has to be updated to expose this feature to C. llvm-svn: 209759	2014-05-28 18:15:43 +00:00
Hal Finkel	530ec71f07	Revert "[DAGCombiner] Split up an indexed load if only the base pointer value is live" This reverts r208640 (I've just XFAILed the test) because it broke ppc64/Linux self-hosting. Because nearly every regression test triggers a segfault, I hope this will be easy to fix. llvm-svn: 209747	2014-05-28 15:33:19 +00:00
Hal Finkel	47a225fb6c	Revert "[PPC] Use alias symbols in address computation." This reverts commit r209638 because it broke self-hosting on ppc64/Linux. (the Clang-compiled TableGen would segfault because it jumped to an invalid address from within _ZNK4llvm17ManagedStaticBase21RegisterManagedStaticEPFPvvEPFvS1_E (which is within the command-line parameter registration process)). llvm-svn: 209745	2014-05-28 15:25:06 +00:00
Tilmann Scheller	99ec622e7d	[AArch64] Add store post-index update folding regression tests for the load/store optimizer. Add regression tests for the following transformation: str X, [x20] ... add x20, x20, #32 -> str X, [x20], #32 with X being either w0, x0, s0, d0 or q0. llvm-svn: 209715	2014-05-28 06:43:00 +00:00
Tilmann Scheller	371ff7e9ac	[AArch64] Add load post-index update folding regression tests for the load/store optimizer. Add regression tests for the following transformation: ldr X, [x20] ... add x20, x20, #32 -> ldr X, [x20], #32 with X being either w0, x0, s0, d0 or q0. llvm-svn: 209711	2014-05-28 05:44:14 +00:00
Sasa Stankovic	960a4f90a1	[mips] Optimize long branch for MIPS64 by removing %higher and %highest. %higher and %highest can have non-zero values only for offsets greater than 2GB, which is highly unlikely, if not impossible when compiling a single function. This makes long branch for MIPS64 3 instructions smaller. Differential Revision: http://llvm-reviews.chandlerc.com/D3281.diff llvm-svn: 209678	2014-05-27 18:53:06 +00:00
Tim Northover	0dd1dfabfd	AArch64: add test for NZCV cross-copy save. llvm-svn: 209665	2014-05-27 16:50:09 +00:00
Tim Northover	13435480e0	AArch64: add AArch64-specific test for 'c' and 'n'. llvm-svn: 209664	2014-05-27 16:50:03 +00:00
Bill Schmidt	b806d02b5b	[PATCH] Correct type used for VADD_SPLAT optimization on PowerPC In PPCISelLowering.cpp: PPCTargetLowering::LowerBUILD_VECTOR(), there is an optimization for certain patterns to generate one or two vector splats followed by a vector add or subtract. This operation is represented by a VADD_SPLAT in the selection DAG. Prior to this patch, it was possible for the VADD_SPLAT to be assigned the wrong data type, causing incorrect code generation. This patch corrects the problem. Specifically, the code previously assigned the value type of the BUILD_VECTOR node to the newly generated VADD_SPLAT node. This is correct much of the time, but not always. The problem is that the call to isConstantSplat() may return a SplatBitSize that is not the same as the number of bits in the original element vector type. The correct type to assign is a vector type with the same element bit size as SplatBitSize. The included test case shows an example of this, where the BUILD_VECTOR node has a type of v16i8. The vector to be built is {0, 16, 0, 16, 0, 16, 0, 16, 0, 16, 0, 16, 0, 16, 0, 16}. isConstantSplat detects that we can generate a splat of 16 for type v8i16, which is the type we must assign to the VADD_SPLAT node. If we do not, we generate a vspltisb of 8 and a vaddubm, which generates the incorrect result {16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16}. The correct code generation is a vspltish of 8 and a vadduhm. This patch also corrected code generation for CodeGen/PowerPC/2008-07-10-SplatMiscompile.ll, which had been marked as an XFAIL, so we can remove the XFAIL from the test case. llvm-svn: 209662	2014-05-27 15:57:51 +00:00
Amara Emerson	77fc34e95e	[ARM] Emit correct build attributes for the relocation models. Patch by Asiri Rathnayake. llvm-svn: 209656	2014-05-27 13:30:21 +00:00
Tim Northover	2172cefdfd	ARM: teach AAPCS-VFP to deal with Cortex-M4. Cortex-M4 only has single-precision floating point support, so any LLVM "double" type will have been split into 2 i32s by now. Fortunately, the consecutive-register framework turns out to be precisely what's needed to reconstruct the double and follow AAPCS-VFP correctly! rdar://problem/17012966 llvm-svn: 209650	2014-05-27 10:43:38 +00:00
Filipe Cabecinhas	7df1221986	Convert some X86 blendv* intrinsics into IR. Summary: Implemented an InstCombine transformation that takes a blendv* intrinsic call and translates it into an IR select, if the mask is constant. This will eventually get lowered into blends with immediates if possible, or pblendvb (with an option to further optimize if we can transform the pblendvb into a blend+immediate instruction, depending on the selector). It will also enable optimizations by the IR passes, which give up on sight of the intrinsic. Both the transformation and the lowering of its result to asm got shiny new tests. The transformation is a bit convoluted because of blendvp[sd]'s definition: Its mask is a floating point value! This forces us to convert it and get the highest bit. I suppose this happened because the mask has type __m128 in Intel's intrinsic and v4sf (for blendps) in gcc's builtin. I will send an email to llvm-dev to discuss if we want to change this or not. Reviewers: grosbach, delena, nadav Differential Revision: http://reviews.llvm.org/D3859 llvm-svn: 209643	2014-05-27 03:42:20 +00:00
Rafael Espindola	94cd9a1ed6	[PPC] Use alias symbols in address computation. This seems to match what gcc does for ppc and what every other llvm backend does. llvm-svn: 209638	2014-05-26 19:08:19 +00:00
Tim Northover	10cffb6eef	AArch64: force i1 to be zero-extended at an ABI boundary. This commit is debatable. There are two possible approaches, neither of which is really satisfactory: 1. Use "@foo(i1 zeroext)" to mean an extension to 32-bits on Darwin, and 8 bits otherwise. 2. Redefine "@foo(i1)" to mean that the i1 is extended by the caller to 8 bits. This goes against the spirit of "zeroext" I think, but it's a bit of a vague construct anyway (by definition you're going to extend to the amount required by the ABI, that's why it's the ABI!). This implements option 2. The DAG machinery really isn't setup for the first (there's a fairly strong assumption that "zeroext" goes to at least the smallest register size), and even if it was the resulting DAG looks like it would be inferior in many cases. Theoretically we could add AssertZext nodes in the consumers of ABI-passed values too now, but this actually seems to make the code worse in practice by making truncation proceed in two steps. The code produced is equally valid if we continue to assume only the low bit is defined. Should fix PR19850 llvm-svn: 209637	2014-05-26 17:22:07 +00:00
Tim Northover	fc1b1e8952	AArch64: simplify calling conventions slightly. We can eliminate the custom C++ code in favour of some TableGen to check the same things. Functionality should be identical, except for a buffer overrun that was present in the C++ code and meant webkit failed if any small argument needed to be passed on the stack. llvm-svn: 209636	2014-05-26 17:21:53 +00:00
Tilmann Scheller	0d5b6aa201	[AArch64] Add store + add folding regression tests for the load/store optimization pass. Add tests for the following transform: str X, [x0, #32] ... add x0, x0, #32 -> str X, [x0, #32]! with X being either w1, x1, s0, d0 or q0. llvm-svn: 209627	2014-05-26 13:36:47 +00:00
Tilmann Scheller	bf3f06fd14	[AArch64] Add more regression tests for the load/store optimization pass. Cover the following cases: ldr X, [x0, #32] ... add x0, x0, #32 -> ldr X, [x0, #32]! with X being either w1, x1, s0, d0 or q0. llvm-svn: 209624	2014-05-26 12:15:51 +00:00
Tilmann Scheller	4b78dd1812	Remove accidentally committed whitespace. llvm-svn: 209619	2014-05-26 09:40:40 +00:00
Tilmann Scheller	f505625e35	[AArch64] Add a regression test for the load store optimizer. We have a couple of regression tests for load/store pairing, but (to my knowledge) there are no regression tests for the load/store + add/sub folding. As a first step towards increased test coverage of this area, this commit adds a test for one instance of a load + add to pre-indexed load transformation. llvm-svn: 209618	2014-05-26 09:37:19 +00:00
Rafael Espindola	93615d7bf8	Just check the entire string. Thanks to David Blaikie for the suggestion. llvm-svn: 209610	2014-05-26 04:08:51 +00:00
Rafael Espindola	a40a842933	Emit data or code export directives based on the type. Currently we look at the Aliasee to decide what type of export directive to use. It seems better to use the type of the alias directly. This is similar to how we handle the alias having the same address but other attributes (linkage, visibility) from the aliasee. With this patch it is now possible to do things like target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128" target triple = "x86_64-pc-windows-msvc" @foo = global [6 x i8] c"\B8\00\00\00\C3", section ".text", align 16 @f = dllexport alias i32 (), [6 x i8] @foo !llvm.module.flags = !{!0} !0 = metadata !{i32 6, metadata !"Linker Options", metadata !1} !1 = metadata !{metadata !2, metadata !3} !2 = metadata !{metadata !"/DEFAULTLIB:libcmt.lib"} !3 = metadata !{metadata !"/DEFAULTLIB:oldnames.lib"} llvm-svn: 209600	2014-05-25 12:49:07 +00:00
Rafael Espindola	885673b0ec	Make these CHECKs a bit more strict. The " at the end of the line makes sure we matched the entire directive. llvm-svn: 209599	2014-05-25 12:43:13 +00:00
Tim Northover	ca0f4dc4f0	AArch64/ARM64: move ARM64 into AArch64's place This commit starts with a "git mv ARM64 AArch64" and continues out from there, renaming the C++ classes, intrinsics, and other target-local objects for consistency. "ARM64" test directories are also moved, and tests that began their life in ARM64 use an arm64 triple, those from AArch64 use an aarch64 triple. Both should be equivalent though. This finishes the AArch64 merge, and everyone should feel free to continue committing as normal now. llvm-svn: 209577	2014-05-24 12:50:23 +00:00
Tim Northover	d7f173214f	AArch64/ARM64: remove AArch64 from tree prior to renaming ARM64. I'm doing this in two phases for a better "git blame" record. This commit removes the previous AArch64 backend and redirects all functionality to ARM64. It also deduplicates test-lines and removes orphaned AArch64 tests. The next step will be "git mv ARM64 AArch64" and rewire most of the tests. Hopefully LLVM is still functional, though it would be even better if no-one ever had to care because the rename happens straight afterwards. llvm-svn: 209576	2014-05-24 12:42:26 +00:00
Tim Northover	8d0c65ea6b	ARM64: extract a 32-bit subreg when selecting an inreg extend After the load/store refactoring, we were sometimes trying to feed a GPR64 into a 32-bit register offset operand. This failed in copyPhysReg. llvm-svn: 209566	2014-05-24 07:05:42 +00:00
Nico Rieck	03369ca3aa	Revert part of "Fix broken FileCheck prefixes" This reverts part of commit r209538. llvm-svn: 209544	2014-05-23 19:33:49 +00:00
Rafael Espindola	79a001c01e	Use alias linkage and visibility to decide tls access mode. This matches both what we do for the non-thread case and what gcc does. With this patch clang would match gcc's behaviour in static __thread int a = 42; extern __thread int b __attribute__((alias("a"))); int f(void) { return &a; } int g(void) { return &b; } if not for pr19843. Manually writing the IL does produce the same access modes. It is also a step in the direction of fixing pr19844. llvm-svn: 209543	2014-05-23 19:16:56 +00:00
Nico Rieck	b7010b8ef7	Remove unused CHECK lines llvm-svn: 209539	2014-05-23 19:06:44 +00:00
Nico Rieck	e0fe65b49d	Fix broken FileCheck prefixes llvm-svn: 209538	2014-05-23 19:06:24 +00:00
Rafael Espindola	1c383955e0	Convert test to use FileCheck. llvm-svn: 209528	2014-05-23 16:51:13 +00:00
Daniel Sanders	b781c1f734	[mips][mips64r6] [ls][dw][lr] are not available in MIPS32r6/MIPS64r6 Summary: Instead the system is required to provide some means of handling unaligned load/store without special instructions. Options include full hardware support, full trap-and-emulate, and hybrids such as hardware support within a cache line and trap-and-emulate for multi-line accesses. MipsSETargetLowering::allowsUnalignedMemoryAccesses() has been configured to assume that unaligned accesses are 'fast' on the basis that I expect few hardware implementations will opt for pure-software handling of unaligned accesses. The ones that do handle it purely in software can override this. mips64-load-store-left-right.ll has been merged into load-store-left-right.ll The stricter testing revealed a Bits!=Bytes bug in passByValArg(). This has been fixed and the variables renamed to clarify the units they hold. Reviewers: zoran.jovanovic, jkolek, vmedic Reviewed By: vmedic Differential Revision: http://reviews.llvm.org/D3872 llvm-svn: 209512	2014-05-23 13:18:02 +00:00
Jiangning Liu	97cbeb5ba8	[ARM64] Fix a bug in shuffle vector lowering to generate corect vext ISD with swapped input vectors. llvm-svn: 209495	2014-05-23 02:54:50 +00:00
Matt Arsenault	bfc007dbb5	R600: Try to convert BFE back to standard bit ops when possible. This allows existing DAG combines to work on them, and then we can re-match to BFE if necessary during instruction selection. llvm-svn: 209462	2014-05-22 18:09:12 +00:00
Matt Arsenault	90d0fd2ea0	R600: Add dag combine for BFE llvm-svn: 209461	2014-05-22 18:09:07 +00:00
Matt Arsenault	4ab9246e99	R600: Implement ComputeNumSignBitsForTargetNode for BFE llvm-svn: 209460	2014-05-22 18:09:03 +00:00
Matt Arsenault	c7d0679684	R600: Expand mul24 for GPUs without it llvm-svn: 209458	2014-05-22 18:00:24 +00:00
Matt Arsenault	fcb6cf68ee	R600: Expand mad24 for GPUs without it llvm-svn: 209457	2014-05-22 18:00:20 +00:00
Matt Arsenault	e43426533f	R600: Add intrinsics for mad24 llvm-svn: 209456	2014-05-22 18:00:15 +00:00
Andrea Di Biagio	98dd66445e	[X86] Improve the lowering of BITCAST from MVT::f64 to MVT::v4i16/MVT::v8i8. This patch teaches the x86 backend how to efficiently lower ISD::BITCAST dag nodes from MVT::f64 to MVT::v4i16 (and vice versa), and from MVT::f64 to MVT::v8i8 (and vice versa). This patch extends the logic from revision 208107 to also handle MVT::v4i16 and MVT::v8i8. Also, this patch correctly propagates Undef values when performing the widening of a vector (example: when widening from v2i32 to v4i32, the upper 64bits of the resulting vector are 'undef'). llvm-svn: 209451	2014-05-22 16:21:39 +00:00
Tim Northover	6877f3a322	Segmented stacks: omit __morestack call when there's no frame. Patch by Florian Zeitz llvm-svn: 209436	2014-05-22 13:03:43 +00:00
Daniel Sanders	e5b7eb47f6	[mips] Make unalignedload.ll test stricter and easier to modify for MIPS32r6/MIPS64r6 Summary: * Split into two functions, one to test each struct. * R0 and R2 must be defined by an lw with a %got reference to the correct symbol. * Test for $4 (first argument) where appropriate instead of accepting any register. * Test that the two lbu's are correctly combined into $4 Depends on D3844 Reviewers: jkolek, zoran.jovanovic, vmedic Reviewed By: vmedic Differential Revision: http://reviews.llvm.org/D3845 llvm-svn: 209424	2014-05-22 11:55:04 +00:00
Daniel Sanders	c64e291555	[mips] Change lwl and lwr in inlineasm_constraint.ll to lw Summary: lwl and lwr are not available in MIPS32r6/MIPS64r6. The purpose of the test is to check that the '$1' expands to '0($x)' rather than to test something related to the lwl or lwr instructions so we can simply switch to lw. Depends on D3842 Reviewers: jkolek, zoran.jovanovic, vmedic Reviewed By: vmedic Differential Revision: http://reviews.llvm.org/D3844 llvm-svn: 209423	2014-05-22 11:51:06 +00:00
Daniel Sanders	a0467a4d71	[mips] Use addiu in inline assembly tests since addi is not available in all ISA's Summary: This patch is necessary so that they do not fail on MIPS32r6/MIPS64r6 when -integrated-as is enabled by default and we correctly detect the host CPU. No functional change since these tests are testing the behaviour of the constraint used for the third operand rather than the mnemonic. Depends on D3842 Reviewers: zoran.jovanovic, jkolek, vmedic Reviewed By: vmedic Differential Revision: http://reviews.llvm.org/D3843 llvm-svn: 209421	2014-05-22 11:46:58 +00:00
Tim Northover	c8bed61f8e	AArch64/ARM64: enable more AArch64 tests. llvm-svn: 209408	2014-05-22 07:40:55 +00:00
Saleem Abdulrasool	fa4b3f6e65	ARM: introduce llvm.arm.undefined intrinsic This intrinsic permits the emission of platform specific undefined sequences. ARM has reserved the 0xde opcode which takes a single integer parameter (ignored by the CPU). This permits the operating system to implement custom behaviour on this trap. The llvm.arm.undefined intrinsic is meant to provide a means for generating the target specific behaviour from the frontend. This is particularly useful for Windows on ARM which has made use of a series of these special opcodes. llvm-svn: 209390	2014-05-22 04:46:46 +00:00
Matt Arsenault	cec6ae49e8	R600/SI: Match fp_to_uint / uint_to_fp for f64 llvm-svn: 209388	2014-05-22 03:20:30 +00:00
David Blaikie	f148b7f5aa	DebugInfo: Use the SPMap to find the parent CU of inlined functions as they may not be in the current CU Committed in r209178 then reverted in r209251 due to LTO breakage, here's a proper fix for the case of the missing subprogram DIE. The DIEs were there, just in other compile units. Using the SPMap we can find the right compile unit to search for and produce cross-unit references to describe this kind of inlining. One existing test case needed to be updated because it had a function that wasn't in the CU's subprogram list, so it didn't appear in the SPMap. llvm-svn: 209335	2014-05-21 23:14:12 +00:00
Matt Arsenault	094f9f1e9c	R600: Partially fix constant initializers for structs and vectors. This should extend the current workaround to work with structs that only contain legal, scalar types. llvm-svn: 209331	2014-05-21 22:42:42 +00:00
Matt Arsenault	dc960fcadc	R600: Add failing testcases for constant initializers. Constant initializers involving illegal types hit an assertion. Patch by: Jan Vesely <jan.vesely@rutgers.edu> llvm-svn: 209330	2014-05-21 22:42:38 +00:00
Quentin Colombet	b70fffa971	[X86] Fix a bug in the lowering of BLENDI introduced in r209043. ISD::VSELECT mask uses 1 to identify the first argument and 0 to identify the second argument. On the other hand, BLENDI uses 0 to identify the first argument and 1 to identify the second argument. Fix the generation of the blend mask to account for this difference. The bug did not show up with r209043, because we were not checking for the actual arguments of the blend instruction! This commit also fixes the test cases. Note: The same mask works for the BLENDr variant because the arguments are swapped during instruction selection (see the BLENDXXrr patterns). <rdar://problem/16975435> llvm-svn: 209324	2014-05-21 22:00:39 +00:00
Dave Estes	7d24ae5558	Test comment commit. llvm-svn: 209306	2014-05-21 16:19:51 +00:00
Saleem Abdulrasool	7f0235499b	ARM: correct bundle generation for MOV32T relocations Although the previous code would construct a bundle and add the correct elements to it, it would not finalise the bundle. This resulted in the InternalRead markers not being added to the MachineOperands nor, more importantly, the externally visible defs to the bundle itself. So, although the bundle was not exposing the def, the generated code would be correct because there was no optimisations being performed. When optimisations were enabled, the post register allocator would kick in, and the hazard recognizer would reorder operations around the load which would define the value being operated upon. Rather than manually constructing the bundle, simply construct and finalise the bundle via the finaliseBundle call after both MIs have been emitted. This improves the code generation with optimisations where IMAGE_REL_ARM_MOV32T relocations are emitted. The changes to the other tests are the result of the bundle generation preventing the scheduler from hoisting the moves across the loads. The net effect of the generated code is equivalent, but, is much more identical to what is actually being lowered. llvm-svn: 209267	2014-05-21 01:25:24 +00:00
Alexey Samsonov	e152790b3c	Fix test added in r209242: llc shouldn't create files in source tree llvm-svn: 209252	2014-05-20 22:40:31 +00:00
Adam Nemet	6b3d606a41	[ARM64] PR19792: Fix cycle in DAG after performPostLD1Combine Povray and dealII currently assert with "Overran sorted position" in AssignTopologicalOrder. The problem is that performPostLD1Combine can introduce cycles. Consider: (insert_vector_elt (INSERT_SUBREG undef, (load (add %vreg0, Constant<8>), undef), <= A TargetConstant<2>), (load %vreg0, undef), <= B Constant<1>) This is turned into a LD1LANEpost node. However the address in A is not a valid user of the post-incremented address of B in LD1LANEpost. llvm-svn: 209242	2014-05-20 21:47:07 +00:00
Eric Christopher	974cef18f4	Move the function and data section flags into the options struct and make the functions to set them non-static. Move and rename the llvm specific backend options to avoid conflicting with the clang option. Paired with a backend commit to update. llvm-svn: 209238	2014-05-20 21:25:34 +00:00
Quentin Colombet	c4e9e184e8	[LSR] Canonicalize reg1 + ... + regN into reg1 + ... + 1*regN. This commit introduces a canonical representation for the formulae. Basically, as soon as a formula has more that one base register, the scaled register field is used for one of them. The register put into the scaled register is preferably a loop variant. The commit refactors how the formulae are built in order to produce such representation. This yields a more accurate, but still perfectible, cost model. <rdar://problem/16731508> llvm-svn: 209230	2014-05-20 19:25:04 +00:00
Renato Golin	7ee7479e73	Avoids DCE on write_register llvm-svn: 209222	2014-05-20 17:40:03 +00:00
Adam Nemet	37337f0359	[PowerPC] PR19796: Also match ISD::TargetConstant in isIntS16Immediate The SplitIndexingFromLoad changes exposed a latent isel bug in the PowerPC64 backend. We matched an immediate offset with STWX8 even though it only supports register offset. The culprit is the complex-pattern predicate, SelectAddrIdx, which decides that if the offset is not ISD::Constant it must be a register. Many thanks to Bill Schmidt for testing this. llvm-svn: 209219	2014-05-20 17:20:34 +00:00
Benjamin Kramer	5f1713b9ab	Legalizer: Make bswap promotion safe for vectors. llvm-svn: 209202	2014-05-20 09:42:31 +00:00
David Blaikie	c6084e64ed	DebugInfo: Assume all subprogram DIEs have been created before any abstract subprograms are constructed. Since we visit the whole list of subprograms for each CU at module start, this is clearly true - don't test for the case, just assert it. A few old test cases seemed to have incomplete subprogram lists, but any attempt to reproduce them shows full subprogram lists that even include entities that have been completely inlined and the out of line definition removed. llvm-svn: 209178	2014-05-19 23:16:19 +00:00
Chad Rosier	3052ad5e5f	[ARM64] Adds Cortex-A53 scheduling support for vector load/store post. Patch by Dave Estes<cestes@codeaurora.org>! PR19761 http://reviews.llvm.org/D3829 llvm-svn: 209176	2014-05-19 22:59:51 +00:00
Andrea Di Biagio	41bcee5bc3	[X86] Add ISel patterns to improve the selection of TZCNT and LZCNT. Instructions TZCNT (requires BMI1) and LZCNT (requires LZCNT), always provide the operand size as output if the input operand is zero. We can take advantage of this knowledge during instruction selection stage in order to simplify a few corner case. llvm-svn: 209159	2014-05-19 20:38:59 +00:00
Filipe Cabecinhas	f09daeadf1	Added more insertps optimizations Summary: When inserting an element that's coming from a vector load or a broadcast of a vector (or scalar) load, combine the load into the insertps instruction. Added PerformINSERTPSCombine for the case where we need to fix the load (load of a vector + insertps with a non-zero CountS). Added patterns for the broadcasts. Also added tests for SSE4.1, AVX, and AVX2. Reviewers: delena, nadav, craig.topper Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D3581 llvm-svn: 209156	2014-05-19 19:45:57 +00:00
Jyotsna Verma	9134441949	reverting r209132 llvm-svn: 209139	2014-05-19 16:22:11 +00:00
Bradley Smith	46b39e0f70	[ARM64] Split tbz/tbnz into W/X register variant llvm-svn: 209134	2014-05-19 15:58:15 +00:00
Jyotsna Verma	dc58cfbd79	Hexagon: Add encoding bits to the mpy instructions. llvm-svn: 209132	2014-05-19 15:32:07 +00:00
Benjamin Kramer	600e24a1cb	SDAG: Legalize vector BSWAP into a shuffle if the shuffle is legal but the bswap not. - On ARM/ARM64 we get a vrev because the shuffle matching code is really smart. We still unroll anything that's not v4i32 though. - On X86 we get a pshufb with SSSE3. Required more cleverness in isShuffleMaskLegal. - On PPC we get a vperm for v8i16 and v4i32. v2i64 is unrolled. llvm-svn: 209123	2014-05-19 13:12:38 +00:00
Filipe Cabecinhas	47b024f100	Change the blend tests to AVX, not AVX2. llvm-svn: 209107	2014-05-19 04:47:12 +00:00
Saleem Abdulrasool	6ab7957740	ARM: improve WoA ABI conformance for frame register Windows on ARM uses R11 for the frame pointer even though the environment is a pure Thumb-2, thumb-only environment. Replicate this behaviour to improve Windows ABI compatibility. This register is used for fast stack walking, and thus is part of the Windows ABI. llvm-svn: 209085	2014-05-18 04:12:52 +00:00
Saleem Abdulrasool	ca75a224fe	test: fix copy-paste mistake Accidental over-quoting of the match string. llvm-svn: 209058	2014-05-17 04:32:38 +00:00
Saleem Abdulrasool	b8e42995a4	ARM: use the proper target object format for WoA WoA uses COFF, not ELF. ARMISelLowering::createTLOF would previously return ELF for any non-MachO platform. This was a missed site when the original change for target format support for Windows on ARM was done. llvm-svn: 209057	2014-05-17 04:28:08 +00:00
Chandler Carruth	a0362e551c	[x86] Fix a bad predicate I spotted by inspection -- pshufhw and pshuflw were added in SSE2, no SSSE3. Found this while auditing all uses of SSSE3 in the X86 target. I don't actually expect this to make a significant difference on anything and I don't have any detailed test cases but I updated the existing test cases that already covered some of this code path. llvm-svn: 209056	2014-05-17 03:29:20 +00:00
Filipe Cabecinhas	3d72585a01	Implemented special cases for PerformVSELECTCombine. vselects with constant masks, after legalization, will get turned into specialized shuffle_vectors so they can be matched to blend+imm instructions. Fixed some tests. llvm-svn: 209044	2014-05-16 22:47:54 +00:00
Filipe Cabecinhas	9acd5d4e5e	Lower vselects into X86ISD::BLENDI when appropriate. LowerVSELECT will, if possible, generate a X86ISD::BLENDI DAG node if the condition is constant and we can emit that instruction, given the subtarget. This is not enough for all cases. An additional SELECTCombine optimization will be committed. Fixed tests that were expecting variable blends but where a blend+imm can be generated. Added test where we can't emit blend+immediate. Added avx2 blend+imm tests. llvm-svn: 209043	2014-05-16 22:47:49 +00:00
Tom Stellard	2022c1eb1b	R600/SI: Promote f32 SELECT to i32 llvm-svn: 209024	2014-05-16 20:56:41 +00:00
Rafael Espindola	c5f7a8c70e	Fix most of PR10367. This patch changes the design of GlobalAlias so that it doesn't take a ConstantExpr anymore. It now points directly to a GlobalObject, but its type is independent of the aliasee type. To avoid changing all alias related tests in this patches, I kept the common syntax @foo = alias i32* @bar to mean the same as now. The cases that used to use cast now use the more general syntax @foo = alias i16, i32* @bar. Note that GlobalAlias now behaves a bit more like GlobalVariable. We know that its type is always a pointer, so we omit the '*'. For the bitcode, a nice surprise is that we were writing both identical types already, so the format change is minimal. Auto upgrade is handled by looking through the casts and no new fields are needed for now. New bitcode will simply have different types for Alias and Aliasee. One last interesting point in the patch is that replaceAllUsesWith becomes smart enough to avoid putting a ConstantExpr in the aliasee. This seems better than checking and updating every caller. A followup patch will delete getAliasedGlobal now that it is redundant. Another patch will add support for an explicit offset. llvm-svn: 209007	2014-05-16 19:35:39 +00:00
David Blaikie	5e7d4ec337	DebugInfo: Assume the CU's Subprogram list only contains definitions. DIBuilder maintains this invariant and the current DwarfDebug code could end up doing weird things if it contained declarations (such as putting the definition DIE inside a CU that contained the declaration - this doesn't seem like a good idea, so rather than adding logic to handle this case we'll just ban in for now & cross that bridge if we come to it later). llvm-svn: 209004	2014-05-16 18:26:53 +00:00
Chad Rosier	c6f45bc281	[ARM64] Increases the Sched Model accuracy for Cortex-A53. Patch by Dave Estes <cestes@codeaurora.org> http://reviews.llvm.org/D3769 llvm-svn: 209001	2014-05-16 17:15:33 +00:00
James Molloy	068abd8acc	Re-enable inline memcpy expansion for Thumb1. Patch by Moritz Roth! llvm-svn: 208994	2014-05-16 14:24:22 +00:00
James Molloy	a4b2ec478f	Fix the Load/Store optimization pass to work with Thumb1. Patch by Moritz Roth! llvm-svn: 208992	2014-05-16 14:14:30 +00:00
Rafael Espindola	6d40091c3c	Revert "Implement global merge optimization for global variables." This reverts commit r208934. The patch depends on aliases to GEPs with non zero offsets. That is not supported and fairly broken. The good news is that GlobalAlias is being redesigned and will have support for offsets, so this patch should be a nice match for it. llvm-svn: 208978	2014-05-16 13:02:18 +00:00
Tim Northover	31e1362588	TableGen: fix operand counting for aliases TableGen has a fairly dubious heuristic to decide whether an alias should be printed: does the alias have lest operands than the real instruction. This is bad enough (particularly with no way to override it), but it should at least be calculated consistently for both strings. This patch implements that logic: first get the correct string for the variant, in the same way as the Matcher, without guessing; then count the number of whitespace chars. There are basically 4 changes this brings about after the previous commits; all of these appear to be good, so I have changed the tests: + ARM64: we print "neg X, Y" instead of "sub X, xzr, Y". + ARM64: we skip implicit "uxtx" and "uxtw" modifiers. + Sparc: we print "mov A, B" instead of "or %g0, A, B". + Sparc: we print "fcmpX A, B" instead of "fcmpX %fcc0, A, B" llvm-svn: 208969	2014-05-16 09:42:04 +00:00
Hao Liu	effb003c48	[ARM64]Implement NEON post-increment LD1(lane) and post-increment LD1R. llvm-svn: 208955	2014-05-16 09:39:02 +00:00
Saleem Abdulrasool	a79a4b0c34	ARM: add some integer/floating point conversion libcalls Add some Windows on ARM specific library calls. These are provided by msvcrt, and can be used to perform integer to floating-point conversions (and vice-versa) mirroring similar functions in the RTABI. llvm-svn: 208949	2014-05-16 05:41:33 +00:00
Jiangning Liu	5366cb42f6	Implement global merge optimization for global variables. This commit implements two command line switches -global-merge-on-external and -global-merge-aligned, and both of them are false by default, so this optimization is disabled by default for all targets. For ARM64, some back-end behaviors need to be tuned to get this optimization further enabled. llvm-svn: 208934	2014-05-15 23:45:42 +00:00
Reed Kotler	e9f617ec39	Finish materialize for ints Summary: We add code to materialize all integer literals. Test Plan: simplestorei.ll Reviewers: dsanders Reviewed By: dsanders Differential Revision: http://reviews.llvm.org/D3596 llvm-svn: 208923	2014-05-15 21:54:15 +00:00
NAKAMURA Takumi	e9e3e4e7b0	llvm/test/CodeGen/X86/combine-sse41-intrinsics.ll: Add explicit triple. llvm-svn: 208897	2014-05-15 15:45:31 +00:00
Andrea Di Biagio	ac4cb52129	[X86] Teach the backend how to fold SSE4.1/AVX/AVX2 blend intrinsics. Added target specific combine rules to fold blend intrinsics according to the following rules: 1) fold(blend A, A, Mask) -> A; 2) fold(blend A, B, <allZeros>) -> A; 3) fold(blend A, B, <allOnes>) -> B. Added two new tests to verify that the new folding rules work for all the optimized blend intrinsics. llvm-svn: 208895	2014-05-15 15:18:15 +00:00
Tom Stellard	77051e93a5	R600/SI: Only use SALU instructions for 64-bit add in a block of CF depth 0 llvm-svn: 208886	2014-05-15 14:41:54 +00:00
Tom Stellard	efb8470c62	R600/SI: Use VALU instructions for i1 ops llvm-svn: 208885	2014-05-15 14:41:50 +00:00
Tim Northover	ed117bb644	ARM64: print correct aliases for NEON mov & mvn instructions In all cases, if a "mov" alias exists, it is the canonical form of the instruction. Now that TableGen can support aliases containing syntax variants, we can enable them and improve the quality of the asm output. llvm-svn: 208874	2014-05-15 12:11:02 +00:00
Tim Northover	4ba95d4483	TableGen/ARM64: print aliases even if they have syntax variants. To get at least one use of the change (and some actual tests) in with its commit, I've enabled the AArch64 & ARM64 NEON mov aliases. llvm-svn: 208867	2014-05-15 11:16:32 +00:00
Jiangning Liu	ecd097d587	[ARM64] Support aggressive fastcc/tailcallopt breaking ABI by popping out argument stack from callee. llvm-svn: 208837	2014-05-15 01:33:17 +00:00
David Blaikie	ef32c3bf98	DebugInfo: Sure up subprogram variable list handling with more assertions and fewer conditionals. Many old tests using prior schemas still had some brokenness here (both indirect arrays and arrays with single bogus elements). Fixed those up so they don't hit the new assertions. Also reduced nesting in some places, etc. llvm-svn: 208817	2014-05-14 21:52:46 +00:00
Jay Foad	e0eac700cb	Rename ComputeMaskedBits to computeKnownBits. "Masked" has been inappropriate since it lost its Mask parameter in r154011. llvm-svn: 208811	2014-05-14 21:14:37 +00:00
Christian Pirker	7dd3a40e09	ARM-BE: test files for vector argument passing Reviewed at http://reviews.llvm.org/D3766 llvm-svn: 208793	2014-05-14 16:59:44 +00:00
Christian Pirker	f835f2f7be	[ARM64-BE] Fix byte order of CIE and FDE frames for exception handling Reviewed at http://reviews.llvm.org/D3741 llvm-svn: 208792	2014-05-14 16:51:58 +00:00
Logan Chien	fb407ff21d	Fix ARM EHABI when function has landingpad and nounwind. If the function has the landingpad instruction, then the handlerdata should be emitted even if the function has nouwnind attribute. Otherwise, following code will not work: void test1() noexcept { try { throw_exception(); } catch (...) { log_unexpected_exception(); } } Since the cantunwind was incorrectly emitted and the LSDA is not available. llvm-svn: 208791	2014-05-14 16:38:30 +00:00
Logan Chien	4b117dbbaf	More test case for r208715. The commit r208166 will cause some regression on ARM EHABI. This fix has been committed in r208715, and an assertion failure test case has been committed in r208770. This commit further extends the unittest so that the actual value in the handlerdata will be checked. llvm-svn: 208790	2014-05-14 16:37:32 +00:00
Benjamin Kramer	56a86f3d17	X86: If we have an instruction that sets a flag and a zero test on the input of that instruction try to eliminate the test. For example tzcntl %edi, %ebx testl %edi, %edi je .label can be rewritten into tzcntl %edi, %ebx jb .label A minor complication is that tzcnt sets CF instead of ZF when the input is zero, we have to rewrite users of the flags from ZF to CF. Currently we recognize patterns using lzcnt, tzcnt and popcnt. Differential Revision: http://reviews.llvm.org/D3454 llvm-svn: 208788	2014-05-14 16:14:45 +00:00
Evgeniy Stepanov	7fa438812a	Regression test for ARM EHABI breakage in r208166. llvm-svn: 208770	2014-05-14 11:13:31 +00:00
Matt Arsenault	102b7be363	R600/SI: Try to fix BFE operands when moving to VALU This was broken by r208479 llvm-svn: 208740	2014-05-13 23:45:50 +00:00
Christian Pirker	f4b3e60979	ARMEB: Fix byte order of EH frame unwinding instructions, with modified test file This commit was already commited as revision rL208689 and discussd in phabricator revision D3704. But the test file was crashing on OS X and windows. I fixed the test file in the same way as in rL208340. llvm-svn: 208711	2014-05-13 16:44:30 +00:00
Joey Gouly	c0f94cb136	[CGP] r205941 changed the logic, so that a cast happens before 'Result' is compared to 'AddrMode.BaseReg'. In the case that 'AddrMode.BaseReg' is nullptr, 'Result' will also be nullptr, so the cast causes an assertion. We should use dyn_cast_or_null here to check 'Result' is not null and it is an instruction. Bug found by Mats Petersson, and I reduced his IR to get a test case. llvm-svn: 208705	2014-05-13 15:42:45 +00:00

... 3 4 5 6 7 ...

10203 Commits