llvm-mirror

mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-10-24 05:23:45 +02:00

Author	SHA1	Message	Date
Daniel Sanders	c027690ada	[mips] Fix unused variable warning introduced in r221056 llvm-svn: 221058	2014-11-01 18:53:01 +00:00
Daniel Sanders	f5c1f29334	[mips] Remove ByValArgInfo::Address in favour of CCValAssign::getMemLocOffset(). NFC. Summary: ByValArgInfo is practically the same as CCState::ByValInfo now. Reviewers: vmedic Reviewed By: vmedic Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D5976 llvm-svn: 221057	2014-11-01 18:44:56 +00:00
Daniel Sanders	698188b376	[mips] Move F128 argument handling into MipsCCState as we did for returns. NFC. Summary: There are a couple more changes to make before analyzeFormalArguments can be merged into the standard AnalyzeFormalArguments. I've had to temporarily poke a couple holes in MipsCCState's encapsulation to save having to make all the required changes for this merge all at once. These will be removed shortly. We must merge our ByVal argument handling with the implementation in CCState. This will be done over the next three patches, then the fourth will merge analyzeFormalArguments with AnalyzeFormalArguments. Depends on D5967 Reviewers: vmedic Reviewed By: vmedic Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D5969 llvm-svn: 221056	2014-11-01 18:38:03 +00:00
Daniel Sanders	6acd6b7829	[mips] Remove MipsCC::CCInfo. NFC. Summary: It's now passed in as an argument to functions that need it. Eventually this argument will be replaced by the 'this' pointer for a MipsCCState object. Depends on D5966 Reviewers: vmedic Reviewed By: vmedic Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D5967 llvm-svn: 221054	2014-11-01 18:13:52 +00:00
Daniel Sanders	b89d32de0f	[mips] Removed MipsCC::fixedArgFn(). NFC Summary: There is one remaining trace of it in MipsCC::analyzeCallOperands() where Mips16 might override the calling convention. This will moved into tablegen-erated code later. Depends on D5965 Reviewers: vmedic Reviewed By: vmedic Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D5966 llvm-svn: 221053	2014-11-01 17:44:51 +00:00
Daniel Sanders	79790d7c5e	[tablegen] Add CustomCallingConv and use it to tablegen-erate the outermost parts of the Mips O32 implementation Summary: CustomCallingConv is simply a CallingConv that tablegen should not generate the implementation for. It allows regular CallingConv's to delegate to these custom functions. This is (currently) necessary for Mips and we cannot use CCCustom without having to adapt to the different API that CCCustom uses. This brings us a bit closer to being able to remove MipsCC::analyzeCallOperands and MipsCC::analyzeFormalArguments in favour of the common implementation. No functional change to the targets. Depends on D3341 Reviewers: vmedic Reviewed By: vmedic Subscribers: vmedic, llvm-commits Differential Revision: http://reviews.llvm.org/D5965 llvm-svn: 221052	2014-11-01 17:38:22 +00:00
Rafael Espindola	20114e4c02	Remove redundant calls to isMaterializable. This removes calls to isMaterializable in the following cases: * It was redundant with a call to isDeclaration now that isDeclaration returns the correct answer for materializable functions. * It was followed by a call to Materialize. Just call Materialize and check EC. llvm-svn: 221050	2014-11-01 16:46:18 +00:00
Daniel Sanders	d266fea71e	Revert r221048 - Test commit It seems I can't commit unless $DBUS_SESSION_BUS_ADDRESS is set correctly and it is not set for ssh sessions. llvm-svn: 221049	2014-11-01 16:08:03 +00:00
Daniel Sanders	f112d3f8cc	Test commit Added some whitespace to debug some authentication issues I'm having. llvm-svn: 221048	2014-11-01 16:00:40 +00:00
Adrian Prantl	401e61e9a6	Revert "Temporarily revert r220777 to sort out build bot breakage." This reverts commit r221028. Later commits depend on this and reverting just this one causes even more bots to fail. llvm-svn: 221041	2014-11-01 03:19:45 +00:00
NAKAMURA Takumi	2419cf2c14	Revert r220779, "[AVX512] Removed special case for cmp instructions in getVectorMaskingNode. Now cmp intrinsics lower as other intrinsics through VSELECT, and then VSELECT tranforms to AND in PerformSELECTCombine." Since r221028 (reverting r220777), this caused failures. llvm-svn: 221040	2014-11-01 01:36:14 +00:00
Adrian Prantl	379cee6cd0	Temporarily revert r220777 to sort out build bot breakage. "[x86] Simplify vector selection if condition value type matches vselect value type and true value is all ones or false value is all zeros." llvm-svn: 221028	2014-11-01 00:26:59 +00:00
Duncan P. N. Exon Smith	7004fd9aac	IR: MDNode => Value: Instruction::getMetadata() Change `Instruction::getMetadata()` to return `Value` as part of PR21433. Update most callers to use `Instruction::getMDNode()`, which wraps the result in a `cast_or_null<MDNode>`. llvm-svn: 221024	2014-11-01 00:10:31 +00:00
Reid Kleckner	4f54f1c0fd	Revert "R600: Add missing file to CMakeLists.txt" This reverts commit r220998. It should've been reverted with the other change. llvm-svn: 221021	2014-10-31 23:39:10 +00:00
Reid Kleckner	660a86c442	Revert "R600: Make sure to inline all internal functions" This reverts commit r220996. It introduced layering violations causing link errors in many configurations. llvm-svn: 221020	2014-10-31 23:35:26 +00:00
Reid Kleckner	626248cfdf	Work around bugs in MSVC "14" CTP 3's conversion logic It appears to ignore or find ambiguous MachineInstrBuilder's conversion operators that allow conversion to MachineInstr* and MachineBasicBlock::bundle_iterator. As a workaround, add an explicit way to get the MachineInstr. llvm-svn: 221017	2014-10-31 23:19:46 +00:00
Tom Stellard	7f51cfe90c	R600: Add IPO to the list of required libraries llvm-svn: 221004	2014-10-31 21:52:08 +00:00
Tom Stellard	992a2f893a	R600: Add missing file to CMakeLists.txt llvm-svn: 220998	2014-10-31 20:56:36 +00:00
Tom Stellard	044bedf9b1	R600: Don't promote allocas when one of the users is a ptrtoint instruction We need to figure out how to track ptrtoint values all the way until result is converted back to a pointer in order to correctly rewrite the pointer type. llvm-svn: 220997	2014-10-31 20:52:04 +00:00
Tom Stellard	8a8077171a	R600: Make sure to inline all internal functions Function calls aren't supported yet. llvm-svn: 220996	2014-10-31 20:52:02 +00:00
Bill Schmidt	5c5103e17e	[PowerPC] Initial VSX intrinsic support, with min/max for vector double Now that we have initial support for VSX, we can begin adding intrinsics for programmer access to VSX instructions. This patch adds basic support for VSX intrinsics in general, and tests it by implementing intrinsics for minimum and maximum for the vector double data type. The LLVM portion of this is quite straightforward. There is a companion patch for Clang. llvm-svn: 220988	2014-10-31 19:19:07 +00:00
Chad Rosier	14722ca1a2	[AArch64] Check Dest Register Liveness in CondOpt pass. Our internal test reveals such case should not be transformed: cmp x17, #3 b.lt .LBB10_15 ... subs x12, x12, #1 b.gt .LBB10_1 where x12 is a liveout, becomes: cmp x17, #2 b.le .LBB10_15 ... subs x12, x12, #2 b.ge .LBB10_1 Unable to provide test case as it's difficult to reproduce on community branch. http://reviews.llvm.org/D6048 Patch by Zhaoshi Zheng <zhaoshiz@codeaurora.org>! llvm-svn: 220987	2014-10-31 19:02:38 +00:00
Quentin Colombet	06167df4ad	[CodeGenPrepare] Move extractelement close to store if they can be combined. This patch adds an optimization in CodeGenPrepare to move an extractelement right before a store when the target can combine them. The optimization may promote any scalar operations to vector operations in the way to make that possible. Context Some targets use different register files for both vector and scalar operations. This means that transitioning from one domain to another may incur copy from one register file to another. These copies are not coalescable and may be expensive. For example, according to the scheduling model, on cortex-A8 a vector to GPR move is 20 cycles. Motivating Example Let us consider an example: define void @foo(<2 x i32>* %addr1, i32* %dest) { %in1 = load <2 x i32>* %addr1, align 8 %extract = extractelement <2 x i32> %in1, i32 1 %out = or i32 %extract, 1 store i32 %out, i32* %dest, align 4 ret void } As it is, this IR generates the following assembly on armv7: vldr d16, [r0] @vector load vmov.32 r0, d16[1] @ cross-register-file copy: 20 cycles orr r0, r0, #1 @ scalar bitwise or str r0, [r1] @ scalar store bx lr Whereas we could generate much faster code: vldr d16, [r0] @ vector load vorr.i32 d16, #0x1 @ vector bitwise or vst1.32 {d16[1]}, [r1:32] @ vector extract + store bx lr Half of the computation made in the vector is useless, but this allows to get rid of the expensive cross-register-file copy. Proposed Solution To avoid this cross-register-copy penalty, we promote the scalar operations to vector operations. The penalty will be removed if we manage to promote the whole chain of computation in the vector domain. Currently, we do that only when the chain of computation ends by a store and the target is able to combine an extract with a store. Stores are the most likely candidates, because other instructions produce values that would need to be promoted and so, extracted as some point[1]. Moreover, this is customary that targets feature stores that perform a vector extract (see AArch64 and X86 for instance). The proposed implementation relies on the TargetTransformInfo to decide whether or not it is beneficial to promote a chain of computation in the vector domain. Unfortunately, this interface is rather inaccurate for this level of details and although this optimization may be beneficial for X86 and AArch64, the inaccuracy will lead to the optimization being too aggressive. Basically in TargetTransformInfo, everything that is legal has a cost of 1, whereas, even if a vector type is legal, usually a vector operation is slightly more expensive than its scalar counterpart. That will lead to too many promotions that may not be counter balanced by the saving of the cross-register-file copy. For instance, on AArch64 this penalty is just 4 cycles. For now, the optimization is just enabled for ARM prior than v8, since those processors have a larger penalty on cross-register-file copies, and the scope is limited to basic blocks. Because of these two factors, we limit the effects of the inaccuracy. Indeed, I did not want to build up a fancy cost model with block frequency and everything on top of that. [1] We can imagine targets that can combine an extractelement with other instructions than just stores. If we want to go into that direction, the current interfaces must be augmented and, moreover, I think this becomes a global isel problem. Differential Revision: http://reviews.llvm.org/D5921 <rdar://problem/14170854> llvm-svn: 220978	2014-10-31 17:52:53 +00:00
Chad Rosier	9bc5bd5f9a	[AArch64] CondOpt pass is missing FCMP instructions when searching backward for a CMP which defines the flags used by B.CC. http://reviews.llvm.org/D6047 Patch by Zhaoshi Zheng <zhaoshiz@codeaurora.org>! llvm-svn: 220961	2014-10-31 15:17:36 +00:00
Ulrich Weigand	5b86d1f937	[PowerPC] Load BlockAddress values from the TOC in 64-bit SVR4 code Since block address values can be larger than 2GB in 64-bit code, they cannot be loaded simply using an @l / @ha pair, but instead must be loaded from the TOC, just like GlobalAddress, ConstantPool, and JumpTable values are. The commit also fixes a bug in PPCLinuxAsmPrinter::doFinalization where temporary labels could not be used as TOC values, since code would attempt (and fail) to use GetOrCreateSymbol to create a symbol of the same name as the temporary label. llvm-svn: 220959	2014-10-31 10:33:14 +00:00
Robert Khasanov	3e398a3800	[AVX512] Added VBROADCAST{SS/SD} encoding for VL subset. Refactored through AVX512_maskable llvm-svn: 220908	2014-10-30 14:21:47 +00:00
Robert Khasanov	b26f057244	[AVX512] Implemented AVX512VL FP bnary packed instructions (VADDP, VSUBP, VMULP, VDIVP, VMAXP, VMINP) Refactored through AVX512_maskable Added encoding tests for them. llvm-svn: 220858	2014-10-29 15:43:02 +00:00
Robert Khasanov	4bea47e6eb	[AVX512] Fix VSQRT packed instructions internal names. No functional change llvm-svn: 220808	2014-10-28 18:22:41 +00:00
Robert Khasanov	2ca56ad410	[AVX512] Extended avx512_sqrt_packed (sqrt instructions) to VL subset. Refactored through AVX512_maskable llvm-svn: 220806	2014-10-28 18:15:20 +00:00
Robert Khasanov	d134194df8	[AVX-512] Expanded rsqrt/rcp instructions to VL subset. Refactored multiclass through AVX512_maskable llvm-svn: 220783	2014-10-28 16:37:13 +00:00
Robert Khasanov	2b3eb6af20	[AVX512] Removed special case for cmp instructions in getVectorMaskingNode. Now cmp intrinsics lower as other intrinsics through VSELECT, and then VSELECT tranforms to AND in PerformSELECTCombine. No functional change. llvm-svn: 220779	2014-10-28 16:17:14 +00:00
Robert Khasanov	fa811056f3	[x86] Simplify vector selection if condition value type matches vselect value type and true value is all ones or false value is all zeros. This transformation worked if selector is produced by SETCC, however SETCC is needed only if we consider to swap operands. So I replaced SETCC check for this case. Added tests for vselect of <X x i1> values. llvm-svn: 220777	2014-10-28 15:59:40 +00:00
Robert Khasanov	a87de33c61	[AVX512] Bring back vector-shuffle lowering support through broadcasts Ffter commit at rev219046 512-bit broadcasts lowering become non-optimal. Most of tests on broadcasting and embedded broadcasting were changed and they doesn’t produce efficient code. Example below is from commit changes (it’s the first test from test/CodeGen/X86/avx512-vbroadcast.ll): define <16 x i32> @_inreg16xi32(i32 %a) { ; CHECK-LABEL: _inreg16xi32: ; CHECK: ## BB#0: -; CHECK-NEXT: vpbroadcastd %edi, %zmm0 +; CHECK-NEXT: vmovd %edi, %xmm0 +; CHECK-NEXT: vpbroadcastd %xmm0, %ymm0 +; CHECK-NEXT: vinserti64x4 $1, %ymm0, %zmm0, %zmm0 ; CHECK-NEXT: retq %b = insertelement <16 x i32> undef, i32 %a, i32 0 %c = shufflevector <16 x i32> %b, <16 x i32> undef, <16 x i32> zeroinitializer ret <16 x i32> %c } Here, 256-bit broadcast was generated instead of 512-bit one. In this patch 1) I added vector-shuffle lowering through broadcasts 2) Removed asserts and branches likes because this is incorrect - assert(Subtarget->hasDQI() && "We can only lower v8i64 with AVX-512-DQI"); 3) Fixed lowering tests llvm-svn: 220774	2014-10-28 12:28:51 +00:00
Reid Kleckner	af3046bd9e	X86: Implement the vectorcall calling convention This is a Microsoft calling convention that supports both x86 and x86_64 subtargets. It passes vector and floating point arguments in XMM0-XMM5, and passes them indirectly once they are consumed. Homogenous vector aggregates of up to four elements can be passed in sequential vector registers, but this part is not implemented in LLVM and will be handled in Clang. On 32-bit x86, it is similar to fastcall in that it uses ecx:edx as integer register parameters and is callee cleanup. On x86_64, it delegates to the normal win64 calling convention. Reviewers: majnemer Differential Revision: http://reviews.llvm.org/D5943 llvm-svn: 220745	2014-10-28 01:29:26 +00:00
Tim Northover	136fb499eb	AArch64: enable Cortex-A57 FP balancing on Cortex-A53. Benchmarks have shown that it's harmless to the performance there, and having a unified set of passes between the two cores where possible helps big.LITTLE deployment. Patch by Z. Zheng. llvm-svn: 220744	2014-10-28 01:24:32 +00:00
NAKAMURA Takumi	231973f544	AArch64InstrInfo.h: Fix a warning introduced in clang r220703. [-Winconsistent-missing-override] llvm-svn: 220739	2014-10-27 23:29:27 +00:00
Adam Nemet	f35a0bba1b	[AVX512] Add vpermil variable version This is implemented via a multiclass that derives from the vperm imm multiclass. Fixes <rdar://problem/18426089> llvm-svn: 220737	2014-10-27 23:08:40 +00:00
Adam Nemet	134480598b	[AVX512] Clean up avx512_perm_imm to use X86VectorVTInfo No functionality change. No change in X86.td.expanded except that we only set the CD8 attributes for the memory variants. (This shouldn't be used unless we have a memory operand.) llvm-svn: 220736	2014-10-27 23:08:37 +00:00
Adam Nemet	8cf84f2568	[AVX512] Derive vpermil* from avx512_perm_imm This used to derive from avx512_pshuf_imm which is confusing. NFC. Compared X86.td.expanded. llvm-svn: 220735	2014-10-27 23:08:34 +00:00
Adam Nemet	78db8293e5	[AVX512] Fix copy-and-paste bugs in vpermil 1) i512mem -> f512mem (this is the packed FP input being permuted) 2) element size is 64 bits in EVEX_CD8 for PD. (A good illustration why X86VectorVTInfo is useful) llvm-svn: 220734	2014-10-27 23:08:31 +00:00
Pete Cooper	01b2132972	Fix a stackmap bug introduced in r220710. For a call to not return in to the stackmap shadow, the shadow must end with the call. To do this, we must insert any required nops before the call, and not after it. llvm-svn: 220728	2014-10-27 22:38:45 +00:00
Juergen Ributzka	b3117b0b86	[FastISel][AArch64] Emit immediate version of icmp (subs) for null pointer check. This is a minor change to use the immediate version when the operand is a null value. This should get rid of an unnecessary 'mov' instruction in debug builds and align the code more with the one generated by SelectionDAG. This fixes rdar://problem/18785125. llvm-svn: 220713	2014-10-27 19:58:36 +00:00
Juergen Ributzka	76c57b0570	[FastISel][AArch64] Optimize compare-and-branch for i1 to use 'tbz'. Minor enhancement to use 'tbz' for i1 compare-and-branch to get rid of an 'and' instruction. This fixes rdar://problem/18784953. llvm-svn: 220712	2014-10-27 19:46:23 +00:00
Pete Cooper	87efb91a50	Stackmap shadows should consider call returns a branch target. To avoid emitting too many nops, a stackmap shadow can include emitted instructions in the shadow, but these must not include branch targets. A return from a call should count as a branch target as patching over the instructions after the call would lead to incorrect behaviour for threads currently making that call, when they return. llvm-svn: 220710	2014-10-27 19:40:35 +00:00
Juergen Ributzka	3423c5ccda	[FastISel][AArch64] Use 'cbz' also for null values (pointers). The pattern matching for a 'ConstantInt' value was too restrictive. Checking for a 'Constant' with a bull value is sufficient for using an 'cbz/cbnz' instruction. This fixes rdar://problem/18784732. llvm-svn: 220709	2014-10-27 19:38:05 +00:00
Juergen Ributzka	64c2c99226	[FastISel][AArch64] Don't fold the 'and' instruction into the 'tbz/tbnz' instruction if it is in a different basic block. This fixes a bug where the input register was not defined for the 'tbz/tbnz' instruction. This happened, because we folded the 'and' instruction from a different basic block. This fixes rdar://problem/18784013. llvm-svn: 220704	2014-10-27 19:16:48 +00:00
Juergen Ributzka	57783726dd	[FastISel][AArch64] Fix load/store with frame indices. At higher optimization levels the LLVM IR may contain more complex patterns for loads/stores from/to frame indices. The 'computeAddress' function wasn't able to handle this and triggered an assertion. This fix extends the possible addressing modes for frame indices. This fixes rdar://problem/18783298. llvm-svn: 220700	2014-10-27 18:21:58 +00:00
Lang Hames	77d387a954	[PBQP] Unique allowed-sets for nodes in the PBQP graph and use pairs of these sets as keys into a cache of interference matrice values in the Interference constraint adder. Creating interference matrices was one of the large remaining time-sinks in PBQP. Caching them reduces the total compile time (when using PBQP) on the nightly test suite by ~10%. llvm-svn: 220688	2014-10-27 17:44:25 +00:00
NAKAMURA Takumi	a1ef3346aa	Prune CRLF. llvm-svn: 220678	2014-10-27 12:37:26 +00:00
Oliver Stannard	9a595b2769	[ARM] Select VMAXNM and VMINNM regardless of operand order Currently, the ARM backend will select the VMAXNM and VMINNM for these C expressions: (a < b) ? a : b (a > b) ? a : b but not these expressions: (a > b) ? b : a (a < b) ? b : a This patch allows all of these expressions to be matched. llvm-svn: 220671	2014-10-27 09:23:02 +00:00
Yuri Gorshenin	0701551505	[asan-asm-instrumentation] Added comment describing how asm instrumentation works. Summary: [asan-asm-instrumentation] Added comment describing how asm instrumentation works. Reviewers: eugenis Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D5970 llvm-svn: 220670	2014-10-27 08:38:54 +00:00
Elena Demikhovsky	467839f2f5	AVX-512: Fixed encoding of VPBROADCASTM and added SKX forms of this instruction llvm-svn: 220638	2014-10-26 09:52:24 +00:00
Simon Pilgrim	c20b3d1fdd	[X86][SSE] Vector integer/float conversion memory folding Tidied up some entries in the folding tables so that they are under the correct comment section (they were categorised as AVX2 instructions when they're AVX1). Minor patch agreed with qcolombet. llvm-svn: 220613	2014-10-25 08:11:20 +00:00
Jingyue Wu	376aaf44c4	[NVPTX] aligned byte-buffers for vector return types Summary: Fixes PR21100 which is caused by inconsistency between the declared return type and the expected return type at the call site. The new behavior is consistent with nvcc and the NVPTXTargetLowering::getPrototype function. Test Plan: test/Codegen/NVPTX/vector-return.ll Reviewers: jholewinski Reviewed By: jholewinski Subscribers: llvm-commits, meheff, eliben, jholewinski Differential Revision: http://reviews.llvm.org/D5612 llvm-svn: 220607	2014-10-25 03:46:16 +00:00
Kevin Enderby	9e33867d25	Fix a Mach-O assembler segfault for a subtraction expression with an undefined symbol. In a Mach-O object file a relocatable expression of the form SymbolA - SymbolB + constant is allowed when both symbols are defined in a section. But when either symbol is undefined it is an error. The code was crashing when it had an undefined symbol in this case. And should have printed a error message using the location information in the relocation entry. rdar://18678402 llvm-svn: 220599	2014-10-24 22:39:40 +00:00
Simon Pilgrim	29b3d266a4	[X86][SSE] Bitcast assertion in XFormVExtractWithShuffleIntoLoad Minor patch to fix an issue in XFormVExtractWithShuffleIntoLoad where a load is unary shuffled, then bitcast (to a type with the same number of elements) before extracting an element. An undef was created for the second shuffle operand using the original (post-bitcasted) vector type instead of the pre-bitcasted type like the rest of the shuffle node - this was then causing an assertion on the different types later on inside SelectionDAG::getVectorShuffle. Differential Revision: http://reviews.llvm.org/D5917 llvm-svn: 220592	2014-10-24 21:04:41 +00:00
Colin LeMahieu	a5b4b06680	[Hexagon] Resubmission of 220427 Modified library structure to deal with circular dependency between HexagonInstPrinter and HexagonMCInst. Adding encoding bits for add opcode. Adding llvm-mc tests. Removing unit tests. http://reviews.llvm.org/D5624 llvm-svn: 220584	2014-10-24 19:00:32 +00:00
Sanjay Patel	4e42d925c1	Allow AVX vrsqrtps generation. This is a follow-on to r220570 that allows a 256-bit (v8f32) version of vrsqrtps to be generated. llvm-svn: 220579	2014-10-24 17:59:18 +00:00
Sanjay Patel	d9b7837012	Use rsqrt (X86) to speed up reciprocal square root calcs This is a first step for generating SSE rsqrt instructions for reciprocal square root calcs when fast-math is allowed. For now, be conservative and only enable this for AMD btver2 where performance improves significantly - for example, 29% on llvm/projects/test-suite/SingleSource/Benchmarks/BenchmarkGame/n-body.c (if we convert the data type to single-precision float). This patch adds a two constant version of the Newton-Raphson refinement algorithm to DAGCombiner that can be selected by any target via a parameter returned by getRsqrtEstimate().. See PR20900 for more details: http://llvm.org/bugs/show_bug.cgi?id=20900 Differential Revision: http://reviews.llvm.org/D5658 llvm-svn: 220570	2014-10-24 17:02:16 +00:00
Daniel Sanders	3747982b4e	[mips] Replace MipsABIEnum with a MipsABIInfo class. Summary: No functional change yet, it's just an object replacement for an enum. It will allow us to gather ABI information in a single place so that we can start testing for properties of the ABI's instead of the ABI itself. For example we will eventually be able to use: ABI.MinStackAlignmentInBytes() instead of: (isABI_N32() \|\| isABI_N64()) ? 16 : 8 which is clearer and more maintainable. Reviewers: matheusalmeida Reviewed By: matheusalmeida Differential Revision: http://reviews.llvm.org/D3341 llvm-svn: 220568	2014-10-24 16:15:27 +00:00
Daniel Sanders	656a7ed742	[mips] Fix >80-column line llvm-svn: 220564	2014-10-24 14:46:00 +00:00
Daniel Sanders	5f05e5e050	[mips] Remove redundant code in RetCC_MipsN. NFC. Summary: i32 is always promoted to i64 so it no longer makes sense to assign i32 to registers. Reviewers: vmedic Reviewed By: vmedic Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D5964 llvm-svn: 220561	2014-10-24 13:49:54 +00:00
Daniel Sanders	d767694dc4	[mips] For N32/N64, structs must be passed in the upper bits of a register. Summary: Most structs were fixed by r218451 but those of between >32-bits and <64-bits remained broken since they were not marked with [ASZ]ExtUpper. This patch fixes the remaining cases by using CCPromoteToUpperBitsInType<i64> on i64's in addition to i32 and smaller. Reviewers: vmedic Reviewed By: vmedic Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D5963 llvm-svn: 220556	2014-10-24 13:09:19 +00:00
Oliver Stannard	4116b199b2	[AArch64] Fix fast-isel of cbz of i1, i8, i16 This fixes a miscompilation in the AArch64 fast-isel which was triggered when a branch is based on an icmp with condition eq or ne, and type i1, i8 or i16. The cbz instruction compares the whole 32-bit register, so values with the bottom 1, 8 or 16 bits clear would cause the wrong branch to be taken. llvm-svn: 220553	2014-10-24 09:54:41 +00:00
Adam Nemet	01508c4733	[AVX512] FMA support for the 231 variants This is asm/diasm-only support, similar to AVX. For ISeling the register variant, they are no different from 213 other than whether the multiplication or the addition operand is destructed. For ISeling the memory variant, i.e. to fold a load, they are no different than the 132 variant. The addition operand (op3) in both cases can come from memory. Again the ony difference is which operand is destructed. There could be a post-RA pass that would convert a 213 or 132 into a 231. Part of <rdar://problem/17082571> llvm-svn: 220540	2014-10-24 00:03:00 +00:00
Adam Nemet	3beded2b5d	[AVX512] Introduce fma3p_forms from AVX This multiclass generates the different forms: 213, 231, 132 in AVX. 132 in AVX512 is a separate class but I am planning to use this same multiclass to generate 231 relying on the nice the null_frag trick from AVX to disable codegen pattern for 231. No functionality change, no change in X86.td.expanded except for the different instruction definition names. llvm-svn: 220539	2014-10-24 00:02:55 +00:00
Ahmed Bougacha	af95bf4558	[X86] Improve mul w/ overflow codegen, to MUL8+SETO. Currently, @llvm.smul.with.overflow.i8 expands to 9 instructions, where 3 are really needed. This adds X86ISD::UMUL8/SMUL8 SD nodes, and custom lowers them to MUL8/IMUL8 + SETO. i8 is a special case because there is no two/three operand variants of (I)MUL8, so the first operand and return value need to go in AL/AX. Also, we can't write patterns for these instructions: TableGen refuses patterns where output operands don't match SDNode results. In this case, instructions where the output operand is an implicitly defined register. A related special case (and FIXME) exists for MUL8 (X86InstrArith.td): // FIXME: Used for 8-bit mul, ignore result upper 8 bits. // This probably ought to be moved to a def : Pat<> if the // syntax can be accepted. [(set AL, (mul AL, GR8:$src)), (implicit EFLAGS)] Ideally, these go away with UMUL8, but we still need to improve TableGen support of implicit operands in patterns. Before this change: movsbl %sil, %eax movsbl %dil, %ecx imull %eax, %ecx movb %cl, %al sarb $7, %al movzbl %al, %eax movzbl %ch, %esi cmpl %eax, %esi setne %al After: movb %dil, %al imulb %sil seto %al Also, remove a made-redundant testcase for PR19858, and enable more FastISel ALU-overflow tests for SelectionDAG too. Differential Revision: http://reviews.llvm.org/D5809 llvm-svn: 220516	2014-10-23 21:55:31 +00:00
Renato Golin	cd30ecdc65	Do not emit intermediate register for zero FP immediate This updates check for double precision zero floating point constant to allow use of instruction with immediate value rather than temporary register. Currently "a == 0.0", where "a" is of "double" type generates: vmov.i32 d16, #0x0 vcmpe.f64 d0, d16 With this change it becomes: vcmpe.f64 d0, #0 Patch by Sergey Dmitrouk. llvm-svn: 220486	2014-10-23 15:31:50 +00:00
NAKAMURA Takumi	b1cc8f92d6	Hexagon/Disassembler/LLVMBuild.txt: Update libdeps. llvm-svn: 220482	2014-10-23 11:32:16 +00:00
NAKAMURA Takumi	985e79f39f	Hexagon/LLVMBuild.txt: Prune CRLF. llvm-svn: 220481	2014-10-23 11:32:03 +00:00
NAKAMURA Takumi	0cdee77f2d	[CMake] Prune CRLF in CMakeLists.txt(s). llvm-svn: 220480	2014-10-23 11:31:50 +00:00
NAKAMURA Takumi	d8b493b106	Revert r220427, "[Hexagon] Adding encoding bits for add opcode." It brought cyclic dependecy between HexagonAsmPrinter and HexagonDesc. llvm-svn: 220478	2014-10-23 11:31:22 +00:00
Zoran Jovanovic	417b0378da	[mips][microMIPS] Implement ADDIUR1SP instruction Differential Revision: http://reviews.llvm.org/D5153 llvm-svn: 220477	2014-10-23 11:13:59 +00:00
Zoran Jovanovic	7d4ae91afe	ps][microMIPS] Implement ADDIUR2 instruction Differential Revision: http://reviews.llvm.org/D5151 llvm-svn: 220476	2014-10-23 11:06:34 +00:00
Zoran Jovanovic	c644ce2e43	ps][microMIPS] Implement LI16 instruction Differential Revision: http://reviews.llvm.org/D5149 llvm-svn: 220475	2014-10-23 10:59:24 +00:00
Zoran Jovanovic	0172553d1e	[mips][microMIPS] Implement CodeGen support for SLL16 and SRL16 instructions Differential Revision: http://reviews.llvm.org/D5774 llvm-svn: 220474	2014-10-23 10:42:01 +00:00
Oliver Stannard	b9f2d13ec6	[Thumb2] Improve disassembly of memory hints Currently, the ARM disassembler will disassemble the Thumb2 memory hint instructions (PLD, PLDW and PLI), even for targets which do not have these instructions. This patch adds the required checks to the disassmebler. llvm-svn: 220472	2014-10-23 08:52:58 +00:00
Akira Hatanaka	f1f0b1a1fd	[ARM, stack protector] If supported, use armv7 instructions. This commit enables using movt/movw to load the stack guard address: movw r0, :lower16:(L_g3$non_lazy_ptr-(LPC0_0+8)) movt r0, :upper16:(L_g3$non_lazy_ptr-(LPC0_0+8)) ldr r0, [pc, r0] Previously a pc-relative load was emitted: ldr r0, LCPI0_0 ldr r0, [pc, r0] rdar://problem/18740489 llvm-svn: 220470	2014-10-23 04:17:05 +00:00
Colin LeMahieu	88461f5b7a	[Hexagon] Adding encoding bits for add opcode. Adding llvm-mc tests. Removing unit tests. http://reviews.llvm.org/D5624 llvm-svn: 220427	2014-10-22 20:58:35 +00:00
Chad Rosier	19ef65a831	[AArch64] Add support for the .inst directive. This has been implement using the MCTargetStreamer interface as is done in the ARM, Mips and PPC backends. Phabricator: http://reviews.llvm.org/D5891 PR20964 llvm-svn: 220422	2014-10-22 20:35:57 +00:00
Hans Wennborg	dbfab2bba5	Fix VS2012 build; C++11 type aliases are not supported. llvm-svn: 220399	2014-10-22 17:47:49 +00:00
Colin LeMahieu	f359444940	Ammending 220393 - Removing unused decoding tables. llvm-svn: 220397	2014-10-22 17:23:01 +00:00
Colin LeMahieu	05568816b4	Ammending 220393 - Removing unused functions. llvm-svn: 220396	2014-10-22 17:03:19 +00:00
Bill Schmidt	1033a05374	[PATCH] Support select-cc for VSFRC when VSX is enabled A previous patch enabled SELECT_VSRC and SELECT_CC_VSRC for VSX to handle <2 x double> cases. This patch adds SELECT_VSFRC and SELECT_CC_VSFRC to allow use of all 64 vector-scalar registers for the f64 type when VSX is enabled. The changes are analogous to those in the previous patch. I've added a new variant to vsx.ll to test the code generation. (I also cleaned up a little formatting in PPCInstrVSX.td from the previous patch.) llvm-svn: 220395	2014-10-22 16:58:20 +00:00
Colin LeMahieu	f6f8bdf091	[Hexagon] Adding basic disassembler. Marking all instructions as CodeGenOnly since encoding bits are not set yet. http://reviews.llvm.org/D5829?vs=on&id=15023&whitespace=ignore-all#toc llvm-svn: 220393	2014-10-22 16:49:14 +00:00
Bill Schmidt	884633f291	[PowerPC] Support select-cc for VSX The tests test/CodeGen/Generic/select-cc.ll and test/CodeGen/PowerPC/select-cc.ll both fail with VSX enabled. The problem is that the lowering logic for the SELECT and SELECT_CC operations doesn't currently support the VSX registers. This patch fixes that. In lib/Target/PowerPC/PPCInstrInfo.td, we have pseudos to handle this for other register classes. Similar pseudos are added in PPCInstrVSX.td (they must be there, because the "vsrc" register class definition appears there) for the VSRC register class. The SELECT_VSRC pseudo is then used in pattern matching for SELECT_CC. The rest of the patch just adds logic for SELECT_VSRC wherever similar logic appears for SELECT_VRRC. There are no new test cases because the existing tests above test this, along with a variant in test/CodeGen/PowerPC/vsx.ll. After discussion with Hal, a future patch will add similar _VSFRC variants to override f64 type handling (currently using F8RC). llvm-svn: 220385	2014-10-22 13:13:40 +00:00
Arnaud A. de Grandmaison	e8d641238b	[AArch64] Cleanup A57PBQPConstraints And add a long awaited testcase. llvm-svn: 220381	2014-10-22 12:40:20 +00:00
Jyoti Allur	555583e57e	[Thumb/Thumb2] Implement restrictions on SP in register list on LDM, STM variants in thumb mode llvm-svn: 220379	2014-10-22 10:41:14 +00:00
Matt Arsenault	2257f6b589	Add minnum / maxnum codegen llvm-svn: 220342	2014-10-21 23:01:01 +00:00
Matt Arsenault	98d33a4281	R600/SI: Add missing parameter to div_fmas intrinsic llvm-svn: 220338	2014-10-21 22:20:55 +00:00
Matt Arsenault	f60479b756	R600: Use default GlobalDirective The overridden one wasn't inserting a space, so you would end up with .globalfoo llvm-svn: 220329	2014-10-21 21:08:36 +00:00
Arnaud A. de Grandmaison	73624b6ac4	[PBQP] Teach PassConfig to tell if the default register allocator is used. This enables targets to adapt their pass pipeline to the register allocator in use. For example, with the AArch64 backend, using PBQP with the cortex-a57, the FPLoadBalancing pass is no longer necessary. llvm-svn: 220321	2014-10-21 20:47:22 +00:00
Rafael Espindola	f1e8d6839e	Drop support for an old version of ld64 (from darwin 9). llvm-svn: 220310	2014-10-21 18:31:09 +00:00
Matt Arsenault	bfd13cde1b	R600/SI: Add pattern for bswap llvm-svn: 220304	2014-10-21 16:25:08 +00:00
NAKAMURA Takumi	2ca8461092	X86AsmInstrumentation.cpp: Dissolve initializer-ranged-for. MSC17 disliked it. llvm-svn: 220301	2014-10-21 16:22:52 +00:00
Colin LeMahieu	0a3ea08c1c	Test commit Fixing brief comment. llvm-svn: 220299	2014-10-21 16:03:10 +00:00
Bill Schmidt	b654c2bb89	[PowerPC] Avoid VSX FMA mutate when killed product reg = addend reg With VSX enabled, test/CodeGen/PowerPC/recipest.ll exposes a bug in the FMA mutation pass. If we have a situation where a killed product register is the same register as the FMA target, such as: %vreg5<def,tied1> = XSNMSUBADP %vreg5<tied0>, %vreg11, %vreg5, %RM<imp-use>; VSFRC:%vreg5 F8RC:%vreg11 then the substitution makes no sense. We end up getting a crash when we try to extend the interval associated with the killed product register, as there is already a live range for %vreg5 there. This patch just disables the mutation under those circumstances. Since recipest.ll generates different code with VMX enabled, I've modified that test to use -mattr=-vsx. I've borrowed the code from that test that exposed the bug and placed it in fma-mutate.ll, where it tests several mutation opportunities including the "bad" one. llvm-svn: 220290	2014-10-21 13:02:37 +00:00
Oliver Stannard	ecd544ccbd	[ARM] NEON 32-bit scalar moves are also available in VFPv2 The 32-bit variants of the NEON scalar<->GPR move instructions are also available in VFPv2. The 8- and 16-bit variants do require NEON. Note that the checks in the test file are all -DAG because they are checking a mixture of stdout and stderr, and the ordering is not guaranteed. llvm-svn: 220288	2014-10-21 11:49:14 +00:00
Yuri Gorshenin	a8f1a7e5da	[asan-asm-instrumentation] Fixed memory accesses with rbp as a base or an index register. Summary: Fixed memory accesses with rbp as a base or an index register. Reviewers: eugenis Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D5819 llvm-svn: 220283	2014-10-21 10:22:27 +00:00
Oliver Stannard	70943b7fa8	[Thumb2] LDRS?[BH] cannot load to the PC The Thumb2 LDRS?[BH] instructions are not valid when the destination register is the PC (these encodings are used for preload hints). llvm-svn: 220278	2014-10-21 09:14:15 +00:00
Zoran Jovanovic	5e356e74e3	[mips][microMIPS] Implement ADDU16 and SUBU16 instructions Differential Revision: http://reviews.llvm.org/D5118 llvm-svn: 220276	2014-10-21 08:44:58 +00:00
Zoran Jovanovic	26b6fdd712	[mips][microMIPS] Implement AND16, NOT16, OR16 and XOR16 instructions Differential Revision: http://reviews.llvm.org/D5117 llvm-svn: 220275	2014-10-21 08:32:40 +00:00
Zoran Jovanovic	7982f485af	[mips][microMIPS] Implement microMIPS 16-bit instructions registers Differential Revision: http://reviews.llvm.org/D5116 llvm-svn: 220273	2014-10-21 08:23:11 +00:00
Rafael Espindola	6ffbd5bf5d	Fix a bit of confusion about .set and produce more readable assembly. Every target we support has support for assembly that looks like a = b - c .long a What is special about MachO is that the above combination suppresses the production of a relocation. With this change we avoid producing the intermediary labels when they don't add any value. llvm-svn: 220256	2014-10-21 01:17:30 +00:00
Quentin Colombet	1691b0c568	[X86] Fix a bug in the lowering of the mask of VSELECT. X86 code to lower VSELECT messed a bit with the bits set in the mask of VSELECT when it knows it can be lowered into BLEND. Indeed, only the high bits need to be set for those and it optimizes those accordingly. However, when the mask is a compile time constant, the lowering will be handled by the generic optimizer and those modifications will generate bad code in the generic optimizer. This patch fixes that by preventing the optimization if the VSELECT will be handled by the generic optimizer. <rdar://problem/18675020> llvm-svn: 220242	2014-10-20 23:13:30 +00:00
Simon Pilgrim	0ccb373260	[X86] Memory folding for commutative instructions (updated) This patch improves support for commutative instructions in the x86 memory folding implementation by attempting to fold a commuted version of the instruction if the original folding fails - if that folding fails as well the instruction is 're-commuted' back to its original order before returning. Updated version of r219584 (reverted in r219595) - the commutation attempt now explicitly ensures that neither of the commuted source operands are tied to the destination operand / register, which was the source of all the regressions that occurred with the original patch attempt. Added additional regression test case provided by Joerg Sonnenberger. Differential Revision: http://reviews.llvm.org/D5818 llvm-svn: 220239	2014-10-20 22:14:22 +00:00
Tim Northover	7a41a526ce	ARM: rework Thumb1 frame index rewriting The previous code had a few problems, motivating the choices here. 1. It could create instructions clobbering CPSR, but the incoming MachineInstr didn't reflect this. A potential source of corruption. This is why the patch has a new PseudoInst for before lowering. 2. Similarly, there was some code to handle the incoming instruction not being ARMCC::AL, but this would have caused massive problems if it was actually invoked when a complex offset needing more than one instruction was requested. 3. It wasn't designed to handle unaligned pointers (or offsets). These should probably be minimised anyway, but the code needs to deal with them properly regardless. 4. It had some rather dubious ad-hoc code to avoid calling emitThumbRegPlusImmediate, a function which should be designed to do precisely this job. We seem to cover the common cases correctly now, and hopefully can enhance emitThumbRegPlusImmediate to handle any extra optimisations we need to add in future. llvm-svn: 220236	2014-10-20 21:28:41 +00:00
Oliver Stannard	4a54d08d15	[Thumb2] RFE, SRS and "SUBS pc, lr" are undefined on v7M These instructions are related to the v7[AR] exception model, and are not defined on v7M. llvm-svn: 220204	2014-10-20 15:37:35 +00:00
Sid Manning	66f7ab4998	Remove unnecessary else. llvm-svn: 220200	2014-10-20 13:08:19 +00:00
Oliver Stannard	23d7518ec9	[ARM] Do not select SMULW[BT] or SMLAW[BT] The current instruction selection patterns for SMULW[BT] and SMLAW[BT] are incorrect. These instructions multiply a 32-bit and a 16-bit value (both signed) and return the top 32 bits of the 48-bit result. This preserves the 16 bits of overflow, whereas the patterns they currently match truncate the result to 16 bits then sign extend. To select these instructions, we would need to match an ISD::SMUL_LOHI, a sign extend, two shifts and an or. There is no way to match SMUL_LOHI in an instruction pattern as it defines multiple values, so this would have to be done in C++. I have raised http://llvm.org/bugs/show_bug.cgi?id=21297 to cover allowing correct selection of these instructions. This fixes http://llvm.org/bugs/show_bug.cgi?id=19396 llvm-svn: 220196	2014-10-20 11:30:35 +00:00
Oliver Stannard	3b66d91575	[Thumb] Fix crash in Thumb1RegisterInfo::rewriteFrameIndex This function can, for some offsets from the SP, split one instruction into two. Since it re-uses the original instruction as the first instruction of the result, we need ensure its result register is not marked as dead before we use it in the second instruction. llvm-svn: 220194	2014-10-20 11:00:18 +00:00
Bob Wilson	113df60c0c	Use triple predicate functions instead of checking values directly. NFC. llvm-svn: 220155	2014-10-19 00:39:30 +00:00
Aaron Watry	ea88a00c76	R600/SI: Add global atomicrmw xchg v2: Add separate offset/no-offset tests Signed-off-by: Aaron Watry <awatry@gmail.com> Reviewed-by: Matt Arsenault <matthew.arsenault@amd.com> llvm-svn: 220110	2014-10-17 23:33:03 +00:00
Aaron Watry	d852df8bcb	R600/SI: Add global atomicrmw xor v2: Add separate offset/no-offset tests Signed-off-by: Aaron Watry <awatry@gmail.com> Reviewed-by: Matt Arsenault <matthew.arsenault@amd.com> llvm-svn: 220109	2014-10-17 23:33:01 +00:00
Aaron Watry	5f5371bf76	R600/SI: Add global atomicrmw or v2: Add separate offset/no-offset tests Signed-off-by: Aaron Watry <awatry@gmail.com> Reviewed-by: Matt Arsenault <matthew.arsenault@amd.com> llvm-svn: 220108	2014-10-17 23:32:59 +00:00
Aaron Watry	d94b6445ca	R600/SI: Add global atomicrmw min/umin v2: Add separate offset/no-offset tests Signed-off-by: Aaron Watry <awatry@gmail.com> Reviewed-by: Matt Arsenault <matthew.arsenault@amd.com> llvm-svn: 220107	2014-10-17 23:32:57 +00:00
Aaron Watry	0c7ec3cff8	R600/SI: Add global atomicrmw max/umax v2: Add separate offset/no-offset tests Signed-off-by: Aaron Watry <awatry@gmail.com> Reviewed-by: Matt Arsenault <matthew.arsenault@amd.com> llvm-svn: 220106	2014-10-17 23:32:56 +00:00
Aaron Watry	f93e43a3b6	R600/SI: Add global atomicrmw and v2: Add separate offset/no-offset tests Signed-off-by: Aaron Watry <awatry@gmail.com> Reviewed-by: Matt Arsenault <matthew.arsenault@amd.com> llvm-svn: 220105	2014-10-17 23:32:54 +00:00
Aaron Watry	ad528e135c	R600/SI: Add global atomicrmw sub v2: Add separate offset/no-offset tests Signed-off-by: Aaron Watry <awatry@gmail.com> Reviewed-by: Matt Arsenault <matthew.arsenault@amd.com> llvm-svn: 220104	2014-10-17 23:32:52 +00:00
Bill Schmidt	4290f0e275	[PowerPC] Change assert to better form llvm-svn: 220092	2014-10-17 21:19:59 +00:00
Matt Arsenault	8a07439402	R600/SI: Remove redundant setting of instruction bits These are all set on the instruction base classes. llvm-svn: 220091	2014-10-17 21:13:11 +00:00
Bill Schmidt	3e4d52c10c	[PowerPC] Change liveness testing in VSX FMA mutation pass With VSX enabled, LLVM crashes when compiling test/CodeGen/PowerPC/fma.ll. I traced this to the liveness test that's revised in this patch. The interval test is designed to only work for virtual registers, but in this case the AddendSrcReg is physical. Since there is already a walk of the MIs between the AddendMI and the FMA, I added a check for def/kill of the AddendSrcReg in that loop. At Hal Finkel's request, I converted the liveness test to an assert restricted to virtual registers. I've changed the fma.ll test to have VSX and non-VSX variants so we can test both kinds of multiply-adds. llvm-svn: 220090	2014-10-17 21:02:44 +00:00
Matt Arsenault	58e20fe65d	Fix typo llvm-svn: 220068	2014-10-17 18:02:31 +00:00
Matt Arsenault	b82c7f40b2	R600/SI: Also check for FPImm literal constants llvm-svn: 220067	2014-10-17 18:00:50 +00:00
Matt Arsenault	9be7600d7c	R600/SI: Allow commuting with source modifiers llvm-svn: 220066	2014-10-17 18:00:48 +00:00
Matt Arsenault	2d0382bc47	R600/SI: Simplify code with hasModifiersSet llvm-svn: 220065	2014-10-17 18:00:45 +00:00
Matt Arsenault	9c6d84b515	R600/SI: Fix general commuting breaking src mods The generic code trying to use findCommutedOpIndices won't understand that it needs to swap the modifier operands also, so it should fail if they are set. llvm-svn: 220064	2014-10-17 18:00:43 +00:00
Matt Arsenault	c43bde1e40	R600/SI: Cleanup code with ChangeToFPImmediate llvm-svn: 220063	2014-10-17 18:00:41 +00:00
Matt Arsenault	69078d03ff	R600/SI: Allow comuting fp immediates llvm-svn: 220062	2014-10-17 18:00:39 +00:00
Matt Arsenault	082244cff4	R600/SI: Use early return instead of checking condition twice Any commutable instruction will have at least src1. llvm-svn: 220061	2014-10-17 18:00:37 +00:00
Matt Arsenault	0b41e1680b	R600/SI: Use complex pattern for MUBUF load patterns. This eliminates a use of the SI_ADDR64_RSRC pseudo llvm-svn: 220057	2014-10-17 17:43:00 +00:00
Matt Arsenault	3ce8aba254	R600/SI: Remove SI_BUFFER_RSRC pseudo Just use REG_SEQUENCE directly, so there are fewer instructions to need to deal with later. llvm-svn: 220056	2014-10-17 17:42:56 +00:00
Andrea Di Biagio	7d3fa43e58	[X86] Fix missed selection of non-temporal store of zero vector. When the input to a store instruction was a zero vector, the backend always selected a normal vector store regardless of the non-temporal hint. This is fixed by this patch. This fixes PR19370. llvm-svn: 220054	2014-10-17 17:27:06 +00:00
James Molloy	b06648acfb	[AArch64] Fix a silent codegen fault in BUILD_VECTOR lowering. We should be talking about the number of source elements, not the number of destination elements, given we know at this point that the source and dest element numbers are not the same. While we're at it, avoid writing to std::vector::end()... Bug found with random testing and a lot of coffee. llvm-svn: 220051	2014-10-17 17:06:31 +00:00
Bill Schmidt	d3f8b7e4eb	[PowerPC] Enable use of lxvw4x/stxvw4x in VSX code generation Currently the VSX support enables use of lxvd2x and stxvd2x for 2x64 types, but does not yet use lxvw4x and stxvw4x for 4x32 types. This patch adds that support. As with lxvd2x/stxvd2x, this involves straightforward overriding of the patterns normally recognized for lvx/stvx, with preference given to the VSX patterns when VSX is enabled. In addition, the logic for permitting misaligned memory accesses is modified so that v4r32 and v4i32 are treated the same as v2f64 and v2i64 when VSX is enabled. Finally, the DAG generation for unaligned loads is changed to just use a normal LOAD (which will become lxvw4x) on P8 and later hardware, where unaligned loads are preferred over lvsl/lvx/lvx/vperm. A number of tests now generate the VSX loads/stores instead of lvx/stvx, so this patch adds VSX variants to those tests. I've also added <4 x float> tests to the vsx.ll test case, and created a vsx-p8.ll test case to be used for testing code generation for the P8Vector feature. For now, that simply tests the unaligned load/store behavior. This has been tested along with a temporary patch to enable the VSX and P8Vector features, with no new regressions encountered with or without the temporary patch applied. llvm-svn: 220047	2014-10-17 15:13:38 +00:00
Jan Vesely	1949cffd79	Mips: Only set divrem i64 to custom on 64bit Reviewed-by: Daniel Sanders <daniel.sanders@imgtec.com> Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu> llvm-svn: 220046	2014-10-17 14:45:28 +00:00
Vasileios Kalintiris	86aabae168	[mips] Add support for COP1's Branch-On-Cond-Likely instructions Summary: Depends on D5782 Reviewers: dsanders Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D5802 llvm-svn: 220042	2014-10-17 14:08:28 +00:00
Vasileios Kalintiris	f0e37a4687	[mips] Add support for COP0's Branch-On-Cond-Likely instructions Reviewers: dsanders Reviewed By: dsanders Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D5782 llvm-svn: 220036	2014-10-17 12:38:35 +00:00
Akira Hatanaka	1c4225a09b	ARM: Fix a bug which was causing convergence failure in constant-island pass. The bug is in ARMConstantIslands::createNewWater where the upper bound of the new water split point is computed: // This could point off the end of the block if we've already got constant // pool entries following this block; only the last one is in the water list. // Back past any possible branches (allow for a conditional and a maximally // long unconditional). if (BaseInsertOffset + 8 >= UserBBI.postOffset()) { BaseInsertOffset = UserBBI.postOffset() - UPad - 8; DEBUG(dbgs() << format("Move inside block: %#x\n", BaseInsertOffset)); } The split point is supposed to be somewhere between the machine instruction that loads from the constant pool entry and the end of the basic block, before branch instructions. The code above is fine if the basic block is large enough and there are a sufficient number of instructions following the machine instruction. However, if the machine instruction is near the end of the basic block, BaseInsertOffset can point to the machine instruction or another instruction that precedes it, and this can lead to convergence failure. This commit fixes this bug by ensuring BaseInsertOffset is larger than the offset of the instruction following the constant-loading instruction. rdar://problem/18581150 llvm-svn: 220015	2014-10-17 01:31:47 +00:00
Matt Arsenault	cb1558abf3	R600/SI: Simplify debug printing llvm-svn: 219999	2014-10-17 00:36:20 +00:00
Matt Arsenault	f84bb3f382	R600/SI: Remove another VALU pattern llvm-svn: 219988	2014-10-16 23:33:37 +00:00
Robin Morisset	8dc41d55aa	Erase fence insertion from SelectionDAGBuilder.cpp (NFC) Summary: Backends can use setInsertFencesForAtomic to signal to the middle-end that montonic is the only memory ordering they can accept for stores/loads/rmws/cmpxchg. The code lowering those accesses with a stronger ordering to fences + monotonic accesses is currently living in SelectionDAGBuilder.cpp. In this patch I propose moving this logic out of it for several reasons: - There is lots of redundancy to avoid: extremely similar logic already exists in AtomicExpand. - The current code in SelectionDAGBuilder does not use any target-hooks, it does the same transformation for every backend that requires it - As a result it is plain unsound, as it was apparently designed for ARM. It happens to mostly work for the other targets because they are extremely conservative, but Power for example had to switch to AtomicExpand to be able to use lwsync safely (see r218331). - Because it produces IR-level fences, it cannot be made sound ! This is noted in the C++11 standard (section 29.3, page 1140): ``` Fences cannot, in general, be used to restore sequential consistency for atomic operations with weaker ordering semantics. ``` It can also be seen by the following example (called IRIW in the litterature): ``` atomic<int> x = y = 0; int r1, r2, r3, r4; Thread 0: x.store(1); Thread 1: y.store(1); Thread 2: r1 = x.load(); r2 = y.load(); Thread 3: r3 = y.load(); r4 = x.load(); ``` r1 = r3 = 1 and r2 = r4 = 0 is impossible as long as the accesses are all seq_cst. But if they are lowered to monotonic accesses, no amount of fences can prevent it.. This patch does three things (I could cut it into parts, but then some of them would not be tested/testable, please tell me if you would prefer that): - it provides a default implementation for emitLeadingFence/emitTrailingFence in terms of IR-level fences, that mimic the original logic of SelectionDAGBuilder. As we saw above, this is unsound, but the best that can be done without knowing the targets well (and there is a comment warning about this risk). - it then switches Mips/Sparc/XCore to use AtomicExpand, relying on this default implementation (that exactly replicates the logic of SelectionDAGBuilder, so no functional change) - it finally erase this logic from SelectionDAGBuilder as it is dead-code. Ideally, each target would define its own override for emitLeading/TrailingFence using target-specific fences, but I do not know the Sparc/Mips/XCore memory model well enough to do this, and they appear to be dealing fine with the ARM-inspired default expansion for now (probably because they are overly conservative, as Power was). If anyone wants to compile fences more agressively on these platforms, the long comment should make it clear why he should first override emitLeading/TrailingFence. Test Plan: make check-all, no functional change Reviewers: jfb, t.p.northover Subscribers: aemerson, llvm-commits Differential Revision: http://reviews.llvm.org/D5474 llvm-svn: 219957	2014-10-16 20:34:57 +00:00
Matt Arsenault	ecbe5b08b8	R600/SI: Remove unnecessary VALU patterns These haven't been necessary since allowing selecting SALU instructions in non-entry blocks was enabled. llvm-svn: 219956	2014-10-16 20:31:50 +00:00
Matt Arsenault	75125bd463	R600: Fix nonsensical implementation of computeKnownBits for BFE This was resulting in invalid simplifications of sdiv llvm-svn: 219953	2014-10-16 20:07:40 +00:00
Rafael Espindola	253081a9b6	Delete -std-compile-opts. These days -std-compile-opts was just a silly alias for -O3. llvm-svn: 219951	2014-10-16 20:00:02 +00:00
Juergen Ributzka	3743a4c0d2	[AArch64] Fix miscompile of sdiv-by-power-of-2. When the constant divisor was larger than 32bits, then the optimized code generated for the AArch64 backend would emit the wrong code, because the shift was defined as a shift of a 32bit constant '(1<<Lg2(divisor))' and we would loose the upper 32bits. This fixes rdar://problem/18678801. llvm-svn: 219934	2014-10-16 16:41:15 +00:00
Vasileios Kalintiris	5822b72dd9	[mips] Account for endianess when expanding BuildPairF64/ExtractElementF64 nodes. Summary: In order to support big endian targets for the BuildPairF64 nodes we just need to swap the low/high pair registers. Additionally, for the ExtractElementF64 nodes we have to calculate the correct stack offset with respect to the node's register/operand that we want to extract. Reviewers: dsanders Reviewed By: dsanders Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D5753 llvm-svn: 219931	2014-10-16 15:41:51 +00:00
Vasileios Kalintiris	8d39330886	[mips] Marked the DI/EI instruction aliases as MIPS32r2 Reviewers: dsanders Reviewed By: dsanders Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D5751 llvm-svn: 219927	2014-10-16 15:23:52 +00:00
Vasileios Kalintiris	b62fa9afef	Test commit access: remove extra new line at the end of file llvm-svn: 219925	2014-10-16 14:37:00 +00:00
Matt Arsenault	c79fc2137a	R600: Remove dead function llvm-svn: 219879	2014-10-16 00:08:09 +00:00

1 2 3 4 5 ...

30661 Commits