llvm-mirror

mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-10-22 12:33:33 +02:00

Author	SHA1	Message	Date
Rafael Espindola	9c149d6662	Revert "[mips] Add names and tests for the hardware registers" This reverts commit r221299. The tests LLVM :: MC/Disassembler/Mips/mips32.txt LLVM :: MC/Disassembler/Mips/mips32_le.txt were failing. llvm-svn: 221307	2014-11-04 22:15:05 +00:00
Vasileios Kalintiris	2d9f60f771	[mips] Add names and tests for the hardware registers Reviewers: dsanders Reviewed By: dsanders Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D5763 llvm-svn: 221299	2014-11-04 21:30:44 +00:00
Arnaud A. de Grandmaison	bd371f63c3	[PBQP] Callee saved regs should have a higher cost than scratch regs Registers are not all equal. Some are not allocatable (infinite cost), some have to be preserved but can be used, and some others are just free to use. Ensure there is a cost hierarchy reflecting this fact, so that the allocator will favor scratch registers over callee-saved registers. llvm-svn: 221293	2014-11-04 20:51:29 +00:00
Benjamin Kramer	b612f063c8	AArch64: Pattern match integer vector abs like we do on ARM. This kind of pattern is emitted by the loop vectorizer. llvm-svn: 221289	2014-11-04 20:10:06 +00:00
Juergen Ributzka	b7483381d4	[Stackmaps] Make test less fragile. NFC. llvm-svn: 221276	2014-11-04 17:11:00 +00:00
NAKAMURA Takumi	0187705397	Remove "REQUIRES:shell" from tests. They work for me. llvm-svn: 221269	2014-11-04 13:41:33 +00:00
Sanjoy Das	2351ae23a7	The patchpoint lowering logic would crash with live constants equal to the tombstone or empty keys of a DenseMap<int64_t, T>. This patch fixes the issue (and adds a tests case). llvm-svn: 221214	2014-11-04 00:59:21 +00:00
Akira Hatanaka	e7d08f497e	[ARM, inline-asm] Fix ARMTargetLowering::getRegForInlineAsmConstraint to return register class tGPRRegClass if the target is thumb1. This commit fixes a crash that occurs during register allocation which was triggered when a virtual register defined by an inline-asm instruction had to be spilled. rdar://problem/18740489 llvm-svn: 221178	2014-11-03 20:37:04 +00:00
Ahmed Bougacha	f2a8134721	[X86] 8bit divrem: Improve codegen for AH register extraction. For 8-bit divrems where the remainder is used, we used to generate: divb %sil shrw $8, %ax movzbl %al, %eax That was to avoid an H-reg access, which is problematic mainly because it isn't possible in REX-prefixed instructions. This patch optimizes that to: divb %sil movzbl %ah, %eax To do that, we explicitly extend AH, and extract the L-subreg in the resulting register. The extension is done using the NOREX variants of MOVZX. To support signed operations, MOVSX_NOREX is also added. Further, this introduces a new SDNode type, [us]divrem_ext_hreg, which is then lowered to a sequence containing a single zext (rather than 2). Differential Revision: http://reviews.llvm.org/D6064 llvm-svn: 221176	2014-11-03 20:26:35 +00:00
Tom Stellard	5b15714b76	Reapply: R600: Make sure to inline all internal functions Function calls aren't supported yet. This was reverted due to build breakages, which should be fixed now. llvm-svn: 221173	2014-11-03 19:49:05 +00:00
Paul Robinson	2b27bd26b6	Normally an 'optnone' function goes through fast-isel, which does not call DAGCombiner. But we ran into a case (on Windows) where the calling convention causes argument lowering to bail out of fast-isel, and we end up in CodeGenAndEmitDAG() which does run DAGCombiner. So, we need to make DAGCombiner check for 'optnone' after all. Commit includes the test that found this, plus another one that got missed in the original optnone work. llvm-svn: 221168	2014-11-03 18:19:26 +00:00
Charlie Turner	05675977c5	Remove the cortex-a9-mp CPU. This CPU definition is redundant. The Cortex-A9 is defined as supporting multiprocessing extensions. Remove its definition and update appropriate tests. LLVM defines both a cortex-a9 CPU and a cortex-a9-mp CPU. The only difference between the two CPU definitions in ARM.td is that cortex-a9-mp contains the feature FeatureMP for multiprocessing extensions. This is redundant since the Cortex-A9 is defined as having multiprocessing extensions in the TRMs. armcc also defines the Cortex-A9 as having multiprocessing extensions by default. Change-Id: Ifcadaa6c322be0a33d9d2a39cfdd7da1d75981a7 llvm-svn: 221166	2014-11-03 17:38:00 +00:00
Oliver Stannard	2c73f413f7	[AArch64] Fix miscompile of comparison with 0xffffffffffffffff Some literals in the AArch64 backend had 15 'f's rather than 16, causing comparisons with a constant 0xffffffffffffffff to be miscompiled. llvm-svn: 221157	2014-11-03 15:28:40 +00:00
Sid Manning	f60eba6543	Handle ctor/init_array initialization. Hexagon was not calling InitializeELF and could not select between ctors and init_array. Phabricator revision: http://reviews.llvm.org/D6061 llvm-svn: 221156	2014-11-03 14:56:05 +00:00
Adrian Prantl	401e61e9a6	Revert "Temporarily revert r220777 to sort out build bot breakage." This reverts commit r221028. Later commits depend on this and reverting just this one causes even more bots to fail. llvm-svn: 221041	2014-11-01 03:19:45 +00:00
Adrian Prantl	379cee6cd0	Temporarily revert r220777 to sort out build bot breakage. "[x86] Simplify vector selection if condition value type matches vselect value type and true value is all ones or false value is all zeros." llvm-svn: 221028	2014-11-01 00:26:59 +00:00
Reid Kleckner	660a86c442	Revert "R600: Make sure to inline all internal functions" This reverts commit r220996. It introduced layering violations causing link errors in many configurations. llvm-svn: 221020	2014-10-31 23:35:26 +00:00
Tom Stellard	044bedf9b1	R600: Don't promote allocas when one of the users is a ptrtoint instruction We need to figure out how to track ptrtoint values all the way until result is converted back to a pointer in order to correctly rewrite the pointer type. llvm-svn: 220997	2014-10-31 20:52:04 +00:00
Tom Stellard	8a8077171a	R600: Make sure to inline all internal functions Function calls aren't supported yet. llvm-svn: 220996	2014-10-31 20:52:02 +00:00
Bill Schmidt	5c5103e17e	[PowerPC] Initial VSX intrinsic support, with min/max for vector double Now that we have initial support for VSX, we can begin adding intrinsics for programmer access to VSX instructions. This patch adds basic support for VSX intrinsics in general, and tests it by implementing intrinsics for minimum and maximum for the vector double data type. The LLVM portion of this is quite straightforward. There is a companion patch for Clang. llvm-svn: 220988	2014-10-31 19:19:07 +00:00
Quentin Colombet	06167df4ad	[CodeGenPrepare] Move extractelement close to store if they can be combined. This patch adds an optimization in CodeGenPrepare to move an extractelement right before a store when the target can combine them. The optimization may promote any scalar operations to vector operations in the way to make that possible. Context Some targets use different register files for both vector and scalar operations. This means that transitioning from one domain to another may incur copy from one register file to another. These copies are not coalescable and may be expensive. For example, according to the scheduling model, on cortex-A8 a vector to GPR move is 20 cycles. Motivating Example Let us consider an example: define void @foo(<2 x i32>* %addr1, i32* %dest) { %in1 = load <2 x i32>* %addr1, align 8 %extract = extractelement <2 x i32> %in1, i32 1 %out = or i32 %extract, 1 store i32 %out, i32* %dest, align 4 ret void } As it is, this IR generates the following assembly on armv7: vldr d16, [r0] @vector load vmov.32 r0, d16[1] @ cross-register-file copy: 20 cycles orr r0, r0, #1 @ scalar bitwise or str r0, [r1] @ scalar store bx lr Whereas we could generate much faster code: vldr d16, [r0] @ vector load vorr.i32 d16, #0x1 @ vector bitwise or vst1.32 {d16[1]}, [r1:32] @ vector extract + store bx lr Half of the computation made in the vector is useless, but this allows to get rid of the expensive cross-register-file copy. Proposed Solution To avoid this cross-register-copy penalty, we promote the scalar operations to vector operations. The penalty will be removed if we manage to promote the whole chain of computation in the vector domain. Currently, we do that only when the chain of computation ends by a store and the target is able to combine an extract with a store. Stores are the most likely candidates, because other instructions produce values that would need to be promoted and so, extracted as some point[1]. Moreover, this is customary that targets feature stores that perform a vector extract (see AArch64 and X86 for instance). The proposed implementation relies on the TargetTransformInfo to decide whether or not it is beneficial to promote a chain of computation in the vector domain. Unfortunately, this interface is rather inaccurate for this level of details and although this optimization may be beneficial for X86 and AArch64, the inaccuracy will lead to the optimization being too aggressive. Basically in TargetTransformInfo, everything that is legal has a cost of 1, whereas, even if a vector type is legal, usually a vector operation is slightly more expensive than its scalar counterpart. That will lead to too many promotions that may not be counter balanced by the saving of the cross-register-file copy. For instance, on AArch64 this penalty is just 4 cycles. For now, the optimization is just enabled for ARM prior than v8, since those processors have a larger penalty on cross-register-file copies, and the scope is limited to basic blocks. Because of these two factors, we limit the effects of the inaccuracy. Indeed, I did not want to build up a fancy cost model with block frequency and everything on top of that. [1] We can imagine targets that can combine an extractelement with other instructions than just stores. If we want to go into that direction, the current interfaces must be augmented and, moreover, I think this becomes a global isel problem. Differential Revision: http://reviews.llvm.org/D5921 <rdar://problem/14170854> llvm-svn: 220978	2014-10-31 17:52:53 +00:00
Chad Rosier	9bc5bd5f9a	[AArch64] CondOpt pass is missing FCMP instructions when searching backward for a CMP which defines the flags used by B.CC. http://reviews.llvm.org/D6047 Patch by Zhaoshi Zheng <zhaoshiz@codeaurora.org>! llvm-svn: 220961	2014-10-31 15:17:36 +00:00
Ulrich Weigand	5b86d1f937	[PowerPC] Load BlockAddress values from the TOC in 64-bit SVR4 code Since block address values can be larger than 2GB in 64-bit code, they cannot be loaded simply using an @l / @ha pair, but instead must be loaded from the TOC, just like GlobalAddress, ConstantPool, and JumpTable values are. The commit also fixes a bug in PPCLinuxAsmPrinter::doFinalization where temporary labels could not be used as TOC values, since code would attempt (and fail) to use GetOrCreateSymbol to create a symbol of the same name as the temporary label. llvm-svn: 220959	2014-10-31 10:33:14 +00:00
Hao Liu	6cc87eb119	PR20557: Fix the bug that bogus cpu parameter crashes llc on AArch64 backend. Initial patch by Oleg Ranevskyy. llvm-svn: 220945	2014-10-31 02:35:34 +00:00
Ahmed Bougacha	38c0bf429c	[SelectionDAG] When scalarizing trunc, don't assert for legal operands. r212242 introduced a legalizer hook, originally to let AArch64 widen v1i{32,16,8} rather than scalarize, because the legalizer expected, when scalarizing the result of a conversion operation, to already have scalarized the operands. On AArch64, v1i64 is legal, so that commit ensured operations such as v1i32 = trunc v1i64 wouldn't assert. It did that by choosing to widen v1 types whenever possible. However, v1i1 types, for which there's no legal widened type, would still trigger the assert. This commit fixes that, by only scalarizing a trunc's result when the operand has already been scalarized, and introducing an extract_elt otherwise. This is similar to r205625. Fixes PR20777. llvm-svn: 220937	2014-10-30 23:46:50 +00:00
Louis Gerbarg	6f92b8978d	Fix incorrect invariant check in DAG Combine Earlier this summer I fixed an issue where we were incorrectly combining multiple loads that had different constraints such alignment, invariance, temporality, etc. Apparently in one case I made copt paste error and swapped alignment and invariance. Tests included. rdar://18816719 llvm-svn: 220933	2014-10-30 22:21:03 +00:00
Tim Northover	d587e3a983	ARM: test default values for TAG_CPU_unaligned_access attribute. It should be on for every target that supports unaligned accesses (e.g. not v6m). Patch by Charlie Turner. llvm-svn: 220912	2014-10-30 17:05:44 +00:00
Robert Khasanov	fa811056f3	[x86] Simplify vector selection if condition value type matches vselect value type and true value is all ones or false value is all zeros. This transformation worked if selector is produced by SETCC, however SETCC is needed only if we consider to swap operands. So I replaced SETCC check for this case. Added tests for vselect of <X x i1> values. llvm-svn: 220777	2014-10-28 15:59:40 +00:00
Robert Khasanov	a87de33c61	[AVX512] Bring back vector-shuffle lowering support through broadcasts Ffter commit at rev219046 512-bit broadcasts lowering become non-optimal. Most of tests on broadcasting and embedded broadcasting were changed and they doesn’t produce efficient code. Example below is from commit changes (it’s the first test from test/CodeGen/X86/avx512-vbroadcast.ll): define <16 x i32> @_inreg16xi32(i32 %a) { ; CHECK-LABEL: _inreg16xi32: ; CHECK: ## BB#0: -; CHECK-NEXT: vpbroadcastd %edi, %zmm0 +; CHECK-NEXT: vmovd %edi, %xmm0 +; CHECK-NEXT: vpbroadcastd %xmm0, %ymm0 +; CHECK-NEXT: vinserti64x4 $1, %ymm0, %zmm0, %zmm0 ; CHECK-NEXT: retq %b = insertelement <16 x i32> undef, i32 %a, i32 0 %c = shufflevector <16 x i32> %b, <16 x i32> undef, <16 x i32> zeroinitializer ret <16 x i32> %c } Here, 256-bit broadcast was generated instead of 512-bit one. In this patch 1) I added vector-shuffle lowering through broadcasts 2) Removed asserts and branches likes because this is incorrect - assert(Subtarget->hasDQI() && "We can only lower v8i64 with AVX-512-DQI"); 3) Fixed lowering tests llvm-svn: 220774	2014-10-28 12:28:51 +00:00
Reid Kleckner	af3046bd9e	X86: Implement the vectorcall calling convention This is a Microsoft calling convention that supports both x86 and x86_64 subtargets. It passes vector and floating point arguments in XMM0-XMM5, and passes them indirectly once they are consumed. Homogenous vector aggregates of up to four elements can be passed in sequential vector registers, but this part is not implemented in LLVM and will be handled in Clang. On 32-bit x86, it is similar to fastcall in that it uses ecx:edx as integer register parameters and is callee cleanup. On x86_64, it delegates to the normal win64 calling convention. Reviewers: majnemer Differential Revision: http://reviews.llvm.org/D5943 llvm-svn: 220745	2014-10-28 01:29:26 +00:00
Tim Northover	136fb499eb	AArch64: enable Cortex-A57 FP balancing on Cortex-A53. Benchmarks have shown that it's harmless to the performance there, and having a unified set of passes between the two cores where possible helps big.LITTLE deployment. Patch by Z. Zheng. llvm-svn: 220744	2014-10-28 01:24:32 +00:00
Pete Cooper	01b2132972	Fix a stackmap bug introduced in r220710. For a call to not return in to the stackmap shadow, the shadow must end with the call. To do this, we must insert any required nops before the call, and not after it. llvm-svn: 220728	2014-10-27 22:38:45 +00:00
Juergen Ributzka	b3117b0b86	[FastISel][AArch64] Emit immediate version of icmp (subs) for null pointer check. This is a minor change to use the immediate version when the operand is a null value. This should get rid of an unnecessary 'mov' instruction in debug builds and align the code more with the one generated by SelectionDAG. This fixes rdar://problem/18785125. llvm-svn: 220713	2014-10-27 19:58:36 +00:00
Juergen Ributzka	76c57b0570	[FastISel][AArch64] Optimize compare-and-branch for i1 to use 'tbz'. Minor enhancement to use 'tbz' for i1 compare-and-branch to get rid of an 'and' instruction. This fixes rdar://problem/18784953. llvm-svn: 220712	2014-10-27 19:46:23 +00:00
Pete Cooper	87efb91a50	Stackmap shadows should consider call returns a branch target. To avoid emitting too many nops, a stackmap shadow can include emitted instructions in the shadow, but these must not include branch targets. A return from a call should count as a branch target as patching over the instructions after the call would lead to incorrect behaviour for threads currently making that call, when they return. llvm-svn: 220710	2014-10-27 19:40:35 +00:00
Juergen Ributzka	3423c5ccda	[FastISel][AArch64] Use 'cbz' also for null values (pointers). The pattern matching for a 'ConstantInt' value was too restrictive. Checking for a 'Constant' with a bull value is sufficient for using an 'cbz/cbnz' instruction. This fixes rdar://problem/18784732. llvm-svn: 220709	2014-10-27 19:38:05 +00:00
Juergen Ributzka	64c2c99226	[FastISel][AArch64] Don't fold the 'and' instruction into the 'tbz/tbnz' instruction if it is in a different basic block. This fixes a bug where the input register was not defined for the 'tbz/tbnz' instruction. This happened, because we folded the 'and' instruction from a different basic block. This fixes rdar://problem/18784013. llvm-svn: 220704	2014-10-27 19:16:48 +00:00
Juergen Ributzka	57783726dd	[FastISel][AArch64] Fix load/store with frame indices. At higher optimization levels the LLVM IR may contain more complex patterns for loads/stores from/to frame indices. The 'computeAddress' function wasn't able to handle this and triggered an assertion. This fix extends the possible addressing modes for frame indices. This fixes rdar://problem/18783298. llvm-svn: 220700	2014-10-27 18:21:58 +00:00
Oliver Stannard	9a595b2769	[ARM] Select VMAXNM and VMINNM regardless of operand order Currently, the ARM backend will select the VMAXNM and VMINNM for these C expressions: (a < b) ? a : b (a > b) ? a : b but not these expressions: (a > b) ? b : a (a < b) ? b : a This patch allows all of these expressions to be matched. llvm-svn: 220671	2014-10-27 09:23:02 +00:00
Jingyue Wu	376aaf44c4	[NVPTX] aligned byte-buffers for vector return types Summary: Fixes PR21100 which is caused by inconsistency between the declared return type and the expected return type at the call site. The new behavior is consistent with nvcc and the NVPTXTargetLowering::getPrototype function. Test Plan: test/Codegen/NVPTX/vector-return.ll Reviewers: jholewinski Reviewed By: jholewinski Subscribers: llvm-commits, meheff, eliben, jholewinski Differential Revision: http://reviews.llvm.org/D5612 llvm-svn: 220607	2014-10-25 03:46:16 +00:00
Simon Pilgrim	29b3d266a4	[X86][SSE] Bitcast assertion in XFormVExtractWithShuffleIntoLoad Minor patch to fix an issue in XFormVExtractWithShuffleIntoLoad where a load is unary shuffled, then bitcast (to a type with the same number of elements) before extracting an element. An undef was created for the second shuffle operand using the original (post-bitcasted) vector type instead of the pre-bitcasted type like the rest of the shuffle node - this was then causing an assertion on the different types later on inside SelectionDAG::getVectorShuffle. Differential Revision: http://reviews.llvm.org/D5917 llvm-svn: 220592	2014-10-24 21:04:41 +00:00
Sanjay Patel	4e42d925c1	Allow AVX vrsqrtps generation. This is a follow-on to r220570 that allows a 256-bit (v8f32) version of vrsqrtps to be generated. llvm-svn: 220579	2014-10-24 17:59:18 +00:00
Sanjay Patel	d9b7837012	Use rsqrt (X86) to speed up reciprocal square root calcs This is a first step for generating SSE rsqrt instructions for reciprocal square root calcs when fast-math is allowed. For now, be conservative and only enable this for AMD btver2 where performance improves significantly - for example, 29% on llvm/projects/test-suite/SingleSource/Benchmarks/BenchmarkGame/n-body.c (if we convert the data type to single-precision float). This patch adds a two constant version of the Newton-Raphson refinement algorithm to DAGCombiner that can be selected by any target via a parameter returned by getRsqrtEstimate().. See PR20900 for more details: http://llvm.org/bugs/show_bug.cgi?id=20900 Differential Revision: http://reviews.llvm.org/D5658 llvm-svn: 220570	2014-10-24 17:02:16 +00:00
Daniel Sanders	d767694dc4	[mips] For N32/N64, structs must be passed in the upper bits of a register. Summary: Most structs were fixed by r218451 but those of between >32-bits and <64-bits remained broken since they were not marked with [ASZ]ExtUpper. This patch fixes the remaining cases by using CCPromoteToUpperBitsInType<i64> on i64's in addition to i32 and smaller. Reviewers: vmedic Reviewed By: vmedic Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D5963 llvm-svn: 220556	2014-10-24 13:09:19 +00:00
Oliver Stannard	4116b199b2	[AArch64] Fix fast-isel of cbz of i1, i8, i16 This fixes a miscompilation in the AArch64 fast-isel which was triggered when a branch is based on an icmp with condition eq or ne, and type i1, i8 or i16. The cbz instruction compares the whole 32-bit register, so values with the bottom 1, 8 or 16 bits clear would cause the wrong branch to be taken. llvm-svn: 220553	2014-10-24 09:54:41 +00:00
Ahmed Bougacha	f062592e07	Make test for r220533 more robust by using GPR pattern. llvm-svn: 220541	2014-10-24 00:03:46 +00:00
Ahmed Bougacha	a035d19b50	[SelectionDAG] Teach the vector scalarizer about FP conversions. This adds support for legalization of instructions of the form: [fp_conv] <1 x i1> %op to <1 x double> where fp_conv is one of fpto[us]i, [us]itofp. This used to assert because they were simply missing from the vector operand scalarizer. A similar problem arose in r190830, with trunc instead. Fixes PR20778. Differential Revision: http://reviews.llvm.org/D5810 llvm-svn: 220533	2014-10-23 22:49:25 +00:00
Tim Northover	d882ba8bc5	ScheduleDAG: record PhysReg dependencies represented by CopyFromReg nodes x86's CMPXCHG -> EFLAGS consumer wasn't being recorded as a real EFLAGS dependency because it was represented by a pair of CopyFromReg(EFLAGS) -> CopyToReg(EFLAGS) nodes. ScheduleDAG was expecting the source to be an implicit-def on the instruction, where the result numbers in the DAG and the Uses list in TableGen matched up precisely. The Copy notation seems much more robust, so this patch extends ScheduleDAG rather than refactoring x86. Should fix PR20376. llvm-svn: 220529	2014-10-23 22:31:48 +00:00
Ahmed Bougacha	af95bf4558	[X86] Improve mul w/ overflow codegen, to MUL8+SETO. Currently, @llvm.smul.with.overflow.i8 expands to 9 instructions, where 3 are really needed. This adds X86ISD::UMUL8/SMUL8 SD nodes, and custom lowers them to MUL8/IMUL8 + SETO. i8 is a special case because there is no two/three operand variants of (I)MUL8, so the first operand and return value need to go in AL/AX. Also, we can't write patterns for these instructions: TableGen refuses patterns where output operands don't match SDNode results. In this case, instructions where the output operand is an implicitly defined register. A related special case (and FIXME) exists for MUL8 (X86InstrArith.td): // FIXME: Used for 8-bit mul, ignore result upper 8 bits. // This probably ought to be moved to a def : Pat<> if the // syntax can be accepted. [(set AL, (mul AL, GR8:$src)), (implicit EFLAGS)] Ideally, these go away with UMUL8, but we still need to improve TableGen support of implicit operands in patterns. Before this change: movsbl %sil, %eax movsbl %dil, %ecx imull %eax, %ecx movb %cl, %al sarb $7, %al movzbl %al, %eax movzbl %ch, %esi cmpl %eax, %esi setne %al After: movb %dil, %al imulb %sil seto %al Also, remove a made-redundant testcase for PR19858, and enable more FastISel ALU-overflow tests for SelectionDAG too. Differential Revision: http://reviews.llvm.org/D5809 llvm-svn: 220516	2014-10-23 21:55:31 +00:00
Reid Kleckner	59aa68169a	Revert "Don't count inreg params when mangling fastcall functions" This reverts commit r214981. I'm not sure what I was thinking when I wrote this. Testing with MSVC shows that this function is mangled to '@f@8': int __fastcall f(int a, int b); llvm-svn: 220492	2014-10-23 17:50:42 +00:00

1 2 3 4 5 ...

11148 Commits