llvm-mirror

mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-10-25 22:12:57 +02:00

Author	SHA1	Message	Date
Igor Breger	4a5ac08a8e	AVX512: Change store size of kmask. Store size of v8i1, v4i1 , v2i1 and i1 are changed to 16 bits. If KMOVB not supported (require AVX512DQ) only KMOVW can be used so store size should be 2 bytes. Differential Revision: http://reviews.llvm.org/D17138 llvm-svn: 260878	2016-02-15 08:25:28 +00:00
Simon Pilgrim	4d4989655e	[X86][SSE1] Add MOVLHPS/MOVHLPS lowering and memory folding support As discussed on PR26491, this patch adds support for lowering v4f32 shuffles to the MOVLHPS/MOVHLPS instructions. It also adds support for memory folding with their MOVLPS/MOVHPS load equivalents. This first patch only really helps SSE1 targets as SSE2+ targets will widen the shuffle mask and use v2f64 equivalents (although they still combine to MOVLHPS/MOVHLPS for v2f64 splats). This will have to be addressed in a future patch, most likely when we add support for binary target shuffle combines. Differential Revision: http://reviews.llvm.org/D16956 llvm-svn: 260168	2016-02-08 23:03:46 +00:00
Sanjoy Das	6b9e756a71	[X86] Fix a bug in getMemOpBaseRegImmOfs Fix a crash in `getMemOpBaseRegImmOfs` that happens if the base of `MemOp` is a frame index memory operand. The fix is to have `getMemOpBaseRegImmOfs` bail out in such cases. We can possibly be more clever here, if needed. llvm-svn: 259456	2016-02-02 02:32:43 +00:00
Benjamin Kramer	f14c1e99a1	Revert "Allow X86::COND_NE_OR_P and X86::COND_NP_OR_E to be reversed." and "Add a missing test case for r258847." This reverts commit r258847, r258848. Causes miscompilations and backend errors. llvm-svn: 258927	2016-01-27 12:44:12 +00:00
Cong Hou	20b64d0452	Allow X86::COND_NE_OR_P and X86::COND_NP_OR_E to be reversed. Currently, AnalyzeBranch() fails non-equality comparison between floating points on X86 (see https://llvm.org/bugs/show_bug.cgi?id=23875). This is because this function can modify the branch by reversing the conditional jump and removing unconditional jump if there is a proper fall-through. However, in the case of non-equality comparison between floating points, this can turn the branch "unanalyzable". Consider the following case: jne.BB1 jp.BB1 jmp.BB2 .BB1: ... .BB2: ... AnalyzeBranch() will reverse "jp .BB1" to "jnp .BB2" and then "jmp .BB2" will be removed: jne.BB1 jnp.BB2 .BB1: ... .BB2: ... However, AnalyzeBranch() cannot analyze this branch anymore as there are two conditional jumps with different targets. This may disable some optimizations like block-placement: in this case the fall-through behavior is enforced even if the fall-through block is very cold, which is suboptimal. Actually this optimization is also done in block-placement pass, which means we can remove this optimization from AnalyzeBranch(). However, currently X86::COND_NE_OR_P and X86::COND_NP_OR_E are not reversible: there is no defined negation conditions for them. In order to reverse them, this patch defines two new CondCode X86::COND_E_AND_NP and X86::COND_P_AND_NE. It also defines how to synthesize instructions for them. Here only the second conditional jump is reversed. This is valid as we only need them to do this "unconditional jump removal" optimization. Differential Revision: http://reviews.llvm.org/D11393 llvm-svn: 258847	2016-01-26 20:08:01 +00:00
Simon Pilgrim	a63717e4c0	[X86][AVX] Add commutation support for VPERM2X128 instructions Its main use is to allow memory folding of the 1st operand Differential Revision: http://reviews.llvm.org/D16521 llvm-svn: 258726	2016-01-25 21:51:34 +00:00
Craig Topper	39cda94562	[X86] Make MOV32ri64 a post-RA pseudo instead of a CodeGenOnly instruction. It was only needed for rematerialization. llvm-svn: 256818	2016-01-05 07:44:14 +00:00
David Majnemer	c60cb66b34	Revert "[X86] Use push-pop for materializing small constants under 'minsize'" The red zone consists of 128 bytes beyond the stack pointer so that the allocation of objects in leaf functions doesn't require decrementing rsp. In r255656, we introduced an optimization that would cheaply materialize certain constants via push/pop. Push decrements the stack pointer and stores it's result at what is now the top of the stack. However, this means that using push/pop would encroach on the red zone. PR26023 gives an example where this corrupts an object in the red zone. llvm-svn: 256808	2016-01-05 02:32:06 +00:00
Matthias Braun	c040998a61	MachineInstrBundle: Fix reversed isSuperRegisterEq() call Unfortunately this fix had the effect of exposing the -verify-machineinstrs FIXME of X86InstrInfo.cpp in two testcases for which I disabled it for now. Two testcases also have additional pushq/popq where the corrected code cannot prove that %rax is dead any longer. Looking at the examples, this could potentially be fixed by improving computeRegisterLiveness() to check the live-in lists of the successors blocks when reaching the end of a block. This fixes http://llvm.org/PR25951. llvm-svn: 256799	2016-01-05 00:45:35 +00:00
David Majnemer	5ccb736e05	[X86] Make hasFP constant time We need a frame pointer if there is a push/pop sequence after the prologue in order to unwind the stack. Scanning the instructions to figure out if this happened made hasFP not constant-time which is a violation of expectations. Let's compute this up-front and reuse that computation when we need it. llvm-svn: 256730	2016-01-04 04:49:41 +00:00
Sanjay Patel	32b3efed19	use range-based for-loops; NFCI llvm-svn: 256573	2015-12-29 19:14:23 +00:00
Sanjay Patel	9ffd44cf90	tidy up; NFC llvm-svn: 256506	2015-12-28 18:18:22 +00:00
David Majnemer	38d1ffe261	[X86, Win64] Use a frame pointer if pushf is emitted A frame pointer must be used if stack pointer is modified after the prologue. LLVM will emit pushf/popf if we need to save/restore the FLAGS register, requiring us to have a frame pointer for the function. There is a small twist: this sequence might exist in user code via inline-assembly. For now, conservatively assume that such functions require a frame pointer. For real world justification, please see clang's implementation of __readeflags. This fixes PR25945. llvm-svn: 256456	2015-12-27 06:07:26 +00:00
Craig Topper	fe5f33f108	[X86] Replace MVT::SimpleValueType in the AsmParser library and getX86SubSuperRegister with just an unsigned representing size. This a is step towards fixing a layering violation so the X86 AsmParser won't depending on CodeGen types. llvm-svn: 256425	2015-12-25 22:09:45 +00:00
Elena Demikhovsky	423b83228e	AVX-512: Kreg set 0/1 optimization The patterns that set a mask register to 0/1 KXOR %kn, %kn, %kn / KXNOR %kn, %kn, %kn are replaced with KXOR %k0, %k0, %kn / KXNOR %k0, %k0, %kn - AVX-512 targets optimization. KNL does not recognize dependency-breaking idioms for mask registers, so kxnor %k1, %k1, %k2 has a RAW dependence on %k1. Using %k0 as the undef input register is a performance heuristic based on the assumption that %k0 is used less frequently than the other mask registers, since it is not usable as a write mask. Differential Revision: http://reviews.llvm.org/D15739 llvm-svn: 256365	2015-12-24 08:12:22 +00:00
Craig Topper	b8df4c67a1	[X86] Use range-based for loop. NFC llvm-svn: 256127	2015-12-20 18:41:57 +00:00
Hans Wennborg	6b696434e4	[X86] Use push-pop for materializing small constants under 'minsize' Use the 3-byte (4 with REX prefix) push-pop sequence for materializing small constants. This is smaller than using a mov (5, 6 or 7 bytes depending on size and REX prefix), but it's likely to be slower, so only used for 'minsize'. This is a follow-up to r255656. Differential Revision: http://reviews.llvm.org/D15549 llvm-svn: 255936	2015-12-17 23:18:39 +00:00
Hans Wennborg	c663be5f98	Fix "Not having LAHF/SAHF" assert. It wants to assert that the subtarget is 64-bit, not the register. llvm-svn: 255703	2015-12-15 23:21:46 +00:00
Hans Wennborg	725f905ec1	[X86] Smaller code for materializing 32-bit 1 and -1 constants "movl $-1, %eax" is 5 bytes, "xorl %eax, %eax; decl %eax" is 3 bytes. This commit makes LLVM use the latter when optimizing for size. Differential Revision: http://reviews.llvm.org/D14971 llvm-svn: 255656	2015-12-15 17:10:28 +00:00
Matthias Braun	b8675cada7	CodeGen: Redo analyzePhysRegs() and computeRegisterLiveness() computeRegisterLiveness() was broken in that it reported dead for a register even if a subregister was alive. I assume this was because the results of analayzePhysRegs() are hard to understand with respect to subregisters. This commit: Changes the results of analyzePhysRegs (=struct PhysRegInfo) to be clearly understandable, also renames the fields to avoid silent breakage of third-party code (and improve the grammar). Fix all (two) users of computeRegisterLiveness() in llvm: By reenabling it and removing workarounds for the bug. This fixes http://llvm.org/PR24535 and http://llvm.org/PR25033 Differential Revision: http://reviews.llvm.org/D15320 llvm-svn: 255362	2015-12-11 19:42:09 +00:00
Simon Pilgrim	a521eb07b2	[X86][ADX] Added memory folding patterns and stack folding tests llvm-svn: 254844	2015-12-05 07:27:50 +00:00
Hans Wennborg	b8bd41122a	X86: Don't emit SAHF/LAHF for 64-bit targets unless explicitly supported These instructions are not supported by all CPUs in 64-bit mode. Emitting them causes Chromium to crash on start-up for users with such chips. (GCC puts these instructions behind -msahf on 64-bit for the same reason.) This patch adds FeatureLAHFSAHF, enables it by default for 32-bit targets and modern CPUs, and changes X86InstrInfo::copyPhysReg back to the lowering from before r244503 when the instructions are not available. Differential Revision: http://reviews.llvm.org/D15240 llvm-svn: 254793	2015-12-04 23:00:33 +00:00
JF Bastien	7719a9592b	X86InstrInfo::copyPhysReg: workaround reg liveness Summary: computeRegisterLiveness and analyzePhysReg are currently getting confused about liveness in some cases, breaking copyPhysReg's calculation of whether AX is dead in some cases. Work around this issue temporarily by assuming that AX is always live. See detail in: https://llvm.org/bugs/show_bug.cgi?id=25033#c7 And associated bugs PR24535 PR25033 PR24991 PR24992 PR25201. This workaround makes the code correct but slightly inefficient, but it seems to confuse the machine instr verifier which now things EAX was undefined in some cases where it's being conservatively saved / restored. Reviewers: majnemer, sanjoy Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D15198 llvm-svn: 254680	2015-12-04 01:18:17 +00:00
Craig Topper	22a0f3d3ea	[X86] Use range-based for loops. NFC llvm-svn: 254387	2015-12-01 06:13:15 +00:00
Craig Topper	eaa0236f57	[X86] Use array_lengthof instead of calculating manually. Also change index types to size_t to match. llvm-svn: 254386	2015-12-01 06:13:13 +00:00
Craig Topper	c1aa433aed	Revert r254279 "[X86] Use ArrayRef. NFC". It seems to have upset an MSVC build bot. llvm-svn: 254280	2015-11-30 02:28:19 +00:00
Craig Topper	c6a64c52f0	[X86] Use ArrayRef. NFC llvm-svn: 254279	2015-11-30 02:08:05 +00:00
Vyacheslav Klochkov	fdc2e9e5ae	X86-FMA3: Improved/enabled the memory folding optimization for scalar loads generated for _mm_losd_s{s,d}() intrinsics and used in scalar FMAs generated for FMA intrinsics _mm_f{madd,msub,nmadd,nmsub}_s{s,d}(). Reviewer: David Kreitzer Differential Revision: http://reviews.llvm.org/D14762 llvm-svn: 254140	2015-11-26 07:45:30 +00:00
Sanjay Patel	5c5b0311b8	[x86] remove duplicate movq instruction defs (PR25554) We had duplicated definitions for the same hardware '[v]movq' instructions. For example with SSE: def MOVZQI2PQIrr : RS2I<0x6E, MRMSrcReg, (outs VR128:$dst), (ins GR64:$src), "mov{d\|q}\t{$src, $dst\|$dst, $src}", // X86-64 only [(set VR128:$dst, (v2i64 (X86vzmovl (v2i64 (scalar_to_vector GR64:$src)))))], IIC_SSE_MOVDQ>; def MOV64toPQIrr : RS2I<0x6E, MRMSrcReg, (outs VR128:$dst), (ins GR64:$src), "mov{d\|q}\t{$src, $dst\|$dst, $src}", [(set VR128:$dst, (v2i64 (scalar_to_vector GR64:$src)))], IIC_SSE_MOVDQ>, Sched<[WriteMove]>; As shown in the test case and PR25554: https://llvm.org/bugs/show_bug.cgi?id=25554 This causes us to miss reusing an operand because later passes don't know these 'movq' are the same instruction. This patch deletes one pair of these defs. Sadly, this won't fix the original test case in the bug report. Something else is still broken. Differential Revision: http://reviews.llvm.org/D14941 llvm-svn: 253988	2015-11-24 15:44:35 +00:00
Simon Pilgrim	ea0cfdd11b	[X86] Use existing MachineInstrBuilder::addDisp to create offseted pointer. NFC. Minor code duplication tidyup to D13988 llvm-svn: 253606	2015-11-19 21:50:57 +00:00
Elena Demikhovsky	6aa44f30d0	AVX-512: Fixed COPY_TO_REGCLASS for mask registers Copying one mask register to another under BW should be done with kmovq instruction, otherwise we can loose some bits. Copying 8 bits under DQ may be done with kmovb. Differential Revision: http://reviews.llvm.org/D14812 llvm-svn: 253563	2015-11-19 13:13:00 +00:00
Vyacheslav Klochkov	262f8eaf50	X86-FMA3: Implemented commute transformations FMA_Int instructions. It made it possible to apply the memory folding optimization for the 2nd operand of FMA_Int instructions. Reviewer: Quentin Colombet Differential Revision: http://reviews.llvm.org/D14550 llvm-svn: 252973	2015-11-13 00:07:35 +00:00
Vyacheslav Klochkov	ccf2a42805	My first/test commit. Removed a trailing whitespace. llvm-svn: 252940	2015-11-12 20:11:57 +00:00
Andrew Kaylor	3cc19fd26f	Improved the operands commute transformation for X86-FMA3 instructions. All 3 operands of FMA3 instructions are commutable now. Patch by Slava Klochkov Reviewers: Quentin Colombet(qcolombet), Ahmed Bougacha(ab). Differential Revision: http://reviews.llvm.org/D13269 llvm-svn: 252335	2015-11-06 19:47:25 +00:00
Simon Pilgrim	ea116cf21c	Warning fix. llvm-svn: 252078	2015-11-04 21:27:22 +00:00
Simon Pilgrim	d1f5a2789e	[X86][SSE] Add general memory folding for (V)INSERTPS instruction This patch improves the memory folding of the inserted float element for the (V)INSERTPS instruction. The existing implementation occurs in the DAGCombiner and relies on the narrowing of a whole vector load into a scalar load (and then converted into a vector) to (hopefully) allow folding to occur later on. Not only has this proven problematic for debug builds, it also prevents other memory folds (notably stack reloads) from happening. This patch removes the old implementation and moves the folding code to the X86 foldMemoryOperand handler. A new private 'special case' function - foldMemoryOperandCustom - has been added to deal with memory folding of instructions that can't just use the lookup tables - (V)INSERTPS is the first of several that could be done. It also tweaks the memory operand folding code with an additional pointer offset that allows existing memory addresses to be modified, in this case to convert the vector address to the explicit address of the scalar element that will be inserted. Unlike the previous implementation we now set the insertion source index to zero, although this is ignored for the (V)INSERTPSrm version, anything that relied on shuffle decodes (such as unfolding of insertps loads) was incorrectly calculating the source address - I've added a test for this at insertps-unfold-load-bug.ll Differential Revision: http://reviews.llvm.org/D13988 llvm-svn: 252074	2015-11-04 20:48:09 +00:00
Andrew Kaylor	e42c061561	Created new X86 FMA3 opcodes (FMA_Int) that are used now for lowering of scalar FMA intrinsics. Patch by Slava Klochkov The key difference between FMA and FMA_Int opcodes is that FMA_Int opcodes are handled more conservatively. It is illegal to commute the 1st operand of FMA*_Int instructions as the upper bits of scalar FMA intrinsic result must be taken from the 1st operand, but such commute transformation would change those upper bits and invalidate the intrinsic's result. Reviewers: Quentin Colombet, Elena Demikhovsky Differential Revision: http://reviews.llvm.org/D13710 llvm-svn: 252060	2015-11-04 18:10:41 +00:00
Igor Breger	d41ad7cd99	AVX512: Add AVX-512 not materializable instructions. Otherwise value can be reused , despite its value could be changed - produces incorrect assembler. https://llvm.org/bugs/show_bug.cgi?id=25270 Differential Revision: http://reviews.llvm.org/D14057 llvm-svn: 251275	2015-10-26 08:37:12 +00:00
David Majnemer	a0b6521d43	[WinEH] Make FuncletLayout more robust against catchret Catchret transfers control from a catch funclet to an earlier funclet. However, it is not completely clear which funclet the catchret target is part of. Make this clear by stapling the catchret target's funclet membership onto the CATCHRET SDAG node. llvm-svn: 249052	2015-10-01 18:44:59 +00:00
Sanjay Patel	275d6f22b1	[x86] enable machine combiner reassociations for 256-bit vector logical integer insts llvm-svn: 248955	2015-09-30 22:25:55 +00:00
Andrew Kaylor	8d27e2d077	Improved the interface of methods commuting operands, improved X86-FMA3 mem-folding&coalescing. Patch by Slava Klochkov (vyacheslav.n.klochkov@intel.com) Differential Revision: http://reviews.llvm.org/D11370 llvm-svn: 248735	2015-09-28 20:33:22 +00:00
Chad Rosier	432f88b93e	[Machine Combiner] Refactor machine reassociation code to be target-independent. No functional change intended. Patch by Haicheng Wu <haicheng@codeaurora.org>! http://reviews.llvm.org/D12887 PR24522 llvm-svn: 248164	2015-09-21 15:09:11 +00:00
Elena Demikhovsky	e47e32fe51	AVX-512: shufflevector for i1 vectors <2 x i1> .. <64 x i1> AVX-512 does not provide an instruction that shuffles mask register. So I do the following way: mask-2-simd , shuffle simd , simd-2-mask Differential Revision: http://reviews.llvm.org/D12727 llvm-svn: 247876	2015-09-17 06:53:12 +00:00
Sanjay Patel	21adf9726f	[x86] enable machine combiner reassociations for 128-bit vector logical integer insts (2nd try) The changes in: test/CodeGen/X86/machine-cp.ll are just due to scheduling differences after some logic instructions were reassociated. llvm-svn: 247516	2015-09-12 19:47:50 +00:00
Sanjay Patel	9965d5f476	revert r247506; need to verify changes in existing tests llvm-svn: 247507	2015-09-12 15:27:31 +00:00
Sanjay Patel	8f3e510d08	[x86] enable machine combiner reassociations for 128-bit vector logical integer insts llvm-svn: 247506	2015-09-12 14:58:04 +00:00
Sanjay Patel	15727a41c6	[x86] enable machine combiner reassociations for scalar 'xor' insts llvm-svn: 246781	2015-09-03 16:36:16 +00:00
Sanjay Patel	192418c724	rename "slow-unaligned-mem-under-32" to slow-unaligned-mem-16" (NFCI) This is a follow-on suggested by: http://reviews.llvm.org/D12154 ( http://reviews.llvm.org/rL245729 ) http://reviews.llvm.org/D10662 ( http://reviews.llvm.org/rL245075 ) This makes the attribute name match most of the existing lowering logic and regression test expectations. But the current use of this attribute is inconsistent; see the FIXME comment for "allowsMisalignedMemoryAccesses()". That change will result in functional changes and should be coming soon. llvm-svn: 246585	2015-09-01 20:51:51 +00:00
Sanjay Patel	2b8c9b3ed5	[x86] enable machine combiner reassociations for scalar 'or' insts llvm-svn: 246481	2015-08-31 20:27:03 +00:00
Hal Finkel	e59e834a2a	[MIR Serialization] static -> static const in getSerializable*MachineOperandTargetFlags Make the arrays 'static const' instead of just 'static'. Post-commit review comment from Roman Divacky on IRC. NFC. llvm-svn: 246376	2015-08-30 08:07:29 +00:00

1 2 3 4 5 ...

895 Commits