llvm-mirror

mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-10-20 19:42:54 +02:00

Author	SHA1	Message	Date
Craig Topper	61842d2894	[X86] Use Ld scheduler classes for instructions with folded loads. llvm-svn: 320459	2017-12-12 07:06:35 +00:00
Simon Pilgrim	ed2f1de918	[X86][FMA] Tag all FMA/FMA4 instructions with WriteFMA schedule class As mentioned on PR17367, many instructions are missing scheduling tags preventing us from setting 'CompleteModel = 1' for better instruction analysis. This patch deals with FMA/FMA4 which is one of the bigger offenders (along with AVX512 in general). Annoyingly all scheduler models need to define WriteFMA (now that its actually used), even for older targets without FMA/FMA4 support, but that is an existing problem shared by other schedule classes. Differential Revision: https://reviews.llvm.org/D40351 llvm-svn: 319016	2017-11-27 10:41:32 +00:00
Craig Topper	a9839b29b1	[X86] Add separate intrinsics for scalar FMA4 instructions. Summary: These instructions zero the non-scalar part of the lower 128-bits which makes them different than the FMA3 instructions which pass through the non-scalar part of the lower 128-bits. I've only added fmadd because we should be able to derive all other variants using operand negation in the intrinsic header like we do for AVX512. I think there are still some missed negate folding opportunities with the FMA4 instructions in light of this behavior difference that I hadn't noticed before. I've split the tests so that we can use different intrinsics for scalar testing between the two. I just copied the tests split the RUN lines and changed out the scalar intrinsics. fma4-fneg-combine.ll is a new test to make sure we negate the fma4 intrinsics correctly though there are a couple TODOs in it. Reviewers: RKSimon, spatel Reviewed By: RKSimon Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D39851 llvm-svn: 318984	2017-11-25 18:32:43 +00:00
Simon Pilgrim	72b3efd24b	Fix spelling in comment. NFCI. llvm-svn: 318687	2017-11-20 19:18:33 +00:00
Craig Topper	4f63d7a01e	[X86] Give priority to EVEX FMA instructions over FMA4 instructions. No existing processor has both so it doesn't really matter what we do here. But we were previously just relying on pattern order which gave FMA4 priority. llvm-svn: 317775	2017-11-09 08:26:26 +00:00
Craig Topper	1217e81404	[X86] Use EVEX encoded intrinsics for legacy FMA intrinsics when possible. llvm-svn: 317454	2017-11-06 05:48:26 +00:00
Craig Topper	5dd5a08719	[X86] Add more FMA3 patterns to cover a load in all 3 possible positions. This matches what we already do for AVX512. The peephole pass makes up for this in most if not all cases. But this makes isel behavior for these consistent with every other instruction. llvm-svn: 312613	2017-09-06 03:35:58 +00:00
Craig Topper	4bd1ceb19d	[X86] Mark the FMA nodes as commutable so tablegen will auto generate the patterns. This uses the capability introduced in r312464 to make SDNode patterns commutable on the first two operands. This allows us to remove some of the extra FMA patterns that have to put loads and mask operands in different places to cover all cases. This even includes patterns that were missing to support match a load in the first operand with FMA4. Non-broadcast loads with masking for AVX512. I believe this is causing us to generate some duplicate patterns because tablegen's isomorphism checks don't catch isomorphism between the patterns as written in the td. It only detects isomorphism in the commuted variants it tries to create. The the unmasked 231 and 132 memory forms are isomorphic as written in the td file so we end up keeping both. I think we precommute the 132 pattern to fix this. We also need a follow up patch to go back to the legacy FMA3 instructions and add patterns to the 231 and 132 forms which we currently don't have. llvm-svn: 312469	2017-09-04 06:59:50 +00:00
Craig Topper	439189616a	[X86] Add isel patterns for memory forms of FMA3 intrinsic instructions llvm-svn: 312309	2017-09-01 07:58:13 +00:00
Craig Topper	8d8891785f	[X86] Remove unnecessary COPY_TO_REGCLASS(VR128) from the output patterns for FMA instrinsics. The instructions are already defined as writing a VR128 register. llvm-svn: 312308	2017-09-01 07:58:11 +00:00
Craig Topper	e6fa8f8485	[X86] Remove X86ISD::FMADD in favor ISD::FMA There's no reason to have a target specific node with the same semantics as a target independent opcode. This should simplify D36335 so that it doesn't need to touch X86ISelDAGToDAG.cpp Differential Revision: https://reviews.llvm.org/D36983 llvm-svn: 311568	2017-08-23 16:28:04 +00:00
Ayman Musa	8ecd0ea322	[X86] Adding FoldGenRegForm helper field (for memory folding tables tableGen backend) to X86Inst class and set its value for the relevant instructions. Some register-register instructions can be encoded in 2 different ways, this happens when 2 register operands can be folded (separately). For example if we look at the MOV8rr and MOV8rr_REV, both instructions perform exactly the same operation, but are encoded differently. Here is the relevant information about these instructions from Intel's 64-ia-32-architectures-software-developer-manual: Opcode Instruction Op/En 64-Bit Mode Compat/Leg Mode Description 8A /r MOV r8,r/m8 RM Valid Valid Move r/m8 to r8. 88 /r MOV r/m8,r8 MR Valid Valid Move r8 to r/m8. Here we can see that in order to enable the folding of the output and input registers, we had to define 2 "encodings", and as a result we got 2 move 8-bit register-register instructions. In the X86 backend, we define both of these instructions, usually one has a regular name (MOV8rr) while the other has "_REV" suffix (MOV8rr_REV), must be marked with isCodeGenOnly flag and is not emitted from CodeGen. Automatically generating the memory folding tables relies on matching encodings of instructions, but in these cases where we want to map both memory forms of the mov 8-bit (MOV8rm & MOV8mr) to MOV8rr (not to MOV8rr_REV) we have to somehow point from the MOV8rr_REV to the "regular" appropriate instruction which in this case is MOV8rr. This field enable this "pointing" mechanism - which is used in the TableGen backend for generating memory folding tables. Differential Revision: https://reviews.llvm.org/D32683 llvm-svn: 304087	2017-05-28 12:39:37 +00:00
Craig Topper	63828a2473	[AVX-512] Give priority to EVEX encoded scalar FMA instructions when we have FMA, AVX512 and no VLX. We were giving priority if VLX was enabled. llvm-svn: 298046	2017-03-17 06:10:37 +00:00
Craig Topper	624fc12e44	[X86][FMA4] Remove isCommutable from FMA4 scalar intrinsics. They aren't commutable as operand 0 should pass its upper bits through to the output. llvm-svn: 288011	2016-11-27 21:37:04 +00:00
Craig Topper	bf372031c2	[X86][FMA] Add missing Predicates qualifier around scalar FMA intrinsic patterns. llvm-svn: 288010	2016-11-27 21:37:02 +00:00
Craig Topper	5943c6520e	[X86] Create a new instruction format to handle MemOp4 encoding. This saves one bit in TSFlags and simplifies MRMSrcMem/MRMSrcReg format handling. llvm-svn: 279423	2016-08-22 07:38:45 +00:00
Craig Topper	9332f50e72	[X86] Remove CustomInserter for FMA3 instructions. Looks like since we got full commuting support for FMAs after this was added, the coalescer can now get this right on its own. Differential Revision: https://reviews.llvm.org/D22799 llvm-svn: 276987	2016-07-28 15:28:56 +00:00
Craig Topper	39f037759a	[X86] Make the FMA3 instruction names consistent between VEX and EVEX encoded versions. This places the 132/213/231 form number in front of the SS/SD/PS/PD. Move the Y for 256-bit versions to be after the PS/PD. Change the AVX512 scalar forms to include a Z in the their name. This new format should be consistent with the general naming of instructions. llvm-svn: 276559	2016-07-24 08:26:38 +00:00
Vyacheslav Klochkov	d08c394197	X86-FMA3: Defined the ExeDomain property for Scalar FMA3 opcodes. Reviewer: Simon Pilgrim. Differential Revision: http://reviews.llvm.org/D15317 llvm-svn: 255080	2015-12-09 00:12:13 +00:00
Simon Pilgrim	52cfcde3fb	[X86][FMA4] Explicitly set the domain of FMA4 float/double scalar instructions Both were defaulting to the float domain - now matches the packed instructions. llvm-svn: 254841	2015-12-05 07:07:42 +00:00
Vyacheslav Klochkov	fdc2e9e5ae	X86-FMA3: Improved/enabled the memory folding optimization for scalar loads generated for _mm_losd_s{s,d}() intrinsics and used in scalar FMAs generated for FMA intrinsics _mm_f{madd,msub,nmadd,nmsub}_s{s,d}(). Reviewer: David Kreitzer Differential Revision: http://reviews.llvm.org/D14762 llvm-svn: 254140	2015-11-26 07:45:30 +00:00
Sanjay Patel	836b13b706	fix typo; NFC llvm-svn: 254069	2015-11-25 15:33:36 +00:00
Vyacheslav Klochkov	262f8eaf50	X86-FMA3: Implemented commute transformations FMA_Int instructions. It made it possible to apply the memory folding optimization for the 2nd operand of FMA_Int instructions. Reviewer: Quentin Colombet Differential Revision: http://reviews.llvm.org/D14550 llvm-svn: 252973	2015-11-13 00:07:35 +00:00
Andrew Kaylor	3cc19fd26f	Improved the operands commute transformation for X86-FMA3 instructions. All 3 operands of FMA3 instructions are commutable now. Patch by Slava Klochkov Reviewers: Quentin Colombet(qcolombet), Ahmed Bougacha(ab). Differential Revision: http://reviews.llvm.org/D13269 llvm-svn: 252335	2015-11-06 19:47:25 +00:00
Andrew Kaylor	e42c061561	Created new X86 FMA3 opcodes (FMA_Int) that are used now for lowering of scalar FMA intrinsics. Patch by Slava Klochkov The key difference between FMA and FMA_Int opcodes is that FMA_Int opcodes are handled more conservatively. It is illegal to commute the 1st operand of FMA*_Int instructions as the upper bits of scalar FMA intrinsic result must be taken from the 1st operand, but such commute transformation would change those upper bits and invalidate the intrinsic's result. Reviewers: Quentin Colombet, Elena Demikhovsky Differential Revision: http://reviews.llvm.org/D13710 llvm-svn: 252060	2015-11-04 18:10:41 +00:00
Michael Kuperstein	4559e5e720	[X86] When pattern-matching scalar FMA3 intrinsics, don't re-arrange the first and second operands. The semantics of the scalar FMA intrinsics are that the high vector elements are copied from the first source. The existing pattern switches src1 and src2 around, to match the "213" order, which ends up tying the original src2 to the dest. Since the actual scalar fma3 instructions copy the high elements from the dest register, the wrong values are copied. This modifies the pattern to leave src1 and src2 in their original order. Differential Revision: http://reviews.llvm.org/D9908 llvm-svn: 238131	2015-05-25 12:35:25 +00:00
Craig Topper	0734168db8	Replace neverHasSideEffects=1 with hasSideEffects=0 in all .td files. llvm-svn: 222801	2014-11-26 00:46:26 +00:00
Quentin Colombet	9b13d839be	[X86] Selectively mark the FMA variants inside a family as isCommutable. Given a FMA family (e.g., 213, 231), not all the variants (i.e., register or memory) are commutable. E.g., for the 213 family (with the syntax src1, src2, src3): fmaXXX213 A, B, reg3/mem3 == fmaXXX213 B, A, reg3/mem3 Now consider the 231 family: fmaXXX231 A, B, reg3 == fmaXXX231 A, reg3, B But fmaXXX231 A, B, mem3 != fmaXXX231 A, mem3, B Indeed, mem3 cannot be the second argument of the memory variant of fmaXXX231. Working on a reduced test case! <rdar://problem/16800495> llvm-svn: 208252	2014-05-07 21:43:35 +00:00
Lang Hames	e2f8671084	[X86] Make the VFMA*231 variants commutable and relax the alignment restrictions on FMA3 memory operands. FMA3 instructions are VEX encoded, so they can load from unaligned memory. Testcase to follow, along with related patch. <rdar://problem/16478629> llvm-svn: 205472	2014-04-02 22:06:16 +00:00
Lang Hames	4397ed9ac7	[X86] Only 213 FMA3 variants should be marked commutable. Commuting the 231 and 132 variants would swap addends and multiplicands/multipliers, which isn't valid. I'm still trying to reduce a decent test case for this. llvm-svn: 200792	2014-02-04 19:42:47 +00:00
Lang Hames	884a7dc676	Replace X86 FMA intrinsic pseduo-instructions with def pats. It looks like these pseudos were only used for pattern matching. Def pats are the appropriate way to do that. As a bonus, these intrinsics will now have memory operands folded properly, and better FMA3 variants selected where appropriate (see r199933). <rdar://problem/15611947> llvm-svn: 200577	2014-01-31 21:29:19 +00:00
Lang Hames	8b08ff3852	Replace vfmaddxx213 instructions with their 231-type equivalents in accumulator loops. Writing back to the accumulator (231-type) allows the coalescer to eliminate an extra copy. llvm-svn: 199933	2014-01-23 20:23:36 +00:00
Craig Topper	4a48c26e38	Add a new x86 specific instruction flag to force some isCodeGenOnly instructions to go through to the disassembler tables without resorting to string matches. Apply flag to all _REV instructions. llvm-svn: 198543	2014-01-05 04:17:28 +00:00
Craig Topper	57b949fa83	Mark all x86 Int_ and _Int patterns as isCodeGenOnly so the disassembler table builder doesn't need to string match them to exclude them. llvm-svn: 198323	2014-01-02 17:28:14 +00:00
Craig Topper	a4bd7d9c3c	Various x86 disassembler fixes. Add VEX_LIG to scalar FMA4 instructions. Use VEX_LIG in some of the inheriting checks in disassembler table generator. Make use of VEX_L_W, VEX_L_W_XS, VEX_L_W_XD contexts. Don't let VEX_L_W, VEX_L_W_XS, VEX_L_W_XD, VEX_L_W_OPSIZE inherit from their non-L forms unless VEX_LIG is set. Let VEX_L_W, VEX_L_W_XS, VEX_L_W_XD, VEX_L_W_OPSIZE inherit from all of their non-L or non-W cases. Increase ranking on VEX_L_W, VEX_L_W_XS, VEX_L_W_XD, VEX_L_W_OPSIZE so they get chosen over non-L/non-W forms. llvm-svn: 191649	2013-09-30 02:46:36 +00:00
Craig Topper	ef2cf025cd	Remove alignment restrictions from FMA load folding. llvm-svn: 191136	2013-09-21 05:58:59 +00:00
Craig Topper	58b9662000	Simplify nested strconcats in X86 td files since strconcat can take more than 2 arguments. llvm-svn: 172379	2013-01-14 07:46:34 +00:00
Craig Topper	152bee45fa	Mark all the _REV instructions as not having side effects. They aren't really emitted by the backend, but it reduces the number of instructions in the output files with unmodelled side effects to make auditing easier. llvm-svn: 171118	2012-12-26 21:30:22 +00:00
Craig Topper	f0d2332d86	Fix execution domain for packed FMA4 instructions. llvm-svn: 168417	2012-11-21 08:08:21 +00:00
Craig Topper	7c37abcace	Add explicit VEX_L tags to all 256-bit instructions. This will allow us to remove code from the code emitters that examined operands to set the L-bit. llvm-svn: 164202	2012-09-19 06:06:34 +00:00
Craig Topper	2e53378ff6	Mark FMA4 instructions as commutable and add them to the folding tables. llvm-svn: 163035	2012-08-31 23:10:34 +00:00
Craig Topper	917333c8c7	Mark FMA3 instructions as commutable so that the operands to the multiply part can be commuted. llvm-svn: 163001	2012-08-31 16:31:13 +00:00
Craig Topper	6bb3145d0d	Add support for converting llvm.fma to fma4 instructions. llvm-svn: 162999	2012-08-31 15:40:30 +00:00
Craig Topper	aa2444a397	Convert FMA4 patterns to use target specific nodes instead of intrinsics to align with FMA3. llvm-svn: 162829	2012-08-29 07:18:25 +00:00
Jakob Stoklund Olesen	48bb81b28a	Remove more mayLoad workarounds. llvm-svn: 162556	2012-08-24 14:43:22 +00:00
Craig Topper	aa57ba3944	Custom lower FMA intrinsics to target specific nodes and remove the patterns. llvm-svn: 162534	2012-08-24 04:03:22 +00:00
Craig Topper	e432edabf1	Cleanup the scalar FMA3 definitions. Add patterns to fold loads with scalar forms. llvm-svn: 162260	2012-08-21 07:11:11 +00:00
Craig Topper	2e63b3ea18	Merge FMA3 instructions with and without patterns into single classes using null_frag. llvm-svn: 162257	2012-08-21 05:56:45 +00:00
Craig Topper	77406bef3b	Remove FMA3 intrinsic instructions in favor of patterns. llvm-svn: 162194	2012-08-20 06:21:25 +00:00
Craig Topper	64c93f9d07	Use correct intrinsic for 256-bit VFMSUBADDPS. llvm-svn: 162193	2012-08-20 06:03:04 +00:00

1 2

74 Commits