mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-10-20 19:42:54 +02:00

74 Commits

Author SHA1 Message Date
Craig Topper
61842d2894 [X86] Use Ld scheduler classes for instructions with folded loads.
llvm-svn: 320459
2017-12-12 07:06:35 +00:00
Simon Pilgrim
ed2f1de918 [X86][FMA] Tag all FMA/FMA4 instructions with WriteFMA schedule class
As mentioned on PR17367, many instructions are missing scheduling tags preventing us from setting 'CompleteModel = 1' for better instruction analysis. This patch deals with FMA/FMA4 which is one of the bigger offenders (along with AVX512 in general).

Annoyingly, all scheduler models need to define WriteFMA (now that it's actually used), even for older targets without FMA/FMA4 support, but that is an existing problem shared by other schedule classes.

Differential Revision: https://reviews.llvm.org/D40351

llvm-svn: 319016
2017-11-27 10:41:32 +00:00
Craig Topper
a9839b29b1 [X86] Add separate intrinsics for scalar FMA4 instructions.
Summary:
These instructions zero the non-scalar part of the lower 128-bits which makes them different than the FMA3 instructions which pass through the non-scalar part of the lower 128-bits.

I've only added fmadd because we should be able to derive all other variants using operand negation in the intrinsic header like we do for AVX512.

I think there are still some missed negate folding opportunities with the FMA4 instructions in light of this behavior difference that I hadn't noticed before.

I've split the tests so that we can use different intrinsics for scalar testing between the two. I just copied the tests, split the RUN lines, and changed out the scalar intrinsics.

fma4-fneg-combine.ll is a new test to make sure we negate the fma4 intrinsics correctly, though there are a couple of TODOs in it.

Reviewers: RKSimon, spatel

Reviewed By: RKSimon

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D39851

llvm-svn: 318984
2017-11-25 18:32:43 +00:00
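The behavior difference described in this commit, and the idea of deriving the other variants from fmadd by negating operands, can be sketched in Python. This is a toy model, not LLVM or clang code: the function names are hypothetical, the low 128 bits of a register are modeled as four float lanes, and single-rounding FMA is approximated with ordinary arithmetic (exact for these small values).

```python
from typing import List

Vec4 = List[float]  # low 128 bits of an XMM register, modeled as four float lanes


def fma3_fmadd_ss(a: Vec4, b: Vec4, c: Vec4) -> Vec4:
    """FMA3-style scalar fmadd: lane 0 = a*b + c, lanes 1-3 pass through from a."""
    return [a[0] * b[0] + c[0]] + a[1:]


def fma4_fmadd_ss(a: Vec4, b: Vec4, c: Vec4) -> Vec4:
    """FMA4-style scalar fmadd: lane 0 = a*b + c, lanes 1-3 are zeroed."""
    return [a[0] * b[0] + c[0], 0.0, 0.0, 0.0]


def neg(v: Vec4) -> Vec4:
    return [-x for x in v]


# Other FMA4 scalar variants expressed as fmadd plus operand negation.
# Negating `a` is harmless here because FMA4 zeroes lanes 1-3 anyway.
def fma4_fmsub_ss(a: Vec4, b: Vec4, c: Vec4) -> Vec4:   # a*b - c
    return fma4_fmadd_ss(a, b, neg(c))


def fma4_fnmadd_ss(a: Vec4, b: Vec4, c: Vec4) -> Vec4:  # -(a*b) + c
    return fma4_fmadd_ss(neg(a), b, c)


def fma4_fnmsub_ss(a: Vec4, b: Vec4, c: Vec4) -> Vec4:  # -(a*b) - c
    return fma4_fmadd_ss(neg(a), b, neg(c))


# With FMA3 semantics, negating `a` would also flip the pass-through lanes,
# so the negation has to go on `b` instead.
def fma3_fnmadd_ss(a: Vec4, b: Vec4, c: Vec4) -> Vec4:  # -(a*b) + c
    return fma3_fmadd_ss(a, neg(b), c)


a = [2.0, 10.0, 20.0, 30.0]
b = [3.0, 0.0, 0.0, 0.0]
c = [1.0, 0.0, 0.0, 0.0]
print(fma3_fmadd_ss(a, b, c))   # [7.0, 10.0, 20.0, 30.0]
print(fma4_fmadd_ss(a, b, c))   # [7.0, 0.0, 0.0, 0.0]
print(fma4_fnmadd_ss(a, b, c))  # [-5.0, 0.0, 0.0, 0.0]
print(fma3_fnmadd_ss(a, b, c))  # [-5.0, 10.0, 20.0, 30.0]
```

The last two calls show why the negation trick has to be applied differently depending on whether the instruction zeroes or passes through the upper lanes.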
Simon Pilgrim
72b3efd24b Fix spelling in comment. NFCI.
llvm-svn: 318687
2017-11-20 19:18:33 +00:00
Craig Topper
4f63d7a01e [X86] Give priority to EVEX FMA instructions over FMA4 instructions.
No existing processor has both so it doesn't really matter what we do here. But we were previously just relying on pattern order which gave FMA4 priority.

llvm-svn: 317775
2017-11-09 08:26:26 +00:00
Craig Topper
1217e81404 [X86] Use EVEX encoded intrinsics for legacy FMA intrinsics when possible.
llvm-svn: 317454
2017-11-06 05:48:26 +00:00
Craig Topper
5dd5a08719 [X86] Add more FMA3 patterns to cover a load in all 3 possible positions.
This matches what we already do for AVX512. The peephole pass makes up for this in most if not all cases. But this makes isel behavior for these consistent with every other instruction.

llvm-svn: 312613
2017-09-06 03:35:58 +00:00
Craig Topper
4bd1ceb19d [X86] Mark the FMA nodes as commutable so tablegen will auto generate the patterns.
This uses the capability introduced in r312464 to make SDNode patterns commutable on the first two operands.

This allows us to remove some of the extra FMA patterns that had to put loads and mask operands in different places to cover all cases. It even covers cases for which patterns were missing: matching a load in the first operand with FMA4, and non-broadcast loads with masking for AVX512.

I believe this is causing us to generate some duplicate patterns because tablegen's isomorphism checks don't catch isomorphism between the patterns as written in the td. It only detects isomorphism in the commuted variants it tries to create. The unmasked 231 and 132 memory forms are isomorphic as written in the td file so we end up keeping both. I think we precommute the 132 pattern to fix this.

We also need a follow up patch to go back to the legacy FMA3 instructions and add patterns to the 231 and 132 forms which we currently don't have.

llvm-svn: 312469
2017-09-04 06:59:50 +00:00
Craig Topper
439189616a [X86] Add isel patterns for memory forms of FMA3 intrinsic instructions
llvm-svn: 312309
2017-09-01 07:58:13 +00:00
Craig Topper
8d8891785f [X86] Remove unnecessary COPY_TO_REGCLASS(VR128) from the output patterns for FMA instrinsics.
The instructions are already defined as writing a VR128 register.

llvm-svn: 312308
2017-09-01 07:58:11 +00:00
Craig Topper
e6fa8f8485 [X86] Remove X86ISD::FMADD in favor ISD::FMA
There's no reason to have a target specific node with the same semantics as a target independent opcode.

This should simplify D36335 so that it doesn't need to touch X86ISelDAGToDAG.cpp

Differential Revision: https://reviews.llvm.org/D36983

llvm-svn: 311568
2017-08-23 16:28:04 +00:00
Ayman Musa
8ecd0ea322 [X86] Adding FoldGenRegForm helper field (for memory folding tables tableGen backend) to X86Inst class and set its value for the relevant instructions.
Some register-register instructions can be encoded in two different ways; this happens when either of the two register operands can be folded (separately).
For example, MOV8rr and MOV8rr_REV perform exactly the same operation but are encoded differently. Here is the relevant information about these instructions from Intel's 64 and IA-32 Architectures Software Developer's Manual:

Opcode  Instruction  Op/En  64-Bit Mode  Compat/Leg Mode  Description
8A /r   MOV r8,r/m8  RM     Valid        Valid            Move r/m8 to r8.
88 /r   MOV r/m8,r8  MR     Valid        Valid            Move r8 to r/m8.
Here we can see that in order to enable the folding of the output and input registers, we had to define 2 "encodings", and as a result we got 2 move 8-bit register-register instructions.

In the X86 backend, we define both of these instructions. Usually one has a regular name (MOV8rr) while the other has a "_REV" suffix (MOV8rr_REV), is marked with the isCodeGenOnly flag, and is not emitted by CodeGen.

Automatically generating the memory folding tables relies on matching the encodings of instructions. In cases like this one, where we want to map both memory forms of the 8-bit mov (MOV8rm and MOV8mr) to MOV8rr rather than to MOV8rr_REV, we need a way to point from MOV8rr_REV to the appropriate "regular" instruction, in this case MOV8rr.

This field enables this "pointing" mechanism, which is used in the TableGen backend for generating memory folding tables.

Differential Revision: https://reviews.llvm.org/D32683

llvm-svn: 304087
2017-05-28 12:39:37 +00:00
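To make the two encodings in the table concrete, here is a small Python sketch that assembles `mov al, bl` both ways. The helper names are hypothetical; only the opcodes (88 /r and 8A /r) and the standard ModRM byte layout come from the Intel manual.

```python
# Legacy 8-bit register numbers as encoded in ModRM (no REX prefix).
REG8 = {"al": 0, "cl": 1, "dl": 2, "bl": 3, "ah": 4, "ch": 5, "dh": 6, "bh": 7}


def modrm(mod: int, reg: int, rm: int) -> int:
    """Pack a ModRM byte: mod (2 bits) | reg (3 bits) | rm (3 bits)."""
    return (mod << 6) | (reg << 3) | rm


def mov_rm8_r8(dst: str, src: str) -> bytes:
    """88 /r, MOV r/m8, r8 (Op/En MR): destination in ModRM.rm, source in ModRM.reg."""
    return bytes([0x88, modrm(0b11, REG8[src], REG8[dst])])


def mov_r8_rm8(dst: str, src: str) -> bytes:
    """8A /r, MOV r8, r/m8 (Op/En RM): destination in ModRM.reg, source in ModRM.rm."""
    return bytes([0x8A, modrm(0b11, REG8[dst], REG8[src])])


# Both byte sequences encode the same operation, "mov al, bl".
print(mov_rm8_r8("al", "bl").hex())  # 88d8
print(mov_r8_rm8("al", "bl").hex())  # 8ac3
```

With mod = 0b11 both forms are register-to-register. With a memory ModRM operand instead, the MR form becomes a store and the RM form a load, which is why both memory forms need a single register form to map back to.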
Craig Topper
63828a2473 [AVX-512] Give priority to EVEX encoded scalar FMA instructions when we have FMA, AVX512 and no VLX.
Previously we only gave them priority if VLX was enabled.

llvm-svn: 298046
2017-03-17 06:10:37 +00:00
Craig Topper
624fc12e44 [X86][FMA4] Remove isCommutable from FMA4 scalar intrinsics. They aren't commutable as operand 0 should pass its upper bits through to the output.
llvm-svn: 288011
2016-11-27 21:37:04 +00:00
Craig Topper
bf372031c2 [X86][FMA] Add missing Predicates qualifier around scalar FMA intrinsic patterns.
llvm-svn: 288010
2016-11-27 21:37:02 +00:00
Craig Topper
5943c6520e [X86] Create a new instruction format to handle MemOp4 encoding. This saves one bit in TSFlags and simplifies MRMSrcMem/MRMSrcReg format handling.
llvm-svn: 279423
2016-08-22 07:38:45 +00:00
Craig Topper
9332f50e72 [X86] Remove CustomInserter for FMA3 instructions. Looks like since we got full commuting support for FMAs after this was added, the coalescer can now get this right on its own.
Differential Revision: https://reviews.llvm.org/D22799

llvm-svn: 276987
2016-07-28 15:28:56 +00:00
Craig Topper
39f037759a [X86] Make the FMA3 instruction names consistent between VEX and EVEX encoded versions.
This places the 132/213/231 form number in front of the SS/SD/PS/PD. Move the Y for 256-bit versions to be after the PS/PD. Change the AVX512 scalar forms to include a Z in their name. This new format should be consistent with the general naming of instructions.

llvm-svn: 276559
2016-07-24 08:26:38 +00:00
Vyacheslav Klochkov
d08c394197 X86-FMA3: Defined the ExeDomain property for Scalar FMA3 opcodes.
Reviewer: Simon Pilgrim.
Differential Revision: http://reviews.llvm.org/D15317

llvm-svn: 255080
2015-12-09 00:12:13 +00:00
Simon Pilgrim
52cfcde3fb [X86][FMA4] Explicitly set the domain of FMA4 float/double scalar instructions
Both were defaulting to the float domain - now matches the packed instructions.

llvm-svn: 254841
2015-12-05 07:07:42 +00:00
Vyacheslav Klochkov
fdc2e9e5ae X86-FMA3: Improved/enabled the memory folding optimization for scalar loads
generated for _mm_losd_s{s,d}() intrinsics and used in scalar FMAs generated 
for FMA intrinsics _mm_f{madd,msub,nmadd,nmsub}_s{s,d}().

Reviewer: David Kreitzer
Differential Revision: http://reviews.llvm.org/D14762

llvm-svn: 254140
2015-11-26 07:45:30 +00:00
Sanjay Patel
836b13b706 fix typo; NFC
llvm-svn: 254069
2015-11-25 15:33:36 +00:00
Vyacheslav Klochkov
262f8eaf50 X86-FMA3: Implemented commute transformations FMA*_Int instructions.
It made it possible to apply the memory folding optimization for the 2nd
operand of FMA*_Int instructions.

Reviewer: Quentin Colombet
Differential Revision: http://reviews.llvm.org/D14550

llvm-svn: 252973
2015-11-13 00:07:35 +00:00
Andrew Kaylor
3cc19fd26f Improved the operands commute transformation for X86-FMA3 instructions.
All 3 operands of FMA3 instructions are commutable now.

Patch by Slava Klochkov

Reviewers: Quentin Colombet(qcolombet), Ahmed Bougacha(ab).

Differential Revision: http://reviews.llvm.org/D13269

llvm-svn: 252335
2015-11-06 19:47:25 +00:00
Andrew Kaylor
e42c061561 Created new X86 FMA3 opcodes (FMA*_Int) that are used now for lowering of scalar FMA intrinsics.
Patch by Slava Klochkov 

The key difference between FMA* and FMA*_Int opcodes is that FMA*_Int opcodes are handled more conservatively. It is illegal to commute the 1st operand of FMA*_Int instructions as the upper bits of scalar FMA intrinsic result must be taken from the 1st operand, but such commute transformation would change those upper bits and invalidate the intrinsic's result.

Reviewers: Quentin Colombet, Elena Demikhovsky

Differential Revision: http://reviews.llvm.org/D13710

llvm-svn: 252060
2015-11-04 18:10:41 +00:00
Michael Kuperstein
4559e5e720 [X86] When pattern-matching scalar FMA3 intrinsics, don't re-arrange the first and second operands.
The semantics of the scalar FMA intrinsics are that the high vector elements are copied from the first source.
The existing pattern switches src1 and src2 around, to match the "213" order, which ends up tying the original src2 to the dest. Since the actual scalar fma3 instructions copy the high elements from the dest register, the wrong values are copied.

This modifies the pattern to leave src1 and src2 in their original order.

Differential Revision: http://reviews.llvm.org/D9908

llvm-svn: 238131
2015-05-25 12:35:25 +00:00
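The bug fixed here can be reproduced with a toy Python model: the scalar 213 instruction keeps lanes 1-3 of its destination register, so whichever source is tied to the destination decides the upper lanes. The same effect is why the FMA*_Int opcodes must not commute their first operand. Function names are hypothetical, and arithmetic is plain Python (exact for these values), not a bit-accurate FMA.

```python
from typing import List

Vec4 = List[float]  # low 128 bits modeled as four float lanes


def vfmadd213ss(dst: Vec4, src2: Vec4, src3: Vec4) -> Vec4:
    """Toy model of the 213 scalar form: dst[0] = src2[0]*dst[0] + src3[0];
    lanes 1-3 are kept from the destination register."""
    return [src2[0] * dst[0] + src3[0]] + dst[1:]


def fmadd_ss_intrinsic(a: Vec4, b: Vec4, c: Vec4) -> Vec4:
    """Intrinsic semantics: lane 0 = a*b + c, lanes 1-3 copied from the first source."""
    return [a[0] * b[0] + c[0]] + a[1:]


a = [2.0, 10.0, 20.0, 30.0]
b = [3.0, 40.0, 50.0, 60.0]
c = [1.0, 0.0, 0.0, 0.0]

correct = vfmadd213ss(a, b, c)  # src1 tied to dst: upper lanes come from a
swapped = vfmadd213ss(b, a, c)  # old pattern: src2 tied to dst, upper lanes come from b
print(correct)  # [7.0, 10.0, 20.0, 30.0]
print(swapped)  # [7.0, 40.0, 50.0, 60.0]
```

Lane 0 is identical either way, which is what makes this kind of bug easy to miss in tests that only check the scalar result.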
Craig Topper
0734168db8 Replace neverHasSideEffects=1 with hasSideEffects=0 in all .td files.
llvm-svn: 222801
2014-11-26 00:46:26 +00:00
Quentin Colombet
9b13d839be [X86] Selectively mark the FMA variants inside a family as isCommutable.
Given a FMA family (e.g., 213, 231), not all the variants (i.e., register or
memory) are commutable.
E.g., for the 213 family (with the syntax src1, src2, src3):
fmaXXX213 A, B, reg3/mem3 == fmaXXX213 B, A, reg3/mem3

Now consider the 231 family:
fmaXXX231 A, B, reg3 == fmaXXX231 A, reg3, B
But
fmaXXX231 A, B, mem3 != fmaXXX231 A, mem3, B
Indeed, mem3 cannot be the second argument of the memory variant of fmaXXX231.

Working on a reduced test case!

<rdar://problem/16800495>

llvm-svn: 208252
2014-05-07 21:43:35 +00:00
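The commutability rules above follow from how the three FMA3 forms combine their operands. Per the Intel manual, with operand 1 as the tied destination, 132 computes op1*op3 + op2, 213 computes op2*op1 + op3, and 231 computes op2*op3 + op1; only operand 3 can be memory. The Python sketch below checks, on a couple of integer samples, which operand swaps preserve the result, and then drops swaps that would move a memory operand out of slot 3. It is an empirical illustration with hypothetical helper names, not LLVM's commuting logic.

```python
from typing import Callable, Dict, List, Tuple

# Operand 1 is the tied destination; operand 3 is the only r/m (memory-capable) slot.
FORMS: Dict[str, Callable[[int, int, int], int]] = {
    "132": lambda o1, o2, o3: o1 * o3 + o2,
    "213": lambda o1, o2, o3: o2 * o1 + o3,
    "231": lambda o1, o2, o3: o2 * o3 + o1,
}

SAMPLES = [(2, 3, 5), (7, 11, 13)]


def swap(ops: Tuple[int, int, int], i: int, j: int) -> Tuple[int, ...]:
    t = list(ops)
    t[i], t[j] = t[j], t[i]
    return tuple(t)


def commutable_pairs(form: str) -> List[Tuple[int, int]]:
    """Operand pairs (1-based) whose swap leaves the form's result unchanged on SAMPLES."""
    f = FORMS[form]
    return [
        (i + 1, j + 1)
        for i, j in [(0, 1), (0, 2), (1, 2)]
        if all(f(*s) == f(*swap(s, i, j)) for s in SAMPLES)
    ]


def memory_commutable_pairs(form: str) -> List[Tuple[int, int]]:
    """Swaps still legal when operand 3 is memory: they must not move operand 3."""
    return [p for p in commutable_pairs(form) if 3 not in p]


for form in FORMS:
    print(form, commutable_pairs(form), memory_commutable_pairs(form))
# 132 [(1, 3)] []
# 213 [(1, 2)] [(1, 2)]
# 231 [(2, 3)] []
```

In every form only the two multiplicands can trade places. If "commutable" is read as "the first two operands may be swapped" (an assumption about how the flag was interpreted at the time), only 213 qualifies, which is consistent with r200792 ("Only 213 FMA3 variants should be marked commutable"); the 231 swap of operands 2 and 3 is valid only for the register form, as this commit states.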
Lang Hames
e2f8671084 [X86] Make the VFMA*231 variants commutable and relax the alignment restrictions
on FMA3 memory operands. FMA3 instructions are VEX encoded, so they can load
from unaligned memory.

Testcase to follow, along with related patch.

<rdar://problem/16478629>

llvm-svn: 205472
2014-04-02 22:06:16 +00:00
Lang Hames
4397ed9ac7 [X86] Only 213 FMA3 variants should be marked commutable.
Commuting the 231 and 132 variants would swap addends and
multiplicands/multipliers, which isn't valid.

I'm still trying to reduce a decent test case for this.

llvm-svn: 200792
2014-02-04 19:42:47 +00:00
Lang Hames
884a7dc676 Replace X86 FMA intrinsic pseduo-instructions with def pats.
It looks like these pseudos were only used for pattern matching. Def pats are
the appropriate way to do that. As a bonus, these intrinsics will now have
memory operands folded properly, and better FMA3 variants selected where
appropriate (see r199933).

<rdar://problem/15611947>

llvm-svn: 200577
2014-01-31 21:29:19 +00:00
Lang Hames
8b08ff3852 Replace vfmaddxx213 instructions with their 231-type equivalents in accumulator
loops. Writing back to the accumulator (231-type) allows the coalescer to
eliminate an extra copy.

llvm-svn: 199933
2014-01-23 20:23:36 +00:00
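The reasoning in this commit can be illustrated with a toy register model of a dot-product loop. The 231 form overwrites its addend, so the running sum stays in the accumulator register; the 213 form overwrites a multiplicand, so the sum has to be copied back into the accumulator each iteration. Register names and the copy counting are hypothetical simplifications, not the coalescer's actual behavior.

```python
from typing import List, Tuple


def dot(xs: List[float], ys: List[float], form: str) -> Tuple[float, int]:
    """Toy register model of an FMA accumulator loop.

    Returns the dot product and the number of register copies needed to keep
    the running sum in the "acc" register across iterations."""
    regs = {"acc": 0.0}
    copies = 0
    for x, y in zip(xs, ys):
        regs["x"], regs["y"] = x, y
        if form == "231":
            # vfmadd231 acc, x, y: acc = x*y + acc, written straight back into acc
            regs["acc"] = regs["x"] * regs["y"] + regs["acc"]
        elif form == "213":
            # vfmadd213 x, y, acc: x = y*x + acc, written into a multiplicand register
            regs["x"] = regs["y"] * regs["x"] + regs["acc"]
            regs["acc"] = regs["x"]  # extra copy to move the sum back into acc
            copies += 1
        else:
            raise ValueError(f"unsupported form: {form}")
    return regs["acc"], copies


print(dot([1.0, 2.0, 3.0], [4.0, 5.0, 6.0], "231"))  # (32.0, 0)
print(dot([1.0, 2.0, 3.0], [4.0, 5.0, 6.0], "213"))  # (32.0, 3)
```

Both forms compute the same sum; the difference is only in where the result lands, which is exactly what lets the 231 variant avoid the extra copy.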
Craig Topper
4a48c26e38 Add a new x86 specific instruction flag to force some isCodeGenOnly instructions to go through to the disassembler tables without resorting to string matches. Apply flag to all _REV instructions.
llvm-svn: 198543
2014-01-05 04:17:28 +00:00
Craig Topper
57b949fa83 Mark all x86 Int_ and _Int patterns as isCodeGenOnly so the disassembler table builder doesn't need to string match them to exclude them.
llvm-svn: 198323
2014-01-02 17:28:14 +00:00
Craig Topper
a4bd7d9c3c Various x86 disassembler fixes.
Add VEX_LIG to scalar FMA4 instructions.
Use VEX_LIG in some of the inheriting checks in disassembler table generator.
Make use of VEX_L_W, VEX_L_W_XS, VEX_L_W_XD contexts.
Don't let VEX_L_W, VEX_L_W_XS, VEX_L_W_XD, VEX_L_W_OPSIZE inherit from their non-L forms unless VEX_LIG is set.
Let VEX_L_W, VEX_L_W_XS, VEX_L_W_XD, VEX_L_W_OPSIZE inherit from all of their non-L or non-W cases.
Increase ranking on VEX_L_W, VEX_L_W_XS, VEX_L_W_XD, VEX_L_W_OPSIZE so they get chosen over non-L/non-W forms.

llvm-svn: 191649
2013-09-30 02:46:36 +00:00
Craig Topper
ef2cf025cd Remove alignment restrictions from FMA load folding.
llvm-svn: 191136
2013-09-21 05:58:59 +00:00
Craig Topper
58b9662000 Simplify nested strconcats in X86 td files since strconcat can take more than 2 arguments.
llvm-svn: 172379
2013-01-14 07:46:34 +00:00
Craig Topper
152bee45fa Mark all the _REV instructions as not having side effects. They aren't really emitted by the backend, but it reduces the number of instructions in the output files with unmodelled side effects to make auditing easier.
llvm-svn: 171118
2012-12-26 21:30:22 +00:00
Craig Topper
f0d2332d86 Fix execution domain for packed FMA4 instructions.
llvm-svn: 168417
2012-11-21 08:08:21 +00:00
Craig Topper
7c37abcace Add explicit VEX_L tags to all 256-bit instructions. This will allow us to remove code from the code emitters that examined operands to set the L-bit.
llvm-svn: 164202
2012-09-19 06:06:34 +00:00
Craig Topper
2e53378ff6 Mark FMA4 instructions as commutable and add them to the folding tables.
llvm-svn: 163035
2012-08-31 23:10:34 +00:00
Craig Topper
917333c8c7 Mark FMA3 instructions as commutable so that the operands to the multiply part can be commuted.
llvm-svn: 163001
2012-08-31 16:31:13 +00:00
Craig Topper
6bb3145d0d Add support for converting llvm.fma to fma4 instructions.
llvm-svn: 162999
2012-08-31 15:40:30 +00:00
Craig Topper
aa2444a397 Convert FMA4 patterns to use target specific nodes instead of intrinsics to align with FMA3.
llvm-svn: 162829
2012-08-29 07:18:25 +00:00
Jakob Stoklund Olesen
48bb81b28a Remove more mayLoad workarounds.
llvm-svn: 162556
2012-08-24 14:43:22 +00:00
Craig Topper
aa57ba3944 Custom lower FMA intrinsics to target specific nodes and remove the patterns.
llvm-svn: 162534
2012-08-24 04:03:22 +00:00
Craig Topper
e432edabf1 Cleanup the scalar FMA3 definitions. Add patterns to fold loads with scalar forms.
llvm-svn: 162260
2012-08-21 07:11:11 +00:00
Craig Topper
2e63b3ea18 Merge FMA3 instructions with and without patterns into single classes using null_frag.
llvm-svn: 162257
2012-08-21 05:56:45 +00:00
Craig Topper
77406bef3b Remove FMA3 intrinsic instructions in favor of patterns.
llvm-svn: 162194
2012-08-20 06:21:25 +00:00
Craig Topper
64c93f9d07 Use correct intrinsic for 256-bit VFMSUBADDPS.
llvm-svn: 162193
2012-08-20 06:03:04 +00:00