llvm-mirror

mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-10-23 04:52:54 +02:00

Author	SHA1	Message	Date
Coby Tayree	801b34c954	[x86][icelake]GFNI galois field arithmetic (GF(2^8)) insns: gf2p8affineinvqb gf2p8affineqb gf2p8mulb Differential Revision: https://reviews.llvm.org/D40373 llvm-svn: 318993	2017-11-26 09:36:41 +00:00
Simon Pilgrim	59a5ef9f69	[X86][SSE] Use (V)PHMINPOSUW for vXi16 SMAX/SMIN/UMAX/UMIN horizontal reductions (PR32841) (V)PHMINPOSUW determines the UMIN element in an v8i16 input, with suitable bit flipping it can also be used for SMAX/SMIN/UMAX cases as well. This patch matches vXi16 SMAX/SMIN/UMAX/UMIN horizontal reductions and reduces the input down to a v8i16 vector before calling (V)PHMINPOSUW. A later patch will use this for v16i8 reductions as well (PR32841). Differential Revision: https://reviews.llvm.org/D39729 llvm-svn: 318917	2017-11-23 13:50:27 +00:00
Craig Topper	36a95a03e9	[X86] Lower all ISD::MGATHER nodes to X86ISD::MGATHER. Now we consistently represent the mask result without relying on isel ignoring it. We now have a more general SDNode and type constraints to represent these nodes in isel patterns. This allows us to present both vXi1 and XMM/YMM mask types with a single set of constraints. llvm-svn: 318821	2017-11-22 07:11:03 +00:00
Craig Topper	8295be1621	[X86] Allow vpclmulqdq instructions to be commuted during isel to allow load folding. The commuting patterns for the AVX version actually still had priority over the new patterns. llvm-svn: 318800	2017-11-21 21:05:21 +00:00
Coby Tayree	836d1e6a37	[x86][icelake]vpclmulqdq introduction an icelake promotion of pclmulqdq Differential Revision: https://reviews.llvm.org/D40101 llvm-svn: 318741	2017-11-21 09:30:33 +00:00
Coby Tayree	48de83a1a7	[x86][icelake]VAES introduction an icelake promotion of AES Differential Revision: https://reviews.llvm.org/D40078 llvm-svn: 318740	2017-11-21 09:11:41 +00:00
Mohammed Agabaria	059fc817b2	[LV][X86] Support of AVX2 Gathers code generation and update the LV with this This patch depends on: https://reviews.llvm.org/D35348 Support of pattern selection of masked gathers of AVX2 (X86\AVX2 code gen) Update LoopVectorize to generate gathers for AVX2 processors. Reviewers: delena, zvi, RKSimon, craig.topper, aaboud, igorb Reviewed By: delena, RKSimon Differential Revision: https://reviews.llvm.org/D35772 llvm-svn: 318641	2017-11-20 08:18:12 +00:00
Craig Topper	64463b58e5	[X86] Fix SQRTSS/SQRTSD/RCPSS/RCPSD intrinsics to use sse_load_f32/sse_load_f64 to increase load folding opportunities. llvm-svn: 318016	2017-11-13 05:25:24 +00:00
Craig Topper	a08d3b7667	[X86] Use EVEX encoded VRNDSCALE instructions to implement the legacy round intrinsics. The VRNDSCALE instructions implement a superset of the (V)ROUND instructions. They are equivalent if the upper 4-bits of the immediate are 0. This patch lowers the legacy intrinsics to the VRNDSCALE ISD node and masks the upper bits of the immediate to 0. This allows us to take advantage of the larger register encoding space. We should maybe consider converting VRNDSCALE back to VROUND in the EVEX to VEX pass if the extended registers are not being used. I notice some load folding opportunities being missed for the VRNDSCALESS/SD instructions that I'll try to fix in future patches. llvm-svn: 318008	2017-11-13 02:03:00 +00:00
Craig Topper	d9e3d1df1d	[X86] Use vrndscaleps/pd for 128/256 ffloor/ftrunc/fceil/fnearbyint/frint when avx512vl is enabled. This matches what we do for scalar and 512-bit types. llvm-svn: 317991	2017-11-11 21:44:51 +00:00
Craig Topper	eea0adbe11	[X86] Correct the execution domain on ROUND/VROUND instructions. llvm-svn: 317968	2017-11-11 02:26:05 +00:00
Craig Topper	ce5298b157	[X86] Remove the default for one of the arguments to some tablegen multiclasses. NFC No one ever uses this default and probably shouldn't since it sets the execution domain to generic. llvm-svn: 317967	2017-11-11 02:26:02 +00:00
Craig Topper	7c58c91da4	[X86] Allow legacy vcvtps2ph intrinsics to select EVEX encoded instructions. Rely on EVEX->VEX to convert back. Missed store folding opportunities will be fixed in a subsequent commit. llvm-svn: 317661	2017-11-08 04:00:30 +00:00
Craig Topper	e479609a83	[X86] Add patterns for folding a v16i8 with the VEX vcvtph2ps intrinsics. Disable the peephole pass to prove that the pattern is working. llvm-svn: 317547	2017-11-07 07:13:06 +00:00
Craig Topper	3639dfdc1a	[X86] Add support for using EVEX instructions for the legacy vcvtph2ps intrinsics. Looks like there's some missed load folding opportunities for i64 loads. llvm-svn: 317544	2017-11-07 07:13:03 +00:00
Craig Topper	2f44b08ea5	[X86] Use IMPLICIT_DEF in VEX/EVEX vcvtss2sd/vcvtsd2ss patterns instead of a COPY_TO_REGCLASS. ExeDepsFix pass should take care of making the registers match. llvm-svn: 317542	2017-11-07 04:44:22 +00:00
Craig Topper	49531839f0	[X86] Remove 'Requires' from instructions with no patterns. NFC llvm-svn: 317541	2017-11-07 04:44:21 +00:00
Craig Topper	432e7c9ea1	[X86] Use EVEX encoded instructions for legacy scalar sqrt intrinsics. Fixes PR35161. llvm-svn: 317445	2017-11-06 04:04:01 +00:00
Craig Topper	d2db85fa05	[X86] Remove some more RCP and RSQRT patterns from InstrAVX512.td that I missed in r317413. llvm-svn: 317441	2017-11-05 21:14:05 +00:00
Craig Topper	57953473b7	[X86] Fix outdated comment. NFC llvm-svn: 317440	2017-11-05 21:14:04 +00:00
Craig Topper	6175484ebe	[X86] Don't use RCP14 and RSQRT14 for reciprocal estimations or for legacy SSE rcp/rsqrt intrinsics when AVX512 features are enabled. Summary: AVX512 added RCP14 and RSQRT14 instructions which improve accuracy over the legacy RCP and RSQRT instructions, but not enough accuracy to remove the need for a Newton Raphson refinement. Currently we use these new instructions for the legacy packed SSE intrinsics, but not the scalar intrinsics. And we use it for fast math optimization of division and reciprocal sqrt. I think switching the legacy intrinsics may be surprising to the user since it changes the answer based on which processor you're using regardless of any fastmath settings. It's also weird that we did something different between scalar and packed. As for the reciprocal estimation, I think it creates unnecessary deltas in our output behavior (and prevents EVEX->VEX). A little playing around with gcc and icc and godbolt suggests they don't change which instructions they use here. This patch adds new X86ISD nodes for the RCP14/RSQRT14 and uses those for the new intrinsics. Leaving the old intrinsics to use the old instructions. Going forward I think our focus should be on -Supporting 512-bit vectors, which will have to use the RCP14/RSQRT14. -Using RSQRT28/RCP28 to remove the Newton Raphson step on processors with AVX512ER -Supporting double precision. Reviewers: zvi, DavidKreitzer, RKSimon Reviewed By: RKSimon Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D39583 llvm-svn: 317413	2017-11-04 18:26:41 +00:00
Craig Topper	2760c7242a	[X86] Give AVX512VL instructions priority over their AVX equivalents. I thought we had gotten all these priority bugs worked out, but I guess not. llvm-svn: 317283	2017-11-02 23:23:37 +00:00
Simon Pilgrim	08fead44ad	[X86][SSE] Remove AssertZext stage from PEXTRW/PEXTRB lowering. NFCI. Remove AssertZext and instead add PEXTRW/PEXTRB support to computeKnownBitsForTargetNode to simplify instruction selection. Differential Revision: https://reviews.llvm.org/D39169 llvm-svn: 316336	2017-10-23 16:00:57 +00:00
Craig Topper	547a7eb13c	[X86] Add VEX_WIG to VROUNDSSrr/VROUNDSSrm/VROUNDSDrr/VROUNDSDrm llvm-svn: 316283	2017-10-22 06:18:20 +00:00
Craig Topper	fe21921783	[X86] Add patterns for vzmovl+cvtpd2dq/cvttpd2dq with a load. llvm-svn: 315802	2017-10-14 07:04:48 +00:00
Craig Topper	9c365407c9	[X86] Add patterns for vzmovl+cvtpd2ps with a load. llvm-svn: 315800	2017-10-14 05:55:42 +00:00
Craig Topper	071b231d00	[X86] Add an additional isel pattern to CVTDQ2PDrm/VCVTDQ2PDrm to enable load folding without the peephole pass. This pattern is already used in AVX512VL version of these instructions. Though AVX512VL version is missing other patterns. llvm-svn: 315794	2017-10-14 04:18:06 +00:00
Craig Topper	97770df374	[X86] Use X86ISD::VBROADCAST in place of v2f64 X86ISD::MOVDDUP when AVX2 is available This is particularly important for AVX512VL where we are better able to recognize the VBROADCAST loads to fold with other operations. For AVX512VL we now use X86ISD::VBROADCAST for all of the patterns and remove the 128-bit X86ISD::VMOVDDUP. We may be able to use this for AVX1 as well which would allow us to remove more isel patterns. I also had to add X86ISD::VBROADCAST as a node to call combineShuffle for so that we treat it similar to X86ISD::MOVDDUP. Differential Revision: https://reviews.llvm.org/D38836 llvm-svn: 315768	2017-10-13 21:56:48 +00:00
Craig Topper	805f2f3469	[X86] Add broadcast patterns that allow a scalar_to_vector between the broadcast and the load. We already have these patterns for AVX512VL, but not AVX1 or 2. llvm-svn: 315382	2017-10-10 22:40:31 +00:00
Craig Topper	61bc7dca24	[X86] Fix some patterns that select VLX instructions, but were incorrectly also checking presence of BWI instructions. The EVEX->VEX pass probably obscures this. llvm-svn: 315365	2017-10-10 21:07:14 +00:00
Craig Topper	20e1cfd79a	[X86] Prefer MOVSS/SD over BLENDI during legalization. Remove BLENDI versions of scalar arithmetic patterns Summary: We currently disable some converting of shuffles to MOVSS/MOVSD during legalization if SSE41 is enabled. But later during shuffle combining we go back to preferring MOVSS/MOVSD. Additionally we have patterns that look for BLENDIs to detect scalar arithmetic operations. I believe due to the combining using MOVSS/MOVSD these are unnecessary. Interestingly, we still codegen blend instructions even though lowering/isel emit movss/movsd instructions. Turns out machine CSE commutes them to blend, and then commutes those blends back into blends that are equivalent to the original movss/movsd. This patch fixes the inconsistency in legalization to prefer MOVSS/MOVSD. The one test change was caused by this change. The problem is that we have integer types and are mostly selecting integer instructions except for the shufps. This shufps forced the execution domain, but the vpblendw couldn't have its domain changed with a naive instruction swap. We could fix this by special casing VPBLENDW based on the immediate to widen the element type. The rest of the patch is removing all the excess scalar patterns. Long term we should probably add isel patterns to make MOVSS/MOVSD emit blends directly instead of relying on the double commute. We may also want to consider emitting movss/movsd for optsize. I also wonder if we should still use the VEX encoded blendi instructions even with AVX512. Blends have better throughput, and that may outweigh the register constraint. Reviewers: RKSimon, zvi Reviewed By: RKSimon Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D38023 llvm-svn: 315181	2017-10-08 16:57:23 +00:00
Ayman Musa	54d6dc952e	[X86] Add new attribute to X86 instructions to enable marking them as "not memory foldable" This attribute will be used in a tablegen backend that generated the X86 memory folding tables which will be added in a future pass. Instructions with this attribute unset will be excluded from the full set of X86 instructions available for the pass. Differential Revision: https://reviews.llvm.org/D38027 llvm-svn: 315171	2017-10-08 08:32:56 +00:00
Craig Topper	cd20fc982b	[X86] Redefine MOVSS/MOVSD instructions to take VR128 regclass as input instead of FR32/FR64 This patch redefines the MOVSS/MOVSD instructions to take VR128 as its second input. This allows the MOVSS/SD->BLEND commute to work without requiring a COPY to be inserted. This should fix PR33079 Overall this looks to be an improvement in the generated code. I haven't checked the EXPENSIVE_CHECKS build but I'll do that and update with results. Differential Revision: https://reviews.llvm.org/D38449 llvm-svn: 314914	2017-10-04 17:20:12 +00:00
Craig Topper	561ca5cd68	[AVX-512] Add patterns to make fp compare instructions commutable during isel. llvm-svn: 314598	2017-09-30 17:02:39 +00:00
Craig Topper	6c77770b5b	[X86] Remove isel checks for immediate size on floating point compare and xop compare instructions. NFCI If these checks fail we end up not selecting an instruction at all. So we are already relying on the immediate being checked upstream of isel. So doing the check in isel is just bloat to the isel table. Interestingly, we didn't check on the AVX512 version of the instructions anyway. llvm-svn: 313724	2017-09-20 06:38:41 +00:00
Craig Topper	5cd79a5b17	[X86] Remove the X86ISD::MOVLHPD. Lowering doesn't use it and it's not a real instruction. It was used in patterns, but we had the exact same patterns with Unpckl as well. So now just use Unpckl in the instruction patterns. llvm-svn: 313506	2017-09-18 00:20:53 +00:00
Craig Topper	c4365274b5	[X86] Synchronize a pattern between SSE1 and AVX/AVX512. For some reason the SSE1 pattern expected a X86Movlhps pattern to have a v4f32 type, but AVX and AVX512 expected it to have a v4i32 type. I'm not even sure this pattern is even reachable post SSE1, but I'm starting with fixing this obvious bug. llvm-svn: 313495	2017-09-17 18:59:32 +00:00
Craig Topper	0683698331	[X86] Colocate all of the X86VBroadcast patterns for v2i64 and v2f64. NFC The memory patterns were near the MOVDDUP definition, but the non-memory patterns were near the broadcast instructions. llvm-svn: 313494	2017-09-17 18:59:30 +00:00
Craig Topper	eff0aa5cbe	[X86] Remove patterns for X86Movddup with v4i64 type. Lowering doesn't emit these. llvm-svn: 313493	2017-09-17 18:59:28 +00:00
Craig Topper	50e9e8ffcb	[X86] Remove isel patterns for X86Movhlps and X86Movlhps with integer types. Lowering doesn't emit these. llvm-svn: 313492	2017-09-17 18:59:26 +00:00
Craig Topper	18143e1869	[X86] Remove isel patterns for movlpd/movlps with integer types. Lowering doesn't emit these. llvm-svn: 313491	2017-09-17 18:59:24 +00:00
Craig Topper	c9df7024c7	[X86] Remove integer X86ISD::SHUFP patterns. Lowering doesn't emit these. llvm-svn: 313477	2017-09-17 06:09:32 +00:00
Craig Topper	2a324415cd	[X86] Add patterns to make blends with immediate control commutable during isel for load folding. llvm-svn: 313476	2017-09-17 05:06:05 +00:00
Craig Topper	d1020c20b9	[X86] Remove some unused defaults from some multiclass parameters. llvm-svn: 313475	2017-09-17 05:06:03 +00:00
Craig Topper	16eca3ad0c	[X86] Make PCLMULQDQ instructions commutable during isel to fold loads. This adds new patterns and SDNodeXForm to enable the immediate to be commuted. llvm-svn: 313472	2017-09-16 23:18:50 +00:00
Craig Topper	0b5789f3cc	[X86] Add isel patterns to be able to fold loads into VPERM2F128 even when the load is on the first input to the SDNode. We just need to toggle bits 1 and 5 of the immediate and swap the sources. The peephole pass could trigger commuting/folding for this later, but it's easy enough to fix in isel. Disable the peephole pass on the main vperm2x128 test so we know we're doing this through isel. llvm-svn: 313455	2017-09-16 09:16:48 +00:00
Craig Topper	7542d6cd2e	[X86] Remove VPERM2X128 isel patterns with 32-bit elements. Now that the intrinsics are gone we only need 64-bit elements since that's what shuffle lowering uses. llvm-svn: 313453	2017-09-16 08:15:52 +00:00
Craig Topper	44c5591029	[X86] Force shuffle lowering to only create X86ISD::VPERM2X128 with 64-bit element types so we can remove some patterns from isel. Intrinsic handling is still creating these nodes with 32-bit elements as well. But at least this gets rid of 8 and 16. Ideally, someday we'll convert the intrinsics to generic vector shuffles and remove the intrinsics. llvm-svn: 312702	2017-09-07 06:11:10 +00:00
Craig Topper	4a0c76faea	[X86] Remove patterns for selecting a v8f32 X86ISD::MOVSS or v4f64 X86ISD::MOVSD. I don't think we ever generate these. If we did, I would expect we would also be able to generate v16f32 and v8f64, but we don't have those patterns. llvm-svn: 312694	2017-09-07 05:08:16 +00:00
Craig Topper	ee95b66a3a	[X86] Move more isel patterns to X86InstrVecCompiler.td. NFC This moves more of our subvector insert/extract tricks to X86InstrVecCompiler.td and refactors them into multiclasses. llvm-svn: 312661	2017-09-06 19:03:55 +00:00
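
Note on 59a5ef9f69: the "suitable bit flipping" that lets (V)PHMINPOSUW serve SMAX/SMIN/UMAX as well as UMIN can be illustrated with a small scalar model. This is only a sketch under stated assumptions, not the code from the patch: the function names and the `FLIP` table are hypothetical, the XOR masks are derived from ordinary two's-complement reasoning rather than copied from the commit, and the index result of the real instruction is ignored.

```python
# Scalar model of a v8i16 horizontal reduction built on an unsigned-min primitive.
# All names below are hypothetical and exist only for this illustration.

def phminposuw(words):
    """Model of (V)PHMINPOSUW's value result: unsigned minimum of 8 words."""
    assert len(words) == 8 and all(0 <= w <= 0xFFFF for w in words)
    return min(words)

# XOR masks (assumed, derived from two's-complement ordering):
#   0x8000 maps signed order onto unsigned order,
#   0xFFFF reverses unsigned order,
#   0x7FFF does both at once.
FLIP = {"umin": 0x0000, "umax": 0xFFFF, "smin": 0x8000, "smax": 0x7FFF}

def reduce_v8i16(words, kind):
    """Flip the inputs, take the unsigned minimum, then flip the result back."""
    mask = FLIP[kind]
    return phminposuw([w ^ mask for w in words]) ^ mask

def to_signed(w):
    """Reinterpret a 16-bit pattern as a signed value."""
    return w - 0x10000 if w & 0x8000 else w
```

For example, `to_signed(reduce_v8i16(lanes, "smax"))` should agree with `max(to_signed(w) for w in lanes)` for any eight 16-bit patterns in `lanes`.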


1503 Commits
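
Note on 0b5789f3cc: the commit says that commuting VPERM2F128 only requires toggling bits 1 and 5 of the immediate and swapping the sources. The sketch below checks that claim against a simplified model of the instruction. The model is an assumption based on the documented immediate layout (bits 1:0 and 5:4 select a 128-bit half from the two sources, bits 3 and 7 zero the corresponding destination half); the function names are hypothetical and this is not the isel pattern itself.

```python
# Simplified model: each source is a pair (low_half, high_half) of opaque values.
# Function names are hypothetical and used only for this illustration.

def vperm2f128(src1, src2, imm):
    """Pick each 128-bit half of the result according to the 8-bit immediate."""
    halves = [src1[0], src1[1], src2[0], src2[1]]
    lo = 0 if imm & 0x08 else halves[imm & 0x3]         # bit 3 zeroes the low half
    hi = 0 if imm & 0x80 else halves[(imm >> 4) & 0x3]  # bit 7 zeroes the high half
    return (lo, hi)

def commute_imm(imm):
    """Toggle bits 1 and 5, the bits that choose between the two sources."""
    return imm ^ 0x22

def commute_is_equivalent(a, b):
    """Check every immediate: swapped sources plus toggled bits give the same result."""
    return all(
        vperm2f128(a, b, imm) == vperm2f128(b, a, commute_imm(imm))
        for imm in range(256)
    )
```

Calling `commute_is_equivalent(("a_lo", "a_hi"), ("b_lo", "b_hi"))` exercises all 256 immediates, which is why a load on the first input can be folded by swapping operands.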