mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-10-24 21:42:54 +02:00

212 Commits

Author SHA1 Message Date
Simon Pilgrim
eae94c8d81 [CostModel][X86] Fixed vXi8 uniform shift costs.
The 'fast' costs should only apply to shifts by uniform constants (uniform non-constant shifts are lowered using the slow default implementation).

Logical shifts were not taking into account that we must mask the psrlw result, so the costs needed to be doubled.

Added missing AVX2/AVX512BW costs as well.

llvm-svn: 291391
2017-01-08 14:14:36 +00:00
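
The doubling described in the commit above comes from how a byte-wise logical shift has to be emulated with a 16-bit shift. The following is a minimal Python sketch of that idea, assuming the lowering is a psrlw-style shift on 16-bit lanes followed by a per-byte mask; the function names are hypothetical and are not LLVM's.

```python
def psrlw(lanes16, amt):
    """Model a logical right shift of 16-bit lanes by a uniform amount."""
    return [(w >> amt) & 0xFFFF for w in lanes16]


def lshr_bytes_uniform(data, amt):
    """Shift every byte right by amt using a 16-bit shift plus a mask."""
    assert len(data) % 2 == 0 and 0 <= amt < 8
    # Pack pairs of bytes into 16-bit lanes (low byte first).
    lanes = [data[i] | (data[i + 1] << 8) for i in range(0, len(data), 2)]
    shifted = psrlw(lanes, amt)
    # Bits of the high byte leak into the low byte, so every byte is masked.
    mask = 0xFF >> amt
    out = []
    for w in shifted:
        out.append(w & 0xFF & mask)
        out.append((w >> 8) & mask)
    return out
```

The shift and the mask are two separate vector operations, which is why a cost of one per logical byte shift would be an underestimate.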
Simon Pilgrim
ff884adfb6 [CostModel][X86] Moved legal uniform shift costs earlier.
XOP was prematurely matching, doubling the cost of ashr/lshr uniform shifts.

llvm-svn: 291390
2017-01-08 13:12:03 +00:00
Simon Pilgrim
ca989f4346 [CostModel][X86] Update SSE41/AVX1 vXi32 SHL costs
SSE41 provides pmulld which allows the simpler pslld/paddd/cvttps2dq/pmulld pattern than SSE2's use of pmuludq.

llvm-svn: 291372
2017-01-07 22:27:43 +00:00
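
The pslld/paddd/cvttps2dq/pmulld pattern named above can be read as "build 2^n through the float exponent field, then multiply". The sketch below models one 32-bit lane in Python under that reading; it is an illustration of the arithmetic, not a transcription of the LLVM lowering, and the helper names are hypothetical.

```python
import struct

MASK32 = 0xFFFFFFFF


def pow2_via_float(n):
    """Return 2**n (mod 2**32) for 0 <= n <= 31 using the float exponent trick."""
    # pslld by 23 moves n into the exponent field; paddd adds the bits of 1.0f.
    bits = ((n << 23) + 0x3F800000) & MASK32
    as_float = struct.unpack("<f", struct.pack("<I", bits))[0]
    # cvttps2dq truncates to an integer; for n == 31 the hardware yields
    # 0x80000000, which the mask below reproduces.
    return int(as_float) & MASK32


def shl_lane(x, n):
    """Shift left by a per-lane amount via a pmulld-style 32-bit multiply."""
    return (x * pow2_via_float(n)) & MASK32
```

Because the shift amount is turned into a multiplier per lane, this handles non-uniform shift amounts with a fixed number of vector instructions.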
Simon Pilgrim
c8cdf126d3 [CostModel][X86] Fix AVX2 v16i16 shift 'splat' costs.
llvm-svn: 291366
2017-01-07 22:08:09 +00:00
Simon Pilgrim
269a611dc0 [CostModel][X86] Match 256-bit vector shift 'splat' costs for AVX2 and above
We were matching against general vector shift costs before the uniform splat costs

llvm-svn: 291365
2017-01-07 21:47:10 +00:00
Simon Pilgrim
5eef6afa17 [CostModel][AVX512BW] Add v32i16 vector shift costs for avx512bw targets.
llvm-svn: 291354
2017-01-07 17:54:10 +00:00
Simon Pilgrim
271b5ff570 [X86][AVX512] Use lowerShuffleAsRepeatedMaskAndLanePermute for non-VBMI v64i8 shuffles (PR31470)
llvm-svn: 291347
2017-01-07 15:37:50 +00:00
Simon Pilgrim
b938c434f9 [CostModel][X86] Add AVX512 and 512-bit vector shift cost tests.
llvm-svn: 291269
2017-01-06 19:41:26 +00:00
Chad Rosier
2e3e81f00b [AArch64] Reduce vector insert/extract cost for Falkor.
Differential Revision: https://reviews.llvm.org/D28403

llvm-svn: 291254
2017-01-06 18:03:26 +00:00
Simon Pilgrim
f29ec731bd [CostModel][X86] Fix 512-bit SDIV/UDIV 'big' costs.
Set the costs on the lowest target that supports the type.

llvm-svn: 291229
2017-01-06 11:12:53 +00:00
Simon Pilgrim
6597231ee4 [CostModel][X86] Add SDIV/UDIV cost tests for a wider range of targets
Added a test demonstrating a bug in AVX512 division costs

llvm-svn: 291228
2017-01-06 11:02:40 +00:00
Simon Pilgrim
87a1dcdf6d [CostModel][X86] Include the cost of 256-bit upper subvector extract/insertion in AVX1 v4i64 MUL
Matches other MUL/ADD/SUB 256-bit case on AVX1

llvm-svn: 291149
2017-01-05 18:20:25 +00:00
Chad Rosier
54356c6a0e [AArch64][CostModel] Add coverage for bswap intrinsics.
llvm-svn: 291140
2017-01-05 16:55:32 +00:00
Simon Pilgrim
e00b6e9f42 [CostModel][X86] Add support for broadcast shuffle costs
Currently only for broadcasts with input and output of the same width.

Differential Revision: https://reviews.llvm.org/D27811

llvm-svn: 291122
2017-01-05 15:56:08 +00:00
Chad Rosier
2caaab6e8f [AArch64] Remove mcpu option as this test is not target specific. NFC.
llvm-svn: 291117
2017-01-05 15:05:03 +00:00
Chad Rosier
4891053009 [AArch64] Remove unused arguments from tests. NFC.
llvm-svn: 291112
2017-01-05 14:48:53 +00:00
Simon Pilgrim
f1fa399ee0 [CostModel][X86] Updated vXi8 and vXi16 Reverse/Alternate shuffle costs
Actual codegen is much better than the extract+insert patterns that were assumed.

llvm-svn: 290962
2017-01-04 14:01:33 +00:00
Elena Demikhovsky
49038bfad5 Fixed shuffle-reverse cost on AVX-512.
(This change was approved in https://reviews.llvm.org/D28118, but Simon asked for it to be submitted separately).

llvm-svn: 290812
2017-01-02 11:44:10 +00:00
Elena Demikhovsky
ac6dc0e1f0 AVX-512 Loop Vectorizer: Cost calculation for interleave load/store patterns.
The X86 target does not provide any target-specific cost calculation for interleave patterns. It uses the common target-independent calculation, which gives very high numbers. As a result, the scalar version is chosen in many cases. The situation on AVX-512 is even worse, since we have 3-src shuffles that significantly reduce the cost.

In this patch I calculate the cost on AVX-512. This makes it possible to compare the interleave pattern with gather/scatter and choose the better solution (PR31426).

* Shuffle-broadcast cost will be changed in Simon's upcoming patch.

Differential Revision: https://reviews.llvm.org/D28118

llvm-svn: 290810
2017-01-02 10:37:52 +00:00
Simon Pilgrim
2882ed164e [X86][SSE] Improve lowering of vXi64 multiplies
As mentioned on PR30845, we were performing our vXi64 multiplication as:

AloBlo = pmuludq(a, b);
AloBhi = pmuludq(a, psrlqi(b, 32));
AhiBlo = pmuludq(psrlqi(a, 32), b);
return AloBlo + psllqi(AloBhi, 32) + psllqi(AhiBlo, 32);

when we could avoid one of the upper shifts with:

AloBlo = pmuludq(a, b);
AloBhi = pmuludq(a, psrlqi(b, 32));
AhiBlo = pmuludq(psrlqi(a, 32), b);
return AloBlo + psllqi(AloBhi + AhiBlo, 32);

This matches the lowering on gcc/icc.

Differential Revision: https://reviews.llvm.org/D27756

llvm-svn: 290267
2016-12-21 20:00:10 +00:00
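
The two multiply sequences in the commit above can be checked against each other on a single 64-bit lane. This is a small Python model, assuming pmuludq multiplies the low 32 bits of each operand and the psrlqi/psllqi names denote logical 64-bit shifts by an immediate; the Python function names are illustrative only.

```python
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF


def pmuludq(a, b):
    """Multiply the low 32 bits of each operand, giving a 64-bit product."""
    return ((a & MASK32) * (b & MASK32)) & MASK64


def psrlqi(a, n):
    return (a & MASK64) >> n


def psllqi(a, n):
    return (a << n) & MASK64


def mul64_three_shifts(a, b):
    """Original sequence: each cross product is shifted separately."""
    alo_blo = pmuludq(a, b)
    alo_bhi = pmuludq(a, psrlqi(b, 32))
    ahi_blo = pmuludq(psrlqi(a, 32), b)
    return (alo_blo + psllqi(alo_bhi, 32) + psllqi(ahi_blo, 32)) & MASK64


def mul64_two_shifts(a, b):
    """Improved sequence: add the cross products first, then shift once."""
    alo_blo = pmuludq(a, b)
    alo_bhi = pmuludq(a, psrlqi(b, 32))
    ahi_blo = pmuludq(psrlqi(a, 32), b)
    return (alo_blo + psllqi((alo_bhi + ahi_blo) & MASK64, 32)) & MASK64
```

Both forms agree because any bits lost when the sum of the cross products wraps at 64 bits would be shifted out of the result anyway.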
Matthew Simpson
bf784fec18 [AArch64] Guard Misaligned 128-bit store penalty by subtarget feature
This patch checks that the SlowMisaligned128Store subtarget feature is set
when penalizing such stores in getMemoryOpCost.

Differential Revision: https://reviews.llvm.org/D27677

llvm-svn: 289845
2016-12-15 18:36:59 +00:00
Simon Pilgrim
97b5f6a46b [CostModel][X86] Updated reverse shuffle costs
llvm-svn: 289819
2016-12-15 14:24:07 +00:00
Simon Pilgrim
6219e36a4f [CostModel] Fix long standing bug with reverse shuffle mask detection
Incorrect 'undef' mask index matching meant that broadcast shuffles could be detected as reverse shuffles

llvm-svn: 289811
2016-12-15 12:12:45 +00:00
Simon Pilgrim
d5264e8f7c [CostModel][X86] Add tests for reverse shuffle costs
llvm-svn: 289800
2016-12-15 10:45:53 +00:00
Haicheng Wu
20ce778776 [AArch64] Correct the check of signed 9-bit imm in isLegalAddressingMode()
In the addressing mode, signed 9-bit imm is [-256, 255], not [-512, 511].

Differential Revision: https://reviews.llvm.org/D27480

llvm-svn: 288876
2016-12-07 01:45:04 +00:00
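
The corrected range above follows directly from two's complement: an N-bit signed immediate covers [-2^(N-1), 2^(N-1) - 1]. A short Python check of that arithmetic (the helper names are hypothetical and unrelated to the LLVM source):

```python
def signed_imm_range(bits):
    """Return the inclusive (min, max) range of a signed immediate."""
    return -(1 << (bits - 1)), (1 << (bits - 1)) - 1


def is_signed_imm(value, bits):
    """Check whether value fits in a signed immediate of the given width."""
    lo, hi = signed_imm_range(bits)
    return lo <= value <= hi
```

The range [-512, 511] that the old check accepted is the one for a 10-bit signed immediate.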
Haicheng Wu
c157a2a55a [TTI/CostModel] Correct the way getGEPCost() calls isLegalAddressingMode()
Fix a bug when we call isLegalAddressingMode() from getGEPCost().

Differential Revision: https://reviews.llvm.org/D27357

llvm-svn: 288569
2016-12-03 01:57:24 +00:00
Guozhi Wei
27571a5d37 [ppc] Correctly compute the cost of loading 32/64 bit memory into VSR
VSX has the instructions lxsiwax/lxsdx, which can load a 32/64-bit value into a VSX register cheaply. This patch makes that known to the memory cost model, so the vectorization of the test case in pr30990 is beneficial.

Differential Revision: https://reviews.llvm.org/D26713

llvm-svn: 288560
2016-12-03 00:41:43 +00:00
Alexey Bataev
7135b5ad24 [SLP] Fixed cost model for horizontal reduction.
Currently, when the cost of scalar operations is evaluated, the vector type is
used for the scalar operations. This patch fixes that issue and fixes the
evaluation of the vector operations cost.
Several tests showed that the vector cost model is too optimistic. It
allowed vectorization of 8 or fewer add/fadd operations, though scalar
code is faster. Actually, only for 16 or more operations does vector code
provide better performance.

Differential Revision: https://reviews.llvm.org/D26277

llvm-svn: 288398
2016-12-01 18:42:42 +00:00
Alexey Bataev
68437dbc6c [SLP] Additional tests with the cost of vector operations.
llvm-svn: 288377
2016-12-01 17:26:54 +00:00
Alexey Bataev
ef66b3c144 Revert "[SLP] Additional tests with the cost of vector operations."
This reverts commit a61718435fc4118c82f8aa6133fd81f803789c1e.

llvm-svn: 288371
2016-12-01 16:45:04 +00:00
Alexey Bataev
7f4c68a45a [SLP] Additional tests with the cost of vector operations.
llvm-svn: 288369
2016-12-01 16:11:48 +00:00
Simon Pilgrim
b2804b00f4 [X86][AVX512] Add support for v2i64 fptosi/fptoui/sitofp/uitofp on AVX512DQ-only targets
Use 512-bit instructions with subvector insertion/extraction like we do in a number of similar circumstances

llvm-svn: 287882
2016-11-24 14:46:55 +00:00
Simon Pilgrim
5df905e2e1 [X86][AVX512] Add support for v4i64 fptosi/fptoui/sitofp/uitofp on AVX512DQ-only targets
Use 512-bit instructions with subvector insertion/extraction like we do in a number of similar circumstances

llvm-svn: 287762
2016-11-23 14:01:18 +00:00
Simon Pilgrim
1f99fea9ae [CostModel][X86] Add missing AVX512DQ v8i64 fptosi/sitofp costs
llvm-svn: 287760
2016-11-23 13:42:09 +00:00
Simon Pilgrim
648b765007 [CostModel][X86] Add v2f32 -> v2i64 fptosi/fptoui cost tests
llvm-svn: 287756
2016-11-23 11:43:00 +00:00
Simon Pilgrim
b94dcea1ae [CostModel][X86] Updated sitofp/uitofp scalar/vector cost tests
Better coverage of all legal types + special cases.

Removed old fptoui tests which are all handled in fptoui.ll

llvm-svn: 287678
2016-11-22 18:55:49 +00:00
Craig Topper
d5ba7319d4 [AVX-512] Support FCOPYSIGN for v16f32 and v8f64
Summary:
This extends FCOPYSIGN support to 512-bit vectors.

I've also added tests to show what the 128-bit and 256-bit cases look like with broadcast loads.

Reviewers: delena, zvi, RKSimon, spatel

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D26791

llvm-svn: 287298
2016-11-18 02:25:34 +00:00
Simon Pilgrim
50c2f1dd68 [CostModel][X86] Added mul costs for vXi8 vectors
More realistic v16i8/v32i8/v64i8 MUL costs - we have to extend to vXi16, use PMULLW and then truncate the result

llvm-svn: 286838
2016-11-14 15:54:24 +00:00
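
The extend/multiply/truncate sequence mentioned above can be modelled per element. The Python sketch below assumes zero extension to 16 bits and a PMULLW-style multiply that keeps the low 16 bits of each product; the function name is hypothetical.

```python
def mul_bytes_via_i16(a, b):
    """Multiply byte vectors element-wise by widening to 16-bit lanes."""
    assert len(a) == len(b)
    # Extend each byte to a 16-bit lane.
    wide_a = [x & 0xFF for x in a]
    wide_b = [y & 0xFF for y in b]
    # 16-bit multiply keeping the low half of each product.
    products = [(x * y) & 0xFFFF for x, y in zip(wide_a, wide_b)]
    # Truncate each lane back to a byte.
    return [p & 0xFF for p in products]
```

The low 8 bits of the product do not depend on whether the inputs were sign or zero extended, so either extension gives the same truncated result.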
Simon Pilgrim
bf629ee6cc [X86][AVX] Fixed v16i16/v32i8 ADD/SUB costs on AVX1 subtargets
Add explicit v16i16/v32i8 ADD/SUB costs, matching the costs of v4i64/v8i32 - they were missing for some reason.

This has side effects on the LV max bandwidth tests (AVX1 now prefers 128-bit vectors vs AVX2 which still prefers 256-bit)

llvm-svn: 286832
2016-11-14 14:45:16 +00:00
Simon Pilgrim
f59c806c0f [VectorLegalizer] Expansion of CTLZ using CTPOP when possible
This patch avoids scalarization of CTLZ by instead expanding to use CTPOP (ref: "Hacker's Delight") when the necessary operations are available.

This also adds the necessary cost models for X86 SSE2 targets (the main beneficiary) to ensure vectorization only happens when it's useful.

Differential Revision: https://reviews.llvm.org/D25910

llvm-svn: 286233
2016-11-08 14:10:28 +00:00
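
The CTLZ-via-CTPOP expansion referenced above is usually described as: smear the highest set bit into every lower position, then count the set bits. The following Python sketch shows that technique for one element of a power-of-two width; it is a generic illustration of the "Hacker's Delight" approach and makes no claim about the exact node sequence the legalizer emits.

```python
def ctpop(x):
    """Population count of a non-negative integer."""
    return bin(x).count("1")


def ctlz_via_ctpop(x, width=32):
    """Count leading zeros using shifts, ORs, and a population count."""
    x &= (1 << width) - 1
    # Propagate the highest set bit into all lower bit positions.
    shift = 1
    while shift < width:
        x |= x >> shift
        shift <<= 1
    # Every bit at or below the highest set bit is now 1.
    return width - ctpop(x)
```

Every step is a shift, OR, or population count, all of which can be applied to whole vectors, which is what lets the expansion avoid scalarization.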
Alexey Bataev
40602a91a6 Improved cost model for FDIV and FSQRT, by Andrew Tischenko
There is a bug describing a poor cost model for floating point operations:
Bug 29083 - [X86][SSE] Improve costs for floating point operations. This
patch is the second one in a series of patches dealing with the cost model.

Differential Revision: https://reviews.llvm.org/D25722

llvm-svn: 285564
2016-10-31 12:10:53 +00:00
Simon Pilgrim
68d206a9d2 [X86][AVX512] Fix MUL v8i64 costs on non-AVX512DQ targets
llvm-svn: 285329
2016-10-27 18:32:06 +00:00
Simon Pilgrim
268d797e5a [X86][AVX512DQ] Improve lowering of MUL v2i64 and v4i64
With DQI but without VLX, lower v2i64 and v4i64 MUL operations with v8i64 MUL (vpmullq).

Updated cost table accordingly.

Differential Revision: https://reviews.llvm.org/D26011

llvm-svn: 285304
2016-10-27 15:27:00 +00:00
Simon Pilgrim
464fee80a0 [CostModel][X86] Added tests for current integer signed/unsigned remainder costs
llvm-svn: 284940
2016-10-23 18:35:02 +00:00
Simon Pilgrim
9a5c4b864b [X86][SSE] Add SSE41/AVX1 costs for vector shifts.
We were defaulting to SSE2 costs which weren't taking into account the availability of PBLENDW/PBLENDVB to improve merging of per-element shift results.

llvm-svn: 284939
2016-10-23 16:49:04 +00:00
Simon Pilgrim
6405c6dd36 [CostModel][X86] Added tests for current integer trunc costs
llvm-svn: 284938
2016-10-23 15:17:52 +00:00
Simon Pilgrim
6773ac2510 [CostModel][X86] Fixed AVX1/AVX512 sdiv/udiv uniformconst costs for 256/512 bit integer vectors
We weren't checking for uniform const costs before the general cost, resulting in very high estimates.

llvm-svn: 284755
2016-10-20 18:00:35 +00:00
Simon Pilgrim
aaf8c094e3 [CostModel][X86] Added tests for sdiv/udiv costs for uniform const and uniform const power-of-2
Shows poor costings in AVX1/AVX512BW for certain vector types

llvm-svn: 284748
2016-10-20 17:16:38 +00:00
Simon Pilgrim
9091f77856 [CostModel][X86] Fixed AVX1/AVX512 sdiv/udiv general costs for 256/512 bit integer vectors
We weren't accounting for legal types on every subtarget, meaning that many of the costs were using defaults.

We still don't correctly cost (or test) the 512-bit sdiv/udiv by uniform const cases, nor the power-of-2 cases.

llvm-svn: 284744
2016-10-20 16:39:11 +00:00
Simon Pilgrim
696a79808d [CostModel][X86] Added tests for sdiv/udiv costs for scalar and 128/256/512 bit integer vectors
Shows current bug in AVX1/AVX512BW costs for 256 bit vector types

llvm-svn: 284723
2016-10-20 12:34:00 +00:00