1
0
mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-10-25 22:12:57 +02:00
Commit Graph

394 Commits

Author SHA1 Message Date
Elena Demikhovsky
7d3b86db52 AVX-512: Added VBROADCASTF64X4, VBROADCASTF64X2, VBROADCASTI32X8, and other instructions from this set
Added encoding tests.

llvm-svn: 237557
2015-05-18 06:42:57 +00:00
Elena Demikhovsky
1e28397b5a AVX-512: fixed a bug in mask operations - (i1 1) pattern
Filling k-reg with all-ones value was wrong,
(i1 1) should switch on only one bit in mask register

llvm-svn: 237536
2015-05-17 07:28:51 +00:00
Elena Demikhovsky
0803046ed4 AVX-512: fixed a bug in encoding of VPSRAQ instrcution,
added a bunch of encoding tests.

llvm-svn: 237232
2015-05-13 07:35:05 +00:00
Elena Demikhovsky
f25b492812 AVX-512: Added SKX instructions and intrinsics:
{add/sub/mul/div/} x {ps/pd} x {128/256} 2. max/min with sae

By Asaf Badouh (asaf.badouh@intel.com)

llvm-svn: 236971
2015-05-11 06:05:05 +00:00
Elena Demikhovsky
1ed7ba869f AVX-512: fixed a bug in i1 vectors lowering
llvm-svn: 236947
2015-05-10 10:33:32 +00:00
Elena Demikhovsky
28f6bb84a5 AVX-512: Added all forms of FP compare instructions for KNL and SKX.
Added intrinsics for the instructions. CC parameter of the intrinsics was changed from i8 to i32 according to the spec.

By Igor Breger (igor.breger@intel.com)

llvm-svn: 236714
2015-05-07 11:24:42 +00:00
Elena Demikhovsky
16b6cc68cf AVX-512: added calling convention for i1 vectors in 32-bit mode.
Fixed some bugs in extend/truncate for AVX-512 target.
Removed VBROADCASTM (masked broadcast) node, since it is not used any more.

llvm-svn: 236420
2015-05-04 12:40:50 +00:00
Elena Demikhovsky
40362f45c8 AVX-512: added integer "add" and "sub" instructions with saturation for SKX
with intrinsics and tests

by Asaf Badouh (asaf.badouh@intel.com)

llvm-svn: 236418
2015-05-04 12:35:55 +00:00
Elena Demikhovsky
5b00c277f4 AVX-512: Added VPACK* instructions forms for KNL and SKX
and their intrinsics
by Asaf Badouh (asaf.badouh@intel.com)

llvm-svn: 236414
2015-05-04 09:14:02 +00:00
Elena Demikhovsky
201b5c4641 Masked gather and scatter - added DAGCombine visitors
and AVX-512 instruction selection patterns.
All other patches, including tests will follow.

http://reviews.llvm.org/D7665

llvm-svn: 236211
2015-04-30 08:38:48 +00:00
Elena Demikhovsky
3485573818 AVX-512: Extend/Truncate operations for SKX,
SETCC for bit-vectors

llvm-svn: 235875
2015-04-27 12:57:59 +00:00
Elena Demikhovsky
13b5e09c11 AVX-512: Added VPMOVx2M instructions for SKX,
fixed encoding of VPMOVM2x.

llvm-svn: 235385
2015-04-21 14:38:31 +00:00
Elena Demikhovsky
61a239b83c AVX-512: Added VPTESTM and VPTESTNM instructions for SKX
llvm-svn: 235383
2015-04-21 13:13:46 +00:00
Elena Demikhovsky
abf0138a81 AVX-512: Added logical and arithmetic instructions for SKX
by Asaf Badouh (asaf.badouh@intel.com)

llvm-svn: 235375
2015-04-21 10:27:40 +00:00
Elena Demikhovsky
74d944b41a AVX-512: intrinsics for VPADD, VPMULDQ and VPSUB
by Asaf Badouh (asaf.badouh@intel.com)

llvm-svn: 233906
2015-04-02 10:51:40 +00:00
Elena Demikhovsky
0e38b477c5 AVX-512: blank lines, duplicated tests, no functional changes
see comments http://reviews.llvm.org/D6835

llvm-svn: 233528
2015-03-30 09:29:28 +00:00
Elena Demikhovsky
16b58bef45 AVX-512: Fixed the "commutative" property flag in VPANDN instruction
By Asaf Badouh (asaf.badouh@intel.com)

llvm-svn: 233489
2015-03-29 09:14:29 +00:00
Elena Demikhovsky
e21b4ac3e7 AVX-512: Added encoding tests for VPROR, VPROL instructions,
fixed opcode.

llvm-svn: 232018
2015-03-12 07:28:41 +00:00
Elena Demikhovsky
320252ae4d AVX-512: Added SKX forms of shift instructions.
Added rotation instructions, encoding only.
Added encoding tests for all these forms.

llvm-svn: 231916
2015-03-11 10:25:42 +00:00
Elena Demikhovsky
a71b2e475e AVX-512, SKX: Enabled masked_load/store operations for this target.
Added lowering for ISD::CONCAT_VECTORS and ISD::INSERT_SUBVECTOR for i1 vectors,
it is needed to pass all masked_memop.ll tests for SKX.

llvm-svn: 231371
2015-03-05 15:11:35 +00:00
Elena Demikhovsky
04be7be81d AVX-512: Moved patterns for masked load/store under avx_store, avx_load classes.
No functional changes.

llvm-svn: 231069
2015-03-03 15:03:35 +00:00
Elena Demikhovsky
769ec279fb AVX-512: Simplified MOV patterns, no functional changes.
llvm-svn: 230954
2015-03-02 12:46:21 +00:00
Craig Topper
60ae8a07f3 [X86] Fix diassembler crash on AVX512 cmpps/cmppd with immediate that doesn't fit in 5-bits. Fixes PR22743.
llvm-svn: 230924
2015-03-02 00:22:29 +00:00
Elena Demikhovsky
9eda5391f2 Reverted 230471 - gather scatter handling in table gen.
llvm-svn: 230892
2015-03-01 08:23:41 +00:00
Elena Demikhovsky
e032aa37b6 AVX-512: Added mask and rounding mode for scalar arithmetics
Added more tests for scalar instructions to destinguish between AVX and AVX-512 forms.

llvm-svn: 230891
2015-03-01 07:44:04 +00:00
Elena Demikhovsky
0e7ac15634 AVX-512: Gather and Scatter patterns
Gather and scatter instructions additionally write to one of the source operands - mask register.
In this case Gather has 2 destination values - the loaded value and the mask.
Till now we did not support code gen pattern for gather - the instruction was generated from 
intrinsic only and machine node was hardcoded.
When we introduce the masked_gather node, we need to select instruction automatically,
in the standard way.
I added a flag "hasTwoExplicitDefs" that allows to handle 2 destination operands.

(Some code in the X86InstrFragmentsSIMD.td is commented out, just to split one big
patch in many small patches)

llvm-svn: 230471
2015-02-25 09:46:31 +00:00
Elena Demikhovsky
b15d81ba19 AVX-512: recommitted 229837 + bugfix + test
llvm-svn: 230223
2015-02-23 15:12:31 +00:00
Eric Christopher
c93875565e Revert "AVX-512: Full implementation for VRNDSCALESS/SD instructions and intrinsics."
The instructions were being generated on architectures that don't support avx512.

This reverts commit r229837.

llvm-svn: 229942
2015-02-20 00:45:28 +00:00
Eric Christopher
2da0c6081c Add a license header to the AVX512 file.
llvm-svn: 229941
2015-02-20 00:36:53 +00:00
Elena Demikhovsky
41438d50e6 AVX-512: Full implementation for VRNDSCALESS/SD instructions and intrinsics.
llvm-svn: 229837
2015-02-19 10:48:04 +00:00
Elena Demikhovsky
2ae2229fab AVX-512: Added support for FP instructions with embedded rounding mode.
By Asaf Badouh <asaf.badouh@intel.com>

llvm-svn: 229645
2015-02-18 07:59:20 +00:00
Elena Demikhovsky
30ee20b16b AVX-512: changes in intel_ocl_bi calling conventions
- added mask types v8i1 and v16i1 to possible function parameters
- enabled passing 512-bit vectors in standard CC
- added a test for KNL intel_ocl_bi conventions

llvm-svn: 229482
2015-02-17 09:20:12 +00:00
Elena Demikhovsky
d8fd06be73 AVX-512: Fixed the "test" operation for i1 type
Using KORTESTW for comparison i1 value with zero was wrong since the instruction tests 16 bits.
KORTESTW may be used with KSHIFTL+KSHIFTR that clean the 15 upper bits.
I removed (X86cmp i1, 0) pattern and zero-extend i1 to i8 and then use TESTB.

There are some cases where i1 is in the mask register and the upper bits are already zeroed.
Then KORTESTW is the better solution, but it is subject for optimization.
Meanwhile, I'm fixing the correctness issue.

llvm-svn: 228916
2015-02-12 08:40:34 +00:00
Craig Topper
a684f06383 [X86] Remove 'memop' uses from AVX512. Use 'load' instead.
llvm-svn: 228562
2015-02-09 04:04:50 +00:00
Elena Demikhovsky
237f19f35f AVX-512: Added FMA intrinsics with rounding mode
By Asaf Badouh and Elena Demikhovsky

Added special nodes for rounding: FMADD_RND, FMSUB_RND..
It will prevent merge between nodes with rounding and other standard nodes.

llvm-svn: 227303
2015-01-28 10:21:27 +00:00
Craig Topper
fdec9f588d [X86] Teach disassembler to handle illegal immediates on AVX512 integer compare instructions.
llvm-svn: 227302
2015-01-28 10:09:56 +00:00
Elena Demikhovsky
6889f421a2 AVX-512: Changes in operations on masks registers for KNL and SKX
- Added KSHIFTB/D/Q for skx
- Added KORTESTB/D/Q for skx
- Fixed store operation for v8i1 type for KNL
- Store size of v8i1, v4i1 and v2i1 are changed to 8 bits

llvm-svn: 227043
2015-01-25 12:47:15 +00:00
Craig Topper
568161290d [X86] Give scalar VRNDSCALE instructions priority in AVX512 mode.
llvm-svn: 227039
2015-01-25 08:49:22 +00:00
Craig Topper
410d43b01b Simplify a multiclass. No functional change.
llvm-svn: 227038
2015-01-25 08:49:19 +00:00
Craig Topper
011934eb9c [X86] Replace i32i8imm on SSE/AVX instructions with i32u8imm which will make the assembler bounds check them. It will also make them print as unsigned.
llvm-svn: 227032
2015-01-25 02:21:16 +00:00
Craig Topper
88aba4703b [X86] Use u8imm in several places that used i32i8imm that don't require an i32 type.
llvm-svn: 227031
2015-01-25 02:21:13 +00:00
Craig Topper
9d9a3e1bb4 [X86] Add IntrNoMem to the AVX512 conflict intrinsics.
llvm-svn: 226897
2015-01-23 06:11:45 +00:00
Craig Topper
70ebe9380b Revert r226798. Guess I missed the patterns.
llvm-svn: 226802
2015-01-22 09:01:20 +00:00
Craig Topper
a72d3b9171 Use u8imm instead of i32i8imm on a couple instructions that have no patterns and thus no reason to use a larger operand size.
llvm-svn: 226798
2015-01-22 08:53:11 +00:00
Craig Topper
120c2e1291 [X86] Remove some unused multiclasses from AVX512 instruction file.
llvm-svn: 226797
2015-01-22 08:53:08 +00:00
Craig Topper
f3ec7f2d5a [X86] Convert all the i8imm used by AVX512 and MMX instructions to u8imm.
llvm-svn: 226646
2015-01-21 08:43:49 +00:00
Craig Topper
8282659ccb [x86] Add assembly parser bounds checking to the immediate value for cmpss/cmpsd/cmpps/cmppd.
llvm-svn: 226642
2015-01-21 06:07:53 +00:00
Craig Topper
4312f7bdd7 [x86] Add some mayLoad/hasSideEffects flags. Remove one that was already covered by a pattern.
llvm-svn: 226562
2015-01-20 12:15:30 +00:00
Craig Topper
3d7b2aaaae [x86] Change AVX512 intrinsics to take a 8-bit immediate for the comparision kind instead of a 32-bit immediate. This better aligns with the emitted instruction. It also matches SSE and AVX1 equivalents. Also add auto upgrade support.
llvm-svn: 226430
2015-01-19 06:07:27 +00:00
Adam Nemet
9fe8c32290 [AVX512] Add intrinsics for masked aligned FP loads and stores
Similar to the unaligned cases.

Test was generated with update_llc_test_checks.py.

Part of <rdar://problem/17688758>

llvm-svn: 226296
2015-01-16 18:50:09 +00:00
Craig Topper
d86e76971b Hide some redundant AVX512 instructions from the asm parser, but force them to show up in the disassembler.
llvm-svn: 226155
2015-01-15 09:37:15 +00:00
Craig Topper
cf2aae6d68 [x86] Prevent instruction selection of AVX512 cmp.ps/pd/ss/sd intrinsics with illegal immediates. Correctly this time. I did the wrong patterns the first time.
llvm-svn: 224891
2014-12-27 20:08:45 +00:00
Craig Topper
d899e6932a [x86] Prevent instruction selection of AVX512 cmp.ps/pd/ss/sd intrinsics with illegal immediates. Forgot to do this when I did SSE/SSE2/AVX/AVX2.
llvm-svn: 224887
2014-12-27 18:51:06 +00:00
Elena Demikhovsky
bb8ca1f551 AVX-512: Added FMA instructions, intrinsics an tests for KNL and SKX targets
by Asaf Badouh

http://reviews.llvm.org/D6456

llvm-svn: 224764
2014-12-23 10:30:39 +00:00
Elena Demikhovsky
cfbcf5995c AVX-512: BLENDM - fixed encoding of the broadcast version
Added more intrinsics and encoding tests.

llvm-svn: 224760
2014-12-23 09:36:28 +00:00
Elena Demikhovsky
aeb7ff5f14 AVX-512: Added all forms of BLENDM instructions,
intrinsics, encoding tests for AVX-512F and skx instructions.

llvm-svn: 224707
2014-12-22 13:52:48 +00:00
Elena Demikhovsky
744da8554e Masked load and store codegen - fixed 128-bit vectors
The codegen failed on 128-bit types on AVX2.
I added patterns and in td files and tests.

llvm-svn: 224647
2014-12-19 23:27:57 +00:00
Robert Khasanov
8231be9f66 [AVX512] Add a comment for avx512_broadcast_pat multiclass
llvm-svn: 224341
2014-12-16 16:12:11 +00:00
Elena Demikhovsky
51c511a201 AVX-512: Added EXPAND instructions and intrinsics.
llvm-svn: 224241
2014-12-15 10:03:52 +00:00
Robert Khasanov
634dbbea7c [AVX512] Minor fix in lowering pattern for broadcast intrustions.
No functional change.

llvm-svn: 224122
2014-12-12 14:21:30 +00:00
Cameron McInally
a7f40d9986 [AVX512] Add support for 512b variable bit shift intrinsics.
llvm-svn: 224028
2014-12-11 17:13:05 +00:00
Elena Demikhovsky
e879b19906 AVX-512: Added all forms of COMPRESS instruction
+ intrinsics + tests

llvm-svn: 224019
2014-12-11 15:02:24 +00:00
Robert Khasanov
14af293376 [AVX512] Added lowering for VBROADCASTSS/SD instructions.
Lowering patterns were written through avx512_broadcast_pat multiclass as pattern generates VBROADCAST and COPY_TO_REGCLASS nodes.
Added lowering tests.

llvm-svn: 223804
2014-12-09 18:45:30 +00:00
Robert Khasanov
8a2292f8b6 [AVX512] Added VPBROADCAST{BWDQ} (Load with Broadcast Integer Data from General Purpose Register) encodings for AVX512-BW/VL subsets
Added encoding tests.
        

llvm-svn: 223787
2014-12-09 16:38:41 +00:00
Elena Demikhovsky
b30aead98b AVX-512: Added some comments to ERI scalar intrinsics.
No functional change.

llvm-svn: 223761
2014-12-09 07:06:32 +00:00
Elena Demikhovsky
befed29343 Masked Load / Store Intrinsics - the CodeGen part.
I'm recommiting the codegen part of the patch.
The vectorizer part will be send to review again.

Masked Vector Load and Store Intrinsics.
Introduced new target-independent intrinsics in order to support masked vector loads and stores. The loop vectorizer optimizes loops containing conditional memory accesses by generating these intrinsics for existing targets AVX2 and AVX-512. The vectorizer asks the target about availability of masked vector loads and stores.
Added SDNodes for masked operations and lowering patterns for X86 code generator.
Examples:
<16 x i32> @llvm.masked.load.v16i32(i8* %addr, <16 x i32> %passthru, i32 4 /* align */, <16 x i1> %mask)
declare void @llvm.masked.store.v8f64(i8* %addr, <8 x double> %value, i32 4, <8 x i1> %mask)

Scalarizer for other targets (not AVX2/AVX-512) will be done in a separate patch.

http://reviews.llvm.org/D6191

llvm-svn: 223348
2014-12-04 09:40:44 +00:00
Michael Liao
59822d4755 [X86] Clean up whitespace as well as minor coding style
llvm-svn: 223339
2014-12-04 05:20:33 +00:00
Duncan P. N. Exon Smith
73ce6dbb2b Revert "Masked Vector Load and Store Intrinsics."
This reverts commit r222632 (and follow-up r222636), which caused a host
of LNT failures on an internal bot.  I'll respond to the commit on the
list with a reproduction of one of the failures.

Conflicts:
	lib/Target/X86/X86TargetTransformInfo.cpp

llvm-svn: 222936
2014-11-28 21:29:14 +00:00
Elena Demikhovsky
868b76ae69 AVX-512: Scalar ERI intrinsics
including SAE mode and memory operand.
Added AVX512_maskable_scalar template, that should cover all scalar instructions in the future.

The main difference between AVX512_maskable_scalar<> and AVX512_maskable<> is using X86select instead of vselect.
I need it, because I can't create vselect node for MVT::i1 mask for scalar instruction.

http://reviews.llvm.org/D6378

llvm-svn: 222820
2014-11-26 10:46:49 +00:00
Cameron McInally
c32dadfa69 [AVX512] Add 512b integer shift by variable intrinsics and patterns.
llvm-svn: 222786
2014-11-25 20:41:51 +00:00
Craig Topper
3ac6865792 Remove space before tab in all AVX512 mnemonic strings.
llvm-svn: 222778
2014-11-25 20:11:23 +00:00
Elena Demikhovsky
36a2243ab7 Masked Vector Load and Store Intrinsics.
Introduced new target-independent intrinsics in order to support masked vector loads and stores. The loop vectorizer optimizes loops containing conditional memory accesses by generating these intrinsics for existing targets AVX2 and AVX-512. The vectorizer asks the target about availability of masked vector loads and stores.
Added SDNodes for masked operations and lowering patterns for X86 code generator.
Examples:
<16 x i32> @llvm.masked.load.v16i32(i8* %addr, <16 x i32> %passthru, i32 4 /* align */, <16 x i1> %mask)
declare void @llvm.masked.store.v8f64(i8* %addr, <8 x double> %value, i32 4, <8 x i1> %mask)

Scalarizer for other targets (not AVX2/AVX-512) will be done in a separate patch.

http://reviews.llvm.org/D6191

llvm-svn: 222632
2014-11-23 08:07:43 +00:00
Craig Topper
eb88940e3d [x86] Remove two redundant isel patterns. They equivalent already exists in the instruction pattern.
llvm-svn: 222094
2014-11-16 09:24:16 +00:00
Cameron McInally
8882e35937 [AVX512] Add 512b masked integer shift by immediate patterns.
llvm-svn: 222002
2014-11-14 15:43:00 +00:00
Elena Demikhovsky
2450049261 AVX-512: Intrinsics for ERI
3 instructions: vrcp28, vrsqrt28, vexp2, only vector forms.
Intrinsics include SAE (Suppres All Exceptions) parameter.

http://reviews.llvm.org/D6214

llvm-svn: 221774
2014-11-12 07:31:03 +00:00
Robert Khasanov
3e398a3800 [AVX512] Added VBROADCAST{SS/SD} encoding for VL subset.
Refactored through AVX512_maskable
        

llvm-svn: 220908
2014-10-30 14:21:47 +00:00
Robert Khasanov
b26f057244 [AVX512] Implemented AVX512VL FP bnary packed instructions (VADDP*, VSUBP*, VMULP*, VDIVP*, VMAXP*, VMINP*)
Refactored through AVX512_maskable
Added encoding tests for them.

llvm-svn: 220858
2014-10-29 15:43:02 +00:00
Robert Khasanov
4bea47e6eb [AVX512] Fix VSQRT packed instructions internal names.
No functional change

llvm-svn: 220808
2014-10-28 18:22:41 +00:00
Robert Khasanov
2ca56ad410 [AVX512] Extended avx512_sqrt_packed (sqrt instructions) to VL subset.
Refactored through AVX512_maskable

llvm-svn: 220806
2014-10-28 18:15:20 +00:00
Robert Khasanov
d134194df8 [AVX-512] Expanded rsqrt/rcp instructions to VL subset.
Refactored multiclass through AVX512_maskable

llvm-svn: 220783
2014-10-28 16:37:13 +00:00
Robert Khasanov
a87de33c61 [AVX512] Bring back vector-shuffle lowering support through broadcasts
Ffter commit at rev219046 512-bit broadcasts lowering become non-optimal. Most of tests on broadcasting and embedded broadcasting were changed and they doesn’t produce efficient code.

Example below is from commit changes (it’s the first test from test/CodeGen/X86/avx512-vbroadcast.ll):

 define   <16 x i32> @_inreg16xi32(i32 %a) {
 ; CHECK-LABEL: _inreg16xi32:
 ; CHECK:       ## BB#0:
-; CHECK-NEXT:    vpbroadcastd %edi, %zmm0
+; CHECK-NEXT:    vmovd %edi, %xmm0
+; CHECK-NEXT:    vpbroadcastd %xmm0, %ymm0
+; CHECK-NEXT:    vinserti64x4 $1, %ymm0, %zmm0, %zmm0
 ; CHECK-NEXT:    retq
 %b = insertelement <16 x i32> undef, i32 %a, i32 0
 %c = shufflevector <16 x i32> %b, <16 x i32> undef, <16 x i32> zeroinitializer
 ret <16 x i32> %c
}

Here, 256-bit broadcast was generated instead of 512-bit one.

In this patch
1) I added vector-shuffle lowering through broadcasts
2) Removed asserts and branches likes because this is incorrect
-  assert(Subtarget->hasDQI() && "We can only lower v8i64 with AVX-512-DQI");
3) Fixed lowering tests

llvm-svn: 220774
2014-10-28 12:28:51 +00:00
Adam Nemet
f35a0bba1b [AVX512] Add vpermil variable version
This is implemented via a multiclass that derives from the vperm imm
multiclass.

Fixes <rdar://problem/18426089>

llvm-svn: 220737
2014-10-27 23:08:40 +00:00
Adam Nemet
134480598b [AVX512] Clean up avx512_perm_imm to use X86VectorVTInfo
No functionality change.  No change in X86.td.expanded except that we only set
the CD8 attributes for the memory variants.  (This shouldn't be used unless we
have a memory operand.)

llvm-svn: 220736
2014-10-27 23:08:37 +00:00
Adam Nemet
8cf84f2568 [AVX512] Derive vpermil* from avx512_perm_imm
This used to derive from avx512_pshuf_imm which is confusing.

NFC.  Compared X86.td.expanded.

llvm-svn: 220735
2014-10-27 23:08:34 +00:00
Adam Nemet
78db8293e5 [AVX512] Fix copy-and-paste bugs in vpermil
1) i512mem -> f512mem (this is the packed FP input being permuted)
2) element size is 64 bits in EVEX_CD8 for PD.

(A good illustration why X86VectorVTInfo is useful)

llvm-svn: 220734
2014-10-27 23:08:31 +00:00
Elena Demikhovsky
467839f2f5 AVX-512: Fixed encoding of VPBROADCASTM and added SKX forms of this instruction
llvm-svn: 220638
2014-10-26 09:52:24 +00:00
Adam Nemet
01508c4733 [AVX512] FMA support for the 231 variants
This is asm/diasm-only support, similar to AVX.

For ISeling the register variant, they are no different from 213 other than
whether the multiplication or the addition operand is destructed.

For ISeling the memory variant, i.e. to fold a load, they are no different
than the 132 variant.  The addition operand (op3) in both cases can come from
memory.  Again the ony difference is which operand is destructed.

There could be a post-RA pass that would convert a 213 or 132 into a 231.

Part of <rdar://problem/17082571>

llvm-svn: 220540
2014-10-24 00:03:00 +00:00
Adam Nemet
3beded2b5d [AVX512] Introduce fma3p_forms from AVX
This multiclass generates the different forms: 213, 231, 132 in AVX.

132 in AVX512 is a separate class but I am planning to use this same
multiclass to generate 231 relying on the nice the null_frag trick from AVX to
disable codegen pattern for 231.

No functionality change, no change in X86.td.expanded except for the different
instruction definition names.

llvm-svn: 220539
2014-10-24 00:02:55 +00:00
Adam Nemet
e7c4f25494 [AVX512] Add DQ subvector inserts
In AVX512f we support 64x2 and 32x8 inserts via matching them to 32x4 and 64x4
respectively.  These are matched by "Alt" Pat<>'s (Alt stands for alternative
VTs).

Since DQ has native support for these intructions, I peeled off the non-"Alt"
part of the baseclass into vinsert_for_size_no_alt. The DQ instructions are
derived from this multiclass.  The "Alt" Pat<>'s are disabled with DQ.

Fixes <rdar://problem/18426089>

llvm-svn: 219874
2014-10-15 23:42:17 +00:00
Adam Nemet
a3b1e47840 [AVX512] Two new attributes in X86VectorVTInfo for subvector insert
The new attributes are NumElts and the CD8TupleForm.  This prepares the code
to enable x8 and x2 inserts.

NFC, no change in X86.td.expanded except for the new attributes.

llvm-svn: 219871
2014-10-15 23:42:09 +00:00
Adam Nemet
ea53faaf7d [AVX512] Rename arg from Opcode32/64 to Opcode128/256 in vinsert_for_size
It's the W bit that selects between 32 or 64 elt type and not the opcode.  The
opcode selects between the width of the insert (128 or 256).

llvm-svn: 219870
2014-10-15 23:42:04 +00:00
Robert Khasanov
1896253278 [AVX512] Extended avx512_binop_rm to DQ/VL subsets.
Added encoding tests.

llvm-svn: 219686
2014-10-14 15:13:56 +00:00
Robert Khasanov
518a523445 [AVX512] Extended avx512_binop_rm to BW/VL subsets.
Added encoding tests.

llvm-svn: 219685
2014-10-14 14:36:19 +00:00
Robert Khasanov
625ba0e53e [AVX512] Extended avx512_binop_rm for AVX512VL subsets.
Added avx512_binop_rm_vl multiclass for VL subset
Added encoding tests

llvm-svn: 219390
2014-10-09 08:38:48 +00:00
Adam Nemet
248c8e9281 [AVX512] Rename AVX512_masking* to AVX512_maskable*
No functional change.

This is the current AVX512_maskable multiclass hierarchy:

                 maskable_custom
                    /       \
                   /         \
          maskable_common   maskable_in_asm
            /         \
           /           \
      maskable        maskable_3src

llvm-svn: 219363
2014-10-08 23:25:39 +00:00
Adam Nemet
9fae2c02a0 [AVX512] Intrinsics for vextract*x4
This adds the Pat<>'s for the intrinsics.  These are necessary because we
don't lower these intrinsics to SDNodes but match them directly.  See the
rational in the previous commit.

llvm-svn: 219362
2014-10-08 23:25:37 +00:00
Adam Nemet
671fc00888 [AVX512] Add asm-only support for vextract*x4 masking variants
These derive from the new asm-only masking definitions.

Unfortunately I wasn't able to find a ISel pattern that we could legally
generate for the masking variants.  The problem is that since the destination
is v4* we would need VK4 register classes and v4i1 value types to express the
masking.  These are however not legal types/classes in AVX512f but only in VL,
so things get complicated pretty quickly.  We can revisit this question later
if we have a more pressing need to express something like this.

So the ISel patterns are empty for the masking instructions and the next patch
will add Pat<>s instead to match the intrinsics calls with instructions.

llvm-svn: 219361
2014-10-08 23:25:33 +00:00
Adam Nemet
701fdeb2f8 [AVX512] Move DAG for all-zero node to X86VectorVTInfo
No functional change.

No change in X86.td.expanded except for the appearance of the new attributes.

The new attributes will be used in the subsequent patch.

llvm-svn: 219360
2014-10-08 23:25:31 +00:00
Adam Nemet
81fdcfd475 [AVX512] Peel off an asm-only class from AVX512_masking_common.
No functional change.

This enables the generation of masking instructions that don't provide a
ISel pattern.

llvm-svn: 219358
2014-10-08 23:25:23 +00:00
Robert Khasanov
643ca10b43 [AVX512] Refactoring of avx512_binop_rm multiclass through AVX512_masking.
Added new argrument for AVX512_masking: InstrItinClass and bit isCommutable.
No functional change.

llvm-svn: 219310
2014-10-08 14:37:45 +00:00
Elena Demikhovsky
0be5f4deeb AVX-512-SKX: Added instruction VPMOVM2B/W/D/Q.
This instruction allows to broadacst mask vector to data vector.

llvm-svn: 219083
2014-10-05 14:11:08 +00:00
Adam Nemet
9a91f09952 [AVX512] Pull pattern for subvector insert into the instruction definition
No functional change intended.

Very similar to the change I made for subvector extract in r218480.

test/CodeGen/X86/avx512-insert-extract.ll covers this.

llvm-svn: 218928
2014-10-02 23:18:30 +00:00
Adam Nemet
bcff277351 [AVX512] Refactor subvector inserts
No functional change.

Very similar to the extract refactoring I did in r218478.

Compared X86.td.expanded before and after.

llvm-svn: 218927
2014-10-02 23:18:28 +00:00
Adam Nemet
36d7986f4b [AVX512] Fix i256mem->f256mem typo in VINSERTF64x4rm
Just like in the case of extracts, the refactoring is uncovering some typos in
the code.

llvm-svn: 218926
2014-10-02 23:18:26 +00:00
Adam Nemet
d7923b8d25 [AVX512] Remove space before \t in AsmStrings.
llvm-svn: 218725
2014-10-01 00:41:32 +00:00
Robert Khasanov
61a1201dfa [AVX512] Added intrinsics for 128- and 256-bit versions of VCMPEQ{BWDQ}
Fixed lowering of this intrinsics in case when mask is v2i1 and v4i1.
Now cmp intrinsics lower in the following way:
 (i8 (int_x86_avx512_mask_pcmpeq_q_128
             (v2i64 %a), (v2i64 %b), (i8 %mask))) ->
 (i8 (bitcast
   (v8i1 (insert_subvector undef,
           (v2i1 (and (PCMPEQM %a, %b),
                      (extract_subvector
                         (v8i1 (bitcast %mask)), 0))), 0))))

llvm-svn: 218669
2014-09-30 11:41:54 +00:00
Adam Nemet
b414c07349 [AVX512] Use X86VectorVTInfo in the masking helper classes and the FMAs
No functionality change.

Makes the code more compact (see the FMA part).

This needs a new type attribute MemOpFrag in X86VectorVTInfo.  For now I only
defined this in the simple cases.  See the commment before the attribute.

Diff of X86.td.expanded before and after is empty except for the appearance of
the new attribute.

llvm-svn: 218637
2014-09-29 22:54:41 +00:00
Adam Nemet
75bf4d85fb [AVX512] Simplify use of !con()
No change in X86.td.expanded.

llvm-svn: 218485
2014-09-26 00:53:12 +00:00
Adam Nemet
2f9330edf6 [AVX512] Pull pattern for subvector extract into the instruction definition
No functional change.

I initially thought that pulling the Pat<> into the instruction pattern was
not possible because it was doing a transform on the index in order to convert
it from a per-element (extract_subvector) index into a per-chunk (vextract*x4)
index.

Turns out this also works inside the pattern because the vextract_extract
PatFrag has an OperandTransform EXTRACT_get_vextract{128,256}_imm, so the
index in $idx goes through the same conversion.

The existing test CodeGen/X86/avx512-insert-extract.ll extended in the
previous commit provides coverage for this change.

llvm-svn: 218480
2014-09-25 23:48:49 +00:00
Adam Nemet
d8b4967294 [AVX512] Refactor subvector extracts
No functional change.

These are now implemented as two levels of multiclasses heavily relying on the
new X86VectorVTInfo class.  The multiclass at the first level that is called
with float or int provides the 128 or 256 bit subvector extracts.  The second
level provides the register and memory variants and some more Pat<>s.

I've compared the td.expanded files before and after.  One change is that
ExeDomain for 64x4 is SSEPackedDouble now.  I think this is correct, i.e. a
bugfix.

(BTW, this is the change that was blocked on the recent tablegen fix.  The
class-instance values X86VectorVTInfo inside vextract_for_type weren't
properly evaluated.)

Part of <rdar://problem/17688758>

llvm-svn: 218478
2014-09-25 23:48:45 +00:00
Adam Nemet
1dca2e1268 [AVX512] Fix typo
F->I in VEXTRACTF32x4rr.

llvm-svn: 218477
2014-09-25 23:48:42 +00:00
Chandler Carruth
322207fcc8 [x86] Rename X86ISD::VPERMILP to X86ISD::VPERMILPI (and the same for the
td pattern). Currently we only model the immediate operand variation of
VPERMILPS and VPERMILPD, we should make that clear in the pseudos used.
Will be adding support for the variable mask variant in my next commit.

llvm-svn: 218282
2014-09-22 22:29:42 +00:00
Robert Khasanov
65b11b1f03 [SKX] Deriving rmb multiclasses from general one (avx512_icmp_packed_rmb and avx512_icmp_cc_rmb).
Thanks Adam Nemet for notice about this.

llvm-svn: 218051
2014-09-18 14:06:55 +00:00
Chandler Carruth
5b09348e8e [x86] Fix a pretty horrible bug and inconsistency in the x86 asm
parsing (and latent bug in the instruction definitions).

This is effectively a revert of r136287 which tried to address
a specific and narrow case of immediate operands failing to be accepted
by x86 instructions with a pretty heavy hammer: it introduced a new kind
of operand that behaved differently. All of that is removed with this
commit, but the test cases are both preserved and enhanced.

The core problem that r136287 and this commit are trying to handle is
that gas accepts both of the following instructions:

  insertps $192, %xmm0, %xmm1
  insertps $-64, %xmm0, %xmm1

These will encode to the same byte sequence, with the immediate
occupying an 8-bit entry. The first form was fixed by r136287 but that
broke the prior handling of the second form! =[ Ironically, we would
still emit the second form in some cases and then be unable to
re-assemble the output.

The reason why the first instruction failed to be handled is because
prior to r136287 the operands ere marked 'i32i8imm' which forces them to
be sign-extenable. Clearly, that won't work for 192 in a single byte.
However, making thim zero-extended or "unsigned" doesn't really address
the core issue either because it breaks negative immediates. The correct
fix is to make these operands 'i8imm' reflecting that they can be either
signed or unsigned but must be 8-bit immediates. This patch backs out
r136287 and then changes those places as well as some others to use
'i8imm' rather than one of the extended variants.

Naturally, this broke something else. The custom DAG nodes had to be
updated to have a much more accurate type constraint of an i8 node, and
a bunch of Pat immediates needed to be specified as i8 values.

The fallout didn't end there though. We also then ceased to be able to
match the instruction-specific intrinsics to the instructions so
modified. Digging, this is because they too used i32 rather than i8 in
their signature. So I've also switched those intrinsics to i8 arguments
in line with the instructions.

In order to make the intrinsic adjustments of course, I also had to add
auto upgrading for the intrinsics.

I suspect that the intrinsic argument types may have led everything down
this rabbit hole. Pretty happy with the result.

llvm-svn: 217310
2014-09-06 10:00:01 +00:00
Robert Khasanov
a051690c52 [SKX] Added new versions of cmp instructions in avx512_icmp_cc multiclass, added VL multiclass.
Added encoding tests

llvm-svn: 216532
2014-08-27 09:34:37 +00:00
Elena Demikhovsky
fec7633c50 AVX-512: Added intrinsic for VMOVSS store form with mask.
llvm-svn: 216530
2014-08-27 07:38:43 +00:00
Robert Khasanov
4316b2ca5f [SKX] avx512_icmp_packed multiclass extension
Extended avx512_icmp_packed multiclass by masking versions.
Added avx512_icmp_packed_rmb multiclass for embedded broadcast versions.
Added corresponding _vl multiclasses.
Added encoding tests for CPCMP{EQ|GT}* instructions.
Add more fields for X86VectorVTInfo.
Added AVX512VLVectorVTInfo that include X86VectorVTInfo for 512/256/128-bit versions

Differential Revision: http://reviews.llvm.org/D5024

llvm-svn: 216383
2014-08-25 14:49:34 +00:00
Adam Nemet
ab33858cc4 [AVX512] Add class to group common template arguments related to vector type
We discussed the issue of generality vs. readability of the AVX512 classes
recently.  I proposed this approach to try to hide and centralize the mappings
we commonly perform based on the vector type.  A new class X86VectorVTInfo
captures these.

The idea is to pass an instance of this class to classes/multiclasses instead
of the corresponding ValueType.  Then the class/multiclass can use its field
for things that derive from the type rather than passing all those as separate
arguments.

I modified avx512_valign to demonstrate this new approach.  As you can see
instead of 7 related template parameters we now have one.  The downside is
that we have to refer to fields for the derived values.  I named the argument
'_' in order to make this as invisible as possible.  Please let me know if you
absolutely hate this.  (Also once we allow local initializations in
multiclasses we can recover the original version by assigning the fields to
local variables.)

Another possible use-case for this class is to directly map things, e.g.:

  RegisterClass KRC = X86VectorVTInfo<32, i16>.KRC

llvm-svn: 216209
2014-08-21 19:50:07 +00:00
Elena Demikhovsky
e4fb4063fb AVX-512: Fixed a bug in emitting compare for MVT:i1 type.
Added a test.

llvm-svn: 215889
2014-08-18 11:59:06 +00:00
Adam Nemet
ccc6a7155c [AVX512] Add masking variant for the FMA instructions
This change further evolves the base class AVX512_masking in order to make it
suitable for the masking variants of the FMA instructions.

Besides AVX512_masking there is now a new base class that instructions
including FMAs can use: AVX512_masking_3src.  With three-source (destructive)
instructions one of the sources is already tied to the destination.  This
difference from AVX512_masking is captured by this new class.  The common bits
between _masking and _masking_3src are broken out into a new super class
called AVX512_masking_common.

As with valign, there is some corresponding restructuring of the underlying
format classes.  The idea is the same we want to derive from two classes
essentially: one providing the format bits and another format-independent
multiclass supplying the various masking and non-masking instruction variants.

Existing fma tests in avx512-fma*.ll provide coverage here for the non-masking
variants.  For masking, the next patches in the series will add intrinsics and
intrinsic tests.

For AVX512_masking_3src to work, the (ins ...) dag has to be passed *without*
the leading source operand that is tied to dst ($src1).  This is necessary to
properly construct the (ins ...) for the different variants.  For the record,
I did check that if $src is mistakenly included, you do get a fairly intuitive
error message from the tablegen backend.

Part of <rdar://problem/17688758>

llvm-svn: 215660
2014-08-14 17:13:19 +00:00
Robert Khasanov
8a64ff14da [SKX] Extended non-temporal load/store instructions for AVX512VL subsets.
Added avx512_movnt_vl multiclass for handling 256/128-bit forms of instruction.
Added encoding and lowering tests.

Reviewed by Elena Demikhovsky <elena.demikhovsky@intel.com>

llvm-svn: 215536
2014-08-13 10:46:00 +00:00
Adam Nemet
6d75e0e06f [AVX512] Handle valign masking intrinsic via C++ lowering
I think that this will scale better in most cases than adding a Pat<> for each
mapping from the intrinsic DAG to the intruction (i.e. rri, rrik, rrikz).  We
can just lower to the SDNode and have the resulting DAG be matches by the DAG
patterns.

Alternatively (long term), we could keep the Pat<>s but generate them via the
new AVX512_masking multiclass.  The difficulty is that in order to formulate
that we would have to concatenate DAGs.  Currently this is only supported if
the operators of the input DAGs are identical.

llvm-svn: 215473
2014-08-12 21:13:12 +00:00
Elena Demikhovsky
ca6b9b192b AVX-512: added a missing bitcast from v16f32 to v16i32
llvm-svn: 215351
2014-08-11 09:59:08 +00:00
Adam Nemet
028986cf25 [AVX512] Add zero-masking variant to AVX512_masking multiclass
This completes one item from the todo-list of r215125 "Generate masking
instruction variants with tablegen".

The AddedComplexity is needed just like for the k variant.

Added a codegen test based on valignq.

llvm-svn: 215173
2014-08-07 23:53:38 +00:00
Adam Nemet
d85e0c0ff8 [AVX512] Add codegen test for the masking variant of valign
The AddedComplexity is needed just like in avx512_perm_3src.  There may be a
bug in the complexity computation...

llvm-svn: 215168
2014-08-07 23:18:18 +00:00
Adam Nemet
f757ff3eea [AVX512] Generate masking instruction variants with tablegen
After adding the masking variants to several instructions, I have decided to
experiment with generating these from the non-masking/unconditional
variant. This will hopefully reduce the amount repetition that we currently
have in order to define an instruction with all its variants (for a reg/mem
instruction this would be 6 instruction defs and 2 Pat<> for the intrinsic).

The patch is the first cut that is currently only applied to valignd/q to make
the patch small.

A few notes on the approach:

  * In order to stitch together the dag for both the conditional and the
  unconditional patterns I pass the RHS of the set rather than the full
  pattern (set dest, RHS).
  * Rather than subclassing each instruction base class (e.g. AVX512AIi8),
  with a masking variant which wouldn't scale, I derived the masking
  instructions from a new base class AVX512 (this is just I<> with
  Requires<HasAVX512>).  The instructions derive from this now, plus a new set
  of classes that add the format bits and everything else that instruction
  base class provided (i.e. AVX512AIi8 vs. AVX512AIi8Base).

I hope we can go incrementally from here.  I expect that:

  * We will need different variants of the masking class.  One example is
  instructions requiring three vector sources.  In this case we tie one of the
  source operands to dest rather than a new implicit source operand ($src0)
  * Add the zero-masking variant
  * Add more AVX512*Base classes as new uses are added

I've looked at X86.td.expanded before and after to make sure that nothing got
lost for valignd/q.

llvm-svn: 215125
2014-08-07 17:53:55 +00:00
Adam Nemet
aea49d4d5f [X86] Fixes commit r214890 to match the posted patch
This was another fallout from my local rebase where something went wrong :(

llvm-svn: 214951
2014-08-06 07:13:12 +00:00
Adam Nemet
d956b84b6b [AVX512] Add masking variant and intrinsics for valignd/q
This is similar to what I did with the two-source permutation recently.  (It's
almost too similar so that we should consider generating the masking variants
with some tablegen help.)

Both encoding and intrinsic tests are added as well.  For the latter, this is
what the IR that the intrinsic test on the clang side generates.

Part of <rdar://problem/17688758>

llvm-svn: 214890
2014-08-05 17:23:04 +00:00
Adam Nemet
6ce89969d7 [X86] Add lowering to VALIGN
This was currently part of lowering to PALIGNR with some special-casing to
make interlane shifting work.  Since AVX512F has interlane alignr (valignd/q)
and AVX512BW has vpalignr we need to support both of these *at the same time*,
e.g. for SKX.

This patch breaks out the common code and then add support to check both of
these lowering options from LowerVECTOR_SHUFFLE.

I also added some FIXMEs where I think the AVX512BW and AVX512VL additions
should probably go.

llvm-svn: 214888
2014-08-05 17:22:59 +00:00
Adam Nemet
1a6d2ace74 [AVX512] alignr: Use suffix rather than name argument to multiclass
Again no functional change.  This prepares for the suffix to be used with the
intrinsic matching.

llvm-svn: 214886
2014-08-05 17:22:52 +00:00
Adam Nemet
b6384f0bad [AVX512] Pull everything alignr-related into the multiclass
The packed integer pattern becomes the DAG pattern for rri and the packed
float, another Pat<> inside the multiclass.

No functional change.

llvm-svn: 214885
2014-08-05 17:22:50 +00:00
Adam Nemet
04f70d285b Wrap long lines
llvm-svn: 214884
2014-08-05 17:22:47 +00:00
Robert Khasanov
35dfdfef2d [SKX] Enabling load/store instructions: encoding
Instructions: VMOVAPD, VMOVAPS, VMOVDQA8, VMOVDQA16, VMOVDQA32,VMOVDQA64, VMOVDQU8, VMOVDQU16, VMOVDQU32,VMOVDQU64, VMOVUPD, VMOVUPS,

Reviewed by Elena Demikhovsky <elena.demikhovsky@intel.com>

llvm-svn: 214719
2014-08-04 14:35:15 +00:00
Robert Khasanov
d86d770d47 [SKX] Enabling mask logic instructions: encoding, lowering
Instructions: KAND{BWDQ}, KANDN{BWDQ}, KOR{BWDQ}, KXOR{BWDQ}, KXNOR{BWDQ}

Reviewed by Elena Demikhovsky <elena.demikhovsky@intel.com>

llvm-svn: 214081
2014-07-28 13:46:45 +00:00
Robert Khasanov
cfc9aa43e1 [SKX] Enabling mask instructions: encoding, lowering
KMOVB, KMOVW, KMOVD, KMOVQ, KNOTB, KNOTW, KNOTD, KNOTQ

Reviewed by Elena Demikhovsky <elena.demikhovsky@intel.com>

llvm-svn: 213757
2014-07-23 14:49:42 +00:00
Elena Demikhovsky
50a62c2883 AVX-512: Fixed intrinsic of VSQRTPS/PD instructions.
I set number and types of parameters according to GCC intrinsics.

llvm-svn: 213640
2014-07-22 11:07:31 +00:00
Robert Khasanov
ae2da173af [SKX] Enabling SKX target and AVX512BW, AVX512DQ, AVX512VL features.
Enabling HasAVX512{DQ,BW,VL} predicates.
Adding VK2, VK4, VK32, VK64 masked register classes.
Adding new types (v64i8, v32i16) to VR512.
Extending calling conventions for new types (v64i8, v32i16)

Patch by Zinovy Nis <zinovy.y.nis@intel.com>
Reviewed by Elena Demikhovsky <elena.demikhovsky@intel.com>

llvm-svn: 213545
2014-07-21 14:54:21 +00:00
Adam Nemet
ee6f661620 [X86] AVX512: Only allow k1-k7 as predicates to vpcmp*
As destination k0 is allowed but not as predicate/writemask.

I also modified the test to allow checking of error messages by the assembler.
I applied a similar approach to the test ret.s in the same directory.

llvm-svn: 212504
2014-07-08 00:22:32 +00:00
Adam Nemet
e66defe4d5 [X86] AVX512: Allow writemask argument in vpermt* intrinsics
llvm-svn: 212223
2014-07-02 21:26:01 +00:00
Adam Nemet
d43dcabf24 [X86] AVX512: Generate Pat<>'s for the vpermt2* intrinsics via multiclass
This new multiclass, avx512_perm_table_3src derives from the current one and
provides the Pat<>.  The next patch will add another Pat<> that uses the
writemask.

Note that I dropped the type annotation from the intrinsic call, i.e.: (v16f32
VR512:$src1) -> R512:$src1.  I think that this should be fine (at least many
intrinsic calls don't provide them) and it greatly reduces the number of
template arguments.

llvm-svn: 212222
2014-07-02 21:25:58 +00:00
Adam Nemet
ef83b36688 [X86] AVX512: Add writemask variants for vperm*2*
This includes assembler and codegen support (see the new tests in
avx512-encodings.s and avx512-shuffle.ll).

<rdar://problem/17492620>

llvm-svn: 212221
2014-07-02 21:25:54 +00:00
Adam Nemet
da774db14e [X86] AVX512: Allow writemasks with vpcmp
For now I only updated the _alt variants.  The main variants are used by
codegen and that will need a bit more work to trigger.

<rdar://problem/17492620>

llvm-svn: 212114
2014-07-01 18:03:45 +00:00
Adam Nemet
3b6318203d [X86] AVX512: Factor generating the AsmString into avx512_icmp_cc
Adding a writemask variant would require a third asm string to be passed to
the template.  Generate the AsmString in the template instead.

No change in X86.td.expanded.

llvm-svn: 212113
2014-07-01 18:03:43 +00:00
Adam Nemet
e14e099b2e [X86] AVX512: Add vbroadcasti*
For now I used a separate template for these sub-vector/tuple broadcasts
rather than sharing the mem variants with avx512_int_broadcast_rm.

<rdar://problem/17402869>

llvm-svn: 211828
2014-06-27 00:43:38 +00:00
Adam Nemet
aa04918ad4 [X86] AVX512: Fix asm syntax for packed vcmp
The *_alt defs for vcmp are used by the InstParser (the asm string in the main
def is used by the InstPrinter) .  The former was accepting vector registers
as destination rather than mask registers.

llvm-svn: 211750
2014-06-26 00:21:12 +00:00
Adam Nemet
d36a2c2dba [X86] AVX512: Add non-temporal stores
Note that I followed the AVX2 convention here and didn't add LLVM intrinsics
for stores.  These can be generated with the nontemporal hint on LLVM IR
stores (see new test). The GCC builtins are lowered directly into nontemporal
stores.

<rdar://problem/17082571>

llvm-svn: 211176
2014-06-18 16:51:10 +00:00
Adam Nemet
2673c0a107 [X86] AVX512: Specify compressed displacement for vmovntdqa
Use the max 64-bit element size with EVEX_CD8.  This should work since element
size is ignored for a full-vector access (FVM).

llvm-svn: 211175
2014-06-18 16:51:07 +00:00
Cameron McInally
8c86d371c4 Add pattern for unsigned v4i32->v4f64 convert on AVX512.
llvm-svn: 211164
2014-06-18 14:04:37 +00:00
Cameron McInally
1bfa586059 Hook up vector int_ctlz for AVX512.
llvm-svn: 211024
2014-06-16 14:12:28 +00:00
Cameron McInally
8909338d50 Add HasCDI predicate to AVX512 VPBROADCASTM*.
llvm-svn: 210892
2014-06-13 11:40:31 +00:00
Cameron McInally
a3c0d56a4d Add AVX512 masked leadz instrinsic support.
llvm-svn: 210652
2014-06-11 12:54:45 +00:00
Adam Nemet
db983b8c6b [X86] AVX512: Add vmovntdqa
Along with the corresponding intrinsic and tests.

llvm-svn: 210543
2014-06-10 16:39:53 +00:00
Elena Demikhovsky
784490ba2d AVX-512: changes in intrinsics
1) Changed gather and scatter intrinsics. Now they are aligned with GCC built-ins. There is no more non-masked form. Masked intrinsic receives -1 if all lanes are executed.
2) I changed the function that works with intrinsics inside X86ISelLowering.cpp. I put all intrinsics in one table. I did it for INTRINSICS_W_CHAIN and plan to put all intrinsics from WO_CHAIN set to the same table in order to avoid the long-long "switch". (I wanted to use static map initialization that allowed by C++11 but I wasn't able to compile it on VS2012).
3) I added gather/scatter prefetch intrinsics.
4) I fixed MRMm encoding for masked instructions.

llvm-svn: 208522
2014-05-12 07:18:51 +00:00
Elena Demikhovsky
52b1f22e9c AVX-512: minor change in rndscale intrinsic
llvm-svn: 207937
2014-05-04 13:35:37 +00:00
Elena Demikhovsky
b3422ddc5e AVX-512: optimized a shuffle pattern to VINSERTI64x4.
Added intrinsics for VPERMT2PS/PD/D/Q instructions.

llvm-svn: 207513
2014-04-29 09:09:15 +00:00
Elena Demikhovsky
0038f7ae47 AVX-512: store and truncstore for i1 values
llvm-svn: 206897
2014-04-22 14:13:10 +00:00
Robert Khasanov
5fa2a3546f [AVX512] Implemented integer conversions up/down with masking.
Added encoding tests.

llvm-svn: 206884
2014-04-22 11:36:19 +00:00
Filipe Cabecinhas
1ca83b7941 Rename X86insrtps to the proper instruction name.
Summary:
The INSERTPS pattern fragment was called insrtps (mising 'e'), which
would make it harder to grep for the patterns related to this instruction.
Renaming it to use the proper instruction name.

Reviewers: nadav

CC: llvm-commits

Differential Revision: http://reviews.llvm.org/D3443

llvm-svn: 206779
2014-04-21 20:07:29 +00:00
Elena Demikhovsky
56ab81fd87 AVX-512: insert element to mask vector; store i1 data
Implemented INSERT_VECTOR_ELT operation for v16i1 and v8i1 vectors;
Implemented "store" for i1 type

llvm-svn: 205850
2014-04-09 12:37:50 +00:00
Elena Demikhovsky
73f5b6faba AVX-512: Added fp_to_uint and uint_to_fp patterns.
llvm-svn: 205754
2014-04-08 07:24:02 +00:00
Robert Khasanov
37729a0b8e Test commit.
llvm-svn: 205214
2014-03-31 16:01:38 +00:00
Elena Demikhovsky
624ece9d50 AVX-512: Implemented masking for integer arithmetic & logic instructions.
By Robert Khasanov rob.khasanov@gmail.com

llvm-svn: 204906
2014-03-27 09:45:08 +00:00
Cameron McInally
8872097c93 Fix AVX512 Gather and Scatter execution domains.
llvm-svn: 204804
2014-03-26 13:50:50 +00:00
Elena Demikhovsky
d3e6f6628c AVX-512: masked load/store + intrinsics for them.
llvm-svn: 203790
2014-03-13 12:05:52 +00:00
Elena Demikhovsky
71d04d27da AVX-512: Added rrk, rrkz, rmk, rmkz, rmbk, rmbkz versions of AVX512 FP packed instructions, added encoding tests for them.
By Robert Khazanov.

llvm-svn: 203098
2014-03-06 08:45:30 +00:00
Elena Demikhovsky
838b163a58 AVX-512: Fixed extract_vector_elt for v8i1 vector
llvm-svn: 202624
2014-03-02 09:19:44 +00:00
Elena Demikhovsky
ade0be1dbb AVX-512: Fixed encoding of VPCMPEQ and VPCMPGT
llvm-svn: 202015
2014-02-24 10:08:30 +00:00
Elena Demikhovsky
1804845947 AVX-512: Fixed encoding of VPTESTMQ
llvm-svn: 201980
2014-02-23 14:28:35 +00:00
Elena Demikhovsky
af8e1ef280 AVX-512: Assembly parsing of broadcast semantic in AVX-512; imlemented by Nis Zinovy (zinovy.y.nis@intel.com)
Fixed truncate i32 to i1; a test will be provided in the next commit.

llvm-svn: 201757
2014-02-20 06:34:39 +00:00
Cameron McInally
7173a45caf Fix AVX512 vector sqrt assembly strings.
llvm-svn: 201681
2014-02-19 15:16:09 +00:00
Craig Topper
de78f4304d Add an x86 prefix encoding for instructions that would decode to a different instruction with 0xf2/f3/66 were in front of them, but don't themselves have a prefix. For now this doesn't change any bbehavior, but plan to use it to fix some bugs in the disassembler.
llvm-svn: 201538
2014-02-18 00:21:49 +00:00
Elena Demikhovsky
0e85630ee2 AVX-512: implemented zext fron i1 to i16
llvm-svn: 201502
2014-02-17 07:29:33 +00:00
Elena Demikhovsky
2cdad2b3d4 AVX-512: simpyfied BUILD_VECTOR for masks; fixed cmp/test sequence
llvm-svn: 201487
2014-02-16 11:34:23 +00:00
Elena Demikhovsky
ac4dfca982 AVX-512: Optimized BUILD_VECTOR pattern;
fixed encoding of VEXTRACTPS instruction.

llvm-svn: 201134
2014-02-11 07:25:59 +00:00
Elena Demikhovsky
110d93ce93 AVX-512: Fixed extract_vector_elt for v16i1 and v8i1 vectors.
llvm-svn: 201066
2014-02-10 07:02:39 +00:00
Elena Demikhovsky
2e0202b75e AVX-512: Added intrinsic for cvtph2ps.
Added VPTESTNM instruction.
Added a pattern to vselect (lit tests will follow).

llvm-svn: 200823
2014-02-05 07:05:03 +00:00
Craig Topper
f97f309449 Simplify some x86 format classes and remove some ambiguities in their application.
llvm-svn: 200608
2014-02-01 08:17:56 +00:00
Craig Topper
7940cc70b4 Remove duplicate pattern and add predicate checks on other patterns.
llvm-svn: 200455
2014-01-30 06:03:19 +00:00
Elena Demikhovsky
6f951ffaa0 AVX-512: added VPERM2D VPERM2Q VPERM2PS VPERM2PD instructions,
they give better sequences than VPERMI

llvm-svn: 199893
2014-01-23 14:27:26 +00:00
Elena Demikhovsky
4e0fae8031 AVX-512: optimized scalar compare patterns
removed AVX512SI format, since it is similar to AVX512BI.

llvm-svn: 199217
2014-01-14 15:10:08 +00:00
Craig Topper
1c1ecbff81 Separate the concept of 16-bit/32-bit operand size controlled by 0x66 prefix and the current mode from the concept of SSE instructions using 0x66 prefix as part of their encoding without being affected by the mode.
This should allow SSE instructions to be encoded correctly in 16-bit mode which r198586 probably broke.

llvm-svn: 199193
2014-01-14 07:41:20 +00:00
Elena Demikhovsky
e635ade802 AVX-512: Embedded Rounding Control - encoding and printing
Changed intrinsics for vrcp14/vrcp28 vrsqrt14/vrsqrt28 - aligned with GCC.

llvm-svn: 199102
2014-01-13 12:55:03 +00:00
Elena Demikhovsky
1ecccf9364 AVX-512: Added more intrinsics for pmin/pmax, pabs, blend, pmuldq.
llvm-svn: 198745
2014-01-08 10:54:22 +00:00
Elena Demikhovsky
591c25725f AVX-512: added intrinsic vcvtpd2ps (with rounding mode and without)
llvm-svn: 198593
2014-01-06 08:45:54 +00:00
Elena Demikhovsky
935f81172d AVX-512: changed property name from "neverHasSideEffects=1" to "hasSideEffects=0", added this property to VMOVSS/VMOVSD;
Optimized a truncate pattern.

llvm-svn: 198562
2014-01-05 14:21:07 +00:00
Elena Demikhovsky
034a667c24 AVX-512: Added more intrinsics for convert and min/max.
Removed vzeroupper from AVX-512 mode - our optimization gude does not recommend to insert vzeroupper at all.

llvm-svn: 198557
2014-01-05 10:46:09 +00:00
Craig Topper
eea372cfa3 Mark x86 _alt instructions as AsmParserOnly so they will be omitted from disassembler without string matches.
llvm-svn: 198545
2014-01-05 04:55:55 +00:00
Craig Topper
4a48c26e38 Add a new x86 specific instruction flag to force some isCodeGenOnly instructions to go through to the disassembler tables without resorting to string matches. Apply flag to all _REV instructions.
llvm-svn: 198543
2014-01-05 04:17:28 +00:00
Craig Topper
57b949fa83 Mark all x86 Int_ and _Int patterns as isCodeGenOnly so the disassembler table builder doesn't need to string match them to exclude them.
llvm-svn: 198323
2014-01-02 17:28:14 +00:00
Elena Demikhovsky
7174584583 AVX-512: Added intrinsics for vcvt, vcvtt, vrndscale, vcmp
Printing rounding control.
Enncoding for EVEX_RC (rounding control).

llvm-svn: 198277
2014-01-01 15:12:34 +00:00
Elena Demikhovsky
ee5004d112 AVX-512: Result type of scalar SETCC is MVT::i1 for AVX-512.
llvm-svn: 198008
2013-12-25 10:06:40 +00:00
Elena Demikhovsky
2d23dc9650 AVX-512: fixed some patterns for MVT::i1
llvm-svn: 197981
2013-12-24 14:24:07 +00:00
Elena Demikhovsky
39275c48ca AVX512: SETCC returns i1 for AVX-512 and i8 for all others
llvm-svn: 197876
2013-12-22 10:13:18 +00:00
Elena Demikhovsky
241694a7bc AVX-512: Added implementation of CONCAT_VECTORS for v8i1 vectors (by Alexey Bader).
Added implementation of "truncate" from integer type (i64/i32/i16/i8) to i1.

llvm-svn: 197482
2013-12-17 08:33:15 +00:00
Elena Demikhovsky
b43ccbc3f7 AVX-512: Added legal type MVT::i1 and VK1 register for it.
Added scalar compare VCMPSS, VCMPSD.
Implemented LowerSELECT for scalar FP operations.
I replaced FSETCCss, FSETCCsd with one node type FSETCCs.
Node extract_vector_elt(v16i1/v8i1, idx) returns an element of type i1.

llvm-svn: 197384
2013-12-16 13:52:35 +00:00
Elena Demikhovsky
154413adc2 AVX-512: Removed "z" suffix from AVX-512 instructions, since it is incompatible with GCC.
I moved a test from avx512-vbroadcast-crash.ll to avx512-vbroadcast.ll
I defined HasAVX512 predicate as AssemblerPredicate. It means that you should invoke llvm-mc with "-mcpu=knl" to get encoding for AVX-512 instructions. I need this to let AsmMatcher to set different encoding for AVX and AVX-512 instructions that have the same mnemonic and operands (all scalar instructions).

llvm-svn: 197041
2013-12-11 14:31:04 +00:00
Elena Demikhovsky
57057960b0 AVX-512: changed intrinsics for mask operations
llvm-svn: 196918
2013-12-10 13:53:10 +00:00
Elena Demikhovsky
b3a0e7bbed AVX-512: Changed intrinsics of VPCONFLICT to match GCC builtin form
llvm-svn: 196914
2013-12-10 11:58:35 +00:00
Cameron McInally
ca9f2bc25b Update AVX512 vector blend intrinsic names.
llvm-svn: 196581
2013-12-06 13:35:35 +00:00
Cameron McInally
675f9245aa Add AVX512 patterns for v16i32 broadcast and v2i64 zero extend load.
Patch by Aleksey Bader.

llvm-svn: 196435
2013-12-05 00:11:25 +00:00