llvm-mirror

mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-10-26 22:42:46 +02:00

Author	SHA1	Message	Date
Robert Khasanov	a87de33c61	[AVX512] Bring back vector-shuffle lowering support through broadcasts Ffter commit at rev219046 512-bit broadcasts lowering become non-optimal. Most of tests on broadcasting and embedded broadcasting were changed and they doesn’t produce efficient code. Example below is from commit changes (it’s the first test from test/CodeGen/X86/avx512-vbroadcast.ll): define <16 x i32> @_inreg16xi32(i32 %a) { ; CHECK-LABEL: _inreg16xi32: ; CHECK: ## BB#0: -; CHECK-NEXT: vpbroadcastd %edi, %zmm0 +; CHECK-NEXT: vmovd %edi, %xmm0 +; CHECK-NEXT: vpbroadcastd %xmm0, %ymm0 +; CHECK-NEXT: vinserti64x4 $1, %ymm0, %zmm0, %zmm0 ; CHECK-NEXT: retq %b = insertelement <16 x i32> undef, i32 %a, i32 0 %c = shufflevector <16 x i32> %b, <16 x i32> undef, <16 x i32> zeroinitializer ret <16 x i32> %c } Here, 256-bit broadcast was generated instead of 512-bit one. In this patch 1) I added vector-shuffle lowering through broadcasts 2) Removed asserts and branches likes because this is incorrect - assert(Subtarget->hasDQI() && "We can only lower v8i64 with AVX-512-DQI"); 3) Fixed lowering tests llvm-svn: 220774	2014-10-28 12:28:51 +00:00
Chandler Carruth	5063f25595	[x86] Enable the new vector shuffle lowering by default. Update the entire regression test suite for the new shuffles. Remove most of the old testing which was devoted to the old shuffle lowering path and is no longer relevant really. Also remove a few other random tests that only really exercised shuffles and only incidently or without any interesting aspects to them. Benchmarking that I have done shows a few small regressions with this on LNT, zero measurable regressions on real, large applications, and for several benchmarks where the loop vectorizer fires in the hot path it shows 5% to 40% improvements for SSE2 and SSE3 code running on Sandy Bridge machines. Running on AMD machines shows even more dramatic improvements. When using newer ISA vector extensions the gains are much more modest, but the code is still better on the whole. There are a few regressions being tracked (PR21137, PR21138, PR21139) but by and large this is expected to be a win for x86 generated code performance. It is also more correct than the code it replaces. I have fuzz tested this extensively with ISA extensions up through AVX2 and found no crashes or miscompiles (yet...). The old lowering had a few miscompiles and crashers after a somewhat smaller amount of fuzz testing. There is one significant area where the new code path lags behind and that is in AVX-512 support. However, there was extremely little support for that already and so this isn't a significant step backwards and the new framework will probably make it easier to implement lowering that uses the full power of AVX-512's table-based shuffle+blend (IMO). Many thanks to Quentin, Andrea, Robert, and others for benchmarking assistance. Thanks to Adam and others for help with AVX-512. Thanks to Hal, Eric, and many others for answering my incessant questions about how the backend actually works. =] I will leave the old code path in the tree until the 3 PRs above are at least resolved to folks' satisfaction. Then I will rip it (and 1000s of lines of code) out. =] I don't expect this flag to stay around for very long. It may not survive next week. llvm-svn: 219046	2014-10-04 03:52:55 +00:00
Chandler Carruth	1ebea4c8bd	[x86] Regenerate a bunch more avx512 test cases using my script to have tighter, more strict FileCheck assertions. Some of these I really like as they show case exactly what instruction sequences come out of these microscopic functionality tests. llvm-svn: 218936	2014-10-03 00:50:03 +00:00
Robert Khasanov	d60b557e6d	[SKX] Enable lowering of integer CMP operations. Added new types to Legalizer. Fixed getSetCCResultType function Added lowering tests. Reviewed by Elena Demikhovsky. llvm-svn: 216717	2014-08-29 08:46:04 +00:00
Robert Khasanov	35dfdfef2d	[SKX] Enabling load/store instructions: encoding Instructions: VMOVAPD, VMOVAPS, VMOVDQA8, VMOVDQA16, VMOVDQA32,VMOVDQA64, VMOVDQU8, VMOVDQU16, VMOVDQU32,VMOVDQU64, VMOVUPD, VMOVUPS, Reviewed by Elena Demikhovsky <elena.demikhovsky@intel.com> llvm-svn: 214719	2014-08-04 14:35:15 +00:00
Elena Demikhovsky	f8035c205a	AVX-512: optimized icmp -> sext -> icmp pattern llvm-svn: 200849	2014-02-05 16:17:36 +00:00
Elena Demikhovsky	45e0f9e6b6	AVX512: combining setcc and zext is wrong on AVX512 because vector compare instruction puts result in mask register. llvm-svn: 199798	2014-01-22 12:26:19 +00:00
Elena Demikhovsky	241694a7bc	AVX-512: Added implementation of CONCAT_VECTORS for v8i1 vectors (by Alexey Bader). Added implementation of "truncate" from integer type (i64/i32/i16/i8) to i1. llvm-svn: 197482	2013-12-17 08:33:15 +00:00
Juergen Ributzka	4554c8ed29	[X86] Perform VSELECT DAG combines also before DAG type legalization. If the DAG already has only legal types, then the second round of DAG combines is skipped. In this case VSELECT+SETCC patterns that match a more efficient instruction (e.g. min/max) are never recognized. This fix allows VSELECT+SETCC combines if the types are already legal before DAG type legalization. Reviewer: Nadav llvm-svn: 190105	2013-09-05 23:02:56 +00:00
Elena Demikhovsky	f1afd2e4db	AVX-512: added arithmetic and logical operations. ADD, SUB, MUL integer and FP types. OR, AND, XOR. Added embeded broadcast form for these instructions. llvm-svn: 188673	2013-08-19 13:26:14 +00:00
Elena Demikhovsky	42b33ee116	AVX-512: Added CMP and BLEND instructions. Lowering for SETCC. llvm-svn: 188265	2013-08-13 13:24:07 +00:00

11 Commits