llvm-mirror

mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-11-24 03:33:20 +01:00

Chandler Carruth 2a4813287b		2014-11-13 04:06:10 +00:00

[x86] Teach the vector shuffle lowering to make a more nuanced decision between splitting a vector into 128-bit lanes and recombining them vs. decomposing things into single-input shuffles and a final blend.

This handles a large number of cases in AVX1 where the cross-lane shuffles would be much more expensive to represent even though we end up with a fast blend at the root. Instead, we can do a better job of shuffling in a single lane and then inserting it into the other lanes.

This fixes the remaining bits of Halide's regression captured in PR21281 for AVX1. However, the bug persists in AVX2 because I've made this change reasonably conservative. The cases where it makes sense in AVX2 to split into 128-bit lanes are much more rare because we can often do full permutations across all elements of the 256-bit vector. However, the particular test case in PR21281 is an example of one of the rare cases where it is always better to work in a single 128-bit lane. I'm going to try to teach the logic to detect and form the good code even in AVX2 next, but it will need to use a separate heuristic.

Finally, there is one pesky regression here where we previously would craftily use vpermilps in AVX1 to shuffle both high and low halves at the same time. We no longer pull that off, and not for any really good reason. Ultimately, I think this is just another missing nuance to the selection heuristic that I'll try to add in afterward, but this change already seems strictly worth doing considering the magnitude of the improvements in common matrix math shuffle patterns.

As always, please let me know if this causes a surprising regression for you.

llvm-svn: 221861
Analysis	[X86] Custom lower UINT_TO_FP from v4f32 to v4i32, and for v8f32 to v8i32 if	2014-11-11 02:23:47 +00:00
Assembler
Bindings
Bitcode
BugPoint
CodeGen	[x86] Teach the vector shuffle lowering to make a more nuanced decision	2014-11-13 04:06:10 +00:00
DebugInfo	Add an assert and a test that verify r221709's fix.	2014-11-13 03:20:23 +00:00
ExecutionEngine
Feature
FileCheck
Instrumentation	Move asan-coverage into a separate phase.	2014-11-11 22:14:37 +00:00
Integer
JitListener
Linker
LTO	Add Forward Control-Flow Integrity.	2014-11-11 21:08:02 +00:00
MC	[mips] Add hardware register name "hwr_ulr" ($29)	2014-11-11 11:22:39 +00:00
Object	Object, support both mach-o archive t.o.c file names	2014-11-12 01:37:45 +00:00
Other
SymbolRewriter
TableGen
tools	llvm-readobj: Print out address table when dumping COFF delay-import table	2014-11-13 03:22:54 +00:00
Transforms	Teach ScalarEvolution to sharpen range information.	2014-11-13 00:00:58 +00:00
Unit
Verifier
YAMLParser
.clang-format
CMakeLists.txt
lit.cfg	Only run the gold plugin tests if gold supports the targets we test with.	2014-11-11 05:27:12 +00:00
lit.site.cfg.in
Makefile
Makefile.tests
TestRunner.sh