mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-10-21 20:12:56 +02:00
llvm-mirror/test/Transforms
Sanjay Patel b4b4a9aeb1 [InstCombine] transform more extract/insert pairs into shuffles (PR2109)
This is an extension of the shuffle combining from r203229:
http://reviews.llvm.org/rL203229

The idea is to widen a short input vector with undef elements so the
existing shuffle transform for extract/insert can kick in.
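A minimal Python model of the widening step, for illustration only. `widen_mask` and `shufflevector` are hypothetical helpers, not LLVM APIs. `None` stands in for `undef`, and the model tracks lane contents only, not types. The widening shuffle keeps every lane of the short vector and pads the result with undef lanes, so both vectors end up the same length.

```python
from typing import List, Optional, Sequence

UNDEF = None  # models an undef mask element or an undef lane

def widen_mask(src_len: int, dst_len: int) -> List[Optional[int]]:
    """Mask that keeps all src_len input lanes and pads with undef lanes."""
    if dst_len < src_len:
        raise ValueError("widening requires dst_len >= src_len")
    return list(range(src_len)) + [UNDEF] * (dst_len - src_len)

def shufflevector(v1: Sequence, v2: Sequence, mask: Sequence[Optional[int]]) -> list:
    """Model of shufflevector: index i < len(v1) selects v1[i], otherwise v2[i - len(v1)]."""
    if len(v1) != len(v2):
        raise ValueError("both operands must have the same length")
    combined = list(v1) + list(v2)
    return [UNDEF if i is UNDEF else combined[i] for i in mask]

# Widen a <2 x float> to <4 x float>; the second operand is all-undef.
ld1 = [1.0, 2.0]
mask = widen_mask(len(ld1), 4)
print(mask)                                      # [0, 1, None, None]
print(shufflevector(ld1, [UNDEF, UNDEF], mask))  # [1.0, 2.0, None, None]
```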

The motivation is to finally solve PR2109:
https://llvm.org/bugs/show_bug.cgi?id=2109

For that example, the IR becomes:

%1 = bitcast <2 x i32>* %P to <2 x float>*
%ld1 = load <2 x float>, <2 x float>* %1, align 8
%2 = shufflevector <2 x float> %ld1, <2 x float> undef, <4 x i32> <i32 0, i32 1, i32 undef, i32 undef>
%i2 = shufflevector <4 x float> %A, <4 x float> %2, <4 x i32> <i32 0, i32 1, i32 4, i32 5>
ret <4 x float> %i2
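A small Python sketch that checks the equivalence on concrete values. The final mask <0, 1, 4, 5> implies that the original code extracted lanes 0 and 1 of %ld1 and inserted them into lanes 2 and 3 of %A. The sketch assumes exactly that scalar form. The helper names mirror the IR instructions but are hypothetical models, and `None` stands in for undef.

```python
from typing import List, Optional, Sequence

Lane = Optional[float]  # None models an undef lane

def extractelement(vec: Sequence[Lane], idx: int) -> Lane:
    return vec[idx]

def insertelement(vec: Sequence[Lane], value: Lane, idx: int) -> List[Lane]:
    """Copy vec and overwrite lane idx with value."""
    out = list(vec)
    out[idx] = value
    return out

def shufflevector(v1: Sequence[Lane], v2: Sequence[Lane], mask: Sequence[Optional[int]]) -> List[Lane]:
    combined = list(v1) + list(v2)
    return [None if i is None else combined[i] for i in mask]

def extract_insert_chain(a: Sequence[Lane], ld1: Sequence[Lane]) -> List[Lane]:
    # Assumed scalar form: ld1[0] goes to lane 2, ld1[1] to lane 3.
    i1 = insertelement(a, extractelement(ld1, 0), 2)
    return insertelement(i1, extractelement(ld1, 1), 3)

def shuffle_form(a: Sequence[Lane], ld1: Sequence[Lane]) -> List[Lane]:
    # %2: widen <2 x float> to <4 x float> with undef padding.
    widened = shufflevector(ld1, [None, None], [0, 1, None, None])
    # %i2: lanes 0-1 from %A, lanes 2-3 from the first two lanes of %2.
    return shufflevector(a, widened, [0, 1, 4, 5])

A = [10.0, 11.0, 12.0, 13.0]
ld1 = [1.0, 2.0]
print(extract_insert_chain(A, ld1))  # [10.0, 11.0, 1.0, 2.0]
print(shuffle_form(A, ld1))          # [10.0, 11.0, 1.0, 2.0]
```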

And x86 SSE output improves from:

movq	(%rdi), %xmm1           ## xmm1 = mem[0],zero
movdqa	%xmm1, %xmm2
shufps	$229, %xmm2, %xmm2      ## xmm2 = xmm2[1,1,2,3]
shufps	$48, %xmm0, %xmm1       ## xmm1 = xmm1[0,0],xmm0[3,0]
shufps	$132, %xmm1, %xmm0      ## xmm0 = xmm0[0,1],xmm1[0,2]
shufps	$32, %xmm0, %xmm2       ## xmm2 = xmm2[0,0],xmm0[2,0]
shufps	$36, %xmm2, %xmm0       ## xmm0 = xmm0[0,1],xmm2[2,0]
retq

To the almost optimal:

movhpd	(%rdi), %xmm0
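A byte-level sketch of why a single movhpd is enough. The instruction overwrites the upper 64 bits of %xmm0 with 8 bytes from memory and leaves the lower 64 bits alone. Two packed floats occupy exactly those 8 bytes. The model assumes a little-endian layout with lane 0 in the lowest bytes, and it treats the register as 16 raw bytes. `movhpd_model` is a hypothetical helper, not a real API.

```python
import struct

def movhpd_model(xmm: bytes, mem: bytes) -> bytes:
    """Keep the low 8 bytes of xmm and replace the high 8 bytes with mem[0:8]."""
    if len(xmm) != 16 or len(mem) < 8:
        raise ValueError("expected a 16-byte register and at least 8 bytes of memory")
    return xmm[:8] + mem[:8]

A = [10.0, 11.0, 12.0, 13.0]  # %A held in %xmm0
P = [1.0, 2.0]                # the <2 x float> that %rdi points to
xmm0 = struct.pack("<4f", *A)
mem = struct.pack("<2f", *P)

result = list(struct.unpack("<4f", movhpd_model(xmm0, mem)))
print(result)  # [10.0, 11.0, 1.0, 2.0], the same lanes as %i2
```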

Note: There's a tension in the existing transform related to generating
arbitrary shufflevector masks. We avoid that elsewhere in InstCombine
because we're worried that codegen can't handle strange masks, but it looks
like we're ok with producing those here. I purposely chose weird insert/extract
indexes for the regression tests to see the effect in these cases.
For PowerPC+Altivec, AArch64, and X86+SSE/AVX, I think the codegen is equal or
better for these examples.

Differential Revision: http://reviews.llvm.org/D15096

llvm-svn: 256394
2015-12-24 21:17:56 +00:00
ADCE
AddDiscriminators
AlignmentFromAssumptions
ArgumentPromotion
AtomicExpand [IR] Add support for floating pointer atomic loads and stores 2015-12-16 00:49:36 +00:00
BBVectorize
BDCE
BranchFolding Move branch folding test to a better location. 2015-12-03 19:41:25 +00:00
CodeExtractor
CodeGenPrepare Remove double blanks. NFC. 2015-12-19 18:26:53 +00:00
ConstantHoisting
ConstantMerge
ConstProp IR: Make ConstantDataArray::getFP actually return a ConstantDataArray 2015-12-09 21:21:07 +00:00
CorrelatedValuePropagation
CrossDSOCFI Cross-DSO control flow integrity (LLVM part). 2015-12-15 23:00:08 +00:00
DeadArgElim [OperandBundles] Have DeadArgElim play nice with operand bundles 2015-12-23 09:58:36 +00:00
DeadStoreElimination Revert r255247, r255265, and r255286 due to serious compile-time regressions. 2015-12-11 18:39:41 +00:00
EarlyCSE [EarlyCSE] DSE of atomic unordered stores 2015-12-17 18:50:50 +00:00
EliminateAvailableExternally
Float2Int [Float2Int] Don't operate on vector instructions 2015-12-09 21:08:18 +00:00
FunctionAttrs
FunctionImport [ThinLTO] Metadata linking for imported functions 2015-12-17 17:14:09 +00:00
GCOVProfiling
GlobalDCE
GlobalOpt Also add unnamed_addr to functions. 2015-12-22 20:43:30 +00:00
GVN [IR] Reformulate LLVM's EH funclet IR 2015-12-12 05:38:55 +00:00
IndVarSimplify [IndVars] Have getInsertPointForUses preserve LCSSA 2015-12-08 00:13:21 +00:00
Inline Determine callee's hotness and adjust threshold based on that. NFC. 2015-12-22 00:32:35 +00:00
InstCombine [InstCombine] transform more extract/insert pairs into shuffles (PR2109) 2015-12-24 21:17:56 +00:00
InstMerge
InstSimplify
Internalize
IPConstantProp
IRCE
JumpThreading [BPI] Replace weights by probabilities in BPI. 2015-12-22 18:56:14 +00:00
LCSSA [WinEH] Update LCSSA to handle catchswitch with handlers inside and outside a loop 2015-12-18 18:12:35 +00:00
LICM
LoadCombine
LoopDeletion
LoopDistribute
LoopIdiom
LoopInterchange
LoopLoadElim
LoopReroll
LoopRotate
LoopSimplify
LoopStrengthReduce [IR] Remove terminatepad 2015-12-14 18:34:23 +00:00
LoopUnroll AMDGPU: Switch barrier intrinsics to using convergent 2015-12-19 01:46:41 +00:00
LoopUnswitch [IR] Reformulate LLVM's EH funclet IR 2015-12-12 05:38:55 +00:00
LoopVectorize [LoopVectorizer] Refine loop vectorizer's register usage calculator by ignoring specific instructions. 2015-12-15 22:45:09 +00:00
LowerAtomic
LowerBitSets [cfi] Fix LowerBitSets on 32-bit targets. 2015-12-21 22:14:04 +00:00
LowerExpectIntrinsic
LowerInvoke
LowerSwitch
Mem2Reg [Mem2Reg] Respect optnone 2015-12-11 13:36:59 +00:00
MemCpyOpt
MergeFunc [IR] Remove terminatepad 2015-12-14 18:34:23 +00:00
MetaRenamer
NaryReassociate [NaryReassociate] allow candidate to have a different type 2015-12-18 21:36:30 +00:00
ObjCARC
PartiallyInlineLibCalls
PGOProfile [PGO] make profile prefix even shorter and more readable 2015-12-15 00:32:56 +00:00
PhaseOrdering
PlaceSafepoints
PruneEH [OperandBundles] Have PruneEH work correct with operand bundles. 2015-12-08 23:16:52 +00:00
Reassociate
Reg2Mem
RewriteStatepointsForGC [RS4GC] Fix base pair printing for constants. 2015-12-23 00:19:45 +00:00
SafeStack [safestack] Add option for non-TLS unsafe stack pointer. 2015-12-22 00:13:11 +00:00
SampleProfile SamplePGO - Add initial support for inliner annotations. 2015-11-27 23:14:51 +00:00
Scalarizer
ScalarRepl
SCCP
SeparateConstOffsetFromGEP
SimplifyCFG [SimplifyCFG] Don't create unnecessary PHIs 2015-12-16 14:12:44 +00:00
Sink [IR] Reformulate LLVM's EH funclet IR 2015-12-12 05:38:55 +00:00
SLPVectorizer [NFC] Update horizontal reduction test cases. 2015-12-16 17:22:24 +00:00
SpeculativeExecution
SROA
StraightLineStrengthReduce
StripDeadPrototypes
StripSymbols
StructurizeCFG
TailCallElim [OperandBundles] Have TailCallElim play nice with operand bundles 2015-12-23 09:58:43 +00:00
TailDup
Util Clean up the processing of dbg.value in various places 2015-12-19 02:02:44 +00:00