llvm-mirror

mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-10-21 20:12:56 +02:00

Sanjay Patel b4b4a9aeb1 [InstCombine] transform more extract/insert pairs into shuffles (PR2109)		2015-12-24 21:17:56 +00:00

This is an extension of the shuffle combining from r203229:
http://reviews.llvm.org/rL203229

The idea is to widen a short input vector with undef elements so the existing shuffle transform for extract/insert can kick in.

The motivation is to finally solve PR2109:
https://llvm.org/bugs/show_bug.cgi?id=2109

For that example, the IR becomes:

    %1 = bitcast <2 x i32>* %P to <2 x float>*
    %ld1 = load <2 x float>, <2 x float>* %1, align 8
    %2 = shufflevector <2 x float> %ld1, <2 x float> undef, <4 x i32> <i32 0, i32 1, i32 undef, i32 undef>
    %i2 = shufflevector <4 x float> %A, <4 x float> %2, <4 x i32> <i32 0, i32 1, i32 4, i32 5>
    ret <4 x float> %i2

And x86 SSE output improves from:

    movq (%rdi), %xmm1 ## xmm1 = mem[0],zero
    movdqa %xmm1, %xmm2
    shufps $229, %xmm2, %xmm2 ## xmm2 = xmm2[1,1,2,3]
    shufps $48, %xmm0, %xmm1 ## xmm1 = xmm1[0,0],xmm0[3,0]
    shufps $132, %xmm1, %xmm0 ## xmm0 = xmm0[0,1],xmm1[0,2]
    shufps $32, %xmm0, %xmm2 ## xmm2 = xmm2[0,0],xmm0[2,0]
    shufps $36, %xmm2, %xmm0 ## xmm0 = xmm0[0,1],xmm2[2,0]
    retq

To the almost optimal:

    movhpd (%rdi), %xmm0

Note: There's a tension in the existing transform related to generating arbitrary shufflevector masks. We avoid that in other places in InstCombine because we're scared that codegen can't handle strange masks, but it looks like we're ok with producing those here. I purposely chose weird insert/extract indexes for the regression tests to see the effect in these cases. For PowerPC+Altivec, AArch64, and X86+SSE/AVX, I think the codegen is equal or better for these examples.

Differential Revision: http://reviews.llvm.org/D15096

llvm-svn: 256394
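
The widening step in the commit message can be checked with a small model. The Python sketch below is not LLVM code. It imitates the documented semantics of `shufflevector` on plain lists, with `None` standing in for `undef`. The helper names (`shufflevector`, `widen_with_undef`) and the symbolic element labels are hypothetical. The scalar extract/insert sequence it compares against is an assumption inferred from the final mask `<0, 1, 4, 5>`, not copied from PR2109.

```python
UNDEF = None  # stand-in for LLVM's undef; an assumption of this model


def shufflevector(a, b, mask):
    """Model of shufflevector: mask indices select from the concatenation a + b."""
    assert len(a) == len(b), "both input vectors must have the same length"
    joined = list(a) + list(b)
    return [UNDEF if i is UNDEF else joined[i] for i in mask]


def widen_with_undef(v, width):
    """Pad a short vector to `width` lanes by shuffling it with an undef vector."""
    mask = list(range(len(v))) + [UNDEF] * (width - len(v))
    return shufflevector(v, [UNDEF] * len(v), mask)


# Symbolic lanes: %A is the <4 x float> input, %ld1 the <2 x float> load.
A = ["a0", "a1", "a2", "a3"]
ld1 = ["p0", "p1"]

# Shuffle form from the commit message.
widened = widen_with_undef(ld1, 4)               # corresponds to %2
i2 = shufflevector(A, widened, [0, 1, 4, 5])     # corresponds to %i2

# Assumed scalar form: extract lanes 0 and 1 of %ld1, insert into lanes 2 and 3 of %A.
scalar = list(A)
scalar[2] = ld1[0]
scalar[3] = ld1[1]

assert i2 == scalar
```

Because the widened vector has the same lane count as `%A`, the second shuffle is a legal two-input `shufflevector`, which is what lets the existing extract/insert transform apply.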
ADCE
AddDiscriminators
AlignmentFromAssumptions
ArgumentPromotion
AtomicExpand	[IR] Add support for floating pointer atomic loads and stores	2015-12-16 00:49:36 +00:00
BBVectorize
BDCE
BranchFolding	Move branch folding test to a better location.	2015-12-03 19:41:25 +00:00
CodeExtractor
CodeGenPrepare	Remove double blanks. NFC.	2015-12-19 18:26:53 +00:00
ConstantHoisting
ConstantMerge
ConstProp	IR: Make ConstantDataArray::getFP actually return a ConstantDataArray	2015-12-09 21:21:07 +00:00
CorrelatedValuePropagation
CrossDSOCFI	Cross-DSO control flow integrity (LLVM part).	2015-12-15 23:00:08 +00:00
DeadArgElim	[OperandBundles] Have DeadArgElim play nice with operand bundles	2015-12-23 09:58:36 +00:00
DeadStoreElimination	Revert r255247, r255265, and r255286 due to serious compile-time regressions.	2015-12-11 18:39:41 +00:00
EarlyCSE	[EarlyCSE] DSE of atomic unordered stores	2015-12-17 18:50:50 +00:00
EliminateAvailableExternally
Float2Int	[Float2Int] Don't operate on vector instructions	2015-12-09 21:08:18 +00:00
FunctionAttrs
FunctionImport	[ThinLTO] Metadata linking for imported functions	2015-12-17 17:14:09 +00:00
GCOVProfiling
GlobalDCE
GlobalOpt	Also add unnamed_addr to functions.	2015-12-22 20:43:30 +00:00
GVN	[IR] Reformulate LLVM's EH funclet IR	2015-12-12 05:38:55 +00:00
IndVarSimplify	[IndVars] Have getInsertPointForUses preserve LCSSA	2015-12-08 00:13:21 +00:00
Inline	Determine callee's hotness and adjust threshold based on that. NFC.	2015-12-22 00:32:35 +00:00
InstCombine	[InstCombine] transform more extract/insert pairs into shuffles (PR2109)	2015-12-24 21:17:56 +00:00
InstMerge
InstSimplify
Internalize
IPConstantProp
IRCE
JumpThreading	[BPI] Replace weights by probabilities in BPI.	2015-12-22 18:56:14 +00:00
LCSSA	[WinEH] Update LCSSA to handle catchswitch with handlers inside and outside a loop	2015-12-18 18:12:35 +00:00
LICM
LoadCombine
LoopDeletion
LoopDistribute
LoopIdiom
LoopInterchange
LoopLoadElim
LoopReroll
LoopRotate
LoopSimplify
LoopStrengthReduce	[IR] Remove terminatepad	2015-12-14 18:34:23 +00:00
LoopUnroll	AMDGPU: Switch barrier intrinsics to using convergent	2015-12-19 01:46:41 +00:00
LoopUnswitch	[IR] Reformulate LLVM's EH funclet IR	2015-12-12 05:38:55 +00:00
LoopVectorize	[LoopVectorizer] Refine loop vectorizer's register usage calculator by ignoring specific instructions.	2015-12-15 22:45:09 +00:00
LowerAtomic
LowerBitSets	[cfi] Fix LowerBitSets on 32-bit targets.	2015-12-21 22:14:04 +00:00
LowerExpectIntrinsic
LowerInvoke
LowerSwitch
Mem2Reg	[Mem2Reg] Respect optnone	2015-12-11 13:36:59 +00:00
MemCpyOpt
MergeFunc	[IR] Remove terminatepad	2015-12-14 18:34:23 +00:00
MetaRenamer
NaryReassociate	[NaryReassociate] allow candidate to have a different type	2015-12-18 21:36:30 +00:00
ObjCARC
PartiallyInlineLibCalls
PGOProfile	[PGO] make profile prefix even shorter and more readable	2015-12-15 00:32:56 +00:00
PhaseOrdering
PlaceSafepoints
PruneEH	[OperandBundles] Have PruneEH work correct with operand bundles.	2015-12-08 23:16:52 +00:00
Reassociate
Reg2Mem
RewriteStatepointsForGC	[RS4GC] Fix base pair printing for constants.	2015-12-23 00:19:45 +00:00
SafeStack	[safestack] Add option for non-TLS unsafe stack pointer.	2015-12-22 00:13:11 +00:00
SampleProfile	SamplePGO - Add initial support for inliner annotations.	2015-11-27 23:14:51 +00:00
Scalarizer
ScalarRepl
SCCP
SeparateConstOffsetFromGEP
SimplifyCFG	[SimplifyCFG] Don't create unnecessary PHIs	2015-12-16 14:12:44 +00:00
Sink	[IR] Reformulate LLVM's EH funclet IR	2015-12-12 05:38:55 +00:00
SLPVectorizer	[NFC] Update horizontal reduction test cases.	2015-12-16 17:22:24 +00:00
SpeculativeExecution
SROA
StraightLineStrengthReduce
StripDeadPrototypes
StripSymbols
StructurizeCFG
TailCallElim	[OperandBundles] Have TailCallElim play nice with operand bundles	2015-12-23 09:58:43 +00:00
TailDup
Util	Clean up the processing of dbg.value in various places	2015-12-19 02:02:44 +00:00