mirror of
https://github.com/RPCS3/llvm-mirror.git
synced 2024-11-22 18:54:02 +01:00
cdd50ed2ff
Consider the following loop:

    void foo(float *dst, float *src, int N) {
      for (int i = 0; i < N; i++) {
        dst[i] = 0.0;
        for (int j = 0; j < N; j++) {
          dst[i] += src[(i * N) + j];
        }
      }
    }

When we are not building with -Ofast we may attempt to vectorise the inner loop using ordered reductions instead. In addition we also try to select an appropriate interleave count for the inner loop. However, when choosing a VF=1 the inner loop will be scalar and there is existing code in selectInterleaveCount that limits the interleave count to 2 for reductions due to concerns about increasing the critical path. For ordered reductions this problem is even worse due to the additional data dependency, and so I've added code to simply disable interleaving for scalar ordered reductions for now.

Test added here: Transforms/LoopVectorize/AArch64/strict-fadd-vf1.ll

Differential Revision: https://reviews.llvm.org/D106646

File | ||
---|---|---|
CMakeLists.txt | ||
LoadStoreVectorizer.cpp | ||
LoopVectorizationLegality.cpp | ||
LoopVectorizationPlanner.h | ||
LoopVectorize.cpp | ||
SLPVectorizer.cpp | ||
VectorCombine.cpp | ||
Vectorize.cpp | ||
VPlan.cpp | ||
VPlan.h | ||
VPlanDominatorTree.h | ||
VPlanHCFGBuilder.cpp | ||
VPlanHCFGBuilder.h | ||
VPlanLoopInfo.h | ||
VPlanPredicator.cpp | ||
VPlanPredicator.h | ||
VPlanSLP.cpp | ||
VPlanTransforms.cpp | ||
VPlanTransforms.h | ||
VPlanValue.h | ||
VPlanVerifier.cpp | ||
VPlanVerifier.h | ||
VPRecipeBuilder.h |
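
The commit message above turns on the difference between an ordered (strict) floating-point reduction and one that is free to reassociate. The following is a minimal sketch of that difference in plain C, not code from LoopVectorize.cpp. The function names `sum_in_order` and `sum_interleaved_by_2` are hypothetical and exist only for illustration. The first function adds elements in source order, so every addition depends on the previous one. The second shows what interleaving by 2 with independent partial sums would look like. That form is only valid when reassociation is allowed (for example under -Ofast), because floating-point addition is not associative.

```c
#include <stddef.h>

/* Ordered reduction: one accumulator, additions happen strictly in
 * source order. Each add depends on the result of the previous add,
 * so the critical path is n dependent additions. */
float sum_in_order(const float *src, size_t n) {
  float acc = 0.0f;
  for (size_t j = 0; j < n; j++)
    acc += src[j];
  return acc;
}

/* Hypothetical illustration of interleaving by 2 with reassociation:
 * two independent accumulators shorten the dependency chain, but the
 * additions are grouped differently, so the result may differ from
 * sum_in_order. This is NOT a legal transformation of an ordered
 * reduction. */
float sum_interleaved_by_2(const float *src, size_t n) {
  float acc0 = 0.0f;
  float acc1 = 0.0f;
  size_t j = 0;
  for (; j + 1 < n; j += 2) {
    acc0 += src[j];     /* even-indexed elements */
    acc1 += src[j + 1]; /* odd-indexed elements */
  }
  if (j < n)            /* leftover element when n is odd */
    acc0 += src[j];
  return acc0 + acc1;
}
```

With IEEE-754 single-precision evaluation and the input `{1e8f, 1.0f, -1e8f, 1.0f}`, the in-order sum is 1 while the two-accumulator sum is 2, since `1e8f + 1.0f` rounds back to `1e8f`. Because an ordered reduction must keep the single dependency chain of `sum_in_order`, interleaving a scalar loop cannot split that chain into independent accumulators, which is consistent with the commit's decision to disable interleaving in that case.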