mirror of
https://github.com/RPCS3/llvm-mirror.git
synced 2025-02-01 05:01:59 +01:00
[SROA] Fix PR25873, which Andrea Di Biagio analyzed the daylights out
of, and I misdiagnosed for months and months. Andrea has had a patch for this forever, but I just couldn't see how it was fixing the root cause of the problem. It didn't make sense to me, even though the patch was perfectly good and the analysis of the actual failure event was *fantastic*. Well, I came back to it today because the patch has sat for *far* too long and needs attention and decided I wouldn't let it go until I really understood what was going on. After quite some time in the debugger, I finally realized that in fact I had just missed an important case with my previous attempt to fix PR22093 in r225149. Not only do we need to handle loads that won't be split, but stores-of-loads that we won't split. We *do* actually have enough logic in the presplitting to form new slices for split stores.... *unless* we decided not to split them! I'm so sorry that it took me this long to come to the realization that this is the issue. It seems so obvious in hind sight (of course). Anyways, the fix becomes *much* smaller and more focused. The fact that we're left doing integer smashing is related to the FIXME in my original commit: fundamentally, we're not aggressive about pre-splitting for loads and stores to the same alloca. If we want to get aggressive about this, it'll need both what Andrea had put into the proposed fix, but also a *lot* more logic to essentially iteratively pre-split the alloca until we can't do any more. As I said in that commit log, its really unclear that this is the right call. Instead, the integer blending and letting targets lower this to narrower stores seems slightly better. But we definitely shouldn't really go down that path just to fix this bug. Again, tons of thanks are owed to Andrea and others at Sony for working on this bug. I really should have seen what was going on here and re-directed them sooner. =//// llvm-svn: 263121
This commit is contained in:
parent
b2fd0878a4
commit
75dc58b589
@ -3372,11 +3372,15 @@ bool SROA::presplitLoadsAndStores(AllocaInst &AI, AllocaSlices &AS) {
|
||||
for (auto &P : AS.partitions()) {
|
||||
for (Slice &S : P) {
|
||||
Instruction *I = cast<Instruction>(S.getUse()->getUser());
|
||||
if (!S.isSplittable() ||S.endOffset() <= P.endOffset()) {
|
||||
// If this was a load we have to track that it can't participate in any
|
||||
// pre-splitting!
|
||||
if (!S.isSplittable() || S.endOffset() <= P.endOffset()) {
|
||||
// If this is a load we have to track that it can't participate in any
|
||||
// pre-splitting. If this is a store of a load we have to track that
|
||||
// that load also can't participate in any pre-splitting.
|
||||
if (auto *LI = dyn_cast<LoadInst>(I))
|
||||
UnsplittableLoads.insert(LI);
|
||||
else if (auto *SI = dyn_cast<StoreInst>(I))
|
||||
if (auto *LI = dyn_cast<LoadInst>(SI->getValueOperand()))
|
||||
UnsplittableLoads.insert(LI);
|
||||
continue;
|
||||
}
|
||||
assert(P.endOffset() > S.beginOffset() &&
|
||||
|
@ -1633,3 +1633,39 @@ entry:
|
||||
%load = load i16, i16* %bc2
|
||||
ret i16 %load
|
||||
}
|
||||
|
||||
%struct.STest = type { %struct.SPos, %struct.SPos }
|
||||
%struct.SPos = type { float, float }
|
||||
|
||||
define void @PR25873(%struct.STest* %outData) {
|
||||
; CHECK-LABEL: @PR25873(
|
||||
; CHECK: store i32 1123418112
|
||||
; CHECK: store i32 1139015680
|
||||
; CHECK: %[[HIZEXT:.*]] = zext i32 1139015680 to i64
|
||||
; CHECK: %[[HISHL:.*]] = shl i64 %[[HIZEXT]], 32
|
||||
; CHECK: %[[HIMASK:.*]] = and i64 undef, 4294967295
|
||||
; CHECK: %[[HIINSERT:.*]] = or i64 %[[HIMASK]], %[[HISHL]]
|
||||
; CHECK: %[[LOZEXT:.*]] = zext i32 1123418112 to i64
|
||||
; CHECK: %[[LOMASK:.*]] = and i64 %[[HIINSERT]], -4294967296
|
||||
; CHECK: %[[LOINSERT:.*]] = or i64 %[[LOMASK]], %[[LOZEXT]]
|
||||
; CHECK: store i64 %[[LOINSERT]]
|
||||
entry:
|
||||
%tmpData = alloca %struct.STest, align 8
|
||||
%0 = bitcast %struct.STest* %tmpData to i8*
|
||||
call void @llvm.lifetime.start(i64 16, i8* %0)
|
||||
%x = getelementptr inbounds %struct.STest, %struct.STest* %tmpData, i64 0, i32 0, i32 0
|
||||
store float 1.230000e+02, float* %x, align 8
|
||||
%y = getelementptr inbounds %struct.STest, %struct.STest* %tmpData, i64 0, i32 0, i32 1
|
||||
store float 4.560000e+02, float* %y, align 4
|
||||
%m_posB = getelementptr inbounds %struct.STest, %struct.STest* %tmpData, i64 0, i32 1
|
||||
%1 = bitcast %struct.STest* %tmpData to i64*
|
||||
%2 = bitcast %struct.SPos* %m_posB to i64*
|
||||
%3 = load i64, i64* %1, align 8
|
||||
store i64 %3, i64* %2, align 8
|
||||
%4 = bitcast %struct.STest* %outData to i8*
|
||||
call void @llvm.memcpy.p0i8.p0i8.i64(i8* %4, i8* %0, i64 16, i32 4, i1 false)
|
||||
call void @llvm.lifetime.end(i64 16, i8* %0)
|
||||
ret void
|
||||
}
|
||||
|
||||
declare void @llvm.memcpy.p0i8.p0i8.i64(i8* nocapture, i8* nocapture, i64, i32, i1) nounwind
|
||||
|
Loading…
x
Reference in New Issue
Block a user