Change Thumb2 jumptable codegen to one that uses two-level jumps:
Before:
adr r12, #LJTI3_0_0           @ r12 = address of the jump table
ldr pc, [r12, +r0, lsl #2]    @ load entry r0 of the table straight into pc
LJTI3_0_0:
.long LBB3_24                 @ each entry is a 32-bit target address
.long LBB3_30
.long LBB3_31
.long LBB3_32
After:
adr r12, #LJTI3_0_0           @ r12 = address of the jump table
add pc, r12, +r0, lsl #2      @ first jump: into the table at entry r0
LJTI3_0_0:
b.w LBB3_24                   @ second jump: a 32-bit branch to the target
b.w LBB3_30
b.w LBB3_31
b.w LBB3_32
This has several advantages.
1. This will make it easier to optimize this into a TBB / TBH instruction +
(smaller) table (see the sketch after this list).
2. This eliminates the need for the ugly asm printer hack that forces the
addresses into Thumb addresses (bit 0 set to one).
3. Same codegen for pic and non-pic.
4. This eliminates the need to align the table, so the constant pool island
pass won't have to over-estimate the size.
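For reference, the TBB form that advantage 1 enables would look roughly like
this (a sketch reusing the labels above, not necessarily the exact output llc
will emit; TBH is the same idea with halfword entries for larger offsets):
tbb [pc, r0]                  @ pc here is the table base (tbb address + 4)
LJTI3_0_0:
.byte (LBB3_24-LJTI3_0_0)/2   @ byte entries hold (target - table base) / 2
.byte (LBB3_30-LJTI3_0_0)/2
.byte (LBB3_31-LJTI3_0_0)/2
.byte (LBB3_32-LJTI3_0_0)/2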
Based on my calculations, the latter is probably slightly faster as well, since
ldr pc with a shifted register address is very slow. That is, it should be a
win as long as the hardware implementation can do a reasonable job of
branch-predicting the second branch.
llvm-svn: 77024
; RUN: llvm-as < %s | llc -mtriple=thumbv7-apple-darwin | FileCheck %s
; RUN: llvm-as < %s | llc -mtriple=thumbv7-apple-darwin -relocation-model=pic | FileCheck %s
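; The dense switch below should be lowered to a single compact tbb jump table,
; and, per the commit message above, the pic and non-pic RUN lines should
; produce the same codegen.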
define void @bar(i32 %n.u) {
entry:
; CHECK: bar:
; CHECK: tbb
; CHECK: .align 1
    switch i32 %n.u, label %bb12 [
        i32 1, label %bb
        i32 2, label %bb6
        i32 4, label %bb7
        i32 5, label %bb8
        i32 6, label %bb10
        i32 7, label %bb1
        i32 8, label %bb3
        i32 9, label %bb4
        i32 10, label %bb9
        i32 11, label %bb2
        i32 12, label %bb5
        i32 13, label %bb11
    ]

bb:
    tail call void(...)* @foo1()
    ret void

bb1:
    tail call void(...)* @foo2()
    ret void

bb2:
    tail call void(...)* @foo6()
    ret void

bb3:
    tail call void(...)* @foo3()
    ret void

bb4:
    tail call void(...)* @foo4()
    ret void

bb5:
    tail call void(...)* @foo5()
    ret void

bb6:
    tail call void(...)* @foo1()
    ret void

bb7:
    tail call void(...)* @foo2()
    ret void

bb8:
    tail call void(...)* @foo6()
    ret void

bb9:
    tail call void(...)* @foo3()
    ret void

bb10:
    tail call void(...)* @foo4()
    ret void

bb11:
    tail call void(...)* @foo5()
    ret void

bb12:
    tail call void(...)* @foo6()
    ret void
}

declare void @foo1(...)
declare void @foo2(...)
declare void @foo6(...)
declare void @foo3(...)
declare void @foo4(...)
declare void @foo5(...)