1643bee451
In the ARM backend, for historical reasons, only some targets use Machine Scheduling. The rest use the old list scheduler, as they use itineraries and the list scheduler seems to produce better code (and does not crash by running out of registers on v6m code). So whether to use the MIScheduler or not is checked at runtime from the subtarget features.

This is fine, except for post-ra scheduling. Whether to use the old post-ra list scheduler or the post-ra machine scheduler is decided as the pass manager is set up, in ARM's case from a newly constructed subtarget. In some situations, such as LTO, this subtarget won't include the correct cpu, so the wrong option can be picked. This can have a surprising effect on performance.

To fix that, this patch overrides targetSchedulesPostRAScheduling and addPreSched2 in the ARM backend, adding _both_ post-ra schedulers and picking at runtime which to execute. To pick between the two, I've had to add an enablePostRAMachineScheduler() method that normally returns enableMachineScheduler() && enablePostRAScheduler(), and which can be overridden to enable just one of PostRAMachineScheduler vs PostRAScheduler.

Thanks to David Penry for identifying this problem.

Differential Revision: https://reviews.llvm.org/D69775
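The runtime selection described in the commit message can be sketched roughly as follows. This is a standalone model, not LLVM code: `SubtargetModel`, `PostRAChoice`, and `pickPostRAScheduler` are hypothetical names introduced only for illustration, and the only behaviour taken from the commit message is the default of `enablePostRAMachineScheduler()`. How each scheduler pass actually checks these hooks in the ARM backend is an assumption here.

```cpp
#include <cassert>

// Hypothetical stand-in for a subtarget. This is NOT the real LLVM
// TargetSubtargetInfo; it only mirrors the three hooks named in the
// commit message.
struct SubtargetModel {
  bool UseMISched;   // models a "use-misched"-style subtarget feature
  bool PostRASched;  // models whether post-ra scheduling is wanted at all

  SubtargetModel(bool UseMISched, bool PostRASched)
      : UseMISched(UseMISched), PostRASched(PostRASched) {}
  virtual ~SubtargetModel() = default;

  virtual bool enableMachineScheduler() const { return UseMISched; }
  virtual bool enablePostRAScheduler() const { return PostRASched; }

  // Default stated in the commit message; a target may override it to
  // enable just one of the two post-ra schedulers.
  virtual bool enablePostRAMachineScheduler() const {
    return enableMachineScheduler() && enablePostRAScheduler();
  }
};

enum class PostRAChoice { None, MachineScheduler, ListScheduler };

// Assumed selection logic: both passes are in the pipeline, and each
// function is scheduled by whichever pass its subtarget asks for.
PostRAChoice pickPostRAScheduler(const SubtargetModel &ST) {
  if (ST.enablePostRAMachineScheduler())
    return PostRAChoice::MachineScheduler;
  if (ST.enablePostRAScheduler())
    return PostRAChoice::ListScheduler;
  return PostRAChoice::None;
}
```

Because the decision is made from the subtarget of the function being compiled, rather than from a subtarget constructed when the pass manager is set up, a missing cpu at pipeline-construction time (as with LTO) no longer determines which scheduler runs. For example, under this model `pickPostRAScheduler(SubtargetModel(true, true))` selects the machine scheduler, while `SubtargetModel(false, true)` falls back to the list scheduler.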
30 lines
898 B
LLVM
; REQUIRES: asserts
; RUN: llc < %s -mtriple=armv8r-eabi -mcpu=cortex-a57 -mattr=use-misched -verify-misched -debug-only=machine-scheduler -o - 2>&1 > /dev/null | FileCheck %s
; An STMIB with N=3 registers should have a latency of 2 cycles

; CHECK: ********** MI Scheduling **********
; The second, post-ra scheduling run is needed so that the single stores have been combined into an STM instruction
; CHECK: ********** MI Scheduling **********
; CHECK: schedule starting
; CHECK: STMIB
; CHECK: rdefs left
; CHECK-NEXT: Latency : 2

define i32 @test_stm(i32 %v0, i32 %v1, i32* %addr) {

  %addr.1 = getelementptr i32, i32* %addr, i32 1
  store i32 %v0, i32* %addr.1

  %addr.2 = getelementptr i32, i32* %addr, i32 2
  store i32 %v1, i32* %addr.2

  %addr.3 = getelementptr i32, i32* %addr, i32 3
  %val = ptrtoint i32* %addr to i32
  store i32 %val, i32* %addr.3

  %rv = add i32 %v0, %v1

  ret i32 %rv
}