1
0
mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-11-22 10:42:39 +01:00
llvm-mirror/test/CodeGen/ARM/cortex-a57-misched-stm.ll
David Green 1643bee451 [Scheduling][ARM] Consistently enable PostRA Machine scheduling
In the ARM backend, for historical reasons we have only some targets
using Machine Scheduling. The rest use the old list scheduler as they
are using itinaries and the list scheduler seems to produce better code
(and not crash running out of register on v6m codes). So whether to use
the MIScheduler or not is checked at runtime from the subtarget
features.

This is fine, except for post-ra scheduling. Whether to use the old
post-ra list scheduler or the post-ra machine schedule is decided as the
pass manager is set up, in arms case from a newly constructed subtarget.
Under some situations, like LTO, this won't include the correct cpu so
can pick the wrong option. This can have a surprising effect on
performance.

To fix that, this patch overrides targetSchedulesPostRAScheduling and
addPreSched2 in the ARM backend, adding _both_ post-ra schedulers and
picking at runtime which to execute. To pick between the two I've had to
add a enablePostRAMachineScheduler() method that normally returns
enableMachineScheduler() && enablePostRAScheduler(), which can be
overridden to enable just one of PostRAMachineScheduler vs
PostRAScheduler.

Thanks to David Penry for the identifying this problem.

Differential Revision: https://reviews.llvm.org/D69775
2019-11-05 10:44:55 +00:00

30 lines
898 B
LLVM

; REQUIRES: asserts
; RUN: llc < %s -mtriple=armv8r-eabi -mcpu=cortex-a57 -mattr=use-misched -verify-misched -debug-only=machine-scheduler -o - 2>&1 > /dev/null | FileCheck %s
; N=3 STMIB should have latency 2cyc
; CHECK: ********** MI Scheduling **********
; We need second, post-ra scheduling to have STM instruction combined from single-stores
; CHECK: ********** MI Scheduling **********
; CHECK: schedule starting
; CHECK: STMIB
; CHECK: rdefs left
; CHECK-NEXT: Latency : 2
define i32 @test_stm(i32 %v0, i32 %v1, i32* %addr) {
%addr.1 = getelementptr i32, i32* %addr, i32 1
store i32 %v0, i32* %addr.1
%addr.2 = getelementptr i32, i32* %addr, i32 2
store i32 %v1, i32* %addr.2
%addr.3 = getelementptr i32, i32* %addr, i32 3
%val = ptrtoint i32* %addr to i32
store i32 %val, i32* %addr.3
%rv = add i32 %v0, %v1
ret i32 %rv
}