llvm-mirror/test/CodeGen/ARM/fpcmp-opt.ll

; RUN: llc -mtriple=arm-eabi -mcpu=cortex-a8 -mattr=+vfp2 -enable-unsafe-fp-math %s -o - \
; RUN:  | FileCheck %s

; rdar://7461510
; rdar://10964603

; Disable this optimization unless we know one of them is zero.
define arm_apcscc i32 @t1(float* %a, float* %b) nounwind {
entry:
; CHECK-LABEL: t1:
; CHECK: vldr [[S0:s[0-9]+]],
; CHECK: vldr [[S1:s[0-9]+]],
; CHECK: vcmpe.f32 [[S1]], [[S0]]
; CHECK: vmrs APSR_nzcv, fpscr
; CHECK: beq
  %0 = load float* %a
  %1 = load float* %b
  %2 = fcmp une float %0, %1
  br i1 %2, label %bb1, label %bb2

bb1:
  %3 = call i32 @bar()
  ret i32 %3

bb2:
  %4 = call i32 @foo()
  ret i32 %4
}

; If one side is zero, the other size sign bit is masked off to allow
; +0.0 == -0.0
define arm_apcscc i32 @t2(double* %a, double* %b) nounwind {
entry:
; CHECK-LABEL: t2:
; CHECK-NOT: vldr
; CHECK: ldrd [[REG1:(r[0-9]+)]], [[REG2:(r[0-9]+)]], [r0]
; CHECK-NOT: b LBB
; CHECK: bfc [[REG2]], #31, #1
; CHECK: cmp [[REG1]], #0
; CHECK: cmpeq [[REG2]], #0
; CHECK-NOT: vcmpe.f32
; CHECK-NOT: vmrs
; CHECK: bne
  %0 = load double* %a
  %1 = fcmp oeq double %0, 0.000000e+00
  br i1 %1, label %bb1, label %bb2

bb1:
  %2 = call i32 @bar()
  ret i32 %2

bb2:
  %3 = call i32 @foo()
  ret i32 %3
}

define arm_apcscc i32 @t3(float* %a, float* %b) nounwind {
entry:
; CHECK-LABEL: t3:
; CHECK-NOT: vldr
; CHECK: ldr [[REG3:(r[0-9]+)]], [r0]
; CHECK: mvn [[REG4:(r[0-9]+)]], #-2147483648
; CHECK: tst [[REG3]], [[REG4]]
; CHECK-NOT: vcmpe.f32
; CHECK-NOT: vmrs
; CHECK: bne
  %0 = load float* %a
  %1 = fcmp oeq float %0, 0.000000e+00
  br i1 %1, label %bb1, label %bb2

bb1:
  %2 = call i32 @bar()
  ret i32 %2

bb2:
  %3 = call i32 @foo()
  ret i32 %3
}

declare i32 @bar()
declare i32 @foo()
ARM: fixup more tests to specify the target more explicitly This changes the tests that were targeting ARM EABI to explicitly specify the environment rather than relying on the default. This breaks with the new Windows on ARM support when running the tests on Windows where the default environment is no longer EABI. Take the opportunity to avoid a pointless redirect (helps when trying to debug with providing a command line invocation which can be copy and pasted) and removing a few greps in favour of FileCheck. llvm-svn: 205541 2014-04-03 18:01:44 +02:00			`; RUN: llc -mtriple=arm-eabi -mcpu=cortex-a8 -mattr=+vfp2 -enable-unsafe-fp-math %s -o - \`
			`; RUN: \| FileCheck %s`

Optimize some vfp comparisons to integer ones. This patch implements the simplest case when the following conditions are met: 1. The arguments are f32. 2. The arguments are loads and they have no uses other than the comparison. 3. The comparison code is EQ or NE. e.g. vldr.32 s0, [r1] vldr.32 s1, [r0] vcmpe.f32 s1, s0 vmrs apsr_nzcv, fpscr beq LBB0_2 => ldr r1, [r1] ldr r0, [r0] cmp r0, r1 beq LBB0_2 More complicated cases will be implemented in subsequent patches. llvm-svn: 107852 2010-07-08 04:08:50 +02:00			`; rdar://7461510`
Neuter the optimization I implemented with r107852 and r108258 which turn some floating point equality comparisons into integer ones with -ffast-math. The issue is the optimization causes +0.0 != -0.0. Now the optimization is only done when one side is known to be 0.0. The other side's sign bit is masked off for the comparison. rdar://10964603 llvm-svn: 151861 2012-03-02 00:27:13 +01:00			`; rdar://10964603`
Optimize some vfp comparisons to integer ones. This patch implements the simplest case when the following conditions are met: 1. The arguments are f32. 2. The arguments are loads and they have no uses other than the comparison. 3. The comparison code is EQ or NE. e.g. vldr.32 s0, [r1] vldr.32 s1, [r0] vcmpe.f32 s1, s0 vmrs apsr_nzcv, fpscr beq LBB0_2 => ldr r1, [r1] ldr r0, [r0] cmp r0, r1 beq LBB0_2 More complicated cases will be implemented in subsequent patches. llvm-svn: 107852 2010-07-08 04:08:50 +02:00
Neuter the optimization I implemented with r107852 and r108258 which turn some floating point equality comparisons into integer ones with -ffast-math. The issue is the optimization causes +0.0 != -0.0. Now the optimization is only done when one side is known to be 0.0. The other side's sign bit is masked off for the comparison. rdar://10964603 llvm-svn: 151861 2012-03-02 00:27:13 +01:00			`; Disable this optimization unless we know one of them is zero.`
Optimize some vfp comparisons to integer ones. This patch implements the simplest case when the following conditions are met: 1. The arguments are f32. 2. The arguments are loads and they have no uses other than the comparison. 3. The comparison code is EQ or NE. e.g. vldr.32 s0, [r1] vldr.32 s1, [r0] vcmpe.f32 s1, s0 vmrs apsr_nzcv, fpscr beq LBB0_2 => ldr r1, [r1] ldr r0, [r0] cmp r0, r1 beq LBB0_2 More complicated cases will be implemented in subsequent patches. llvm-svn: 107852 2010-07-08 04:08:50 +02:00			`define arm_apcscc i32 @t1(float* %a, float* %b) nounwind {`
			`entry:`
Mass update to CodeGen tests to use CHECK-LABEL for labels corresponding to function definitions for more informative error messages. No functionality change and all updated tests passed locally. This update was done with the following bash script: find test/CodeGen -name ".ll" \| \ while read NAME; do echo "$NAME" if ! grep -q "^; RUN: llc.debug" $NAME; then TEMP=`mktemp -t temp` cp $NAME $TEMP sed -n "s/^define [^@]@\([A-Za-z0-9_]\)(.$/\1/p" < $NAME \| \ while read FUNC; do sed -i '' "s/;\(.\)\([A-Za-z0-9_-]\):\( \)$FUNC: \$/;\1\2-LABEL:\3$FUNC:/g" $TEMP done sed -i '' "s/;\(.\)-LABEL-LABEL:/;\1-LABEL:/" $TEMP sed -i '' "s/;\(.\)-NEXT-LABEL:/;\1-NEXT:/" $TEMP sed -i '' "s/;\(.\)-NOT-LABEL:/;\1-NOT:/" $TEMP sed -i '' "s/;\(.*\)-DAG-LABEL:/;\1-DAG:/" $TEMP mv $TEMP $NAME fi done llvm-svn: 186280 2013-07-14 08:24:09 +02:00			`; CHECK-LABEL: t1:`
Fix RA-dependent test. llvm-svn: 151958 2012-03-03 01:26:30 +01:00			`; CHECK: vldr [[S0:s[0-9]+]],`
			`; CHECK: vldr [[S1:s[0-9]+]],`
			`; CHECK: vcmpe.f32 [[S1]], [[S0]]`
ARM case-insensitive checking for APSR_nzcv. rdar://11056591 llvm-svn: 152846 2012-03-15 22:34:14 +01:00			`; CHECK: vmrs APSR_nzcv, fpscr`
Neuter the optimization I implemented with r107852 and r108258 which turn some floating point equality comparisons into integer ones with -ffast-math. The issue is the optimization causes +0.0 != -0.0. Now the optimization is only done when one side is known to be 0.0. The other side's sign bit is masked off for the comparison. rdar://10964603 llvm-svn: 151861 2012-03-02 00:27:13 +01:00			`; CHECK: beq`
Optimize some vfp comparisons to integer ones. This patch implements the simplest case when the following conditions are met: 1. The arguments are f32. 2. The arguments are loads and they have no uses other than the comparison. 3. The comparison code is EQ or NE. e.g. vldr.32 s0, [r1] vldr.32 s1, [r0] vcmpe.f32 s1, s0 vmrs apsr_nzcv, fpscr beq LBB0_2 => ldr r1, [r1] ldr r0, [r0] cmp r0, r1 beq LBB0_2 More complicated cases will be implemented in subsequent patches. llvm-svn: 107852 2010-07-08 04:08:50 +02:00			`%0 = load float* %a`
			`%1 = load float* %b`
			`%2 = fcmp une float %0, %1`
			`br i1 %2, label %bb1, label %bb2`

			`bb1:`
			`%3 = call i32 @bar()`
			`ret i32 %3`

			`bb2:`
			`%4 = call i32 @foo()`
			`ret i32 %4`
			`}`

Neuter the optimization I implemented with r107852 and r108258 which turn some floating point equality comparisons into integer ones with -ffast-math. The issue is the optimization causes +0.0 != -0.0. Now the optimization is only done when one side is known to be 0.0. The other side's sign bit is masked off for the comparison. rdar://10964603 llvm-svn: 151861 2012-03-02 00:27:13 +01:00			`; If one side is zero, the other size sign bit is masked off to allow`
			`; +0.0 == -0.0`
Extend the r107852 optimization which turns some fp compare to code sequence using only i32 operations. It now optimize some f64 compares when fp compare is exceptionally slow (e.g. cortex-a8). It also catches comparison against 0.0. llvm-svn: 108258 2010-07-13 21:27:42 +02:00			`define arm_apcscc i32 @t2(double* %a, double* %b) nounwind {`
			`entry:`
Mass update to CodeGen tests to use CHECK-LABEL for labels corresponding to function definitions for more informative error messages. No functionality change and all updated tests passed locally. This update was done with the following bash script: find test/CodeGen -name ".ll" \| \ while read NAME; do echo "$NAME" if ! grep -q "^; RUN: llc.debug" $NAME; then TEMP=`mktemp -t temp` cp $NAME $TEMP sed -n "s/^define [^@]@\([A-Za-z0-9_]\)(.$/\1/p" < $NAME \| \ while read FUNC; do sed -i '' "s/;\(.\)\([A-Za-z0-9_-]\):\( \)$FUNC: \$/;\1\2-LABEL:\3$FUNC:/g" $TEMP done sed -i '' "s/;\(.\)-LABEL-LABEL:/;\1-LABEL:/" $TEMP sed -i '' "s/;\(.\)-NEXT-LABEL:/;\1-NEXT:/" $TEMP sed -i '' "s/;\(.\)-NOT-LABEL:/;\1-NOT:/" $TEMP sed -i '' "s/;\(.*\)-DAG-LABEL:/;\1-DAG:/" $TEMP mv $TEMP $NAME fi done llvm-svn: 186280 2013-07-14 08:24:09 +02:00			`; CHECK-LABEL: t2:`
Neuter the optimization I implemented with r107852 and r108258 which turn some floating point equality comparisons into integer ones with -ffast-math. The issue is the optimization causes +0.0 != -0.0. Now the optimization is only done when one side is known to be 0.0. The other side's sign bit is masked off for the comparison. rdar://10964603 llvm-svn: 151861 2012-03-02 00:27:13 +01:00			`; CHECK-NOT: vldr`
Allocate local registers in order for optimal coloring. Also avoid locals evicting locals just because they want a cheaper register. Problem: MI Sched knows exactly how many registers we have and assumes they can be colored. In cases where we have large blocks, usually from unrolled loops, greedy coloring fails. This is a source of "regressions" from the MI Scheduler on x86. I noticed this issue on x86 where we have long chains of two-address defs in the same live range. It's easy to see this in matrix multiplication benchmarks like IRSmk and even the unit test misched-matmul.ll. A fundamental difference between the LLVM register allocator and conventional graph coloring is that in our model a live range can't discover its neighbors, it can only verify its neighbors. That's why we initially went for greedy coloring and added eviction to deal with the hard cases. However, for singly defined and two-address live ranges, we can optimally color without visiting neighbors simply by processing the live ranges in instruction order. Other beneficial side effects: It is much easier to understand and debug regalloc for large blocks when the live ranges are allocated in order. Yes, global allocation is still very confusing, but it's nice to be able to comprehend what happened locally. Heuristics could be added to bias register assignment based on instruction locality (think late register pairing, banks...). Intuituvely this will make some test cases that are on the threshold of register pressure more stable. llvm-svn: 187139 2013-07-25 20:35:14 +02:00			`; CHECK: ldrd [[REG1:(r[0-9]+)]], [[REG2:(r[0-9]+)]], [r0]`
Neuter the optimization I implemented with r107852 and r108258 which turn some floating point equality comparisons into integer ones with -ffast-math. The issue is the optimization causes +0.0 != -0.0. Now the optimization is only done when one side is known to be 0.0. The other side's sign bit is masked off for the comparison. rdar://10964603 llvm-svn: 151861 2012-03-02 00:27:13 +01:00			`; CHECK-NOT: b LBB`
			`; CHECK: bfc [[REG2]], #31, #1`
Allocate local registers in order for optimal coloring. Also avoid locals evicting locals just because they want a cheaper register. Problem: MI Sched knows exactly how many registers we have and assumes they can be colored. In cases where we have large blocks, usually from unrolled loops, greedy coloring fails. This is a source of "regressions" from the MI Scheduler on x86. I noticed this issue on x86 where we have long chains of two-address defs in the same live range. It's easy to see this in matrix multiplication benchmarks like IRSmk and even the unit test misched-matmul.ll. A fundamental difference between the LLVM register allocator and conventional graph coloring is that in our model a live range can't discover its neighbors, it can only verify its neighbors. That's why we initially went for greedy coloring and added eviction to deal with the hard cases. However, for singly defined and two-address live ranges, we can optimally color without visiting neighbors simply by processing the live ranges in instruction order. Other beneficial side effects: It is much easier to understand and debug regalloc for large blocks when the live ranges are allocated in order. Yes, global allocation is still very confusing, but it's nice to be able to comprehend what happened locally. Heuristics could be added to bias register assignment based on instruction locality (think late register pairing, banks...). Intuituvely this will make some test cases that are on the threshold of register pressure more stable. llvm-svn: 187139 2013-07-25 20:35:14 +02:00			`; CHECK: cmp [[REG1]], #0`
Neuter the optimization I implemented with r107852 and r108258 which turn some floating point equality comparisons into integer ones with -ffast-math. The issue is the optimization causes +0.0 != -0.0. Now the optimization is only done when one side is known to be 0.0. The other side's sign bit is masked off for the comparison. rdar://10964603 llvm-svn: 151861 2012-03-02 00:27:13 +01:00			`; CHECK: cmpeq [[REG2]], #0`
			`; CHECK-NOT: vcmpe.f32`
			`; CHECK-NOT: vmrs`
			`; CHECK: bne`
Extend the r107852 optimization which turns some fp compare to code sequence using only i32 operations. It now optimize some f64 compares when fp compare is exceptionally slow (e.g. cortex-a8). It also catches comparison against 0.0. llvm-svn: 108258 2010-07-13 21:27:42 +02:00			`%0 = load double* %a`
			`%1 = fcmp oeq double %0, 0.000000e+00`
			`br i1 %1, label %bb1, label %bb2`

			`bb1:`
			`%2 = call i32 @bar()`
			`ret i32 %2`

			`bb2:`
			`%3 = call i32 @foo()`
			`ret i32 %3`
			`}`

			`define arm_apcscc i32 @t3(float* %a, float* %b) nounwind {`
			`entry:`
Mass update to CodeGen tests to use CHECK-LABEL for labels corresponding to function definitions for more informative error messages. No functionality change and all updated tests passed locally. This update was done with the following bash script: find test/CodeGen -name ".ll" \| \ while read NAME; do echo "$NAME" if ! grep -q "^; RUN: llc.debug" $NAME; then TEMP=`mktemp -t temp` cp $NAME $TEMP sed -n "s/^define [^@]@\([A-Za-z0-9_]\)(.$/\1/p" < $NAME \| \ while read FUNC; do sed -i '' "s/;\(.\)\([A-Za-z0-9_-]\):\( \)$FUNC: \$/;\1\2-LABEL:\3$FUNC:/g" $TEMP done sed -i '' "s/;\(.\)-LABEL-LABEL:/;\1-LABEL:/" $TEMP sed -i '' "s/;\(.\)-NEXT-LABEL:/;\1-NEXT:/" $TEMP sed -i '' "s/;\(.\)-NOT-LABEL:/;\1-NOT:/" $TEMP sed -i '' "s/;\(.*\)-DAG-LABEL:/;\1-DAG:/" $TEMP mv $TEMP $NAME fi done llvm-svn: 186280 2013-07-14 08:24:09 +02:00			`; CHECK-LABEL: t3:`
Neuter the optimization I implemented with r107852 and r108258 which turn some floating point equality comparisons into integer ones with -ffast-math. The issue is the optimization causes +0.0 != -0.0. Now the optimization is only done when one side is known to be 0.0. The other side's sign bit is masked off for the comparison. rdar://10964603 llvm-svn: 151861 2012-03-02 00:27:13 +01:00			`; CHECK-NOT: vldr`
			`; CHECK: ldr [[REG3:(r[0-9]+)]], [r0]`
			`; CHECK: mvn [[REG4:(r[0-9]+)]], #-2147483648`
			`; CHECK: tst [[REG3]], [[REG4]]`
			`; CHECK-NOT: vcmpe.f32`
			`; CHECK-NOT: vmrs`
			`; CHECK: bne`
Extend the r107852 optimization which turns some fp compare to code sequence using only i32 operations. It now optimize some f64 compares when fp compare is exceptionally slow (e.g. cortex-a8). It also catches comparison against 0.0. llvm-svn: 108258 2010-07-13 21:27:42 +02:00			`%0 = load float* %a`
			`%1 = fcmp oeq float %0, 0.000000e+00`
			`br i1 %1, label %bb1, label %bb2`

			`bb1:`
			`%2 = call i32 @bar()`
			`ret i32 %2`

			`bb2:`
			`%3 = call i32 @foo()`
			`ret i32 %3`
			`}`

Optimize some vfp comparisons to integer ones. This patch implements the simplest case when the following conditions are met: 1. The arguments are f32. 2. The arguments are loads and they have no uses other than the comparison. 3. The comparison code is EQ or NE. e.g. vldr.32 s0, [r1] vldr.32 s1, [r0] vcmpe.f32 s1, s0 vmrs apsr_nzcv, fpscr beq LBB0_2 => ldr r1, [r1] ldr r0, [r0] cmp r0, r1 beq LBB0_2 More complicated cases will be implemented in subsequent patches. llvm-svn: 107852 2010-07-08 04:08:50 +02:00			`declare i32 @bar()`
			`declare i32 @foo()`