llvm-mirror/test/CodeGen/PowerPC/swaps-le-3.ll

; RUN: llc -mcpu=pwr8 -mtriple=powerpc64le-unknown-linux-gnu -O3 < %s | FileCheck %s

; This test verifies that VSX swap optimization works for the
; doubleword splat idiom.

@a = external global <2 x double>, align 16
@b = external global <2 x double>, align 16

define void @test(double %s) {
entry:
  %0 = insertelement <2 x double> undef, double %s, i32 0
  %1 = shufflevector <2 x double> %0, <2 x double> undef, <2 x i32> zeroinitializer
  %2 = load <2 x double>, <2 x double>* @a, align 16
  %3 = fadd <2 x double> %0, %2
  store <2 x double> %3, <2 x double>* @b, align 16
  ret void
}

; CHECK-LABEL: @test
; CHECK-DAG: xxspltd
; CHECK-DAG: lxvd2x
; CHECK: xvadddp
; CHECK: stxvd2x
; CHECK-NOT: xxswapd
[PPC64LE] Teach swap optimization about the doubleword splat idiom With a previous patch, the VSX swap optimization is able to recognize the doubleword load-splat idiom that can be implemented using lxvdsx. However, that does not cover a doubleword splat where the source is a register. We can implement this using xxspltd (a special form of xxpermdi). This patch teaches the swap optimization pass about this idiom. As a prerequisite, it also permits swap optimization to succeed for all forms of SUBREG_TO_REG. Previously we were conservative and only allowed SUBREG_TO_REG when it copied a full register. However, on reflection any form of SUBREG_TO_REG is safe in and of itself, so long as an unsafe operation is not performed on its result. In particular, a widening SUBREG_TO_REG often occurs as an input to a doubleword splat idiom, particularly in auto-vectorized code. The doubleword splat idiom is an XXPERMDI operation where both source registers are identical, and the selection mask is either 0 (splat the first element) or 3 (splat the second element). To determine whether the registers are identical, we use the existing mechanism for looking through "copy-like" operations. That mechanism has a side effect of marking the XXPERMDI operation as using a physical register, which would invalidate its presence in a swap-optimized region. This is correct for the form of XXPERMDI that performs a swap and hence would be removed, but is not what we want for a doubleword-splat variety of XXPERMDI. Therefore we reset the physical-register flag on the XXPERMDI when it represents a splat. A simple test case is added to verify that we generate the splat and that we also remove the xxswapd instructions that would otherwise be associated with the load and store of another operand. llvm-svn: 241285 2015-07-02 19:03:06 +02:00			`; RUN: llc -mcpu=pwr8 -mtriple=powerpc64le-unknown-linux-gnu -O3 < %s \| FileCheck %s`

			`; This test verifies that VSX swap optimization works for the`
			`; doubleword splat idiom.`

			`@a = external global <2 x double>, align 16`
			`@b = external global <2 x double>, align 16`

			`define void @test(double %s) {`
			`entry:`
			`%0 = insertelement <2 x double> undef, double %s, i32 0`
			`%1 = shufflevector <2 x double> %0, <2 x double> undef, <2 x i32> zeroinitializer`
			`%2 = load <2 x double>, <2 x double>* @a, align 16`
			`%3 = fadd <2 x double> %0, %2`
			`store <2 x double> %3, <2 x double>* @b, align 16`
			`ret void`
			`}`

			`; CHECK-LABEL: @test`
[PPC64LE] More improvements to VSX swap optimization This patch allows VSX swap optimization to succeed more frequently. Specifically, it is concerned with common code sequences that occur when copying a scalar floating-point value to a vector register. This patch currently handles cases where the floating-point value is already in a register, but does not yet handle loads (such as via an LXSDX scalar floating-point VSX load). That will be dealt with later. A typical case is when a scalar value comes in as a floating-point parameter. The value is copied into a virtual VSFRC register, and then a sequence of SUBREG_TO_REG and/or COPY operations will convert it to a full vector register of the class required by the context. If this vector register is then used as part of a lane-permuted computation, the original scalar value will be in the wrong lane. We can fix this by adding a swap operation following any widening SUBREG_TO_REG operation. Additional COPY operations may be needed around the swap operation in order to keep register assignment happy, but these are pro forma operations that will be removed by coalescing. If a scalar value is otherwise directly referenced in a computation (such as by one of the many XS* vector-scalar operations), we currently disable swap optimization. These operations are lane-sensitive by definition. A MentionsPartialVR flag is added for use in each swap table entry that mentions a scalar floating-point register without having special handling defined. A common idiom for PPC64LE is to convert a double-precision scalar to a vector by performing a splat operation. This ensures that the value can be referenced as V[0], as it would be for big endian, whereas just converting the scalar to a vector with a SUBREG_TO_REG operation leaves this value only in V[1]. A doubleword splat operation is one form of an XXPERMDI instruction, which takes one doubleword from a first operand and another doubleword from a second operand, with a two-bit selector operand indicating which doublewords are chosen. In the general case, an XXPERMDI can be permitted in a lane-swapped region provided that it is properly transformed to select the corresponding swapped values. This transformation is to reverse the order of the two input operands, and to reverse and complement the bits of the selector operand (derivation left as an exercise to the reader ;). A new test case that exercises the scalar-to-vector and generalized XXPERMDI transformations is added as CodeGen/PowerPC/swaps-le-5.ll. The patch also requires a change to CodeGen/PowerPC/swaps-le-3.ll to use CHECK-DAG instead of CHECK for two independent instructions that now appear in reverse order. There are two small unrelated changes that are added with this patch. First, the XXSLDWI instruction was incorrectly omitted from the list of lane-sensitive instructions; this is now fixed. Second, I observed that the same webs were being rejected over and over again for different reasons. Since it's sufficient to reject a web only once, I added a check for this to speed up the compilation time slightly. llvm-svn: 242081 2015-07-14 00:58:19 +02:00			`; CHECK-DAG: xxspltd`
			`; CHECK-DAG: lxvd2x`
[PPC64LE] Teach swap optimization about the doubleword splat idiom With a previous patch, the VSX swap optimization is able to recognize the doubleword load-splat idiom that can be implemented using lxvdsx. However, that does not cover a doubleword splat where the source is a register. We can implement this using xxspltd (a special form of xxpermdi). This patch teaches the swap optimization pass about this idiom. As a prerequisite, it also permits swap optimization to succeed for all forms of SUBREG_TO_REG. Previously we were conservative and only allowed SUBREG_TO_REG when it copied a full register. However, on reflection any form of SUBREG_TO_REG is safe in and of itself, so long as an unsafe operation is not performed on its result. In particular, a widening SUBREG_TO_REG often occurs as an input to a doubleword splat idiom, particularly in auto-vectorized code. The doubleword splat idiom is an XXPERMDI operation where both source registers are identical, and the selection mask is either 0 (splat the first element) or 3 (splat the second element). To determine whether the registers are identical, we use the existing mechanism for looking through "copy-like" operations. That mechanism has a side effect of marking the XXPERMDI operation as using a physical register, which would invalidate its presence in a swap-optimized region. This is correct for the form of XXPERMDI that performs a swap and hence would be removed, but is not what we want for a doubleword-splat variety of XXPERMDI. Therefore we reset the physical-register flag on the XXPERMDI when it represents a splat. A simple test case is added to verify that we generate the splat and that we also remove the xxswapd instructions that would otherwise be associated with the load and store of another operand. llvm-svn: 241285 2015-07-02 19:03:06 +02:00			`; CHECK: xvadddp`
			`; CHECK: stxvd2x`
			`; CHECK-NOT: xxswapd`