1
0
mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-11-25 04:02:41 +01:00
Commit Graph

20 Commits

Author SHA1 Message Date
Florian Hahn
06a6d27ef3 [ARM] Fix codegen for VLD3/VLD4/VST3/VST4 with WB
Code generation of VLD3, VLD4, VST3 and VST4 with register writeback is
broken due to 2 separate bugs:

1) VLD1d64TPseudoWB_register and VLD1d64QPseudoWB_register are missing
   rules to expand them to non pseudo MIR. These are selected for
   ARMISD::VLD3_UPD/VLD4_UPD with v1i64 vectors in SelectVLD.

2) Selection of the right VLD/VST instruction is broken for load and
   store of 3 and 4 v1i64 vectors. SelectVLD and SelectVST are called
   with MIR opcode for fixed writeback (ie increment is access size)
   and call getVLDSTRegisterUpdateOpcode() to select an opcode with
   register writeback if base register update is of a different size.
   Since getVLDSTRegisterUpdateOpcode() only knows about
   VLD1/VLD2/VST1/VST2 the call is currently conditional on the number
   of element in the vector.

   However, VLD1/VST1 is selected by SelectVLD/SelectVST's caller for
   load and stores of 3 or 4 v1i64 vectors. Therefore the opcode is not
   updated which later lead to a fixed writeback instruction being
   constructed with an extra operand for the register writeback.

This patch addresses the two issues as follows:
- it adds the necessary mapping from VLD1d64TPseudoWB_register and
  VLD1d64QPseudoWB_register to VLD1d64Twb_register and
  VLD1d64Qwb_register respectively. Like for the existing _fixed
  variants, the cost of these is bumped for unaligned access.
- it changes the logic in SelectVLD and SelectVSD to call isVLDfixed
  and isVSTfixed respectively to decide whether the opcode should be
  updated. It also reworks the logic and comments for pushing the
  writeback offset operand and r0 operand to clarify the logic:
  writeback offset needs to be pushed if it's a register writeback,
  r0 needs to be pushed if not and the instruction is a
  VLD1/VLD2/VST1/VST2.

Reviewers: rengolin, t.p.northover, samparker

Reviewed By: samparker

Patch by Thomas Preud'homme <thomas.preudhomme@arm.com>

Differential Revision: https://reviews.llvm.org/D42970

llvm-svn: 326570
2018-03-02 13:02:55 +00:00
Jeroen Ketema
b9ecf8a3ee [ARM][NEON] Use address space in vld([1234]|[234]lane) and vst([1234]|[234]lane) instructions
This commit changes the interface of the vld[1234], vld[234]lane, and vst[1234],
vst[234]lane ARM neon intrinsics and associates an address space with the
pointer that these intrinsics take. This changes, e.g.,

<2 x i32> @llvm.arm.neon.vld1.v2i32(i8*, i32)

to

<2 x i32> @llvm.arm.neon.vld1.v2i32.p0i8(i8*, i32)

This change ensures that address spaces are fully taken into account in the ARM
target during lowering of interleaved loads and stores.

Differential Revision: http://reviews.llvm.org/D12985

llvm-svn: 248887
2015-09-30 10:56:37 +00:00
David Blaikie
ab043ff680 [opaque pointer type] Add textual IR support for explicit type parameter to load instruction
Essentially the same as the GEP change in r230786.

A similar migration script can be used to update test cases, though a few more
test case improvements/changes were required this time around: (r229269-r229278)

import fileinput
import sys
import re

pat = re.compile(r"((?:=|:|^)\s*load (?:atomic )?(?:volatile )?(.*?))(| addrspace\(\d+\) *)\*($| *(?:%|@|null|undef|blockaddress|getelementptr|addrspacecast|bitcast|inttoptr|\[\[[a-zA-Z]|\{\{).*$)")

for line in sys.stdin:
  sys.stdout.write(re.sub(pat, r"\1, \2\3*\4", line))

Reviewers: rafael, dexonsmith, grosser

Differential Revision: http://reviews.llvm.org/D7649

llvm-svn: 230794
2015-02-27 21:17:42 +00:00
David Blaikie
0d99339102 [opaque pointer type] Add textual IR support for explicit type parameter to getelementptr instruction
One of several parallel first steps to remove the target type of pointers,
replacing them with a single opaque pointer type.

This adds an explicit type parameter to the gep instruction so that when the
first parameter becomes an opaque pointer type, the type to gep through is
still available to the instructions.

* This doesn't modify gep operators, only instructions (operators will be
  handled separately)

* Textual IR changes only. Bitcode (including upgrade) and changing the
  in-memory representation will be in separate changes.

* geps of vectors are transformed as:
    getelementptr <4 x float*> %x, ...
  ->getelementptr float, <4 x float*> %x, ...
  Then, once the opaque pointer type is introduced, this will ultimately look
  like:
    getelementptr float, <4 x ptr> %x
  with the unambiguous interpretation that it is a vector of pointers to float.

* address spaces remain on the pointer, not the type:
    getelementptr float addrspace(1)* %x
  ->getelementptr float, float addrspace(1)* %x
  Then, eventually:
    getelementptr float, ptr addrspace(1) %x

Importantly, the massive amount of test case churn has been automated by
same crappy python code. I had to manually update a few test cases that
wouldn't fit the script's model (r228970,r229196,r229197,r229198). The
python script just massages stdin and writes the result to stdout, I
then wrapped that in a shell script to handle replacing files, then
using the usual find+xargs to migrate all the files.

update.py:
import fileinput
import sys
import re

ibrep = re.compile(r"(^.*?[^%\w]getelementptr inbounds )(((?:<\d* x )?)(.*?)(| addrspace\(\d\)) *\*(|>)(?:$| *(?:%|@|null|undef|blockaddress|getelementptr|addrspacecast|bitcast|inttoptr|\[\[[a-zA-Z]|\{\{).*$))")
normrep = re.compile(       r"(^.*?[^%\w]getelementptr )(((?:<\d* x )?)(.*?)(| addrspace\(\d\)) *\*(|>)(?:$| *(?:%|@|null|undef|blockaddress|getelementptr|addrspacecast|bitcast|inttoptr|\[\[[a-zA-Z]|\{\{).*$))")

def conv(match, line):
  if not match:
    return line
  line = match.groups()[0]
  if len(match.groups()[5]) == 0:
    line += match.groups()[2]
  line += match.groups()[3]
  line += ", "
  line += match.groups()[1]
  line += "\n"
  return line

for line in sys.stdin:
  if line.find("getelementptr ") == line.find("getelementptr inbounds"):
    if line.find("getelementptr inbounds") != line.find("getelementptr inbounds ("):
      line = conv(re.match(ibrep, line), line)
  elif line.find("getelementptr ") != line.find("getelementptr ("):
    line = conv(re.match(normrep, line), line)
  sys.stdout.write(line)

apply.sh:
for name in "$@"
do
  python3 `dirname "$0"`/update.py < "$name" > "$name.tmp" && mv "$name.tmp" "$name"
  rm -f "$name.tmp"
done

The actual commands:
From llvm/src:
find test/ -name *.ll | xargs ./apply.sh
From llvm/src/tools/clang:
find test/ -name *.mm -o -name *.m -o -name *.cpp -o -name *.c | xargs -I '{}' ../../apply.sh "{}"
From llvm/src/tools/polly:
find test/ -name *.ll | xargs ./apply.sh

After that, check-all (with llvm, clang, clang-tools-extra, lld,
compiler-rt, and polly all checked out).

The extra 'rm' in the apply.sh script is due to a few files in clang's test
suite using interesting unicode stuff that my python script was throwing
exceptions on. None of those files needed to be migrated, so it seemed
sufficient to ignore those cases.

Reviewers: rafael, dexonsmith, grosser

Differential Revision: http://reviews.llvm.org/D7636

llvm-svn: 230786
2015-02-27 19:29:02 +00:00
Saleem Abdulrasool
53e8c5a4af ARM: fixup more tests to specify the target more explicitly
This changes the tests that were targeting ARM EABI to explicitly specify the
environment rather than relying on the default.  This breaks with the new
Windows on ARM support when running the tests on Windows where the default
environment is no longer EABI.

Take the opportunity to avoid a pointless redirect (helps when trying to debug
with providing a command line invocation which can be copy and pasted) and
removing a few greps in favour of FileCheck.

llvm-svn: 205541
2014-04-03 16:01:44 +00:00
Jiangning Liu
47e6e27d8b For ARM, fix assertuib failures for some ld/st 3/4 instruction with wirteback.
llvm-svn: 199369
2014-01-16 09:16:13 +00:00
Stephen Lin
7e501cf4c3 Mass update to CodeGen tests to use CHECK-LABEL for labels corresponding to function definitions for more informative error messages. No functionality change and all updated tests passed locally.
This update was done with the following bash script:

  find test/CodeGen -name "*.ll" | \
  while read NAME; do
    echo "$NAME"
    if ! grep -q "^; *RUN: *llc.*debug" $NAME; then
      TEMP=`mktemp -t temp`
      cp $NAME $TEMP
      sed -n "s/^define [^@]*@\([A-Za-z0-9_]*\)(.*$/\1/p" < $NAME | \
      while read FUNC; do
        sed -i '' "s/;\(.*\)\([A-Za-z0-9_-]*\):\( *\)$FUNC: *\$/;\1\2-LABEL:\3$FUNC:/g" $TEMP
      done
      sed -i '' "s/;\(.*\)-LABEL-LABEL:/;\1-LABEL:/" $TEMP
      sed -i '' "s/;\(.*\)-NEXT-LABEL:/;\1-NEXT:/" $TEMP
      sed -i '' "s/;\(.*\)-NOT-LABEL:/;\1-NOT:/" $TEMP
      sed -i '' "s/;\(.*\)-DAG-LABEL:/;\1-DAG:/" $TEMP
      mv $TEMP $NAME
    fi
  done

llvm-svn: 186280
2013-07-14 06:24:09 +00:00
Kristof Beyls
a686678676 Make ARMAsmPrinter generate the correct alignment specifier syntax in instructions.
The Printer will now print instructions with the correct alignment specifier syntax, like
    vld1.8  {d16}, [r0:64]

llvm-svn: 175884
2013-02-22 10:01:33 +00:00
Chad Rosier
72bd34ca71 [fast-isel] Remove -disable-arm-fast-isel option. -fast-isel=0 suffices. Minor cleanup.
llvm-svn: 156632
2012-05-11 19:40:25 +00:00
Eric Christopher
2fbd7a6280 Make this test disable fast isel as it's not needed.
llvm-svn: 130165
2011-04-25 22:39:46 +00:00
Bob Wilson
65f4a70b82 Add codegen support for using post-increment NEON load/store instructions.
The vld1-lane, vld1-dup and vst1-lane instructions do not yet support using
post-increment versions, but all the rest of the NEON load/store instructions
should be handled now.

llvm-svn: 125014
2011-02-07 17:43:21 +00:00
Bob Wilson
ff8139baa0 Set alignment operand for NEON VST instructions.
llvm-svn: 114709
2010-09-23 23:42:37 +00:00
Bob Wilson
31d487d235 Change ARM VFP VLDM/VSTM instructions to use addressing mode #4, just like
all the other LDM/STM instructions.  This fixes asm printer crashes when
compiling with -O0.  I've changed one of the NEON tests (vst3.ll) to run
with -O0 to check this in the future.

Prior to this change VLDM/VSTM used addressing mode #5, but not really.
The offset field was used to hold a count of the number of registers being
loaded or stored, and the AM5 opcode field was expanded to specify the IA
or DB mode, instead of the standard ADD/SUB specifier.  Much of the backend
was not aware of these special cases.  The crashes occured when rewriting
a frameindex caused the AM5 offset field to be changed so that it did not
have a valid submode.  I don't know exactly what changed to expose this now.
Maybe we've never done much with -O0 and NEON.  Regardless, there's no longer
any reason to keep a count of the VLDM/VSTM registers, so we can use
addressing mode #4 and clean things up in a lot of places.

llvm-svn: 112322
2010-08-27 23:18:17 +00:00
Bob Wilson
c01101e76c Add alignment arguments to all the NEON load/store intrinsics.
Update all the tests using those intrinsics and add support for
auto-upgrading bitcode files with the old versions of the intrinsics.

llvm-svn: 112271
2010-08-27 17:13:24 +00:00
Bob Wilson
2e6cd50a50 Fix tests for Neon load/store intrinsics to match the i8* types expected by
the intrinsics.  The reason for those i8* types is that the intrinsics are
overloaded on the vector type and we don't have a way to declare an intrinsic
where one argument is an overloaded vector type and another argument is a
pointer to the vector element type.  The bitcasts added here will match what
the frontend will typically generate when these intrinsics are used.

llvm-svn: 101840
2010-04-20 00:17:16 +00:00
Bob Wilson
8aa1d328b5 Add codegen support for NEON vst3 intrinsics with <1 x i64> vectors.
llvm-svn: 83518
2009-10-08 00:28:28 +00:00
Bob Wilson
af14187764 Add codegen support for NEON vst3 intrinsics with 128-bit vectors.
llvm-svn: 83484
2009-10-07 20:30:08 +00:00
Dan Gohman
142428ce64 Eliminate more uses of llvm-as and llvm-dis.
llvm-svn: 81293
2009-09-09 00:09:15 +00:00
Bob Wilson
d64e304671 Use vAny type to get rid of Neon intrinsics that differed only in whether
the overloaded vector types allowed floating-point or integer vector elements.
Most of these operations actually depend on the element type, so bitcasting
was not an option.

If you include the vpadd intrinsics that I updated earlier, this gets rid
of 20 intrinsics.

llvm-svn: 78646
2009-08-11 05:39:44 +00:00
Bob Wilson
bd7627b23e Implement Neon VST[234] operations.
llvm-svn: 78330
2009-08-06 18:47:44 +00:00