1
0
mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-10-24 05:23:45 +02:00
Commit Graph

41 Commits

Author SHA1 Message Date
Jeroen Ketema
b9ecf8a3ee [ARM][NEON] Use address space in vld([1234]|[234]lane) and vst([1234]|[234]lane) instructions
This commit changes the interface of the vld[1234], vld[234]lane, and vst[1234],
vst[234]lane ARM neon intrinsics and associates an address space with the
pointer that these intrinsics take. This changes, e.g.,

<2 x i32> @llvm.arm.neon.vld1.v2i32(i8*, i32)

to

<2 x i32> @llvm.arm.neon.vld1.v2i32.p0i8(i8*, i32)

This change ensures that address spaces are fully taken into account in the ARM
target during lowering of interleaved loads and stores.

Differential Revision: http://reviews.llvm.org/D12985

llvm-svn: 248887
2015-09-30 10:56:37 +00:00
David Blaikie
ab043ff680 [opaque pointer type] Add textual IR support for explicit type parameter to load instruction
Essentially the same as the GEP change in r230786.

A similar migration script can be used to update test cases, though a few more
test case improvements/changes were required this time around: (r229269-r229278)

import fileinput
import sys
import re

pat = re.compile(r"((?:=|:|^)\s*load (?:atomic )?(?:volatile )?(.*?))(| addrspace\(\d+\) *)\*($| *(?:%|@|null|undef|blockaddress|getelementptr|addrspacecast|bitcast|inttoptr|\[\[[a-zA-Z]|\{\{).*$)")

for line in sys.stdin:
  sys.stdout.write(re.sub(pat, r"\1, \2\3*\4", line))

Reviewers: rafael, dexonsmith, grosser

Differential Revision: http://reviews.llvm.org/D7649

llvm-svn: 230794
2015-02-27 21:17:42 +00:00
David Blaikie
0d99339102 [opaque pointer type] Add textual IR support for explicit type parameter to getelementptr instruction
One of several parallel first steps to remove the target type of pointers,
replacing them with a single opaque pointer type.

This adds an explicit type parameter to the gep instruction so that when the
first parameter becomes an opaque pointer type, the type to gep through is
still available to the instructions.

* This doesn't modify gep operators, only instructions (operators will be
  handled separately)

* Textual IR changes only. Bitcode (including upgrade) and changing the
  in-memory representation will be in separate changes.

* geps of vectors are transformed as:
    getelementptr <4 x float*> %x, ...
  ->getelementptr float, <4 x float*> %x, ...
  Then, once the opaque pointer type is introduced, this will ultimately look
  like:
    getelementptr float, <4 x ptr> %x
  with the unambiguous interpretation that it is a vector of pointers to float.

* address spaces remain on the pointer, not the type:
    getelementptr float addrspace(1)* %x
  ->getelementptr float, float addrspace(1)* %x
  Then, eventually:
    getelementptr float, ptr addrspace(1) %x

Importantly, the massive amount of test case churn has been automated by
same crappy python code. I had to manually update a few test cases that
wouldn't fit the script's model (r228970,r229196,r229197,r229198). The
python script just massages stdin and writes the result to stdout, I
then wrapped that in a shell script to handle replacing files, then
using the usual find+xargs to migrate all the files.

update.py:
import fileinput
import sys
import re

ibrep = re.compile(r"(^.*?[^%\w]getelementptr inbounds )(((?:<\d* x )?)(.*?)(| addrspace\(\d\)) *\*(|>)(?:$| *(?:%|@|null|undef|blockaddress|getelementptr|addrspacecast|bitcast|inttoptr|\[\[[a-zA-Z]|\{\{).*$))")
normrep = re.compile(       r"(^.*?[^%\w]getelementptr )(((?:<\d* x )?)(.*?)(| addrspace\(\d\)) *\*(|>)(?:$| *(?:%|@|null|undef|blockaddress|getelementptr|addrspacecast|bitcast|inttoptr|\[\[[a-zA-Z]|\{\{).*$))")

def conv(match, line):
  if not match:
    return line
  line = match.groups()[0]
  if len(match.groups()[5]) == 0:
    line += match.groups()[2]
  line += match.groups()[3]
  line += ", "
  line += match.groups()[1]
  line += "\n"
  return line

for line in sys.stdin:
  if line.find("getelementptr ") == line.find("getelementptr inbounds"):
    if line.find("getelementptr inbounds") != line.find("getelementptr inbounds ("):
      line = conv(re.match(ibrep, line), line)
  elif line.find("getelementptr ") != line.find("getelementptr ("):
    line = conv(re.match(normrep, line), line)
  sys.stdout.write(line)

apply.sh:
for name in "$@"
do
  python3 `dirname "$0"`/update.py < "$name" > "$name.tmp" && mv "$name.tmp" "$name"
  rm -f "$name.tmp"
done

The actual commands:
From llvm/src:
find test/ -name *.ll | xargs ./apply.sh
From llvm/src/tools/clang:
find test/ -name *.mm -o -name *.m -o -name *.cpp -o -name *.c | xargs -I '{}' ../../apply.sh "{}"
From llvm/src/tools/polly:
find test/ -name *.ll | xargs ./apply.sh

After that, check-all (with llvm, clang, clang-tools-extra, lld,
compiler-rt, and polly all checked out).

The extra 'rm' in the apply.sh script is due to a few files in clang's test
suite using interesting unicode stuff that my python script was throwing
exceptions on. None of those files needed to be migrated, so it seemed
sufficient to ignore those cases.

Reviewers: rafael, dexonsmith, grosser

Differential Revision: http://reviews.llvm.org/D7636

llvm-svn: 230786
2015-02-27 19:29:02 +00:00
Tim Northover
3bb84c9bcc ARM & AArch64: make use of common cmpxchg idioms after expansion
The C and C++ semantics for compare_exchange require it to return a bool
indicating success. This gets mapped to LLVM IR which follows each cmpxchg with
an icmp of the value loaded against the desired value.

When lowered to ldxr/stxr loops, this extra comparison is redundant: its
results are implicit in the control-flow of the function.

This commit makes two changes: it replaces that icmp with appropriate PHI
nodes, and then makes sure earlyCSE is called after expansion to actually make
use of the opportunities revealed.

I've also added -{arm,aarch64}-enable-atomic-tidy options, so that
existing fragile tests aren't perturbed too much by the change. Many
of them either rely on undef/unreachable too pervasively to be
restored to something well-defined (particularly while making sure
they test the same obscure assert from many years ago), or depend on a
particular CFG shape, which is disrupted by SimplifyCFG.

rdar://problem/16227836

llvm-svn: 209883
2014-05-30 10:09:59 +00:00
Tim Northover
f94bdee15e ARM: use LLVM IR to represent the vshrn operation
vshrn is just the combination of a right shift and a truncate (and the limits
on the immediate value actually mean the signedness of the shift doesn't
matter). Using that representation allows us to get rid of an ARM-specific
intrinsic, share more code with AArch64 and hopefully get better code out of
the mid-end optimisers.

llvm-svn: 201085
2014-02-10 14:04:07 +00:00
Matthias Braun
434fbd854b Revert "Tests: Be less dependent on a specific schedule/regalloc"
This reverts r192454

Apparently FileCheck isn't as smart as I though and does not enforce a
topological order between variable defs+uses.

llvm-svn: 192472
2013-10-11 18:09:19 +00:00
Matthias Braun
4beef11e35 Tests: Be less dependent on a specific schedule/regalloc
llvm-svn: 192454
2013-10-11 15:40:12 +00:00
Tim Northover
cec1079024 ARM: implement some simple f64 materializations.
Previously we used a const-pool load for virtually all 64-bit floating values.
Actually, we can get quite a few common values (including 0.0, 1.0) via "vmov"
instructions of one stripe or another.

llvm-svn: 188773
2013-08-20 08:57:11 +00:00
Stephen Lin
7e501cf4c3 Mass update to CodeGen tests to use CHECK-LABEL for labels corresponding to function definitions for more informative error messages. No functionality change and all updated tests passed locally.
This update was done with the following bash script:

  find test/CodeGen -name "*.ll" | \
  while read NAME; do
    echo "$NAME"
    if ! grep -q "^; *RUN: *llc.*debug" $NAME; then
      TEMP=`mktemp -t temp`
      cp $NAME $TEMP
      sed -n "s/^define [^@]*@\([A-Za-z0-9_]*\)(.*$/\1/p" < $NAME | \
      while read FUNC; do
        sed -i '' "s/;\(.*\)\([A-Za-z0-9_-]*\):\( *\)$FUNC: *\$/;\1\2-LABEL:\3$FUNC:/g" $TEMP
      done
      sed -i '' "s/;\(.*\)-LABEL-LABEL:/;\1-LABEL:/" $TEMP
      sed -i '' "s/;\(.*\)-NEXT-LABEL:/;\1-NEXT:/" $TEMP
      sed -i '' "s/;\(.*\)-NOT-LABEL:/;\1-NOT:/" $TEMP
      sed -i '' "s/;\(.*\)-DAG-LABEL:/;\1-DAG:/" $TEMP
      mv $TEMP $NAME
    fi
  done

llvm-svn: 186280
2013-07-14 06:24:09 +00:00
Kristof Beyls
a686678676 Make ARMAsmPrinter generate the correct alignment specifier syntax in instructions.
The Printer will now print instructions with the correct alignment specifier syntax, like
    vld1.8  {d16}, [r0:64]

llvm-svn: 175884
2013-02-22 10:01:33 +00:00
Jakob Stoklund Olesen
6f2b596e57 Enable the new coalescer algorithm by default.
The new coalescer is better at merging values into unused vector lanes,
improving NEON code.

llvm-svn: 164794
2012-09-27 21:06:02 +00:00
Evan Cheng
959ad65636 Try to make these tests more portable.
llvm-svn: 164320
2012-09-20 21:35:21 +00:00
Evan Cheng
82c85585f9 Use vld1 / vst2 for unaligned v2f64 load / store. e.g. Use vld1.16 for 2-byte
aligned address. Based on patch by David Peixotto.

Also use vld1.64 / vst1.64 with 128-bit alignment to take advantage of alignment
hints. rdar://12090772, rdar://12238782

llvm-svn: 164089
2012-09-18 01:42:45 +00:00
Nadav Rotem
2729f54295 This commit contains a few changes that had to go in together.
1. Simplify xor/and/or (bitcast(A), bitcast(B)) -> bitcast(op (A,B))
   (and also scalar_to_vector).

2. Xor/and/or are indifferent to the swizzle operation (shuffle of one src).
   Simplify xor/and/or (shuff(A), shuff(B)) -> shuff(op (A, B))

3. Optimize swizzles of shuffles:  shuff(shuff(x, y), undef) -> shuff(x, y).

4. Fix an X86ISelLowering optimization which was very bitcast-sensitive.

Code which was previously compiled to this:

movd    (%rsi), %xmm0
movdqa  .LCPI0_0(%rip), %xmm2
pshufb  %xmm2, %xmm0
movd    (%rdi), %xmm1
pshufb  %xmm2, %xmm1
pxor    %xmm0, %xmm1
pshufb  .LCPI0_1(%rip), %xmm1
movd    %xmm1, (%rdi)
ret

Now compiles to this:

movl    (%rsi), %eax
xorl    %eax, (%rdi)
ret

llvm-svn: 153848
2012-04-01 19:31:22 +00:00
Jim Grosbach
4a2f107b04 ARM VLDR/VSTR instructions don't need a size suffix.
Canonicallize on the non-suffixed form, but continue to accept assembly that
has any correctly sized type suffix.

llvm-svn: 144583
2011-11-14 23:03:21 +00:00
Benjamin Kramer
89ebc7ab4b Simplify some uses of utohexstr.
As a side effect hex is printed lowercase instead of uppercase now.

llvm-svn: 144013
2011-11-07 21:00:59 +00:00
Owen Anderson
7a380bac06 Remove VMOVDneon and VMOVQ, which are just aliases for VORR. This continues to simplify the path towards an auto-generated disassembler.
llvm-svn: 135290
2011-07-15 18:46:47 +00:00
Jakob Stoklund Olesen
33f01d005c Fix ARM tests to be register allocator independent.
llvm-svn: 128680
2011-03-31 22:14:03 +00:00
Evan Cheng
fc78767730 Making use of VFP / NEON floating point multiply-accumulate / subtraction is
difficult on current ARM implementations for a few reasons.
1. Even though a single vmla has latency that is one cycle shorter than a pair
   of vmul + vadd, a RAW hazard during the first (4? on Cortex-a8) can cause
   additional pipeline stall. So it's frequently better to single codegen
   vmul + vadd.
2. A vmla folowed by a vmul, vmadd, or vsub causes the second fp instruction to
   stall for 4 cycles. We need to schedule them apart.
3. A vmla followed vmla is a special case. Obvious issuing back to back RAW
   vmla + vmla is very bad. But this isn't ideal either:
     vmul
     vadd
     vmla
   Instead, we want to expand the second vmla:
     vmla
     vmul
     vadd
   Even with the 4 cycle vmul stall, the second sequence is still 2 cycles
   faster.

Up to now, isel simply avoid codegen'ing fp vmla / vmls. This works well enough
but it isn't the optimial solution. This patch attempts to make it possible to
use vmla / vmls in cases where it is profitable.

A. Add missing isel predicates which cause vmla to be codegen'ed.
B. Make sure the fmul in (fadd (fmul)) has a single use. We don't want to
   compute a fmul and a fmla.
C. Add additional isel checks for vmla, avoid cases where vmla is feeding into
   fp instructions (except for the #3 exceptional case).
D. Add ARM hazard recognizer to model the vmla / vmls hazards.
E. Add a special pre-regalloc case to expand vmla / vmls when it's likely the
   vmla / vmls will trigger one of the special hazards.

Work in progress, only A+B are enabled.

llvm-svn: 120960
2010-12-05 22:04:16 +00:00
Evan Cheng
67db408634 Two sets of changes. Sorry they are intermingled.
1. Fix pre-ra scheduler so it doesn't try to push instructions above calls to
   "optimize for latency". Call instructions don't have the right latency and
   this is more likely to use introduce spills.
2. Fix if-converter cost function. For ARM, it should use instruction latencies,
   not # of micro-ops since multi-latency instructions is completely executed
   even when the predicate is false. Also, some instruction will be "slower"
   when they are predicated due to the register def becoming implicit input.
   rdar://8598427

llvm-svn: 118135
2010-11-03 00:45:17 +00:00
Andrew Trick
4a3b819c1f putback r116983 and fix simple-fp-encoding.ll tests
llvm-svn: 116992
2010-10-21 03:40:16 +00:00
Owen Anderson
7da515c665 Revert r116983, which is breaking all the buildbots.
llvm-svn: 116987
2010-10-21 03:11:16 +00:00
Evan Cheng
0b9eaaf45d Add missing scheduling itineraries for transfers between core registers and VFP registers.
llvm-svn: 116983
2010-10-21 01:12:00 +00:00
Evan Cheng
15fc769cf2 Correct some load / store instruction itinerary mistakes:
1. Cortex-A8 load / store multiplies can only issue on ALU0.
2. Eliminate A8_Issue, A8_LSPipe will correctly limit the load / store issues.
3. Correctly model all vld1 and vld2 variants.

llvm-svn: 116134
2010-10-09 01:03:04 +00:00
Bob Wilson
8689a52c10 Change register allocation order for ARM VFP and NEON registers to put the
callee-saved registers at the end of the lists.  Also prefer to avoid using
the low registers that are in register subclasses required by certain
instructions, so that those registers will more likely be available when needed.
This change makes a huge improvement in spilling in some cases.  Thanks to
Jakob for helping me realize the problem.

Most of this patch is fixing the testsuite.  There are quite a few places
where we're checking for specific registers.  I changed those to wildcards
in places where that doesn't weaken the tests.  The spill-q.ll and
thumb2-spill-q.ll tests stopped spilling with this change, so I added a bunch
of live values to force spills on those tests.

llvm-svn: 116055
2010-10-08 06:15:13 +00:00
Bob Wilson
8951c7592c Convert VLD1 and VLD2 instructions to use pseudo-instructions until
after regalloc.

llvm-svn: 112825
2010-09-02 16:00:54 +00:00
Bob Wilson
c01101e76c Add alignment arguments to all the NEON load/store intrinsics.
Update all the tests using those intrinsics and add support for
auto-upgrading bitcode files with the old versions of the intrinsics.

llvm-svn: 112271
2010-08-27 17:13:24 +00:00
Bob Wilson
c3856a5130 Replace some NEON vmovl intrinsic that I missed earlier.
llvm-svn: 111696
2010-08-20 23:22:43 +00:00
Bob Wilson
7feb850d36 Use a target-specific VMOVIMM DAG node instead of BUILD_VECTOR to represent
NEON VMOV-immediate instructions.  This simplifies some things.

llvm-svn: 108275
2010-07-13 21:16:48 +00:00
Bob Wilson
f15e542bdc Print "dregpair" NEON operands with a space between them, for readability and
consistency with other instructions that have lists of register operands.

llvm-svn: 107944
2010-07-09 00:47:20 +00:00
Bob Wilson
dd02fe62a2 Reenable DAG combining for vector shuffles. It looks like it was temporarily
disabled and then never turned back on again.  Adjust some tests, one because
this change avoids an unnecessary instruction, and the other to make it
continue testing what it was intended to test.

llvm-svn: 107941
2010-07-09 00:38:12 +00:00
Dan Gohman
d79ac4a097 Eliminate the other half of the BRCOND optimization, and update
as many tests as possible.

llvm-svn: 106749
2010-06-24 15:24:03 +00:00
Rafael Espindola
d7a63bead9 Remove arm_apcscc from the test files. It is the default and doing this
matches what llvm-gcc and clang now produce.

llvm-svn: 106221
2010-06-17 15:18:27 +00:00
Evan Cheng
849bca1ab6 Fix some latency computation bugs: if the use is not a machine opcode do not just return zero.
llvm-svn: 105061
2010-05-28 23:26:21 +00:00
Evan Cheng
6397a77e16 Change ARM scheduling default to list-hybrid if the target supports floating point instructions (and is not using soft float).
llvm-svn: 104307
2010-05-21 00:43:17 +00:00
Jakob Stoklund Olesen
6a2bfde3c8 TwoAddressInstructionPass doesn't really know how to merge live intervals when
lowering REG_SEQUENCE instructions.

Insert copies for REG_SEQUENCE sources not killed to avoid breaking later passes.

llvm-svn: 104146
2010-05-19 20:08:00 +00:00
Evan Cheng
9fc34e676d Fix PR7162: Use source register classes and sub-indices to determine the correct register class of the definitions of REG_SEQUENCE.
llvm-svn: 104050
2010-05-18 20:03:28 +00:00
Evan Cheng
8aa900cf16 Fix PR7175. Insert copies of a REG_SEQUENCE source if it is used by other REG_SEQUENCE instructions.
llvm-svn: 103994
2010-05-17 23:24:12 +00:00
Evan Cheng
378d6c5d76 Fix PR7156. If the sources of a REG_SEQUENCE are all IMPLICIT_DEF's. Replace it with an IMPLICIT_DEF rather than deleting it or else it would be left without a def.
llvm-svn: 103984
2010-05-17 22:09:49 +00:00
Evan Cheng
bb0a4fbe13 Careful with reg_sequence coalescing to not to overwrite sub-register indices.
llvm-svn: 103971
2010-05-17 20:57:12 +00:00
Evan Cheng
3bce87c79f Turn on -neon-reg-sequence by default.
Using NEON load / store multiple instructions will no longer create gobs of vmov of D registers!

llvm-svn: 103960
2010-05-17 19:51:20 +00:00