1
0
mirror of https://github.com/RPCS3/llvm-mirror.git synced 2025-02-01 13:11:39 +01:00

19 Commits

Author SHA1 Message Date
Sjoerd Meijer
dbb2ea77e4 [ARM][NFCI] Do not fuse VADD and VMUL, continued (1/2)
This is a follow up of rL342874, which stopped fusing muls and adds into VMLAs
for performance reasons on the Cortex-M4 and Cortex-M33.  This is a serie of 2
patches, that is trying to achieve the same for VFMA.  The second column in the
table below shows what we were generating before rL342874, the third column
what changed with rL342874, and the last column what we want to achieve with
these 2 patches:

 --------------------------------------------------------
 | Opt   |  < rL342874   |  >= rL342874   |             |
 |------------------------------------------------------|
 |-O3    |     vmla      |      vmul      |     vmul    |
 |       |               |      vadd      |     vadd    |
 |------------------------------------------------------|
 |-Ofast |     vfma      |      vfma      |     vmul    |
 |       |               |                |     vadd    |
 |------------------------------------------------------|
 |-Oz    |     vmla      |      vmla      |     vmla    |
 --------------------------------------------------------

This patch 1/2, is a cleanup of the spaghetti predicate logic on the different
VMLA and VFMA codegen rules, so that we can make the final functional change in
patch 2/2.  This also fixes a typo in the regression test added in rL342874.

Differential revision: https://reviews.llvm.org/D53314

llvm-svn: 344671
2018-10-17 07:26:35 +00:00
Sjoerd Meijer
d5015b6840 [ARM] Do not fuse VADD and VMUL on the Cortex-M4 and Cortex-M33
A sequence of VMUL and VADD instructions always give the same or better
performance than a fused VMLA instruction on the Cortex-M4 and Cortex-M33.
Executing the VMUL and VADD back-to-back requires the same cycles, but
having separate instructions allows scheduling to avoid the hazard between
these 2 instructions.

Differential Revision: https://reviews.llvm.org/D52289

llvm-svn: 342874
2018-09-24 12:02:50 +00:00
Sjoerd Meijer
e5082955c1 [ARM] fixed some tabs/whitespaces in test. NFC.
llvm-svn: 324074
2018-02-02 11:51:06 +00:00
Saleem Abdulrasool
53e8c5a4af ARM: fixup more tests to specify the target more explicitly
This changes the tests that were targeting ARM EABI to explicitly specify the
environment rather than relying on the default.  This breaks with the new
Windows on ARM support when running the tests on Windows where the default
environment is no longer EABI.

Take the opportunity to avoid a pointless redirect (helps when trying to debug
with providing a command line invocation which can be copy and pasted) and
removing a few greps in favour of FileCheck.

llvm-svn: 205541
2014-04-03 16:01:44 +00:00
Stephen Lin
7e501cf4c3 Mass update to CodeGen tests to use CHECK-LABEL for labels corresponding to function definitions for more informative error messages. No functionality change and all updated tests passed locally.
This update was done with the following bash script:

  find test/CodeGen -name "*.ll" | \
  while read NAME; do
    echo "$NAME"
    if ! grep -q "^; *RUN: *llc.*debug" $NAME; then
      TEMP=`mktemp -t temp`
      cp $NAME $TEMP
      sed -n "s/^define [^@]*@\([A-Za-z0-9_]*\)(.*$/\1/p" < $NAME | \
      while read FUNC; do
        sed -i '' "s/;\(.*\)\([A-Za-z0-9_-]*\):\( *\)$FUNC: *\$/;\1\2-LABEL:\3$FUNC:/g" $TEMP
      done
      sed -i '' "s/;\(.*\)-LABEL-LABEL:/;\1-LABEL:/" $TEMP
      sed -i '' "s/;\(.*\)-NEXT-LABEL:/;\1-NEXT:/" $TEMP
      sed -i '' "s/;\(.*\)-NOT-LABEL:/;\1-NOT:/" $TEMP
      sed -i '' "s/;\(.*\)-DAG-LABEL:/;\1-DAG:/" $TEMP
      mv $TEMP $NAME
    fi
  done

llvm-svn: 186280
2013-07-14 06:24:09 +00:00
Bob Wilson
3daeb462cb This patch combines several changes from Evan Cheng for rdar://8659675.
Making use of VFP / NEON floating point multiply-accumulate / subtraction is
difficult on current ARM implementations for a few reasons.
1. Even though a single vmla has latency that is one cycle shorter than a pair
   of vmul + vadd, a RAW hazard during the first (4? on Cortex-a8) can cause
   additional pipeline stall. So it's frequently better to single codegen
   vmul + vadd.
2. A vmla folowed by a vmul, vmadd, or vsub causes the second fp instruction to
   stall for 4 cycles. We need to schedule them apart.
3. A vmla followed vmla is a special case. Obvious issuing back to back RAW
   vmla + vmla is very bad. But this isn't ideal either:
     vmul
     vadd
     vmla
   Instead, we want to expand the second vmla:
     vmla
     vmul
     vadd
   Even with the 4 cycle vmul stall, the second sequence is still 2 cycles
   faster.

Up to now, isel simply avoid codegen'ing fp vmla / vmls. This works well enough
but it isn't the optimial solution. This patch attempts to make it possible to
use vmla / vmls in cases where it is profitable.

A. Add missing isel predicates which cause vmla to be codegen'ed.
B. Make sure the fmul in (fadd (fmul)) has a single use. We don't want to
   compute a fmul and a fmla.
C. Add additional isel checks for vmla, avoid cases where vmla is feeding into
   fp instructions (except for the #3 exceptional case).
D. Add ARM hazard recognizer to model the vmla / vmls hazards.
E. Add a special pre-regalloc case to expand vmla / vmls when it's likely the
   vmla / vmls will trigger one of the special hazards.

Enable these fp vmlx codegen changes for Cortex-A9.

llvm-svn: 129775
2011-04-19 18:11:57 +00:00
Evan Cheng
b565d1acf9 Add some missing isel predicates on def : pat patterns to avoid generating VFP vmla / vmls (they cause stalls). Disabling them in isel is properly not a right solution, I'll look into a proper solution next.
llvm-svn: 118922
2010-11-12 20:32:20 +00:00
Evan Cheng
bc4588c439 Re-commit 117518 and 117519 now that ARM MC test failures are out of the way.
llvm-svn: 117531
2010-10-28 06:47:08 +00:00
Evan Cheng
fdc80a0316 Revert 117518 and 117519 for now. They changed scheduling and cause MC tests to fail. Ugh.
llvm-svn: 117520
2010-10-28 02:00:25 +00:00
Evan Cheng
5c358e02ea - Assign load / store with shifter op address modes the right itinerary classes.
- For now, loads of [r, r] addressing mode is the same as the
  [r, r lsl/lsr/asr #] variants. ARMBaseInstrInfo::getOperandLatency() should
  identify the former case and reduce the output latency by 1.
- Also identify [r, r << 2] case. This special form of shifter addressing mode
  is "free".

llvm-svn: 117519
2010-10-28 01:49:06 +00:00
Evan Cheng
6397a77e16 Change ARM scheduling default to list-hybrid if the target supports floating point instructions (and is not using soft float).
llvm-svn: 104307
2010-05-21 00:43:17 +00:00
Jim Grosbach
2a0b14a387 switch the flag for using NEON for SP floating point to a subtarget 'feature'.
Re-commit. This time complete with testsuite updates.

llvm-svn: 99570
2010-03-25 23:47:34 +00:00
Edward O'Callaghan
d1c7b40bb5 Convert ARM tests to FileCheck for PR5307.
llvm-svn: 89593
2009-11-22 14:23:33 +00:00
Jim Grosbach
ea6c9c17f5 Use Unified Assembly Syntax for the ARM backend.
llvm-svn: 86494
2009-11-09 00:11:35 +00:00
Jim Grosbach
5b094f3b36 vml[as].f32 cause stalls in following advanced SIMD instructions. Avoid using
them for scalar floating point operations for now.

llvm-svn: 85697
2009-10-31 22:57:36 +00:00
David Goodwin
a4b73e486e Remove neonfp attribute and instead set default based on CPU string. Add -arm-use-neon-fp to override the default.
llvm-svn: 83218
2009-10-01 22:19:57 +00:00
Dan Gohman
142428ce64 Eliminate more uses of llvm-as and llvm-dis.
llvm-svn: 81293
2009-09-09 00:09:15 +00:00
David Goodwin
c0fe95d8ce Make NEON single-precision FP support the default for cortex-a8 (again).
llvm-svn: 78430
2009-08-07 23:32:33 +00:00
David Goodwin
99adffe5f2 Initial support for single-precision FP using NEON. Added "neonfp" attribute to enable. Added patterns for some binary FP operations.
llvm-svn: 78081
2009-08-04 17:53:06 +00:00