1
0
mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-10-20 19:42:54 +02:00
Commit Graph

58 Commits

Author SHA1 Message Date
Evan Cheng
64850406cf Distribute (A + B) * C to (A * C) + (B * C) to make use of NEON multiplier
accumulator forwarding:
vadd d3, d0, d1
vmul d3, d3, d2
=>
vmul d3, d0, d2
vmla d3, d1, d2

llvm-svn: 128665
2011-03-31 19:38:48 +00:00
Bob Wilson
438a9a1367 Add Neon VCVT instructions for f32 <-> f16 conversions.
Clang is now providing intrinsics for these and so we need to support them
in the backend.  Radar 8068427.

llvm-svn: 121902
2010-12-15 22:14:12 +00:00
Evan Cheng
12561e250d Code clean up.
llvm-svn: 120965
2010-12-05 23:03:45 +00:00
Evan Cheng
fc78767730 Making use of VFP / NEON floating point multiply-accumulate / subtraction is
difficult on current ARM implementations for a few reasons.
1. Even though a single vmla has latency that is one cycle shorter than a pair
   of vmul + vadd, a RAW hazard during the first (4? on Cortex-a8) can cause
   additional pipeline stall. So it's frequently better to single codegen
   vmul + vadd.
2. A vmla folowed by a vmul, vmadd, or vsub causes the second fp instruction to
   stall for 4 cycles. We need to schedule them apart.
3. A vmla followed vmla is a special case. Obvious issuing back to back RAW
   vmla + vmla is very bad. But this isn't ideal either:
     vmul
     vadd
     vmla
   Instead, we want to expand the second vmla:
     vmla
     vmul
     vadd
   Even with the 4 cycle vmul stall, the second sequence is still 2 cycles
   faster.

Up to now, isel simply avoid codegen'ing fp vmla / vmls. This works well enough
but it isn't the optimial solution. This patch attempts to make it possible to
use vmla / vmls in cases where it is profitable.

A. Add missing isel predicates which cause vmla to be codegen'ed.
B. Make sure the fmul in (fadd (fmul)) has a single use. We don't want to
   compute a fmul and a fmla.
C. Add additional isel checks for vmla, avoid cases where vmla is feeding into
   fp instructions (except for the #3 exceptional case).
D. Add ARM hazard recognizer to model the vmla / vmls hazards.
E. Add a special pre-regalloc case to expand vmla / vmls when it's likely the
   vmla / vmls will trigger one of the special hazards.

Work in progress, only A+B are enabled.

llvm-svn: 120960
2010-12-05 22:04:16 +00:00
Evan Cheng
b565d1acf9 Add some missing isel predicates on def : pat patterns to avoid generating VFP vmla / vmls (they cause stalls). Disabling them in isel is properly not a right solution, I'll look into a proper solution next.
llvm-svn: 118922
2010-11-12 20:32:20 +00:00
Evan Cheng
eab7251695 Fix preload instruction isel. Only v7 supports pli, and only v7 with mp extension supports pldw. Add subtarget attribute to denote mp extension support and legalize illegal ones to nothing.
llvm-svn: 118160
2010-11-03 06:34:55 +00:00
Bob Wilson
bbb91c6a1c PR8359: The ARM backend may end up allocating registers D16 to D31 when
"-mattr=+vfp3" is specified. However, this will not work for hardware that
only supports 16 registers.  Add a new flag to support -"mattr=+vfp3,+d16".
Patch by Jan Voung!

llvm-svn: 116310
2010-10-12 16:22:47 +00:00
Jim Grosbach
efad965653 Nuke it from orbit. It's the only way to be sure.
(Kill the dead non-MC asm printer for the ARM target.)

llvm-svn: 115127
2010-09-30 01:57:53 +00:00
Evan Cheng
c9cb37516d Teach if-converter to be more careful with predicating instructions that would
take multiple cycles to decode.
For the current if-converter clients (actually only ARM), the instructions that
are predicated on false are not nops. They would still take machine cycles to
decode. Micro-coded instructions such as LDM / STM can potentially take multiple
cycles to decode. If-converter should take treat them as non-micro-coded
simple instructions.

llvm-svn: 113570
2010-09-10 01:29:16 +00:00
Jim Grosbach
1d9631950f 80 column cleanup.
llvm-svn: 111266
2010-08-17 18:39:16 +00:00
Chris Lattner
a556264e06 fix emacs language spec's, patch by Edmund Grimley-Evans!
llvm-svn: 111241
2010-08-17 16:20:04 +00:00
Jim Grosbach
1128a47289 cortex m4 has floating point support, but only single precision.
llvm-svn: 110810
2010-08-11 15:44:15 +00:00
Evan Cheng
f8604b772e Report error if codegen tries to instantiate a ARM target when the cpu does support it. e.g. cortex-m* processors.
llvm-svn: 110798
2010-08-11 07:17:46 +00:00
Evan Cheng
4929ba9d20 ArchV7M implies HW division instructions.
llvm-svn: 110797
2010-08-11 07:00:16 +00:00
Evan Cheng
31e15214c6 ArchV6T2, V7A, and V7M implies Thumb2; Archv7A implies NEON.
llvm-svn: 110796
2010-08-11 06:57:53 +00:00
Evan Cheng
273160895e Add ARM Archv6M and let it implies FeatureDB (having dmb, etc.)
llvm-svn: 110795
2010-08-11 06:51:54 +00:00
Evan Cheng
e5bab36c75 Add Cortex-M0 support. It's a ARMv6m device (no ARM mode) with some 32-bit
instructions: dmb, dsb, isb, msr, and mrs.

llvm-svn: 110786
2010-08-11 06:30:38 +00:00
Evan Cheng
5fca4ca5f9 - Add subtarget feature -mattr=+db which determine whether an ARM cpu has the
memory and synchronization barrier dmb and dsb instructions.
- Change instruction names to something more sensible (matching name of actual
  instructions).
- Added tests for memory barrier codegen.

llvm-svn: 110785
2010-08-11 06:22:01 +00:00
Evan Cheng
15d23d4966 Change -prefer-32bit-thumb to attribute -mattr=+32bit instead to disable more 32-bit to 16-bit optimizations.
llvm-svn: 110584
2010-08-09 18:35:19 +00:00
Evan Cheng
67743f2057 Add an ARM "feature". Cortex-a8 fp comparison is very slow (> 20 cycles).
llvm-svn: 108256
2010-07-13 19:21:50 +00:00
Jim Grosbach
e04cc6cb43 Cleanup of ARMv7M support. Move hardware divide and Thumb2 extract/pack
instructions to subtarget features and update tests to reflect.
PR5717.

llvm-svn: 103136
2010-05-05 23:44:43 +00:00
Jim Grosbach
3630aff780 Add initial support for ARMv7M subtarget and cortex-m3 cpu. Patch by
Jordy <snhjordy@gmail.com>.

Followup patches will add some tests and adjust to use Subtarget features
for the instructions.

llvm-svn: 103119
2010-05-05 20:44:35 +00:00
Anton Korobeynikov
7a3b393469 Some bits of A9 scheduling: VFP
llvm-svn: 100643
2010-04-07 18:19:18 +00:00
Jakob Stoklund Olesen
4c043c50fd Replace TSFlagsFields and TSFlagsShifts with a simpler TSFlags field.
When a target instruction wants to set target-specific flags, it should simply
set bits in the TSFlags bit vector defined in the Instruction TableGen class.

This works well because TableGen resolves member references late:

class I : Instruction {
  AddrMode AM = AddrModeNone;
  let TSFlags{3-0} = AM.Value;
}

let AM = AddrMode4 in
def ADD : I;

TSFlags gets the expected bits from AddrMode4 in this example.

llvm-svn: 100384
2010-04-05 03:10:20 +00:00
Jim Grosbach
8ed11c8bed vml[as] are slow on 1136jf-s also.
llvm-svn: 100066
2010-04-01 00:13:43 +00:00
Jim Grosbach
2a0b14a387 switch the flag for using NEON for SP floating point to a subtarget 'feature'.
Re-commit. This time complete with testsuite updates.

llvm-svn: 99570
2010-03-25 23:47:34 +00:00
Jim Grosbach
97d5bc2b86 need to fix 'make check' tests first. revert for a moment.
llvm-svn: 99569
2010-03-25 23:34:05 +00:00
Jim Grosbach
7e87ba79e6 switch the flag for using NEON for SP floating point to a subtarget 'feature'
llvm-svn: 99568
2010-03-25 23:32:19 +00:00
Jim Grosbach
b97ff2a4c1 switch the use-vml[as] instructions flag to a subtarget 'feature'
llvm-svn: 99565
2010-03-25 23:11:16 +00:00
Anton Korobeynikov
90fcfccc91 Add substarget feature for FP16
llvm-svn: 98503
2010-03-14 18:42:38 +00:00
David Goodwin
6b56e77397 Add ARMv6 itineraries.
llvm-svn: 89218
2009-11-18 18:39:57 +00:00
Anton Korobeynikov
3ba3789153 Use NEON reg-reg moves, where profitable. This reduces "domain-cross" stalls, when we used to mix vfp and neon code (the former were used for reg-reg moves)
llvm-svn: 85764
2009-11-02 00:10:38 +00:00
David Goodwin
a4b73e486e Remove neonfp attribute and instead set default based on CPU string. Add -arm-use-neon-fp to override the default.
llvm-svn: 83218
2009-10-01 22:19:57 +00:00
David Goodwin
d0edce4c0d Restore the -post-RA-scheduler flag as an override for the target specification. Remove -mattr for setting PostRAScheduler enable and instead use CPU string.
llvm-svn: 83215
2009-10-01 21:46:35 +00:00
David Goodwin
a282690f82 Remove -post-RA-schedule flag and add a TargetSubtarget method to enable post-register-allocation scheduling. By default it is off. For ARM, enable/disable with -mattr=+/-postrasched. Enable by default for cortex-a8.
llvm-svn: 83122
2009-09-30 00:10:16 +00:00
David Goodwin
1d72b88015 Checkpoint NEON scheduling itineraries.
llvm-svn: 82657
2009-09-23 21:38:08 +00:00
David Goodwin
2686178489 Allow a zero cycle stage to reserve/require a FU without advancing the cycle counter.
llvm-svn: 78736
2009-08-11 22:38:43 +00:00
David Goodwin
c0fe95d8ce Make NEON single-precision FP support the default for cortex-a8 (again).
llvm-svn: 78430
2009-08-07 23:32:33 +00:00
David Goodwin
9013115518 Disable NEON single-precision FP support for Cortex-A8, for now...
llvm-svn: 78209
2009-08-05 16:40:57 +00:00
David Goodwin
47064aa1c6 By default, for cortex-a8 use NEON for single-precision FP.
llvm-svn: 78200
2009-08-05 16:01:19 +00:00
David Goodwin
99adffe5f2 Initial support for single-precision FP using NEON. Added "neonfp" attribute to enable. Added patterns for some binary FP operations.
llvm-svn: 78081
2009-08-04 17:53:06 +00:00
Evan Cheng
66f1d7a0e1 Add fake v7 itineraries for now.
llvm-svn: 76612
2009-07-21 18:54:14 +00:00
Evan Cheng
18d70ca551 Add a Thumb2 instruction flag to that indicates whether the instruction can be transformed to 16-bit variant.
llvm-svn: 74988
2009-07-08 01:46:35 +00:00
Evan Cheng
f671ce4eba Latency information for ARM v6. It's rough and not yet hooked up. Right now we are only using branch latency to determine if-conversion limits.
llvm-svn: 73747
2009-06-19 01:51:50 +00:00
Anton Korobeynikov
52b2898854 Separate V6 from V6T2 since the latter has some extra nice instructions
llvm-svn: 73085
2009-06-08 21:20:36 +00:00
Anton Korobeynikov
4f5b1ef545 Add placeholder for thumb2 stuff
llvm-svn: 72593
2009-05-29 23:41:08 +00:00
Anton Korobeynikov
1fe3d6d47b Add ARMv7 architecture, Cortex processors and different FPU modes handling.
llvm-svn: 72337
2009-05-23 19:51:43 +00:00
Bob Wilson
b8756b00cd Use CallConvLower.h and TableGen descriptions of the calling conventions
for ARM.  Patch by Sandeep Patel.

llvm-svn: 69371
2009-04-17 19:07:39 +00:00
Evan Cheng
a19ef59d6c Move target independent td files from lib/Target/ to include/llvm/Target so they can be distributed along with the header files.
llvm-svn: 59953
2008-11-24 07:34:46 +00:00
Evan Cheng
aa24d19533 Remove opcode from instruction TS flags; add MOVCC support; fix addrmode3 encoding bug.
llvm-svn: 58800
2008-11-06 08:47:38 +00:00