Commit Graph

12 Commits

Author SHA1 Message Date
Chad Rosier
26513932a2 Typos.
llvm-svn: 133128
2011-06-16 01:24:24 +00:00
Chad Rosier
66fa658a4b Revision r128665 added an optimization to make use of NEON multiplier
accumulator forwarding.  Specifically (from the SVN log entry):

Distribute (A + B) * C to (A * C) + (B * C) to make use of NEON multiplier
accumulator forwarding:
vadd d3, d0, d1
vmul d3, d3, d2
=>
vmul d3, d0, d2
vmla d3, d1, d2

Make sure it catches cases where operand 1 is an add/fadd/sub/fsub, as was
intended in the original revision.

llvm-svn: 133127
2011-06-16 01:21:54 +00:00
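A minimal C sketch of the transformation this commit (and r128665 below) targets; the function name madd2 is hypothetical. With the distribution in place, isel can select vmul + vmla and use the multiplier accumulator forwarding path instead of stalling on vadd + vmul:

    #include <arm_neon.h>

    /* Computes (a + b) * c on 2 x f32 lanes.  Naively this selects to
       vadd d3, d0, d1 ; vmul d3, d3, d2.  Distributed, it becomes
       vmul d3, d0, d2 ; vmla d3, d1, d2. */
    float32x2_t madd2(float32x2_t a, float32x2_t b, float32x2_t c) {
        return vmul_f32(vadd_f32(a, b), c);
    }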
Evan Cheng
64850406cf Distribute (A + B) * C to (A * C) + (B * C) to make use of NEON multiplier
accumulator forwarding:
vadd d3, d0, d1
vmul d3, d3, d2
=>
vmul d3, d0, d2
vmla d3, d1, d2

llvm-svn: 128665
2011-03-31 19:38:48 +00:00
Evan Cheng
ed09135349 Add intrinsics @llvm.arm.neon.vmulls.* and @llvm.arm.neon.vmullu.* back. Frontends
were lowering them to sext / zext + mul instructions. Unfortunately the
optimization passes may hoist the extensions out of the loop and separate them.
When that happens, the long multiplication instructions can be broken into
several scalar instructions, causing a significant performance issue.

Note the vmla and vmls intrinsics are not added back. Frontends will codegen
them as vmull* intrinsics + add / sub. Also note the isel optimizations for
catching mul + sext / zext are not changed either.

First part of rdar://8832507, rdar://9203134

llvm-svn: 128502
2011-03-29 23:06:19 +00:00
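A hedged C illustration of the problem this avoids; the loop and names (scale_all, k) are made up. If vmull_s16 were lowered to sext + mul in IR, LICM could hoist the extension of the loop-invariant operand k away from the multiply, so isel would no longer see the pattern needed to form a single vmull:

    #include <arm_neon.h>

    /* k is loop-invariant.  Keeping @llvm.arm.neon.vmulls opaque
       stops the optimizer from splitting the extend from the multiply,
       preserving the single long-multiply instruction per iteration. */
    void scale_all(int32x4_t *out, const int16x4_t *in, int16x4_t k, int n) {
        for (int i = 0; i < n; ++i)
            out[i] = vmull_s16(in[i], k);
    }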
Evan Cheng
5bcaef9cc9 Optimize (zext A + zext B) * C to (VMULL A, C) + (VMULL B, C) during
isel lowering to fold the zero-extends and take advantage of the no-stall
back-to-back vmul + vmla:
 vmull q0, d4, d6
 vmlal q0, d5, d6
is faster than
 vaddl q0, d4, d5
 vmovl q1, d6
 vmul  q0, q0, q1

This allows us to vmull + vmlal for:
    f = vmull_u8(   vget_high_u8(s), c);
    f = vmlal_u8(f, vget_low_u8(s),  c);

rdar://9197392

llvm-svn: 128444
2011-03-29 01:56:09 +00:00
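For context, a self-contained version of the snippet above; the wrapper name widen_mul is invented. It multiplies both halves of a 16-byte vector by an 8-lane coefficient vector, accumulating the two widened products:

    #include <arm_neon.h>

    /* With this optimization the two statements select to the
       back-to-back vmull.u8 + vmlal.u8 pair shown above. */
    uint16x8_t widen_mul(uint8x16_t s, uint8x8_t c) {
        uint16x8_t f = vmull_u8(vget_high_u8(s), c);
        f = vmlal_u8(f, vget_low_u8(s), c);
        return f;
    }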
Bob Wilson
ff0e6b1252 Recognize sign/zero-extended constant BUILD_VECTORs for VMULL operations.
We need to check whether the individual vector elements are sign/zero-extended
values.  For now this only handles constant values.  Radar 8687140.

llvm-svn: 120034
2010-11-23 19:38:38 +00:00
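A speculative C sketch of the case this enables; triple_widen is a hypothetical name. The splat of the constant 3 is a BUILD_VECTOR whose i32 elements are sign-extended i16 values, so the widening multiply can be matched to a single vmull.s16 instead of vmovl + vmul:

    #include <arm_neon.h>

    /* mul(sext(v), <3,3,3,3>) -- the constant fits in 16 bits, so
       the whole expression is a VMULL candidate. */
    int32x4_t triple_widen(int16x4_t v) {
        return vmulq_s32(vmovl_s16(v), vdupq_n_s32(3));
    }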
Bob Wilson
3348d2eb50 Remove the NEON vmull, vmlal, and vmlsl intrinsics, replacing them with multiply,
add, and subtract operations on zero-extended or sign-extended vectors.
Update tests.  Add auto-upgrade support for the old intrinsics.

llvm-svn: 112773
2010-09-01 23:50:19 +00:00
Dan Gohman
c024fd43c9 Fix tests to use fadd, fsub, and fmul, instead of add, sub, and mul,
when the type is floating-point.

llvm-svn: 102969
2010-05-03 22:36:46 +00:00
Bob Wilson
011e458c11 Merge a bunch of NEON tests into larger files so they run faster.
llvm-svn: 83667
2009-10-09 20:20:54 +00:00
Bob Wilson
c7baa2832f Convert more NEON tests to use FileCheck.
llvm-svn: 83507
2009-10-07 23:47:21 +00:00
Dan Gohman
142428ce64 Eliminate more uses of llvm-as and llvm-dis.
llvm-svn: 81293
2009-09-09 00:09:15 +00:00
Bob Wilson
6db76aaf10 Add support for ARM's Advanced SIMD (NEON) instruction set.
This is still a work in progress, but most of the NEON instruction set
is supported.

llvm-svn: 73919
2009-06-22 23:27:02 +00:00