mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-10-20 03:23:01 +02:00
llvm-mirror/lib
Evan Cheng fc78767730 Making use of VFP / NEON floating point multiply-accumulate / subtraction is
difficult on current ARM implementations for a few reasons.
1. Even though a single vmla has latency that is one cycle shorter than a pair
   of vmul + vadd, a RAW hazard during the first few cycles (4? on Cortex-A8)
   can cause an additional pipeline stall. So it's frequently better to simply
   codegen vmul + vadd.
2. A vmla followed by a vmul, vadd, or vsub causes the second fp instruction to
   stall for 4 cycles. We need to schedule them apart.
3. A vmla followed by a vmla is a special case. Obviously, issuing back-to-back
   RAW vmla + vmla is very bad. But this isn't ideal either:
     vmul
     vadd
     vmla
   Instead, we want to expand the second vmla:
     vmla
     vmul
     vadd
   Even with the 4 cycle vmul stall, the second sequence is still 2 cycles
   faster.
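
The scheduling constraint in point 2 can be sketched as a toy hazard check. This is only an illustration, not the recognizer added by the patch: the names FpOp, kMlxStallCycles, and ToyMlxHazardRecognizer are hypothetical, and the model assumes the 4-cycle stall shrinks by one for every cycle of distance placed between the vmla / vmls and the dependent-unit fp instruction. The vmla + vmla case from point 3 is deliberately left out.

```cpp
#include <cassert>

// Toy opcodes; illustrative names, not LLVM's real opcode enum.
enum class FpOp { VMLA, VMLS, VMUL, VADD, VSUB, Other };

// Stall length taken from point 2 of the commit message.
constexpr int kMlxStallCycles = 4;

class ToyMlxHazardRecognizer {
  int lastMlxCycle_ = -1;  // issue cycle of the most recent vmla/vmls, -1 if none

public:
  // Returns how many cycles `op` would stall if issued at `cycle`.
  // Assumption: an fp op issued right after a vmla stalls the full 4 cycles,
  // and every extra cycle of separation removes one cycle of stall.
  int stallCycles(FpOp op, int cycle) const {
    assert(cycle >= 0 && "cycles are counted from zero");
    if (lastMlxCycle_ < 0)
      return 0;
    const bool sensitive =
        op == FpOp::VMUL || op == FpOp::VADD || op == FpOp::VSUB;
    if (!sensitive)
      return 0;
    const int readyCycle = lastMlxCycle_ + 1 + kMlxStallCycles;
    return cycle < readyCycle ? readyCycle - cycle : 0;
  }

  // Records an issued instruction; only vmla/vmls open a hazard window.
  void issue(FpOp op, int cycle) {
    if (op == FpOp::VMLA || op == FpOp::VMLS)
      lastMlxCycle_ = cycle;
  }
};
```

A scheduler using such a check would try to place unrelated instructions between the vmla and the next vmul / vadd / vsub until stallCycles() returns 0.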

Up to now, isel simply avoids codegen'ing fp vmla / vmls. This works well enough
but it isn't the optimal solution. This patch attempts to make it possible to
use vmla / vmls in cases where it is profitable.

A. Add missing isel predicates which cause vmla to be codegen'ed.
B. Make sure the fmul in (fadd (fmul)) has a single use. We don't want to
   compute a fmul and a fmla.
C. Add additional isel checks for vmla to avoid cases where vmla is feeding
   into fp instructions (except for the exceptional case in #3).
D. Add ARM hazard recognizer to model the vmla / vmls hazards.
E. Add a special pre-regalloc case to expand vmla / vmls when it's likely the
   vmla / vmls will trigger one of the special hazards.
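
The single-use condition in B can be sketched on a toy expression graph. This is a minimal illustration under stated assumptions, not the patch's isel predicate: Node, Opcode, makeNode, and canFoldToFMLA are hypothetical names, and use counts are tracked by hand. In LLVM's SelectionDAG a check of this kind is typically written with SDNode::hasOneUse().

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Toy expression-graph node; not LLVM's SDNode.
enum class Opcode { FAdd, FMul, Leaf };

struct Node {
  Opcode op = Opcode::Leaf;
  std::vector<Node*> operands;
  int numUses = 0;  // how many other nodes consume this node's value
};

// Builds a node and bumps the use count of each operand.
inline Node makeNode(Opcode op, std::vector<Node*> operands = {}) {
  for (Node* operand : operands) {
    assert(operand && "operands must be non-null");
    ++operand->numUses;
  }
  Node n;
  n.op = op;
  n.operands = std::move(operands);
  return n;
}

// Returns true if `add` is (fadd (fmul ...), ...) and the fmul has exactly
// one use. If the fmul had other users, folding it into a multiply-accumulate
// would still leave a separate fmul to compute, which B wants to avoid.
inline bool canFoldToFMLA(const Node& add) {
  if (add.op != Opcode::FAdd)
    return false;
  for (const Node* operand : add.operands)
    if (operand->op == Opcode::FMul && operand->numUses == 1)
      return true;
  return false;
}
```

With this sketch, an fadd whose fmul operand is also consumed elsewhere is rejected, so the multiply is computed once and shared rather than duplicated inside a vmla.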

Work in progress, only A+B are enabled.

llvm-svn: 120960
2010-12-05 22:04:16 +00:00
..
Analysis Also ignore '()' while creating mdnode name from ObjC symbol name. 2010-12-03 23:40:45 +00:00
Archive Merge System into Support. 2010-11-29 18:16:10 +00:00
AsmParser Add a new 'hotpatch' attribute. This attribute will insert a two-byte no-op 2010-10-25 15:37:09 +00:00
Bitcode Generalize the darwin wrapper hack to work with generic macho triples as well as darwin ones. 2010-11-29 23:29:54 +00:00
CodeGen Remove the PHIElimination.h header, as it is no longer needed. 2010-12-05 21:39:42 +00:00
CompilerDriver Now to chant the magical incantation that will exorcise the System library 2010-11-29 19:44:50 +00:00
ExecutionEngine Remove unneeded zero arrays. 2010-12-04 15:28:22 +00:00
Linker Merge System into Support. 2010-11-29 18:16:10 +00:00
MC Once the layout is done we don't need to keep updating which fragments are 2010-12-04 22:47:22 +00:00
Object Merge System into Support. 2010-11-29 18:16:10 +00:00
Support Silence 'may be used uninitialized in this function' warnings. Static analysis 2010-12-04 20:20:34 +00:00
Target Making use of VFP / NEON floating point multiply-accumulate / subtraction is 2010-12-05 22:04:16 +00:00
Transforms Refactor jump threading. 2010-12-05 19:06:41 +00:00
VMCore Fix PR 4170 by having ExtractValueInst::getIndexedType() reject out-of-bounds indexing. 2010-12-05 20:50:26 +00:00
Makefile Add LLVMObject Library. 2010-11-15 03:21:41 +00:00