llvm-mirror

mirror of https://github.com/RPCS3/llvm-mirror.git synced 2025-02-01 13:11:39 +01:00

Author	SHA1	Message	Date
Sjoerd Meijer	dbb2ea77e4	[ARM][NFCI] Do not fuse VADD and VMUL, continued (1/2) This is a follow up of rL342874, which stopped fusing muls and adds into VMLAs for performance reasons on the Cortex-M4 and Cortex-M33. This is a series of 2 patches that tries to achieve the same for VFMA. The second column in the table below shows what we were generating before rL342874, the third column what changed with rL342874, and the last column what we want to achieve with these 2 patches: -------------------------------------------------------- | Opt | < rL342874 | >= rL342874 | | |------------------------------------------------------| |-O3 | vmla | vmul | vmul | | | | vadd | vadd | |------------------------------------------------------| |-Ofast | vfma | vfma | vmul | | | | | vadd | |------------------------------------------------------| |-Oz | vmla | vmla | vmla | -------------------------------------------------------- This patch, 1/2, is a cleanup of the spaghetti predicate logic on the different VMLA and VFMA codegen rules, so that we can make the final functional change in patch 2/2. This also fixes a typo in the regression test added in rL342874. Differential revision: https://reviews.llvm.org/D53314 llvm-svn: 344671	2018-10-17 07:26:35 +00:00
Sjoerd Meijer	d5015b6840	[ARM] Do not fuse VADD and VMUL on the Cortex-M4 and Cortex-M33 A sequence of VMUL and VADD instructions always gives the same or better performance than a fused VMLA instruction on the Cortex-M4 and Cortex-M33. Executing the VMUL and VADD back-to-back requires the same number of cycles, but having separate instructions allows scheduling to avoid the hazard between these 2 instructions. Differential Revision: https://reviews.llvm.org/D52289 llvm-svn: 342874	2018-09-24 12:02:50 +00:00
Sjoerd Meijer	e5082955c1	[ARM] fixed some tabs/whitespaces in test. NFC. llvm-svn: 324074	2018-02-02 11:51:06 +00:00
Saleem Abdulrasool	53e8c5a4af	ARM: fixup more tests to specify the target more explicitly This changes the tests that were targeting ARM EABI to explicitly specify the environment rather than relying on the default. This breaks with the new Windows on ARM support when running the tests on Windows where the default environment is no longer EABI. Take the opportunity to avoid a pointless redirect (helps when trying to debug with providing a command line invocation which can be copy and pasted) and removing a few greps in favour of FileCheck. llvm-svn: 205541	2014-04-03 16:01:44 +00:00
Stephen Lin	7e501cf4c3	Mass update to CodeGen tests to use CHECK-LABEL for labels corresponding to function definitions for more informative error messages. No functionality change and all updated tests passed locally. This update was done with the following bash script: find test/CodeGen -name "*.ll" | \ while read NAME; do echo "$NAME" if ! grep -q "^; *RUN: *llc.*debug" $NAME; then TEMP=`mktemp -t temp` cp $NAME $TEMP sed -n "s/^define [^@]*@\([A-Za-z0-9_]*\)(.*$/\1/p" < $NAME | \ while read FUNC; do sed -i '' "s/;\(.*\)\([A-Za-z0-9_-]*\):\( *\)$FUNC: *\$/;\1\2-LABEL:\3$FUNC:/g" $TEMP done sed -i '' "s/;\(.*\)-LABEL-LABEL:/;\1-LABEL:/" $TEMP sed -i '' "s/;\(.*\)-NEXT-LABEL:/;\1-NEXT:/" $TEMP sed -i '' "s/;\(.*\)-NOT-LABEL:/;\1-NOT:/" $TEMP sed -i '' "s/;\(.*\)-DAG-LABEL:/;\1-DAG:/" $TEMP mv $TEMP $NAME fi done llvm-svn: 186280	2013-07-14 06:24:09 +00:00
Bob Wilson	3daeb462cb	This patch combines several changes from Evan Cheng for rdar://8659675. Making use of VFP / NEON floating point multiply-accumulate / subtraction is difficult on current ARM implementations for a few reasons. 1. Even though a single vmla has latency that is one cycle shorter than a pair of vmul + vadd, a RAW hazard during the first (4? on Cortex-a8) can cause an additional pipeline stall. So it's frequently better to simply codegen vmul + vadd. 2. A vmla followed by a vmul, vmadd, or vsub causes the second fp instruction to stall for 4 cycles. We need to schedule them apart. 3. A vmla followed by a vmla is a special case. Obviously, issuing back to back RAW vmla + vmla is very bad. But this isn't ideal either: vmul vadd vmla Instead, we want to expand the second vmla: vmla vmul vadd Even with the 4 cycle vmul stall, the second sequence is still 2 cycles faster. Up to now, isel simply avoided codegen'ing fp vmla / vmls. This works well enough but it isn't the optimal solution. This patch attempts to make it possible to use vmla / vmls in cases where it is profitable. A. Add missing isel predicates which cause vmla to be codegen'ed. B. Make sure the fmul in (fadd (fmul)) has a single use. We don't want to compute a fmul and a fmla. C. Add additional isel checks for vmla, avoid cases where vmla is feeding into fp instructions (except for the #3 exceptional case). D. Add ARM hazard recognizer to model the vmla / vmls hazards. E. Add a special pre-regalloc case to expand vmla / vmls when it's likely the vmla / vmls will trigger one of the special hazards. Enable these fp vmlx codegen changes for Cortex-A9. llvm-svn: 129775	2011-04-19 18:11:57 +00:00
Evan Cheng	b565d1acf9	Add some missing isel predicates on def : pat patterns to avoid generating VFP vmla / vmls (they cause stalls). Disabling them in isel is probably not the right solution, I'll look into a proper solution next. llvm-svn: 118922	2010-11-12 20:32:20 +00:00
Evan Cheng	bc4588c439	Re-commit 117518 and 117519 now that ARM MC test failures are out of the way. llvm-svn: 117531	2010-10-28 06:47:08 +00:00
Evan Cheng	fdc80a0316	Revert 117518 and 117519 for now. They changed scheduling and cause MC tests to fail. Ugh. llvm-svn: 117520	2010-10-28 02:00:25 +00:00
Evan Cheng	5c358e02ea	- Assign load / store with shifter op address modes the right itinerary classes. - For now, loads of [r, r] addressing mode is the same as the [r, r lsl/lsr/asr #] variants. ARMBaseInstrInfo::getOperandLatency() should identify the former case and reduce the output latency by 1. - Also identify [r, r << 2] case. This special form of shifter addressing mode is "free". llvm-svn: 117519	2010-10-28 01:49:06 +00:00
Evan Cheng	6397a77e16	Change ARM scheduling default to list-hybrid if the target supports floating point instructions (and is not using soft float). llvm-svn: 104307	2010-05-21 00:43:17 +00:00
Jim Grosbach	2a0b14a387	switch the flag for using NEON for SP floating point to a subtarget 'feature'. Re-commit. This time complete with testsuite updates. llvm-svn: 99570	2010-03-25 23:47:34 +00:00
Edward O'Callaghan	d1c7b40bb5	Convert ARM tests to FileCheck for PR5307. llvm-svn: 89593	2009-11-22 14:23:33 +00:00
Jim Grosbach	ea6c9c17f5	Use Unified Assembly Syntax for the ARM backend. llvm-svn: 86494	2009-11-09 00:11:35 +00:00
Jim Grosbach	5b094f3b36	vml[as].f32 cause stalls in following advanced SIMD instructions. Avoid using them for scalar floating point operations for now. llvm-svn: 85697	2009-10-31 22:57:36 +00:00
David Goodwin	a4b73e486e	Remove neonfp attribute and instead set default based on CPU string. Add -arm-use-neon-fp to override the default. llvm-svn: 83218	2009-10-01 22:19:57 +00:00
Dan Gohman	142428ce64	Eliminate more uses of llvm-as and llvm-dis. llvm-svn: 81293	2009-09-09 00:09:15 +00:00
David Goodwin	c0fe95d8ce	Make NEON single-precision FP support the default for cortex-a8 (again). llvm-svn: 78430	2009-08-07 23:32:33 +00:00
David Goodwin	99adffe5f2	Initial support for single-precision FP using NEON. Added "neonfp" attribute to enable. Added patterns for some binary FP operations. llvm-svn: 78081	2009-08-04 17:53:06 +00:00

19 Commits