1
0
mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-10-21 12:02:58 +02:00
Commit Graph

22 Commits

Author SHA1 Message Date
Chandler Carruth
5063f25595 [x86] Enable the new vector shuffle lowering by default.
Update the entire regression test suite for the new shuffles. Remove
most of the old testing which was devoted to the old shuffle lowering
path and is no longer relevant really. Also remove a few other random
tests that only really exercised shuffles and only incidently or without
any interesting aspects to them.

Benchmarking that I have done shows a few small regressions with this on
LNT, zero measurable regressions on real, large applications, and for
several benchmarks where the loop vectorizer fires in the hot path it
shows 5% to 40% improvements for SSE2 and SSE3 code running on Sandy
Bridge machines. Running on AMD machines shows even more dramatic
improvements.

When using newer ISA vector extensions the gains are much more modest,
but the code is still better on the whole. There are a few regressions
being tracked (PR21137, PR21138, PR21139) but by and large this is
expected to be a win for x86 generated code performance.

It is also more correct than the code it replaces. I have fuzz tested
this extensively with ISA extensions up through AVX2 and found no
crashes or miscompiles (yet...). The old lowering had a few miscompiles
and crashers after a somewhat smaller amount of fuzz testing.

There is one significant area where the new code path lags behind and
that is in AVX-512 support. However, there was *extremely little*
support for that already and so this isn't a significant step backwards
and the new framework will probably make it easier to implement lowering
that uses the full power of AVX-512's table-based shuffle+blend (IMO).

Many thanks to Quentin, Andrea, Robert, and others for benchmarking
assistance. Thanks to Adam and others for help with AVX-512. Thanks to
Hal, Eric, and *many* others for answering my incessant questions about
how the backend actually works. =]

I will leave the old code path in the tree until the 3 PRs above are at
least resolved to folks' satisfaction. Then I will rip it (and 1000s of
lines of code) out. =] I don't expect this flag to stay around for very
long. It may not survive next week.

llvm-svn: 219046
2014-10-04 03:52:55 +00:00
Chandler Carruth
e2d1c38de7 [x86] Finish switching from CHECK to ALL. This was mistakenly included
in r214007 and then reverted when I backed that (very misguided) patch
out. This recovers the test case cleanup which was good.

llvm-svn: 214010
2014-07-26 03:46:54 +00:00
Chandler Carruth
f3c6b4647a [x86] Revert r214007: Fix PR20355 ...
The clever way to implement signed multiplication with unsigned *is
already implemented* and tested and working correctly. The bug is
somewhere else. Re-investigating.

This will teach me to not scroll far enough to read the code that did
what I thought needed to be done.

llvm-svn: 214009
2014-07-26 02:14:54 +00:00
Chandler Carruth
ebd26ebcfa [x86] Fix PR20355 (and dups) by not using unsigned multiplication when
signed multiplication is requested. While there is not a difference in
the *low* half of the result, the *high* half (used specifically to
implement the signed division by these constants) certainly is used. The
test case I've nuked was actively asserting wrong code.

There is a delightful solution to doing signed multiplication even when
we don't have it that Richard Smith has crafted, but I'll add the
machinery back and implement that in a follow-up patch. This at least
restores correctness.

llvm-svn: 214007
2014-07-26 01:52:13 +00:00
Chandler Carruth
89c953149a [x86] Add coverage for PMUL* instruction testing on SSE2 as well as
SSE4.1.

llvm-svn: 214001
2014-07-26 01:11:10 +00:00
Chandler Carruth
1b734015ea [x86] More cleanup for this test -- simplify the command line.
llvm-svn: 213991
2014-07-26 00:21:52 +00:00
Chandler Carruth
17b233c9a0 [x86] FileCheck-ize this test.
llvm-svn: 213988
2014-07-25 23:59:20 +00:00
Andrew Trick
e3e67d4a0a Enable MI Sched for x86.
This changes the SelectionDAG scheduling preference to source
order. Soon, the SelectionDAG scheduler can be bypassed saving
a nice chunk of compile time.

Performance differences that result from this change are often a
consequence of register coalescing. The register coalescer is far from
perfect. Bugs can be filed for deficiencies.

On x86 SandyBridge/Haswell, the source order schedule is often
preserved, particularly for small blocks.

Register pressure is generally improved over the SD scheduler's ILP
mode. However, we are still able to handle large blocks that require
latency hiding, unlike the SD scheduler's BURR mode. MI scheduler also
attempts to discover the critical path in single-block loops and
adjust heuristics accordingly.

The MI scheduler relies on the new machine model. This is currently
unimplemented for AVX, so we may not be generating the best code yet.

Unit tests are updated so they don't depend on SD scheduling heuristics.

llvm-svn: 192750
2013-10-15 23:33:07 +00:00
Rafael Espindola
b9807cdcf1 Replace more uses of sse41 with sse4.1.
llc using the host cpu features and *waning* on unknown features is probably
not a good thing :-(

llvm-svn: 189144
2013-08-23 20:39:19 +00:00
Andrew Trick
18751012bb Revert "Temporarily enable MI-Sched on X86."
This reverts commit 98a9b72e8c56dc13a2617de84503a3d78352789c.

llvm-svn: 184823
2013-06-25 02:48:58 +00:00
Andrew Trick
716b547d13 Temporarily enable MI-Sched on X86.
Sorry for the unit test churn. I'll try to make the change permanently
next time.

llvm-svn: 184705
2013-06-24 09:13:20 +00:00
Jakob Stoklund Olesen
b3487aa334 Remove -join-physregs from the test suite.
This option has been disabled for a while, and it is going away so I can
clean up the coalescer code.

The tests that required physreg joining to be enabled were almost all of
the form "tiny function with interference between arguments and return
value". Such functions are usually inlined in the real world.

The problem exposed by phys_subreg_coalesce-3.ll is real, but fairly
rare.

llvm-svn: 157027
2012-05-17 23:44:19 +00:00
Craig Topper
b53325d70e Add mcpu to tests to prevent them from using AVX instructions on Sandy Bridge after r155618.
llvm-svn: 155696
2012-04-27 07:11:58 +00:00
Jakob Stoklund Olesen
f27731bf40 Prepare remaining tests for -join-physreg going away.
llvm-svn: 130893
2011-05-04 23:54:59 +00:00
Eric Christopher
4c3a3208e3 Remove the pmulld intrinsic and autoupdate it as a vector multiply.
Rewrite the pmulld patterns, and make sure that they fold in loads of
arguments into the instruction.

llvm-svn: 99910
2010-03-30 18:49:01 +00:00
Dan Gohman
df2896d609 Eliminate more uses of llvm-as and llvm-dis.
llvm-svn: 81290
2009-09-08 23:54:48 +00:00
Dan Gohman
d252329703 Don't use special heuristics for nodes with no data predecessors
unless they actually have data successors, and likewise for nodes
with no data successors unless they actually have data precessors.

llvm-svn: 64327
2009-02-11 21:29:39 +00:00
Evan Cheng
4ebe9b79fa Teach 2addr pass to be do more commuting. If both uses of a two-address instruction are killed, but the first operand has a use before and after the def, commute if the second operand does not suffer from the same issue.
%reg1028<def> = EXTRACT_SUBREG %reg1027<kill>, 1                                                                                                                                     
%reg1029<def> = MOV8rr %reg1028                                                                                                                                                      
%reg1029<def> = SHR8ri %reg1029, 7, %EFLAGS<imp-def,dead>                                                                                                                            
insert => %reg1030<def> = MOV8rr %reg1028                                                                                                                                            
%reg1030<def> = ADD8rr %reg1028<kill>, %reg1029<kill>, %EFLAGS<imp-def,dead>                                                                                                         

In this case, it might not be possible to coalesce the second MOV8rr                                                                                                                 
instruction if the first one is coalesced. So it would be profitable to                                                                                                              
commute it:                                                                                                                                                                          
%reg1028<def> = EXTRACT_SUBREG %reg1027<kill>, 1                                                                                                                                     
%reg1029<def> = MOV8rr %reg1028                                                                                                                                                      
%reg1029<def> = SHR8ri %reg1029, 7, %EFLAGS<imp-def,dead>                                                                                                                            
insert => %reg1030<def> = MOV8rr %reg1029                                                                                                                                            
%reg1030<def> = ADD8rr %reg1029<kill>, %reg1028<kill>, %EFLAGS<imp-def,dead>

llvm-svn: 62954
2009-01-25 03:53:59 +00:00
Mon P Wang
4e9022582d Fix test to account for generating some vector code for mul v2i64 instead
of incorrectly generating pmuldq

llvm-svn: 61228
2008-12-18 23:42:37 +00:00
Dan Gohman
3ba9d77adb Make this test independent of the target-triple; the stack alignment
is specifically what this test depends on.

llvm-svn: 51599
2008-05-27 17:44:23 +00:00
Nick Lewycky
c096899392 The Linux ABI emits an extra "movl %esp, %ebp" in function prologue and
sometimes a "mov %ebp, %esp" in the epilogue.

Force these tests that rely on counting 'mov' to use i686-apple-darwin8.8.0
where they were written.

llvm-svn: 51568
2008-05-26 20:18:56 +00:00
Dan Gohman
6cc0b4f262 Use PMULDQ for v2i64 multiplies when SSE4.1 is available. And add
load-folding table entries for PMULDQ and PMULLD.

llvm-svn: 51489
2008-05-23 17:49:40 +00:00