1
0
mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-10-22 12:33:33 +02:00
Commit Graph

10783 Commits

Author SHA1 Message Date
Alkis Evlogimenos
2caa729f02 Remove asssert since it is breaking cases that it shouldn't.
llvm-svn: 11841
2004-02-25 22:01:06 +00:00
Alkis Evlogimenos
f1516015af Add DenseMap template and actually use it for for mapping virtual regs
to objects.

llvm-svn: 11840
2004-02-25 21:55:45 +00:00
Chris Lattner
24c4dd56b5 Add a new pass, run internalize first
llvm-svn: 11839
2004-02-25 21:35:13 +00:00
Chris Lattner
0dfe43a53a Add a new pass
llvm-svn: 11838
2004-02-25 21:35:02 +00:00
Chris Lattner
64cbbad38e Add prototype
llvm-svn: 11837
2004-02-25 21:34:51 +00:00
Chris Lattner
2a13dd5706 My faith in programmers has been found to be totally misplaced. One would
assume that if they don't intend to write to a global variable, that they
would mark it as constant.  However, there are people that don't understand
that the compiler can do nice things for them if they give it the information
it needs.

This pass looks for blatently obvious globals that are only ever read from.
Though it uses a trivially simple "alias analysis" of sorts, it is still able
to do amazing things to important benchmarks.  253.perlbmk, for example,
contains several ***GIANT*** function pointer tables that are not marked
constant and should be.  Marking them constant allows the optimizer to turn
a whole bunch of indirect calls into direct calls.  Note that only a link-time
optimizer can do this transformation, but perlbmk does have several strings
and other minor globals that can be marked constant by this pass when run
from GCCAS.

176.gcc has a ton of strings and large tables that are marked constant, both
at compile time (38 of them) and at link time (48 more).  Other benchmarks
give similar results, though it seems like big ones have disproportionally
more than small ones.

This pass is extremely quick and does good things.  I'm going to enable it
in gccas & gccld.  Not bad for 50 SLOC.

llvm-svn: 11836
2004-02-25 21:34:36 +00:00
Misha Brukman
6a13621948 SparcV8 regs are really 32-bit, not 64! Thanks, Chris.
llvm-svn: 11835
2004-02-25 21:03:02 +00:00
Misha Brukman
f12c1e5a55 Clean up the tablegen descriptions for SparcV8.
llvm-svn: 11834
2004-02-25 21:02:21 +00:00
Misha Brukman
c8801eb5be Fix the SparcV8 register definitions that were imported from PPC template.
llvm-svn: 11833
2004-02-25 21:00:05 +00:00
Misha Brukman
a4b3e0f01b SparcV8 has different types of instructions, but F1 is only used for CALL.
llvm-svn: 11832
2004-02-25 20:52:20 +00:00
Brian Gaeke
0f7bfb8ef5 Note that this test is currently expected to fail.
llvm-svn: 11831
2004-02-25 20:34:02 +00:00
Chris Lattner
ccae3f6d60 Add an assertion
llvm-svn: 11830
2004-02-25 19:37:44 +00:00
Chris Lattner
7c05e5d4d8 Fix failures in 099.go due to the cfgsimplify pass creating switch instructions
where there did not used to be any before

llvm-svn: 11829
2004-02-25 19:30:19 +00:00
Brian Gaeke
5166390fd2 SparcV8 skeleton
llvm-svn: 11828
2004-02-25 19:28:19 +00:00
Brian Gaeke
c6de948cd1 Great renaming part II: Sparc --> SparcV9 (also includes command-line options and Makefiles)
llvm-svn: 11827
2004-02-25 19:08:12 +00:00
Brian Gaeke
965df0b91b Great renaming: Sparc --> SparcV9
llvm-svn: 11826
2004-02-25 18:44:15 +00:00
Chris Lattner
4f09004dff Add a bunch more functions used by perlbmk
llvm-svn: 11824
2004-02-25 17:43:20 +00:00
John Criswell
cf4474a67b Updated to use llc to generate CBE code.
llvm-svn: 11823
2004-02-25 17:15:02 +00:00
Chris Lattner
1dfc1b9629 Substantial improvements and cleanups for the release notes. We were missing
a bunch of stuff!  :)

llvm-svn: 11822
2004-02-25 16:36:51 +00:00
Chris Lattner
04f116953d Fix incorrect debug code
llvm-svn: 11821
2004-02-25 15:15:04 +00:00
Chris Lattner
ab9628ad18 Teach the instruction selector how to transform 'array' GEP computations into X86
scaled indexes.  This allows us to compile GEP's like this:

int* %test([10 x { int, { int } }]* %X, int %Idx) {
        %Idx = cast int %Idx to long
        %X = getelementptr [10 x { int, { int } }]* %X, long 0, long %Idx, ubyte 1, ubyte 0
        ret int* %X
}

Into a single address computation:

test:
        mov %EAX, DWORD PTR [%ESP + 4]
        mov %ECX, DWORD PTR [%ESP + 8]
        lea %EAX, DWORD PTR [%EAX + 8*%ECX + 4]
        ret

Before it generated:
test:
        mov %EAX, DWORD PTR [%ESP + 4]
        mov %ECX, DWORD PTR [%ESP + 8]
        shl %ECX, 3
        add %EAX, %ECX
        lea %EAX, DWORD PTR [%EAX + 4]
        ret

This is useful for things like int/float/double arrays, as the indexing can be folded into
the loads&stores, reducing register pressure and decreasing the pressure on the decode unit.
With these changes, I expect our performance on 256.bzip2 and gzip to improve a lot.  On
bzip2 for example, we go from this:

10665 asm-printer           - Number of machine instrs printed
   40 ra-local              - Number of loads/stores folded into instructions
 1708 ra-local              - Number of loads added
 1532 ra-local              - Number of stores added
 1354 twoaddressinstruction - Number of instructions added
 1354 twoaddressinstruction - Number of two-address instructions
 2794 x86-peephole          - Number of peephole optimization performed

to this:
9873 asm-printer           - Number of machine instrs printed
  41 ra-local              - Number of loads/stores folded into instructions
1710 ra-local              - Number of loads added
1521 ra-local              - Number of stores added
 789 twoaddressinstruction - Number of instructions added
 789 twoaddressinstruction - Number of two-address instructions
2142 x86-peephole          - Number of peephole optimization performed

... and these types of instructions are often in tight loops.

Linear scan is also helped, but not as much.  It goes from:

8787 asm-printer           - Number of machine instrs printed
2389 liveintervals         - Number of identity moves eliminated after coalescing
2288 liveintervals         - Number of interval joins performed
3522 liveintervals         - Number of intervals after coalescing
5810 liveintervals         - Number of original intervals
 700 spiller               - Number of loads added
 487 spiller               - Number of stores added
 303 spiller               - Number of register spills
1354 twoaddressinstruction - Number of instructions added
1354 twoaddressinstruction - Number of two-address instructions
 363 x86-peephole          - Number of peephole optimization performed

to:

7982 asm-printer           - Number of machine instrs printed
1759 liveintervals         - Number of identity moves eliminated after coalescing
1658 liveintervals         - Number of interval joins performed
3282 liveintervals         - Number of intervals after coalescing
4940 liveintervals         - Number of original intervals
 635 spiller               - Number of loads added
 452 spiller               - Number of stores added
 288 spiller               - Number of register spills
 789 twoaddressinstruction - Number of instructions added
 789 twoaddressinstruction - Number of two-address instructions
 258 x86-peephole          - Number of peephole optimization performed

Though I'm not complaining about the drop in the number of intervals.  :)

llvm-svn: 11820
2004-02-25 07:00:55 +00:00
Chris Lattner
dccf14825c * Make the previous patch more efficient by not allocating a temporary MachineInstr
to do analysis.

*** FOLD getelementptr instructions into loads and stores when possible,
    making use of some of the crazy X86 addressing modes.

For example, the following C++ program fragment:

struct complex {
    double re, im;
    complex(double r, double i) : re(r), im(i) {}
};
inline complex operator+(const complex& a, const complex& b) {
    return complex(a.re+b.re, a.im+b.im);
}
complex addone(const complex& arg) {
    return arg + complex(1,0);
}

Used to be compiled to:
_Z6addoneRK7complex:
        mov %EAX, DWORD PTR [%ESP + 4]
        mov %ECX, DWORD PTR [%ESP + 8]
***     mov %EDX, %ECX
        fld QWORD PTR [%EDX]
        fld1
        faddp %ST(1)
***     add %ECX, 8
        fld QWORD PTR [%ECX]
        fldz
        faddp %ST(1)
***     mov %ECX, %EAX
        fxch %ST(1)
        fstp QWORD PTR [%ECX]
***     add %EAX, 8
        fstp QWORD PTR [%EAX]
        ret

Now it is compiled to:
_Z6addoneRK7complex:
        mov %EAX, DWORD PTR [%ESP + 4]
        mov %ECX, DWORD PTR [%ESP + 8]
        fld QWORD PTR [%ECX]
        fld1
        faddp %ST(1)
        fld QWORD PTR [%ECX + 8]
        fldz
        faddp %ST(1)
        fxch %ST(1)
        fstp QWORD PTR [%EAX]
        fstp QWORD PTR [%EAX + 8]
        ret

Other programs should see similar improvements, across the board.  Note that
in addition to reducing instruction count, this also reduces register pressure
a lot, always a good thing on X86.  :)

llvm-svn: 11819
2004-02-25 06:13:04 +00:00
Chris Lattner
10d08a2955 Add a helper to create an addressing mode given all of the pieces.
llvm-svn: 11818
2004-02-25 06:01:07 +00:00
Chris Lattner
c0e2bc0250 add an inefficient way of folding structure and constant array indexes together
into a single LEA instruction.  This should improve the code generated for
things like X->A.B.C[12].D.

The bigger benefit is still coming though.  Note that this uses an LEA instruction
instead of an add, giving the register allocator more freedom.  We should probably
never generate ADDri32's.

llvm-svn: 11817
2004-02-25 03:45:50 +00:00
Chris Lattner
969f90db77 Implement special case for storing an immediate into memory so that we don't need
an intermediate register.

llvm-svn: 11816
2004-02-25 02:56:58 +00:00
Brian Gaeke
5d29845f19 Cygwin defines log2 as a macro. Undef it here IFF it has already been defined,
so that we always get the inline function instead. Remember, kids, like it says
in the GCC manual, "An Inline Function is As Fast As a Macro."

llvm-svn: 11815
2004-02-25 01:53:45 +00:00
Brian Gaeke
8bf1c4c026 small portability fix.
llvm-svn: 11814
2004-02-24 22:58:31 +00:00
Chris Lattner
9036c86b14 Add support for 'rename'
llvm-svn: 11813
2004-02-24 22:17:00 +00:00
Chris Lattner
57ee51ae0b Make the verifier a little more explicit about this problem.
llvm-svn: 11811
2004-02-24 22:06:07 +00:00
Chris Lattner
d9652be664 Add support for remove, fwrite, and fread
Also fix problem where we didn't check to see if a node pointer was null.
Though fclose(null) doesn't make a lot of sense, 300.twolf does it.

llvm-svn: 11810
2004-02-24 22:02:48 +00:00
John Criswell
0158db84e0 Added the VTune tests.
llvm-svn: 11809
2004-02-24 21:43:38 +00:00
Brian Gaeke
eae0364189 FunctionLiveVarInfo.h moved: include/llvm/CodeGen -> lib/Target/Sparc/LiveVar
llvm-svn: 11804
2004-02-24 19:46:00 +00:00
Chris Lattner
9da41150e8 Fix some unexpected fallout from the config.h changes. Because the CBE no
longer was getting this #include, it always fell back on the less precise
floating point initializer values, causing some testsuite failures.

llvm-svn: 11803
2004-02-24 18:34:10 +00:00
Chris Lattner
fc15346b60 Fix a faulty optimization on FP values
llvm-svn: 11801
2004-02-24 18:10:14 +00:00
John Criswell
b086ac893e Fixed minor typos.
llvm-svn: 11800
2004-02-24 16:13:56 +00:00
Chris Lattner
7845e4f7f0 If a block is made dead, make sure to promptly remove it.
llvm-svn: 11799
2004-02-24 16:09:21 +00:00
Alkis Evlogimenos
6d7150e9bb Move machine code rewriter and spiller outside the register
allocator.

The implementation is completely rewritten and now employs several
optimizations not exercised before. For example for 164.gzip we have
997 loads and 699 stores vs the 1221 loads and 880 stores we have
before.

llvm-svn: 11798
2004-02-24 08:58:30 +00:00
Chris Lattner
d678669018 Implement SimplifyCFG/switch_switch_fold.ll
This case occurs many times in various benchmarks, especially when combined
with the previous patch.  This allows it to get stuff like:
  if (X == 4 || X == 3)
    if (X == 5 || X == 8)

and

switch (X) {
case 4: case 5: case 6:
  if (X == 4 || X == 5)

llvm-svn: 11797
2004-02-24 07:23:58 +00:00
Chris Lattner
d75c0eef9f New testcase. Switch instructions that go to switch instructions should be
merged.

llvm-svn: 11796
2004-02-24 07:21:09 +00:00
Alkis Evlogimenos
042f01039b Add predicates for checking if a virtual register has a physical
register mapping or a stack slot mapping.

llvm-svn: 11795
2004-02-24 06:30:36 +00:00
Chris Lattner
9f2c8c7ea5 Add some helpful methods for dealing with switch instructions
llvm-svn: 11794
2004-02-24 06:26:00 +00:00
Chris Lattner
1293e1d00c Rearrange code a bit
llvm-svn: 11793
2004-02-24 05:54:22 +00:00
Chris Lattner
e5db7dc4c6 Implement: test/Regression/Transforms/SimplifyCFG/switch_create.ll
This turns code like this:
  if (X == 4 | X == 7)
and
  if (X != 4 & X != 7)
into switch instructions.

llvm-svn: 11792
2004-02-24 05:38:11 +00:00
Chris Lattner
90da2d674f The simplifycfg pass should be able to turn stuff like:
if (X == 4 || X == 7)
and
  if (X != 4 && X != 7)

into switch instructions.

llvm-svn: 11791
2004-02-24 05:34:44 +00:00
Chris Lattner
2b7ba0ef67 Wow, the description of the 'switch' instruction was out of date.
llvm-svn: 11790
2004-02-24 04:54:45 +00:00
Chris Lattner
dc65f68a98 we no longer include boost
llvm-svn: 11789
2004-02-24 04:02:20 +00:00
Chris Lattner
da760e3e77 Hrm, my find must have been faulty. It didn't remove these as well.
llvm-svn: 11788
2004-02-24 03:54:22 +00:00
Chris Lattner
8138c7e4a2 Boost is now unneeded, thanks to the fix for PR253, contributed by Reid Spencer!
llvm-svn: 11787
2004-02-24 03:53:00 +00:00
Chris Lattner
4c3548fcb9 Now that's a new feature!
llvm-svn: 11786
2004-02-24 03:50:24 +00:00
Chris Lattner
e532a181c7 Use the new LLVM is_class template instead of the boost one, allowing us to
remove our dependency on boost!  Thanks to Reid Spencer for making this possible!

llvm-svn: 11785
2004-02-24 03:50:05 +00:00