mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-10-22 20:43:44 +02:00
Commit Graph

7804 Commits

Author SHA1 Message Date
Chris Lattner
1ac1e54bf9 Fix bug: 2004-10-08-SelectSetCCFold.llx. Normally this is hidden by the
instcombine xform, which is why we didn't notice it before.

llvm-svn: 16840
2004-10-08 16:34:13 +00:00
Chris Lattner
5839d93b51 Instcombine (X & FF00) + xx00 -> (X+xx00) & FF00, implementing and.ll:test27
This comes up when doing adds to bitfield elements.
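
The rewrite can be checked exhaustively at a small width. The sketch below assumes a 16-bit value, so that the FF00 mask covers every bit from the lowest set bit of the addend up to the top of the type; the function names are illustrative and are not taken from instcombine.

```c
#include <stdint.h>

/* Original form: mask first, then add (wraps modulo 2^16). */
static uint16_t mask_then_add(uint16_t x, uint16_t addend) {
    return (uint16_t)((x & 0xFF00u) + addend);
}

/* Rewritten form: add first, then mask. */
static uint16_t add_then_mask(uint16_t x, uint16_t addend) {
    return (uint16_t)((uint16_t)(x + addend) & 0xFF00u);
}

/* Returns 1 if both forms agree for every 16-bit x, 0 otherwise. */
static int forms_agree(uint16_t addend) {
    for (uint32_t x = 0; x <= 0xFFFFu; ++x) {
        if (mask_then_add((uint16_t)x, addend) !=
            add_then_mask((uint16_t)x, addend))
            return 0;
    }
    return 1;
}
```

The forms agree for an addend such as 0xAB00, whose low byte is zero. They disagree for an addend with bits below the mask, such as 0x0001, which is why the pattern is written as xx00.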

llvm-svn: 16836
2004-10-08 05:07:56 +00:00
Chris Lattner
87259c3ce9 Little patch to turn (shl (add X, 123), 4) -> (add (shl X, 4), 123 << 4)
This triggers in cases of bitfield additions, opening opportunities for
future improvements.
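
A minimal sketch of the identity, assuming 32-bit unsigned arithmetic that wraps on overflow. The function names are illustrative only.

```c
#include <stdint.h>

/* Original form: add the constant, then shift. */
static uint32_t add_then_shift(uint32_t x) {
    return (x + 123u) << 4;
}

/* Rewritten form: shift first, then add the pre-shifted constant. */
static uint32_t shift_then_add(uint32_t x) {
    return (x << 4) + (123u << 4);
}
```

Both forms compute 16 * (x + 123) modulo 2^32, so they agree even when the addition wraps.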

llvm-svn: 16834
2004-10-08 03:46:20 +00:00
Nate Begeman
dfefd2f3fc Implement logical and with an immediate that consists of a contiguous block
of one or more 1 bits (which may wrap from the least significant bit to the
most significant bit) as an rlwinm rather than andi., andis., or some longer
instruction sequence.

int andn4(int z) { return z & -4; }
int clearhi(int z) { return z & 0x0000FFFF; }
int clearlo(int z) { return z & 0xFFFF0000; }
int clearmid(int z) { return z & 0x00FFFF00; }
int clearwrap(int z) { return z & 0xFF0000FF; }

_andn4:
        rlwinm r3, r3, 0, 0, 29
        blr

_clearhi:
        rlwinm r3, r3, 0, 16, 31
        blr

_clearlo:
        rlwinm r3, r3, 0, 0, 15
        blr

_clearmid:
        rlwinm r3, r3, 0, 8, 23
        blr

_clearwrap:
        rlwinm r3, r3, 0, 24, 7
        blr
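
The last two rlwinm operands above are the mask begin (MB) and mask end (ME) bit positions. A rough sketch of how they map to a 32-bit mask, assuming PowerPC bit numbering (bit 0 is the most significant bit); rlwinm_mask is a hypothetical helper, not a function from the backend.

```c
#include <assert.h>
#include <stdint.h>

/* Build the 32-bit mask selected by the MB and ME fields of rlwinm.
   Bits are numbered 0 (most significant) to 31 (least significant).
   When mb > me, the run of ones wraps around from bit 31 to bit 0. */
static uint32_t rlwinm_mask(unsigned mb, unsigned me) {
    assert(mb < 32 && me < 32);
    uint32_t from_mb = 0xFFFFFFFFu >> mb;          /* ones in bits mb..31 */
    uint32_t upto_me = 0xFFFFFFFFu << (31u - me);  /* ones in bits 0..me  */
    return mb <= me ? (from_mb & upto_me) : (from_mb | upto_me);
}
```

With a rotate amount of 0, as in the listings above, the instruction reduces to an and with this mask.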

llvm-svn: 16832
2004-10-08 02:49:24 +00:00
Nate Begeman
370b1b7a9a Several fixes and enhancements to the PPC32 backend.
1. Fix an illegal argument to getClassB when deciding whether or not to
   sign extend a byte load.

2. Initial addition of isLoad and isStore flags to the instruction .td file
   for eventual use in a scheduler.

3. Rewrite of how constants are handled in emitSimpleBinaryOperation so
   that we can emit the PowerPC shifted immediate instructions far more
   often.  This allows us to emit the following code:

int foo(int x) { return x | 0x00F0000; }

_foo:
.LBB_foo_0:     ; entry
        ; IMPLICIT_DEF
        oris r3, r3, 15
        blr
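
A small sketch of the halfword test that decides whether one OR-immediate instruction is enough. The enum and helper names are hypothetical and do not come from emitSimpleBinaryOperation.

```c
#include <stdint.h>

/* Hypothetical classification of a 32-bit OR constant. */
enum or_form {
    OR_ORI,    /* only the low halfword is set: a single ori   */
    OR_ORIS,   /* only the high halfword is set: a single oris */
    OR_MULTI   /* both halves are set: more than one instruction */
};

static enum or_form classify_or_imm(uint32_t imm) {
    if ((imm & 0xFFFF0000u) == 0) return OR_ORI;
    if ((imm & 0x0000FFFFu) == 0) return OR_ORIS;
    return OR_MULTI;
}

/* The 16-bit operand that oris carries for a high-halfword constant. */
static uint16_t oris_operand(uint32_t imm) {
    return (uint16_t)(imm >> 16);
}
```

For the constant in foo, the high halfword is 15 and the low halfword is 0, which matches the oris r3, r3, 15 above.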

llvm-svn: 16826
2004-10-07 22:30:03 +00:00
Nate Begeman
76d2a77998 Add ori reg, reg, 0 as a move instruction. This can be generated from
loading a 32bit constant into a register whose low halfword is all zeroes.

We now omit the ori after the lis for the following C code:

int bar(int y) { return y * 0x00F0000; }

_bar:
.LBB_bar_0:     ; entry
        ; IMPLICIT_DEF
        lis r2, 15
        mullw r3, r3, r2
        blr
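
As a sketch of why the ori can go away: a 32-bit constant is loaded as a lis of the high halfword followed by an ori of the low halfword, and the ori does nothing when the low halfword is zero. The struct and function below are hypothetical, for illustration only.

```c
#include <stdint.h>

/* Hypothetical plan for materializing a 32-bit constant. */
typedef struct {
    uint16_t lis_imm;   /* operand for lis (high halfword) */
    uint16_t ori_imm;   /* operand for ori (low halfword)  */
    int      needs_ori; /* 0 when the ori would be a plain move */
} const_plan;

static const_plan plan_constant(uint32_t value) {
    const_plan p;
    p.lis_imm = (uint16_t)(value >> 16);
    p.ori_imm = (uint16_t)(value & 0xFFFFu);
    p.needs_ori = (p.ori_imm != 0);
    return p;
}
```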

llvm-svn: 16825
2004-10-07 22:26:12 +00:00
Nate Begeman
f60feea650 Remove unnecessary header include
llvm-svn: 16824
2004-10-07 22:24:32 +00:00
Chris Lattner
7882b54197 Improve comments, no functionality changes
llvm-svn: 16814
2004-10-07 21:30:30 +00:00
Chris Lattner
d15e144241 Fix a nasty dangling pointer problem, due to a free'd pointer being left in
a map.  This caused problems if a later object happened to be allocated at
the free'd object's address.

llvm-svn: 16813
2004-10-07 20:01:31 +00:00
Chris Lattner
50e55bcdb0 Unfortunately the fix for the previous bug reintroduced the previous
exponential behavior (bork!).  This patch processes stuff with an
explicit SCC finder, allowing the algorithm to be more clear,
efficient, and also (as a bonus) correct!  This gets us back to taking
0.6s to disassemble my horrible .bc file that previously took something
> 30 mins.

llvm-svn: 16811
2004-10-07 19:20:48 +00:00
Chris Lattner
dfdbd62d37 Fix a bug in my previous change. Unfortunately this reverts most of the
speedup, but has the advantage of not breaking a bunch of programs!

llvm-svn: 16806
2004-10-07 16:19:40 +00:00
Chris Lattner
e1d5d599bd Fix a bug in the safety analysis routine
llvm-svn: 16804
2004-10-07 06:01:25 +00:00
Chris Lattner
e7ec24c63e Comment cleanups
llvm-svn: 16803
2004-10-07 06:00:24 +00:00
Chris Lattner
ad9fe72e72 * Rename pass to globalopt, since we do more than just constify
* Instead of handling dead functions specially, just nuke them.
* Be more aggressive about cleaning up after constification, in
  particular, handle getelementptr instructions and constantexprs.
* Be a little bit more structured about how we process globals.

*** Delete globals that are only stored to, and never read.  These are
    clearly not useful, so they should go.  This implements deadglobal.llx

This last one triggers quite a few times.  In particular, 2208 in the
external tests, 1865 of which are in 252.eon.  This shrinks eon from
1995094 to 1732341 bytes of bytecode.

llvm-svn: 16802
2004-10-07 04:16:33 +00:00
Chris Lattner
4a19983f2d Implement GlobalConstifier/trivialstore.llx, and also do some
simplifications of the resultant program to avoid making later passes
do it all.

This allows us to constify globals whose only stores write the same constant
that they were initialized with.

Surprisingly this comes up ALL of the freaking time, dozens of times in
SPEC, 30 times in vortex alone.

For example, on 256.bzip2, it allows us to constify these two globals:

%smallMode = internal global ubyte 0             ; <ubyte*> [#uses=8]
%verbosity = internal global int 0               ; <int*> [#uses=49]

Which (with later optimizations) results in the bytecode file shrinking
from 82286 to 69686 bytes!  Let's hear it for IPO :)

For the record, it's nuking lots of "if (verbosity > 2) { do lots of stuff }"
code.

llvm-svn: 16793
2004-10-06 20:57:02 +00:00
Chris Lattner
e412d10cc0 Don't let null nodes sneak past cast instructions
llvm-svn: 16779
2004-10-06 19:29:13 +00:00
Chris Lattner
c2563bf614 Change Type::isAbstract to have better comments, a more correct name
(PromoteAbstractToConcrete), and to use a set to avoid recomputation.
In particular, this set eliminates the potentially exponential cases
from this little recursive algorithm.

On a particularly nasty testcase, llvm-dis on the .bc file went from 34
minutes (which is when I killed it, it still hadn't finished) to 0.57s.
Remember kids, exponential algorithms are bad.
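
A generic illustration of the effect, not the actual Type code: the node layout, the is_opaque flag, and the chain builder are all hypothetical. Each node refers to the next one twice, so a recursion without the visited mark would make on the order of 2^n calls, while the marked version makes 2n + 1.

```c
#include <stddef.h>

/* Hypothetical type graph node with two contained types. */
typedef struct type_node {
    struct type_node *elem[2]; /* contained types, NULL when absent   */
    int is_opaque;             /* marks a still-abstract leaf         */
    int visited;               /* the "set": node already examined    */
} type_node;

static unsigned long visits;   /* counts calls, for demonstration */

/* Returns 1 if an unvisited abstract leaf is reachable from t. */
static int has_abstract_part(type_node *t) {
    ++visits;
    if (t == NULL || t->visited) return 0; /* skip repeated work */
    t->visited = 1;
    if (t->is_opaque) return 1;
    int a = has_abstract_part(t->elem[0]);
    int b = has_abstract_part(t->elem[1]);
    return a || b;
}

/* Link n nodes so that both elements of node i point at node i + 1. */
static void build_chain(type_node *nodes, size_t n) {
    for (size_t i = 0; i < n; ++i) {
        type_node *next = (i + 1 < n) ? &nodes[i + 1] : NULL;
        nodes[i].elem[0] = next;
        nodes[i].elem[1] = next;
        nodes[i].is_opaque = 0;
        nodes[i].visited = 0;
    }
}
```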

llvm-svn: 16772
2004-10-06 16:36:46 +00:00
Chris Lattner
38fbf09104 Correct some typos
llvm-svn: 16770
2004-10-06 16:28:24 +00:00
Chris Lattner
ff8cbd01e7 Instcombine: -(X sdiv C) -> (X sdiv -C), tested by sub.ll:test16
llvm-svn: 16769
2004-10-06 15:08:25 +00:00
Chris Lattner
82aa8544a5 Remove debugging code, fix encoding problem. This fixes the problems
the JIT had last night.

llvm-svn: 16766
2004-10-06 14:31:50 +00:00
Nate Begeman
79d42a185e Turning on fsel code gen now that we can do so would be good.
llvm-svn: 16765
2004-10-06 11:03:30 +00:00
Nate Begeman
7b4fe83ba8 Implement floating point select for lt, gt, le, ge using the powerpc fsel
instruction.

Now, rather than emitting the following loop out of bisect:
.LBB_main_19:	; no_exit.0.i
	rlwinm r3, r2, 3, 0, 28
	lfdx f1, r3, r27
	addis r3, r30, ha16(.CPI_main_1-"L00000$pb")
	lfd f2, lo16(.CPI_main_1-"L00000$pb")(r3)
	fsub f2, f2, f1
	addis r3, r30, ha16(.CPI_main_1-"L00000$pb")
	lfd f4, lo16(.CPI_main_1-"L00000$pb")(r3)
	fcmpu cr0, f1, f4
	bge .LBB_main_64	; no_exit.0.i
.LBB_main_63:	; no_exit.0.i
	b .LBB_main_65	; no_exit.0.i
.LBB_main_64:	; no_exit.0.i
	fmr f2, f1
.LBB_main_65:	; no_exit.0.i
	addi r3, r2, 1
	rlwinm r3, r3, 3, 0, 28
	lfdx f1, r3, r27
	addis r3, r30, ha16(.CPI_main_1-"L00000$pb")
	lfd f4, lo16(.CPI_main_1-"L00000$pb")(r3)
	fsub f4, f4, f1
	addis r3, r30, ha16(.CPI_main_1-"L00000$pb")
	lfd f5, lo16(.CPI_main_1-"L00000$pb")(r3)
	fcmpu cr0, f1, f5
	bge .LBB_main_67	; no_exit.0.i
.LBB_main_66:	; no_exit.0.i
	b .LBB_main_68	; no_exit.0.i
.LBB_main_67:	; no_exit.0.i
	fmr f4, f1
.LBB_main_68:	; no_exit.0.i
	fadd f1, f2, f4
	addis r3, r30, ha16(.CPI_main_2-"L00000$pb")
	lfd f2, lo16(.CPI_main_2-"L00000$pb")(r3)
	fmul f1, f1, f2
	rlwinm r3, r2, 3, 0, 28
	lfdx f2, r3, r28
	fadd f4, f2, f1
	fcmpu cr0, f4, f0
	bgt .LBB_main_70	; no_exit.0.i
.LBB_main_69:	; no_exit.0.i
	b .LBB_main_71	; no_exit.0.i
.LBB_main_70:	; no_exit.0.i
	fmr f0, f4
.LBB_main_71:	; no_exit.0.i
	fsub f1, f2, f1
	addi r2, r2, -1
	fcmpu cr0, f1, f3
	blt .LBB_main_73	; no_exit.0.i
.LBB_main_72:	; no_exit.0.i
	b .LBB_main_74	; no_exit.0.i
.LBB_main_73:	; no_exit.0.i
	fmr f3, f1
.LBB_main_74:	; no_exit.0.i
	cmpwi cr0, r2, -1
	fmr f16, f0
	fmr f17, f3
	bgt .LBB_main_19	; no_exit.0.i

We emit this instead:
.LBB_main_19:	; no_exit.0.i
	rlwinm r3, r2, 3, 0, 28
	lfdx f1, r3, r27
	addis r3, r30, ha16(.CPI_main_1-"L00000$pb")
	lfd f2, lo16(.CPI_main_1-"L00000$pb")(r3)
	fsub f2, f2, f1
	fsel f1, f1, f1, f2
	addi r3, r2, 1
	rlwinm r3, r3, 3, 0, 28
	lfdx f2, r3, r27
	addis r3, r30, ha16(.CPI_main_1-"L00000$pb")
	lfd f4, lo16(.CPI_main_1-"L00000$pb")(r3)
	fsub f4, f4, f2
	fsel f2, f2, f2, f4
	fadd f1, f1, f2
	addis r3, r30, ha16(.CPI_main_2-"L00000$pb")
	lfd f2, lo16(.CPI_main_2-"L00000$pb")(r3)
	fmul f1, f1, f2
	rlwinm r3, r2, 3, 0, 28
	lfdx f2, r3, r28
	fadd f4, f2, f1
	fsub f5, f0, f4
	fsel f0, f5, f0, f4
	fsub f1, f2, f1
	addi r2, r2, -1
	fsub f2, f1, f3
	fsel f3, f2, f3, f1
	cmpwi cr0, r2, -1
	fmr f16, f0
	fmr f17, f3
	bgt .LBB_main_19	; no_exit.0.i
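
A sketch of the mapping in C, assuming the usual fsel behavior (the result is the first source when the test operand is greater than or equal to zero, and the second source otherwise). The select_* helpers are hypothetical names for the four comparisons. Because the comparison goes through a subtraction, NaN and infinite inputs may not behave like a real compare.

```c
/* Model of fsel: pick c when a >= 0.0, otherwise b (a NaN in a picks b). */
static double fsel(double a, double c, double b) {
    return a >= 0.0 ? c : b;
}

/* x >= y ? t : f */
static double select_ge(double x, double y, double t, double f) {
    return fsel(x - y, t, f);
}

/* x <= y ? t : f */
static double select_le(double x, double y, double t, double f) {
    return fsel(y - x, t, f);
}

/* x < y ? t : f, the inverse of x >= y */
static double select_lt(double x, double y, double t, double f) {
    return fsel(x - y, f, t);
}

/* x > y ? t : f, the inverse of x <= y */
static double select_gt(double x, double y, double t, double f) {
    return fsel(y - x, f, t);
}

/* Mirrors "fsub f5, f0, f4 / fsel f0, f5, f0, f4": a running maximum. */
static double running_max(double f0, double f4) {
    return fsel(f0 - f4, f0, f4);
}
```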

llvm-svn: 16764
2004-10-06 09:53:04 +00:00
Chris Lattner
b0e465f0cb Codegen signed mod by 2 or -2 more efficiently. Instead of generating:
t:
        mov %EDX, DWORD PTR [%ESP + 4]
        mov %ECX, 2
        mov %EAX, %EDX
        sar %EDX, 31
        idiv %ECX
        mov %EAX, %EDX
        ret

Generate:
t:
        mov %ECX, DWORD PTR [%ESP + 4]
***     mov %EAX, %ECX
        cdq
        and %ECX, 1
        xor %ECX, %EDX
        sub %ECX, %EDX
***     mov %EAX, %ECX
        ret

Note that the two marked moves are redundant, and should be eliminated by the
register allocator, but aren't.

Compare this to GCC, which generates:

t:
        mov     %eax, DWORD PTR [%esp+4]
        mov     %edx, %eax
        shr     %edx, 31
        lea     %ecx, [%edx+%eax]
        and     %ecx, -2
        sub     %eax, %ecx
        ret

or ICC 8.0, which generates:

t:
        movl      4(%esp), %ecx                                 #3.5
        movl      $-2147483647, %eax                            #3.25
        imull     %ecx                                          #3.25
        movl      %ecx, %eax                                    #3.25
        sarl      $31, %eax                                     #3.25
        addl      %ecx, %edx                                    #3.25
        subl      %edx, %eax                                    #3.25
        addl      %eax, %eax                                    #3.25
        negl      %eax                                          #3.25
        subl      %eax, %ecx                                    #3.25
        movl      %ecx, %eax                                    #3.25
        ret                                                     #3.25

We would be in great shape if not for the moves.
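
The cdq/and/xor/sub sequence can be sketched in C as follows, assuming 32-bit two's complement integers. The name srem2 is illustrative.

```c
#include <stdint.h>

/* Signed remainder by 2 without a divide. */
static int32_t srem2(int32_t x) {
    int32_t sign = x < 0 ? -1 : 0;  /* what cdq leaves in EDX        */
    int32_t low  = x & 1;           /* and %ECX, 1                   */
    return (low ^ sign) - sign;     /* xor %ECX, %EDX; sub %ECX, %EDX */
}
```

For non-negative x the xor and sub do nothing. For negative x they negate the low bit, so an odd negative input yields -1. The same value results for a divisor of -2, since the sign of the remainder follows the dividend.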

llvm-svn: 16763
2004-10-06 05:01:07 +00:00
Chris Lattner
c959314701 Really fix FreeBSD, which apparently doesn't tolerate the extern.
Thanks to Jeff Cohen for pointing out my goof.

llvm-svn: 16762
2004-10-06 04:21:52 +00:00
Chris Lattner
09b6b3f514 Fix a scary bug with signed division by a power of two. We used to generate:
s:   ;; X / 4
        mov %EAX, DWORD PTR [%ESP + 4]
        mov %ECX, %EAX
        sar %ECX, 1
        shr %ECX, 30
        mov %EDX, %EAX
        add %EDX, %ECX
        sar %EAX, 2
        ret

When we really meant:

s:
        mov %EAX, DWORD PTR [%ESP + 4]
        mov %ECX, %EAX
        sar %ECX, 1
        shr %ECX, 30
        add %EAX, %ECX
        sar %EAX, 2
        ret

Hey, this also reduces register pressure too :)
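
A sketch of the difference in C. The corrected sequence shifts the biased value; the old one computed the bias and then shifted the unbiased register, which rounds negative results toward negative infinity. The helper names are hypothetical, and asr spells out an arithmetic right shift so the sketch does not rely on how the compiler shifts negative values.

```c
#include <assert.h>
#include <stdint.h>

/* Arithmetic shift right, written without shifting a negative value. */
static int32_t asr(int32_t v, unsigned n) {
    return v < 0 ? ~(~v >> n) : v >> n;
}

/* Corrected: add 2^k - 1 to negative inputs, then shift (X / 2^k). */
static int32_t sdiv_pow2(int32_t x, unsigned k) {
    assert(k >= 1 && k <= 30);
    int32_t bias = x < 0 ? (int32_t)((1u << k) - 1u) : 0;
    return asr(x + bias, k);
}

/* Old behavior: the bias was added into a scratch register and ignored. */
static int32_t sdiv_pow2_buggy(int32_t x, unsigned k) {
    assert(k >= 1 && k <= 30);
    return asr(x, k);
}
```

For example, -1 / 4 is 0 with the corrected form and -1 with the old one.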

llvm-svn: 16761
2004-10-06 04:19:43 +00:00
Chris Lattner
9258948b08 Codegen signed divides by 2 and -2 more efficiently. In particular
instead of:

s:   ;; X / 2
        movl 4(%esp), %eax
        movl %eax, %ecx
        shrl $31, %ecx
        movl %eax, %edx
        addl %ecx, %edx
        sarl $1, %eax
        ret

t:   ;; X / -2
        movl 4(%esp), %eax
        movl %eax, %ecx
        shrl $31, %ecx
        movl %eax, %edx
        addl %ecx, %edx
        sarl $1, %eax
        negl %eax
        ret

Emit:

s:
        movl 4(%esp), %eax
        cmpl $-2147483648, %eax
        sbbl $-1, %eax
        sarl $1, %eax
        ret

t:
        movl 4(%esp), %eax
        cmpl $-2147483648, %eax
        sbbl $-1, %eax
        sarl $1, %eax
        negl %eax
        ret
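
The cmpl/sbbl pair adds 1 to negative inputs only. A C sketch of the same arithmetic, assuming 32-bit two's complement integers; sdiv2 and sdiv_neg2 are illustrative names.

```c
#include <stdint.h>

/* X / 2, rounding toward zero. */
static int32_t sdiv2(int32_t x) {
    /* cmpl $-2147483648, %eax sets carry when x, read as unsigned,
       is below 0x80000000, that is, when x is non-negative. */
    int32_t carry = (uint32_t)x < 0x80000000u;
    /* sbbl $-1, %eax computes x - (-1) - carry. */
    int32_t adjusted = x + (1 - carry);
    /* sarl $1, %eax: arithmetic shift, written portably. */
    return adjusted < 0 ? ~(~adjusted >> 1) : adjusted >> 1;
}

/* X / -2: the same sequence followed by negl. */
static int32_t sdiv_neg2(int32_t x) {
    return -sdiv2(x);
}
```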

llvm-svn: 16760
2004-10-06 04:02:39 +00:00
Chris Lattner
acd213fba3 Add some new instructions. Fix the asm string for sbb32rr
llvm-svn: 16759
2004-10-06 04:01:02 +00:00
Chris Lattner
5f0c904ec0 Reduce code growth implied by the tail duplication pass by not duplicating
an instruction if it can be hoisted to a common dominator of the block.
This implements: test/Regression/Transforms/TailDup/MergeTest.ll

llvm-svn: 16758
2004-10-06 03:27:37 +00:00
Chris Lattner
b2e8fdc431 FreeBSD uses GCC. Patch contributed by Jeff Cohen!
llvm-svn: 16756
2004-10-06 03:15:44 +00:00
Brian Gaeke
38641114a3 Must include sys/stat.h before declaring a 'struct stat'
llvm-svn: 16728
2004-10-05 18:46:59 +00:00
Chris Lattner
6023408a6e Make sure the const bit gets inherited correctly when linking declarations
of disagreeing constness.  This fixes
test/Regression/Linker/ConstantGlobals[123].ll

llvm-svn: 16692
2004-10-05 02:28:11 +00:00
Reid Spencer
bba26329ab Adjust sys/stat.h inclusion so it's only for SunOS.
llvm-svn: 16686
2004-10-05 00:56:46 +00:00
Tanya Lattner
7198953962 Added a couple of includes to get this to compile on Sparc.
llvm-svn: 16685
2004-10-05 00:51:26 +00:00
Chris Lattner
6547451f32 Solaris doesn't have MAP_FILE.
llvm-svn: 16682
2004-10-05 00:46:21 +00:00
Reid Spencer
079b225788 Excise the ill-advised RLCOMP compression algorithm and simply leave the
previously temporary NULLCOMP implementation that merely copies the data
verbatim without compression. Also, don't warn if there's no compression
library as that is taken care of during configuration time.

llvm-svn: 16654
2004-10-04 17:45:44 +00:00
Reid Spencer
49089d64c2 Add a context for the callback so different compression scenarios can be
distinguished. Tidy up documentation.  Thanks, Chris.

llvm-svn: 16652
2004-10-04 17:29:25 +00:00
Chris Lattner
9f6c72d660 Fix build if not HAVE_BZIP2
llvm-svn: 16650
2004-10-04 16:33:25 +00:00
Reid Spencer
da2e8b9943 First version of the MappedFile abstraction for operating system independent
mapping of files. This first version uses mmap where it's available. The
class needs to implement an alternate mechanism based on malloc'd memory
and file reading/writing for platforms without virtual memory.

llvm-svn: 16649
2004-10-04 11:08:32 +00:00
Reid Spencer
d2bedc512d First version of a support utility to provide generalized compression in
LLVM that handles availability and unavailability of bzip2 and zlib.

llvm-svn: 16648
2004-10-04 10:49:41 +00:00
Chris Lattner
0228f228df * Prune #includes
* Update comments
* Rearrange code a bit
* Finally ELIMINATE the GAS workaround emitter for Intel mode.  woot!

llvm-svn: 16647
2004-10-04 07:31:08 +00:00
Chris Lattner
581948c8f6 Add support for emitting AT&T style .s files, and make it the default. Users
may now choose their output format with the -x86-asm-syntax={intel|att} flag.

llvm-svn: 16646
2004-10-04 07:24:48 +00:00
Chris Lattner
5959f4a108 Convert some missed patterns to support AT&T style
llvm-svn: 16645
2004-10-04 07:23:07 +00:00
Chris Lattner
a05d9f53bb Apparently the GNU assembler has a HUGE hack to be compatible with really
old and broken AT&T syntax assemblers.  The problem with this hack is that
*SOME* forms of the fdiv and fsub instructions have the 'r' bit inverted.
This was a real pain to figure out, but is trivially easy to support: thus
we are now bug compatible with gas and gcc.

llvm-svn: 16644
2004-10-04 07:08:46 +00:00
Chris Lattner
08098895db Fix incorrect suffix
llvm-svn: 16642
2004-10-04 05:20:16 +00:00
Chris Lattner
c2fc9597bd Fix some more missed suffixes and swapped operands
llvm-svn: 16641
2004-10-04 01:38:10 +00:00
Chris Lattner
7b15a84728 Add missing suffixes to FP instructions for AT&T mode
llvm-svn: 16640
2004-10-04 00:43:31 +00:00
Chris Lattner
8d44dcca97 Add support for the -x86-asm-syntax flag, which can be used to choose between
Intel and AT&T style assembly language.  The ultimate goal of this is to
eliminate the GasBugWorkaroundEmitter class, but for now AT&T style emission
is not fully operational.

llvm-svn: 16639
2004-10-03 20:36:57 +00:00
Chris Lattner
94780713a8 Add support to the instruction patterns for AT&T style output, which will
hopefully lead to the death of the 'GasBugWorkaroundEmitter'.  This also
includes changes to wrap the whole file to 80 columns! Woot! :)

Note that the AT&T style output has not been tested at all.

llvm-svn: 16638
2004-10-03 20:35:00 +00:00
Chris Lattner
30b5b79aa0 Add initial support for variants
llvm-svn: 16635
2004-10-03 19:34:18 +00:00
Chris Lattner
815b635639 Do not repeat the map lookup
llvm-svn: 16633
2004-10-01 23:16:43 +00:00