mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-10-22 04:22:57 +02:00

68772 Commits

Author SHA1 Message Date
Nick Lewycky
a8e288b52d Further expand what a call graph pass may do.
The rationale is that after analyzing a function in the SCC, we may want to
modify it in a way that requires us to update its uses (e.g. to replace the
call with a constant) or its users (e.g. to call it with fewer arguments).

llvm-svn: 122739
2011-01-03 06:16:07 +00:00
Chris Lattner
c1ebe702b1 earlycse can do trivial within-a-block dead store
elimination as well.  This deletes 60 stores in 176.gcc
that largely come from bitfield code.

llvm-svn: 122736
2011-01-03 04:17:24 +00:00
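The within-a-block dead store elimination described in c1ebe702b1 can be sketched on a hypothetical toy IR (the `Op`, `Inst` and `eliminateDeadStores` names are illustrative, not EarlyCSE's code). The sketch assumes the rule is: remember the last store, forget it when anything may read memory, and when a new store hits the same pointer, the remembered store is dead.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical toy IR: a store, anything that may read memory (loads, calls),
// or an instruction with no memory effect.
enum class Op { Store, MayRead, Other };

struct Inst {
  Op op;
  int ptr;            // pointer id, meaningful only for Store
  bool dead = false;  // set when the store is found to be dead
};

// Marks stores that are overwritten before anything could read them.
std::size_t eliminateDeadStores(std::vector<Inst>& block) {
  std::size_t removed = 0;
  Inst* lastStore = nullptr;
  for (Inst& inst : block) {
    switch (inst.op) {
      case Op::MayRead:
        lastStore = nullptr;  // the earlier store may be observed
        break;
      case Op::Store:
        if (lastStore != nullptr && lastStore->ptr == inst.ptr) {
          lastStore->dead = true;  // overwritten with no intervening read
          ++removed;
        }
        lastStore = &inst;
        break;
      case Op::Other:
        break;  // no memory effect, keep tracking the last store
    }
  }
  return removed;
}
```

A store to a different pointer simply becomes the new "last store", which keeps the sketch conservative about aliasing.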
Cameron Zwarich
60ec113434 Use a RecyclingAllocator to allocate values for MachineCSE's ScopedHashTable for
a 28% speedup of MachineCSE time on 403.gcc.

llvm-svn: 122735
2011-01-03 04:07:46 +00:00
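LLVM's RecyclingAllocator wraps an underlying allocator and adds recycling of freed objects. The core idea behind 60ec113434 can be sketched as a free list threaded through freed slots. `RecyclingPool` is a hypothetical simplified class, and its per-slot heap allocations stand in for the real backing allocator.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical simplified recycler: freed slots go onto a free list and are
// handed out again before any new memory is requested from the backing store.
template <typename T>
class RecyclingPool {
  union Slot {
    Slot* next;                                   // valid while on the free list
    alignas(T) unsigned char storage[sizeof(T)];  // valid while in use
  };
  std::vector<std::unique_ptr<Slot>> backing_;  // stands in for a real allocator
  Slot* freeList_ = nullptr;

 public:
  // Returns raw storage for one T; construct into it with placement new.
  void* allocate() {
    if (freeList_ != nullptr) {
      Slot* s = freeList_;
      freeList_ = s->next;
      return s->storage;
    }
    backing_.push_back(std::make_unique<Slot>());
    return backing_.back()->storage;
  }

  // The caller must already have destroyed the T; the slot is recycled.
  void deallocate(void* p) {
    Slot* s = reinterpret_cast<Slot*>(p);
    s->next = freeList_;
    freeList_ = s;
  }

  std::size_t backingAllocations() const { return backing_.size(); }
};
```

Hash-table nodes are created and destroyed constantly as CSE scopes open and close, so reusing slots avoids most trips to the general-purpose allocator.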
Nick Lewycky
310c4820f8 Permit CallGraphSCCPasses readonly access to the direct callers of the functions
in their SCC as they already have with the direct callees.

llvm-svn: 122734
2011-01-03 04:01:44 +00:00
Chris Lattner
e44a99ac89 switch the load table to use a recycling bump pointer allocator,
speeding earlycse up by 6%.

llvm-svn: 122733
2011-01-03 03:53:50 +00:00
Chris Lattner
d19ae32f2f now that loads are in their own table, we can implement
store->load forwarding.  This allows EarlyCSE to zap 600 more
loads from 176.gcc.

llvm-svn: 122732
2011-01-03 03:46:34 +00:00
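With loads in their own table keyed only by pointer, a store can simply record "this pointer now holds this value", and a later load of the same pointer finds it. The minimal sketch below assumes generation-counter invalidation: anything that may write memory starts a new generation, and entries from older generations are ignored. The toy IR and all names are hypothetical, not EarlyCSE's code.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <unordered_map>
#include <vector>

// Hypothetical toy IR; ints stand in for SSA values and pointers.
enum class Kind { Load, Store, Clobber };

struct MemInst {
  Kind kind;
  int ptr;    // pointer operand (Load/Store)
  int value;  // stored value (Store) or the load's own result id (Load)
};

// For each Load, returns the value it can be replaced with, if any.
std::vector<std::optional<int>> forwardLoads(const std::vector<MemInst>& block) {
  struct Avail { int value; unsigned gen; };
  std::unordered_map<int, Avail> availableLoads;  // keyed by pointer only
  unsigned generation = 0;
  std::vector<std::optional<int>> replacement(block.size());

  for (std::size_t i = 0; i < block.size(); ++i) {
    const MemInst& inst = block[i];
    switch (inst.kind) {
      case Kind::Load: {
        auto it = availableLoads.find(inst.ptr);
        if (it != availableLoads.end() && it->second.gen == generation)
          replacement[i] = it->second.value;  // redundant load
        else
          availableLoads[inst.ptr] = {inst.value, generation};
        break;
      }
      case Kind::Store:
        // The store may alias other pointers, so invalidate everything, then
        // record that this pointer now holds the stored value.
        ++generation;
        availableLoads[inst.ptr] = {inst.value, generation};
        break;
      case Kind::Clobber:
        ++generation;  // e.g. a call that may write memory
        break;
    }
  }
  return replacement;
}
```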
Chris Lattner
b242caa491 split loads and calls into separate tables. Loads are now just indexed
by their pointer instead of using MemoryValue to wrap it.

llvm-svn: 122731
2011-01-03 03:41:27 +00:00
Chris Lattner
57b02a342e add a testcase for readonly call CSE
llvm-svn: 122730
2011-01-03 03:33:47 +00:00
Chris Lattner
3d56e5f6d5 various cleanups, no functionality change.
llvm-svn: 122729
2011-01-03 03:28:23 +00:00
Nick Lewycky
4840065424 Add spliceFunction to the CallGraph interface. This allows users to efficiently
update a callGraph when performing the common operation of splicing the body to
a new function and updating all callers (such as via RAUW).

No users yet, though this is intended for DeadArgumentElimination as part of
PR8887.

llvm-svn: 122728
2011-01-03 03:19:35 +00:00
Chris Lattner
4cfdaa3f02 Teach EarlyCSE to do trivial CSE of loads and read-only calls.
On 176.gcc, this catches 13090 loads and calls, and increases the
number of simple instructions CSE'd from 29658 to 36208.

llvm-svn: 122727
2011-01-03 03:18:43 +00:00
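Trivial CSE of read-only calls works because a read-only call with the same callee and arguments returns the same result, provided nothing wrote memory in between. The sketch below assumes a generation counter to detect intervening writes and uses a hypothetical toy IR. EarlyCSE's real hashing of instructions is not modeled.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical toy IR: either a read-only call or something that may write memory.
struct CallInst {
  bool mayWrite;           // true: a store or side-effecting call
  std::string callee;      // read-only callee (when !mayWrite)
  std::vector<int> args;   // argument value ids
  int result;              // SSA id of the call's result
};

// For each read-only call, returns an earlier equivalent result, if any.
std::vector<std::optional<int>> cseReadOnlyCalls(const std::vector<CallInst>& block) {
  using Key = std::pair<std::string, std::vector<int>>;
  std::map<Key, std::pair<int, unsigned>> available;  // key -> (result, generation)
  unsigned generation = 0;
  std::vector<std::optional<int>> replacement(block.size());

  for (std::size_t i = 0; i < block.size(); ++i) {
    const CallInst& c = block[i];
    if (c.mayWrite) {
      ++generation;  // earlier call results may now be stale
      continue;
    }
    Key key{c.callee, c.args};
    auto it = available.find(key);
    if (it != available.end() && it->second.second == generation)
      replacement[i] = it->second.first;
    else
      available[key] = {c.result, generation};
  }
  return replacement;
}
```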
Chris Lattner
c6ccec7faf add a handy typedef.
llvm-svn: 122726
2011-01-03 03:16:20 +00:00
Chris Lattner
4ad0cdebbb rename InstValue to SimpleValue, add some comments.
llvm-svn: 122725
2011-01-03 02:20:48 +00:00
Michael J. Spencer
1c649efb52 CMake: Add missing source file.
llvm-svn: 122724
2011-01-03 02:13:05 +00:00
Chris Lattner
e89f2f9078 Allocate nodes for the scoped hash table from a recycling bump pointer
allocator.  This speeds up early cse by about 20%.

llvm-svn: 122723
2011-01-03 01:42:46 +00:00
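The "bump pointer" half of the allocator in e89f2f9078 can be sketched as carving objects out of large slabs by advancing a pointer. `BumpAllocator` and its 4096-byte slab size are hypothetical. The "recycling" half would put freed nodes on a free list in front of this; that part is not shown here.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <vector>

// Hypothetical minimal bump-pointer allocator: memory is handed out by
// advancing a pointer within a slab and is only released when the allocator
// itself is destroyed.
class BumpAllocator {
  static constexpr std::size_t kSlabSize = 4096;
  std::vector<std::unique_ptr<unsigned char[]>> slabs_;
  unsigned char* cur_ = nullptr;
  unsigned char* end_ = nullptr;

  static std::uintptr_t alignUp(std::uintptr_t p, std::size_t align) {
    return (p + align - 1) & ~(std::uintptr_t(align) - 1);  // align: power of two
  }

 public:
  void* allocate(std::size_t size, std::size_t align) {
    assert(size + align <= kSlabSize && "oversized requests not handled");
    std::uintptr_t aligned = alignUp(reinterpret_cast<std::uintptr_t>(cur_), align);
    if (cur_ == nullptr || aligned + size > reinterpret_cast<std::uintptr_t>(end_)) {
      // Current slab exhausted (or none yet): start a new one.
      slabs_.push_back(std::make_unique<unsigned char[]>(kSlabSize));
      cur_ = slabs_.back().get();
      end_ = cur_ + kSlabSize;
      aligned = alignUp(reinterpret_cast<std::uintptr_t>(cur_), align);
    }
    cur_ = reinterpret_cast<unsigned char*>(aligned + size);  // bump
    return reinterpret_cast<void*>(aligned);
  }

  std::size_t slabCount() const { return slabs_.size(); }
};
```

Each allocation is a couple of arithmetic operations, which is why it pairs well with short-lived hash-table nodes.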
Chris Lattner
c375dcd5f5 really get this working with a custom allocator.
llvm-svn: 122722
2011-01-03 01:38:29 +00:00
Chris Lattner
85f8315219 Enhance ScopedHashTable to allow it to take an allocator argument.
llvm-svn: 122721
2011-01-03 01:29:37 +00:00
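For readers unfamiliar with ScopedHashTable, its behavior can be sketched as follows. Entries inserted inside a scope disappear when that scope ends, so a dominator-tree walk can open a scope per block. `ScopedTable` is a hypothetical stand-in built on standard containers. It does not model the allocator argument that 85f8315219 adds, which controls where the real table's nodes come from.

```cpp
#include <cassert>
#include <optional>
#include <unordered_map>
#include <vector>

// Hypothetical scoped hash table: each key maps to a stack of values, and
// each scope records the keys it pushed so it can pop them on exit.
template <typename K, typename V>
class ScopedTable {
  std::unordered_map<K, std::vector<V>> map_;
  std::vector<std::vector<K>> scopes_;

 public:
  void enterScope() { scopes_.emplace_back(); }

  void exitScope() {
    for (const K& k : scopes_.back()) {
      auto it = map_.find(k);
      it->second.pop_back();  // drop the shadowing value
      if (it->second.empty()) map_.erase(it);
    }
    scopes_.pop_back();
  }

  void insert(const K& k, const V& v) {
    assert(!scopes_.empty() && "insert requires an open scope");
    map_[k].push_back(v);
    scopes_.back().push_back(k);
  }

  // Returns the innermost visible value for k, if any.
  std::optional<V> lookup(const K& k) const {
    auto it = map_.find(k);
    if (it == map_.end()) return std::nullopt;
    return it->second.back();
  }
};
```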
Chris Lattner
f6a71c6cd5 reduce redundancy in the hashing code and other misc cleanups.
llvm-svn: 122720
2011-01-03 01:10:08 +00:00
Cameron Zwarich
a4f2efdd41 Add a new loop-instsimplify pass, with the intention of replacing the instance
of instcombine that is currently in the middle of the loop pass pipeline. This
commit only checks in the pass; it will hopefully be enabled by default later.

llvm-svn: 122719
2011-01-03 00:25:16 +00:00
Chris Lattner
506c2deff3 fix some pastos
llvm-svn: 122718
2011-01-02 23:29:58 +00:00
Chris Lattner
39d1fb3320 add DEBUG and -stats output to earlycse.
Teach it to CSE the rest of the non-side-effecting instructions.

llvm-svn: 122716
2011-01-02 23:19:45 +00:00
Chris Lattner
dbad0b5e40 Enhance earlycse to do CSE of casts, instsimplify and die.
Add a testcase.

llvm-svn: 122715
2011-01-02 23:04:14 +00:00
Chris Lattner
e396e846b4 split dom frontier handling stuff out to its own DominanceFrontier header,
so that Dominators.h is *just* domtree.  Also prune #includes a bit.

llvm-svn: 122714
2011-01-02 22:09:33 +00:00
Chris Lattner
688675a0be sketch out a new early cse pass. No functionality yet.
llvm-svn: 122713
2011-01-02 21:47:05 +00:00
Chris Lattner
0c2cfcf430 fix a miscompilation of tramp3d-v4: when forming a memcpy, we have to make
sure that the loop we're promoting into a memcpy doesn't mutate the input
of the memcpy.  Before we were just checking that the dest of the memcpy
wasn't mod/ref'd by the loop.

llvm-svn: 122712
2011-01-02 21:14:18 +00:00
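The hazard fixed in 0c2cfcf430 can be reproduced in plain C++. If a copy loop writes elements it later reads, replacing it with one bulk copy changes the result. The example is illustrative, not the tramp3d-v4 code. `std::memmove` stands in for the bulk copy because `memcpy` on overlapping ranges is undefined.

```cpp
#include <cassert>
#include <cstring>

// Element-by-element copy, as a loop idiom recognizer would see it.
void copyLoop(int* dst, const int* src, int n) {
  for (int i = 0; i < n; ++i) dst[i] = src[i];
}

// Returns true if the loop and a single bulk copy agree on a shifted self-copy.
bool loopMatchesBulkCopy() {
  int a[5] = {1, 2, 3, 4, 5};
  int b[5] = {1, 2, 3, 4, 5};
  copyLoop(a + 1, a, 4);                    // each step reads the value just written: {1, 1, 1, 1, 1}
  std::memmove(b + 1, b, 4 * sizeof(int));  // reads the original input: {1, 1, 2, 3, 4}
  return std::memcmp(a, b, sizeof a) == 0;
}
```

Here the loop mutates its own input, so checking only that the destination is not otherwise mod/ref'd by the loop would not be enough.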
Chris Lattner
c655681b32 If a loop iterates exactly once (has backedge count = 0) then don't
mess with it.  We'd rather peel/unroll it than convert all of its 
stores into memsets.

llvm-svn: 122711
2011-01-02 20:24:21 +00:00
Benjamin Kramer
a58b69aa9d Try to reuse the value when lowering memset.
This allows us to compile:
  void test(char *s, int a) {
    __builtin_memset(s, a, 15);
  }
into 1 mul + 3 stores instead of 3 muls + 3 stores.

llvm-svn: 122710
2011-01-02 19:57:05 +00:00
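One way to see why a single multiply can serve several stores is that narrower byte splats are truncations of a wider one. The sketch below shows only that arithmetic. It does not reproduce which store widths the backend actually picks for the 15-byte case.

```cpp
#include <cassert>
#include <cstdint>

// Splat a byte across 64 bits with one multiply: 0x0101010101010101 * b puts a
// copy of b in every byte (no carries occur because b <= 0xFF).
std::uint64_t splat64(std::uint8_t b) {
  return static_cast<std::uint64_t>(b) * 0x0101010101010101ULL;
}

// Narrower splats fall out of the wide value by truncation, so the product
// can be computed once and reused for stores of different widths.
std::uint32_t splat32From(std::uint64_t wide) { return static_cast<std::uint32_t>(wide); }
std::uint16_t splat16From(std::uint64_t wide) { return static_cast<std::uint16_t>(wide); }
```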
Benjamin Kramer
38491f47ce Lower the i8 extension in memset to a multiply instead of a potentially long series of shifts and ors.
We could implement a DAGCombine to turn x * 0x0101 back into logic operations
on targets that don't support the multiply or where it is slow (P4), if someone
cares enough.

Example code:
  void test(char *s, int a) {
      __builtin_memset(s, a, 4);
  }
before:
  _test:                                  ## @test
    movzbl  8(%esp), %eax
    movl  %eax, %ecx
    shll  $8, %ecx
    orl %eax, %ecx
    movl  %ecx, %eax
    shll  $16, %eax
    orl %ecx, %eax
    movl  4(%esp), %ecx
    movl  %eax, 4(%ecx)
    movl  %eax, (%ecx)
    ret
after:
  _test:                                  ## @test
    movzbl  8(%esp), %eax
    imull $16843009, %eax, %eax   ## imm = 0x1010101
    movl  4(%esp), %ecx
    movl  %eax, 4(%ecx)
    movl  %eax, (%ecx)
    ret

llvm-svn: 122707
2011-01-02 19:44:58 +00:00
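The "before" and "after" listings in 38491f47ce compute the same 32-bit splat. Modeling both in C++ and comparing them for all 256 byte values is a quick sanity check. The function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// The "before" listing: shift/or doubling (shll $8 / orl, then shll $16 / orl).
std::uint32_t splatShiftOr(std::uint8_t b) {
  std::uint32_t x = b;  // movzbl
  x |= x << 8;          // low 16 bits hold two copies of b
  x |= x << 16;         // all four bytes hold b
  return x;
}

// The "after" listing: one multiply by 0x01010101 (16843009).
std::uint32_t splatMul(std::uint8_t b) {
  return static_cast<std::uint32_t>(b) * 0x01010101u;
}
```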
Oscar Fuentes
6514b0ac68 A workaround for a bug in cmake 2.8.3 diagnosed on PR 8885.
llvm-svn: 122706
2011-01-02 19:32:31 +00:00
Nick Lewycky
06b94a5e5b Also remove functions that use complex constant expressions in terms of
another function.

llvm-svn: 122705
2011-01-02 19:16:44 +00:00
Chris Lattner
c78e4bc366 enhance loop idiom recognition to scan *all* unconditionally executed
blocks in a loop, instead of just the header block.  This makes it more
aggressive, able to handle Duncan's Ada examples.

llvm-svn: 122704
2011-01-02 19:01:03 +00:00
Chris Lattner
3bb2e83433 make inSubLoop much more efficient.
llvm-svn: 122703
2011-01-02 18:53:08 +00:00
Chris Lattner
2bcd2564d6 rip out isExitBlockDominatedByBlockInLoop, calling DomTree::dominates instead.
isExitBlockDominatedByBlockInLoop is a relic of the days when domtree was 
*just* a tree and didn't have DFS numbers.  Checking DFS numbers is faster
and easier than "limiting the search of the tree".

llvm-svn: 122702
2011-01-02 18:45:39 +00:00
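The DFS-number dominance check mentioned in 2bcd2564d6 can be sketched on a standalone tree. Number each node on entry and exit of a depth-first walk of the dominator tree. Then A dominates B exactly when B's interval nests inside A's, which is a constant-time test. `DomTreeNumbers` and the children-list representation are hypothetical.

```cpp
#include <cassert>
#include <vector>

// Hypothetical dominator tree as children lists; node 0 is the root.
class DomTreeNumbers {
  std::vector<int> in_, out_;

  void number(const std::vector<std::vector<int>>& children, int n, int& counter) {
    in_[n] = counter++;
    for (int c : children[n]) number(children, c, counter);
    out_[n] = counter++;
  }

 public:
  explicit DomTreeNumbers(const std::vector<std::vector<int>>& children)
      : in_(children.size()), out_(children.size()) {
    int counter = 0;
    number(children, 0, counter);
  }

  // A dominates B iff B's [in, out] interval lies within A's (reflexive).
  bool dominates(int a, int b) const {
    return in_[a] <= in_[b] && out_[b] <= out_[a];
  }
};
```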
Chris Lattner
bbae3ddf12 add a list of opportunities for future improvement.
llvm-svn: 122701
2011-01-02 18:32:09 +00:00
Chris Lattner
222b24e2de update a bunch of entries.
llvm-svn: 122700
2011-01-02 18:31:38 +00:00
Duncan Sands
2d1c116071 Fix PR8702 by not having LoopSimplify claim to preserve LCSSA form. As described
in the PR, the pass could break LCSSA form when inserting preheaders.  It probably
would be easy enough to fix this, but since currently we always go into LCSSA form
after running this pass, doing so is not urgent.

llvm-svn: 122695
2011-01-02 13:38:21 +00:00
Cameron Zwarich
482eeb4c8e Remove an unused member function.
llvm-svn: 122693
2011-01-02 12:37:22 +00:00
Oscar Fuentes
c3becc8af6 Propagate to parent scope changes made to CMAKE_CXX_FLAGS.
llvm-svn: 122692
2011-01-02 12:30:18 +00:00
Cameron Zwarich
8444a578bb Fix a typo in a variable name.
llvm-svn: 122691
2011-01-02 12:17:10 +00:00
Cameron Zwarich
72a49f9271 Move a load into the only branch where it is used and eliminate a temporary.
llvm-svn: 122690
2011-01-02 10:50:14 +00:00
Cameron Zwarich
0a0e69ca0d Add the explanatory comment from r122680's commit message to the code itself.
llvm-svn: 122689
2011-01-02 10:40:14 +00:00
Cameron Zwarich
e522fd8efe Tidy up indentation.
llvm-svn: 122688
2011-01-02 10:10:02 +00:00
Cameron Zwarich
7becf43554 Fix a typo, which should also fix the failure on llvm-x86_64-linux-checks.
llvm-svn: 122687
2011-01-02 10:06:44 +00:00
Chris Lattner
f669d6a901 Allow loop-idiom to run on multiple BB loops, but still only scan the loop
header for now for memset/memcpy opportunities.  It turns out that loop-rotate
is successfully rotating loops, but *DOESN'T MERGE THE BLOCKS*, turning "for 
loops" into 2 basic block loops that loop-idiom was ignoring.

With this fix, we form many *many* more memcpy and memsets than before, including
on the "history" loops in the viterbi benchmark, which look like this:

        for (j=0; j<MAX_history; ++j) {
          history_new[i][j+1] = history[2*i][j];
        }

Transforming these loops into memcpy's speeds up the viterbi benchmark from
11.98s to 3.55s on my machine.  Woo.

llvm-svn: 122685
2011-01-02 07:58:36 +00:00
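The viterbi "history" loop quoted in f669d6a901 copies a contiguous run of source elements into a contiguous run of destination elements, so it is equivalent to one `memcpy` per row. The element type (`int`), the array shapes, and the value of `MAX_history` are assumptions made only for this sketch.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical sizes; the benchmark's real declarations are not given.
constexpr int MAX_history = 8;
constexpr int ROWS = 4;

// The loop from the commit message, for one row i.
void copyRowLoop(int history_new[][MAX_history + 1], const int history[][MAX_history], int i) {
  for (int j = 0; j < MAX_history; ++j)
    history_new[i][j + 1] = history[2 * i][j];
}

// The equivalent bulk copy: distinct arrays, unit stride on both sides.
void copyRowMemcpy(int history_new[][MAX_history + 1], const int history[][MAX_history], int i) {
  std::memcpy(&history_new[i][1], &history[2 * i][0], MAX_history * sizeof(int));
}
```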
Cameron Zwarich
25272921bb Remove the #ifdef'd code for balancing the eval-link data structure. It doesn't
compile, and everyone's tests have shown it to be slower in practice, even for
quite large graphs.

I also hope to do an optimization that is only correct with the simpler data
structure, which would break this even further.

llvm-svn: 122684
2011-01-02 07:53:49 +00:00
Chris Lattner
8a72a8f315 remove debugging code.
llvm-svn: 122683
2011-01-02 07:37:13 +00:00
Chris Lattner
bbd22e0c3c add some -stats output.
llvm-svn: 122682
2011-01-02 07:36:44 +00:00
Chris Lattner
2afc3c0dc4 improve loop rotation to use CodeMetrics to analyze the
size of a loop header instead of its own code size estimator.
This allows it to handle bitcasts etc more precisely.

llvm-svn: 122681
2011-01-02 07:35:53 +00:00
Cameron Zwarich
c8a0461c46 Speed up dominator computation some more by optimizing bucket processing. When
naively implemented, the Lengauer-Tarjan algorithm requires a separate bucket
for each vertex. However, this is unnecessary, because each vertex is only
placed into a single bucket (that of its semidominator), and each vertex's
bucket is processed before it is added to any bucket itself.

Instead of using a bucket per vertex, we use a single array Buckets that has two
purposes. Before the vertex V with DFS number i is processed, Buckets[i] stores
the index of the first element in V's bucket. After V's bucket is processed,
Buckets[i] stores the index of the next element in the bucket to which V now
belongs, if any.

Reading from the buckets can also be optimized. Instead of processing the bucket
of V's parent at the end of processing V, we process the bucket of V itself at
the beginning of processing V. This means that the case of the root vertex can
be simplified somewhat. It also means that we don't need to look up the DFS
number of the semidominator of every node in the bucket we are processing,
since we know it is the current index being processed.

This is a 6.5% speedup running -domtree on test-suite + SPEC2000/2006, with
larger speedups of around 12% on the larger benchmarks like GCC.

llvm-svn: 122680
2011-01-02 07:03:00 +00:00
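The bucket scheme described in c8a0461c46 can be sketched inside a simple (unbalanced, path-compressing) Lengauer-Tarjan. A single `buckets` array holds the lists: `buckets[i] == i` marks an empty bucket, and a vertex's slot is reused as its "next" link once its own bucket is processed. Each vertex's bucket is processed at the start of its own iteration, and the root's bucket is processed at the end. This is an independent sketch, not LLVM's DominatorInternals code. The graph representation, names, and recursive DFS are assumptions.

```cpp
#include <cassert>
#include <functional>
#include <vector>

// Nodes are 0..n-1, node 0 is the entry. Returns idom per node: the entry maps
// to itself, unreachable nodes to -1. Internals use 1-based DFS numbers (0 = none).
std::vector<int> computeIdoms(const std::vector<std::vector<int>>& succ) {
  const int n = static_cast<int>(succ.size());
  std::vector<std::vector<int>> pred(n);
  for (int v = 0; v < n; ++v)
    for (int w : succ[v]) pred[w].push_back(v);

  std::vector<int> dfnum(n, 0), vertex(n + 1, -1), parent(n + 1, 0);
  int N = 0;
  std::function<void(int, int)> dfs = [&](int v, int p) {
    dfnum[v] = ++N;
    vertex[N] = v;
    parent[N] = p;
    for (int w : succ[v])
      if (dfnum[w] == 0) dfs(w, dfnum[v]);
  };
  dfs(0, 0);

  std::vector<int> semi(N + 1), label(N + 1), ancestor(N + 1, 0), idom(N + 1, 0);
  std::vector<int> buckets(N + 1);  // buckets[i] == i: bucket of i is empty
  for (int i = 1; i <= N; ++i) semi[i] = label[i] = buckets[i] = i;

  // Simple eval-link with path compression.
  std::function<void(int)> compress = [&](int v) {
    int a = ancestor[v];
    if (ancestor[a] == 0) return;
    compress(a);
    if (semi[label[a]] < semi[label[v]]) label[v] = label[a];
    ancestor[v] = ancestor[a];
  };
  auto eval = [&](int v) {
    if (ancestor[v] == 0) return v;
    compress(v);
    return label[v];
  };

  for (int i = N; i >= 2; --i) {
    // Process i's own bucket first: every j in it has semi[j] == i.
    for (int j = buckets[i]; j != i; j = buckets[j]) {
      int u = eval(j);
      idom[j] = semi[u] < i ? u : i;
    }
    // Semidominator of i.
    for (int p : pred[vertex[i]]) {
      if (dfnum[p] == 0) continue;  // unreachable predecessor
      int u = eval(dfnum[p]);
      if (semi[u] < semi[i]) semi[i] = semi[u];
    }
    // Add i to its semidominator's bucket, reusing buckets[i] as the link.
    buckets[i] = buckets[semi[i]];
    buckets[semi[i]] = i;
    ancestor[i] = parent[i];  // link(parent(i), i)
  }
  // Whatever remains in the root's bucket is immediately dominated by the root.
  for (int j = buckets[1]; j != 1; j = buckets[j]) idom[j] = 1;

  for (int i = 2; i <= N; ++i)
    if (idom[i] != semi[i]) idom[i] = idom[idom[i]];

  std::vector<int> result(n, -1);
  result[0] = 0;
  for (int i = 2; i <= N; ++i) result[vertex[i]] = vertex[idom[i]];
  return result;
}
```

Because semi(j) is known to be the index whose bucket is being read, the loop never has to look it up, matching the commit's point about skipping the semidominator DFS-number lookup.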
Chris Lattner
34a61ab676 teach loop idiom recognition to form memcpy's from simple loops.
llvm-svn: 122678
2011-01-02 03:37:56 +00:00