llvm-mirror

mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-11-22 02:33:06 +01:00

History

Lang Hames bee25fbe59 [ORC] Improve computeLocalDeps / computeNamedSymbolDependencies performance. The computeNamedSymbolDependencies and computeLocalDeps methods on ObjectLinkingLayerJITLinkContext are responsible for computing, for each symbol in the current MaterializationResponsibility, the set of non-locally-scoped symbols that are depended on. To calculate this we have to consider the effect of chains of dependence through locally scoped symbols in the LinkGraph. E.g. .text .globl foo foo: callq bar ## foo depneds on external 'bar' movq Ltmp1(%rip), %rcx ## foo depends on locally scoped 'Ltmp1' addl (%rcx), %eax retq .data Ltmp1: .quad x ## Ltmp1 depends on external 'x' In this example symbol 'foo' depends directly on 'bar', and indirectly on 'x' via 'Ltmp1', which is locally scoped. Performance of the existing implementations appears to have been mediocre: Based on flame graphs posted by @drmeister (in #jit on the LLVM discord server) the computeLocalDeps function was taking up a substantial amount of time when starting up Clasp (https://github.com/clasp-developers/clasp). This commit attempts to address the performance problems in three ways: 1. Using jitlink::Blocks instead of jitlink::Symbols as the nodes of the dependencies-introduced-by-locally-scoped-symbols graph. Using either Blocks or Symbols as nodes provides the same information, but since there may be more than one locally scoped symbol per block the block-based version of the dependence graph should always be a subgraph of the Symbol-based version, and so faster to operate on. 2. Improved worklist management. The older version of computeLocalDeps used a fixed worklist containing all nodes, and iterated over this list propagating dependencies until no further changes were required. The worklist was not sorted into a useful order before the loop started. The new version uses a variable work-stack, visiting nodes in DFS order and only adding nodes when there is meaningful work to do on them. Compared to the old version the new version avoids revisiting nodes which haven't changed, and I suspect it converges more quickly (due to the DFS ordering). 3. Laziness and caching. Mappings of... jitlink::Symbol* -> Interned Name (as SymbolStringPtr) jitlink::Block* -> Immediate dependencies (as SymbolNameSet) jitlink::Block* -> Transitive dependencies (as SymbolNameSet) are all built lazily and cached while running computeNamedSymbolDependencies. According to @drmeister these changes reduced Clasp startup time in his test setup (averaged over a handful of starts) from 4.8 to 2.8 seconds (with ORC/JITLink linking ~11,000 object files in that time), which seems like enough to justify switching to the new algorithm in the absence of any other perf numbers.	2021-07-08 16:31:59 +10:00
..
llvm	[ORC] Improve computeLocalDeps / computeNamedSymbolDependencies performance.	2021-07-08 16:31:59 +10:00
llvm-c	[Orc] At CBindings for LazyRexports	2021-07-01 21:52:05 +02:00

Lang Hames bee25fbe59 [ORC] Improve computeLocalDeps / computeNamedSymbolDependencies performance.

The computeNamedSymbolDependencies and computeLocalDeps methods on
ObjectLinkingLayerJITLinkContext are responsible for computing, for each symbol
in the current MaterializationResponsibility, the set of non-locally-scoped
symbols that are depended on. To calculate this we have to consider the effect
of chains of dependence through locally scoped symbols in the LinkGraph. E.g.

        .text
        .globl  foo
foo:
        callq   bar                    ## foo depneds on external 'bar'
        movq    Ltmp1(%rip), %rcx      ## foo depends on locally scoped 'Ltmp1'
        addl    (%rcx), %eax
        retq

        .data
Ltmp1:
        .quad   x                      ## Ltmp1 depends on external 'x'

In this example symbol 'foo' depends directly on 'bar', and indirectly on 'x'
via 'Ltmp1', which is locally scoped.

Performance of the existing implementations appears to have been mediocre:
Based on flame graphs posted by @drmeister (in #jit on the LLVM discord server)
the computeLocalDeps function was taking up a substantial amount of time when
starting up Clasp (https://github.com/clasp-developers/clasp).

This commit attempts to address the performance problems in three ways:

1. Using jitlink::Blocks instead of jitlink::Symbols as the nodes of the
dependencies-introduced-by-locally-scoped-symbols graph.

Using either Blocks or Symbols as nodes provides the same information, but since
there may be more than one locally scoped symbol per block the block-based
version of the dependence graph should always be a subgraph of the Symbol-based
version, and so faster to operate on.

2. Improved worklist management.

The older version of computeLocalDeps used a fixed worklist containing all
nodes, and iterated over this list propagating dependencies until no further
changes were required. The worklist was not sorted into a useful order before
the loop started.

The new version uses a variable work-stack, visiting nodes in DFS order and
only adding nodes when there is meaningful work to do on them.

Compared to the old version the new version avoids revisiting nodes which
haven't changed, and I suspect it converges more quickly (due to the DFS
ordering).

3. Laziness and caching.

Mappings of...

jitlink::Symbol* -> Interned Name (as SymbolStringPtr)
jitlink::Block* -> Immediate dependencies (as SymbolNameSet)
jitlink::Block* -> Transitive dependencies (as SymbolNameSet)

are all built lazily and cached while running computeNamedSymbolDependencies.

According to @drmeister these changes reduced Clasp startup time in his test
setup (averaged over a handful of starts) from 4.8 to 2.8 seconds (with
ORC/JITLink linking ~11,000 object files in that time), which seems like
enough to justify switching to the new algorithm in the absence of any other
perf numbers.

2021-07-08 16:31:59 +10:00

llvm

[ORC] Improve computeLocalDeps / computeNamedSymbolDependencies performance.

2021-07-08 16:31:59 +10:00

llvm-c

[Orc] At CBindings for LazyRexports

2021-07-01 21:52:05 +02:00