llvm-mirror

mirror of https://github.com/RPCS3/llvm-mirror.git synced 2025-01-31 20:51:52 +01:00

Roman Lebedev 2088bfe3c4 [InstSimplify][EarlyCSE] Try to CSE PHI nodes in the same basic block

Apparently, we don't do this in EarlyCSE, in InstSimplify,
or in (old) GVN, but we do in NewGVN and, of all places, SimplifyCFG.

While I could teach EarlyCSE how to hash PHI nodes,
we can't really do much (anything?) even if we find two identical
PHI nodes in different basic blocks. The same-BB case is the interesting one,
and if we teach InstSimplify about it (which is what I wanted originally,
https://reviews.llvm.org/D86530), we get EarlyCSE support for free.

So I would think this is pretty uncontroversial.

On vanilla llvm test-suite + RawSpeed, this has the following effects:
```
| statistic name                                     | baseline  | proposed  |      Δ |        % |    \|%\| |
|----------------------------------------------------|-----------|-----------|-------:|---------:|---------:|
| instsimplify.NumPHICSE                             | 0         | 23779     |  23779 |    0.00% |    0.00% |
| asm-printer.EmittedInsts                           | 7942328   | 7942392   |     64 |    0.00% |    0.00% |
| assembler.ObjectBytes                              | 273069192 | 273084704 |  15512 |    0.01% |    0.01% |
| correlated-value-propagation.NumPhis               | 18412     | 18539     |    127 |    0.69% |    0.69% |
| early-cse.NumCSE                                   | 2183283   | 2183227   |    -56 |    0.00% |    0.00% |
| early-cse.NumSimplify                              | 550105    | 542090    |  -8015 |   -1.46% |    1.46% |
| instcombine.NumAggregateReconstructionsSimplified  | 73        | 4506      |   4433 | 6072.60% | 6072.60% |
| instcombine.NumCombined                            | 3640264   | 3664769   |  24505 |    0.67% |    0.67% |
| instcombine.NumDeadInst                            | 1778193   | 1783183   |   4990 |    0.28% |    0.28% |
| instcount.NumCallInst                              | 1758401   | 1758799   |    398 |    0.02% |    0.02% |
| instcount.NumInvokeInst                            | 59478     | 59502     |     24 |    0.04% |    0.04% |
| instcount.NumPHIInst                               | 330557    | 330533    |    -24 |   -0.01% |    0.01% |
| instcount.TotalInsts                               | 8831952   | 8832286   |    334 |    0.00% |    0.00% |
| simplifycfg.NumInvokes                             | 4300      | 4410      |    110 |    2.56% |    2.56% |
| simplifycfg.NumSimpl                               | 1019808   | 999607    | -20201 |   -1.98% |    1.98% |
```
I.e. it fires ~24k times, causes 110 (+2.56%) more `invoke` -> `call`
transforms, and counter-intuitively results in *more* instructions total.

That being said, the PHI count doesn't decrease that much,
and looking at some examples, it seems at least some of them
were previously getting PHI CSE'd in SimplifyCFG, of all places.

I'm adjusting `Instruction::isIdenticalToWhenDefined()` at the same time.
As a comment in `InstCombinerImpl::visitPHINode()` already stated,
there are no guarantees on the ordering of the operands of a PHI node.
If we just naively compare them, we may wrongly report that
the nodes are not equal when the only difference is operand order.
This is especially important since the fold is in InstSimplify,
so we can't rely on InstCombine sorting them beforehand.

Fixing this for the general case is costly (geomean +0.02%),
and does not appear to catch anything in test-suite, but for
the same-BB case, it's trivial, so let's fix at least that.
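
To illustrate the operand-order problem described above, here is a minimal
sketch on a toy data model. It is not the LLVM implementation and does not use
LLVM headers: `ToyPhi`, `identicalNaive()` and `identicalSameBlock()` are
hypothetical names invented for this example, and sorting copies of the
incoming lists is just one simple way to get an order-insensitive comparison.
```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Toy stand-in for a PHI node (hypothetical, NOT llvm::PHINode).
struct ToyPhi {
  std::string parentBlock; // basic block that contains this PHI
  // (incoming block, incoming value) pairs, stored in arbitrary order.
  std::vector<std::pair<std::string, std::string>> incoming;
};

// Naive comparison: operands are compared position by position, so two
// PHIs that differ only in operand order are reported as different.
bool identicalNaive(const ToyPhi &a, const ToyPhi &b) {
  return a.incoming == b.incoming;
}

// Same-block comparison: both PHIs see the same set of predecessors, so
// it is enough to check that every (block, value) pair of one PHI also
// occurs in the other. Sorting copies makes the check order-insensitive.
bool identicalSameBlock(const ToyPhi &a, const ToyPhi &b) {
  assert(a.parentBlock == b.parentBlock &&
         "this sketch only handles PHIs in the same block");
  if (a.incoming.size() != b.incoming.size())
    return false;
  auto lhs = a.incoming;
  auto rhs = b.incoming;
  std::sort(lhs.begin(), lhs.end());
  std::sort(rhs.begin(), rhs.end());
  return lhs == rhs;
}
```
With two toy PHIs in the same block that list the same (block, value) pairs in
a different order, `identicalNaive()` returns false while
`identicalSameBlock()` returns true, which is the false negative the commit
message is about.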

As per http://llvm-compile-time-tracker.com/compare.php?from=04879086b44348cad600a0a1ccbe1f7776cc3cf9&to=82bdedb888b945df1e9f130dd3ac4dd3c96e2925&stat=instructions
this appears to cause geomean +0.03% compile time increase (regression),
but geomean -0.01%..-0.04% code size decrease (improvement).

2020-08-27 18:47:04 +03:00

2008-11-27-EntryMunge.ll

…

2010-08-26-and.ll

…

2011-04-02-SimplifyDeadBlock.ll

…

2011-04-14-InfLoop.ll

…

2012-07-19-NoSuccessorIndirectBr.ll

…

and-and-cond.ll

…

and-cond.ll

…

assume-edge-dom.ll

…

assume.ll

…

basic.ll

…

bb-unreachable-from-entry.ll

…

branch-debug-info.ll

…

branch-no-const.ll

…

callbr-edge-split.ll

…

codesize-loop.ll

[JumpThreading] Half the duplicate threshold at Oz

2020-02-03 08:40:20 +00:00

combine-metadata.ll

[JumpThreading] Make test more robust (NFC)

2020-06-20 13:05:42 +02:00

compare.ll

…

conservative-lvi.ll

…

crash.ll

…

ddt-crash2.ll

…

ddt-crash3.ll

Migrate function attribute "no-frame-pointer-elim"="false" to "frame-pointer"="none" as cleanups after D56351

2019-12-24 16:27:51 -08:00

ddt-crash4.ll

…

ddt-crash.ll

…

degenerate-phi.ll

…

fold-not-thread.ll

…

freeze-lvi-edgevaluelocal.ll

[LazyValueInfo] Let getEdgeValueLocal look into freeze instructions

2020-08-11 16:39:34 +09:00

freeze.ll

[JumpThreading] Update test freeze.ll; NFC

2020-08-04 20:27:54 +09:00

guards.ll

…

header-succ.ll

…

implied-cond.ll

…

indirectbr.ll

…

induction.ll

…

is_constant.ll

[Intrinsic] Give "is.constant" the "convergent" attribute

2020-03-30 11:47:12 -07:00

landing-pad.ll

…

loop-phi.ll

[InstSimplify][EarlyCSE] Try to CSE PHI nodes in the same basic block

2020-08-27 18:47:04 +03:00

lvi-load.ll

…

lvi-tristate.ll

…

ne-undef.ll

[ValueLattice] Add new state for undef constants.

2020-03-14 17:19:59 +00:00

no-irreducible-loops.ll

…

or-undef.ll

…

phi-copy-to-pred.ll

[JumpThreading] Allow duplicating a basic block into preds when its branch condition is freeze(phi)

2020-08-06 09:51:17 +09:00

phi-eq.ll

…

phi-known.ll

…

pr9331.ll

…

pr15851_hang.ll

…

pr22086.ll

…

pr26096.ll

…

pr27840.ll

…

pr33605.ll

[llvm] Fix broken cases of 'CHECK[^:]*$' in tests

2020-01-28 09:52:59 -07:00

pr33917.ll

…

pr36133.ll

…

pr40992-indirectbr-folding.ll

…

pr46857-callbr.ll

[JumpThreading] ProcessBranchOnXOR(): bailout if any pred ends in indirect branch (PR46857)

2020-07-27 15:39:03 +03:00

PR33357-lvi-recursion.ll

…

PR37745.ll

…

PR44611-across-header-hang.ll

[JumpThreading] Fix infinite loop (PR44611)

2020-03-19 12:49:36 -07:00

pre-load.ll

[JumpThreading] Let SimplifyPartiallyRedundantLoad look into freeze

2020-07-31 15:28:24 +09:00

range-compare.ll

…

redundant-dbg-info.ll

Reapply "[DebugInfo] Prevent explosion of debug intrinsics during jump threading"

2020-02-12 12:39:54 +00:00

removed-use.ll

…

select-unfold-msan.ll

Fix MSan false positive due to select folding.

2020-03-31 15:25:42 -07:00

select.ll

[JumpThreading] add a miscompile test based on discussion in D76332; NFC

2020-03-18 16:46:18 -04:00

stale-loop-info-after-unfold-select.ll

…

static-profile.ll

…

thread-cmp.ll

…

thread-loads.ll

Infer alignment of unmarked loads in IR/bitcode parsing.

2020-05-14 13:03:50 -07:00

thread-two-bbs-cuda.ll

[JumpThreading] Merge/rename thread-two-bbsN.ll tests; NFC

2020-08-04 17:07:28 +09:00

thread-two-bbs-msvc.ll

[JumpThreading] Merge/rename thread-two-bbsN.ll tests; NFC

2020-08-04 17:07:28 +09:00

thread-two-bbs-threshold.ll

[JumpThreading] Consider freeze as a zero-cost instruction

2020-08-05 14:42:36 +09:00

thread-two-bbs.ll

[JumpThreading] Merge/rename thread-two-bbsN.ll tests; NFC

2020-08-04 17:07:28 +09:00

threadable-edge-cast.ll

[JumpThreading] Remove cast's constraint

2020-08-04 19:09:25 +09:00

threading_prof1.ll

…

threading_prof2.ll

…

unreachable-loops.ll

…

update-edge-weight.ll

[JumpThreading] Use profile data even with the new pass manager

2019-11-22 08:21:48 -08:00