1
0
mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-11-24 11:42:57 +01:00
Commit Graph

52 Commits

Author SHA1 Message Date
Roman Lebedev
94946e5da5 [llvm-exegesis] benchmarkMain(): less cryptic error if built w/o libpfm
Wanted to check if inablility to measure latency of CMOV32rm
is a regression from D60041 / D60138, but unable to do that
because the llvm-exegesis-{8,9} from debian sid fails
with that cryptic, unhelpful error.

I suspect this will be a better error.

llvm-svn: 357900
2019-04-08 10:50:31 +00:00
Guillaume Chatelet
a28f67c075 Add an option do not dump the generated object on disk
Reviewers: courbet

Subscribers: llvm-commits, bdb

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D60317

llvm-svn: 357769
2019-04-05 15:18:59 +00:00
Roman Lebedev
079fdb41d4 [llvm-exegesis] Introduce a 'naive' clustering algorithm (PR40880)
Summary:
This is an alternative to D59539.

Let's suppose we have measured 4 different opcodes, and got: `0.5`, `1.0`, `1.5`, `2.0`.
Let's suppose we are using `-analysis-clustering-epsilon=0.5`.
By default now we will start processing the `0.5` point, find that `1.0` is it's neighbor, add them to a new cluster.
Then we will notice that `1.5` is a neighbor of `1.0` and add it to that same cluster.
Then we will notice that `2.0` is a neighbor of `1.5` and add it to that same cluster.
So all these points ended up in the same cluster.
This may or may not be a correct implementation of dbscan clustering algorithm.

But this is rather horribly broken for the reasons of comparing the clusters with the LLVM sched data.
Let's suppose all those opcodes are currently in the same sched cluster.
If i specify `-analysis-inconsistency-epsilon=0.5`, then no matter
the LLVM values this cluster will **never** match the LLVM values,
and thus this cluster will **always** be displayed as inconsistent.

The solution is obviously to split off some of these opcodes into different sched cluster.
But how do i do that? Out of 4 opcodes displayed in the inconsistency report,
which ones are the "bad ones"? Which ones are the most different from the checked-in data?
I'd need to go in to the `.yaml` and look it up manually.

The trivial solution is to, when creating clusters, don't use the full dbscan algorithm,
but instead "pick some unclustered point, pick all unclustered points that are it's neighbor,
put them all into a new cluster, repeat". And just so as it happens, we can arrive
at that algorithm by not performing the "add neighbors of a neighbor to the cluster" step.

But that won't work well once we teach analyze mode to operate in on-1D mode
(i.e. on more than a single measurement type at a time), because the clustering would
depend on the order of the measurements.

Instead, let's just create a single cluster per opcode, and put all the points of that opcode into said cluster.
And simultaneously check that every point in that cluster is a neighbor of every other point in the cluster,
and if they are not, the cluster (==opcode) is unstable.

This is //yet another// step to bring me closer to being able to continue cleanup of bdver2 sched model..

Fixes [[ https://bugs.llvm.org/show_bug.cgi?id=40880 | PR40880 ]].

Reviewers: courbet, gchatelet

Reviewed By: courbet

Subscribers: tschuett, jdoerfert, RKSimon, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D59820

llvm-svn: 357152
2019-03-28 08:55:01 +00:00
Roman Lebedev
5d84c488b2 [llvm-exegesis] Separate tool options into three categories.
Results in much nicer -help output:
```
$ ./bin/llvm-exegesis -help
USAGE: llvm-exegesis [options]

OPTIONS:

Color Options:

  -color                                         - Use colors in output (default=autodetect)

General options:

  -enable-cse-in-irtranslator                    - Should enable CSE in irtranslator
  -enable-cse-in-legalizer                       - Should enable CSE in Legalizer

Generic Options:

  -help                                          - Display available options (-help-hidden for more)
  -help-list                                     - Display list of available options (-help-list-hidden for more)
  -version                                       - Display the version of this program

llvm-exegesis analysis options:

  -analysis-clustering-epsilon=<number>          - dbscan epsilon for benchmark point clustering
  -analysis-clusters-output-file=<string>        -
  -analysis-display-unstable-clusters            - if there is more than one benchmark for an opcode, said benchmarks may end up not being clustered into the same cluster if the measured performance characteristics are different. by default all such opcodes are filtered out. this flag will instead show only such unstable opcodes
  -analysis-inconsistencies-output-file=<string> -
  -analysis-inconsistency-epsilon=<number>       - epsilon for detection of when the cluster is different from the LLVM schedule profile values
  -analysis-numpoints=<uint>                     - minimum number of points in an analysis cluster

llvm-exegesis benchmark options:

  -ignore-invalid-sched-class                    - ignore instructions that do not define a sched class
  -mode=<value>                                  - the mode to run
    =latency                                     -   Instruction Latency
    =inverse_throughput                          -   Instruction Inverse Throughput
    =uops                                        -   Uop Decomposition
    =analysis                                    -   Analysis
  -num-repetitions=<uint>                        - number of time to repeat the asm snippet
  -opcode-index=<int>                            - opcode to measure, by index
  -opcode-name=<string>                          - comma-separated list of opcodes to measure, by name
  -snippets-file=<string>                        - code snippets to measure

llvm-exegesis options:

  -benchmarks-file=<string>                      - File to read (analysis mode) or write (latency/uops/inverse_throughput modes) benchmark results. “-” uses stdin/stdout.
  -mcpu=<string>                                 - cpu name to use for pfm counters, leave empty to autodetect
```

llvm-svn: 356364
2019-03-18 11:32:37 +00:00
Roman Lebedev
8691a62a19 [llvm-exegesis] Split Epsilon param into two (PR40787)
Summary:
This eps param is used for two distinct things:
* initial point clusterization
* checking clusters against the llvm values

What if one wants to only look at highly different clusters, without changing
the clustering itself? In particular, this helps to weed out noisy measurements
(since the clusterization epsilon is still small, so there is a better chance
that noisy measurements from the same opcode will go into different clusters)

By splitting it into two params it is now possible.

This is nearly-free performance-wise:
Old:
```
$ perf stat -r 25 ./bin/llvm-exegesis -mode=analysis -benchmarks-file=/home/lebedevri/PileDriver-Sched/benchmarks-latency-1.yaml -analysis-inconsistencies-output-file=/tmp/clusters-old.html
no exegesis target for x86_64-unknown-linux-gnu, using default
Parsed 10099 benchmark points
Printing sched class consistency analysis results to file '/tmp/clusters-old.html'
...
 Performance counter stats for './bin/llvm-exegesis -mode=analysis -benchmarks-file=/home/lebedevri/PileDriver-Sched/benchmarks-latency-1.yaml -analysis-inconsistencies-output-file=/tmp/clusters-old.html' (25 runs):

            390.01 msec task-clock                #    0.998 CPUs utilized            ( +-  0.25% )
                12      context-switches          #   31.735 M/sec                    ( +- 27.38% )
                 0      cpu-migrations            #    0.000 K/sec
              4745      page-faults               # 12183.732 M/sec                   ( +-  0.54% )
        1562711900      cycles                    # 4012303.327 GHz                   ( +-  0.24% )  (82.90%)
         185567822      stalled-cycles-frontend   #   11.87% frontend cycles idle     ( +-  0.52% )  (83.30%)
         392106234      stalled-cycles-backend    #   25.09% backend cycles idle      ( +-  1.31% )  (33.79%)
        1839236666      instructions              #    1.18  insn per cycle
                                                  #    0.21  stalled cycles per insn  ( +-  0.15% )  (50.37%)
         407035764      branches                  # 1045074878.710 M/sec              ( +-  0.12% )  (66.80%)
          10896459      branch-misses             #    2.68% of all branches          ( +-  0.17% )  (83.20%)

          0.390629 +- 0.000972 seconds time elapsed  ( +-  0.25% )
```
```
$ perf stat -r 9 ./bin/llvm-exegesis -mode=analysis -benchmarks-file=/home/lebedevri/PileDriver-Sched/benchmarks-latency.yml -analysis-inconsistencies-output-file=/tmp/clusters-old.html
no exegesis target for x86_64-unknown-linux-gnu, using default
Parsed 50572 benchmark points
Printing sched class consistency analysis results to file '/tmp/clusters-old.html'
...
 Performance counter stats for './bin/llvm-exegesis -mode=analysis -benchmarks-file=/home/lebedevri/PileDriver-Sched/benchmarks-latency.yml -analysis-inconsistencies-output-file=/tmp/clusters-old.html' (9 runs):

           6803.36 msec task-clock                #    0.999 CPUs utilized            ( +-  0.96% )
               262      context-switches          #   38.546 M/sec                    ( +- 23.06% )
                 0      cpu-migrations            #    0.065 M/sec                    ( +- 76.03% )
             13287      page-faults               # 1953.206 M/sec                    ( +-  0.32% )
       27252537904      cycles                    # 4006024.257 GHz                   ( +-  0.95% )  (83.31%)
        1496314935      stalled-cycles-frontend   #    5.49% frontend cycles idle     ( +-  0.97% )  (83.32%)
       16128404524      stalled-cycles-backend    #   59.18% backend cycles idle      ( +-  0.30% )  (33.37%)
       17611143370      instructions              #    0.65  insn per cycle
                                                  #    0.92  stalled cycles per insn  ( +-  0.05% )  (50.04%)
        3894906599      branches                  # 572537147.437 M/sec               ( +-  0.03% )  (66.69%)
         116314514      branch-misses             #    2.99% of all branches          ( +-  0.20% )  (83.35%)

            6.8118 +- 0.0689 seconds time elapsed  ( +-  1.01%)
```
New:
```
$ perf stat -r 25 ./bin/llvm-exegesis -mode=analysis -benchmarks-file=/home/lebedevri/PileDriver-Sched/benchmarks-latency-1.yaml -analysis-inconsistencies-output-file=/tmp/clusters-new.html
no exegesis target for x86_64-unknown-linux-gnu, using default
Parsed 10099 benchmark points
Printing sched class consistency analysis results to file '/tmp/clusters-new.html'
...
 Performance counter stats for './bin/llvm-exegesis -mode=analysis -benchmarks-file=/home/lebedevri/PileDriver-Sched/benchmarks-latency-1.yaml -analysis-inconsistencies-output-file=/tmp/clusters-new.html' (25 runs):

            400.14 msec task-clock                #    0.998 CPUs utilized            ( +-  0.66% )
                12      context-switches          #   29.429 M/sec                    ( +- 25.95% )
                 0      cpu-migrations            #    0.100 M/sec                    ( +-100.00% )
              4714      page-faults               # 11796.496 M/sec                   ( +-  0.55% )
        1603131306      cycles                    # 4011840.105 GHz                   ( +-  0.66% )  (82.85%)
         199538509      stalled-cycles-frontend   #   12.45% frontend cycles idle     ( +-  2.40% )  (83.10%)
         402249109      stalled-cycles-backend    #   25.09% backend cycles idle      ( +-  1.19% )  (34.05%)
        1847783963      instructions              #    1.15  insn per cycle
                                                  #    0.22  stalled cycles per insn  ( +-  0.18% )  (50.64%)
         407162722      branches                  # 1018925730.631 M/sec              ( +-  0.12% )  (67.02%)
          10932779      branch-misses             #    2.69% of all branches          ( +-  0.51% )  (83.28%)

           0.40077 +- 0.00267 seconds time elapsed  ( +-  0.67% )

lebedevri@pini-pini:/build/llvm-build-Clang-release$ perf stat -r 9 ./bin/llvm-exegesis -mode=analysis -benchmarks-file=/home/lebedevri/PileDriver-Sched/benchmarks-latency.yml -analysis-inconsistencies-output-file=/tmp/clusters-new.html
no exegesis target for x86_64-unknown-linux-gnu, using default
Parsed 50572 benchmark points
Printing sched class consistency analysis results to file '/tmp/clusters-new.html'
...
 Performance counter stats for './bin/llvm-exegesis -mode=analysis -benchmarks-file=/home/lebedevri/PileDriver-Sched/benchmarks-latency.yml -analysis-inconsistencies-output-file=/tmp/clusters-new.html' (9 runs):

           6947.79 msec task-clock                #    1.000 CPUs utilized            ( +-  0.90% )
               217      context-switches          #   31.236 M/sec                    ( +- 36.16% )
                 1      cpu-migrations            #    0.096 M/sec                    ( +- 50.00% )
             13258      page-faults               # 1908.389 M/sec                    ( +-  0.34% )
       27830796523      cycles                    # 4006032.286 GHz                   ( +-  0.89% )  (83.30%)
        1504554006      stalled-cycles-frontend   #    5.41% frontend cycles idle     ( +-  2.10% )  (83.32%)
       16716574843      stalled-cycles-backend    #   60.07% backend cycles idle      ( +-  0.65% )  (33.38%)
       17755545931      instructions              #    0.64  insn per cycle
                                                  #    0.94  stalled cycles per insn  ( +-  0.09% )  (50.04%)
        3897255686      branches                  # 560980426.597 M/sec               ( +-  0.06% )  (66.70%)
         117045395      branch-misses             #    3.00% of all branches          ( +-  0.47% )  (83.34%)

            6.9507 +- 0.0627 seconds time elapsed  ( +-  0.90% )
```

I.e. it's +2.6% slowdown for one whole sweep, or +2% for 5 whole sweeps.
Within noise i'd say.

Should help with [[ https://bugs.llvm.org/show_bug.cgi?id=40787 | PR40787 ]].

Reviewers: courbet, gchatelet

Reviewed By: courbet

Subscribers: tschuett, RKSimon, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D58476

llvm-svn: 354767
2019-02-25 09:36:12 +00:00
Roman Lebedev
2a88b46050 [llvm-exegesis] Opcode stabilization / reclusterization (PR40715)
Summary:
Given an instruction `Opcode`, we can make benchmarks (measurements) of the
instruction characteristics/performance. Then, to facilitate further analysis
we group the benchmarks with *similar* characteristics into clusters.
Now, this is all not entirely deterministic. Some instructions have variable
characteristics, depending on their arguments. And thus, if we do several
benchmarks of the same instruction `Opcode`, we may end up with *different*
performance characteristics measurements. And when we then do clustering,
these several benchmarks of the same instruction `Opcode` may end up being
clustered into *different* clusters. This is not great for further analysis.

We shall find every `Opcode` with benchmarks not in just one cluster, and move
*all* the benchmarks of said `Opcode` into one new unstable cluster per `Opcode`.

I have solved this by making `ClusterId` a bit field, adding a `IsUnstable` bit,
and introducing `-analysis-display-unstable-clusters` switch to toggle between
displaying stable-only clusters and unstable-only clusters.

The reclusterization is deterministically stable, produces identical reports
between runs. (Or at least that is what i'm seeing, maybe it isn't)

Timings/comparisons:
old (current trunk/head) {F8303582}
```
$ perf stat -r 25 ./bin/llvm-exegesis -mode=analysis -analysis-epsilon=0.5 -benchmarks-file=/home/lebedevri/PileDriver-Sched/benchmarks-inverse_throughput.yaml -analysis-inconsistencies-output-file=/tmp/clusters-old.html
no exegesis target for x86_64-unknown-linux-gnu, using default
Parsed 43970 benchmark points
Printing sched class consistency analysis results to file '/tmp/clusters-old.html'
...
no exegesis target for x86_64-unknown-linux-gnu, using default
Parsed 43970 benchmark points
Printing sched class consistency analysis results to file '/tmp/clusters-old.html'

 Performance counter stats for './bin/llvm-exegesis -mode=analysis -analysis-epsilon=0.5 -benchmarks-file=/home/lebedevri/PileDriver-Sched/benchmarks-inverse_throughput.yaml -analysis-inconsistencies-output-file=/tmp/clusters-old.html' (25 runs):

           6624.73 msec task-clock                #    0.999 CPUs utilized            ( +-  0.53% )
               172      context-switches          #   25.965 M/sec                    ( +- 29.89% )
                 0      cpu-migrations            #    0.042 M/sec                    ( +- 56.54% )
             31073      page-faults               # 4690.754 M/sec                    ( +-  0.08% )
       26538711696      cycles                    # 4006230.292 GHz                   ( +-  0.53% )  (83.31%)
        2017496807      stalled-cycles-frontend   #    7.60% frontend cycles idle     ( +-  0.93% )  (83.32%)
       13403650062      stalled-cycles-backend    #   50.51% backend cycles idle      ( +-  0.33% )  (33.37%)
       19770706799      instructions              #    0.74  insn per cycle
                                                  #    0.68  stalled cycles per insn  ( +-  0.04% )  (50.04%)
        4419821812      branches                  # 667207369.714 M/sec               ( +-  0.03% )  (66.69%)
         121741669      branch-misses             #    2.75% of all branches          ( +-  0.28% )  (83.34%)

            6.6283 +- 0.0358 seconds time elapsed  ( +-  0.54% )
```

patch, with reclustering but without filtering (i.e. outputting all the stable *and* unstable clusters) {F8303586}
```
$ perf stat -r 25 ./bin/llvm-exegesis -mode=analysis -analysis-epsilon=0.5 -benchmarks-file=/home/lebedevri/PileDriver-Sched/benchmarks-inverse_throughput.yaml -analysis-inconsistencies-output-file=/tmp/clusters-new-all.html
no exegesis target for x86_64-unknown-linux-gnu, using default
Parsed 43970 benchmark points
Printing sched class consistency analysis results to file '/tmp/clusters-new-all.html'
...
no exegesis target for x86_64-unknown-linux-gnu, using default
Parsed 43970 benchmark points
Printing sched class consistency analysis results to file '/tmp/clusters-new-all.html'

 Performance counter stats for './bin/llvm-exegesis -mode=analysis -analysis-epsilon=0.5 -benchmarks-file=/home/lebedevri/PileDriver-Sched/benchmarks-inverse_throughput.yaml -analysis-inconsistencies-output-file=/tmp/clusters-new-all.html' (25 runs):

           6475.29 msec task-clock                #    0.999 CPUs utilized            ( +-  0.31% )
               213      context-switches          #   32.952 M/sec                    ( +- 23.81% )
                 1      cpu-migrations            #    0.130 M/sec                    ( +- 43.84% )
             31287      page-faults               # 4832.057 M/sec                    ( +-  0.08% )
       25939086577      cycles                    # 4006160.279 GHz                   ( +-  0.31% )  (83.31%)
        1958812858      stalled-cycles-frontend   #    7.55% frontend cycles idle     ( +-  0.68% )  (83.32%)
       13218961512      stalled-cycles-backend    #   50.96% backend cycles idle      ( +-  0.29% )  (33.37%)
       19752995402      instructions              #    0.76  insn per cycle
                                                  #    0.67  stalled cycles per insn  ( +-  0.04% )  (50.04%)
        4417079244      branches                  # 682195472.305 M/sec               ( +-  0.03% )  (66.70%)
         121510065      branch-misses             #    2.75% of all branches          ( +-  0.19% )  (83.34%)

            6.4832 +- 0.0229 seconds time elapsed  ( +-  0.35% )
```
Funnily, *this* measurement shows that said reclustering actually improved performance.

patch, with reclustering, only the stable clusters {F8303594}
```
$ perf stat -r 25 ./bin/llvm-exegesis -mode=analysis -analysis-epsilon=0.5 -benchmarks-file=/home/lebedevri/PileDriver-Sched/benchmarks-inverse_throughput.yaml -analysis-inconsistencies-output-file=/tmp/clusters-new-stable.html
no exegesis target for x86_64-unknown-linux-gnu, using default
Parsed 43970 benchmark points
Printing sched class consistency analysis results to file '/tmp/clusters-new-stable.html'
...
no exegesis target for x86_64-unknown-linux-gnu, using default
Parsed 43970 benchmark points
Printing sched class consistency analysis results to file '/tmp/clusters-new-stable.html'

 Performance counter stats for './bin/llvm-exegesis -mode=analysis -analysis-epsilon=0.5 -benchmarks-file=/home/lebedevri/PileDriver-Sched/benchmarks-inverse_throughput.yaml -analysis-inconsistencies-output-file=/tmp/clusters-new-stable.html' (25 runs):

           6387.71 msec task-clock                #    0.999 CPUs utilized            ( +-  0.13% )
               133      context-switches          #   20.792 M/sec                    ( +- 23.39% )
                 0      cpu-migrations            #    0.063 M/sec                    ( +- 61.24% )
             31318      page-faults               # 4903.256 M/sec                    ( +-  0.08% )
       25591984967      cycles                    # 4006786.266 GHz                   ( +-  0.13% )  (83.31%)
        1881234904      stalled-cycles-frontend   #    7.35% frontend cycles idle     ( +-  0.25% )  (83.33%)
       13209749965      stalled-cycles-backend    #   51.62% backend cycles idle      ( +-  0.16% )  (33.36%)
       19767554347      instructions              #    0.77  insn per cycle
                                                  #    0.67  stalled cycles per insn  ( +-  0.04% )  (50.03%)
        4417480305      branches                  # 691618858.046 M/sec               ( +-  0.03% )  (66.68%)
         118676358      branch-misses             #    2.69% of all branches          ( +-  0.07% )  (83.33%)

            6.3954 +- 0.0118 seconds time elapsed  ( +-  0.18% )
```
Performance improved even further?! Makes sense i guess, less clusters to print.

patch, with reclustering, only the unstable clusters {F8303601}
```
$ perf stat -r 25 ./bin/llvm-exegesis -mode=analysis -analysis-epsilon=0.5 -benchmarks-file=/home/lebedevri/PileDriver-Sched/benchmarks-inverse_throughput.yaml -analysis-inconsistencies-output-file=/tmp/clusters-new-unstable.html -analysis-display-unstable-clusters
no exegesis target for x86_64-unknown-linux-gnu, using default
Parsed 43970 benchmark points
Printing sched class consistency analysis results to file '/tmp/clusters-new-unstable.html'
...
no exegesis target for x86_64-unknown-linux-gnu, using default
Parsed 43970 benchmark points
Printing sched class consistency analysis results to file '/tmp/clusters-new-unstable.html'

 Performance counter stats for './bin/llvm-exegesis -mode=analysis -analysis-epsilon=0.5 -benchmarks-file=/home/lebedevri/PileDriver-Sched/benchmarks-inverse_throughput.yaml -analysis-inconsistencies-output-file=/tmp/clusters-new-unstable.html -analysis-display-unstable-clusters' (25 runs):

           6124.96 msec task-clock                #    1.000 CPUs utilized            ( +-  0.20% )
               194      context-switches          #   31.709 M/sec                    ( +- 20.46% )
                 0      cpu-migrations            #    0.039 M/sec                    ( +- 49.77% )
             31413      page-faults               # 5129.261 M/sec                    ( +-  0.06% )
       24536794267      cycles                    # 4006425.858 GHz                   ( +-  0.19% )  (83.31%)
        1676085087      stalled-cycles-frontend   #    6.83% frontend cycles idle     ( +-  0.46% )  (83.32%)
       13035595603      stalled-cycles-backend    #   53.13% backend cycles idle      ( +-  0.16% )  (33.36%)
       18260877653      instructions              #    0.74  insn per cycle
                                                  #    0.71  stalled cycles per insn  ( +-  0.05% )  (50.03%)
        4112411983      branches                  # 671484364.603 M/sec               ( +-  0.03% )  (66.68%)
         114066929      branch-misses             #    2.77% of all branches          ( +-  0.11% )  (83.32%)

            6.1278 +- 0.0121 seconds time elapsed  ( +-  0.20% )
```
This tells us that the actual `-analysis-inconsistencies-output-file=` outputting only takes ~0.4 sec for 43970 benchmark points (3 whole sweeps)
(Also, wow this is fast, it used to take several minutes originally)

Fixes [[ https://bugs.llvm.org/show_bug.cgi?id=40715 | PR40715 ]].

Reviewers: courbet, gchatelet

Reviewed By: courbet

Subscribers: tschuett, jdoerfert, llvm-commits, RKSimon

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D58355

llvm-svn: 354441
2019-02-20 09:14:04 +00:00
Andrea Di Biagio
2820258583 [AsmPrinter] Remove hidden flag -print-schedule.
This patch removes hidden codegen flag -print-schedule effectively reverting the
logic originally committed as r300311
(https://llvm.org/viewvc/llvm-project?view=revision&revision=300311).

Flag -print-schedule was originally introduced by r300311 to address PR32216
(https://bugs.llvm.org/show_bug.cgi?id=32216). That bug was about adding "Better
testing of schedule model instruction latencies/throughputs".

These days, we can use llvm-mca to test scheduling models. So there is no longer
a need for flag -print-schedule in LLVM. The main use case for PR32216 is
now addressed by llvm-mca.
Flag -print-schedule is mainly used for debugging purposes, and it is only
actually used by x86 specific tests. We already have extensive (latency and
throughput) tests under "test/tools/llvm-mca" for X86 processor models. That
means, most (if not all) existing -print-schedule tests for X86 are redundant.

When flag -print-schedule was first added to LLVM, several files had to be
modified; a few APIs gained new arguments (see for example method
MCAsmStreamer::EmitInstruction), and MCSubtargetInfo/TargetSubtargetInfo gained
a couple of getSchedInfoStr() methods.

Method getSchedInfoStr() had to originally work for both MCInst and
MachineInstr. The original implmentation of getSchedInfoStr() introduced a
subtle layering violation (reported as PR37160 and then fixed/worked-around by
r330615).
In retrospect, that new API could have been designed more optimally. We can
always query MCSchedModel to get the latency and throughput. More importantly,
the "sched-info" string should not have been generated by the subtarget.
Note, r317782 fixed an issue where "print-schedule" didn't work very well in the
presence of inline assembly. That commit is also reverted by this change.

Differential Revision: https://reviews.llvm.org/D57244

llvm-svn: 353043
2019-02-04 12:51:26 +00:00
Roman Lebedev
1cc6540ed5 [llvm-exegesis] Don't default to running&dumping all analyses to '-'
Summary:
Up until the point i have looked in the source, i didn't even understood that
i can disable 'cluster' output. I have always silenced it via ` &> /dev/null`.
(And hoped it wasn't contributing much of the run time.)

While i expect that it has it's use-cases i never once needed it so far.
If i forget to silence it, console is completely flooded with that output.

How about not expecting users to opt-out of analyses,
but to explicitly specify the analyses that should be performed?

Reviewers: courbet, gchatelet

Reviewed By: courbet

Subscribers: tschuett, RKSimon, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D57648

llvm-svn: 353021
2019-02-04 09:12:08 +00:00
Clement Courbet
359f23868e [llvm-exegesis] Add throughput mode.
Summary:
This just uses the latency benchmark runner on the parallel uops snippet
generator.

Fixes PR37698.

Reviewers: gchatelet

Subscribers: tschuett, RKSimon, llvm-commits

Differential Revision: https://reviews.llvm.org/D57000

llvm-svn: 352632
2019-01-30 16:02:20 +00:00
Chandler Carruth
ae65e281f3 Update the file headers across all of the LLVM projects in the monorepo
to reflect the new license.

We understand that people may be surprised that we're moving the header
entirely to discuss the new license. We checked this carefully with the
Foundation's lawyer and we believe this is the correct approach.

Essentially, all code in the project is now made available by the LLVM
project under our new license, so you will see that the license headers
include that license only. Some of our contributors have contributed
code under our old license, and accordingly, we have retained a copy of
our old license notice in the top-level files in each project and
repository.

llvm-svn: 351636
2019-01-19 08:50:56 +00:00
Clement Courbet
04be13e74e Revert "[llvm-exegesis] Add a snippet generator to generate snippets to compute ROB sizes."
This reverts accidental commit rL346394.

llvm-svn: 346398
2018-11-08 12:09:45 +00:00
Clement Courbet
25b7a25203 [llvm-exegesis] Add a snippet generator to generate snippets to compute ROB sizes.
llvm-svn: 346394
2018-11-08 11:45:14 +00:00
Clement Courbet
dc9ae03db9 [MCSched] Bind PFM Counters to the CPUs instead of the SchedModel.
Summary:
The pfm counters are now in the ExegesisTarget rather than the
MCSchedModel (PR39165).

This also compresses the pfm counter tables (PR37068).

Reviewers: RKSimon, gchatelet

Subscribers: mgrang, llvm-commits

Differential Revision: https://reviews.llvm.org/D52932

llvm-svn: 345243
2018-10-25 07:44:01 +00:00
Guillaume Chatelet
568cbffabb [llvm-exegesis] Implements a cache of Instruction objects.
llvm-svn: 345130
2018-10-24 11:55:06 +00:00
Fangrui Song
edf37d23b8 [llvm-exegesis] Move namespace exegesis inside llvm::
Summary:
This allows simplifying references of llvm::foo with foo when the needs
come in the future.

Reviewers: courbet, gchatelet

Reviewed By: gchatelet

Subscribers: javed.absar, tschuett, llvm-commits

Differential Revision: https://reviews.llvm.org/D53455

llvm-svn: 344922
2018-10-22 17:10:47 +00:00
Guillaume Chatelet
6efd85afac [llvm-exegesis] Fix off by one error
llvm-svn: 344731
2018-10-18 08:20:50 +00:00
Clement Courbet
988121fb86 [llvm-exegesis] Allow measuring several instructions in a single run.
Summary:
We try to recover gracefully on instructions that would crash the
program.

This includes some refactoring of runMeasurement() implementations.

Reviewers: gchatelet

Subscribers: tschuett, llvm-commits

Differential Revision: https://reviews.llvm.org/D53371

llvm-svn: 344695
2018-10-17 15:04:15 +00:00
Guillaume Chatelet
ef3efb6ae2 [llvm-exegesis][NFC] Pass Instruction instead of bare Opcode
llvm-svn: 344145
2018-10-10 14:57:32 +00:00
Fangrui Song
90ae3da927 [llvm-exegesis] Remove unused headers and fix naming issues
Reviewers: courbet

Reviewed By: courbet

Subscribers: tschuett, llvm-commits

Differential Revision: https://reviews.llvm.org/D52565

llvm-svn: 343177
2018-09-27 06:10:15 +00:00
Guillaume Chatelet
d1656d7dab [llvm-exegesis][NFC] Rewrite of the YAML serialization.
Summary: This is a NFC in preparation of exporting the initial registers as part of the YAML dump

Reviewers: courbet

Reviewed By: courbet

Subscribers: mgorny, tschuett, llvm-commits

Differential Revision: https://reviews.llvm.org/D52427

llvm-svn: 342967
2018-09-25 12:18:08 +00:00
Clement Courbet
0301f9855d [llvm-exegesis] Allow benchmarking arbitrary code snippets.
Summary:

This is a step towards fixing PR38048.

Note that right now the measurements are given per instruction. We'll
need to give measurements a per code snippet and update the analysis (PR38731).

Reviewers: gchatelet

Subscribers: tschuett, llvm-commits

Differential Revision: https://reviews.llvm.org/D52041

llvm-svn: 342947
2018-09-25 07:31:44 +00:00
Clement Courbet
3faa4f1c4d [llvm-exegesis][NFC] Remove dead parameter.
llvm-svn: 342118
2018-09-13 08:06:29 +00:00
Clement Courbet
1931b181a5 [llvm-exegesis][NFC] Split BenchmarkRunner class
Summary:
The snippet-generation part goes to the SnippetGenerator class.

This will allow benchmarking arbitrary code (see PR38437).

Reviewers: gchatelet

Subscribers: mgorny, tschuett, llvm-commits

Differential Revision: https://reviews.llvm.org/D51979

llvm-svn: 342117
2018-09-13 07:40:53 +00:00
John Brawn
86bf3b3bb5 [llvm-exegesis] Delegate the decision of cycle counter name to the target
Currently the cycle counter is taken from the subtarget schedule model, which
isn't any use if the subtarget doesn't have one. Delegate the decision to the
target benchmark runner, as it may know better what to do in that case, with
the default being the current behaviour.

Differential Revision: https://reviews.llvm.org/D48779

llvm-svn: 336099
2018-07-02 13:14:49 +00:00
Clement Courbet
6b7cfabf55 [llvm-exegesis] Get the BenchmarkRunner from the ExegesisTarget.
Summary:
This allows targets to override code generation for some instructions.
As an example of override, this also moves ad-hoc instruction filtering
for X86 into the X86 ExegesisTarget.

Reviewers: gchatelet

Subscribers: mgorny, tschuett, llvm-commits

Differential Revision: https://reviews.llvm.org/D48587

llvm-svn: 335582
2018-06-26 08:49:30 +00:00
Clement Courbet
83a80d7efa Re-land r335038 "[llvm-exegesis] A mechanism to add target-specific functionality.""
Fix typo: LLVM_NATIVE_ARCH -> LLVM_EXEGESIS_NATIVE_ARCH.

llvm-svn: 335041
2018-06-19 11:28:59 +00:00
Clement Courbet
46623dcc80 Revert r335038 "[llvm-exegesis] A mechanism to add target-specific functionality."
Breaks buildbots.

llvm-svn: 335040
2018-06-19 10:54:12 +00:00
Clement Courbet
cc90979f36 [llvm-exegesis] A mechanism to add target-specific functionality.
Summary: This is a step towards implementing memory operands and X87.

Reviewers: gchatelet

Subscribers: mgorny, tschuett, llvm-commits

Differential Revision: https://reviews.llvm.org/D48210

llvm-svn: 335038
2018-06-19 10:39:50 +00:00
Clement Courbet
bdb322c5a1 [llvm-exegesis] Optionally ignore instructions without a sched class.
Summary: See PR37602.

Reviewers: RKSimon

Subscribers: llvm-commits, tschuett

Differential Revision: https://reviews.llvm.org/D48267

llvm-svn: 334932
2018-06-18 11:27:47 +00:00
Clement Courbet
356357e6bf [llvm-exegesis] Print the whole snippet in analysis.
Summary:
On hover, the whole asm snippet is displayed, including operands.

This requires the actual assembly output instead of just the MCInsts:
This is because some pseudo-instructions get lowered to actual target
instructions during codegen (e.g. ABS_Fp32 -> SSE or X87).

Reviewers: gchatelet

Subscribers: mgorny, tschuett, llvm-commits

Differential Revision: https://reviews.llvm.org/D48164

llvm-svn: 334805
2018-06-15 07:30:45 +00:00
Guillaume Chatelet
75085cc5d9 [llvm-exegesis] Fix unhandled error.
Summary: Fixing an unhandled error when calling writeYaml.

Reviewers: courbet

Subscribers: tschuett, llvm-commits

Differential Revision: https://reviews.llvm.org/D48022

llvm-svn: 334405
2018-06-11 14:10:10 +00:00
Guillaume Chatelet
9a6f0626cf [llvm-exegesis] Program should succeed if benchmark returns StringError.
Summary: Fix for https://bugs.llvm.org/show_bug.cgi?id=37759.

Reviewers: courbet

Subscribers: tschuett, llvm-commits

Differential Revision: https://reviews.llvm.org/D48004

llvm-svn: 334395
2018-06-11 09:18:01 +00:00
Zachary Turner
2f6a8ddfe8 [FileSystem] Split up the OpenFlags enumeration.
This breaks the OpenFlags enumeration into two separate
enumerations: OpenFlags and CreationDisposition.  The first
controls the behavior of the API depending on whether or not
the target file already exists, and is not a flags-based
enum.  The second controls more flags-like values.

This yields a more easy to understand API, while also allowing
flags to be passed to the openForRead api, where most of the
values didn't make sense before.  This also makes the apis more
testable as it becomes easy to enumerate all the configurations
which make sense, so I've added many new tests to exercise all
the different values.

llvm-svn: 334221
2018-06-07 19:58:58 +00:00
Guillaume Chatelet
6542e7eeea [llvm-exegesis] Make BenchmarkRunner handle multiple configurations.
Summary: BenchmarkRunner subclasses can now create many configurations - although this patch still generates one.

Reviewers: courbet

Subscribers: tschuett, llvm-commits

Differential Revision: https://reviews.llvm.org/D47877

llvm-svn: 334197
2018-06-07 14:00:29 +00:00
Guillaume Chatelet
faceafb547 [llvm-exegesis] Improve error reporting.
Summary: BenchmarkResult IO functions now return an Error or Expected so caller can deal take proper action.

Reviewers: courbet

Subscribers: tschuett, llvm-commits

Differential Revision: https://reviews.llvm.org/D47868

llvm-svn: 334167
2018-06-07 07:51:16 +00:00
Clement Courbet
d3825552be [llvm-exegesis] Add instructions to BenchmarkResult Key.
We want llvm-exegesis to explore instructions (effect of initial register values, effect of operand selection). To enable this a BenchmarkResult muststore all the relevant data in its key. This patch starts adding such data. Here we simply allow to store the generated instructions, following patches will add operands and initial values for registers.

https://reviews.llvm.org/D47764

Authored by: Guilluame Chatelet

llvm-svn: 334008
2018-06-05 10:56:19 +00:00
Clement Courbet
075ea51309 [llvm-exegesis] Improve documentation.
Summary:
- Better flag names.
- Fix flag reference in doc.
- Add usage examples in doc.

Fixes PR37497.

Reviewers: gchatelet

Subscribers: llvm-commits, tschuett

Differential Revision: https://reviews.llvm.org/D47015

llvm-svn: 332708
2018-05-18 12:33:57 +00:00
Clement Courbet
5b27e9775e [llvm-exegesis] Write out inconsistencies to a file.
Reviewers: gchatelet

Subscribers: tschuett, llvm-commits

Differential Revision: https://reviews.llvm.org/D47013

llvm-svn: 332608
2018-05-17 13:41:28 +00:00
Clement Courbet
52045d7838 [llvm-exegesis] Analysis: detect clustering inconsistencies.
Summary:
Warn on instructions that should have the same performance
characteristics according to the sched model but actually
differ in their benchmarks.

Next step: Make the display nicer to browse, I was thinking maybe html.

Reviewers: gchatelet

Subscribers: tschuett, llvm-commits

Differential Revision: https://reviews.llvm.org/D46945

llvm-svn: 332601
2018-05-17 12:25:18 +00:00
Clement Courbet
966ceb08e6 reland r332579: [llvm-exegesis] Update to cover latency through another opcode.
Restructuring the code to measure latency and uops.
The end goal is to have this program spawn another process to deal with SIGILL and other malformed programs. It is not yet the case in this redesign, it is still the main program that runs the code (and may crash).
It now uses BitVector instead of Graph for performance reasons.

https://reviews.llvm.org/D46821

(with fixed ARM tests)

Authored by Guillaume Chatelet

llvm-svn: 332592
2018-05-17 10:52:18 +00:00
Clement Courbet
c2ebebb08d Revert r332579 "[llvm-exegesis] Update to cover latency through another opcode."
The revision failed to update the ARM tests.

llvm-svn: 332580
2018-05-17 08:12:29 +00:00
Clement Courbet
07283fe934 [llvm-exegesis] Update to cover latency through another opcode.
Restructuring the code to measure latency and uops.
    The end goal is to have this program spawn another process to deal with SIGILL and other malformed programs. It is not yet the case in this redesign, it is still the main program that runs the code (and may crash).
    It now uses BitVector instead of Graph for performance reasons.

    https://reviews.llvm.org/D46821

    Authored by Guillaume Chatelet

llvm-svn: 332579
2018-05-17 07:38:21 +00:00
Clement Courbet
333ade66d9 [llvm-exegesis] Add a flag to output analysis csv to a file.
Reviewers: gchatelet

Subscribers: llvm-commits, tschuett

Differential Revision: https://reviews.llvm.org/D46931

llvm-svn: 332445
2018-05-16 09:50:04 +00:00
Clement Courbet
58ef761cf2 [llvm-exegesis] Analysis: Display sched class for instructions.
Reviewers: gchatelet

Subscribers: tschuett, llvm-commits

Differential Revision: https://reviews.llvm.org/D46883

llvm-svn: 332437
2018-05-16 08:47:21 +00:00
Clement Courbet
d1aa19fac3 Fix r332344: only the native target is linked.
llvm-svn: 332345
2018-05-15 12:27:36 +00:00
Clement Courbet
be6ab91eb0 [llvm-exegesis] Add an analysis mode.
Summary:
The analysis mode gives the user a clustered view of the measurement results.
Next steps are (requires the split ok AsmTemplate.Name into {mnemonic, mode}):
 - Show the sched class.
 - Highlight any inconsistencies with the checked-in data.

Reviewers: gchatelet

Subscribers: mgorny, llvm-commits, tschuett

Differential Revision: https://reviews.llvm.org/D46865

llvm-svn: 332344
2018-05-15 12:08:00 +00:00
Clement Courbet
487f0c09ce [llvm-exegesis] Revert accidentally commited code.
llvm-svn: 332231
2018-05-14 11:35:37 +00:00
Clement Courbet
7d328826c2 [llvm-exegesis] Add an analysis mode.
The analysis mode gives the user a clustered view of the measurement results and
highlights any inconsistencies with the checked-in data.

llvm-svn: 332229
2018-05-14 11:30:56 +00:00
Simon Pilgrim
851134e497 [llvm-exegesis] Early out if the scheduler models have no extra info.
We were calling getExtraProcessorInfo() without checking hasExtraProcessorInfo(), resulting in an assertion.

llvm-svn: 330263
2018-04-18 14:46:54 +00:00
Clement Courbet
c7b418ae52 Re-land r329156 "Add llvm-exegesis tool."
Fixed to depend on and initialize the native target instead of X86.

llvm-svn: 329169
2018-04-04 11:37:06 +00:00