1
0
mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-11-26 04:32:44 +01:00
Commit Graph

174856 Commits

Author SHA1 Message Date
Matt Arsenault
89112413f0 GlobalISel: Fix CheckMachineFunction passing if ReadCheckFile files
This could be tested, but the FileCheck library spams the error
message to the console.

llvm-svn: 353081
2019-02-04 19:53:22 +00:00
Matt Arsenault
2b7c211da1 GlobalISel: Allow constructing SrcOp/DstOp from MachineOperand
llvm-svn: 353080
2019-02-04 19:53:19 +00:00
Matt Arsenault
2c72ad2c8a GlobalISel: Fix parameter name in documentation
llvm-svn: 353078
2019-02-04 19:16:58 +00:00
Matt Arsenault
220c96833b GlobalISel: Fix CSE handling of buildConstant
This fixes two problems with CSE done in buildConstant. First, this
would hit an assert when used with a vector result type. Solve this by
allowing CSE on the vector elements, but not on the result vector for
now.

Second, this was also performing the CSE based on the input
ConstantInt pointer. The underlying buildConstant could potentially
convert the constant depending on the result type, giving in a
different ConstantInt*. Stop allowing the APInt and ConstantInt forms
from automatically casting to the result type to avoid any similar
problems in the future.

llvm-svn: 353077
2019-02-04 19:15:50 +00:00
Heejin Ahn
f857e44c40 [WebAssembly] clang-tidy (NFC)
Summary:
This patch fixes clang-tidy warnings on wasm-only files.
The list of checks used is:
`-*,clang-diagnostic-*,llvm-*,misc-*,-misc-unused-parameters,readability-identifier-naming,modernize-*`
(LLVM's default .clang-tidy list is the same except it does not have
`modernize-*`. But I've seen in multiple CLs in LLVM the modernize style
was recommended and code was fixed based on the style, so I added it as
well.)

The common fixes are:
- Variable names start with an uppercase letter
- Function names start with a lowercase letter
- Use `auto` when you use casts so the type is evident
- Use inline initialization for class member variables
- Use `= default` for empty constructors / destructors
- Use `using` in place of `typedef`

Reviewers: sbc100, tlively, aardappel

Subscribers: dschuff, sunfish, jgravelle-google, yurydelendik, kripken, MatzeB, mgorny, rupprecht, llvm-commits

Differential Revision: https://reviews.llvm.org/D57500

llvm-svn: 353075
2019-02-04 19:13:39 +00:00
Jordan Rupprecht
874713b07c [llvm-objcopy][NFC] simplify an error return
llvm-svn: 353074
2019-02-04 19:09:20 +00:00
Roman Lebedev
9f046e58b4 [X86] X86DAGToDAGISel::matchBitExtract(): prepare 'control' in 32 bits
Summary:
Noticed while looking at D56052.
```
  // The 'control' of BEXTR has the pattern of:
  // [15...8 bit][ 7...0 bit] location
  // [ bit count][     shift] name
  // I.e. 0b000000011'00000001 means  (x >> 0b1) & 0b11
```
I.e. we do not care about any of the bits aside from the low 16 bits.
So there is no point in doing the `slh`,`or` in 64 bits,
let's just do everything in 32 bits, and anyext if needed.

We could do that in 16 even, but we intentionally don't
zext to i16 (longer encoding IIRC),
so i'm guessing the same applies here.

Reviewers: craig.topper, andreadb, RKSimon

Reviewed By: craig.topper

Subscribers: llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D56715

llvm-svn: 353073
2019-02-04 19:04:26 +00:00
Matt Arsenault
2ae7a6c247 GlobalISel: Improve gtest usage
Don't unnecessarily use ASSERT_*, and print the MachineFunction
on failure.

llvm-svn: 353072
2019-02-04 18:58:27 +00:00
David Callahan
37990503cc Adjust cardinality of internal inliner thresholds
Summary:
While compiling openJDK11 (also other workloads), some make files would pass both  CFLAGS  and LDFLAGS at link step ; resulting in duplicate options on the command line when one is using LTO and trying to influence the inliner. Most of the internal flags are ZeroOrMore, this diff changes the remaining ones.

Reviewers: david2050, twoh, modocache

Reviewed By: twoh

Subscribers: mehdi_amini, dexonsmith, eraman, haicheng, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D57537

Patch by: Abdoul-Kader Keita

llvm-svn: 353071
2019-02-04 18:46:25 +00:00
Craig Topper
2613e553a6 [X86] Add ST0 as an implicit def/use of x87 load/store instructions during FP stackifying.
These instructions implicitly operate on ST0, but we don't currently add that information to the MachineInstr. We also don't add it the tablegen definitions either.

For the most part this doesn't cause any problems because the stackifying occurs after register allocation. All the instructions are marked as having side effects so the postRA scheduler won't reorder them amongst themselves.

But nothing stops inline assembly using X87 instructions from being reordered around other x87 instructions if that inline assembly wasn't marked volatile.

The two test cases I've identified so far in PR40539 involve loads and stores used to set up the inline assembly or capture the results of the inline assembly ending up in the wrong order.

This patch adds implicit ST0 uses/defs to the load/store instructions to prevent this from happening.

I plan to fix all of the FP instructions, but the binops are bit trickier to get right. So I've chosen fixing the known test cases as a good first step.

I think we also need to update the tablegen descriptions so MS inline assembly infers the right clobbers, but I haven't checked that yet.

Differential Revision: https://reviews.llvm.org/D57644

llvm-svn: 353070
2019-02-04 18:43:55 +00:00
Matt Arsenault
896c7cc6eb GlobalISel: Fix moreElementsToNextPow2
This was completely broken. The condition was inverted, and changed
the element type for vectors of pointers.

Fixes bug 40592.

llvm-svn: 353069
2019-02-04 18:42:24 +00:00
Jordan Rupprecht
255412e781 [llvm-objcopy][NFC] Use StringSaver for --keep-global-symbols
Summary: Use StringSaver/BumpPtrAlloc when parsing lines from --keep-global-symbols files. This allows us to consistently use StringRef for driver options, which avoids copying the full strings for each object copied, as well as simplifies part of D57517.

Reviewers: jhenderson, evgeny777, alexshap

Subscribers: jakehehrlich

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D57617

llvm-svn: 353068
2019-02-04 18:38:00 +00:00
Wouter van Oortmerssen
d52bc029cd [WebAssembly] Make segment/size/type directives optional in asm
Summary:
These were "boilerplate" that repeated information already present
in .functype and end_function, that needed to be repeated to Please
the particular way our object writing works, and missing them would
generate errors.

Instead, we generate the information for these automatically so the
user can concern itself with writing more canonical wasm functions
that always work as expected.

Reviewers: dschuff, sbc100

Subscribers: jgravelle-google, aheejin, sunfish, llvm-commits

Differential Revision: https://reviews.llvm.org/D57546

llvm-svn: 353067
2019-02-04 18:03:11 +00:00
Jessica Paquette
b78928bcbd Revert "[GlobalISel] Introduce a generic floating point floor opcode, G_FFLOOR"
This reverts commit b05ecba6d687fcb3078509220c67458bf1d77a2e.

Apparently adding floor breaks AMDGPU somehow, so I have to back this out
while I look into it.

llvm-svn: 353065
2019-02-04 17:32:47 +00:00
Jessica Paquette
e04ef6ecc0 Revert "[GlobalISel] Add IRTranslator support for G_FFLOOR"
This reverts commit 8bbd570fd5205a04d88d2e5513a6e4adbd028039.

Apparently adding ffloor breaks AMDGPU somehow, so I need to back this out
while I look into it.

llvm-svn: 353064
2019-02-04 17:32:43 +00:00
Nico Weber
96bcfea5e8 gn build: Merge r352944
llvm-svn: 353063
2019-02-04 17:32:36 +00:00
Sam Clegg
d8d06a1305 [WebAssembly] Rename relocations from R_WEBASSEMBLY_ to R_WASM_
See https://github.com/WebAssembly/tool-conventions/pull/95.

This is less typing and IMHO more readable, and it also fits with
our naming around the binary format which tends to use the short name.
e.g.

include/llvm/BinaryFormat/Wasm.h
tools/llvm-objdump/WasmDump.cpp
etc..

Differential Revision: https://reviews.llvm.org/D57611

llvm-svn: 353062
2019-02-04 17:28:46 +00:00
Craig Topper
e1dccb4d49 [X86] Print all register forms of x87 fadd/fsub/fdiv/fmul as having two arguments where on is %st.
All of these instructions consume one encoded register and the other register is %st. They either write the result to %st or the encoded register. Previously we printed both arguments when the encoded register was written. And we printed one argument when the result was written to %st. For the stack popping forms the encoded register is always the destination and we didn't print both operands. This was inconsistent with gcc and objdump and just makes the output assembly code harder to read.

This patch changes things to always print both operands making us consistent with gcc and objdump. The parser should still be able to handle the single register forms just as it did before. This also matches the GNU assembler behavior.

llvm-svn: 353061
2019-02-04 17:28:18 +00:00
Sam Clegg
0301813285 [WebAssembly] Remove redundant namespaces qualifiers. NFC.
Differential Revision: https://reviews.llvm.org/D57610

llvm-svn: 353060
2019-02-04 17:26:22 +00:00
Leonard Chan
7230ae9ced [Intrinsic] Unsigned Fixed Point Multiplication Intrinsic
Add an intrinsic that takes 2 unsigned integers with the scale of them
provided as the third argument and performs fixed point multiplication on
them.

This is a part of implementing fixed point arithmetic in clang where some of
the more complex operations will be implemented as intrinsics.

Differential Revision: https://reviews.llvm.org/D55625

llvm-svn: 353059
2019-02-04 17:18:11 +00:00
Jessica Paquette
cd735a0458 [GlobalISel] Add IRTranslator support for G_FFLOOR
Follow-up to https://reviews.llvm.org/D57484

Adds G_FFLOOR to translateKnownIntrinsic and update arm64-irtranslator.ll.

Differential Revision: https://reviews.llvm.org/D57485

llvm-svn: 353058
2019-02-04 17:15:34 +00:00
Jessica Paquette
b34b9ca867 [GlobalISel] Introduce a generic floating point floor opcode, G_FFLOOR
This introduces a generic opcode for floating point floor, working towards
selecting @llvm.floor.

Differential Revision: https://reviews.llvm.org/D57484

llvm-svn: 353057
2019-02-04 17:10:55 +00:00
Sanjay Patel
cdd2afe58c [CGP] use IRBuilder to simplify code
This is no-functional-change-intended although there could
be intermediate variations caused by a difference in the
debug info produced by setting that from the builder's 
insertion point. 

I'm updating the IR test file associated with this code just
to show that the naming differences from using the builder
are visible.

The motivation for adding a helper function is that we are
likely to extend this code to deal with other overflow ops.

llvm-svn: 353056
2019-02-04 16:30:46 +00:00
James Henderson
5632da10d6 [CommandLine] Don't print empty sentinel values from EnumValN lists in help text
In order to make an option value truly optional, both the ValueOptional
attribute and an empty-named value are required. Prior to this change,
this empty-named value appears in the command-line help text:

-some-option - some help text
  =v1        - description 1
  =v2        - description 2
  =          -

This change improves the help text for these sort of options in a number
of ways:

1) ValueOptional options with an empty-named value now print their help
   text twice: both without and then with '=<value>' after the name. The
   latter version then lists the allowed values after it.
2) Empty-named values with no help text in ValueOptional options are not
   listed in the permitted values.

-some-option         - some help text
-some-option=<value> - some help text
  =v1                - description 1
  =v2                - description 2

3) Otherwise empty-named options are printed as =<empty> rather than
   simply '='.
4) Option values without help text do not have the '-' separator
   printed.

-some-option=<value> - some help text
  =v1                - description 1
  =v2
  =<empty>           - description

It also tweaks the llvm-symbolizer -functions help text to not print a
trailing ':' as that looks bad combined with 1) above.

This is mostly a reland of r353048 which in turn was a reland of
r352750.

Reviewed by: ruiu, thopre, mstorsjo

Differential Revision: https://reviews.llvm.org/D57030

llvm-svn: 353053
2019-02-04 16:17:57 +00:00
Simon Pilgrim
08d6482cba [X86][SSE] SimplifyDemandedBitsForTargetNode - PCMPGT(0,X) sign mask
For PCMPGT(0, X) patterns where we only demand the sign bit (e.g. BLENDV or MOVMSK) then we can use X directly.

Differential Revision: https://reviews.llvm.org/D57667

llvm-svn: 353051
2019-02-04 15:43:36 +00:00
James Henderson
50f0930f9b Revert r353048.
It was causing unexpected unit test failures on build bots.

llvm-svn: 353050
2019-02-04 15:09:58 +00:00
James Henderson
21e157896b [CommandLine] Don't print empty sentinel values from EnumValN lists in help text
In order to make an option value truly optional, both the ValueOptional
attribute and an empty-named value are required. Prior to this change,
this empty-named value appears in the command-line help text:

-some-option - some help text
  =v1        - description 1
  =v2        - description 2
  =          -

This change improves the help text for these sort of options in a number
of ways:

1) ValueOptional options with an empty-named value now print their help
   text twice: both without and then with '=<value>' after the name. The
   latter version then lists the allowed values after it.
2) Empty-named values with no help text in ValueOptional options are not
   listed in the permitted values.

-some-option         - some help text
-some-option=<value> - some help text
  =v1                - description 1
  =v2                - description 2

3) Otherwise empty-named options are printed as =<empty> rather than
   simply '='.
4) Option values without help text do not have the '-' separator
   printed.

-some-option=<value> - some help text
  =v1                - description 1
  =v2
  =<empty>           - description

It also tweaks the llvm-symbolizer -functions help text to not print a
trailing ':' as that looks bad combined with 1) above.

This is mostly a reland of r352750.

Reviewed by: ruiu, thopre, mstorsjo

Differential Revision: https://reviews.llvm.org/D57030

llvm-svn: 353048
2019-02-04 14:48:33 +00:00
Matt Arsenault
ae7721f12b GlobalISel: Fix formatting of debug output
There was a missing space before the instruction name, and the newline
is redundant since MI::print by default adds one.

llvm-svn: 353046
2019-02-04 14:05:33 +00:00
Matt Arsenault
f2659e02d9 AMDGPU/GlobalISel: Legalize select for v4s16
Also add some more select tests to help show future legalization
changes.

llvm-svn: 353045
2019-02-04 14:04:52 +00:00
Simon Pilgrim
d28b271aff [DAGCombine] Add ADD(SUB,SUB) combines
Noticed while investigating PR40483, and fixes the basic test case from the bug - but not a more general case.

We're pretty weak at dealing with ADD/SUB combines compared to the SimplifyAssociativeOrCommutative/SimplifyUsingDistributiveLaws abilities that InstCombine can manage.

llvm-svn: 353044
2019-02-04 13:44:49 +00:00
Andrea Di Biagio
2820258583 [AsmPrinter] Remove hidden flag -print-schedule.
This patch removes hidden codegen flag -print-schedule effectively reverting the
logic originally committed as r300311
(https://llvm.org/viewvc/llvm-project?view=revision&revision=300311).

Flag -print-schedule was originally introduced by r300311 to address PR32216
(https://bugs.llvm.org/show_bug.cgi?id=32216). That bug was about adding "Better
testing of schedule model instruction latencies/throughputs".

These days, we can use llvm-mca to test scheduling models. So there is no longer
a need for flag -print-schedule in LLVM. The main use case for PR32216 is
now addressed by llvm-mca.
Flag -print-schedule is mainly used for debugging purposes, and it is only
actually used by x86 specific tests. We already have extensive (latency and
throughput) tests under "test/tools/llvm-mca" for X86 processor models. That
means, most (if not all) existing -print-schedule tests for X86 are redundant.

When flag -print-schedule was first added to LLVM, several files had to be
modified; a few APIs gained new arguments (see for example method
MCAsmStreamer::EmitInstruction), and MCSubtargetInfo/TargetSubtargetInfo gained
a couple of getSchedInfoStr() methods.

Method getSchedInfoStr() had to originally work for both MCInst and
MachineInstr. The original implmentation of getSchedInfoStr() introduced a
subtle layering violation (reported as PR37160 and then fixed/worked-around by
r330615).
In retrospect, that new API could have been designed more optimally. We can
always query MCSchedModel to get the latency and throughput. More importantly,
the "sched-info" string should not have been generated by the subtarget.
Note, r317782 fixed an issue where "print-schedule" didn't work very well in the
presence of inline assembly. That commit is also reverted by this change.

Differential Revision: https://reviews.llvm.org/D57244

llvm-svn: 353043
2019-02-04 12:51:26 +00:00
Simon Pilgrim
00ba9eabc9 [X86] Add a couple of missed ADD combine tests
Noticed while investigating PR40483

llvm-svn: 353042
2019-02-04 12:37:38 +00:00
Simon Pilgrim
793452a500 Use auto for dyn_cast case to save a line. NFCI.
llvm-svn: 353041
2019-02-04 12:32:39 +00:00
David Green
43a3f10458 [ARM] Mark 255 and 65535 as cheap for Thumb1 "And"
This prevents Constant Hoisting from pulling the constant out of the block,
allowing us to still produce LDRH/UXTH nodes. LDRB/UXTB (255) is already cheap
by the default getIntImmCost, but I've added it for clarity.

Differential Revision: https://reviews.llvm.org/D57671

llvm-svn: 353040
2019-02-04 11:58:48 +00:00
David Green
5b97d2f0ab [ARM] Add testcases for D57671. NFC
llvm-svn: 353039
2019-02-04 11:50:14 +00:00
Max Kazantsev
c65950c195 [NFC] Make a check in GuardWidening more obvious
llvm-svn: 353038
2019-02-04 10:41:17 +00:00
Dmitry Venikov
eb58fb41c4 Commit tests for changes in revision D41608
llvm-svn: 353037
2019-02-04 10:32:07 +00:00
Max Kazantsev
b5c1304d21 [NFC] Rename variables to reflect the actual status of GuardWidening
llvm-svn: 353036
2019-02-04 10:31:18 +00:00
Clement Courbet
3ca9df15e6 [llvm-objcopy][NFC] Fix trailing semicolon warning.
llvm-svn: 353035
2019-02-04 10:24:42 +00:00
Max Kazantsev
af16abecf0 [NFC] Remove redundant parameters for better readability
llvm-svn: 353034
2019-02-04 10:20:51 +00:00
Max Kazantsev
10bad41844 [NFC] Replace equivalent condition for better readability
llvm-svn: 353032
2019-02-04 09:55:18 +00:00
Clement Courbet
0c025989e1 [SelectionDAG] Add a BaseIndexOffset::print() method for debugging.
llvm-svn: 353028
2019-02-04 09:30:43 +00:00
Roman Lebedev
42a5fa8d02 [llvm-exegesis] Cut run time of analysis mode by another -35% (*sic*) (YamlContext::getRegNo())
Summary:

Together with the previous patch, it's an -90% improvement,
or roughly -96% improvement if you look starting with rL347204

```
$ perf stat -r 9 ./bin/llvm-exegesis -mode=analysis -analysis-epsilon=1.0 -benchmarks-file=/tmp/benchmarks-inverse_throughput-onefull.yaml -analysis-clusters-output-file="" -analysis-inconsistencies-output-file=/tmp/clusters-bew.html
no exegesis target for x86_64-unknown-linux-gnu, using default
Parsed 14656 benchmark points
Printing sched class consistency analysis results to file '/tmp/clusters-bew.html'
...
no exegesis target for x86_64-unknown-linux-gnu, using default
Parsed 14656 benchmark points
Printing sched class consistency analysis results to file '/tmp/clusters-bew.html'

 Performance counter stats for './bin/llvm-exegesis -mode=analysis -analysis-epsilon=1.0 -benchmarks-file=/tmp/benchmarks-inverse_throughput-onefull.yaml -analysis-clusters-output-file= -analysis-inconsistencies-output-file=/tmp/clusters-bew.html' (9 runs):

           1483.18 msec task-clock                #    0.999 CPUs utilized            ( +-  0.10% )
                68      context-switches          #   46.085 M/sec                    ( +- 22.62% )
                 0      cpu-migrations            #    0.000 K/sec
             11641      page-faults               # 7850.880 M/sec                    ( +-  0.62% )
        5943246799      cycles                    # 4008184.428 GHz                   ( +-  0.10% )  (83.28%)
         442869514      stalled-cycles-frontend   #    7.45% frontend cycles idle     ( +-  0.41% )  (83.29%)
        1443375663      stalled-cycles-backend    #   24.29% backend cycles idle      ( +-  0.47% )  (33.43%)
        7714006752      instructions              #    1.30  insn per cycle
                                                  #    0.19  stalled cycles per insn  ( +-  0.07% )  (50.17%)
        1977242936      branches                  # 1333472193.855 M/sec              ( +-  0.07% )  (66.79%)
          32624220      branch-misses             #    1.65% of all branches          ( +-  0.18% )  (83.34%)

           1.48438 +- 0.00143 seconds time elapsed  ( +-  0.10% )
```
```
$ perf stat -r 9 ./bin/llvm-exegesis -mode=analysis -analysis-epsilon=1.0 -benchmarks-file=/tmp/benchmarks-inverse_throughput-onefull.yaml -analysis-clusters-output-file="" -analysis-inconsistencies-output-file=/tmp/clusters-newer.html
no exegesis target for x86_64-unknown-linux-gnu, using default
Parsed 14656 benchmark points
Printing sched class consistency analysis results to file '/tmp/clusters-newer.html'
...
no exegesis target for x86_64-unknown-linux-gnu, using default
Parsed 14656 benchmark points
Printing sched class consistency analysis results to file '/tmp/clusters-newer.html'

 Performance counter stats for './bin/llvm-exegesis -mode=analysis -analysis-epsilon=1.0 -benchmarks-file=/tmp/benchmarks-inverse_throughput-onefull.yaml -analysis-clusters-output-file= -analysis-inconsistencies-output-file=/tmp/clusters-newer.html' (9 runs):

            963.28 msec task-clock                #    0.999 CPUs utilized            ( +-  0.37% )
                12      context-switches          #   12.695 M/sec                    ( +- 52.79% )
                 0      cpu-migrations            #    0.000 K/sec
             11599      page-faults               # 12046.971 M/sec                   ( +-  0.59% )
        3860122322      cycles                    # 4009359.596 GHz                   ( +-  0.37% )  (83.19%)
         380300669      stalled-cycles-frontend   #    9.85% frontend cycles idle     ( +-  0.34% )  (83.30%)
        1071910340      stalled-cycles-backend    #   27.77% backend cycles idle      ( +-  1.30% )  (33.51%)
        4773418224      instructions              #    1.24  insn per cycle
                                                  #    0.22  stalled cycles per insn  ( +-  0.15% )  (50.17%)
        1106990316      branches                  # 1149787979.919 M/sec              ( +-  0.11% )  (66.80%)
          23632231      branch-misses             #    2.13% of all branches          ( +-  0.18% )  (83.33%)

           0.96389 +- 0.00356 seconds time elapsed  ( +-  0.37% )
```
```
$ sha512sum /tmp/clusters-*
db4bbd904fe8840853b589b032c5041bc060b91bcd9c27b914b56581fbc473550eea74b852238c79963b5adf2419f379e9f5db76784048b48e3937f9f3e732bf  /tmp/clusters-bew.html
db4bbd904fe8840853b589b032c5041bc060b91bcd9c27b914b56581fbc473550eea74b852238c79963b5adf2419f379e9f5db76784048b48e3937f9f3e732bf  /tmp/clusters-newer.html
db4bbd904fe8840853b589b032c5041bc060b91bcd9c27b914b56581fbc473550eea74b852238c79963b5adf2419f379e9f5db76784048b48e3937f9f3e732bf  /tmp/clusters-old.html
```

Reviewers: courbet, gchatelet

Reviewed By: courbet

Subscribers: tschuett, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D57658

llvm-svn: 353025
2019-02-04 09:12:25 +00:00
Roman Lebedev
65f4e7936b [llvm-exegesis] Cut run time of analysis mode by -84% (*sic*) (YamlContext::getInstrOpcode())
Summary:
```
$ perf stat -r 9 ./bin/llvm-exegesis -mode=analysis -analysis-epsilon=1.0 -benchmarks-file=/tmp/benchmarks-inverse_throughput-onefull.yaml -analysis-clusters-output-file="" -analysis-inconsistencies-output-file=/tmp/clusters-old.html
no exegesis target for x86_64-unknown-linux-gnu, using default
Parsed 14656 benchmark points
Printing sched class consistency analysis results to file '/tmp/clusters-old.html'
...
no exegesis target for x86_64-unknown-linux-gnu, using default
Parsed 14656 benchmark points
Printing sched class consistency analysis results to file '/tmp/clusters-old.html'

 Performance counter stats for './bin/llvm-exegesis -mode=analysis -analysis-epsilon=1.0 -benchmarks-file=/tmp/benchmarks-inverse_throughput-onefull.yaml -analysis-clusters-output-file= -analysis-inconsistencies-output-file=/tmp/clusters-old.html' (9 runs):

           9465.46 msec task-clock                #    1.000 CPUs utilized            ( +-  0.05% )
                60      context-switches          #    6.363 M/sec                    ( +- 79.45% )
                 0      cpu-migrations            #    0.000 K/sec
             11364      page-faults               # 1200.697 M/sec                    ( +-  0.60% )
       37935623543      cycles                    # 4008083.912 GHz                   ( +-  0.05% )  (83.32%)
        2371625356      stalled-cycles-frontend   #    6.25% frontend cycles idle     ( +-  0.37% )  (83.32%)
        8476077875      stalled-cycles-backend    #   22.34% backend cycles idle      ( +-  0.18% )  (33.36%)
       41822439158      instructions              #    1.10  insn per cycle
                                                  #    0.20  stalled cycles per insn  ( +-  0.02% )  (50.03%)
       11607658944      branches                  # 1226405861.486 M/sec              ( +-  0.01% )  (66.69%)
         210864633      branch-misses             #    1.82% of all branches          ( +-  0.06% )  (83.34%)

           9.46636 +- 0.00441 seconds time elapsed  ( +-  0.05% )
```
```
$ perf stat -r 9 ./bin/llvm-exegesis -mode=analysis -analysis-epsilon=1.0 -benchmarks-file=/tmp/benchmarks-inverse_throughput-onefull.yaml -analysis-clusters-output-file="" -analysis-inconsistencies-output-file=/tmp/clusters-bew.html
no exegesis target for x86_64-unknown-linux-gnu, using default
Parsed 14656 benchmark points
Printing sched class consistency analysis results to file '/tmp/clusters-bew.html'
...
no exegesis target for x86_64-unknown-linux-gnu, using default
Parsed 14656 benchmark points
Printing sched class consistency analysis results to file '/tmp/clusters-bew.html'

 Performance counter stats for './bin/llvm-exegesis -mode=analysis -analysis-epsilon=1.0 -benchmarks-file=/tmp/benchmarks-inverse_throughput-onefull.yaml -analysis-clusters-output-file= -analysis-inconsistencies-output-file=/tmp/clusters-bew.html' (9 runs):

           1480.66 msec task-clock                #    1.000 CPUs utilized            ( +-  0.19% )
                13      context-switches          #    8.483 M/sec                    ( +- 83.10% )
                 0      cpu-migrations            #    0.075 M/sec                    ( +-100.00% )
             11596      page-faults               # 7834.247 M/sec                    ( +-  0.59% )
        5933732194      cycles                    # 4008977.535 GHz                   ( +-  0.19% )  (83.22%)
         438111928      stalled-cycles-frontend   #    7.38% frontend cycles idle     ( +-  0.37% )  (83.25%)
        1454969705      stalled-cycles-backend    #   24.52% backend cycles idle      ( +-  0.94% )  (33.53%)
        7724218604      instructions              #    1.30  insn per cycle
                                                  #    0.19  stalled cycles per insn  ( +-  0.07% )  (50.14%)
        1979796413      branches                  # 1337599858.945 M/sec              ( +-  0.06% )  (66.74%)
          32641638      branch-misses             #    1.65% of all branches          ( +-  0.18% )  (83.31%)

           1.48128 +- 0.00284 seconds time elapsed  ( +-  0.19% )

$ sha512sum /tmp/clusters-*
db4bbd904fe8840853b589b032c5041bc060b91bcd9c27b914b56581fbc473550eea74b852238c79963b5adf2419f379e9f5db76784048b48e3937f9f3e732bf  /tmp/clusters-bew.html
db4bbd904fe8840853b589b032c5041bc060b91bcd9c27b914b56581fbc473550eea74b852238c79963b5adf2419f379e9f5db76784048b48e3937f9f3e732bf  /tmp/clusters-old.html
```

Reviewers: courbet, gchatelet

Reviewed By: courbet

Subscribers: tschuett, llvm-commits, RKSimon

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D57657

llvm-svn: 353024
2019-02-04 09:12:21 +00:00
Roman Lebedev
ca5caf3fc8 [llvm-exegesis] Throughput support in analysis mode
Summary:
D57000 / [[ https://bugs.llvm.org/show_bug.cgi?id=37698 | PR37698 ]] added support for measuring of the inverse throughput.
But the support for the analysis was not added.
This attempts to fix that. (analysis done o bdver2 / piledriver)

First, small-scale experiment:
```
$ ./bin/llvm-exegesis -num-repetitions=10000 -mode=inverse_throughput -opcode-name=BSF64rr
Check generated assembly with: /usr/bin/objdump -d /tmp/snippet-d0acdd.o
---
mode:            inverse_throughput
key:
  instructions:
    - 'BSF64rr RAX RDX'
  config:          ''
  register_initial_values:
    - 'RDX=0x0'
cpu_name:        bdver2
llvm_triple:     x86_64-unknown-linux-gnu
num_repetitions: 10000
measurements:
  - { key: inverse_throughput, value: 3.0278, per_snippet_value: 3.0278 }
error:           ''
info:            instruction has no tied variables picking Uses different from defs
assembled_snippet: 48BA0000000000000000480FBCC2480FBCC2480FBCC2480FBCC2480FBCC2480FBCC2480FBCC2480FBCC2480FBCC2480FBCC2480FBCC2480FBCC2480FBCC2480FBCC2480FBCC2480FBCC2C3
...
```
If we plug `bsfq	%r12, %r10` into llvm-mca:
https://godbolt.org/z/ZtOyhJ
```
Dispatch Width:    4
uOps Per Cycle:    3.00
IPC:               0.50
Block RThroughput: 2.0
```
So RThroughput mismatch exists.

Now, let's upscale and analyse:
{F8207148}
`$ ./bin/llvm-exegesis -mode=analysis -analysis-epsilon=1.0 -benchmarks-file=/tmp/benchmarks-inverse_throughput.yaml -analysis-inconsistencies-output-file=/tmp/clusters.html`:
{F8207172}
{F8207197}
And if we now look at https://www.agner.org/optimize/instruction_tables.pdf,
`Reciprocal throughput` for `BSF r,r` is listed as `3`.
Yay?

Reviewers: courbet, gchatelet

Reviewed By: courbet

Subscribers: tschuett, RKSimon, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D57647

llvm-svn: 353023
2019-02-04 09:12:17 +00:00
Roman Lebedev
7979d76356 [llvm-exegesis] deserializeMCInst(): bump SmallVector small size up to 16
Summary:
... from 8.
`VALIGNDZ128rmbik XMM0 XMM0 K1 XMM3 RDI i_0x1  i_0x0  i_0x1` instruction already has 9 components.
It does not matter much in terms of performance, but avoiding allocation seems to come with low cost here..

Reviewers: courbet, gchatelet

Reviewed By: courbet

Subscribers: tschuett, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D57654

llvm-svn: 353022
2019-02-04 09:12:13 +00:00
Roman Lebedev
1cc6540ed5 [llvm-exegesis] Don't default to running&dumping all analyses to '-'
Summary:
Up until the point i have looked in the source, i didn't even understood that
i can disable 'cluster' output. I have always silenced it via ` &> /dev/null`.
(And hoped it wasn't contributing much of the run time.)

While i expect that it has it's use-cases i never once needed it so far.
If i forget to silence it, console is completely flooded with that output.

How about not expecting users to opt-out of analyses,
but to explicitly specify the analyses that should be performed?

Reviewers: courbet, gchatelet

Reviewed By: courbet

Subscribers: tschuett, RKSimon, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D57648

llvm-svn: 353021
2019-02-04 09:12:08 +00:00
Max Kazantsev
ae64db0c14 [SCEV] Do not bother creating separate SCEVUnknown for unreachable nodes
Currently, SCEV creates SCEVUnknown for every node of unreachable code. If we
have a huge amounts of such code, we will be littering SE with these nodes. We could
just state that they all are undef and save some memory.

Differential Revision: https://reviews.llvm.org/D57567
Reviewed By: sanjoy

llvm-svn: 353017
2019-02-04 05:04:19 +00:00
Craig Topper
32f3ee2f37 Recommit r352660 "[X86] Mark EMMS and FEMMS as clobbering MM0-7 and ST0-7."
We now print ST0 as 'st' when generating the clobber list for MS inline assembly in clang. This matches what the gcc reg name list expects.

Original commit message:

This fixes the test case in PR35982 by preventing MMX instructions that read MM0-7 from being moved below EMMS/FEMMS by the post RA scheduler.

Though as discussed in bugzilla, this is not a complete fix. There is still the possibility of reordering in IR or by the pre-RA scheduler.

Differential Revision: https://reviews.llvm.org/D57298

llvm-svn: 353016
2019-02-04 04:44:20 +00:00
Craig Topper
c4b858bb71 [X86] Print %st(0) as %st when its implicit to the instruction. Continue printing it as %st(0) when its encoded in the instruction.
This is a step back from the change I made in r352985. This appears to be more consistent with gcc and objdump behavior.

llvm-svn: 353015
2019-02-04 04:15:10 +00:00