Stanislav Mekhanoshin
af64ca04f5
[AMDGPU] Add support for architected flat scratch
...
Add support for the read-only flat scratch register initialized
by the SPI.
Differential Revision: https://reviews.llvm.org/D102432
2021-05-14 10:53:48 -07:00
Nico Weber
8a2b8ef8d1
[gn build] (manually) merge b7d1ab75cf47
...
No check-hwasan-lam target yet, though.
2021-05-14 13:51:10 -04:00
Tomasz Miąsko
19ed746cf3
[Demangle][Rust] Parse integer constants
...
Reviewed By: dblaikie
Differential Revision: https://reviews.llvm.org/D102179
2021-05-14 19:47:19 +02:00
Philip Reames
72f2c7d2ee
Do actual DCE in LoopUnroll (try 2)
...
Recommitting after addressing a missed review comment, and updating an aarch64 test I'd missed.
LoopUnroll does a limited DCE pass after unrolling, but if you have a chain of dead instructions, it only deletes the last one. Improve the code to recursively delete all trivially dead instructions.
Differential Revision: https://reviews.llvm.org/D102511
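
A minimal sketch of the idea described above, using a toy instruction model rather than LLVM's real IR classes. `ToyInst`, `isTriviallyDead`, and `deleteDeadRecursively` are hypothetical names made up for illustration; the actual patch is in the linked review. The point is that erasing a dead instruction can make its operands dead in turn, so a worklist is needed to remove the whole chain instead of only its last link.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy stand-in for an IR instruction (hypothetical, not an LLVM type).
struct ToyInst {
  std::vector<std::size_t> operands; // indices of the instructions it uses
  bool hasSideEffects = false;       // e.g. a store or a call
  bool erased = false;
  unsigned numUses = 0;              // recomputed by deleteDeadRecursively
};

// An instruction is trivially dead if nothing uses it and it has no
// side effects.
static bool isTriviallyDead(const ToyInst &I) {
  return !I.erased && !I.hasSideEffects && I.numUses == 0;
}

// Erase every trivially dead instruction, including ones that only become
// dead after their users are erased. Returns the number erased.
std::size_t deleteDeadRecursively(std::vector<ToyInst> &insts) {
  // Recompute use counts from the operand lists.
  for (ToyInst &I : insts)
    I.numUses = 0;
  for (const ToyInst &I : insts) {
    if (I.erased)
      continue;
    for (std::size_t op : I.operands) {
      assert(op < insts.size() && "operand index out of range");
      ++insts[op].numUses;
    }
  }

  // Seed the worklist with everything that is dead right now.
  std::vector<std::size_t> worklist;
  for (std::size_t i = 0; i < insts.size(); ++i)
    if (isTriviallyDead(insts[i]))
      worklist.push_back(i);

  std::size_t erasedCount = 0;
  while (!worklist.empty()) {
    std::size_t idx = worklist.back();
    worklist.pop_back();
    ToyInst &I = insts[idx];
    if (!isTriviallyDead(I))
      continue;
    I.erased = true;
    ++erasedCount;
    // Dropping this instruction's uses may make its operands dead too.
    for (std::size_t op : I.operands)
      if (--insts[op].numUses == 0 && isTriviallyDead(insts[op]))
        worklist.push_back(op);
  }
  return erasedCount;
}
```

For a chain `a <- b <- c` with no user of `c`, a single non-recursive sweep would remove only `c`; the worklist version removes all three and leaves instructions reachable from a side-effecting user untouched.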
2021-05-14 10:42:36 -07:00
Benjamin Kramer
e9a9f45f1e
Document updated googletest + modifications
2021-05-14 19:26:12 +02:00
Matt Arsenault
15058e16a1
AMDGPU: Fix assert when rewriting saddr d16 loads
...
moveOperands does not handle moving tied operands since it would
generally have to fix up the tied operand references. Avoid the assert
by untying and retying after the modification. These in-place
modifications really aren't manageable.
2021-05-14 13:24:19 -04:00
Roman Lebedev
7a6506cfad
[NFC][X86][MCA] Add sudo-zero-idiom vperm2f128/vperm2i128 tests - don't break deps
...
While the btver2 model states that this pattern is a zero-cycle zero-idiom
on Jaguar, that does not appear to be the case on Znver3:
there it measures as not being recognized as a dep-breaking zero-idiom,
let alone a zero-cycle one.
2021-05-14 20:23:05 +03:00
Roman Lebedev
a8537d3144
[X86] AMD Zen 3: same-reg AVX YMM VPCMPGT{B,W,D,Q} is a zero-cycle(!) dep-breaking zero-idiom
...
As measured by exegesis, and confirmed by ref docs.
2021-05-14 20:23:05 +03:00
Roman Lebedev
d28d04de75
[X86] AMD Zen 3: same-reg AVX XMM VPCMPGT{B,W,D,Q} is a zero-cycle(!) dep-breaking zero-idiom
...
As measured by exegesis, and confirmed by ref docs.
2021-05-14 20:23:04 +03:00
Roman Lebedev
2ff7114732
[X86] AMD Zen 3: same-reg SSE XMM PCMPGT{B,W,D,Q} is a 1-cycle(!) dep-breaking zero-idiom
...
As measured by exegesis, and confirmed by ref docs.
2021-05-14 20:23:04 +03:00
Roman Lebedev
ff2ff878b4
[NFC][X86][MCA] AMD Zen 3: add same-reg AVX YMM VPCMPGT{B,W,D,Q} tests
2021-05-14 20:23:04 +03:00
Roman Lebedev
317197c4a8
[NFC][X86][MCA] AMD Zen 3: add same-reg AVX XMM VPCMPGT{B,W,D,Q} tests
2021-05-14 20:23:04 +03:00
Roman Lebedev
069fb685b6
[NFC][X86][MCA] AMD Zen 3: add same-reg SSE XMM PCMPGT{B,W,D,Q} tests
2021-05-14 20:23:03 +03:00
Roman Lebedev
6e90e82fb2
[X86] AMD Zen 3: same-reg AVX YMM VPSUBUS{B,W} is a 1-cycle(!) dep-breaking zero-idiom
...
Not really mentioned in ref docs, but measures as such.
Yes, this one is also not zero-cycle.
2021-05-14 20:23:03 +03:00
Roman Lebedev
e641283c37
[X86] AMD Zen 3: same-reg AVX XMM VPSUBUS{B,W} is a 1-cycle(!) dep-breaking zero-idiom
...
Not really mentioned in ref docs, but measures as such.
Yes, this one is also not zero-cycle.
2021-05-14 20:23:03 +03:00
Roman Lebedev
7224d53fef
[X86] AMD Zen 3: same-reg SSE XMM PSUBUS{B,W} is a 1-cycle(!) dep-breaking zero-idiom
...
Not really mentioned in ref docs, but measures as such.
2021-05-14 20:23:03 +03:00
Roman Lebedev
103eef39fa
[NFC][X86][MCA] AMD Zen 3: add same-reg AVX YMM VPSUBUS{B,W} tests
2021-05-14 20:23:03 +03:00
Roman Lebedev
747b0319b1
[NFC][X86][MCA] AMD Zen 3: add same-reg AVX XMM VPSUBUS{B,W} tests
2021-05-14 20:23:02 +03:00
Roman Lebedev
d2246847e9
[NFC][X86][MCA] AMD Zen 3: add same-reg SSE XMM PSUBUS{B,W} tests
2021-05-14 20:23:02 +03:00
Roman Lebedev
19263a16d8
[X86] AMD Zen 3: same-reg AVX YMM VPSUBS{B,W} is a 1-cycle(!) dep-breaking zero-idiom
...
Not really mentioned in ref docs, but measures as such.
Yes, this one is also not zero-cycle.
2021-05-14 20:23:02 +03:00
Roman Lebedev
139c28fa45
[X86] AMD Zen 3: same-reg AVX XMM VPSUBS{B,W} is a 1-cycle(!) dep-breaking zero-idiom
...
Not really mentioned in ref docs, but measures as such.
Yes, this one is also not zero-cycle.
2021-05-14 20:23:02 +03:00
Roman Lebedev
fe32a28378
[X86] AMD Zen 3: same-reg SSE XMM PSUBS{B,W} is a 1-cycle(!) dep-breaking zero-idiom
...
Not really mentioned in ref docs, but measures as such.
2021-05-14 20:23:02 +03:00
Roman Lebedev
9a5a00fd92
[NFC][X86][MCA] AMD Zen 3: add same-reg AVX YMM VPSUBS{B,W} tests
2021-05-14 20:23:01 +03:00
Roman Lebedev
c27c048587
[NFC][X86][MCA] AMD Zen 3: add same-reg AVX XMM VPSUBS{B,W} tests
2021-05-14 20:23:01 +03:00
Roman Lebedev
10c74938fb
[NFC][X86][MCA] AMD Zen 3: add same-reg SSE XMM PSUBS{B,W} tests
2021-05-14 20:23:01 +03:00
Roman Lebedev
0d11d5063f
[X86] AMD Zen 3: same-reg AVX YMM VPSUB{B,W,D,Q} is a zero-cycle(!) dep-breaking zero-idiom
...
As confirmed by the exegesis measurements, and ref docs.
2021-05-14 20:23:01 +03:00
Roman Lebedev
4181fdd9ec
[X86] AMD Zen 3: same-reg AVX XMM VPSUB{B,W,D,Q} is a zero-cycle(!) dep-breaking zero-idiom
...
As confirmed by the exegesis measurements, and ref docs.
2021-05-14 20:23:01 +03:00
Roman Lebedev
5b956e1952
[X86] AMD Zen 3: same-reg SSE XMM PSUB{B,W,D,Q} is a 1-cycle(!) dep-breaking zero-idiom
...
As confirmed by the exegesis measurements, and ref docs.
2021-05-14 20:23:00 +03:00
Roman Lebedev
554780708a
[NFC][X86][MCA] AMD Zen 3: add same-reg AVX YMM VPSUB{B,W,D,Q} tests
2021-05-14 20:23:00 +03:00
Roman Lebedev
7a8c45f4d0
[NFC][X86][MCA] AMD Zen 3: add same-reg AVX XMM VPSUB{B,W,D,Q} tests
2021-05-14 20:23:00 +03:00
Roman Lebedev
530148d646
[NFC][X86][MCA] AMD Zen 3: add same-reg SSE XMM PSUB{B,W,D,Q} tests
2021-05-14 20:23:00 +03:00
Roman Lebedev
522d03976e
[X86] AMD Zen 3: same-reg AVX YMM VPANDN is a zero-cycle(!) dep-breaking zero-idiom
...
As confirmed by exegesis measurements, and ref docs.
2021-05-14 20:23:00 +03:00
Roman Lebedev
d334fd8763
[X86] AMD Zen 3: same-reg AVX XMM VPANDN is a zero-cycle(!) dep-breaking zero-idiom
...
As confirmed by exegesis measurements, and ref docs.
2021-05-14 20:23:00 +03:00
Roman Lebedev
747aa83d9d
[X86] AMD Zen 3: same-reg SSE XMM PANDN is a 1-cycle(!) dep-breaking zero-idiom
...
As confirmed by the exegesis measurements, and ref docs.
2021-05-14 20:22:59 +03:00
Roman Lebedev
f96afc073b
[NFC][X86][MCA] AMD Zen 3: add same-reg AVX YMM VPANDN tests
2021-05-14 20:22:59 +03:00
Roman Lebedev
7cdc1e03f2
[NFC][X86][MCA] AMD Zen 3: add same-reg AVX XMM VPANDN tests
2021-05-14 20:22:59 +03:00
Roman Lebedev
106fd4d50d
[NFC][X86][MCA] AMD Zen 3: add same-reg SSE XMM PANDN tests
2021-05-14 20:22:59 +03:00
Roman Lebedev
1fc967929a
[X86] AMD Zen 3: same-reg AVX YMM VPXOR is a zero-cycle(!) dep-breaking zero-idiom
...
As confirmed by exegesis measurements, and ref docs.
2021-05-14 20:22:59 +03:00
Roman Lebedev
3b69b7222f
[X86] AMD Zen 3: same-reg AVX XMM VPXOR is a zero-cycle(!) dep-breaking zero-idiom
...
As confirmed by exegesis measurements, and ref docs.
2021-05-14 20:22:58 +03:00
Roman Lebedev
722c0e895f
[X86] AMD Zen 3: same-reg SSE XMM PXOR is a 1-cycle(!) dep-breaking zero-idiom
...
As confirmed by the exegesis measurements, and ref docs.
2021-05-14 20:22:58 +03:00
Roman Lebedev
2d84799d26
[NFC][X86][MCA] AMD Zen 3: add same-reg AVX YMM VPXOR tests
2021-05-14 20:22:58 +03:00
Roman Lebedev
42e170ffb7
[NFC][X86][MCA] AMD Zen 3: add same-reg AVX XMM VPXOR tests
2021-05-14 20:22:58 +03:00
Roman Lebedev
7e11b78748
[NFC][X86][MCA] AMD Zen 3: add same-reg SSE XMM PXOR tests
2021-05-14 20:22:58 +03:00
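
The same-register forms named in the Zen 3 commits above (PXOR, PANDN, PSUB, PSUBS, PSUBUS, PCMPGT) are called zero idioms because their result is zero whatever the register held, so the hardware does not need to wait for the old value. The sketch below checks only that arithmetic fact on a single 8-bit lane. The `lane*` helpers and `allSameRegResultsAreZero` are hypothetical names, and the code does not model latency or dependency breaking, which the commits established by measurement.

```cpp
#include <cstdint>

// Scalar models of one 8-bit lane of each packed operation (illustrative).
constexpr std::uint8_t laneXor(std::uint8_t a, std::uint8_t b) {
  return static_cast<std::uint8_t>(a ^ b);
}
// PANDN computes (NOT a) AND b.
constexpr std::uint8_t laneAndn(std::uint8_t a, std::uint8_t b) {
  return static_cast<std::uint8_t>(~a & b);
}
// PSUBB: wrapping subtraction.
constexpr std::uint8_t laneSub(std::uint8_t a, std::uint8_t b) {
  return static_cast<std::uint8_t>(a - b);
}
// PSUBUSB: unsigned saturating subtraction.
constexpr std::uint8_t laneSubUnsignedSat(std::uint8_t a, std::uint8_t b) {
  return a > b ? static_cast<std::uint8_t>(a - b) : std::uint8_t{0};
}
// PSUBSB: signed saturating subtraction, operands given as plain ints
// in [-128, 127].
constexpr int laneSubSignedSat(int a, int b) {
  int d = a - b;
  return d > 127 ? 127 : (d < -128 ? -128 : d);
}
// PCMPGTB: all-ones if a > b (signed), else zero.
constexpr std::uint8_t laneCmpGt(int a, int b) {
  return a > b ? std::uint8_t{0xFF} : std::uint8_t{0x00};
}

// True if every operation yields zero when both inputs are the same value,
// for every possible 8-bit lane value.
bool allSameRegResultsAreZero() {
  for (int v = 0; v < 256; ++v) {
    const std::uint8_t u = static_cast<std::uint8_t>(v);
    const int s = v - 128; // covers the signed range -128..127
    if (laneXor(u, u) != 0 || laneAndn(u, u) != 0 || laneSub(u, u) != 0 ||
        laneSubUnsignedSat(u, u) != 0 || laneSubSignedSat(s, s) != 0 ||
        laneCmpGt(s, s) != 0)
      return false;
  }
  return true;
}
```

The wider element sizes (W, D, Q) follow the same reasoning lane by lane; only the 8-bit case is enumerated here to keep the check exhaustive and short.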
Benjamin Kramer
62a029fa79
Bump googletest to 1.10.0
2021-05-14 19:16:31 +02:00
Philip Reames
658d86d1c9
Revert "Do actual DCE in LoopUnroll"
...
This reverts commit 9d1a61e695eb01298e26c76867d65592f1e1968c.
I'd missed some review feedback, and had missed updating an aarch64 test. Reverting while I fix both.
2021-05-14 10:15:30 -07:00
Philip Reames
8371b39ae3
Do actual DCE in LoopUnroll
...
LoopUnroll does a limited DCE pass after unrolling, but if you have a chain of dead instructions, it only deletes the last one. Improve the code to recursively delete all trivially dead instructions.
Differential Revision: https://reviews.llvm.org/D102511
2021-05-14 10:05:25 -07:00
Philip Reames
143d78fe72
Autogen a test for ease of update
2021-05-14 09:33:17 -07:00
Florian Hahn
0a57e484b4
[LV] Add a few more complex first-order recurrence tests.
2021-05-14 17:27:17 +01:00
Bradley Smith
626d5e84ba
[AArch64][SVE] Combine cntp intrinsics with add/sub to produce incp/decp
...
Depends on D101062
Differential Revision: https://reviews.llvm.org/D102077
2021-05-14 17:16:06 +01:00
Simon Pilgrim
95a1c7f9d0
[X86][SSE] Pull out combineToHorizontalAddSub helper from inside (F)ADD/SUB combines. NFCI.
...
The intention is to be able to run this from additional locations (such as shuffle combining) in the future.
2021-05-14 16:52:55 +01:00