1
0
mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-11-26 12:43:36 +01:00
Commit Graph

216303 Commits

Author SHA1 Message Date
Anirudh Prasad
f663aad8da [SystemZ][z/OS] Implement getHostCPUName for z/OS
- Currently, the host cpu information is not easily available on z/OS as in other platforms.
- This information is stored in the Communications Vector Table (https://www.ibm.com/docs/en/zos/2.2.0?topic=information-cvt-mapping)

Reviewed By: uweigand

Differential Revision: https://reviews.llvm.org/D102793
2021-05-25 11:18:12 -04:00
Joe Nash
75d94cb27c [AMDGPU] Allow no-modifier operands in cvtDPP
NFC, since no instructions have their AsmMatchConverter
changed, but prepares for that to happen.

Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D103046

Change-Id: I6afefad899076de7b9a412374d09b95b29e012fa
2021-05-25 10:58:06 -04:00
Simon Pilgrim
414d8672a3 [CostModel][X86] Improve accuracy of vXi64 vector non-uniform shift costs on AVX2+ targets
rG1ad4f887bd7692a9e63fb42586f0ece366f2fe01 incorrectly assumed that vXi64 non-uniform shifts were slow like vXi32 were - but llvm-mca (+Agner) both confirm that Haswell/Broadwell are full rate.
2021-05-25 15:58:23 +01:00
Simon Pilgrim
ddf476d680 [X86][SSE] Regenerate vector shift codegen tests. NFCI. 2021-05-25 15:58:22 +01:00
Joe Nash
b28aeaea9c [AMDGPU] More accurate names for dpp operand types
NFC. Renames the variable in the dpp input operand generators
from DstRC to OldRC, because that is what it actually sets.

Also documents the importance of setting HasModifiers = 0 in the
dpp8 asm string.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D103047

Change-Id: Ice69ae38f644de7f228a75ca47c43e88b1f7d9e1
2021-05-25 10:35:25 -04:00
David Goldblatt
ce067f4056 [InstSimplify] Transform X * Y % Y --> 0
simplifyDiv already handles the case X * Y / Y --> X (barring overflow).
This adds the equivalent handling to simplifyRem.

Correctness:
https://alive2.llvm.org/ce/z/J2cUbS
https://alive2.llvm.org/ce/z/us9NUM
https://alive2.llvm.org/ce/z/AvaDGJ
https://alive2.llvm.org/ce/z/kq9ige

Extending the situations in which we apply this transform would not be
correct:
https://alive2.llvm.org/ce/z/Lf9V63
https://alive2.llvm.org/ce/z/6RPQK3
https://alive2.llvm.org/ce/z/p9UdxC
https://alive2.llvm.org/ce/z/A2zlhE
https://alive2.llvm.org/ce/z/vHTtLw
https://alive2.llvm.org/ce/z/lvpH42

Differential Revision: https://reviews.llvm.org/D102864
2021-05-25 10:16:04 -04:00
Florian Hahn
82f075623d [VectorCombine] Add test that combines load & store scalarization. 2021-05-25 14:28:37 +01:00
Florian Hahn
5a31a2bcff [VectorCombine] Use constant range info for index scalarization legality.
We can only scalarize memory accesses if we know the index is valid.

This patch adjusts canScalarizeAcceess to fall back to
computeConstantRange to check if the index is known to be valid.

Reviewed By: nlopes

Differential Revision: https://reviews.llvm.org/D102476
2021-05-25 13:58:42 +01:00
Sanjay Patel
8343401f13 [InstCombine] canonicalize cast before unary shuffle
We could go either direction on this transform. VectorCombine already goes this
way for bitcasts (and handles more complicated cases using the cost model), so
let's try cast-first.

Deferring completely to VectorCombine is another possibility. But the backend
should be able to invert this easily when the vectors have the same shape, so
it doesn't seem like a transform that we need to avoid.

The motivating example from https://llvm.org/PR49081 has an int-to-float
sandwiched between 2 shuffles, and the backend currently does not reduce that,
so on x86, we get something like:

  pshufd	$249, %xmm0, %xmm0]
  cvtdq2ps	%xmm0, %xmm0
  shufps	$144, %xmm0, %xmm0

...instead of just a single conversion instruction.

Differential Revision: https://reviews.llvm.org/D103038
2021-05-25 08:43:09 -04:00
Sanjay Patel
e031b0571b [InstCombine] add tests for cast-of-shuffle; NFC 2021-05-25 08:43:09 -04:00
Chuanqi Xu
1abdf4bef5 [NFC] [Coroutines] Remove unused variable: UnreachableCache 2021-05-25 20:33:46 +08:00
Roman Lebedev
895d1ce31b [LoopIdiom] Support 'left-shift until zero' idiom
This adds support for the "count active bits" pattern, i.e.:
```
int countBits(unsigned val) {
    int cnt = 0;
    for( ; (val << cnt) != 0; ++cnt)
        ;
    return cnt;
}
```
but a somewhat more general one:
```
int countBits(unsigned val, int start, int off) {
    int cnt;
    for (cnt = start; val << (cnt + off); cnt++)
        ;
    return cnt;
}
```

alive2 is happy with all the tests there.

Note that, again, much like with the right-shift cases,
we don't require the `val != 0` guard.

This is the last pattern that was supported by
`detectShiftUntilZeroIdiom()`, which now becomes obsolete.
2021-05-25 15:26:35 +03:00
Roman Lebedev
36a7fe2d2f [NFC][LoopIdiom] Add tests for 'left-shift until zero' idiom 2021-05-25 15:26:34 +03:00
Bradley Smith
1d41093ecb [AArch64][SVE] Add fixed length codegen for FP_TO_{S,U}INT/{S,U}INT_TO_FP
Depends on D102607

Differential Revision: https://reviews.llvm.org/D102777
2021-05-25 12:54:55 +01:00
Roman Lebedev
af1a311083 [LoopIdiom] Support 'arithmetic right-shift until zero' idiom
This adds support for the "count active bits" pattern, i.e.:
```
int countActiveBits(signed val) {
    int cnt = 0;
    for( ; (val >> cnt) != 0; ++cnt)
        ;
    return cnt;
}
```
but a somewhat more general one:
```
int countActiveBits(signed val, int start, int off) {
    int cnt;
    for (cnt = start; val >> (cnt + off); cnt++)
        ;
    return cnt;
}
```

This directly matches the existing 'logical right-shift until zero' idiom.
alive2 is happy with all the tests there.

Note that, again, much like with the original unsigned case,
we don't require the `val != 0` guard.

The old `detectShiftUntilZeroIdiom()` already supports this pattern,
the idea here is that the `val` must be positive (have at least one
leading zero), because otherwise the loop is non-terminating,
but since it is not `while(1)`, that would have been UB.
2021-05-25 14:30:49 +03:00
Roman Lebedev
6b3c845322 [NFC][LoopIdiom] Add tests for 'arithmetic right-shift until zero' idiom 2021-05-25 14:30:49 +03:00
Marco Elver
b835b9cf36 [SanitizeCoverage] Add support for NoSanitizeCoverage function attribute
We really ought to support no_sanitize("coverage") in line with other
sanitizers. This came up again in discussions on the Linux-kernel
mailing lists, because we currently do workarounds using objtool to
remove coverage instrumentation. Since that support is only on x86, to
continue support coverage instrumentation on other architectures, we
must support selectively disabling coverage instrumentation via function
attributes.

Unfortunately, for SanitizeCoverage, it has not been implemented as a
sanitizer via fsanitize= and associated options in Sanitizers.def, but
rolls its own option fsanitize-coverage. This meant that we never got
"automatic" no_sanitize attribute support.

Implement no_sanitize attribute support by special-casing the string
"coverage" in the NoSanitizeAttr implementation. To keep the feature as
unintrusive to existing IR generation as possible, define a new negative
function attribute NoSanitizeCoverage to propagate the information
through to the instrumentation pass.

Fixes: https://bugs.llvm.org/show_bug.cgi?id=49035

Reviewed By: vitalybuka, morehouse

Differential Revision: https://reviews.llvm.org/D102772
2021-05-25 12:57:14 +02:00
Simon Pilgrim
91c3ce1475 Fix MSVC "truncation of constant value" warning. NFCI. 2021-05-25 11:35:57 +01:00
Simon Pilgrim
491a1dbd4b [CostModel][X86] Improve accuracy of vXi8/vXi16 vector non-uniform shift costs on AVX2/AVX512 targets
Determined from llvm-mca analysis, AVX2+ capable targets have a higher throughput for VPBLENDVB and VPMOVZX ops, making it cheaper to perform shift+select patterns for vXi8 shifts or extend/shift/truncate for vXi16 shifts. Similarly AVX512BW can perform vXi8 as extend/shift/truncate patterns.
2021-05-25 11:35:57 +01:00
Christudasan Devadasan
50950cea4e [AMDGPU] Remove dead declaration (NFC). 2021-05-25 16:04:04 +05:30
Florian Hahn
c4658762f3 [AArch64] Add tests for lowering of vector load + single extract.
Currently the vector load + extract gets lowered to a single scalar
store, not accounting for the fact that the index could be
out-of-bounds, which is poison, not UB.

See PR50382.
2021-05-25 11:09:15 +01:00
Stanislav Mekhanoshin
c7c6ba1331 [IR] Allow Value::replaceUsesWithIf() to process constants
The change is currently NFC, but exploited by the depending D102954.
Code to handle constants is borrowed from the general implementation
of Value::doRAUW().

Differential Revision: https://reviews.llvm.org/D103051
2021-05-25 02:12:01 -07:00
Roman Lebedev
5d534d8259 [llvm-exegesis] Loop unrolling for loop snippet repetitor mode
I really needed this, like, factually, yesterday,
when verifying dependency breaking idioms for AMD Zen 3 scheduler model.

Consider the following example:
```
$ ./bin/llvm-exegesis --mode=inverse_throughput --snippets-file=/tmp/snippet.s --num-repetitions=1000000 --repetition-mode=duplicate
Check generated assembly with: /usr/bin/objdump -d /tmp/snippet-4a7e50.o
---
mode:            inverse_throughput
key:
  instructions:
    - 'VPXORYrr YMM0 YMM0 YMM0'
  config:          ''
  register_initial_values: []
cpu_name:        znver3
llvm_triple:     x86_64-unknown-linux-gnu
num_repetitions: 1000000
measurements:
  - { key: inverse_throughput, value: 0.31025, per_snippet_value: 0.31025 }
error:           ''
info:            ''
assembled_snippet: C5FDEFC0C5FDEFC0C5FDEFC0C5FDEFC0C5FDEFC0C5FDEFC0C5FDEFC0C5FDEFC0C5FDEFC0C5FDEFC0C5FDEFC0C5FDEFC0C5FDEFC0C5FDEFC0C5FDEFC0C5FDEFC0C3
...

```
What does it tell us?
So wait, it can only execute ~3 x86 AVX YMM PXOR zero-idioms per cycle?
That doesn't seem right. That's even less than there are pipes supporting this type of op.

Now, second example:
```
$ ./bin/llvm-exegesis --mode=inverse_throughput --snippets-file=/tmp/snippet.s --num-repetitions=1000000 --repetition-mode=loop
Check generated assembly with: /usr/bin/objdump -d /tmp/snippet-2418b5.o
---
mode:            inverse_throughput
key:
  instructions:
    - 'VPXORYrr YMM0 YMM0 YMM0'
  config:          ''
  register_initial_values: []
cpu_name:        znver3
llvm_triple:     x86_64-unknown-linux-gnu
num_repetitions: 1000000
measurements:
  - { key: inverse_throughput, value: 1.00011, per_snippet_value: 1.00011 }
error:           ''
info:            ''
assembled_snippet: 49B80800000000000000C5FDEFC0C5FDEFC04983C0FF75F2C3
...
```
Now that's just worse. Due to the looping, the throughput completely plummeted,
and now we can only do a single instruction/cycle!?

That's not great.
And final example:
```
$ ./bin/llvm-exegesis --mode=inverse_throughput --snippets-file=/tmp/snippet.s --num-repetitions=1000000 --repetition-mode=loop --loop-body-size=1000
Check generated assembly with: /usr/bin/objdump -d /tmp/snippet-c402e2.o
---
mode:            inverse_throughput
key:
  instructions:
    - 'VPXORYrr YMM0 YMM0 YMM0'
  config:          ''
  register_initial_values: []
cpu_name:        znver3
llvm_triple:     x86_64-unknown-linux-gnu
num_repetitions: 1000000
measurements:
  - { key: inverse_throughput, value: 0.167087, per_snippet_value: 0.167087 }
error:           ''
info:            ''
assembled_snippet: 49B80800000000000000C5FDEFC0C5FDEFC04983C0FF75F2C3
...
```

So if we merge the previous two approaches, do duplicate this single-instruction snippet 1000x
(loop-body-size/instruction count in snippet), and run a loop with 1000 iterations
over that duplicated/unrolled snippet, the measured throughput goes through the roof,
up to 5.9 instructions/cycle, which finally tells us that this idiom is zero-cycle!

Reviewed By: courbet

Differential Revision: https://reviews.llvm.org/D102522
2021-05-25 12:08:27 +03:00
Kristina Bessonova
a85df00b51 [ARM][NEON] Combine base address updates for vld1x intrinsics
Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D102855
2021-05-25 11:06:39 +02:00
David Spickett
a4a13012a7 [clang][ARM] Remove non-existent arm9312 CPU
I cannot find documentation on this CPU, and it
is not supported by the Arm Compiler 5 product either.

It was likely a mistake or a different name for the
"ep9312", which is an Arm based Cirrus Logic chip.

Reviewed By: peter.smith

Differential Revision: https://reviews.llvm.org/D103024
2021-05-25 08:58:24 +00:00
David Spickett
523f0589d5 [llvm][ARM] Remove non-existent arm1176j-s CPU
This was removed in https://reviews.llvm.org/D52594 for clang.

The one test using it has been updated to use the mpcore
CPU as the linked clang change does.

This is part of fixing https://bugs.llvm.org/show_bug.cgi?id=50454.

Reviewed By: peter.smith

Differential Revision: https://reviews.llvm.org/D103022
2021-05-25 08:56:55 +00:00
Benjamin Kramer
9d856bcc84 [GlobalISel] Silence unused variable warning in Release builds. NFC. 2021-05-25 10:55:29 +02:00
David Spickett
e073151ba7 [clang][ARM] Remove non-existent arm1136jz-s CPU
There is an ARM1136JF-S and an ARM1136J-S but I could find
no references to an ARM1136JZ-S. In CPU manuals or the manual
for Arm Compiler 5.

See:
https://developer.arm.com/documentation/ddi0211/latest/
https://developer.arm.com/documentation/dui0472/latest/

Using this CPU you get:
$ ./bin/clang --target=arm-linux-gnueabihf -march=armv3m -mcpu=arm1136jz-s -c /tmp/test.c -o /tmp/test.o
'arm1136jz-s' is not a recognized processor for this target (ignoring processor)

Since the llvm target does not know what it is.

This is part of fixing https://bugs.llvm.org/show_bug.cgi?id=50454.

Reviewed By: peter.smith

Differential Revision: https://reviews.llvm.org/D103019
2021-05-25 08:54:59 +00:00
Alexey Lapshin
28ca22b845 [TRE] Reland: allow TRE for non-capturing calls.
The D82085 "allow TRE for non-capturing calls" caused failure during bootstrap.
This patch does the same as D82085 plus fixes bootstrap error.

The problem with D82085 is that it does not create copies for byval
operands, while replacing function call with a branch.

Consider following example:

```
    int zoo ( S p1 );

    int foo ( int count, S p1 ) {
      if ( count > 10 )
        return zoo(p1);

      // temporarily variable created for passing byvalue parameter
      // p1 could be used when zoo(p1) is called(after TRE is done).
      // lifetime.start p1.byvalue.temp
      return foo(count+1, p1);
      // lifetime.end p1.byvalue.temp
    }
```

After recursive call to foo is replaced with a jump into
start of the function, its parameters could be passed to
zoo function. i.e. temporarily variable created for byvalue
parameter "p1" could be passed to zoo. Finally zoo receives
broken operand:

```
    int foo ( int count, S p1 ) {
    :tailrecurse
      p1_tr = phi p1, p1.byvalue.temp
      if ( count > 10 )
        return zoo(p1_tr);

      // temporarily variable created for passing byvalue parameter
      // p1 could be used when zoo(p1) is called(after TRE is done).
      lifetime.start p1.byvalue.temp
      memcpy (p1.byvalue.temp, p1_tr)
      count = count + 1
      lifetime.end p1.byvalue.temp
      br tailrecurse
    }
```

To prevent using p1.byvalue.temp after its scope finished by
lifetime.end marker this patch copies value from p1.byvalue.temp
into another temporarily variable and then copies this variable
into the input parameter for next iteration.

This patch passes bootstrap build and bootstrap build with AddressSanitizer.

Differential Revision: https://reviews.llvm.org/D85614
2021-05-25 11:35:48 +03:00
Amara Emerson
afaf301e48 [GlobalISel] Fix MachineIRBuilder not using the DstOp argument for G_SHUFFLE_VECTOR. 2021-05-25 00:43:26 -07:00
Ben Shi
a929002e61 [RISCV] Optimize xor/or with immediate in the zbs extension
Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D102893
2021-05-25 14:14:09 +08:00
Lang Hames
9ee7d4ffa3 [JITLink] Suppress expect-death test in release mode. 2021-05-24 22:57:10 -07:00
Max Kazantsev
701a7ca441 [LoopDeletion] Break backedge if we can prove that the loop is exited on 1st iteration
This patch handles one particular case of one-iteration loops for which SCEV
cannot straightforwardly prove BECount = 1. The idea of the optimization is to
symbolically execute conditional branches on the 1st iteration, moving in topoligical
order, and only visiting blocks that may be reached on the first iteration. If we find out
that we never reach header via the latch, then the backedge can be broken.

Differential Revision: https://reviews.llvm.org/D102615
Reviewed By: reames
2021-05-25 12:43:31 +07:00
Max Kazantsev
bebc3be883 [Test] Add test for unreachable backedge with duplicating predecessors 2021-05-25 12:43:31 +07:00
Christudasan Devadasan
022be2495f AMDGPU/GlobalISel: Legalize G_[SU]DIVREM instructions
Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D100726
2021-05-25 10:51:07 +05:30
Lang Hames
5ce7249a28 [JITLink] Enable creation and management of mutable block content.
This patch introduces new operations on jitlink::Blocks: setMutableContent,
getMutableContent and getAlreadyMutableContent. The setMutableContent method
will set the block content data and size members and flag the content as
mutable. The getMutableContent method will return a mutable copy of the existing
content value, auto-allocating and populating a new mutable copy if the existing
content is marked immutable. The getAlreadyMutableMethod asserts that the
existing content is already mutable and returns it.

setMutableContent should be used when updating the block with totally new
content backed by mutable memory. It can be used to change the size of the
block. The argument value should *not* be shared with any other block.

getMutableContent should be used when clients want to modify the existing
content and are unsure whether it is mutable yet.

getAlreadyMutableContent should be used when clients want to modify the existing
content and know from context that it must already be immutable.

These operations reduce copy-modify-update boilerplate and unnecessary copies
introduced when clients couldn't me sure whether the existing content was
mutable or not.
2021-05-24 22:09:36 -07:00
Arthur Eubanks
05ebdcd268 Making Instrumentation aware of LoopNest Pass
Intrumentation callbacks are not made aware of LoopNest passes. From the loop pass manager, we can pass the outermost loop of the LoopNest to instrumentation in case of LoopNest passes.

The current patch made the change in two places in StandardInstrumentation.cpp. I will submit a proper patch where the OuterMostLoop is passed from the LoopPassManager to the call backs. That way we will avoid making changes at multiple places in StandardInstrumentation.cpp.

A testcase also will be submitted.

Reviewed By: aeubanks

Differential Revision: https://reviews.llvm.org/D102463
2021-05-24 20:25:52 -07:00
maekawatoshiki
b5c1f57fc4 Revert "[LoopUnrollAndJam] Change LoopUnrollAndJamPass to LoopNest pass"
This reverts commit d65c32fb41b03a35a2a16330ba1ea15cf6818f04.
2021-05-25 11:39:49 +09:00
David Blaikie
8e3f8bcb4e Add a range-based wrapper for std::unique(begin, end, binary_predicate) 2021-05-24 17:26:46 -07:00
Vitaly Buka
cebde3ea57 [NFC][OMP] Fix 'unused' warning 2021-05-24 17:14:38 -07:00
Jonas Devlieghere
8e74c099a0 [dsymutil] Emit an error when the Mach-O exceeds the 4GB limit.
The Mach-O object file format is limited to 4GB because its used of
32-bit offsets in the header. It is possible for dsymutil to (silently)
emit an invalid binary. Instead of having consumers deal with this, emit
an error instead.
2021-05-24 16:29:06 -07:00
Jonas Devlieghere
8f5c83a0a1 [dsymutil] Use EXIT_SUCCESS and EXIT_FAILURE (NFC) 2021-05-24 16:29:05 -07:00
Jonas Devlieghere
5e6dceac4f [dsymutil] Compute the output location once per input file (NFC)
Compute the location of the output file just once outside the loop over
the different architectures.
2021-05-24 16:29:05 -07:00
Anton Afanasyev
114349058c [SLP] Fix "gathering" of insertelement instructions
For rare exceptional case vector tree node (insertelements for now only)
is marked as `NeedToGather`, this case is processed by patch. Follow-up
of D98714 to fix bug reported here https://reviews.llvm.org/D98714#2764135.

Differential Revision: https://reviews.llvm.org/D102675
2021-05-25 01:35:43 +03:00
Hongtao Yu
68eaf84013 [NFC][CSSPGO]llvm-profge] Fix Build warning dueo to an attrbute usage. 2021-05-24 12:59:02 -07:00
Hongtao Yu
9a044d4e9b [CSSPGO][llvm-profgen] Report samples for untrackable frames.
Fixing an issue where samples collected for an untrackable frame is not reported. An untrackable frame refers to a frame whose caller is untrackable due to missing debug info or pseudo probe. Though the frame is connected to its parent frame through the frame pointer chain at runtime, the compiler cannot build the connection without debug info or pseudo probe. In such case we just need to report the untrackable frame as the base frame and all of its child frames.

With more samples reported I'm seeing this improves the performance of an internal benchmark by 2.5%.

Reviewed By: wenlei, wlei

Differential Revision: https://reviews.llvm.org/D102961
2021-05-24 12:39:12 -07:00
Nick Desaulniers
6d2f5ef7fe fix up test from D102742
In D102742, I mistakenly put the split file designator above a bunch of
CHECK lines, which unintentionally removed the CHECKs from actually
being verified.

This can be verified by observing:
<build dir>/test/CodeGen/X86/Output/stack-protector-3.ll.tmp/main.ll
2021-05-24 12:09:02 -07:00
LLVM GN Syncbot
604164d943 [gn build] Port b510e4cf1b96 2021-05-24 18:48:17 +00:00
Craig Topper
9ed3d57a44 [RISCV] Add a vsetvli insert pass that can be extended to be aware of incoming VL/VTYPE from other basic blocks.
This is a replacement for D101938 for inserting vsetvli
instructions where needed. This new version changes how
we track the information in such a way that we can extend
it to be aware of VL/VTYPE changes in other blocks. Given
how much it changes the previous patch, I've decided to
abandon the previous patch and post this from scratch.

For now the pass consists of a single phase that assumes
the incoming state from other basic blocks is unknown. A
follow up patch will extend this with a phase to collect
information about how VL/VTYPE change in each block and
a second phase to propagate this information to the entire
function. This will be used by a third phase to do the
vsetvli insertion.

Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D102737
2021-05-24 11:47:27 -07:00
LLVM GN Syncbot
90d57fab4e [gn build] Port a64ebb863727 2021-05-24 18:36:50 +00:00