1
0
mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-11-26 04:32:44 +01:00
Commit Graph

215353 Commits

Author SHA1 Message Date
Arthur Eubanks
0883647021 [docs][NewPM] Add section on analyses
Reviewed By: asbirlea, ychen

Differential Revision: https://reviews.llvm.org/D100912
2021-05-03 10:15:02 -07:00
Chris Lattner
7dfeebafe5 [Support/Parallel] Add a special case for 0/1 items to llvm::parallel_for_each.
This avoids the non-trivial overhead of creating a TaskGroup in these degenerate
cases, but also exposes parallelism.  It turns out that the default executor
underlying TaskGroup prevents recursive parallelism - so an instance of a task
group being alive will make nested ones become serial.

This is a big issue in MLIR in some dialects, if they have a single instance of
an outer op (e.g. a firrtl.circuit) that has many parallel ops within it (e.g.
a firrtl.module).  This patch side-steps the problem by avoiding creating the
TaskGroup in the unneeded case.  See this issue for more details:
https://github.com/llvm/circt/issues/993

Note that this isn't a really great solution for the general case of nested
parallelism.  A redesign of the TaskGroup stuff would be better, but would be
a much more invasive change.

Differential Revision: https://reviews.llvm.org/D101699
2021-05-03 10:08:00 -07:00
David Green
26a0ca2d62 [AArch64] Fold CSEL x, x, cc -> x
This can come up in rare situations, where a csel is created with
identical operands. These can be folded simply to the original value,
allowing the csel to be removed and further simplification to happen.

This patch also removes FCSEL as it is unused, not being produced
anywhere or lowered to anything.

Differential Revision: https://reviews.llvm.org/D101687
2021-05-03 17:34:05 +01:00
Anirudh Prasad
b06531e8b0 [SystemZ][z/OS] Enforce prefix-less registers in SystemZAsmParser for the HLASM dialect.
- Previously, https://reviews.llvm.org/D101308 removed prefixes from register while printing them out. This was especially needed for inline asm statements which used input/output operands.
- However, the backend SystemZAsmParser, accepts both prefixed registers and prefix-less registers as part of its implementation
- This patch aims to change that by ensuring that prefixed registers are only allowed for the ATT dialect.

Reviewed By: uweigand

Differential Revision: https://reviews.llvm.org/D101665
2021-05-03 11:44:44 -04:00
Alexey Bataev
edad8dc50d [SLP]Allow masked gathers only if allowed by target.
Need to check if target allows/supports masked gathers before trying to
estimate its cost, otherwise we may fail to vectorize some of the
patterns because of too pessimistic cost model.

Part of D57059.

Differential Revision: https://reviews.llvm.org/D101297
2021-05-03 08:06:20 -07:00
Dávid Bolvanský
e5c45a7f5b [InstCombine] cttz(zext(x)) -> zext(cttz(x)) if the 'ZeroIsUndef' parameter is 'true' (PR50172)
Zext doesn't change the number of trailing zeros, so narrow cttz(zext(x)) -> zext(cttz(x)) if the 'ZeroIsUndef' parameter is 'true'.

Proofs:
https://alive2.llvm.org/ce/z/o2dnjY

Solves https://bugs.llvm.org/show_bug.cgi?id=50172

Reviewed By: spatel

Differential Revision: https://reviews.llvm.org/D101582
2021-05-03 17:05:12 +02:00
Jon Roelofs
88d0ab2b78 Partial revert of "Use std::foo_t rather than std::foo in LLVM." in googlebench
Since googlebench builds as c++11, the change there is incorrect and breaks the
googlebench build when the STL implementation is strict about std::enable_if_t
not being available in lesser c++ versions.

partial revert of: 1bd6123b781120c9190b9ba58b900cdcb718cdd1 (https://reviews.llvm.org/D74384)

Differential Revision: https://reviews.llvm.org/D101583
2021-05-03 07:49:30 -07:00
Alexey Bataev
93f0893ced Revert "[SLP]Allow masked gathers only if allowed by target."
This reverts commit b5f64768cfeecca16c7c9c53cbd97ac7289c43aa to fix
a compiler crash revealed by buildbots.
2021-05-03 07:20:00 -07:00
Alexey Bataev
0683521de4 [SLP]Allow masked gathers only if allowed by target.
Need to check if target allows/supports masked gathers before trying to
estimate its cost, otherwise we may fail to vectorize some of the
patterns because of too pessimistic cost model.

Part of D57059.

Differential Revision: https://reviews.llvm.org/D101297
2021-05-03 06:45:42 -07:00
Konstantin Zhuravlyov
173adcc9eb AMDGPU: XFAIL LLVM::note-amd-valid-v2.test for big endian 2021-05-03 09:45:19 -04:00
Florian Hahn
43890ed791 [LV] Iterate over recipes in VPlan to fix PHI (NFC).
As we gradually move more elements of LV to VPlan, we are trying to
reduce the number of places that still has to check IR of the original
loop.

This patch adjusts the code to fix cross iteration phis to get the PHIs
to fix directly from the VPlan that is executed. We still need the
original PHI to check for first-order recurrences, but we can get rid of
that once we model that explicitly in VPlan as well.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D99293
2021-05-03 14:09:46 +01:00
LLVM GN Syncbot
74312e44fa [gn build] Port 1527a5e4b483 2021-05-03 12:53:10 +00:00
Abhina Sreeskantharajan
8c2f634996 [SystemZ][z/OS] Add the functions needed for handling EBCDIC I/O
This patch adds the basic functions needed for controlling auto conversion on z/OS.
Auto conversion is enabled on untagged input file to ASCII by making the assumption that all untagged files are EBCDIC encoded. Output files are auto converted to EBCDIC IBM-1047.
This change also enables conversion for stdin/stdout/stderr.

For more information on how fcntl controls codepage https://www.ibm.com/docs/en/zos/2.4.0?topic=descriptions-fcntl-bpx1fct-bpx4fct-control-open-file-descriptors

Reviewed By: anirudhp

Differential Revision: https://reviews.llvm.org/D100483
2021-05-03 08:52:38 -04:00
Sanjay Patel
e3bd2dc38e [InstCombine] improve demanded bits analysis of left-shifted operand
If we don't demand high bits, then we also don't care about those
high bits of a left-shift operand regardless of shift amount.
I noticed the sext/trunc pattern in a motivating example.
It seems like there should be a low-bits with right-shift sibling,
but I haven't looked at that yet.

https://alive2.llvm.org/ce/z/JuS6jc
https://rise4fun.com/Alive/Trm (not sure how to use 'width' with Alive1)
https://alive2.llvm.org/ce/z/gRadbF

Differential Revision: https://reviews.llvm.org/D101489
2021-05-03 08:39:20 -04:00
David Green
c7e45bd25e [ARM] Memory operands for MVE gathers/scatters
Similarly to D101096, this makes sure that MMO operands get propagated
through from MVE gathers/scatters to the Machine Instructions. This
allows extra scheduling freedom, not forcing the instructions to act as
scheduling barriers. We create MMO's with an unknown size, specifying
that they can load from anywhere in memory, similar to the masked_gather
or X86 intrinsics.

Differential Revision: https://reviews.llvm.org/D101219
2021-05-03 11:24:59 +01:00
Fraser Cormack
7f2df8f0bc [RISCV] Add support for fmin/fmax vector reductions
Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D101518
2021-05-03 10:33:51 +01:00
Christian Kühnel
dd38cc31c5 [doc] typo fixes
as proposed by @FlashSheridan in
https://reviews.llvm.org/rG7f9717b922d4
2021-05-03 10:59:51 +02:00
Sebastian Neubauer
4d62056781 [AMDGPU] Do not annotate features for graphics
SITargetLowering::LowerFormalArguments asserts that none of these
features are used for graphics calling conventions, so
AnnotateKernelFeatures should not add them.

Differential Revision: https://reviews.llvm.org/D101534
2021-05-03 10:33:11 +02:00
Reshabh Sharma
8c125da418 [ASAN][AMDGPU] Add support for accesses to global and constant addrspaces
Add address sanitizer instrumentation support for accesses to global
and constant address spaces in AMDGPU. It strictly avoids instrumenting
the stack and assumes x86 as the host.

Reviewed by: vitalybuka

Differential Revision: https://reviews.llvm.org/D99071
2021-05-03 09:01:15 +05:30
Konstantin Zhuravlyov
864ab932a8 Reland "AMDGPU/llvm-readobj: Add missing tests for note parsing/displaying"
This reverts commit 54aad6365951247e9f18c718c14422745b3afa4c.

Includes fix for note-amd-valid-v3.s test.
2021-05-02 22:56:17 -04:00
Sergio Perez Gonzalez
346a580fc6 [Object] Fix e_machine description for EM_CR16 and add EM_MICROBLAZE
Reviewed By: MaskRay

Differential Revision: https://reviews.llvm.org/D101133
2021-05-02 19:25:39 -07:00
David Green
63dd7e3e82 [ARM] Transfer memory operands for VLDn
We create MMO's for the VLDn/VSTn intrinsics in ARMTargetLowering::
getTgtMemIntrinsic, but they do not currently make it ll the way through
ISel.  This changes that in the various places it needs changing, making
sure that the MMO is propagate through to the final instruction. This
can help in scheduling, not treating the VLD2/VST2 as a scheduling
barrier.

Differential Revision: https://reviews.llvm.org/D101096
2021-05-03 00:04:21 +01:00
Stelios Ioannou
cac69bd2c1 [AArch64] Sets the preferred function alignment for Cortex-A53/A55.
Setting the preffered function alignment to 16 for Cortex A53/A55
improves performance in a wide range of benchmarks. This brings it
in line with the Cortex-A53/A55 tuning that is used in GCC
(gcc/config/aarch64/aarch64.c).

Differential Revision: https://reviews.llvm.org/D101636

Change-Id: I2ce47fe7ab5e3b54f49c89038d8da4e404742de2
2021-05-03 00:00:10 +01:00
Craig Topper
b555d38aa1 [TableGen] Use sign rotated VBR for OPC_EmitInteger.
This allows for a much more efficient encoding for small negative
numbers by storing the sign bit first and negating the rest of
the bits. This was already being used for OPC_CheckInteger.

For every in tree target this affects, the table got smaller.
R600GenDAGISel.inc saw the largest reduction of 7K.

I did have to add a new opcode for StringIntegers used for
register class ids and subregister indices since we don't have the
integer value to encode. The enum name is emitted directly into
the table. Previously assumed the enum would expand to a positive
7-bit number. We might be able to just shift that right by 1 and
assume it is a positive 6 bit number, but that will need more
investigation.
2021-05-02 12:40:44 -07:00
Craig Topper
3b8bc4cd8f [RISCV] Store SEW in RISCV vector pseudo instructions in log2 form.
This shrinks the immediate that isel table needs to emit for these
instructions. Hoping this allows me to change OPC_EmitInteger to
use a better variable length encoding for representing negative
numbers. Similar to what was done a few months ago for OPC_CheckInteger.

The alternative encoding uses less bytes for negative numbers, but
increases the number of bytes need to encode 64 which was a very
common number in the RISCV table due to SEW=64. By using Log2 this
becomes 6 and is no longer a problem.
2021-05-02 12:09:20 -07:00
Arthur Eubanks
2b3350003c [NFC] Use Aliasee to determine Type and AddrSpace in GlobalAlias::create()
As opposed to going through the Aliasee type.

For opaque pointers, we're trying to remove uses of PointerType::getElementType().

Reviewed By: dblaikie

Differential Revision: https://reviews.llvm.org/D101715
2021-05-02 11:50:08 -07:00
Florian Hahn
b7eb3dfd3b [VPlan] Add VPBasicBlock::phis() helper (NFC).
This patch introduces a helper to obtain an iterator range for the
PHI-like recipes in a block.

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D100101
2021-05-02 19:20:13 +01:00
Nikita Popov
afad6987f7 [SCEV] Add test for non-unit stride with multiple exits (NFC)
We currently can't determine any exit counts here, because there
is no "controlling exit".
2021-05-02 18:14:05 +02:00
Juneyoung Lee
a6a81be470 [InstCombine] Add a few more patterns for folding select of select
This is a patch that folds select of select to salvage some optimizations after select -> and/or folding is disabled.

```
select (select a, true, b), c, false -> select a, c, false
select c, (select a, true, b), false -> select c, a, false
  if c implies that b is false (isImpliedCondition).
```
https://alive2.llvm.org/ce/z/ANatjt, https://alive2.llvm.org/ce/z/rv8zTB

```
sel (sel c, a, false), true, (sel !c, b, false) -> sel c, a, b
sel (sel !c, a, false), true, (sel c, b, false) -> sel c, b, a
```
https://alive2.llvm.org/ce/z/U2kp-t, https://alive2.llvm.org/ce/z/bc88EE

See D101191

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D101375
2021-05-02 19:00:42 +09:00
Juneyoung Lee
5ffbd4dbc2 [InstCombine] Precommit tests for D101375 (NFC) 2021-05-02 19:00:42 +09:00
Juneyoung Lee
33fda542f4 Fix MSan crash after 1977c53b 2021-05-02 13:44:43 +09:00
Arthur Eubanks
0086e8ade4 [NFC] Use getParamByValType instead of pointee type
To reduce dependence on pointee types for opaque pointers.

Reviewed By: dblaikie

Differential Revision: https://reviews.llvm.org/D101706
2021-05-01 21:22:41 -07:00
Juneyoung Lee
0884a1465d run update_test_checks.py for the tests in D101191 (NFC)
This is an NFC that reruns update_test_checks.py on the tests that are
going to be updated in D101191.
2021-05-02 13:11:57 +09:00
Juneyoung Lee
b4b7d6d0cc [ValueTracking] ctpop propagates poison
This is a patch that adds ctpop intrinsics to propagatesPoison.

Splitted from D101191
2021-05-02 13:04:37 +09:00
Juneyoung Lee
75528017f8 [ValueTracking] Improve impliesPoison to look into overflow intrinsics
This update supports the following transformation:

```
select(extract(mul_with_overflow(a, _), _), (a == 0), false)
=>
and(extract(mul_with_overflow(a, _), _), (a == 0))
```

which is correct because if `a` was poison the select's condition was
also poison.

This update is splitted from D101423.
2021-05-02 12:03:55 +09:00
LLVM GN Syncbot
6038981d86 [gn build] Port 1977c53b2ae4 2021-05-02 02:55:47 +00:00
Juneyoung Lee
c112624b0b [InstCombine] Fold overflow bit of [u|s]mul.with.overflow in a poison-safe way
As discussed in D101191, this patch adds a poison-safe folding of overflow bit check:
```
  %Op0 = icmp ne i4 %X, 0
  %Agg = call { i4, i1 } @llvm.[us]mul.with.overflow.i4(i4 %X, i4 %Y)
  %Op1 = extractvalue { i4, i1 } %Agg, 1
  %ret = select i1 %Op0, i1 %Op1, i1 false
=>
  %Y.fr = freeze %Y
  %Agg = call { i4, i1 } @llvm.[us]mul.with.overflow.i4(i4 %X, i4 %Y.fr)
  %Op1 = extractvalue { i4, i1 } %Agg, 1
  %ret = %Op1
```

https://alive2.llvm.org/ce/z/zgPUGT
https://alive2.llvm.org/ce/z/h2gZ_6

Note that there are cases where inserting freeze is not necessary: e.g. %Y is `noundef`.
In this case, LLVM is already good because `%ret` is already successfully folded into `and`,
triggering the pre-existing optimization in InstSimplify: https://godbolt.org/z/v6qena15K

Differential Revision: https://reviews.llvm.org/D101423
2021-05-02 11:54:12 +09:00
Juneyoung Lee
e820cde371 [InstCombine] Precommit tests for D101423 (NFC) 2021-05-02 11:41:47 +09:00
Harald van Dijk
9a4efe9fab [X32][CET] Fix size and alignment of .note.gnu.property section
X32 uses 32-bit ELF object files with 32-bit alignment, so the
.note.gnu.property section needs to be emitted as it is for X86.

Reviewed By: MaskRay

Differential Revision: https://reviews.llvm.org/D101689
2021-05-01 22:17:04 +01:00
Nikita Popov
adfe3b159b [LVI] Handle mask not equal zero conditions
If V & Mask != 0, we know that at least one of the bits in Mask
must be set, so the value must be >= the lowest bit in Mask.
2021-05-01 23:08:49 +02:00
Nikita Popov
1339e10140 [CVP] Add tests for mask not equal zero guard (NFC) 2021-05-01 23:08:48 +02:00
Roman Lebedev
fed21911ba [X86] AMD Zen 3 Scheduler Model
Introduce basic schedule model for AMD Zen 3 CPU's, a.k.a `znver3`.

This is fully built from scratch, from llvm-mca measurements
and documented reference materials.
Nothing was copied from `znver2`/`znver1`.

I believe this is in a reasonable state of completion for inclusion,
probably better than D52779 `bdver2` was :)

Namely:
* uops are pretty spot-on (at least what llvm-mca can measure)
  {F16422596}
* latency is also pretty spot-on (at least what llvm-mca can measure)
  {F16422601}
* throughput is within reason
  {F16422607}

I haven't run much benchmarks with this,
however RawSpeed benchmarks says this is beneficial:
{F16603978}
{F16604029}

I'll call out the obvious problems there:
* i didn't really bother with X87 instructions
* i didn't really bother with obviously-microcoded/system instructions
* There are large discrepancy in throughput for `mr` and `rm` instructions.
  I'm not really sure if it's a modelling defect that needs to be fixed,
  or it's a defect of measurments.
* Pipe distributions are probably bad :)
  I can't do much here until AMD allows that to be fixed
  by documenting the appropriate counters and updating libpfm

That being said, as @RKSimon notes:
>>! In D94395#2647381, @RKSimon wrote:
> I'll mention again that all the znver* models appear to be very inaccurate wrt SIMD/FPU instructions <...>
so how much worse this could possibly be?!

Things that aren't there:
* Various tunings: zero idioms, etc. That is follow-ups.

Differential Revision: https://reviews.llvm.org/D94395
2021-05-01 22:08:13 +03:00
Nikita Popov
b39114fa53 [SCEV] Simplify backedge count clearing (NFC)
This seems to be a leftover from when the BackedgeTakenInfo
stored multiple exit counts with manual memory management. At
some point this was switchted to a simple vector, and there should
be no need to micro-manage the clearing anymore. We can simply
drop the loop from the map and the the destructor do its job.
2021-05-01 17:50:01 +02:00
Nikita Popov
26eb9e64e1 [IndVars] Remove redundant loop invariance check (NFC)
This is checked again directly below this condition.
2021-05-01 17:22:00 +02:00
LemonBoy
f58a03724a [AArch64] Prevent spilling between ldxr/stxr pairs
Apply the same logic used to check if CMPXCHG nodes should be expanded
at -O0: the register allocator may end up spilling some register in
between the atomic load/store pairs, breaking the atomicity and possibly
stalling the execution.

Fixes PR48017

Reviewed By: efriedman

Differential Revision: https://reviews.llvm.org/D101163
2021-05-01 17:17:05 +02:00
Nikita Popov
0c935681c7 [SCEV] Add tests for and/or loop guards (NFC) 2021-05-01 17:10:23 +02:00
LemonBoy
c556195c00 [NFC][ARM] Regenerate arm64-atomic.ll test
Pre-requisite for D101163, the `NOLSE-0O` case shows registers being
spilled inside the rmw loop.

Use two separate prefixes for the `LSE-O0` case as some outputs differ
only by a comment that update_llc_test_checks.py ignores but lit does
not, causing the test to fail unexpectedly when run.
2021-05-01 16:29:03 +02:00
LemonBoy
2056e830c7 Revert "[NFC][ARM] Regenerate arm64-atomic.ll test"
This reverts commit d9856b12f2be257f1b7aaccde71eb0421a1aaaf3.
2021-05-01 13:00:45 +02:00
LemonBoy
72ba5f53d1 [NFC][ARM] Regenerate arm64-atomic.ll test
Pre-requisite for D101163, the NOLSE-0O case shows registers being
spilled inside the rmw loop.
2021-05-01 12:09:14 +02:00
Nikita Popov
7ce20b169c [InstCombine] Add eq-of-parts tests using or (NFC)
or-ne is the conjugated pattern for and-eq.
2021-05-01 11:57:09 +02:00