mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-11-23 03:02:36 +01:00

217109 Commits

Author SHA1 Message Date
Koutheir Attouchi
3a880fd8a3 Do not generate calls to the 128-bit function __multi3() on 32-bit ARM
Re-applying this patch after bot failures. Should be fine now.

The function __multi3() is undefined on 32-bit ARM, so a call to it should
never be emitted. Instead, plain instructions need to be generated to
perform 128-bit multiplications.

Differential Revision: https://reviews.llvm.org/D103906
2021-06-11 11:45:21 +01:00
Rosie Sumpter
b4082e9aee [CostModel][AArch64] Improve the cost estimate of CTPOP intrinsic
Added a case for CTPOP to AArch64TTIImpl::getIntrinsicInstrCost so that
the cost estimate matches the codegen in
test/CodeGen/AArch64/arm64-vpopcnt.ll

Differential Revision: https://reviews.llvm.org/D103952
2021-06-11 11:15:46 +01:00
Simon Pilgrim
bb3852411e [llvm-stress] Fix dead code preventing us generating per-element vector selects
This has been reported several times by the PVS Studio team as well as coming up in some static analysis.

getRandom() % 1 always returns 0 so we never actually test this codepath, (git blame suggests this has always been like this) - given that we have plenty of other "getRandom() & 1" the typo is pretty obvious, and matches the intention in the comment above - with this change we generate a nice mixture of scalar/vector condition selects of vectors.
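
The difference between the two expressions can be sketched in isolation. The function names below are hypothetical and are not taken from llvm-stress; the only point is that `x % 1` is zero for every `x`, while `x & 1` tests the low bit.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the buggy condition: x % 1 is 0 for every x,
// so this never returns true and the guarded path is dead.
inline bool takesPathBuggy(uint32_t x) { return (x % 1) != 0; }

// Hypothetical stand-in for the fixed condition: x & 1 tests the low bit,
// so this returns true for odd x.
inline bool takesPathFixed(uint32_t x) { return (x & 1) != 0; }

// Small usage check of both variants.
inline void checkPathSelection() {
  assert(!takesPathBuggy(7)); // dead even for odd input
  assert(takesPathFixed(7));  // odd input takes the path
  assert(!takesPathFixed(8)); // even input does not
}
```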

I don't know llvm-stress that well - but I don't think we guarantee that the same seed value will always generate the same IR for later versions of the program - just that the same binary would.

Differential Revision: https://reviews.llvm.org/D104022
2021-06-11 10:56:19 +01:00
Roman Lebedev
43b5ee114b [VectorCombine] scalarizeLoadExtract(): use computeAlignmentAfterScalarization() helper
This results in slightly more optimistic alignments in some cases
2021-06-11 12:47:10 +03:00
Roman Lebedev
32191d3791 [NFC][VectorCombine] Extract computeAlignmentAfterScalarization() helper function 2021-06-11 12:47:09 +03:00
Bing1 Yu
c7250ce1db [X86] Support __tile_stream_loadd intrinsic for new AMX interface
Adding support for __tile_stream_loadd intrinsic.

Reviewed By: LuoYuanke

Differential Revision: https://reviews.llvm.org/D103784
2021-06-11 17:28:43 +08:00
Simon Pilgrim
5c5e621290 SampleProf.h - fix spelling mistake in assert message. NFC. 2021-06-11 10:24:14 +01:00
Simon Pilgrim
cac15052d4 [Analysis] Pass RecurrenceDescriptor as const reference. NFCI.
We were passing the RecurrenceDescriptor by value to most of the reduction analysis methods, despite it being rather bulky with TrackingVH members (that can be costly to copy). In all these cases we're only using the RecurrenceDescriptor for rather basic purposes (access to types/kinds etc.).

Differential Revision: https://reviews.llvm.org/D104029
2021-06-11 10:24:14 +01:00
Simon Pilgrim
2e422e4f72 Fix implicit dependency on <string> header. NFCI. 2021-06-11 10:24:14 +01:00
LLVM GN Syncbot
a7bd77dff1 [gn build] Port c4a0969b9c14 2021-06-11 08:23:07 +00:00
Sjoerd Meijer
6a49dbd1a3 Function Specialization Pass
This adds a function specialization pass to LLVM. Constant parameters
like function pointers and constant globals are propagated to the callee by
specializing the function.

This is a first version with a number of limitations:
- The pass is off by default, so needs to be enabled on the command line,
- It does not handle specialization of recursive functions,
- It does not yet handle constants and constant ranges,
- Only 1 argument per function is specialised,
- The cost-model could be further looked into, and perhaps related,
- We are not yet caching analysis results.

This is based on earlier work by Matthew Simpson (D36432) and Vinay Madhusudan.
More recently this was also discussed on the list, see:

https://lists.llvm.org/pipermail/llvm-dev/2021-March/149380.html.

The motivation for this work is that function specialisation often comes up as
a reason for performance differences of generated code between LLVM and GCC,
which has this enabled by default from optimisation level -O3 and up. And while
this certainly helps a few cpu benchmark cases, this also triggers in real
world codes and is thus a generally useful transformation to have in LLVM.

Function specialisation has great potential to increase compile-times and
code-size.  The summary from some investigations with this patch is:
- The compile-time increase for short compile jobs is relatively high, but the
  increase in absolute numbers is still low.
- For longer compile-jobs, the extra compile time is around 1%, and very much
  in line with GCC.
- It is difficult to blame one thing for compile-time increases: it looks like
  everywhere a little bit more time is spent processing more functions and
  instructions.
- But the function specialisation pass itself is not very expensive; it doesn't
  show up very high in the profile of the optimisation passes.

The goal of this work is to reach parity with GCC which means that eventually
we would like to get this enabled by default. But first we would like to address
some of the limitations before that.

Differential Revision: https://reviews.llvm.org/D93838
2021-06-11 09:11:29 +01:00
Qiu Chaofan
b4c3d240d2 [PowerPC] Relax register superclasses for paired memops
Relaxing the superclass constraint for VSX register classes helps reduce
32-byte spills and copies when register pressure is high.

Some of the affected test cases introduce more copies due to the new
allocation order. However, this patch should not be the root cause, and
we may be able to fix it in other places of register allocation.

Reviewed By: nemanjai

Differential Revision: https://reviews.llvm.org/D104006
2021-06-11 14:54:03 +08:00
Hsiangkai Wang
6f9d560485 [RISCV] Avoid scalar outgoing arguments overwriting vector frame objects.
When using FP to access stack objects, the scalable stack objects will
be put at the lower end of the frame. It looks like

```
|-------------------|  <-- FP
| callee-saved regs |
|-------------------|
| scalar local vars |
|-------------------|
| RVV local vars    |
|-------------------|  <-- SP
```

If there are scalar arguments that need to be passed through memory and
there are vector objects on the stack that are accessed via FP, the outgoing
scalar arguments will overwrite the vector objects. It looks like

```
|-------------------|  <-- FP
| callee-saved regs |
|-------------------|
| scalar local vars |
|-------------------|         |-------------------|
| RVV local vars    |         | outgoing args     | <- outgoing arguments
|-------------------|  <-- SP |-------------------|    overwrite from here.
```

In this patch, we reserve the stack for the outgoing arguments before
function calls if using FP to access and there are scalable vector frame
objects. It looks like

```
|-------------------|  <-- FP
| callee-saved regs |
|-------------------|
| scalar local vars |
|-------------------|
| RVV local vars    |
|-------------------|
| outgoing args     |
|-------------------|  <-- SP
```

Differential Revision: https://reviews.llvm.org/D103622
2021-06-11 12:26:29 +08:00
Qiu Chaofan
7331aa3b53 [VectorCombine] Fix alignment in single element store
This fixes the concern in single element store scalarization that the
alignment of new store may be larger than it should be. It calculates
the largest alignment if index is constant, and a safe one if not.
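
A minimal sketch of that rule, assuming the new alignment is the largest power of two that divides both the vector's alignment and the element's byte offset. The function names are hypothetical and plain integers stand in for LLVM's alignment types.

```cpp
#include <cassert>
#include <cstdint>

// Largest power of two dividing both `align` and `offset`.
// `align` must be a nonzero power of two; an offset of 0 keeps `align`.
inline uint64_t commonPow2(uint64_t align, uint64_t offset) {
  assert(align != 0 && (align & (align - 1)) == 0);
  uint64_t bits = align | offset;
  return bits & (~bits + 1); // isolate the lowest set bit
}

// Constant index: the exact byte offset is known, so use it directly.
inline uint64_t alignForConstIndex(uint64_t vecAlign, uint64_t elemSize,
                                   uint64_t index) {
  return commonPow2(vecAlign, index * elemSize);
}

// Variable index: the offset is some multiple of elemSize, so the element
// size itself is the safe bound.
inline uint64_t alignForVarIndex(uint64_t vecAlign, uint64_t elemSize) {
  return commonPow2(vecAlign, elemSize);
}
```

For a 16-byte-aligned vector of 4-byte elements, this gives 16 for index 0, 4 for index 1, 8 for index 2, and 4 when the index is unknown.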

Reviewed By: lebedev.ri, spatel

Differential Revision: https://reviews.llvm.org/D103419
2021-06-11 10:28:15 +08:00
Craig Topper
495448f11e [RISCV] Use ComputeNumSignBits/MaskedValueIsZero in RISCVDAGToDAGISel::selectSExti32/selectZExti32.
This helps us select W instructions in more cases. Most of the
affected tests have had the sign_extend_inreg or AND folded into
sextload/zextload.

Differential Revision: https://reviews.llvm.org/D104079
2021-06-10 19:06:45 -07:00
Amara Emerson
fcb703cbf2 [AArch64][GlobalISel] Fix incorrectly generating uxtw/sxtw for addressing modes.
When the extend is from 8 or 16 bits, the addressing modes don't support those
extensions, but we weren't checking that and therefore always generated the 32->64b
extension mode. Fun.

Differential Revision: https://reviews.llvm.org/D104070
2021-06-10 16:59:39 -07:00
Carl Ritson
15d8bd80ce [ValueTypes] Define MVTs for v6i32, v6f32, v7i32, v7f32
For use in AMDGPU selection DAG.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D103881
2021-06-11 08:58:16 +09:00
Carl Ritson
03819dac19 [SDAG] Fix pow2 assumption when splitting vectors
When reducing vector builds to shuffles it is possible that
the DAG combiner may try to extract invalid subvectors.

This happens as the existing code assumes vectors will be power
of 2 sizes, which is already untrue, but becomes more noticeable
with v6 and v7 types.
Specifically the existing code assumes that half PowerOf2Ceil of
a given vector index will fit twice into a given vector.
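
The arithmetic behind that assumption can be sketched as follows. This is an illustration only: it applies the rounding to an element count rather than to the index used by the combiner, and `powerOf2Ceil` here is a local helper, not LLVM's function.

```cpp
#include <cassert>
#include <cstdint>

// Local helper: smallest power of two that is >= n (for n >= 1).
inline uint64_t powerOf2Ceil(uint64_t n) {
  uint64_t p = 1;
  while (p < n)
    p <<= 1;
  return p;
}

// True when two subvectors of width powerOf2Ceil(n) / 2 exactly cover a
// vector of n elements, which is what the old code took for granted.
inline bool halvesTileVector(uint64_t numElts) {
  uint64_t half = powerOf2Ceil(numElts) / 2;
  return 2 * half == numElts;
}

inline void checkHalves() {
  assert(halvesTileVector(8));  // 2 * 4 == 8
  assert(!halvesTileVector(6)); // 2 * 4 == 8, larger than 6
  assert(!halvesTileVector(7)); // 2 * 4 == 8, larger than 7
}
```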

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D103880
2021-06-11 08:58:16 +09:00
Craig Topper
4541f1286d [RISCV] Add test cases that show failure to use some W instructions if they are preceded by a load. NFC
The loads end up becoming sextload/zextload which prevent our
isel patterns from finding the sign_extend_inreg or AND instruction
we need.

The easiest way to fix this is to use computeKnownBits or
ComputeNumSignBits in our isel matching to catch this.
2021-06-10 16:55:49 -07:00
Sami Tolvanen
139e93a37c [IR] Value: Fix OpCode checks
Value::SubclassID cannot be directly compared to Instruction enums, such as
Instruction::{Call,Invoke,CallBr}. We have to first subtract InstructionVal
from the SubclassID to get the OpCode, similar to Instruction::getOpCode().
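
The pitfall can be sketched with a toy model. The enum values, the `InstructionVal` constant, and the `FakeValue` type below are hypothetical and do not reflect LLVM's actual numbering; they only illustrate why the offset has to be subtracted before comparing.

```cpp
#include <cassert>

// Hypothetical opcode numbering, for illustration only.
enum Opcode : unsigned { Call = 5, Invoke = 6 };

// Hypothetical base: instruction IDs are stored as InstructionVal + opcode.
constexpr unsigned InstructionVal = 100;

struct FakeValue {
  unsigned SubclassID;
};

// Buggy check: compares the raw ID against an opcode.
inline bool isCallBuggy(const FakeValue &v) { return v.SubclassID == Call; }

// Fixed check: subtracts the base first to recover the opcode.
inline bool isCallFixed(const FakeValue &v) {
  return v.SubclassID >= InstructionVal &&
         v.SubclassID - InstructionVal == Call;
}

inline void checkOpcode() {
  FakeValue call{InstructionVal + Call};
  assert(!isCallBuggy(call)); // raw ID 105 never equals 5
  assert(isCallFixed(call));
}
```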

Reviewed By: nickdesaulniers

Differential Revision: https://reviews.llvm.org/D104043
2021-06-10 16:46:33 -07:00
Wolfgang Pieb
249a0bf5b1 [static initializers] Emit global_ctors and global_dtors in reverse order when .ctors/.dtors are used.
Reviewed By: rnk, MaskRay, efriedma

Differential Revision: https://reviews.llvm.org/D103495
2021-06-10 16:44:47 -07:00
Slava Nikolaev
324eb2c007 LoadStoreVectorizer: support different operand orders in the add sequence match
First we refactor the code that matches non-wrapping add sequences:
we need to allow different operand orders for
the key add instructions involved in the match.

Then we use the refactored code trying 4 variants of matching operands.

Originally the code relied on the fact that the matching operands
of the two last add instructions of memory index calculations
had the same LHS argument. But which operand is the same
in the two instructions is actually not essential, so now we allow
that to be any of LHS or RHS of each of the two instructions.
This increases the chances of vectorization to happen.
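
The four operand pairings can be sketched on a toy representation. The `Add` struct and `matchSharedOperand` are hypothetical; integers stand in for IR values, and the real matcher performs further checks on the remaining operands that are omitted here.

```cpp
#include <cassert>

// Toy add instruction: operands are plain integer identifiers.
struct Add {
  int lhs;
  int rhs;
};

// Tries all four pairings (LHS/LHS, LHS/RHS, RHS/LHS, RHS/RHS). On success,
// returns true and writes the two operands that were not matched.
inline bool matchSharedOperand(const Add &a, const Add &b, int &otherA,
                               int &otherB) {
  const int aOps[2] = {a.lhs, a.rhs};
  const int bOps[2] = {b.lhs, b.rhs};
  for (int i = 0; i < 2; ++i)
    for (int j = 0; j < 2; ++j)
      if (aOps[i] == bOps[j]) {
        otherA = aOps[1 - i];
        otherB = bOps[1 - j];
        return true;
      }
  return false;
}

inline void checkOperandOrders() {
  int oa = 0, ob = 0;
  // Shared operand 1 is the LHS of the first add and the RHS of the second.
  assert(matchSharedOperand(Add{1, 2}, Add{3, 1}, oa, ob));
  assert(oa == 2 && ob == 3);
}
```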

Reviewed By: volkan

Differential Revision: https://reviews.llvm.org/D103912
2021-06-10 16:31:35 -07:00
Nick Desaulniers
e9e1661fa2 [IR] make -warn-frame-size into a module attr
-Wframe-larger-than= is an interesting warning; we can't know the frame
size until PrologueEpilogueInsertion (PEI); very late in the compilation
pipeline.

-Wframe-larger-than= was propagated through CC1 as an -mllvm flag, then
was a cl::opt in LLVM's PEI pass; this meant it was dropped during LTO
and needed to be re-specified via -plugin-opt.

Instead, make it part of the IR proper as a module level attribute,
similar to D103048. Introduce -fwarn-stack-size CC1 option.

Reviewed By: rsmith, qcolombet

Differential Revision: https://reviews.llvm.org/D103928
2021-06-10 16:15:27 -07:00
Andy Kaylor
52ecff94eb Preserve more MD_mem_parallel_loop_access and MD_access_group in SROA
SROA sometimes preserves MD_mem_parallel_loop_access and MD_access_group metadata on loads/stores, and sometimes fails to do so. This change adds copying of the MD after other CreateAlignedLoad/CreateAlignedStores. Also fix a case where the metadata was being copied from a load, rather than the store.

Added a LIT test to catch one case.

Patch by Mark Mendell

Differential Revision: https://reviews.llvm.org/D103254
2021-06-10 15:47:03 -07:00
Jessica Paquette
2a84a282fe [AArch64][GlobalISel] Legalize scalar G_CTTZ + G_CTTZ_ZERO_UNDEF
This adds legalization for scalar G_CTTZ and G_CTTZ_ZERO_UNDEF. Vector support
requires handling vector G_BITREVERSE, which I haven't gotten around to yet.

For G_CTTZ_ZERO_UNDEF, we just lower it to G_CTTZ.

For G_CTTZ, we match SelectionDAG's lowering to a G_BITREVERSE + G_CTLZ.
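
The identity behind this lowering is that counting trailing zeros equals counting leading zeros of the bit-reversed value. A minimal sketch on 32-bit integers, using hypothetical helper names and plain loops rather than any GlobalISel API:

```cpp
#include <cassert>
#include <cstdint>

// Reverses the bit order of a 32-bit value.
inline uint32_t bitReverse32(uint32_t x) {
  uint32_t r = 0;
  for (int i = 0; i < 32; ++i) {
    r = (r << 1) | (x & 1);
    x >>= 1;
  }
  return r;
}

// Counts leading zero bits; returns 32 for an input of 0.
inline unsigned ctlz32(uint32_t x) {
  unsigned n = 0;
  for (uint32_t mask = 0x80000000u; mask != 0 && (x & mask) == 0; mask >>= 1)
    ++n;
  return n;
}

// cttz expressed as ctlz(bitreverse(x)).
inline unsigned cttzViaBitReverse(uint32_t x) {
  return ctlz32(bitReverse32(x));
}

inline void checkCttz() {
  assert(cttzViaBitReverse(8) == 3);
  assert(cttzViaBitReverse(0) == 32);
}
```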

e.g. https://godbolt.org/z/nPEseYh1s

(With this patch, we have slightly worse codegen than SDAG for types smaller
than s32; it seems like we're missing a combine.)

Also, this adds in a function to build G_BITREVERSE to MachineIRBuilder.

Differential Revision: https://reviews.llvm.org/D104065
2021-06-10 15:29:51 -07:00
Joachim Meyer
6b73f118b0 [LV] Parallel annotated loop does not imply all loads can be hoisted.
As noted in https://bugs.llvm.org/show_bug.cgi?id=46666, the current behavior of assuming if-conversion safety when a loop is annotated parallel (`!llvm.loop.parallel_accesses`) is unexpected; the documentation for this behavior has since been removed from the LangRef, and the behavior can lead to invalid reads.
This was observed in POCL (https://github.com/pocl/pocl/issues/757) and would require similar workarounds in current work at hipSYCL.

The question remains why this was initially added and what the implications of removing this optimization would be.
Do we need an alternative mechanism to propagate the information about legality of if-conversion?
Or is the idea that conditional loads in `#pragma clang loop vectorize(assume_safety)` can be executed unmasked without additional checks flawed in general?
I think this implication is not part of what a user of that pragma (and corresponding metadata) would expect and thus dangerous.

Only two additional tests failed, which are adapted in this patch. Depending on the further direction force-ifcvt.ll should be removed or further adapted.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D103907
2021-06-10 23:37:57 +02:00
Sanjay Patel
2425855981 [SimplifyCFG] avoid 'tmp' variables in test file; NFC 2021-06-10 17:04:23 -04:00
David Green
b06694ed39 [ARM] Fix Changed status in MVEGatherScatterLoweringPass.
Now that we are calling SimplifyInstructionsInBlock, make sure we update
Changed when it reports alterations.
2021-06-10 21:53:04 +01:00
Philip Reames
f2dafac922 [LI] Add a cover function for checking if a loop is mustprogress [nfc]
Essentially, the cover function simply combines the loop level check and the function level scope into one call.  This simplifies several callers and is (subjectively) less error prone.
2021-06-10 13:37:32 -07:00
Philip Reames
d870f55264 [SCEV] Use mustprogress flag on loops (in addition to function attribute)
This addresses a performance regression reported against 3c6e4191.  That change (correctly) limited a transform based on assumed finiteness to mustprogress loops, but the previous change (38540d7) which introduced the mustprogress check utility only handled function attributes, not the loop metadata form.

It turns out that clang uses the function attribute form for C++, and the loop metadata form for C.  As a result, 3c6e4191 ended up being a large regression in practice for C code as loops weren't being considered mustprogress despite the language semantics.
2021-06-10 13:20:28 -07:00
Philip Reames
52d05589ca Move code for checking loop metadata into Analysis [nfc]
I need the mustprogress loop metadata in ScalarEvolution and it makes sense to keep all the accessors for querying loop metadata together.
2021-06-10 13:01:22 -07:00
LLVM GN Syncbot
c9694dc562 [gn build] Port bbb3d03f93b8 2021-06-10 19:39:58 +00:00
Michael Kruse
4460a4c76f [OpenMP] Implement '#pragma omp unroll'.
Implementation of the unroll directive introduced in OpenMP 5.1. Follows the approach from D76342 for the tile directive (i.e. AST-based, not using the OpenMPIRBuilder). Tries to use `llvm.loop.unroll.*` metadata where possible, but has to fall back to an AST representation of the outer loop if the partially unrolled generated loop is associated with another directive (because it needs to compute the number of iterations).

Reviewed By: ABataev

Differential Revision: https://reviews.llvm.org/D99459
2021-06-10 14:30:17 -05:00
David Green
a3bd30b5e4 [ARM] Ensure instructions are simplified prior to GatherScatter lowering.
Surprisingly, not all instructions are always simplified after unrolling
and before MVE gather/scatter lowering. Notably dead gather operations
can be left around which cause the gather/scatter lowering pass to crash
if there are multiple gathers, some of which are dead.

This patch ensures they are simplified before we modify anything, which
can change some of the existing tests, including making them no longer
test what they originally tested. This uses a combination of disabling
the gather/scatter lowering pass and adjusting the test to keep them as
before.

Differential Revision: https://reviews.llvm.org/D103150
2021-06-10 20:18:12 +01:00
Eric Astor
2afb636ed1 [ms] [llvm-ml] Warn on command-line redefinition
If a macro is defined on the command line and then overridden in the source code, this is likely to be an error in the user's build system. We should warn on this.

Reviewed By: thakis

Differential Revision: https://reviews.llvm.org/D104008
2021-06-10 14:20:21 -04:00
Jessica Paquette
46c84c116b [AArch64][GlobalISel] Mark some G_BITREVERSE types as legal + select them
We fall back on G_CTTZ_ZERO_UNDEF a lot when building clang for arm64 with
gisel.

Handling this will require that we can handle G_BITREVERSE.

This patch marks G_BITREVERSE instructions with natively supported types as
legal. We get selection on these types for free via the importer.

Differential Revision: https://reviews.llvm.org/D103999
2021-06-10 10:33:52 -07:00
Alexey Bataev
da3e13d49c [SLP]Disable scheduling of insertelements.
There is no need to schedule insertelement instructions. The compiler
did not schedule them before it started supporting their vectorization,
and it should not do so now. We pre-schedule them manually when finding
a build vector sequence.
Disabling scheduling of insertelement instructions improves compile
time and vectorization of the very large basic blocks by saving
scheduling budget for other instructions.

Differential Revision: https://reviews.llvm.org/D104026
2021-06-10 10:25:26 -07:00
Nico Weber
f63c735892 [gn build] minor TODO.txt update 2021-06-10 12:50:23 -04:00
David Tenty
dcf97ff292 [AIX] Build libLTO as MODULE rather than SHARED
On CMake versions >= 3.16 on AIX, shared libraries are
created as archives (which is the normal form for the platform). However,
plugin libraries which are passed directly to an executable, like
libLTO to the linker, are usually built as plain `.so`, so this patch
restores this behaviour for libLTO on AIX (and adjusts the name if need be
to account for the fact that llvm_add_library likes to force an empty
name prefix on modules), so we end up with the expected libLTO.so

Reviewed By: w2yehia

Differential Revision: https://reviews.llvm.org/D103824
2021-06-10 12:08:59 -04:00
Keith Smiley
040b1baf1d Fix range-loop-analysis warning
```
llvm-project/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:8024:19: warning: loop variable 'VF' of type 'const llvm::ElementCount' creates a copy from type 'const llvm::ElementCount' [-Wrange-loop-analysis]
  for (const auto VF : VFCandidates) {
                  ^
llvm-project/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:8024:8: note: use reference type 'const llvm::ElementCount &' to prevent copying
  for (const auto VF : VFCandidates) {
       ^~~~~~~~~~~~~~~
                  &
1 warning generated.
```

Differential Revision: https://reviews.llvm.org/D103970
2021-06-10 08:39:54 -07:00
gbreynoo
7e227c0d37 [docs][llvm-ar] Add rsp-quoting option to the llvm-ar command guide.
I noticed that I did not update the command guide when introducing the
--rsp-quoting option. This change fixes this.

Differential Revision: https://reviews.llvm.org/D103915
2021-06-10 16:32:31 +01:00
Benjamin Kramer
4abec25573 [AArch64] Silence fallthrough warning. NFC.
AArch64TargetTransformInfo.cpp:302:3: warning: unannotated fall-through between switch labels [-Wimplicit-fallthrough]
  default:
    ^
2021-06-10 17:23:37 +02:00
Luo, Yuanke
83aebff64b [X86][NFC] Fix typo. 2021-06-10 22:49:11 +08:00
Paul C. Anagnostopoulos
5171fb6939 [TableGen] Eliminate dead code in ParseForeachDeclaration [NFC]
Differential Revision: https://reviews.llvm.org/D103904
2021-06-10 10:34:44 -04:00
Irina Dobrescu
e44ee5c21e [AArch64] Add cost tests for bitreverse
This patch includes cost tests for bit reverse as well as some adjustments to the cost model.

Differential Revision: https://reviews.llvm.org/D102755
2021-06-10 14:51:33 +01:00
David Green
b47792a75b [ARM] Skip debug during vpt block creation
Debug info is currently preventing VPT block creation, leading to
different codegen. This patch attempts to skip any debug instructions
during vpt block creation, making sure they do not interfere.

Differential Revision: https://reviews.llvm.org/D103610
2021-06-10 14:49:04 +01:00
David Green
f929d3380c [ARM] MVE VPT block tests with debug info. NFC 2021-06-10 14:49:04 +01:00
Caroline Concatto
e0c2f7cb50 [InstCombine] Add fold for extracting known elements from a stepvector
This patch allows folding stepvector + extract to the lane when the lane is
lower than the minimum size of the scalable vector. This fold is possible
because lane X of a stepvector is also X!
For instance, extracting element 3 of a <vscale x 4 x i64> stepvector yields 3.
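
The guard on the fold can be sketched as below. The function name and the use of -1 as a "no fold" marker are hypothetical; `minNumElts` stands for the known minimum element count of the scalable vector (4 for <vscale x 4 x i64>).

```cpp
#include <cassert>
#include <cstdint>

// Returns the folded constant when the lane is known to exist for every
// vscale, or -1 when the fold does not apply.
inline int64_t foldExtractOfStepVector(uint64_t lane, uint64_t minNumElts) {
  if (lane >= minNumElts)
    return -1; // lane may not exist when vscale == 1
  return static_cast<int64_t>(lane); // lane X of a stepvector holds X
}

inline void checkStepVectorFold() {
  assert(foldExtractOfStepVector(3, 4) == 3);  // the example above
  assert(foldExtractOfStepVector(4, 4) == -1); // not guaranteed to exist
}
```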

Differential Revision: https://reviews.llvm.org/D103153
2021-06-10 13:36:57 +01:00
Eric Astor
db5d12aba7 [ms] [llvm-ml] Make variable redefinition match ML.EXE
MASM specifies that all variable definitions are redefinable, except for EQU definitions to expressions. (TEXTEQU is unspecified, but appears to be fully redefinable as well.)

Also, in practice, ML.EXE allows redefinitions where the value doesn't change.

Make variable redefinition possible for text macros, suppressing expansion if written as the first argument to an EQU or TEXTEQU directive.

Reviewed By: thakis

Differential Revision: https://reviews.llvm.org/D103993
2021-06-10 08:36:15 -04:00
Caroline Concatto
7714042905 [InstSimplify] Add constant fold for extractelement + splat for scalable vectors
This patch allows folding extractelement of a constant splat for scalable vectors,
but only when the lane index is lower than the minimum number of elements of the vector.

Differential Revision: https://reviews.llvm.org/D103180
2021-06-10 12:41:40 +01:00