1
0
mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-11-22 10:42:39 +01:00
Commit Graph

216384 Commits

Author SHA1 Message Date
Philip Reames
88bae72814 [unroll] Use value domain for symbolic execution based cost model
The current full unroll cost model does a symbolic evaluation of the loop up to a fixed limit. That symbolic evaluation currently simplifies to constants, but we can generalize to arbitrary Values using the InstructionSimplify infrastructure at very low cost.

By itself, this enables some simplifications, but it's mainly useful when combined with the branch simplification over in D102928.

Differential Revision: https://reviews.llvm.org/D102934
2021-05-26 08:41:25 -07:00
Jonas Paulsson
0a9439773a [SystemZ] Support i128 inline asm operands.
Support virtual, physical and tied i128 register operands in inline assembly.

i128 is on SystemZ not really supported and is not a legal type and generally
such a value will be split into two i64 parts. There are however some
instructions that require a pair of two GPR64 registers contained in the GR128
bit reg class, which is untyped.

For inline assmebly operands, it proved to be very cumbersome to first follow
the general behavior of splitting an i128 operand into two parts and then
later rebuild the INLINEASM MI to have one GR128 register. Instead, some
minor common code changes were made to SelectionDAGBUilder to only create one
GR128 register part to begin with. In particular:

- getNumRegisters() now has an optional parameter "RegisterVT" which is
  passed by AddInlineAsmOperands() and GetRegistersForValue().

- The bitcasting in GetRegistersForValue is not performed if RegVT is
  Untyped.

- The RC for a tied use in AddInlineAsmOperands() is now computed either from
  the tied def (virtual register), or by getMinimalPhysRegClass() (physical
  register).

- InstrEmitter.cpp:EmitCopyFromReg() has been fixed so that the register
  class (DstRC) can also be computed for an illegal type.

In the SystemZ backend getNumRegisters(), splitValueIntoRegisterParts() and
joinRegisterPartsIntoValue() have been implemented to handle i128 operands.

Differential Revision: https://reviews.llvm.org/D100788

Review: Ulrich Weigand
2021-05-26 10:08:32 -05:00
Andrea Di Biagio
cdf30f3bbb [MCA] Add a test for PR50483. NFC 2021-05-26 15:52:11 +01:00
Anirudh Prasad
0affa6b4e3 [SystemZ][z/OS] Enable the AllowAtInName attribute for the HLASM dialect
- Currently, LLVM supports symbols of the name "token1@token2".
- "token2" is used to identify whether an appropriate symbol reference can be used for the symbol.
- Now, if the symbol reference couldn't be found, the AsmParser usually emits an error, unless the backend is configured to accept the "@" in a symbol name
- Thus, this patch aims to do that. It sets the `AllowAtInName` attribute in the SystemZ backend for the HLASM dialect.
- Setting this attribute ensures that, if a particular symbol reference is found, it uses that. If it doesn't, and there exists an "@" in the symbol name, it will use that instead of explicitly erroring out.

Reviewed By: uweigand

Differential Revision: https://reviews.llvm.org/D103111
2021-05-26 10:49:57 -04:00
jweightma
b8e979ccdd [AMDGPU] Fix function pointer argument bug in AMDGPU Propagate Attributes pass.
This patch fixes a bug in the AMDGPU Propagate Attributes pass where a call
instruction with a function pointer argument is identified as a user of the
passed function, and illegally replaces the called function of the
instruction with the function argument.

For example, given functions f and g with appropriate types, the following
illegal transformation could occur without this fix:
call void @f(void ()* @g)
-->
call void @g(void ()* @g.1)

The solution introduced in this patch is to prevent the cloning and
substitution if the instruction's called function and the function which
might be cloned do not match.

Reviewed By: arsenm, madhur13490

Differential Revision: https://reviews.llvm.org/D101847
2021-05-26 16:40:15 +02:00
Anirudh Prasad
f106aa368d [SystemZ][z/OS] Validate symbol names for z/OS for printing without quotes
- Currently, before printing a label in MCSymbol.cpp (MCSymbol::print), the current code "validates" the label that is to be printed.
- If it fails the validation step, then it prints the label within double quotes.
- However, the validation is provided as a virtual function in MCAsmInfo.h (i.e. isAcceptableChar() function). So we can override this for the AD_HLASM dialect in SystemZMCAsmInfo.cpp.

Reviewed By: uweigand

Differential Revision: https://reviews.llvm.org/D103091
2021-05-26 10:37:09 -04:00
Luo, Yuanke
47e83818b4 [X86][AMX] Fix a bug on tile config.
The previous code detect if a MBB is bottom block to determine if it is
a backedge of a loop. We should check latch block instead of bottom
block and we should check the header and the bottom block are in the
same loop.

Differential Revision: https://reviews.llvm.org/D103145
2021-05-26 21:57:49 +08:00
Sjoerd Meijer
4ed1b287cc [CostModel][AArch64] Add tests for bitreverse. NFC. 2021-05-26 14:56:58 +01:00
David Green
6a115c5260 [ARM] Extra test for reverted WLS memset. NFC 2021-05-26 14:54:36 +01:00
Simon Pilgrim
5e9b7fabd4 [X86][SSE] Regenerate some tests to expose the rip relative vector/broadcast loads 2021-05-26 14:50:47 +01:00
Andrea Di Biagio
c31b5f70ca [MCA][InOrderIssueStage] Fix LastWriteBackCycle computation.
Conservatively use the instruction latency to compute the last write-back cycle.
Before this patch, the last write cycle computation was incorrect for store
instructions that didn't declare any register writes.
2021-05-26 14:17:43 +01:00
Alexey Bataev
14c21c99ec [SLP][NFC]Add a test for multiple uses of insertelement instruction,
NFC.
2021-05-26 06:17:03 -07:00
Kerry McLaughlin
d947615252 [LoopVectorize] Enable strict reductions when allowReordering() returns false
When loop hints are passed via metadata, the allowReordering function
in LoopVectorizationLegality will allow the order of floating point
operations to be changed:

  bool allowReordering() const {
    // When enabling loop hints are provided we allow the vectorizer to change
    // the order of operations that is given by the scalar loop. This is not
    // enabled by default because can be unsafe or inefficient.

The -enable-strict-reductions flag introduced in D98435 will currently only
vectorize reductions in-loop if hints are used, since canVectorizeFPMath()
will return false if reordering is not allowed.

This patch changes canVectorizeFPMath() to query whether it is safe to
vectorize the loop with ordered reductions if no hints are used. For
testing purposes, an additional flag (-hints-allow-reordering) has been
added to disable the reordering behaviour described above.

Reviewed By: sdesmalen

Differential Revision: https://reviews.llvm.org/D101836
2021-05-26 13:59:12 +01:00
Max Kazantsev
dd9aaa06a7 Return "[LoopDeletion] Break backedge if we can prove that the loop is exited on 1st iteration" (try 2)
The patch was reverted due to compile time impact of contextual SCEV
queries. It also appeared that it introduced a miscompile on irreducible CFG.

Changes made:
1. isKnownPredicateAt is replaced with more lightweight isKnownPredicate;
2. Irreducible CFG in live code is now detected and excluded from processing.

Differential Revision: https://reviews.llvm.org/D102615
2021-05-26 19:47:14 +07:00
Sanjay Patel
594f982cda [InstCombine] add fmul tests with shared operand; NFC
Baseline tests for:
D102698
2021-05-26 08:32:08 -04:00
Sanjay Patel
3b7139d3ee [InstCombine] avoid 'tmp' usage in test files; NFC
The update script ( utils/update_test_checks.py ) warns against this.
2021-05-26 08:32:07 -04:00
Sanjay Patel
7bd5fdb9c0 [InstCombine] avoid 'tmp' usage in test file; NFC
The update script ( utils/update_test_checks.py ) warns against this.
2021-05-26 08:32:07 -04:00
Max Kazantsev
5f3ecc5640 Revert "Return "[LoopDeletion] Break backedge if we can prove that the loop is exited on 1st iteration""
This reverts commit 43d2e51c2e86788b9e2a582fdd3d8ffa7829328a.

Commited wrong version.
2021-05-26 19:29:07 +07:00
Max Kazantsev
5577bed3a9 Return "[LoopDeletion] Break backedge if we can prove that the loop is exited on 1st iteration"
The patch was reverted due to compile time impact of contextual SCEV
queries. It also appeared that it introduced a miscompile on irreducible CFG.

Changes made:
1. isKnownPredicateAt is replaced with more lightweight isKnownPredicate;
2. Irreducible CFG in live code is now detected and excluded from processing.

Differential Revision: https://reviews.llvm.org/D102615
2021-05-26 19:23:21 +07:00
Andrew Savonichev
44a60c9c6a [AArch64] Generate LD1 for anyext i8 or i16 vector load
The existing LD1 patterns do not cover cases where result type does
not match the memory type. This happens when illegal vector types are
extended and scalarized, for example:

  load <2 x i16>* %v2i16

is lowered into:

  // first element
  (v4i32 (insert_subvector (v2i32 (scalar_to_vector (load anyext from i16)))))
  // other elements
  (v4i32 (insert_vector_elt (i32 (load anyext from i16)) idx))

Before this patch these patterns were compiled into LDR + INS.
Now they are compiled into LD1.

The problem was reported in
PR24820: LLVM Generates abysmal code in simple situation.

Differential Revision: https://reviews.llvm.org/D102938
2021-05-26 14:44:21 +03:00
Max Kazantsev
65b3a9381f [Test] Add Loop Deletion test with irreducible CFG
Authored by Mikael Holmén. It demonstrated miscompile on irreducible
CFG with patch "[LoopDeletion] Break backedge if we can prove that the loop is exited on 1st iteration".
The patch is reverted. Checking in the test to make sure this bug
does not return.
2021-05-26 18:40:14 +07:00
Tomas Matheson
d1c0dd082f [MC] Move elf-unique-sections-by-flags.ll to X86/ 2021-05-26 12:28:17 +01:00
pooja2299
e5cec4b7a2 [Docs] Updated the content of getting started documentation under llvm/lib/MC
Wrote about llvm/lib/MC subproject on https://llvm.org/docs/GettingStarted.html page.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D101047
2021-05-26 16:25:26 +05:30
Tomas Matheson
79405b0d62 [MC][ELF] Emit unique sections for different flags
Global values imply flags such as readable, writable, executable for the
sections that they will be placed in. Currently MC places all such
entries into the same section, using the first set of flags seen. This
can lead to situations in LTO where a writable global is placed in the
same named section as a readable global from another file, and the
section may not be marked writable.

D72194 ensures that mergeable globals with explicit sections are placed
in separate sections with compatible entry size, by emitting the
`unique` assembly syntax where appropriate. This change extends that
approach to include section flags, so that globals with different
section flags are emitted in separate unique sections.

Differential revision: https://reviews.llvm.org/D100944
2021-05-26 11:51:29 +01:00
Tomas Matheson
4b71dc385f [MC][NFCI] Factor out ELF section unique ID calculation
Precursor to D100944. The logic for determining the unique ID had become
quite difficult to reason about, so I have factored this out into a
separate function.

Differential Revision: https://reviews.llvm.org/D102336
2021-05-26 11:51:29 +01:00
Simon Pilgrim
6128b86b1d [X86][SLM] Fix vector PSHUFB + variable shift resource/throughputs
Match whats documented in the Intel AOM (+Agner) - PSHUFB xmm is really slow, and mmx/xmm vector shifts are half rate.

Noticed while working to get the cost tables to more closely match llvm-mca analysis, in this case for shifts and truncations.
2021-05-26 11:14:21 +01:00
Florian Hahn
dc7c658d73 [SCEV] Add tests with signed predicates for applyLoopGuards. 2021-05-26 11:10:11 +01:00
Kerry McLaughlin
fe86cc6dc2 [NFC] Add CHECK lines for unordered FP reductions
An additional RUN line has been added to both strict-fadd.ll &
scalable-strict-fadd.ll to ensure the correct behaviour of these
tests where `-enable-strict-reductions` is false.

Reviewed By: david-arm

Differential Revision: https://reviews.llvm.org/D103015
2021-05-26 11:00:20 +01:00
Mirko Brkusanin
f5f0329306 [AMDGPU][GlobalISel] Stop foldInsertEltToCmpSelect from changing reg banks
This function can change regbank for registers which already have a selected
bank. Depending on the instruction where these registers were used it can
cause instruction selection to fail.

Differential Revision: https://reviews.llvm.org/D98515
2021-05-26 11:57:41 +02:00
Mirko Brkusanin
7eda2c55b2 Revert "[AMDGPU][GlobalISel] Stop foldInsertEltToCmpSelect from changing reg banks"
This reverts commit 18c5444702893fd63b0a99ec7133dd714284f9d2.
2021-05-26 11:57:41 +02:00
Fraser Cormack
691b0e7fba [RISCV] Pre-commit fixed-length mask vselect tests
These are default-expanded but later unrolled due to RISC-V's vector
boolean content policy. A patch to improve this codegen will follow
shortly.
2021-05-26 10:44:45 +01:00
Max Kazantsev
35b10351c0 [Test] Add simplified versions of tests for loop deletion that don't need context 2021-05-26 16:39:00 +07:00
Tim Northover
22efee723c AArch64: support post-indexed stores to bfloat types. 2021-05-26 10:35:52 +01:00
Simon Pilgrim
215df0a901 [CostModel][X86] Remove old testshift* tests
The vector shift cost tests are better covered (more cpu/sse levels) by the vshift-*-*cost files, and we're trying to avoid codegen tests in here as it makes it harder to maintain the test files.
2021-05-26 10:31:00 +01:00
Simon Pilgrim
f18fae2383 [X86][Atom] Fix vector variable shift resource/throughputs
Match whats documented in the Intel AOM - the non-immediate variants of the PSLL*/PSRA*/PSRL* shift instructions requires BOTH ports - this was being incorrectly modelled as EITHER port.

Now that we can use in-order models in llvm-mca, the atom model is a good "worst case scenario" analysis for x86.
2021-05-26 10:30:59 +01:00
Max Kazantsev
2adf59415e [Test] Add test on unrolling to make sure it won't fail
Initially it failed an assertion with "Do actual DCE in LoopUnroll (try 2)"
which was later reverted. Make sure that when this patch is returned, the
test works fine.
2021-05-26 16:30:41 +07:00
Roman Lebedev
395eb3f2e5 [NFC][X86] clang-format X86TTIImpl::getInterleavedMemoryOpCostAVX2()
I plan to make changes to it, and undoing formatting each time is not going to be fun.
2021-05-26 12:27:47 +03:00
David Sherwood
8526f9ba5b Fix warning introduced by 9c766f4090d19e3e2f56e87164177f8c3eba4b96 2021-05-26 10:20:39 +01:00
David Sherwood
8067824e3d [InstCombine] Fold extractelement + vector GEP with one use
We sometimes see code like this:

Case 1:
  %gep = getelementptr i32, i32* %a, <2 x i64> %splat
  %ext = extractelement <2 x i32*> %gep, i32 0

or this:

Case 2:
  %gep = getelementptr i32, <4 x i32*> %a, i64 1
  %ext = extractelement <4 x i32*> %gep, i32 0

where there is only one use of the GEP. In such cases it makes
sense to fold the two together such that we create a scalar GEP:

Case 1:
  %ext = extractelement <2 x i64> %splat, i32 0
  %gep = getelementptr i32, i32* %a, i64 %ext

Case 2:
  %ext = extractelement <2 x i32*> %a, i32 0
  %gep = getelementptr i32, i32* %ext, i64 1

This may create further folding opportunities as a result, i.e.
the extract of a splat vector can be completely eliminated. Also,
even for the general case where the vector operand is not a splat
it seems beneficial to create a scalar GEP and extract the scalar
element from the operand. Therefore, in this patch I've assumed
that a scalar GEP is always preferrable to a vector GEP and have
added code to unconditionally fold the extract + GEP.

I haven't added folds for the case when we have both a vector of
pointers and a vector of indices, since this would require
generating an additional extractelement operation.

Tests have been added here:

  Transforms/InstCombine/gep-vector-indices.ll

Differential Revision: https://reviews.llvm.org/D101900
2021-05-26 09:54:26 +01:00
Esme-Yi
b08bc2bd30 [NFC][object] Change the input parameter of the method isDebugSection.
Summary: This is a NFC patch to change the input parameter of the method SectionRef::isDebugSection(), by replacing the StringRef SectionName with DataRefImpl Sec. This allows us to determine if a section is debug type in more ways than just by section name.

Reviewed By: jhenderson

Differential Revision: https://reviews.llvm.org/D102601
2021-05-26 08:47:53 +00:00
David Green
75476d7b62 [ARM] Add patterns for vmulh
Now that vmulh can be selected, this adds the MVE patterns to make it
legal and generate instructions.

Differential Revision: https://reviews.llvm.org/D88011
2021-05-26 09:22:12 +01:00
LLVM GN Syncbot
4876ff1ec7 [gn build] Port 36d0fdf9ac3b 2021-05-26 04:31:12 +00:00
Arthur Eubanks
9085f7d6c9 [OpaquePtr] Make atomicrmw work with opaque pointers
FullTy is only necessary when we need to figure out what type an
instruction works with given a pointer's pointee type. However, we just
end up using the value operand's type, so FullTy isn't necessary.

Reviewed By: dblaikie

Differential Revision: https://reviews.llvm.org/D102788
2021-05-25 20:16:21 -07:00
Teresa Johnson
9d60059fb7 [LTT] Handle merged llvm.assume when dropping type tests
When the lower type test pass is invoked a second time with
DropTypeTests set to true, it expects that all remaining type tests feed
assume instructions, which are removed along with the type tests.

In some cases the llvm.assume might have been merged with another one,
i.e. from a builtin_assume instruction, in which case the type test
would actually feed a phi that in turn feeds the merged assume
instruction. In this case we can simply replace that operand of the phi
with "true" before removing the type test.

Differential Revision: https://reviews.llvm.org/D103073
2021-05-25 17:02:13 -07:00
Arthur Eubanks
a1a83c59b0 [OpaquePtr] Create new bitcode encoding for atomicrmw
Since the opaque pointer type won't contain the pointee type, we need to
separately encode the value type for an atomicrmw.

Emit this new code for atomicrmw.

Handle this new code and the old one in the bitcode reader.

Reviewed By: dblaikie

Differential Revision: https://reviews.llvm.org/D103123
2021-05-25 16:30:34 -07:00
Kevin Athey
671838d601 LLVM Detailed IR tests for introduction of flag -fsanitize-address-detect-stack-use-after-return-mode.
Rework all tests that interact with use after return to correctly handle the case where the mode has been explicitly set to Never or Always.

for issue: https://github.com/google/sanitizers/issues/1394

Reviewed By: vitalybuka

Differential Revision: https://reviews.llvm.org/D102462
2021-05-25 16:17:39 -07:00
Alexandre Ganea
27303db33a [benchmark] Silence 'suggest override' and 'missing override' warnings
When building with Clang 11 on Windows, silence the following:

F:\aganea\llvm-project\llvm\utils\benchmark\include\benchmark/benchmark.h(955,8): warning: 'Run' overrides a member function but is not marked 'override' [-Wsuggest-override]
  void Run(State& st);
       ^
F:\aganea\llvm-project\llvm\utils\benchmark\include\benchmark/benchmark.h(895,16): note: overridden virtual function is here
  virtual void Run(State& state) = 0;
               ^
1 warning generated.
2021-05-25 18:46:37 -04:00
David Green
21ada5fffa [ARM] Extra predicated tests for VMULH. NFC 2021-05-25 22:24:06 +01:00
Fangrui Song
875f91ef4b [Internalize] Rename instead of removal if a to-be-internalized comdat has more than one member
Beside the `comdat any` deduplication feature, instrumentations use comdat to
establish dependencies among a group of sections, to prevent section based
linker garbage collection from discarding some members without discarding all.
LangRef acknowledges this usage with the following wording:

> All global objects that specify this key will only end up in the final object file if the linker chooses that key over some other key.

On ELF, for PGO instrumentation, a `__llvm_prf_cnts` section and its associated
`__llvm_prf_data` section are placed in the same GRP_COMDAT group.  A
`__llvm_prf_data` is usually not referenced and expects the liveness of its
associated `__llvm_prf_cnts` to retain it.

The `setComdat(nullptr)` code (added by D10679) in InternalizePass can break the
use case (a `__llvm_prf_data` may be dropped with its associated `__llvm_prf_cnts` retained).
The main goal of this patch is to fix the dependency relationship.

I think it makes sense for InternalizePass to internalize a comdat and thus
suppress the deduplication feature, e.g. a relocatable link of a regular LTO can
create an object file affected by InternalizePass.
If a non-internal comdat in a.o is prevailed by an internal comdat in b.o, the
a.o references to the comdat definitions will be non-resolvable (references
cannot bind to STB_LOCAL definitions in b.o).

On PE-COFF, for a non-external selection symbol, deduplication is naturally
suppressed with link.exe and lld-link. However, this is fuzzy on ELF and I tend
to believe the spec creator has not thought about this use case (see D102973).

GNU ld and gold are still using the "signature is name based" interpretation.
So even if D102973 for ld.lld is accepted, for portability, a better approach is
to rename the comdat. A comdat with one single member is the common case,
leaving the comdat can waste (sizeof(Elf64_Shdr)+4*2) bytes, so we optimize by
deleting the comdat; otherwise we rename the comdat.

Reviewed By: tejohnson

Differential Revision: https://reviews.llvm.org/D103043
2021-05-25 14:15:27 -07:00
Matt Morehouse
86272fe99b Revert "[LoopDeletion] Break backedge if we can prove that the loop is exited on 1st iteration"
This reverts commit 2531fd70d19aa5d61feb533bbdeee7717a4129eb due to
performance regression on the PPC buildbot.
2021-05-25 13:58:42 -07:00