1
0
mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-11-22 18:54:02 +01:00
Commit Graph

218060 Commits

Author SHA1 Message Date
Albion Fung
2776c1ab5d [PowerPC] Implament Load and Reserve and Store Conditional Builtins
This patch implaments the load and reserve and store conditional
builtins for the PowerPC target, in order to have feature parody with
xlC on AIX.

Differential revision: https://reviews.llvm.org/D105236
2021-07-05 21:35:41 -05:00
David Green
797d9342f0 [ARM] Fix arm.mve.pred.v2i range upper limit
The range metadata specifies a half open range, so our top limit was one
off.
2021-07-05 21:06:30 +01:00
Akira Hatanaka
0cc046d72c [ObjC][ARC] Prevent moving objc_retain calls past objc_release calls
that release the retained object

This patch fixes what looks like a longstanding bug in ARC optimizer
where it reverses the order of objc_retain calls and objc_release calls
that retain and release the same object.

The code in ARC optimizer that is responsible for code motion takes the
following steps:

1. Traverse the CFG bottom-up and determine how far up objc_release
   calls can be moved. Determine the insertion points for the
   objc_release calls, but don't actually move them.
2. Traverse the CFG top-down and determine how far down objc_retain
   calls can be moved. Determine the insertion points for the
   objc_retain calls, but don't actually move them.
3. Try to move the objc_retain and objc_release calls if they can't be
   removed.

The problem is that the insertion points for the objc_retain calls are
determined in step 2 without taking into consideration the insertion
points for objc_release calls determined in step 1, so the order of an
objc_retain call and an objc_release call can be reversed, which is
incorrect, even though each step is correct in isolation.

To fix this bug, this patch teaches the top-down traversal step to take
into consideration the insertion points for objc_release calls
determined in the bottom-up traversal step. Code motion for an
objc_retain call is disabled if there is a possibility that it can be
moved past an objc_release call that releases the retained object.

rdar://79292791

Differential Revision: https://reviews.llvm.org/D104953
2021-07-05 12:16:15 -07:00
Nico Weber
27befb62e0 [gn build] (manually) port 98f078324fc5 (llvm-strings Opts.td) 2021-07-05 14:43:05 -04:00
Sushma Unnibhavi
44791e33f4 [M68k][GloballSel] Lower outgoing return values in IRTranslator
Implementation of lowerReturn in the IRTranslator for the M68k backend.

Differential Revision: https://reviews.llvm.org/D105332
2021-07-05 11:39:09 -07:00
Fangrui Song
98d2a19fea [llvm-strings] Switch command line parsing from llvm::cl to OptTable
Some behavior changes:

* `-t=d` is removed. Use `-t d` instead.
* one-dash long options like `-all` are supported. Use `--all` instead.
* `--all=0` or `--all=false` cannot be used. (Note: `--all` is silently ignored anyway)
* `--help-list` is removed. This is a `cl::` specific option.

Nobody is likely leveraging any of the above.

Advantages:

* `-t` diagnostic gets improved.
* in the absence of `HideUnrelatedOptions`, `--help` will not list unrelated options if linking against libLLVM-13git.so or linker GC is not used.
* Decrease the probability of cl::opt collision if we do decide to support multiplexing

Note: because the tool is so simple, used more for forensics instead of a building
tool, and its long options are unlikely used in one-dash form, I just drop the
one-dash form in this patch.

Reviewed By: jhenderson

Differential Revision: https://reviews.llvm.org/D104889
2021-07-05 10:46:17 -07:00
Akira Hatanaka
bb430adbb5 Precommit another test for https://reviews.llvm.org/D104953 2021-07-05 10:28:03 -07:00
Tiehu Zhang
66e4833ec7 [AArch64ISelDAGToDAG] Fix ORRWrs/ORRXrs usefulbits calculation bug
For the following case:

    t8: i32 = or t7, t4
    t10: i32 = ORRWrs t8, t8, TargetConstant:i32<73>

Current code wrongly returns (t8 >> shiftConstant) as the
UsefulBits of t8, which in fact is (t8 | (t8 >> shiftConstant)).

Reviewed by: sdesmalen, mdchen

Differential Revision: https://reviews.llvm.org/D102759
2021-07-06 00:38:42 +08:00
Paul Walker
3ed0563b6c Fix typo in help text for -aarch64-enable-branch-targets. 2021-07-05 16:15:40 +01:00
Anirudh Prasad
5a7de1d054 [MCParser][z/OS] Mark a few tests as unsupported for the z/OS Target
- Background here is that that these sets of tests are "invalid" to be run on z/OS
- The reason is because these test constructs that HLASM never supports (HLASM doesn't support GNU style directives)
- Usually tests are geared towards a particular target via the use of a triple that targets just that platform, but these tests require the use of a "default triple"
- Thus, we mark these tests as "UNSUPPORTED" for z/OS since we don't want to run these for z/OS

Reviewed By: yusra.syeda, abhina.sreeskantharajan

Differential Revision: https://reviews.llvm.org/D105204
2021-07-05 11:06:52 -04:00
Florian Hahn
137df65805 [LV] Extend FIXME in test add in 91ee1e379901af3. 2021-07-05 15:56:03 +01:00
Florian Hahn
494eb99702 [LV] Add initial test cases with small clamped indices. 2021-07-05 15:51:12 +01:00
Sanjay Patel
4906453b2f [InstCombine] fold icmp slt/sgt of offset value with constant
This follows up patches for the unsigned siblings:
0c400e895306
c7b658aeb526

We are translating an offset signed compare to its
unsigned equivalent when one end of the range is
at the limit (zero or unsigned max).

(X + C2) >s C --> X <u (SMAX - C) (if C == C2 - 1)
(X + C2) <s C --> X >u (C ^ SMAX) (if C == C2)

This probably does not show up much in IR derived
from C/C++ source because that would likely have
'nsw', and we have folds for that already.

As with the previous unsigned transforms, the folds
could be generalized to handle non-constant patterns:

https://alive2.llvm.org/ce/z/Y8Xrrm

  ; sgt
  define i1 @src(i8 %a, i8 %c) {
    %c2 = add i8 %c, 1
    %t = add i8 %a, %c2
    %ov = icmp sgt i8 %t, %c
    ret i1 %ov
  }

  define i1 @tgt(i8 %a, i8 %c) {
    %c_off = sub i8 127, %c ; SMAX
    %ov = icmp ult i8 %a, %c_off
    ret i1 %ov
  }

https://alive2.llvm.org/ce/z/c8uhnk

  ; slt
  define i1 @src(i8 %a, i8 %c) {
    %t = add i8 %a, %c
    %ov = icmp slt i8 %t, %c
    ret i1 %ov
  }

  define i1 @tgt(i8 %a, i8 %c) {
    %c_offnot = xor i8 %c, 127 ; SMAX
    %ov = icmp ugt i8 %a, %c_offnot
    ret i1 %ov
  }
2021-07-05 10:08:31 -04:00
Sanjay Patel
0acfe2bcae [InstCombine][tests] add tests for signed icmp with constant and offset; NFC 2021-07-05 10:08:31 -04:00
Caroline Concatto
1631d2fbaa [AArch64][CostModel] Add cost model for experimental.vector.splice
This patch adds a new  ShuffleKind SK_Splice and then handle the cost in
getShuffleCost, as in experimental.vector.reverse.

Differential Revision: https://reviews.llvm.org/D104630
2021-07-05 14:30:24 +01:00
Wang, Pengfei
cd7ebd1f50 [X86] Twist shuffle mask when fold HOP(SHUFFLE(X,Y),SHUFFLE(X,Y)) -> SHUFFLE(HOP(X,Y))
This patch fixes PR50823.

The shuffle mask should be twisted twice before gotten the correct one due to the difference between inner HOP and outer.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D104903
2021-07-05 21:29:42 +08:00
Simon Pilgrim
8ed3c99d18 [CostModel][X86] Handle costs for insert/extractelement with non-immediate indices via stack
Determine the insert/extractelement costs when performing this as a sequence of aliased loads+stores via the stack.
2021-07-05 13:26:53 +01:00
Simon Pilgrim
4d75d8ef44 [CostModel][X86] Adjust i32/i64 to f32/f64 scalar based on llvm-mca reports (+ Agner).
Older SSE targets have slower gpr->fpu scalar conversions - we also need to account for uitofp i32 > f32/f64 being lowered as sitofp i64 -> f32/f64
2021-07-05 13:26:53 +01:00
Sanjay Patel
ab96742bc5 [InstSimplify] fold extractelement of splat with variable extract index
We already have a fold for variable index with constant vector,
but if we can determine a scalar splat value, then it does not
matter whether that value is constant or not.

We overlooked this fold in D102404 and earlier patches,
but the fixed vector variant is shown in:
https://llvm.org/PR50817

Alive2 agrees on that:
https://alive2.llvm.org/ce/z/HpijPC

The same logic applies to scalable vectors.

Differential Revision: https://reviews.llvm.org/D104867
2021-07-05 08:19:40 -04:00
Caroline Concatto
eae00d0f38 [SLPVectorizer] Fix crash in vectorizeChainsInBlock for scalable vector.
The function vectorizeChainsInBlock does not support scalable vector,
because function like canReuseExtract and isCommutative in the code
path assert with scalable vectors.

This patch avoids vectorizing blocks that have extract instructions with scalable
vector..

Differential Revision: https://reviews.llvm.org/D104809
2021-07-05 12:43:41 +01:00
Bradley Smith
e338c9199a [AArch64][SVE] Improve fixed length codegen for common vector shuffle case
Improve codegen when lowering the common vector shuffle case from the
vectorizer (op1[last]:op2[0:last-1]). This patch only handles this
common case as it is difficult to handle this more generally when using
fixed length vectors, due to being unable to use the SVE ext instruction.

Differential Revision: https://reviews.llvm.org/D105289
2021-07-05 12:09:27 +01:00
David Stuttard
4a7f8b5385 [DAGCombiner] Add support for mulhi const folding in DAGCombiner
Differential Revision: https://reviews.llvm.org/D103323

Change-Id: I4ffaaa32301795ba8a339567a68e77fe0862b869
2021-07-05 12:01:26 +01:00
David Stuttard
d546220b1b [DAGCombiner] Pre-commit test to demonstrate mulhi const folding
D103323 will fold this

Differential Revision: https://reviews.llvm.org/D105424

Change-Id: I64947215eb531fbd70b52a72203b39e43fefafcc
2021-07-05 11:34:38 +01:00
Sjoerd Meijer
cda6f69edf [AArch64] Cost-model i8 vector loads/stores
Loads of <4 x i8> vectors were modeled as extremely expensive. And while we
don't have a load instruction that supports this, it isn't that expensive to
create a vector of i8 elements. The codegen for this was fixed/optimised in
D105110. This now tweaks the cost model and enables SLP vectorisation of my
motivating case loadi8.ll.

Differential Revision: https://reviews.llvm.org/D103629
2021-07-05 11:25:10 +01:00
Stephen Tozer
d44ff6488d [DebugInfo] CGP+HWasan: Handle dbg.values with duplicate location ops
This patch fixes an issue which occurred in CodeGenPrepare and
HWAddressSanitizer, which both at some point create a map of Old->New
instructions and update dbg.value uses of these. They did this by
iterating over the dbg.value's location operands, and if an instance of
the old instruction was found, replaceVariableLocationOp would be
called on that dbg.value. This would cause an error if the same operand
appeared multiple times as a location operand, as the first call to
replaceVariableLocationOp would update all uses of the old instruction,
invalidating the old iterator and eventually hitting an assertion.

This has been fixed by no longer iterating over the dbg.value's location
operands directly, but by first collecting them into a set and then
iterating over that, ensuring that we never attempt to replace a
duplicated operand multiple times.

Differential Revision: https://reviews.llvm.org/D105129
2021-07-05 10:35:19 +01:00
David Stuttard
d7200011f1 [AMDGPU] Stop mulhi from doing 24 bit mul for uniform values
Added support to check if architecture supports s_mulhi which is used as part of
the decision whether or not to use valu 24 bit mul (if the mulhi gets
transformed to a valu op anyway, then may as well use it).

This is an extension of the work in D97063

Differential Revision: https://reviews.llvm.org/D103321

Change-Id: I80b1323de640a52623d69ac005a97d06a5d42a14
2021-07-05 10:33:23 +01:00
Craig Topper
dbc3cdb17c [RISCV] Pass FeatureBitset by reference rather than by value. NFCI
FeatureBitset is 4 64-bit values in an array. It's better passed by
reference rather than copying it.

I may be adding FeatureBitset as an argument to another function
and noticed this while working on that.
2021-07-04 23:11:40 -07:00
Esme-Yi
11bbb4a8e4 [llvm-readobj][XCOFF] Add support for printing the String Table.
Summary: The patch adds the StringTable dumping to
llvm-readobj. Currently only XCOFF is supported.

Reviewed By: jhenderson

Differential Revision: https://reviews.llvm.org/D104613
2021-07-05 04:16:58 +00:00
Chen Zheng
e4aa7692e8 [XCOFF][NFC] add DWARF section support in XCOFF object writer
Reviewed By: jsji

Differential Revision: https://reviews.llvm.org/D97049
2021-07-05 03:13:29 +00:00
Nikita Popov
3cc10c45ba [IR] Deprecate GetElementPtrInst::CreateInBounds without element type
This API is not compatible with opaque pointers, the method
accepting an explicit pointer element type should be used instead.

Thankfully there were few in-tree users. The BPF case still ends
up using the pointer element type for now and needs something like
D105407 to avoid doing so.
2021-07-04 16:49:30 +02:00
Paul Walker
ba16635997 [NFC] Fix a few whitespace issues and typos. 2021-07-04 11:49:58 +01:00
Nikita Popov
ecd2dc975e [IRBuilder] Add type argument to CreateMaskedLoad/Gather
Same as other CreateLoad-style APIs, these need an explicit type
argument to support opaque pointers.

Differential Revision: https://reviews.llvm.org/D105395
2021-07-04 12:17:59 +02:00
Christopher Di Bella
b463417679 [llvm][iwyu] explicitly includes <functional> and <utility>
Compiling LLVM with Clang modules and libc++ identified that
`Support/Printable.h` and `ADL/SmallVector.h` were using features that
live in these headers.

Differential Revision: https://reviews.llvm.org/D105402
2021-07-04 06:02:11 +00:00
Simon Pilgrim
f73cebf7e4 [KnownBits] Merge const/non-const KnownBits::extractBits implementations. NFC.
These are identical and can be just const.
2021-07-03 19:00:25 +01:00
Simon Pilgrim
a3140ba7ba [X86][SSE] Add mulhu/mulhs constant folding tests
These should be folded by D103323
2021-07-03 17:01:59 +01:00
Simon Pilgrim
ed92aa4a26 [SelectionDAG] Replace APInt.lshr().trunc() with APInt.extractBits() where possible. NFC.
This also allows us to use KnownBits::extractBits in one case.
2021-07-03 16:33:00 +01:00
Simon Pilgrim
3325ca15e6 [SelectionDAG] Use KnownBits::insertBits instead of separate APInt::insertBits calls. NFC. 2021-07-03 16:32:59 +01:00
Nikita Popov
3da103d01e [IRBuilder] Avoid fetching pointer element type in some assertions
Specifically the CreateMaskedStore and CreateMaskedScatter APIs.
The CreateMaskedLoad and CreateMaskedGather APIs will need an
additional type argument.
2021-07-03 12:52:55 +02:00
Roman Lebedev
aec6b731b1 [SimplifyCFG] simplifyUnreachable(): erase instructions iff they are guaranteed to transfer execution to unreachable
This replaces the current ad-hoc implementation,
by syncing the code from InstCombine's implementation in `InstCombinerImpl::visitUnreachableInst()`,
with one exception that here in SimplifyCFG we are allowed to remove EH instructions.

Effectively, this now allows SimplifyCFG to remove calls (iff they won't throw and will return),
arithmetic/logic operations, etc.

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D105374
2021-07-03 10:45:44 +03:00
David Green
c41e03ba29 [AArch64] Add S/UQXTRN tablegen patterns.
This adds simple patterns for signed and unsigned saturating extract
narrow instructions. They combine a min/max/truncate into a single
instruction, providing that the immediates on the min/max are correct
for the saturation type. This is just handled in tablegen with some
extra patterns.

v2i64->v2i32 is not handled here as the min/max nodes are not legal,
making the lowering quite different.

Differential Revision: https://reviews.llvm.org/D103263
2021-07-03 07:57:19 +01:00
Kai Luo
afc65d2411 [AIX] Adjust CSR order to avoid breaking ABI regarding traceback
Allocate non-volatile registers in order to be compatible with ABI, regarding gpr_save.

Quoted from https://www.ibm.com/docs/en/ssw_aix_72/assembler/assembler_pdf.pdf page55,
> The preferred method of using GPRs is to use the volatile registers first. Next, use the nonvolatile registers
> in descending order, starting with GPR31.

This patch is based on @jsji 's initial draft.

Tested on test-suite and SPEC, found no degradation.

Reviewed By: jsji, ZarkoCA, xingxue

Differential Revision: https://reviews.llvm.org/D100167
2021-07-03 04:45:26 +00:00
Craig Topper
1ddc2a3bd1 [SelectionDAG] Rename memory VT argument for getMaskedGather/getMaskedScatter from VT to MemVT.
Use getMemoryVT() in MGATHER/MSCATTER DAG combines instead of
using the passthru or store value VT for this argument.
2021-07-02 17:37:40 -07:00
Fangrui Song
e20d2e60fb [ThinLTO] Respect ClearDSOLocalOnDeclarations for unimported functions
D74751 added `ClearDSOLocalOnDeclarations` and dropped dso_local for
isDeclarationForLinker `GlobalValue`s. It missed a case for imported
declarations (`doImportAsDefinition` is false while `isPerformingImport` is
true). This can lead to a linker error for a default visibility symbol in
`ld.lld -shared`.

When `ClearDSOLocalOnDeclarations` is true, we check
`isPerformingImport() && !doImportAsDefinition(&GV)` along with
`GV.isDeclarationForLinker()`. The new condition checks an imported declaration.

This patch fixes a `LLVMPolly.so` link error using a trunk clang -DLLVM_ENABLE_LTO=Thin.

Reviewed By: tejohnson

Differential Revision: https://reviews.llvm.org/D104986
2021-07-02 17:08:25 -07:00
Jonas Devlieghere
3020664b33 Revert "[DebugInfo] Enforce implicit constraints on distinct MDNodes"
This reverts commit 8cd35ad854ab4458fd509447359066ea3578b494.

It breaks `TestMembersAndLocalsWithSameName.py` on GreenDragon and
Mikael Holmén points out in D104827 that bitcode files created with the
patch cannot be parsed with binaries built before it.
2021-07-02 15:57:07 -07:00
Roman Lebedev
322eb67111 [NFC][Codegen] Autogenerate check lines in PowerPC/2007-11-16-landingpad-split.ll
It is being affected by an upcoming SimplifyCFG change.
2021-07-02 23:33:31 +03:00
Roman Lebedev
cc3b86bd5c [NFC][Codegen] Tune a few tests to not end with a naked unreachable terminator
These rely on the fact that currently simplifycfg won't really propagate
said `unreachable`, but that is about to change.
2021-07-02 23:33:30 +03:00
Amara Emerson
128d2d791b [GlobalISel] Clean up CombinerHelper::apply* functions to return void.
For some reason we/I started writing these as returning bool when the return value
is actually ignored by the combiner.
2021-07-02 13:17:06 -07:00
Roman Lebedev
8253f84760 [NFC][SimplifyCFG] Autogenerate checklines in a few tests 2021-07-02 22:35:24 +03:00
Amara Emerson
b1924533d9 [GlobalISel] Add re-association combine for G_PTR_ADD to allow better addressing mode usage.
We're trying to match a few pointer computation patterns here for
re-association opportunities.
1) Isolating a constant operand to be on the RHS, e.g.:
   G_PTR_ADD(BASE, G_ADD(X, C)) -> G_PTR_ADD(G_PTR_ADD(BASE, X), C)

2) Folding two constants in each sub-tree as long as such folding
   doesn't break a legal addressing mode.
   G_PTR_ADD(G_PTR_ADD(BASE, C1), C2) -> G_PTR_ADD(BASE, C1+C2)

AArch64 code size improvements on CTMark with -Os:
Program              before  after   diff
 pairlocalalign      251048  251044 -0.0%
 consumer-typeset    421820  421812 -0.0%
 kc                  431348  431320 -0.0%
 SPASS               413404  413300 -0.0%
 clamscan            384396  384220 -0.0%
 tramp3d-v4          370640  370412 -0.1%
 lencod              432096  431772 -0.1%
 bullet              479400  478796 -0.1%
 sqlite3             288504  288072 -0.1%
 7zip-benchmark      573796  570768 -0.5%
 Geomean difference                 -0.1%

Differential Revision: https://reviews.llvm.org/D105069
2021-07-02 12:31:21 -07:00
Roman Lebedev
56d41a008d [NFCI][SimplifyCFG] simplifyUnreachable(): Use poison constant to represent the result of unreachable instrs
Mimics similar change for InstCombine:
  ce192ced2b901be67444c481ab5ca0d731e6d982 / D104602

All these uses are in blocks that aren't reachable from function's entry,
and said blocks are removed by SimplifyCFG itself,
so we can't really test this change.
2021-07-02 22:11:52 +03:00