1
0
mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-11-23 19:23:23 +01:00
Commit Graph

180382 Commits

Author SHA1 Message Date
Roman Lebedev
760a88fd91 [InstSimplify] Fix addo/subo undef folds (PR42209)
Fix folds of addo and subo with an undef operand to be:

`@llvm.{u,s}{add,sub}.with.overflow` all fold to `{ undef, false }`,
 as per LLVM undef rules.
Same for commuted variants.

Based on the original version of the patch by @nikic.

Fixes [[ https://bugs.llvm.org/show_bug.cgi?id=42209 | PR42209 ]]

Differential Revision: https://reviews.llvm.org/D63065

llvm-svn: 363522
2019-06-16 20:39:45 +00:00
Nicolai Haehnle
fdedd83469 [AsmPrinter] Make EmitLinkage and EmitVisibility public
Summary:
This allows target to implement custom emit of global variables if
required. See subsequent patch for a use case.

Change-Id: I9654197e3df24503104a54c41fff06845aed37fe

Reviewers: arsenm, kzhuravl

Subscribers: wdng, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D61650

llvm-svn: 363519
2019-06-16 18:30:42 +00:00
Nicolai Haehnle
7e12ec07a2 AMDGPU: Prepare for explicit absolute relocations in code generation
Summary:
We will use absolute relocations for LDS symbols.

Change-Id: I9a32795ed0ea835e433a787129cfe3c57ee9a325

Reviewers: arsenm, rampitec

Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D61492

llvm-svn: 363517
2019-06-16 17:43:37 +00:00
Nicolai Haehnle
fa39305b4a AMDGPU: Be explicit about whether the high-word in SI_PC_ADD_REL_OFFSET is 0
Summary:
Instead of encoding a high-word of 0 using a fake TargetGlobalAddress,
just use a literal target constant. This simplifies some subsequent changes.

The generated assembly is now more explicit about the kind of relocation
that is to be used.

Change-Id: I066835202d23b5941fa7a358eb4b89e9b71ab6f8

Reviewers: arsenm, rampitec

Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D61491

llvm-svn: 363516
2019-06-16 17:32:01 +00:00
Nicolai Haehnle
c02be57950 AMDGPU/GFX10: Support DLC bit in llvm.amdgcn.s.buffer.load intrinsic
Summary: Change-Id: Ie4c971462a7749740938c687144e77441dac2539

Reviewers: rampitec, arsenm

Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D62486

Change-Id: Iae59523edd75c74918d2118df6571a7b671717a0
llvm-svn: 363514
2019-06-16 17:14:12 +00:00
Stanislav Mekhanoshin
cefbe0fc94 [AMDGPU] gfx10 conditional registers handling
This is cpp source part of wave32 support, excluding overriden
getRegClass().

Differential Revision: https://reviews.llvm.org/D63351

llvm-svn: 363513
2019-06-16 17:13:09 +00:00
Sanjay Patel
7019c260cd [CodeGenPrepare][x86] shift both sides of a vector select when profitable
This is based on the example/discussion in PR37428:
https://bugs.llvm.org/show_bug.cgi?id=37428

Proper vector shift instructions don't appear until AVX2, so we may generate several
extra instructions within a loop trying to compensate for that. It's difficult to
recover from that shift expansion later than this, so use the existing TLI hook and
splat analysis to enable better codegen.

This extends CGP functionality introduced with:
rL201655

Differential Revision: https://reviews.llvm.org/D63233

llvm-svn: 363511
2019-06-16 15:29:03 +00:00
Sanjay Patel
df1008af37 [x86] split 256-bit vector selects if operands are vector concats
This is similar logic/motivation to the select splitting in D62969.

In D63233, the pattern changes so that we no longer have an extract_subvector of vselect,
but the operands of the select are still being concatenated.

The closest case is represented in either the first or last test diffs here - we have an
extra instruction, but we converted 3-4 ymm instructions into 4-5 xmm instructions.
I think that's the right trade-off for most AVX1 targets.

In the example based on PR37428:
https://bugs.llvm.org/show_bug.cgi?id=37428
...this makes the loop about 30% faster (tested on Haswell by compiling with -mavx).

Differential Revision: https://reviews.llvm.org/D63364

llvm-svn: 363508
2019-06-16 14:04:49 +00:00
Simon Pilgrim
5aad3bee75 [X86] CombineShuffleWithExtract - handle cases with different vector extract sources
Insert the shorter vector source into an undef vector of the longer vector source's type.

llvm-svn: 363507
2019-06-16 08:00:41 +00:00
Nico Weber
3f76d78820 gn build: Merge r363444
llvm-svn: 363505
2019-06-16 02:24:01 +00:00
Simon Pilgrim
16bbccb36f [X86] CombineShuffleWithExtract - assert all src ops types are multiples of rootsize. NFCI.
llvm-svn: 363501
2019-06-15 19:12:44 +00:00
Simon Pilgrim
f90ae92229 [X86][AVX] Handle lane-crossing shuffle(extract_subvector(x,c1),extract_subvector(y,c2),m1) shuffles
Pull out the existing (non)lane-crossing fold into a helper lambda and use for lane-crossing unary shuffles as well.

Fixes PR34380

llvm-svn: 363500
2019-06-15 18:30:43 +00:00
Simon Pilgrim
e165d25b9b [X86][AVX] Decode constant bits from insert_subvector(c1, c2, c3)
This mostly happens due to SimplifyDemandedVectorElts reducing a vector to insert_subvector(undef, c1, 0)

llvm-svn: 363499
2019-06-15 17:05:24 +00:00
Roman Lebedev
eca9899a89 [NFC][MCA][X86] Add one more 'clear super register' pattern - movss/movsd load clears high XMM bits
llvm-svn: 363498
2019-06-15 16:12:13 +00:00
Roman Lebedev
f3a76c9591 [NFC][MCA][X86] Add baseline test coverage for AMD Barcelona (aka K10, fam10h)
Looking into sched model for that CPU ...

llvm-svn: 363497
2019-06-15 16:12:05 +00:00
Aaron Puchert
148bc13d52 [Clang] Harmonize Split DWARF options with llc
Summary:
With Split DWARF the resulting object file (then called skeleton CU)
contains the file name of another ("DWO") file with the debug info.
This can be a problem for remote compilation, as it will contain the
name of the file on the compilation server, not on the client.

To use Split DWARF with remote compilation, one needs to either

* make sure only relative paths are used, and mirror the build directory
  structure of the client on the server,
* inject the desired file name on the client directly.

Since llc already supports the latter solution, we're just copying that
over. We allow setting the actual output filename separately from the
value of the DW_AT_[GNU_]dwo_name attribute in the skeleton CU.

Fixes PR40276.

Reviewers: dblaikie, echristo, tejohnson

Reviewed By: dblaikie

Differential Revision: https://reviews.llvm.org/D59673

llvm-svn: 363496
2019-06-15 15:38:51 +00:00
Kang Zhang
b6bd9f2f91 [PowerPC] Set the innermost hot loop to align 32 bytes
Summary:
If the nested loop is an innermost loop, prefer to a 32-byte alignment, so that
we can decrease cache misses and branch-prediction misses. Actual alignment of
 the loop will depend on the hotness check and other logic in alignBlocks.

The old code will only align hot loop to 32 bytes when the LoopSize larger than
16 bytes and smaller than 32 bytes, this patch will align the innermost hot loop
 to 32 bytes not only for the hot loop whose size is 16~32 bytes.

Reviewed By: steven.zhang, jsji

Differential Revision: https://reviews.llvm.org/D61228

llvm-svn: 363495
2019-06-15 15:10:24 +00:00
Gauthier Harnisch
ed940af216 [clang] Add storage for APValue in ConstantExpr
Summary:
When using ConstantExpr we often need the result of the expression to be kept in the AST. Currently this is done on a by the node that needs the result and has been done multiple times for enumerator, for constexpr variables... . This patch adds to ConstantExpr the ability to store the result of evaluating the expression. no functional changes expected.

Changes:
 - Add trailling object to ConstantExpr that can hold an APValue or an uint64_t. the uint64_t is here because most ConstantExpr yield integral values so there is an optimized layout for integral values.
 - Add basic* serialization support for the trailing result.
 - Move conversion functions from an enum to a fltSemantics from clang::FloatingLiteral to llvm::APFloatBase. this change is to make it usable for serializing APValues.
 - Add basic* Import support for the trailing result.
 - ConstantExpr created in CheckConvertedConstantExpression now stores the result in the ConstantExpr Node.
 - Adapt AST dump to print the result when present.

basic* : None, Indeterminate, Int, Float, FixedPoint, ComplexInt, ComplexFloat,
the result is not yet used anywhere but for -ast-dump.

Reviewers: rsmith, martong, shafik

Reviewed By: rsmith

Subscribers: rnkovacs, hiraditya, dexonsmith, cfe-commits, llvm-commits

Tags: #clang, #llvm

Differential Revision: https://reviews.llvm.org/D62399

llvm-svn: 363493
2019-06-15 10:24:47 +00:00
Fangrui Song
ae9b5d79fc [BranchProbability] Delete a redundant overflow check
llvm-svn: 363492
2019-06-15 10:09:59 +00:00
Nikita Popov
c334a9e7fa [SCEV] Use unsigned/signed intersection type in SCEV
Based on D59959, this switches SCEV to use unsigned/signed range
intersection based on the sign hint. This will prefer non-wrapping
ranges in the relevant domain. I've left the one intersection in
getRangeForAffineAR() to use the smallest intersection heuristic,
as there doesn't seem to be any obvious preference there.

Differential Revision: https://reviews.llvm.org/D60035

llvm-svn: 363490
2019-06-15 09:15:52 +00:00
Nikita Popov
e851c6c59e [SimplifyIndVar] Simplify non-overflowing saturating add/sub
If we can detect that saturating math that depends on an IV cannot
overflow, replace it with simple math. This is similar to the CVP
optimization from D62703, just based on a different underlying
analysis (SCEV vs LVI) that catches different cases.

Differential Revision: https://reviews.llvm.org/D62792

llvm-svn: 363489
2019-06-15 08:48:52 +00:00
Fangrui Song
1aca9a529e [RISCV] Regenerate remat.ll and atomic-rmw.ll after D43256
llvm-svn: 363487
2019-06-15 07:49:14 +00:00
Fangrui Song
3d2a483a8a [RISCV] Simplify RISCVAsmBackend::writeNopData(). NFC
llvm-svn: 363486
2019-06-15 06:14:15 +00:00
Alex Brachet
1c3240a461 [objcopy] Error when --preserve-dates is specified with standard streams
Summary: llvm-objcopy/strip now error when -p is specified when reading from stdin or writing to stdout

Reviewers: jhenderson, rupprecht, espindola, alexshap

Reviewed By: jhenderson, rupprecht

Subscribers: emaste, arichardson, jakehehrlich, MaskRay, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D63090

llvm-svn: 363485
2019-06-15 05:32:23 +00:00
Michael Berg
70f373fb8d adding more fmf propagation for selects plus updated tests
llvm-svn: 363484
2019-06-15 04:53:51 +00:00
Fangrui Song
81b56b45f1 Revert "adding more fmf propagation for selects plus tests"
This reverts rL363474. -debug-only=isel was added to some tests that
don't specify `REQUIRES: asserts`. This causes failures on
-DLLVM_ENABLE_ASSERTIONS=off builds.

I chose to revert instead of fixing the tests because I'm not sure
whether we should add `REQUIRES: asserts` to more tests.

llvm-svn: 363482
2019-06-15 03:51:08 +00:00
Huihui Zhang
ed311665bd [InstCombine] Add tests to show missing fold opportunity for "icmp and shift" (nfc).
Summary:
For icmp pred (and (sh X, Y), C), 0

  When C is signbit, expect to fold (X << Y) & signbit ==/!= 0 into (X << Y) >=/< 0,
  rather than (X & (signbit >> Y)) != 0.

  When C+1 is power of 2, expect to fold (X << Y) & ~C ==/!= 0 into (X << Y) </>= C+1,
  rather than (X & (~C >> Y)) == 0.

For icmp pred (and X, (sh signbit, Y)), 0

  Expect to fold (X & (signbit l>> Y)) ==/!= 0 into (X << Y) >=/< 0
  Expect to fold (X & (signbit << Y)) ==/!= 0 into (X l>> Y) >=/< 0

  Reviewers: lebedev.ri, efriedma, spatel, craig.topper

  Reviewed By: lebedev.ri

  Subscribers: llvm-commits

  Tags: #llvm

  Differential Revision: https://reviews.llvm.org/D63025

llvm-svn: 363479
2019-06-15 00:33:41 +00:00
Matt Arsenault
e24b975d54 Reapply "GlobalISel: Avoid producing Illegal copies in RegBankSelect"
This reapplies r363410, avoiding null dereference if there is no
AltRegBank.

llvm-svn: 363478
2019-06-15 00:33:26 +00:00
Richard Smith
25adaafc6f Add a map_range function for applying map_iterator to a range.
In preparation for use in Clang.

llvm-svn: 363477
2019-06-14 23:56:40 +00:00
Mitch Phillips
c40bb0e320 Revert "GlobalISel: Avoid producing Illegal copies in RegBankSelect"
This patch breaks UBSan build bots. See
https://github.com/google/sanitizers/wiki/SanitizerBotReproduceBuild for
a guide as to how to reproduce the error.

This reverts commit c2864c0de07efb5451d32d27a7d4ff2984830929.
This reverts rL363410.

llvm-svn: 363476
2019-06-14 23:45:34 +00:00
Michael Berg
19763b0404 adding more fmf propagation for selects plus tests
llvm-svn: 363474
2019-06-14 23:30:52 +00:00
Guozhi Wei
6a55f3d913 [MBP] Move a latch block with conditional exit and multi predecessors to top of loop
Current findBestLoopTop can find and move one kind of block to top, a latch block has one successor. Another common case is:

    * a latch block
    * it has two successors, one is loop header, another is exit
    * it has more than one predecessors

If it is below one of its predecessors P, only P can fall through to it, all other predecessors need a jump to it, and another conditional jump to loop header. If it is moved before loop header, all its predecessors jump to it, then fall through to loop header. So all its predecessors except P can reduce one taken branch.

Differential Revision: https://reviews.llvm.org/D43256

llvm-svn: 363471
2019-06-14 23:08:59 +00:00
Akira Hatanaka
b788377f2b [ObjC][ARC] Delete ObjC runtime calls on global variables annotated
with 'objc_arc_inert'

Those calls are no-ops, so they can be safely deleted.

rdar://problem/49839633

Differential Revision: https://reviews.llvm.org/D62433

llvm-svn: 363468
2019-06-14 22:06:32 +00:00
Matt Arsenault
d3eb4b0274 AMDGPU: Avoid most waitcnts before calls
Currently you get extra waits, because waits are inserted for the
register dependencies of the call, and the function prolog waits on
everything.

Currently waits are still inserted on returns. It may make sense to
not do this, and wait in the caller instead.

llvm-svn: 363465
2019-06-14 21:52:26 +00:00
Ziang Wan
cced2cd82d Add --print-supported-cpus flag for clang.
This patch allows clang users to print out a list of supported CPU models using
clang [--target=<target triple>] --print-supported-cpus

Then, users can select the CPU model to compile to using
clang --target=<triple> -mcpu=<model> a.c

It is a handy feature to help cross compilation.

llvm-svn: 363464
2019-06-14 21:42:21 +00:00
Francis Visoiu Mistrih
9be6e61346 [Remarks][NFC] Improve testing and documentation of -foptimization-record-passes
This adds:

* documentation to the user manual
* nicer error message
* test for the error case
* test for the gold plugin

llvm-svn: 363463
2019-06-14 21:38:57 +00:00
Matt Arsenault
731346ec27 SROA: Allow eliminating addrspacecasted allocas
There is a circular dependency between SROA and InferAddressSpaces
today that requires running both multiple times in order to be able to
eliminate all simple allocas and addrspacecasts. InferAddressSpaces
can't remove addrspacecasts when written to memory, and SROA helps
move pointers out of memory.

This should avoid inserting new commuting addrspacecasts with GEPs,
since there are unresolved questions about pointer wrapping between
different address spaces.

For now, don't replace volatile operations that don't match the alloca
addrspace, as it would change the address space of the access. It may
be still OK to insert an addrspacecast from the new alloca, but be
more conservative for now.

llvm-svn: 363462
2019-06-14 21:38:31 +00:00
Jinsong Ji
9a1733d754 [PowerPC][NFC] Comments update and remove some unused def
llvm-svn: 363461
2019-06-14 21:33:51 +00:00
Matt Arsenault
e8819fc12c SROA: Add baseline test for addrspacecast changes
llvm-svn: 363460
2019-06-14 21:22:26 +00:00
Matt Arsenault
5f185d6ecc AMDGPU: Fix capitalized register names in asm constraints
This was a workaround a long time ago, but the canonical lower case
names work now.

llvm-svn: 363459
2019-06-14 21:16:06 +00:00
Matt Arsenault
1897dcd325 AMDGPU: Fix dropping memref for ds append/consume
The way SelectionDAG treats memory operands is very frustrating, and
by default drops them unless a property is set on the pattern. There
is no pattern for manually selected instructions, so this requires
manually setting them.

llvm-svn: 363455
2019-06-14 21:01:24 +00:00
Matt Arsenault
cc1ef072d9 AMDGPU: Set isTrap on S_TRAP
This seems to only be used for generating some kind
of documentation, but might as well set it.

llvm-svn: 363454
2019-06-14 21:01:24 +00:00
Matt Arsenault
d469e3d76e AMDGPU: Add baseline test for call waitcnt insertion
llvm-svn: 363453
2019-06-14 21:01:23 +00:00
Matt Arsenault
d4b5b30e5d UpdateTestChecks: Consider .section as end of function for AMDGPU
Kernels seem to go directly to a section switch instead of emitting
.Lfunc_end. This fixes including all of the kernel metadata in the
check lines, which is undesirable most of the time.

llvm-svn: 363452
2019-06-14 20:40:15 +00:00
Sanjay Patel
607142d387 [x86] add test for 256-bit blendv with AVX targets; NFC
This is a reduction of the pattern seen in D63233.

llvm-svn: 363448
2019-06-14 20:03:42 +00:00
Lang Hames
1644fea7a8 [JITLink] Move JITLinkMemoryManager into its own header.
llvm-svn: 363444
2019-06-14 19:41:21 +00:00
Saleem Abdulrasool
0d0cbbbdf4 build: extract LLVM distribution target handling
This extracts the LLVM distribution target handling into a support module.
Extraction will enable us to restructure the builds to support multiple
distribution configurations (e.g. developer and user) to permit us to build the
development package and the user package at once.

llvm-svn: 363440
2019-06-14 18:28:57 +00:00
Francis Visoiu Mistrih
8a74d6dc35 [Remarks] Use the RemarkSetup error in setupOptimizationRemarks
Added the errors in r363415 but they were not used in the
RemarkStreamer.

llvm-svn: 363439
2019-06-14 18:18:26 +00:00
Nico Weber
a2154ca0c7 gn build: Add NVPTX target
The NVPTX target is a bit unusual in that it's the only target without a
disassembler, and one of three targets without an asm parser (and the
first one of those three in the gn build). NVPTX doesn't have those
because it's not a binary format.

The CMake build checks for the existence of
{AsmParser,Disassembler}/CMakeLists.txt when setting
LLVM_ENUM_ASM_PARSERS / LLVM_ENUM_DISASSEBLERS
(http://llvm-cs.pcc.me.uk/CMakeLists.txt#744). The GN build doesn't want
to hit the disk for things like this, so instead I'm adding explicit
`targets_with_asm_parsers` and `targets_with_disassemblers` lists. Since
both are needed rarely, they are defined in their own gni files.

Differential Revision: https://reviews.llvm.org/D63210

llvm-svn: 363437
2019-06-14 18:07:00 +00:00
Nico Weber
837241307e gn build: Simplify Target build files
Now that the cycle between MCTargetDesc and TargetInfo is gone
(see revisions 360709 360718 360722 360724 360726 360731 360733 360735 360736),
remove the dependency from TargetInfo on MCTargetDesc:tablegen. In most
targets, this makes MCTargetDesc:tablegen have just a single use, so
inline it there.

For AArch64, ARM, and RISCV there's still a similar cycle between
MCTargetDesc and Utils, so the MCTargetDesc:tablegen indirection is
still needed there.

Differential Revision: https://reviews.llvm.org/D63200

llvm-svn: 363436
2019-06-14 17:58:34 +00:00