mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-10-19 19:12:56 +02:00
Commit Graph

200457 Commits

Author SHA1 Message Date
Serge Pavlov
7d91a1a046 [Windows] Fix limit on command line size
The CreateProcessW documentation states that the maximum size of a command line
is 32767 characters, including the terminating null character. In the
function llvm::sys::commandLineFitsWithinSystemLimits this limit was set
to 32768. As a result, if the command line was exactly 32768 characters long,
no response file was created and CreateProcessW was called with a
command line that was too long.

Differential Revision: https://reviews.llvm.org/D83772
2020-07-21 17:33:22 +07:00
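The off-by-one described in 7d91a1a046 can be illustrated with a minimal sketch. The constant name and both helpers are hypothetical and are not LLVM's implementation; the only fact taken from the commit message is the documented limit of 32767 characters including the terminating null.

```cpp
#include <cstddef>

// Limit quoted in the commit message: 32767 characters, including the
// terminating null character.
constexpr std::size_t MaxCommandLineChars = 32767;

// Hypothetical helper, not llvm::sys::commandLineFitsWithinSystemLimits.
// `length` counts the characters of the command line without the null.
constexpr bool fitsWithinLimit(std::size_t length) {
  return length + 1 <= MaxCommandLineChars;
}

// Illustration-only variant whose limit is one too high, so a command line
// that no longer fits is still accepted.
constexpr bool fitsWithOffByOneLimit(std::size_t length) {
  return length + 1 <= MaxCommandLineChars + 1;
}
```

With the corrected check, a caller would fall back to a response file as soon as the command line plus its null terminator exceeds 32767 characters.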
Djordje Todorovic
bd9b7ebd66 [NFC][Debugify] Rename OptCustomPassManager into DebugifyCustomPassManager
In addition, move the definition of the class into Debugify.h,
so we can use it from different levels.

The motivation for this is D82547.

Differential Revision: https://reviews.llvm.org/D83391
2020-07-21 12:16:07 +02:00
Florian Hahn
de943b561f [SCCP] Add range metadata to call sites with known return ranges.
If we inferred a range for the function return value, we can add !range
at all call-sites of the function, if the range does not include undef.

Reviewers: efriedma, davide, nikic

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D83952
2020-07-21 10:06:54 +01:00
Nathan James
6fc76c37f5 [ADT] use is_base_of in place of is_same for random_access_iterator_tag checks
Replace `std::is_same<X, std::random_access_iterator_tag>` with `std::is_base_of<std::random_access_iterator_tag, X>` in the STLExtras algorithms.

This has little impact on LLVM internally, as no structs derive from `std::random_access_iterator_tag`.
However, external projects embedding LLVM may use `std::contiguous_iterator_tag`, which these algorithms should also accept,
as well as any other tags people want to define that derive from `std::random_access_iterator_tag`.

Reviewed By: dblaikie

Differential Revision: https://reviews.llvm.org/D84141
2020-07-21 09:55:16 +01:00
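The difference between the two checks in 6fc76c37f5 can be shown with a small sketch. The tag `my_contiguous_tag` and the two variable templates are hypothetical names introduced for illustration; they are not part of STLExtras.

```cpp
#include <iterator>
#include <type_traits>

// Hypothetical tag an external project might define; not part of LLVM.
struct my_contiguous_tag : std::random_access_iterator_tag {};

// Old-style check: only the exact tag type matches.
template <typename Tag>
constexpr bool IsRandomAccessBySame =
    std::is_same<Tag, std::random_access_iterator_tag>::value;

// New-style check: the tag itself and anything derived from it match.
template <typename Tag>
constexpr bool IsRandomAccessByBase =
    std::is_base_of<std::random_access_iterator_tag, Tag>::value;

static_assert(!IsRandomAccessBySame<my_contiguous_tag>,
              "is_same rejects the derived tag");
static_assert(IsRandomAccessByBase<my_contiguous_tag>,
              "is_base_of accepts the derived tag");
```

Because `std::is_base_of<T, T>` is true for class types, the new check still accepts `std::random_access_iterator_tag` itself.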
Alex Richardson
4b94bc362e [NFC] Use FileCheck for llvm-reduce interestingness test
This makes the test added in 6187eeb683d8c639282d437e6af585e9b7f9c93e
easier to understand since you no longer have to look at another script
to see if it's doing the right thing.
2020-07-21 09:03:45 +01:00
Jared Wyles
5603ee7630 [jitlink] Updating test file for GOT relocations for elf x86 2020-07-21 17:19:48 +10:00
David Green
5a76ea81c6 [ARM] More unpredictable VCVT instructions.
These extra vcvt instructions were missed from 74ca67c109 because they
live in a different Domain, but should be treated in the same way.

Differential Revision: https://reviews.llvm.org/D83204
2020-07-21 07:24:37 +01:00
David Green
b082241dfa [ARM] Predicated MVE reduction tests. NFC 2020-07-21 06:47:48 +01:00
Logan Smith
49c75c362c [NFC] Add another missing 'override'
This should be the last one needed to appease the -Werror bots (knock on wood).
2020-07-20 22:04:27 -07:00
Logan Smith
24c415cf6d [NFC] Add missing 'override's 2020-07-20 19:52:49 -07:00
David Blaikie
dc9888b6b6 DebugInfo: Move getMD5AsBytes from DwarfUnit to DwarfDebug
It wasn't using any state from DwarfUnit anyway.
2020-07-20 19:21:39 -07:00
Matt Arsenault
f4e8befd9e GlobalISel: Rewrite getLCMType
Try to make the behavior more consistent with getGCDType, and bias
towards returning something closer to the source type whenever there's
an ambiguity.
2020-07-20 21:06:30 -04:00
Matt Arsenault
6ba9358e88 GlobalISel: Handle more cases in getGCDType
Try harder to find a canonical unmerge type when trying to cover the
desired target type. Handle finding a compatible unmerge type for two
vectors with different element types. This will return the largest
multiple of the source vector element that will evenly divide the
target vector type.

Also handle mixing scalars and vectors, and prefer the
source element type as the unmerge target type.
2020-07-20 20:53:35 -04:00
Matt Arsenault
d057444115 AMDGPU/GlobalISel: Remove unnecessary parameter 2020-07-20 20:53:01 -04:00
Artem Belevich
9d0375c353 [MC,NVPTX] Add MCAsmPrinter support for unsigned-only data directives.
PTX does not support negative values in .bNN data directives and we must
typecast such values to unsigned before printing them.

MCAsmInfo can now specify whether such casting is necessary for a particular
target.

Differential Revision: https://reviews.llvm.org/D83423
2020-07-20 16:24:41 -07:00
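The cast described in 9d0375c353 amounts to reinterpreting a negative value as the unsigned value with the same bit pattern at the directive's width. The sketch below shows that idea only; `asUnsignedDirectiveValue` is a hypothetical helper and is not the MCAsmPrinter or MCAsmInfo API.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: return the unsigned value that has the same bit
// pattern as `value` when truncated to `sizeInBytes` bytes.
std::uint64_t asUnsignedDirectiveValue(std::int64_t value,
                                       unsigned sizeInBytes) {
  assert(sizeInBytes >= 1 && sizeInBytes <= 8 && "unsupported width");
  const std::uint64_t bits = static_cast<std::uint64_t>(value);
  if (sizeInBytes == 8)
    return bits; // A full-width shift below would be undefined.
  const std::uint64_t mask = (std::uint64_t{1} << (8 * sizeInBytes)) - 1;
  return bits & mask;
}
```

Under this sketch, a one-byte directive holding -1 would be printed as 255 rather than as a negative literal.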
Lang Hames
abf1154870 [ExecutionEngine] Initialize near block hint in SectionMemoryManager.
When allocating a new memory block in SectionMemoryManager, initialize
the Near hint for the other memory groups if they have not been set
already.

Patch by Dana Koch. Thanks Dana!
2020-07-20 14:40:54 -07:00
Roman Lebedev
0885e066d4 [Reduce] Argument reduction: don't try to drop terminator instructions
Newly-added test previously crashed.

While it is up for debate whether instruction reduction should drop
instructions indiscriminately (there you can just ensure that the
test case still passes -verify), here, if we drop a terminator,
CloneFunctionInto() will immediately crash.

So let's not do that :)
2020-07-21 00:06:03 +03:00
Logan Smith
038f81782a [llvm][unittest] Add -Wno-suggest-override to more infrastructure that includes googletest/googlemock headers 2020-07-20 13:59:39 -07:00
Louis Dionne
93b238862a [NFC] Use std::free instead of ::free
Since we include <cstdlib> instead of <stdlib.h>, it makes sense to
use std::free.
2020-07-20 16:19:08 -04:00
Sanjay Patel
d607929186 [InstCombine] allow peeking through zext of shift amount to match rotate idioms (PR45701)
We might want to also allow trunc of the shift amount, but that seems less likely?

  define i32 @src(i32 %x, i1 %y) {
  %0:
    %rem = and i1 %y, 1
    %cmp = icmp eq i1 %rem, 0
    %sh_prom = zext i1 %rem to i32
    %sub = sub nsw nuw i1 0, %rem
    %sh_prom1 = zext i1 %sub to i32
    %shr = lshr i32 %x, %sh_prom1
    %shl = shl i32 %x, %sh_prom
    %or = or i32 %shl, %shr
    %r = select i1 %cmp, i32 %x, i32 %or
    ret i32 %r
  }
  =>
  define i32 @tgt(i32 %x, i1 %y) {
  %0:
    %t = zext i1 %y to i32
    %r = fshl i32 %x, i32 %x, i32 %t
    ret i32 %r
  }

  Transformation seems to be correct!

https://alive2.llvm.org/ce/z/xgMvE3

http://bugs.llvm.org/PR45701
2020-07-20 16:18:11 -04:00
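At the source level, the rotate idiom targeted by d607929186 looks roughly like the sketch below: a narrow shift amount is masked, zero-extended, and used in a guarded shift pair. Both function names are hypothetical, and the 8-bit amount type is an assumption chosen for illustration. The second function models funnel-shift-left semantics with both inputs equal, which is what a rotate is.

```cpp
#include <cstdint>

// Guarded rotate idiom with a narrow (8-bit) shift amount that is
// zero-extended to 32 bits before each shift.
std::uint32_t rotateLeftGuarded(std::uint32_t x, std::uint8_t amount) {
  const std::uint8_t rem = amount & 31;
  if (rem == 0)
    return x; // Guard: avoids an out-of-range shift by 32.
  const std::uint32_t left = rem;       // zext of the narrow amount
  const std::uint32_t right = 32 - rem; // zext of the narrow complement
  return (x << left) | (x >> right);
}

// Reference model of fshl(x, x, amount): concatenate x with itself, shift
// left by amount modulo 32, and keep the high 32 bits.
std::uint32_t funnelShiftLeftSelf(std::uint32_t x, std::uint32_t amount) {
  const std::uint64_t concat = (static_cast<std::uint64_t>(x) << 32) | x;
  return static_cast<std::uint32_t>((concat << (amount % 32)) >> 32);
}
```

The two functions agree for every shift amount, which is the property that lets the guarded pattern collapse into a single funnel shift.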
Sanjay Patel
2feb6522ca [InstCombine] add tests for funnel shift/rotate with narrow shift amount; NFC 2020-07-20 16:18:11 -04:00
Florian Hahn
ded006d0d9 [Matrix] Use TileInfo to create tiled loop nest for matrix multiply.
This patch uses the TileInfo introduced in D77550 to generate a loop
nest for tiled matrix multiplication, instead of generating the
unrolled code for the whole multiplication. This makes code-generation
more scalable for larger matrices.

Initially, loops are only used if both the number of rows and the number of
columns are divisible by the tile size. Other cases will be added as follow-ups.

Reviewers: anemet, Gerolf, hfinkel, andrew.w.kaylor, LuoYuanke, nicolasvasilache

Reviewed By: anemet

Differential Revision: https://reviews.llvm.org/D81308
2020-07-20 21:11:53 +01:00
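The loop structure that ded006d0d9 describes can be sketched in plain C++ as three tile loops wrapped around a small per-tile multiply. This is not the code the pass emits: `multiplyTiled`, the row-major layout, and the requirement that the inner dimension is also divisible by the tile size are all assumptions of the sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Row-major storage is an assumption of this sketch.
using Matrix = std::vector<double>;

// Multiply a (rows x inner) matrix by an (inner x cols) matrix, one
// tile x tile block at a time.
Matrix multiplyTiled(const Matrix &a, const Matrix &b, std::size_t rows,
                     std::size_t inner, std::size_t cols, std::size_t tile) {
  assert(tile > 0 && rows % tile == 0 && cols % tile == 0 &&
         inner % tile == 0 && "dimensions must be divisible by the tile size");
  assert(a.size() == rows * inner && b.size() == inner * cols);
  Matrix c(rows * cols, 0.0);
  // Three-level loop nest over tiles.
  for (std::size_t r0 = 0; r0 < rows; r0 += tile)
    for (std::size_t c0 = 0; c0 < cols; c0 += tile)
      for (std::size_t k0 = 0; k0 < inner; k0 += tile)
        // Fixed-size multiply for one tile; only this part would be unrolled.
        for (std::size_t r = r0; r < r0 + tile; ++r)
          for (std::size_t col = c0; col < c0 + tile; ++col)
            for (std::size_t k = k0; k < k0 + tile; ++k)
              c[r * cols + col] += a[r * inner + k] * b[k * cols + col];
  return c;
}
```

The code size of the tile body depends only on the tile size, not on the matrix dimensions, which is why this scales better than unrolling the whole multiplication.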
Eli Friedman
941026e51d [AArch64][SVE] Add support for trunc to <vscale x N x i1>.
This isn't a natively supported operation, so convert it to a
mask+compare.

In addition to the operation itself, fix up some surrounding stuff to
make the testcase work: we need concat_vectors on i1 vectors, we need
legalization of i1 vector truncates, and we need to fix up all the
relevant uses of getVectorNumElements().

Differential Revision: https://reviews.llvm.org/D83811
2020-07-20 13:11:02 -07:00
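The mask+compare lowering mentioned in 941026e51d relies on the fact that truncating an integer to one bit keeps only its lowest bit. A scalar-per-lane sketch, with a fixed-size `std::array` standing in for a vector (an assumption, since scalable vectors have no direct C++ equivalent) and a hypothetical function name:

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Truncate each 32-bit lane to a 1-bit value by masking the low bit and
// comparing the result against zero.
template <std::size_t N>
std::array<bool, N> truncToBool(const std::array<std::uint32_t, N> &lanes) {
  std::array<bool, N> result{};
  for (std::size_t i = 0; i < N; ++i)
    result[i] = (lanes[i] & 1u) != 0; // mask, then compare
  return result;
}
```

Note that this differs from a conversion to `bool`, which would map every nonzero lane to true.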
Logan Smith
16a34f1645 Enable -Wsuggest-override in the LLVM build
This patch adds Clang's new (and GCC's old) -Wsuggest-override to the warning flags for the LLVM build. The warning is a stronger form of -Winconsistent-missing-override which warns _everywhere_ that override is missing, not just in places where it's inconsistent within a class.

Some directories in the monorepo need the warning disabled for compatibility's, or sanity's, sake; in particular, libcxx/libcxxabi, and any code implementing or interoperating with googletest, googlemock, or google benchmark (which do not themselves use override). This patch adds -Wno-suggest-override to the relevant CMakeLists.txt's to accomplish this.

Differential Revision: https://reviews.llvm.org/D84126
2020-07-20 12:32:47 -07:00
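The difference between the two warnings named in 16a34f1645 is easiest to see on a small class hierarchy. The types below are hypothetical and exist only for illustration; the comments follow the behaviour described in the commit message.

```cpp
struct Base {
  virtual ~Base() = default;
  virtual int id() const { return 0; }
  virtual int kind() const { return 0; }
};

// Mixed use of override inside one class: kind() is inconsistent with id(),
// so both -Winconsistent-missing-override and -Wsuggest-override apply.
struct Mixed : Base {
  int id() const override { return 1; }
  int kind() const { return 2; } // overrides Base::kind, but unmarked
};

// No override anywhere in the class: nothing is inconsistent, so only the
// stronger -Wsuggest-override reports the missing keyword.
struct Unmarked : Base {
  int id() const { return 3; } // overrides Base::id, but unmarked
};
```

Both derived classes behave identically at run time with or without the keyword; the warning only concerns whether the intent to override is spelled out.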
Hiroshi Yamauchi
d2f68caf31 [PGO] Enable the extended value profile buckets for mem op sizes.
Follow up on D81682 and enable the new, extended value profile buckets for mem
op sizes.

Differential Revision: https://reviews.llvm.org/D83903
2020-07-20 12:05:09 -07:00
Hiroshi Yamauchi
821825f6a3 [PGO][PGSO] Remove a temporary flag used for gradual rollout.
Remove the temporary flag PGSOIRPassOrTestOnly and the guard code which was used
for the staged rollout. This is a cleanup (NFC) as it's now false by default.

Differential Revision: https://reviews.llvm.org/D84057
2020-07-20 11:12:11 -07:00
Mircea Trofin
3e712a8637 [llvm] Development-mode InlineAdvisor
Summary:
This is the InlineAdvisor used in 'development' mode. It enables two
scenarios:

 - loading models via a command-line parameter, thus allowing for rapid
 training iteration, where models can be used for the next exploration
 phase without requiring recompiling the compiler. This trades off some
 compilation speed for the added flexibility.

 - collecting training logs, in the form of tensorflow.SequenceExample
 protobufs. We generate these as textual protobufs, which simplifies
 generation and testing. The protobufs may then be readily consumed by a
 tensorflow-based training algorithm.

To speed up training, training logs may also be collected from the
'default' training policy. In that case, this InlineAdvisor does not
use a model.

RFC: http://lists.llvm.org/pipermail/llvm-dev/2020-April/140763.html

Reviewers: jdoerfert, davidxl

Subscribers: mgorny, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D83733
2020-07-20 11:01:56 -07:00
LLVM GN Syncbot
d8d3b91ab3 [gn build] Port e1270b16c94 2020-07-20 17:51:57 +00:00
Florian Hahn
992d1824c2 [Matrix] Add TileInfo abstraction for tiled matrix code-gen.
This patch adds a TileInfo abstraction and utilities to
create a 3-level loop nest for tiling.

Reviewers: anemet

Reviewed By: anemet

Differential Revision: https://reviews.llvm.org/D77550
2020-07-20 18:49:08 +01:00
Victor Huang
5b0daad836 [PowerPC] Implement R_PPC64_REL24_NOTOC local calls, callee requires a TOC
The PC Relative code now allows for calls that are marked with the relocation
R_PPC64_REL24_NOTOC. This indicates that the caller does not have a valid TOC
pointer in R2 and does not require R2 to be restored after the call.

This patch adds support for local calls to callees that require a TOC.

Reviewed By: sfertile, MaskRay, nemanjai, stefanp

Differential Revision: https://reviews.llvm.org/D83504
2020-07-20 17:46:49 +00:00
Yuanfang Chen
8dacd8c600 [NFC] remove unused llvm::deleter 2020-07-20 10:43:29 -07:00
Yuanfang Chen
52f25d528b [NFC] remove unused includes of SelectionDAGISel.h 2020-07-20 10:43:29 -07:00
Yuanfang Chen
5ffd4d5373 [NFC] remove unneeded TargetLoweringObjectFile init after 85c30f3374d9 2020-07-20 10:43:28 -07:00
Yuanfang Chen
bf8086d1c1 [llc] (almost) remove --print-machineinstrs
Its effect can be achieved with
`-stop-after`, `-print-after`, or `-print-after-all`. But a few tests need to
print MIR after ISel, which cannot be done with
`-print-after`/`-stop-after` since the isel pass does not have a command-line name.
That is why `--print-machineinstrs` is downgraded to
`--print-after-isel` in this patch. `--print-after-isel` could be
removed after we switch to the new pass manager, since the isel pass would then
have a command-line name to use with `-print-after` or equivalent switches.

The motivation for this patch is to reduce the tests' dependency on a
would-be-deprecated feature.

Reviewed By: arsenm, dsanders

Differential Revision: https://reviews.llvm.org/D83275
2020-07-20 10:43:28 -07:00
Matt Arsenault
f08273f200 AMDGPU: Use MCRegister for preloaded arguments
Attempt to fix build error with ancient GCC
2020-07-20 13:34:28 -04:00
Fangrui Song
5a1c9a0071 [llvm-readobj] clang-format DwarfCFIEHPrinter.h, NFC
Pre-commit header ordering changes (and other minor clean-ups) before landing D84106.
2020-07-20 10:25:16 -07:00
Fangrui Song
6f16bde0e0 [LLVMgold.so] -plugin-opt=save-temps: save combined module to .lto.o instead of .o
This matches LLD and fixes https://sourceware.org/bugzilla/show_bug.cgi?id=26262#c1

.o is a bad choice for save-temps output because it is easy to overwrite the bitcode file (*.o)

```
 # Use bfd for the example, -fuse-ld=gold is similar.
clang -flto -c a.c  # generate bitcode file a.o
clang -fuse-ld=bfd -flto a.o -o a -Wl,-plugin-opt=save-temps  # overwrites a.o

 # The user repeats the command but gets surprised, because a.o is now a combined module.
clang -fuse-ld=bfd -flto a.o -o a -Wl,-plugin-opt=save-temps
```

Reviewed By: tejohnson

Differential Revision: https://reviews.llvm.org/D84132
2020-07-20 10:02:56 -07:00
Nick Desaulniers
72d9dcdd94 [ThinLTO] parse flags and blockcount summaries
Forked from pr/46523, we were having a hard time running llvm-extract on
IR from a thinLTO build of the Linux kernel.

$ llvm-extract --func jeq_imm jit-42f488b63a04fdaa931315bdadecb6d23e20529a.ll
llvm-extract: jit-42f488b63a04fdaa931315bdadecb6d23e20529a.ll:47463:8:
error: Expected 'gv', 'module', or 'typeid' at the start of summary
entry
^209 = flags: 8
       ^

Reviewed By: tejohnson

Differential Revision: https://reviews.llvm.org/D82917
2020-07-20 09:50:22 -07:00
Matt Arsenault
6a998dc1d5 AMDGPU: Remove outdated fixme 2020-07-20 11:41:41 -04:00
Matt Arsenault
48dcb3085a AMDGPU: Fix not accounting for constantexpr uses of LDS globals
This was failing to add the size of LDS globals that weren't directly
used by an instruction. They could be used by constant expressions
which are transitively used by the function. This requires a better
search, but just abort on this for now for correctness.
2020-07-20 11:41:41 -04:00
Matt Arsenault
b0a70713f7 AMDGPU/GlobalISel: Initial Implementation of calls
Return values, and tail calls are not yet handled.
2020-07-20 11:13:22 -04:00
Matt Arsenault
b7a779e13a Verifier: Check byref address space for AMDGPU calling conventions 2020-07-20 11:13:11 -04:00
Matt Arsenault
413b267e1e Verifier: Disallow byval and similar for AMDGPU calling conventions
These imply stack-like semantics, which doesn't make any sense for
entry points.
2020-07-20 10:58:57 -04:00
Alok Kumar Sharma
0a592fd282 [DebugInfo] Support for DW_AT_associated and DW_AT_allocated.
Summary:
This support is needed for Fortran array variables with the pointer/allocatable
attribute. It enables a debugger to identify whether a variable is
currently allocated/associated.

  for pointer array (before allocation/association)
  without DW_AT_associated

(gdb) pt ptr
type = integer (140737345375288:140737354129776)
(gdb) p ptr
value requires 35017956 bytes, which is more than max-value-size

  with DW_AT_associated

(gdb) pt ptr
type = integer (:)
(gdb) p ptr
$1 = <not associated>

  for allocatable array (before allocation)

  without DW_AT_allocated

(gdb) pt arr
type = integer (140737345375288:140737354129776)
(gdb) p arr
value requires 35017956 bytes, which is more than max-value-size

  with DW_AT_allocated

(gdb) pt arr
type = integer, allocatable (:)
(gdb) p arr
$1 = <not allocated>

    Testing
- unit test cases added
- check-llvm
- check-debuginfo

Reviewed By: aprantl

Differential Revision: https://reviews.llvm.org/D83544
2020-07-20 19:54:35 +05:30
Matt Arsenault
ea505ad2f6 IR: Define byref parameter attribute
This allows tracking the in-memory type of a pointer argument to a
function for ABI purposes. This is essentially a stripped down version
of byval to remove some of the stack-copy implications in its
definition.

This includes the base IR changes, and some tests for places where it
should be treated similarly to byval. Codegen support will be in a
future patch.

My original attempt at solving some of these problems was to repurpose
byval with a different address space from the stack. However, it is
technically permitted for the callee to introduce a write to the
argument, although nothing does this in reality. There is also talk of
removing and replacing the byval attribute, so a new attribute would
need to take its place anyway.

This is intended to avoid some optimization issues with the current
handling of aggregate arguments, as well as to fix inflexibility in how
frontends can specify the kernel ABI. The most honest representation
of the amdgpu_kernel convention is to expose all kernel arguments as
loads from constant memory. Today, these are raw, SSA Argument values
and codegen is responsible for turning these into loads.

Background:

There currently isn't a satisfactory way to represent how arguments
for the amdgpu_kernel calling convention are passed. In reality,
arguments are passed in a single, flat, constant memory buffer
implicitly passed to the function. It is also illegal to call this
function in the IR, and this is only ever invoked by a driver of some
kind.

It does not make sense to have a stack passed parameter in this
context as is implied by byval. It is never valid to write to the
kernel arguments, as this would corrupt the inputs seen by other
dispatches of the kernel. These arguments are also not in the same
address space as the stack, so a copy is needed to an alloca. From a
source C-like language, the kernel parameters are invisible.
Semantically, a copy is always required from the constant argument
memory to a mutable variable.

The current clang calling convention lowering emits raw values,
including aggregates into the function argument list, since using
byval would not make sense. This has some unfortunate consequences for
the optimizer. In the aggregate case, we end up with an aggregate
store to alloca, which both SROA and instcombine turn into a store of
each aggregate field. The optimizer never pieces this back together to
see that this is really just a copy from constant memory, so we end up
stuck with expensive stack usage.

This also means the backend dictates the alignment of arguments, and
arbitrarily picks the LLVM IR ABI type alignment. By allowing an
explicit alignment, frontends can make better decisions. For example,
there's no real advantage to an alignment higher than 4, so a frontend
could choose to compact the argument layout. Similarly, there is a
high penalty to using an alignment lower than 4, so a frontend could
opt into more padding for small arguments.

Another design consideration is when it is appropriate to expose the
fact that these arguments are all really passed in adjacent
memory. Currently we have a late IR optimization pass in codegen to
rewrite the kernel argument values into explicit loads to enable
vectorization. In most programs, unrelated argument loads can be
merged together. However, exposing this property directly from the
frontend has some disadvantages. We still need a way to track the
original argument sizes and alignments to report to the driver. I find
using some side-channel, metadata mechanism to track this
unappealing. If the kernel arguments were exposed as a single buffer
to begin with, alias analysis would be unaware that the padding bits
between arguments are meaningless. Another family of problems is there
are still some gaps in replacing all of the available parameter
attributes with metadata equivalents once lowered to loads.

The immediate plan is to start using this new attribute to handle all
aggregate arguments for kernels. Long term, it makes sense to migrate
all kernel arguments, including scalars, to be passed indirectly in
the same manner.

Additional context is in D79744.
2020-07-20 10:23:09 -04:00
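The alignment trade-off discussed in ea505ad2f6 can be made concrete with a small layout calculation. The sketch below is not AMDGPU backend code: `Arg`, `layoutOffsets`, and the example sizes are hypothetical, and it only shows how the chosen per-argument alignment determines offsets in a flat argument buffer.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical description of one kernel argument.
struct Arg {
  std::size_t size;  // size in bytes
  std::size_t align; // required alignment in bytes, a power of two
};

// Assign each argument the next offset that satisfies its alignment.
std::vector<std::size_t> layoutOffsets(const std::vector<Arg> &args) {
  std::vector<std::size_t> offsets;
  std::size_t offset = 0;
  for (const Arg &arg : args) {
    assert(arg.align > 0 && (arg.align & (arg.align - 1)) == 0 &&
           "alignment must be a power of two");
    offset = (offset + arg.align - 1) & ~(arg.align - 1); // round up
    offsets.push_back(offset);
    offset += arg.size;
  }
  return offsets;
}
```

For a 1-byte, an 8-byte, and a 2-byte argument, giving the 8-byte argument an alignment of 8 places it at offset 8, while capping its alignment at 4 places it at offset 4 and shortens the buffer by 4 bytes.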
Simon Pilgrim
a3033adc1a MCFixup.h - remove unnecessary MCExpr.h include. NFCI.
Move the include down to files that actually depend on MCExpr definitions.

Also exposes an implicit dependency on MCContext in AVRAsmBackend.h
2020-07-20 15:17:19 +01:00
Simon Pilgrim
741fef4349 CodeGenDAGPatterns.h - remove unnecessary ComplexPattern forward declaration. NFCI.
This is defined in CodeGenTarget.h which we have to explicitly include already.
2020-07-20 15:17:19 +01:00
Simon Pilgrim
f042856ca0 CodeGenDAGPatterns.h - remove unused CodeGenHwModes.h include. NFCI. 2020-07-20 15:17:18 +01:00
Petar Avramovic
bd8687800b AMDGPU/GlobalISel: Legalize s16->s64 G_FPEXT
Legalize using narrowScalar as s16->s32 G_FPEXT
followed by s32->s64 G_FPEXT.

Differential Revision: https://reviews.llvm.org/D84030
2020-07-20 16:12:19 +02:00
Matt Arsenault
d494042b78 AMDGPU/GlobalISel: Remove outdated comment 2020-07-20 10:06:18 -04:00