mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-11-22 02:33:06 +01:00
45660 Commits

Author SHA1 Message Date
Lang Hames
2d682bd2a2 [ORC] Introduce ExecutorAddress type, fix broken LLDB bot.
ExecutorAddressRange depended on JITTargetAddress, but JITTargetAddress is
defined in ExecutionEngine, which OrcShared should not depend on.

This seems like as good a time as any to introduce a new ExecutorAddress type
to eventually replace JITTargetAddress. For now it's just another uint64_t
alias, but it will soon be changed to a class type to provide greater type
safety.
2021-07-08 16:31:59 +10:00
Lang Hames
bee25fbe59 [ORC] Improve computeLocalDeps / computeNamedSymbolDependencies performance.
The computeNamedSymbolDependencies and computeLocalDeps methods on
ObjectLinkingLayerJITLinkContext are responsible for computing, for each symbol
in the current MaterializationResponsibility, the set of non-locally-scoped
symbols that are depended on. To calculate this we have to consider the effect
of chains of dependence through locally scoped symbols in the LinkGraph. E.g.

        .text
        .globl  foo
foo:
        callq   bar                    ## foo depends on external 'bar'
        movq    Ltmp1(%rip), %rcx      ## foo depends on locally scoped 'Ltmp1'
        addl    (%rcx), %eax
        retq

        .data
Ltmp1:
        .quad   x                      ## Ltmp1 depends on external 'x'

In this example symbol 'foo' depends directly on 'bar', and indirectly on 'x'
via 'Ltmp1', which is locally scoped.

Performance of the existing implementations appears to have been mediocre:
Based on flame graphs posted by @drmeister (in #jit on the LLVM discord server)
the computeLocalDeps function was taking up a substantial amount of time when
starting up Clasp (https://github.com/clasp-developers/clasp).

This commit attempts to address the performance problems in three ways:

1. Using jitlink::Blocks instead of jitlink::Symbols as the nodes of the
dependencies-introduced-by-locally-scoped-symbols graph.

Using either Blocks or Symbols as nodes provides the same information, but since
there may be more than one locally scoped symbol per block the block-based
version of the dependence graph should always be a subgraph of the Symbol-based
version, and so faster to operate on.

2. Improved worklist management.

The older version of computeLocalDeps used a fixed worklist containing all
nodes, and iterated over this list propagating dependencies until no further
changes were required. The worklist was not sorted into a useful order before
the loop started.

The new version uses a variable work-stack, visiting nodes in DFS order and
only adding nodes when there is meaningful work to do on them.

Compared to the old version the new version avoids revisiting nodes which
haven't changed, and I suspect it converges more quickly (due to the DFS
ordering).

3. Laziness and caching.

Mappings of...

jitlink::Symbol* -> Interned Name (as SymbolStringPtr)
jitlink::Block* -> Immediate dependencies (as SymbolNameSet)
jitlink::Block* -> Transitive dependencies (as SymbolNameSet)

are all built lazily and cached while running computeNamedSymbolDependencies.

According to @drmeister these changes reduced Clasp startup time in his test
setup (averaged over a handful of starts) from 4.8 to 2.8 seconds (with
ORC/JITLink linking ~11,000 object files in that time), which seems like
enough to justify switching to the new algorithm in the absence of any other
perf numbers.
2021-07-08 16:31:59 +10:00
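As a rough illustration of the block-level propagation described in bee25fbe59 above, the sketch below computes transitive named-symbol dependencies over a block graph, using a work-stack that revisits a block only when its dependence set has grown. It is not the code from ObjectLinkingLayer: `BlockNode`, `computeTransitiveDeps`, and the index-based graph are hypothetical stand-ins for jitlink::Block and SymbolNameSet, and it propagates along reverse edges rather than reproducing the commit's DFS ordering or its lazy caching.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Hypothetical stand-in for a jitlink::Block in the local dependence graph.
struct BlockNode {
  std::set<std::string> ExternalDeps;  // named symbols referenced directly
  std::vector<std::size_t> LocalEdges; // blocks reached via locally scoped symbols
};

// Returns, for each block, the named symbols it depends on directly or
// through chains of locally scoped symbols.
std::vector<std::set<std::string>>
computeTransitiveDeps(const std::vector<BlockNode> &Blocks) {
  const std::size_t N = Blocks.size();
  std::vector<std::set<std::string>> Deps(N);
  std::vector<std::vector<std::size_t>> Preds(N);
  std::vector<std::size_t> WorkStack;

  for (std::size_t I = 0; I != N; ++I) {
    Deps[I] = Blocks[I].ExternalDeps;
    for (std::size_t Succ : Blocks[I].LocalEdges) {
      assert(Succ < N && "edge to unknown block");
      Preds[Succ].push_back(I);
    }
    // Only blocks that already have something to propagate are queued.
    if (!Deps[I].empty())
      WorkStack.push_back(I);
  }

  while (!WorkStack.empty()) {
    const std::size_t B = WorkStack.back();
    WorkStack.pop_back();
    for (std::size_t P : Preds[B]) {
      if (P == B)
        continue; // a self-edge adds nothing new
      const std::size_t Before = Deps[P].size();
      Deps[P].insert(Deps[B].begin(), Deps[B].end());
      // Revisit P's predecessors only if P actually gained dependencies.
      if (Deps[P].size() != Before)
        WorkStack.push_back(P);
    }
  }
  return Deps;
}
```

With block 0 modelling `foo` (direct dependence on `bar`, local edge to block 1) and block 1 modelling `Ltmp1` (dependence on `x`), block 0 ends up depending on both `bar` and `x`.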
Lang Hames
760f860c3a [ORC] Replace MachOJITDylibInitializers::SectionExtent with ExecutorAddressRange
MachOJITDylibInitializers::SectionExtent represented the address range of a
section as an (address, size) pair. The new ExecutorAddressRange type
generalizes this to an address range (for any object, not necessarily a section)
represented as a (start-address, end-address) pair.

The aim is to express more of ORC (and the ORC runtime) in terms of simple types
that can be serialized/deserialized via SPS. This will simplify SPS-based RPC
involving arguments/return-values of these types.
2021-07-08 14:15:44 +10:00
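To make the difference between the two representations in 760f860c3a concrete, here is a minimal sketch of a (start-address, end-address) range built from an (address, size) pair. `AddressRangeSketch` and `fromAddressAndSize` are hypothetical names, not the ORC types, and the `uint64_t` alias only mirrors what 2d682bd2a2 says about ExecutorAddress at that point in time.

```cpp
#include <cassert>
#include <cstdint>

// Per 2d682bd2a2, the address type starts out as a plain uint64_t alias.
using ExecutorAddress = std::uint64_t;

// Hypothetical half-open range [Start, End); not the real ExecutorAddressRange.
struct AddressRangeSketch {
  ExecutorAddress Start;
  ExecutorAddress End;

  std::uint64_t size() const { return End - Start; }
  bool empty() const { return Start == End; }
  bool contains(ExecutorAddress A) const { return A >= Start && A < End; }
};

// Converts the older (address, size) form into the (start, end) form.
AddressRangeSketch fromAddressAndSize(ExecutorAddress Addr, std::uint64_t Size) {
  assert(Addr <= UINT64_MAX - Size && "range would wrap around");
  return {Addr, Addr + Size};
}
```

A range of two plain integers like this has no section-specific meaning, which is what lets it describe any object and keeps serialization simple.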
Lang Hames
4c6599a274 [ORC] Fix file comments. 2021-07-08 14:15:44 +10:00
Stanislav Mekhanoshin
dc43bb3409 [AMDGPU] Disable garbage collection passes
Differential Revision: https://reviews.llvm.org/D105593
2021-07-07 15:47:57 -07:00
Arthur Eubanks
266a9a84be [OpaquePtr] Use ArgListEntry::IndirectType for lowering ABI attributes
Consolidate PreallocatedType and ByValType into IndirectType, and use that for inalloca.
2021-07-07 14:58:38 -07:00
Arthur Eubanks
b3ffc2a93b [OpaquePtr] Remove checking pointee type for byval/preallocated type
These currently always require a type parameter. The bitcode reader
already upgrades old bitcode without the type parameter to use the
pointee type.

In cases where the caller does not have byval but the callee does, we
need to follow CallBase::paramHasAttr() and also look at the callee for
the byval type so that CallBase::isByValArgument() and
CallBase::getParamByValType() are in sync. Do the same for preallocated.

While we're here add a corresponding version for inalloca since we'll
need it soon.

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D104663
2021-07-07 14:28:55 -07:00
Nikita Popov
aec8b8bed1 [IR] Make some pointer element type accesses explicit (NFC)
Explicitly fetch the pointer element type in various deprecated
methods, so we can hopefully remove support for this from the
base GEP constructor.
2021-07-07 22:05:30 +02:00
Martin Storsjö
5f3a753cf4 [COFF] [CodeView] Add a few new enum values
These are undocumented, but are visible in the SDK headers since some
versions ago.

Differential Revision: https://reviews.llvm.org/D105513
2021-07-07 22:00:18 +03:00
Sander de Smalen
3bbfdfb241 [CostModel] Express cost(urem) as cost(div+mul+sub) when set to Expand.
The Legalizer expands the operations of urem/srem into a div+mul+sub or divrem
when those are legal/custom. This patch changes the cost-model to reflect that
cost.

Since there is no 'divrem' Instruction in LLVM IR, the cost of divrem
is assumed to be the same as div+mul+sub since the three operations will
need to be executed at runtime regardless.

Patch co-authored by David Sherwood (@david-arm)

Reviewed By: RKSimon, paulwalker-arm

Differential Revision: https://reviews.llvm.org/D103799
2021-07-07 14:40:28 +01:00
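The expansion that 3bbfdfb241 prices can be sketched as follows: a remainder rewritten as div+mul+sub, and a cost that is the sum of the three. Both function names are hypothetical, and the per-operation costs are plain inputs here rather than values taken from any real target cost table.

```cpp
#include <cassert>
#include <cstdint>

// urem expressed through the operations the legalizer would emit:
// A % B == A - (A / B) * B for unsigned operands with B != 0.
std::uint32_t uremViaDivMulSub(std::uint32_t A, std::uint32_t B) {
  assert(B != 0 && "division by zero");
  return A - (A / B) * B;
}

// Cost of an expanded remainder as the sum of its three component costs.
unsigned expandedRemCost(unsigned DivCost, unsigned MulCost, unsigned SubCost) {
  return DivCost + MulCost + SubCost;
}
```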
Johannes Doerfert
2f34f28211 [Attributor][FIX] Replace uses first, then values
Previously we replaced values by registering all their uses. However, as we
replace a value old uses become stale. We now replace values explicitly
and keep track of "new values" when doing so to avoid replacing only
uses in stale/old values but not their replacements.
2021-07-06 22:43:51 -05:00
Johannes Doerfert
4f0b565d46 [Attributor] Introduce a helper function to deal with undef + none
We often need to deal with the value lattice that contains none and
undef as special values. A simple helper makes this much nicer.

Differential Revision: https://reviews.llvm.org/D103857
2021-07-06 22:41:21 -05:00
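One way to picture the kind of helper 4f0b565d46 describes is a small combine function over a lattice in which "none" (nothing known yet) and "undef" are special values. The sketch below is an assumption about how such a lattice could behave, not the Attributor's helper: `LatticeValue`, its `Conflict` state, and `combine` are hypothetical, and concrete values are modelled as plain integers.

```cpp
#include <cassert>

// Hypothetical lattice element: None = no value seen yet, Undef = undef,
// Concrete = a single known value, Conflict = no single value fits.
struct LatticeValue {
  enum Kind { None, Undef, Concrete, Conflict };
  Kind K;
  int V; // meaningful only when K == Concrete
};

LatticeValue combine(LatticeValue A, LatticeValue B) {
  if (A.K == LatticeValue::Conflict || B.K == LatticeValue::Conflict)
    return {LatticeValue::Conflict, 0};
  if (A.K == LatticeValue::None)
    return B; // none + X => X
  if (B.K == LatticeValue::None)
    return A;
  if (A.K == LatticeValue::Undef)
    return B; // undef can take on whatever the other side is
  if (B.K == LatticeValue::Undef)
    return A;
  assert(A.K == LatticeValue::Concrete && B.K == LatticeValue::Concrete);
  if (A.V == B.V)
    return A;
  return {LatticeValue::Conflict, 0}; // two different concrete values
}
```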
Johannes Doerfert
13dc82700d [Attributor] Simplify operands inside of simplification AAs first
When we do simplification via AAPotentialValues or AAValueConstantRange
we need to simplify the operands of an instruction we deconstruct first.
This does not only improve the result, see for example range.ll, but is
required as we allow outside AAs to provide simplification rules via
callbacks. If we do ignore the simplification rules and base other
simplifications on the IR instead we can create an inconsistent state.
2021-07-06 22:41:18 -05:00
Eli Friedman
b83eae9454 Recommit [ScalarEvolution] Make getMinusSCEV() fail for unrelated pointers.
As part of making ScalarEvolution's handling of pointers consistent, we
want to forbid multiplying a pointer by -1 (or any other value). This
means we can't blindly subtract pointers.

There are a few ways we could deal with this:
1. We could completely forbid subtracting pointers in getMinusSCEV()
2. We could forbid subtracting pointers with different pointer bases
(this patch).
3. We could try to ptrtoint pointer operands.

The option in this patch is more friendly to non-integral pointers: code
that works with normal pointers will also work with non-integral
pointers. And it seems like there are very few places that actually
benefit from the third option.

As a minimal patch, the ScalarEvolution implementation of getMinusSCEV
still ends up subtracting pointers if they have the same base.  This
should eliminate the shared pointer base, but eventually we'll need to
rewrite it to avoid negating the pointer base. I plan to do this as a
separate step to allow measuring the compile-time impact.

This doesn't cause obvious functional changes in most cases; the one
case that is significantly affected is ICmpZero handling in LSR (which
is the source of almost all the test changes).  The resulting changes
seem okay to me, but suggestions welcome.  As an alternative, I tried
explicitly ptrtoint'ing the operands, but the result doesn't seem
obviously better.

I deleted the test lsr-undef-in-binop.ll because I couldn't figure out
how to repair it to test what it was actually trying to test.

Recommitting with fix to MemoryDepChecker::isDependent.

Differential Revision: https://reviews.llvm.org/D104806
2021-07-06 12:16:05 -07:00
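Option 2 from b83eae9454 (refuse to subtract pointers with different bases) can be illustrated with a toy model in which a pointer expression is a base plus a byte offset. This is only a sketch under that assumption: `PtrExpr`, `DiffResult`, and `minusPointers` are hypothetical and have nothing to do with the real SCEV classes.

```cpp
#include <cassert>
#include <cstdint>

// Toy pointer expression: an opaque base object plus a byte offset.
struct PtrExpr {
  const void *Base;
  std::int64_t Offset;
};

// Result of a subtraction that is allowed to fail.
struct DiffResult {
  bool Computable;    // false models a "could not compute" answer
  std::int64_t Value; // meaningful only when Computable is true
};

DiffResult minusPointers(PtrExpr L, PtrExpr R) {
  assert(L.Base && R.Base && "pointer expressions need a base");
  // Unrelated bases: refuse rather than negate a pointer.
  if (L.Base != R.Base)
    return {false, 0};
  // Same base: the base cancels and only the offsets are subtracted.
  return {true, L.Offset - R.Offset};
}
```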
Eli Friedman
61b59d3278 Revert "[ScalarEvolution] Make getMinusSCEV() fail for unrelated pointers."
This reverts commit 74d6ce5d5f169e9cf3fac0eb1042602e286dd2b9.

Seeing crashes on buildbots in MemoryDepChecker::isDependent.
2021-07-06 11:17:13 -07:00
Eli Friedman
b011bc0424 [ScalarEvolution] Make getMinusSCEV() fail for unrelated pointers.
As part of making ScalarEvolution's handling of pointers consistent, we
want to forbid multiplying a pointer by -1 (or any other value). This
means we can't blindly subtract pointers.

There are a few ways we could deal with this:
1. We could completely forbid subtracting pointers in getMinusSCEV()
2. We could forbid subtracting pointers with different pointer bases
(this patch).
3. We could try to ptrtoint pointer operands.

The option in this patch is more friendly to non-integral pointers: code
that works with normal pointers will also work with non-integral
pointers. And it seems like there are very few places that actually
benefit from the third option.

As a minimal patch, the ScalarEvolution implementation of getMinusSCEV
still ends up subtracting pointers if they have the same base.  This
should eliminate the shared pointer base, but eventually we'll need to
rewrite it to avoid negating the pointer base. I plan to do this as a
separate step to allow measuring the compile-time impact.

This doesn't cause obvious functional changes in most cases; the one
case that is significantly affected is ICmpZero handling in LSR (which
is the source of almost all the test changes).  The resulting changes
seem okay to me, but suggestions welcome.  As an alternative, I tried
explicitly ptrtoint'ing the operands, but the result doesn't seem
obviously better.

I deleted the test lsr-undef-in-binop.ll because I couldn't figure out
how to repair it to test what it was actually trying to test.

Differential Revision: https://reviews.llvm.org/D104806
2021-07-06 10:54:41 -07:00
Jeremy Morse
409363cd51 [DebugInfo][InstrRef][3/4] Produce DBG_INSTR_REFs for all variable locations
This patch emits DBG_INSTR_REFs for two remaining flavours of variable
locations that weren't supported: copies, and inter-block VRegs. There are
still some locations that must be represented by DBG_VALUE such as
constants, but they're mostly independent of optimisations.

For variable locations that refer to values defined in different blocks,
vregs are allocated before isel begins, but the defining instruction
might not exist until late in isel. To get around this, emit
DBG_INSTR_REFs in a "half done" state, where the first operand refers to a
VReg. Then at the end of isel, patch these back up to refer to
instructions, using the finalizeDebugInstrRefs method.

Copies are something that I complained about in the original RFC, and I
really don't want to have to put instruction numbers on copies. They don't
define a value: they move them. To address this in isel, salvageCopySSA
interprets:
 * COPYs,
 * SUBREG_TO_REG,
 * Anything that isCopyInstr thinks is a copy.
And follows chains of copies back to the defining instruction that they
read from. This relies on any physical registers that COPYs read being
defined in the same block, or being entry-block arguments. For the former
we can put an instruction number on the defining instruction; for the
latter we can drop a DBG_PHI that reads the incoming value.

Differential Revision: https://reviews.llvm.org/D88896
2021-07-06 18:31:38 +01:00
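The "follow chains of copies back to the defining instruction" idea in 409363cd51 can be sketched with a simple register-to-definition map. Everything here is a hypothetical simplification: `RegDef`, `findDefiningInstr`, and the use of 0 as "no defining instruction found" are assumptions for illustration and do not reflect how salvageCopySSA is written.

```cpp
#include <cassert>
#include <cstddef>
#include <map>

// Hypothetical description of how a register gets its value.
struct RegDef {
  bool IsCopy;       // true: the register is a copy of SrcReg
  unsigned SrcReg;   // meaningful only when IsCopy is true
  unsigned InstrNum; // meaningful only when IsCopy is false; never 0
};

// Walks copy chains from Reg back to a value-defining instruction and
// returns its instruction number, or 0 if none is found.
unsigned findDefiningInstr(const std::map<unsigned, RegDef> &Defs,
                           unsigned Reg) {
  // Bounding the walk by the map size guards against malformed cycles.
  for (std::size_t Steps = 0; Steps <= Defs.size(); ++Steps) {
    auto It = Defs.find(Reg);
    if (It == Defs.end())
      return 0; // value comes from somewhere this map does not describe
    if (!It->second.IsCopy) {
      assert(It->second.InstrNum != 0 && "0 is reserved for 'not found'");
      return It->second.InstrNum;
    }
    Reg = It->second.SrcReg; // copies move values; keep following
  }
  return 0;
}
```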
Kerry McLaughlin
8dd39c43b3 [LV] Prevent vectorization with unsupported element types.
This patch adds a TTI function, isElementTypeLegalForScalableVector, to query
whether it is possible to vectorize a given element type. This is called by
isLegalToVectorizeInstTypesForScalable to reject scalable vectorization if
any of the instruction types in the loop are unsupported, e.g:

  void foo(__int128_t *ptr, int N) {
    #pragma clang loop vectorize_width(4, scalable)
    for (int i = 0; i < N; ++i)
      ptr[i] = ptr[i] + 42;
  }

This example currently crashes if we attempt to vectorize since i128 is not a
supported type for scalable vectorization.

Reviewed By: sdesmalen, david-arm

Differential Revision: https://reviews.llvm.org/D102253
2021-07-06 13:06:21 +01:00
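The shape of the check added in 8dd39c43b3 (reject scalable vectorization if any instruction type in the loop is unsupported) might look roughly like the sketch below. The set of legal widths is an assumption made up for the example, since the real answer comes from the target through TTI, and both function names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Assumed legality rule for illustration only: 8/16/32/64-bit elements are
// fine, anything else (such as i128) is not.
bool isElementWidthLegalForScalableSketch(unsigned Bits) {
  assert(Bits > 0 && "zero-width element type");
  return Bits == 8 || Bits == 16 || Bits == 32 || Bits == 64;
}

// A loop qualifies only if every element type used in it is legal.
bool canUseScalableVectors(const std::vector<unsigned> &InstElementBits) {
  return std::all_of(InstElementBits.begin(), InstElementBits.end(),
                     isElementWidthLegalForScalableSketch);
}
```

Under this assumed rule, a loop that touches `__int128_t` elements is rejected up front instead of reaching code that cannot handle the type.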
Albion Fung
2776c1ab5d [PowerPC] Implement Load and Reserve and Store Conditional Builtins
This patch implements the load and reserve and store conditional
builtins for the PowerPC target, in order to have feature parity with
xlC on AIX.

Differential revision: https://reviews.llvm.org/D105236
2021-07-05 21:35:41 -05:00
Caroline Concatto
1631d2fbaa [AArch64][CostModel] Add cost model for experimental.vector.splice
This patch adds a new ShuffleKind, SK_Splice, and then handles the cost in
getShuffleCost, as in experimental.vector.reverse.

Differential Revision: https://reviews.llvm.org/D104630
2021-07-05 14:30:24 +01:00
Esme-Yi
11bbb4a8e4 [llvm-readobj][XCOFF] Add support for printing the String Table.
Summary: The patch adds the StringTable dumping to
llvm-readobj. Currently only XCOFF is supported.

Reviewed By: jhenderson

Differential Revision: https://reviews.llvm.org/D104613
2021-07-05 04:16:58 +00:00
Nikita Popov
3cc10c45ba [IR] Deprecate GetElementPtrInst::CreateInBounds without element type
This API is not compatible with opaque pointers, the method
accepting an explicit pointer element type should be used instead.

Thankfully there were few in-tree users. The BPF case still ends
up using the pointer element type for now and needs something like
D105407 to avoid doing so.
2021-07-04 16:49:30 +02:00
Paul Walker
ba16635997 [NFC] Fix a few whitespace issues and typos. 2021-07-04 11:49:58 +01:00
Nikita Popov
ecd2dc975e [IRBuilder] Add type argument to CreateMaskedLoad/Gather
Same as other CreateLoad-style APIs, these need an explicit type
argument to support opaque pointers.

Differential Revision: https://reviews.llvm.org/D105395
2021-07-04 12:17:59 +02:00
Christopher Di Bella
b463417679 [llvm][iwyu] explicitly includes <functional> and <utility>
Compiling LLVM with Clang modules and libc++ identified that
`Support/Printable.h` and `ADT/SmallVector.h` were using features that
live in these headers.

Differential Revision: https://reviews.llvm.org/D105402
2021-07-04 06:02:11 +00:00
Simon Pilgrim
f73cebf7e4 [KnownBits] Merge const/non-const KnownBits::extractBits implementations. NFC.
These are identical and can be just const.
2021-07-03 19:00:25 +01:00
Craig Topper
1ddc2a3bd1 [SelectionDAG] Rename memory VT argument for getMaskedGather/getMaskedScatter from VT to MemVT.
Use getMemoryVT() in MGATHER/MSCATTER DAG combines instead of
using the passthru or store value VT for this argument.
2021-07-02 17:37:40 -07:00
Jonas Devlieghere
3020664b33 Revert "[DebugInfo] Enforce implicit constraints on distinct MDNodes"
This reverts commit 8cd35ad854ab4458fd509447359066ea3578b494.

It breaks `TestMembersAndLocalsWithSameName.py` on GreenDragon and
Mikael Holmén points out in D104827 that bitcode files created with the
patch cannot be parsed with binaries built before it.
2021-07-02 15:57:07 -07:00
Amara Emerson
128d2d791b [GlobalISel] Clean up CombinerHelper::apply* functions to return void.
For some reason we/I started writing these as returning bool when the return value
is actually ignored by the combiner.
2021-07-02 13:17:06 -07:00
Amara Emerson
b1924533d9 [GlobalISel] Add re-association combine for G_PTR_ADD to allow better addressing mode usage.
We're trying to match a few pointer computation patterns here for
re-association opportunities.
1) Isolating a constant operand to be on the RHS, e.g.:
   G_PTR_ADD(BASE, G_ADD(X, C)) -> G_PTR_ADD(G_PTR_ADD(BASE, X), C)

2) Folding two constants in each sub-tree as long as such folding
   doesn't break a legal addressing mode.
   G_PTR_ADD(G_PTR_ADD(BASE, C1), C2) -> G_PTR_ADD(BASE, C1+C2)

AArch64 code size improvements on CTMark with -Os:
Program              before  after   diff
 pairlocalalign      251048  251044 -0.0%
 consumer-typeset    421820  421812 -0.0%
 kc                  431348  431320 -0.0%
 SPASS               413404  413300 -0.0%
 clamscan            384396  384220 -0.0%
 tramp3d-v4          370640  370412 -0.1%
 lencod              432096  431772 -0.1%
 bullet              479400  478796 -0.1%
 sqlite3             288504  288072 -0.1%
 7zip-benchmark      573796  570768 -0.5%
 Geomean difference                 -0.1%

Differential Revision: https://reviews.llvm.org/D105069
2021-07-02 12:31:21 -07:00
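Pattern 2 from b1924533d9, folding G_PTR_ADD(G_PTR_ADD(BASE, C1), C2) into G_PTR_ADD(BASE, C1+C2) only when the result stays usable as an addressing mode, can be sketched as below. The structs, `tryFoldOffsets`, and the caller-supplied legality predicate are all hypothetical; the real combine works on MachineInstrs and queries the target, and this sketch ignores signed overflow of the combined offset.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>

// Hypothetical model of G_PTR_ADD(G_PTR_ADD(BaseReg, C1), C2).
struct NestedPtrAdd {
  unsigned BaseReg;
  std::int64_t C1;
  std::int64_t C2;
};

// Hypothetical model of G_PTR_ADD(BaseReg, Offset).
struct FoldedPtrAdd {
  unsigned BaseReg;
  std::int64_t Offset;
};

// Folds the two constants if the combined offset is still legal according
// to the predicate; leaves Out untouched and returns false otherwise.
bool tryFoldOffsets(const NestedPtrAdd &In,
                    const std::function<bool(std::int64_t)> &IsLegalOffset,
                    FoldedPtrAdd &Out) {
  assert(IsLegalOffset && "legality predicate must be callable");
  const std::int64_t Combined = In.C1 + In.C2;
  if (!IsLegalOffset(Combined))
    return false;
  Out = {In.BaseReg, Combined};
  return true;
}
```

With an assumed legal immediate range of 0 to 4095, offsets 16 and 8 fold to 24, while 4000 and 200 are left alone because 4200 would no longer fit.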
Krzysztof Parzyszek
9abc810a43 [OpaquePtr] Add type parameter to emitLoadLinked
Differential Revision: https://reviews.llvm.org/D105353
2021-07-02 13:07:40 -05:00
Jon Roelofs
fa1c32679f [Intrinsics] Make MemCpyInlineInst a MemCpyInst
This opens up more optimization opportunities in passes that already handle MemCpyInst's.

Differential revision: https://reviews.llvm.org/D105247
2021-07-02 10:25:24 -07:00
Jacob Hegna
721423a975 Unpack the CostEstimate feature in ML inlining models.
This change yields an additional 2% size reduction on an internal search
binary, and an additional 0.5% size reduction on fuchsia.

Differential Revision: https://reviews.llvm.org/D104751
2021-07-02 16:57:16 +00:00
Jinsong Ji
1ed15bd392 [AIX] Use AsmParser to do inline asm parsing
Add a flag so that target can choose to use AsmParser for parsing inline asm.
And set the flag by default for AIX.

-no-integrated-as will override this default if specified explicitly.

Reviewed By: #powerpc, shchenz

Differential Revision: https://reviews.llvm.org/D105314
2021-07-02 16:12:21 +00:00
Alex Richardson
a73a5b4199 Place the BlockAddress type in the address space of the containing function
While this should not matter for most architectures (where the program
address space is 0), it is important for CHERI (and therefore Arm Morello).
We use address space 200 for all of our code pointers and without this
change we assert in the SelectionDAG handling of BlockAddress nodes.

It is also useful for AVR: previously programs targeting
AVR that attempt to read their own machine code
via a pointer to a label would instead read from RAM
using a pointer relative to the start of program flash.

Reviewed By: dylanmckay, theraven
Differential Revision: https://reviews.llvm.org/D48803
2021-07-02 12:17:55 +01:00
Roman Lebedev
5bd901b404 Revert "[WebAssembly] Implementation of global.get/set for reftypes in LLVM IR"
This reverts commit 4facbf213c51e4add2e8c19b08d5e58ad71c72de.

```
********************
FAIL: LLVM :: CodeGen/WebAssembly/funcref-call.ll (44466 of 44468)
******************** TEST 'LLVM :: CodeGen/WebAssembly/funcref-call.ll' FAILED ********************
Script:
--
: 'RUN: at line 1';   /builddirs/llvm-project/build-Clang12/bin/llc < /repositories/llvm-project/llvm/test/CodeGen/WebAssembly/funcref-call.ll --mtriple=wasm32-unknown-unknown -asm-verbose=false -mattr=+reference-types | /builddirs/llvm-project/build-Clang12/bin/FileCheck /repositories/llvm-project/llvm/test/CodeGen/WebAssembly/funcref-call.ll
--
Exit Code: 2

Command Output (stderr):
--
llc: /repositories/llvm-project/llvm/include/llvm/Support/LowLevelTypeImpl.h:44: static llvm::LLT llvm::LLT::scalar(unsigned int): Assertion `SizeInBits > 0 && "invalid scalar size"' failed.

```
2021-07-02 11:49:51 +03:00
Paulo Matos
e346ccc104 [WebAssembly] Implementation of global.get/set for reftypes in LLVM IR
Reland of 31859f896.

This change implements new DAG nodes GLOBAL_GET/GLOBAL_SET, and
lowering methods for load and stores of reference types from IR
globals. Once the lowering creates the new nodes, tablegen pattern
matches those and converts them to Wasm global.get/set.

Differential Revision: https://reviews.llvm.org/D104797
2021-07-02 09:46:28 +02:00
Lang Hames
85a8d3c7b3 [ORC] Rename SPSTargetAddress to SPSExecutorAddress.
Also removes SPSTagTargetAddress, which was accidentally introduced at some
point (and never used).
2021-07-02 12:40:14 +10:00
Valentin Churavy
0b1b7443f1 [Orc] Add C bindings for LazyReexports
Add C bindings and an example for LLJIT with lazy reexports

Differential Revision: https://reviews.llvm.org/D104672
2021-07-01 21:52:05 +02:00
Nikita Popov
6203a57d08 [OpaquePtr] Support opaque pointers in intrinsic type check
This adds support for opaque pointers in intrinsic type checks
of IIT kind Pointer and PtrToElt.

This is less straight-forward than it might initially seem, because
we should only accept opaque pointers here in --force-opaque-pointers
mode. Otherwise, there would be more than one valid type signature
for a given intrinsic name.

Differential Revision: https://reviews.llvm.org/D105155
2021-07-01 18:26:41 +02:00
Matt Arsenault
20d89b9242 GlobalISel: Use LLT in call lowering callbacks
This preserves the memory type so the lowerings can rely on them.
2021-07-01 12:15:54 -04:00
Hussain Kadhem
7eddb43fa0 [VP] Implementation of intrinsic and SDNode definitions for VP load, store, gather, scatter.
This patch adds intrinsic definitions and SDNodes for predicated
load/store/gather/scatter, based on the work done in D57504.

Reviewed By: simoll, craig.topper

Differential Revision: https://reviews.llvm.org/D99355
2021-07-01 13:34:44 +02:00
Jeremy Morse
7a8e30eba0 [DebugInfo][InstrRef][1/4] Support transformations that widen values
Very late in compilation, backends like X86 will perform optimisations like
this:

    $cx = MOV16rm $rax, ...
    ->
    $rcx = MOV64rm $rax, ...

Widening the load from 16 bits to 64 bits. Seeing how the lower 16 bits
remain the same, this doesn't affect execution. However, any debug
instruction reference to the defined operand now refers to a 64 bit value,
not a 16 bit one, which might be unexpected. Elsewhere in codegen, there's
often this pattern:

    CALL64pcrel32 @foo, implicit-def $rax
    %0:gr64 = COPY $rax
    %1:gr32 = COPY %0.sub_32bit

Where we want to refer to the definition of $eax by the call, but don't
want to refer to the copies (they don't define values in the way
LiveDebugValues sees it). To solve this, add a subregister field to the
existing "substitutions" facility, so that we can describe a field within
a larger value definition. I would imagine that this would be used most
often when a value is widened, and we need to refer to the original,
narrower definition.

Differential Revision: https://reviews.llvm.org/D88891
2021-07-01 11:19:27 +01:00
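The point made in 7a8e30eba0, that the original narrow value is still present as a field of the widened one, can be shown with a small bit-extraction sketch. `extractSubfield` and its (offset, size) parameters are hypothetical and only model what a subregister qualifier on a substitution would need to express.

```cpp
#include <cassert>
#include <cstdint>

// Reads SizeBits bits starting at bit OffsetBits out of a wider value, the
// way a subregister names a field inside a larger register.
std::uint64_t extractSubfield(std::uint64_t Wide, unsigned OffsetBits,
                              unsigned SizeBits) {
  assert(SizeBits >= 1 && SizeBits <= 64 && "field size out of range");
  assert(OffsetBits < 64 && OffsetBits + SizeBits <= 64 &&
         "field must lie inside the 64-bit value");
  const std::uint64_t Shifted = Wide >> OffsetBits;
  if (SizeBits == 64)
    return Shifted; // avoid shifting by the full width below
  return Shifted & ((std::uint64_t{1} << SizeBits) - 1);
}
```

After a 16-bit load is widened to 64 bits, the value the variable originally referred to is the field at offset 0 with size 16 of the new definition.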
Christian Kühnel
70970300a3 added some example code for llvm::Expected<T>
Since I had some fun understanding how to properly use llvm::Expected<T> I added some code examples that I would have liked to see when learning to use it.

Reviewed By: sammccall

Differential Revision: https://reviews.llvm.org/D105014
2021-07-01 09:57:20 +00:00
Andrzej Warzynski
41a27c03c7 [flang] Revert "PoC for Flang Driver Plugins"
This patch has not been reviewed and was committed by accident.

This reverts commit 788a5d4afe6407e647454a9832a7b4a27fba06bf.
2021-07-01 08:27:31 +00:00
Lang Hames
6567b76038 [ORC] Add wrapper-function support methods to ExecutorProcessControl.
Adds support for both synchronous and asynchronous calls to wrapper functions
using SPS (Simple Packed Serialization). Also adds support for wrapping
functions on the JIT side in SPS-based wrappers that can be called from the
executor.

These new methods simplify calls between the JIT and Executor, and will be used
in upcoming ORC runtime patches to enable communication between ORC and the
runtime.
2021-07-01 18:21:49 +10:00
Stuart Ellis
c930f37268 PoC for Flang Driver Plugins 2021-07-01 08:10:40 +00:00
Roman Lebedev
e9c11e84f4 [NFC][PassBuilder] addVectorPasses(): clarify that 'IsLTO' is actually 'IsFullLTO'
I.e. it will be `false` for thin lto.
2021-07-01 10:09:24 +03:00
Qiu Chaofan
a315353f43 [NFC][Scheduler] Refactor tryCandidate to return boolean
This patch changes return type of tryCandidate from void to bool:

1. Methods in some targets already follow this convention.
2. This would help if some target wants to re-use generic code.
3. It looks more intuitive if these try-methods return the same type.

We may need to change return type of them from bool to some enum
further, to make it less confusing.

Reviewed By: foad

Differential Revision: https://reviews.llvm.org/D103951
2021-07-01 14:31:47 +08:00
Lang Hames
a397416183 [ORC] Rename TargetProcessControl to ExecutorProcessControl. NFC.
This is a first step towards consistently using the term 'executor' for the
process that executes JIT'd code. I've opted for 'executor' as the preferred
term over 'target' as target is already heavily overloaded ("the target
machine for the executor" is much clearer than "the target machine for the
target").
2021-07-01 13:31:12 +10:00
Matt Arsenault
d665475981 GlobalISel: Use LLT in memory legality queries
This enables proper lowering of non-byte sized loads. We still aren't
faithfully preserving memory types everywhere, so the legality checks
still only consider the size.
2021-06-30 17:44:13 -04:00
Jonas Paulsson
38b768656f [MCStreamer] Move emission of attributes section into MCELFStreamer
Enable the emission of a GNU attributes section by reusing the code for
emitting the ARM build attributes section.

The GNU attributes follow the exact same section format as the ARM
BuildAttributes section, so this can be factored out and reused for GNU
attributes generally.

The immediate motivation for this is to emit a GNU attributes section for the
vector ABI on SystemZ (https://reviews.llvm.org/D105067).

Review: Logan Chien, Ulrich Weigand

Differential Revision: https://reviews.llvm.org/D102894
2021-06-30 16:00:27 -05:00
Valentin Churavy
0c3065489c [Orc] Fix name of LLVMOrcIRTransformLayerSetTransform
In https://reviews.llvm.org/D103855 we added access to IRTransformLayer, but I
just noticed that the function name is following the wrong pattern.

Differential Revision: https://reviews.llvm.org/D104840
2021-06-30 21:43:34 +02:00
Jon Roelofs
b3511ee3cf [GISel] Support llvm.memcpy.inline
Differential revision: https://reviews.llvm.org/D105072
2021-06-30 12:39:05 -07:00
Bradley Smith
ef187efe53 [TargetLowering][AArch64][SVE] Take into account accessed type when clamping address
When clamping the index for a memory access to a stacked vector we must
take into account the entire type being accessed, not just assume that
we are accessing only a single element.

Differential Revision: https://reviews.llvm.org/D105016
2021-06-30 13:30:18 +01:00
Steffen Larsen
e40bb79a12 [Clang][NVPTX] Add NVPTX intrinsics and builtins for CUDA PTX 6.5 and 7.0 WMMA and MMA instructions
Adds NVPTX builtins and intrinsics for the CUDA PTX `wmma.load`, `wmma.store`, `wmma.mma`, and `mma` instructions added in PTX 6.5 and 7.0.

PTX ISA description of

  - `wmma.load`: https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#warp-level-matrix-instructions-wmma-ld
  - `wmma.store`: https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#warp-level-matrix-instructions-wmma-st
  - `wmma.mma`: https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#warp-level-matrix-instructions-wmma-mma
  - `mma`: https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#warp-level-matrix-instructions-mma

Overview of `wmma.mma` and `mma` matrix shape/type combinations added with specific PTX versions: https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#warp-level-matrix-shape

Authored-by: Steffen Larsen <steffen.larsen@codeplay.com>
Co-Authored-by: Stuart Adams <stuart.adams@codeplay.com>

Reviewed By: tra

Differential Revision: https://reviews.llvm.org/D104847
2021-06-29 15:44:07 -07:00
Matt Arsenault
aa0b4488a5 CodeGen: Store LLT instead of uint64_t in MachineMemOperand
GlobalISel is relying on regular MachineMemOperands to track all of
the memory properties of accesses. Just the raw byte size is
insufficient to disambiguate all situations. For example, if we need to
split an unaligned extending load, we need to know the number of bits
in the original source value and can't infer it from the result
type. This is also a problem for extending vector loads.

This does decrease the maximum representable size from the full
uint64_t bytes to a maximum of 16-bits. No in tree testcases hit this,
other than places using UINT64_MAX for unknown sizes. This may be an
issue for G_MEMCPY and co., although they can just use unknown size
for large static sizes. This also has potential for backend abuse by
relying on the type when it really shouldn't be relevant after
selection.

This does not include the necessary MIR printer/parser changes to
represent this.
2021-06-29 17:38:51 -04:00
Jacob Hegna
f4bf136cf5 [NFC] clang-format on InlineCost.cpp and InlineAdvisor.h. 2021-06-29 18:15:27 +00:00
Nick Desaulniers
fd64c3a741 [Inline] prevent inlining on noprofile mismatch
Similar to
commit bc044a88ee3c ("[Inline] prevent inlining on stack protector mismatch")

The noprofile function attribute is meant to prevent compiler
instrumentation from being inserted into a function. Inlining may defeat
the developer's intent. If the caller and callee don't either BOTH have
the attribute or BOTH lack the attribute, suppress inline substitution.

This matches behavior being proposed in GCC:
https://gcc.gnu.org/pipermail/gcc-patches/2021-June/573511.html
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80223

Add LangRef entry for noprofile fn attr, similar to text added in D93422
and D104944.

Reviewed By: MaskRay, melver, phosek

Differential Revision: https://reviews.llvm.org/D104810
2021-06-29 10:32:03 -07:00
Johannes Doerfert
ab6a6eee6b [Attributor][NFCI] Make the state of AAValueSimplify explicit
As we have done with other states we want the AAValueSimplify state to
be explicit to use it more easily in our helpers.
2021-06-29 09:38:22 -05:00
Florian Hahn
8002fe7d67 [BasicAA] Be more careful with modulo ops on VariableGEPIndex.
(V * Scale) % X may not produce the same result for any possible value
of V, e.g. if the multiplication overflows. This means we currently
incorrectly determine NoAlias in some cases.

This patch updates LinearExpression to track whether the expression
has NSW and uses that to adjust the scale used for alias checks.

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D99424
2021-06-29 09:22:36 +01:00
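The overflow hazard behind 8002fe7d67 is easy to reproduce with narrow arithmetic. The sketch below uses 8-bit unsigned wrap-around as a stand-in for the index width; it is only an analogue of the situation in the commit, which concerns signed indices and the NSW flag, and `wrappedScaledMod` is a hypothetical helper.

```cpp
#include <cassert>
#include <cstdint>

// Computes (V * Scale) % X where the multiplication wraps modulo 256,
// as it would in an 8-bit index type.
unsigned wrappedScaledMod(std::uint8_t V, std::uint8_t Scale, std::uint8_t X) {
  assert(X != 0 && "modulo by zero");
  const std::uint8_t Product = static_cast<std::uint8_t>(V * Scale);
  return Product % X;
}
```

Without overflow, (V * 3) % 3 is always 0. With wrap-around, V = 100 gives 300 mod 256 = 44, and 44 % 3 = 2, so reasoning that assumes a remainder of 0 for every V would be wrong.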
Michael Liao
58a44149ab [MIRParser] Add machine metadata.
- Add standalone metadata parsing support so that machine metadata nodes
  can be populated before, and accessed while, MIR is parsed.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D103282
2021-06-28 22:29:36 -04:00
Scott Linder
47e3a5ca06 [DebugInfo] Enforce implicit constraints on distinct MDNodes
Add UNIQUED and DISTINCT properties in Metadata.def and use them to
implement restrictions on the `distinct` property of MDNodes:

* DIExpression can currently be parsed from IR or read from bitcode
  as `distinct`, but this property is silently dropped when printing
  to IR. This causes accepted IR to fail to round-trip. As DIExpression
  appears inline at each use in the canonical form of IR, it cannot
  actually be `distinct` anyway, as there is no syntax to describe it.
* Similarly, DIArgList is conceptually always uniqued. It is currently
  restricted to only appearing in contexts where there is no syntax for
  `distinct`, but for consistency it is treated equivalently to
  DIExpression in this patch.
* DICompileUnit is already restricted to always being `distinct`, but
  along with adding general support for the inverse restriction I went
  ahead and described this in Metadata.def and updated the parser to be
  general. Future nodes which have this restriction can share this
  support.

The new UNIQUED property applies to DIExpression and DIArgList, and
forbids them to be `distinct`. It also implies they are canonically
printed inline at each use, rather than via MDNode ID.

The new DISTINCT property applies to DICompileUnit, and requires it to
be `distinct`.

A potential alternative change is to forbid the non-inline syntax for
DIExpression entirely, as is done with DIArgList implicitly by requiring
it appear in the context of a function. For example, we would forbid:

    !named = !{!0}
    !0 = !DIExpression()

Instead we would only accept the equivalent inlined version:

    !named = !{!DIExpression()}

This essentially removes the ability to create a `distinct` DIExpression
by construction, as there is no syntax for `distinct` inline. If this
patch is accepted as-is, the result would be that the non-canonical
version is accepted, but the following would be an error and produce a diagnostic:

    !named = !{!0}
    ; error: 'distinct' not allowed for !DIExpression()
    !0 = distinct !DIExpression()

Also update some documentation to consistently use the inline syntax for
DIExpression, and to describe the restrictions on `distinct` for nodes
where applicable.

Reviewed By: StephenTozer, t-tye

Differential Revision: https://reviews.llvm.org/D104827
2021-06-28 21:20:04 +00:00
Scott Linder
93375cf94f [ADT] Add makeVisitor to STLExtras.h
Relands patch reverted by 61242c0addb120294211d24a97ed89837418cb36
The original patch mistakenly included unrelated tests.

Adds a utility to combine multiple Callables into a single Callable.
This is useful to make constructing a visitor for `std::visit`-like
functions more natural; functions like this will be added in future
patches.
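
As a rough illustration of combining Callables into one visitor, here is the
common C++17 "overloaded" idiom. It is a sketch of the same idea and not the
LLVM implementation; `Overloaded` and `classify` are hypothetical names:

```cpp
#include <variant>

// The common C++17 "overloaded" idiom; this is not LLVM's makeVisitor
// implementation, only a sketch of the same idea.
template <typename... Callables> struct Overloaded : Callables... {
  using Callables::operator()...;
};
// Deduction guide so Overloaded{lambda1, lambda2} deduces its bases.
template <typename... Callables>
Overloaded(Callables...) -> Overloaded<Callables...>;

// Hypothetical example: return a small tag identifying the active alternative.
constexpr int classify(const std::variant<int, double> &V) {
  return std::visit(Overloaded{[](int) { return 1; },
                               [](double) { return 2; }},
                    V);
}

static_assert(classify(std::variant<int, double>{7}) == 1, "int alternative");
static_assert(classify(std::variant<int, double>{2.5}) == 2, "double alternative");
```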

Intended to supersede https://reviews.llvm.org/D99560 by
perfectly-forwarding the combined Callables.

Reviewed By: dblaikie

Differential Revision: https://reviews.llvm.org/D100670
2021-06-28 20:07:11 +00:00
Scott Linder
28e61dd3f9 Revert "[ADT] Add makeVisitor to STLExtras.h"
This reverts commit 14a8aa615597ef0aa424ac9545906bf8b9865063.

Mistakenly landed this before a patch it should depend on was accepted.
2021-06-28 19:51:25 +00:00
Scott Linder
a106319b70 [ADT] Add makeVisitor to STLExtras.h
Adds a utility to combine multiple Callables into a single Callable.
This is useful to make constructing a visitor for `std::visit`-like
functions more natural; functions like this will be added in future
patches.

Intended to supersede https://reviews.llvm.org/D99560 by
perfectly-forwarding the combined Callables.

Reviewed By: dblaikie

Differential Revision: https://reviews.llvm.org/D100670
2021-06-28 19:35:42 +00:00
Akira Hatanaka
252bda7ebd [ObjC][ARC] Ignore operand bundle "clang.arc.attachedcall" on a call if
the call's return type is void

Instead of trying hard to prevent global optimization passes such as
deadargelim from changing the return type to void, just ignore the
bundle if the return type is void. clang currently emits calls to
@llvm.objc.clang.arc.noop.use, which consumes the function call result,
immediately after the function call to prevent changes to the return
type, but optimization passes can delete the call to
@llvm.objc.clang.arc.noop.use if the function call doesn't return, which
enables deadargelim to change the return type.

rdar://76671438

Differential Revision: https://reviews.llvm.org/D103062
2021-06-28 11:02:30 -07:00
Melanie Blower
423a70f3f3 [llvm][clang][fpenv] Create new intrinsic llvm.arith.fence to control FP optimization at expression level
This intrinsic blocks floating point transformations by the optimizer.

Author: Pengfei

Reviewed By: LuoYuanke, Andy Kaylor, Craig Topper, kpn

Differential Revision: https://reviews.llvm.org/D99675
2021-06-28 12:26:52 -04:00
Sander de Smalen
a82143ea32 Reland [GlobalISel] NFC: Have LLT::getSizeInBits/Bytes return a TypeSize.
This patch relands https://reviews.llvm.org/D104454, but fixes some failing
builds on Mac OS which apparently has a different definition for size_t,
that caused 'ambiguous operator overload' for the implicit conversion
of TypeSize to a scalar value.

This reverts commit b732e6c9a8438e5204ac96c8ca76f9b11abf98ff.
2021-06-28 15:24:27 +01:00
Brendon Cahoon
16a6bb6581 [AMDGPU][GlobalISel] Legalize and select G_SBFX and G_UBFX
Adds legalizer, register bank select, and instruction
select support for G_SBFX and G_UBFX. These opcodes generate
scalar or vector ALU bitfield extract instructions for
AMDGPU. The instructions allow both constant or register
values for the offset and width operands.

The 32-bit scalar version is expanded to a sequence that
combines the offset and width into a single register.

There are no 64-bit vgpr bitfield extract instructions, so the
operations are expanded to a sequence of instructions that
implement the operation. If the width is a constant,
then the 32-bit bitfield extract instructions are used.

Moved the AArch64 specific code for creating G_SBFX to
CombinerHelper.cpp so that it can be used by other targets.
Only bitfield extracts with constant offset and width values
are handled currently.

Differential Revision: https://reviews.llvm.org/D100149
2021-06-28 09:06:44 -04:00
Jan Kratochvil
d1dd2f5d77 llvm-dwarfdump: Print warnings on invalid DWARF
llvm-dwarfdump was silent even when the format of DWARF was invalid
and/or llvm-dwarfdump did not understand/support some of the constructs.
This can be pretty confusing as llvm-dwarfdump is a tool for DWARF
producers+consumers development.

Review comments also by @dblaikie.

Reviewed By: jhenderson

Differential Revision: https://reviews.llvm.org/D104271
2021-06-27 11:38:35 +02:00
David Green
6a316cf978 [ISel] Port AArch64 SABD and UABD to DAGCombine
This ports the AArch64 SABD and UABD over to DAG Combine, where they can be
used by more backends (notably MVE in a follow-up patch). The matching code
has changed very little, just to handle legal operations and types
differently. It selects from (ABS (SUB (EXTEND a), (EXTEND b))), producing
an abds/abdu which is zexted to the original type.

Differential Revision: https://reviews.llvm.org/D91937
2021-06-26 19:34:16 +01:00
Joseph Huber
b8d800fd9c [OpenMP] Change OpenMPOpt to check openmp metadata
The metadata added in D102361 introduces a module flag that we can check
to determine if the module was compiled with `-fopenmp` enabled. We can
now check for the presence of this instead of scanning the call graph
for OpenMP runtime functions.

Depends on D102361

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D102423
2021-06-25 16:34:22 -04:00
Nikita Popov
33e01a9045 [IR] Add Type::isOpaquePointerTy() helper (NFC)
Shortcut to check for opaque pointers without a cast to PointerType.
2021-06-25 20:56:59 +02:00
Sander de Smalen
4d07cbe876 Revert "[GlobalISel] NFC: Have LLT::getSizeInBits/Bytes return a TypeSize."
This patch seems to be causing build errors, reverting it for now.

This reverts commit aeab9d9570ac8cb554aff6e1af24a471fdf5b4e5.
2021-06-25 17:37:16 +01:00
Sander de Smalen
9d34fb6e49 [GlobalISel] NFC: Have LLT::getSizeInBits/Bytes return a TypeSize.
To reflect that the size may be scalable, a TypeSize is returned
instead of an unsigned. In places where the result is used,
it currently relies on an implicit cast of TypeSize -> uint64_t,
which asserts that the type is not scalable.

This patch is NFC for fixed-width vectors.

Reviewed By: aemerson

Differential Revision: https://reviews.llvm.org/D104454
2021-06-25 17:06:50 +01:00
Sander de Smalen
5eab663b62 [GlobalISel] NFC: Change LLT::changeNumElements to LLT::changeElementCount.
Reviewed By: aemerson

Differential Revision: https://reviews.llvm.org/D104453
2021-06-25 15:54:00 +01:00
Sander de Smalen
88c55d538f [GlobalISel] NFC: Change LLT::scalarOrVector to take ElementCount.
Reviewed By: aemerson

Differential Revision: https://reviews.llvm.org/D104452
2021-06-25 11:26:16 +01:00
serge-sans-paille
b7c46d66e4 [llvm-cov] Enforce alignment of function records
Function Records are required to be aligned on 8 bytes. This is enforced for each
record except the first, where one relies on the default alignment within an
std::string. There's no such guarantee, and indeed on 32-bit platforms some
implementations of std::string do not enforce it.

Provide a portable implementation based on llvm's MemoryBuffer.

Differential Revision: https://reviews.llvm.org/D104745
2021-06-25 10:56:06 +02:00
Tony Tye
7b3b9c00af [AMDGPU] Reserve AMDGPU ELF e_flags machine 0x43
Reviewed By: kzhuravl, rampitec

Differential Revision: https://reviews.llvm.org/D104872
2021-06-24 22:51:47 +00:00
Fangrui Song
47a9b3b42d [OptTable] Rename PrintHelp to printHelp
To be consistent with other member functions and match the coding standard.
2021-06-24 14:47:03 -07:00
Martin Storsjö
1d9cb8abdf [ADT] Complete the StringRef case insensitive method renaming
Remove the old name for the methods. These were only left behind to
ease the transition for downstreams.

Differential Revision: https://reviews.llvm.org/D104820
2021-06-25 00:22:02 +03:00
Martin Storsjö
9d14adb9f6 [llvm] Rename StringRef _lower() method calls to _insensitive()
This is a mechanical change. This actually also renames the
similarly named methods in the SmallString class, however these
methods don't seem to be used outside of the llvm subproject, so
this doesn't break building of the rest of the monorepo.
2021-06-25 00:22:01 +03:00
Martin Storsjö
24c3cf43d7 [ADT] Rename StringRef case insensitive methods for clarity
Rename functions with the `xx_lower()` names to `xx_insensitive()`.
This was requested during the review of D104218.

Test names and variables in llvm/unittests/ADT/StringRefTest.cpp
that refer to "lower" are renamed to "insensitive" correspondingly.

Unused function aliases with the former method names are left
in place (without any deprecation attributes) for transition purposes.

All references within the monorepo will be changed (with essentially
mechanical changes), and then the old names will be removed in a
later commit.

Also remove the superfluous method names at the start of doxygen
comments, for the methods that are touched here. (There are more
occurrences of this left in other methods though.) Also remove
duplicate doxygen comments from the implementation file.

Differential Revision: https://reviews.llvm.org/D104819
2021-06-25 00:22:00 +03:00
Aakanksha Patil
d4359ff02a [AMDGPU] Add gfx1035 target
Differential Revision: https://reviews.llvm.org/D104804
2021-06-24 14:32:41 -04:00
Anirudh Prasad
6fc759537e [AsmParser][SystemZ][z/OS] Support for emitting labels in upper case
- Currently, the emitting of labels in the parsePrimaryExpr function does not take case into account. It just takes the identifier and emits it unchanged.
- However, for HLASM the emitting of labels is case independent. We emit them in upper case only, to enforce case independence. So we need to ensure that we emit the label in upper case when parsing it (in `parseAsHLASMLabel`), and also when processing a PC-relative relocatable expression (in `parsePrimaryExpr`).
- To achieve this a new MCAsmInfo attribute has been introduced which corresponding targets can override if needed.

Reviewed By: abhina.sreeskantharajan, uweigand

Differential Revision: https://reviews.llvm.org/D104715
2021-06-24 12:50:11 -04:00
Alexander Yermolovich
d12ae1eaf8 [LLD][LLVM] CG Graph profile using relocations
Currently when .llvm.call-graph-profile is created by llvm it explicitly encodes the symbol indices. This section is basically a black box for post processing tools. For example, if we run strip -s on the object files the symbol table changes, but indices in that section do not. In the non-visible failure case the indices point to the wrong symbols. In the visible failure case the indices point outside of the symbol table: "invalid symbol index".

This patch changes the format by using R_*_NONE relocations to indicate the from/to symbols. The Frequency (Weight) will still be in the .llvm.call-graph-profile, but symbol information will be in relocation section. In LLD information from both sections is used to reconstruct call graph profile. Relocations themselves will never be applied.

With this approach post processing tools that handle relocations correctly work for this section also. Tools can add/remove symbols and as long as they handle relocation sections with this approach information stays correct.

In a quick experiment with clang-13, the size went up from 107KB to 322KB, aggregate of all the input sections. Size of clang-13 binary is ~118MB. For users of -fprofile-use/-fprofile-sample-use the size of object files will go up slightly; it will not impact final binary size.

Reviewed By: jhenderson, MaskRay

Differential Revision: https://reviews.llvm.org/D104080
2021-06-24 09:09:33 -07:00
Brendon Cahoon
5e0256758b [GlobalISel] Describe undefined values for G_SBFX/G_UBFX operands
Differential Revision: https://reviews.llvm.org/D104245
2021-06-24 09:31:41 -04:00
Sander de Smalen
ac11cfc716 [GlobalISel] NFC: Change LLT::vector to take ElementCount.
This also adds new interfaces for the fixed- and scalable case:
* LLT::fixed_vector
* LLT::scalable_vector

The strategy for migrating to the new interfaces was as follows:
* If the new LLT is a (modified) clone of another LLT, taking the
  same number of elements, then use LLT::vector(OtherTy.getElementCount())
  or if the number of elements is halved/doubled, it uses .divideCoefficientBy(2)
  or operator*. That is because there is no reason to specifically restrict
  the types to 'fixed_vector'.
* If the algorithm works on the number of elements (as unsigned), then
  just use fixed_vector. This will need to be fixed up in the future when
  modifying the algorithm to also work for scalable vectors, and will
  then need additional tests to confirm the behaviour works the same for
  scalable vectors.
* If the test used the `/*Scalable=*/true` flag of LLT::vector, then
  this is replaced by LLT::scalable_vector.

Reviewed By: aemerson

Differential Revision: https://reviews.llvm.org/D104451
2021-06-24 11:26:12 +01:00
Stephen Tozer
782c047ef4 Partial Reapply "[DebugInfo] Use variadic debug values to salvage BinOps and GEP instrs with non-const operands"
This is a partial reapply of the original commit and the followup commit
that were previously reverted; this reapply also includes a small fix
for a potential source of non-determinism, but also has a small change
to turn off variadic debug value salvaging, to ensure that any future
revert/reapply steps to disable and re-enable this feature do not risk
causing conflicts.

Differential Revision: https://reviews.llvm.org/D91722

This reverts commit 386b66b2fc297cda121a3cc8a36887a6ecbcfc68.
2021-06-24 09:46:38 +01:00
Carl Ritson
e6a4177023 [ValueTypes] Define MVTs for v3i64/v3f64 to complement v6i32/v6f32
Having type symmetry with these is somewhat necessary when implementing support for 192-bit values.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D104621
2021-06-24 12:41:22 +09:00
Nikita Popov
60a7807d51 [PatternMatch] Make m_VScale compatible with opaque pointers
Use GEP source type instead of pointer element type.
2021-06-23 23:02:13 +02:00
Eli Friedman
8973883ff0 [NFC][ScalarEvolution] Fix SCEVNAryExpr::getType().
SCEVNAryExpr::getType() could return the wrong type for a SCEVAddExpr.
Remove it, and add getType() methods to the relevant subclasses.

NFC because nothing uses it directly, as far as I know; this is just
future-proofing.
2021-06-23 12:55:59 -07:00
Cyndy Ishida
b270097e7d [TextAPI] add symbol name prefixes to central location, NFC
These prefixes are used for printing the symbols coming from tbd files
and they were redundant across locations
2021-06-23 11:21:00 -07:00
Kuter Dinel
16d688b628 [Attributor] Derive AAFunctionReachability attribute.
This attribute uses Attributor's internal 'optimistic' call graph
information to answer queries about function call reachability.

Functions can become reachable over time as new call edges are
discovered.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D104599
2021-06-23 20:43:10 +03:00
Nikita Popov
94c11807a4 [LAA] Make getPointersDiff() API compatible with opaque pointers
Make getPointersDiff() and sortPtrAccesses() compatible with opaque
pointers by explicitly passing in the element type instead of
determining it from the pointer element type.

The SLPVectorizer result is slightly non-optimal in that unnecessary
pointer bitcasts are added.

Differential Revision: https://reviews.llvm.org/D104784
2021-06-23 18:44:34 +02:00
Tomasz Miąsko
5a6e96d2d0 [Demangle][Rust] Hide implementation details NFC
Move content of the "public" header into the implementation file.

This also renames two enumerations that were previously used through
`rust_demangle::` scope, to avoid breaking a build bot with older
version of GCC that rejects uses of enumerator through `E::A` if there
is a variable with the same name as enumeration `E` in the scope.

Reviewed By: dblaikie

Differential Revision: https://reviews.llvm.org/D104362
2021-06-23 18:28:16 +02:00
Nikita Popov
3d23594a81 [TTI] Make assertion compatible with opaque pointers
Dropping the TODO here because it applies to all uses of this method.
2021-06-23 12:21:54 +02:00
River Riddle
deae9b5f50 [mlir] Add a ThreadPool to MLIRContext and refactor MLIR threading usage
This revision refactors the usage of multithreaded utilities in MLIR to use a common
thread pool within the MLIR context, in addition to a new utility that makes writing
multi-threaded code in MLIR less error prone. Using a unified thread pool brings about
several advantages:

* Better thread usage and more control
We currently use the static llvm threading utilities, which do not allow multiple
levels of asynchronous scheduling (even if there are open threads). This is due to
how the current TaskGroup structure works, which only allows one truly multithreaded
instance at a time. By having our own ThreadPool we gain more control and flexibility
over our job/thread scheduling, and in a followup can enable threading more parts of
the compiler.

* The static nature of TaskGroup causes issues in certain configurations
Due to the static nature of TaskGroup, there have been quite a few problems related to
destruction that have caused several downstream projects to disable threading. See
D104207 for discussion on some related fallout. By having a ThreadPool scoped to
the context, we don't have to worry about destruction and can ensure that any
additional MLIR thread usage ends when the context is destroyed.

Differential Revision: https://reviews.llvm.org/D104516
2021-06-23 01:29:24 +00:00
Jon Roelofs
f2b70884ff [Remarks] Make memsize remarks report as an analysis, not a missed opportunity.
Differential revision: https://reviews.llvm.org/D104078
2021-06-22 18:22:47 -07:00
Nikita Popov
09e246902d [OpaquePtr] Support changing load type in InstCombine
When the load type is changed to ptr, we need the load pointer type
to also be ptr, because it's not allowed to create a pointer to an
opaque pointer. This is achieved by adjusting the getPointerTo() API
to return an opaque pointer for an opaque pointer base type.

Differential Revision: https://reviews.llvm.org/D104718
2021-06-22 21:16:15 +02:00
Joseph Huber
141815765c [Attributor] Add an option to increase the max number of iterations
Right now the Attributor defaults to 32 fixed point iterations unless it is set
explicitly by a command line flag. This patch allows this to be configured when
the attributor instance is created. The maximum is then increased in OpenMPOpt
if the target is a kernel. This is because the globalization analysis can result
in larger iteration counts due to many dependent instances running at once.

Depends on D102444

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D104416
2021-06-22 14:38:25 -04:00
Joseph Huber
0fbe411307 [Attributor] Add interface to emit remarks in Attributor
Summary:
This patch adds support for the Attributor to emit remarks on behalf of some
other pass. The attributor can now optionally take a callback function that
returns an OptimizationRemarkEmitter object when given a Function pointer. If
this is available then a remark will be emitted for the corresponding pass
name.

Depends on D102197

Reviewed By: sstefan1, thegameg

Differential Revision: https://reviews.llvm.org/D102444
2021-06-22 14:12:46 -04:00
Joseph Huber
4df09b164e [OpenMP] Enable HeapToStack conversion in OpenMPOpt for new RTL globalization calls
Summary:
The changes to globalization introduced in D97680 introduce a large amount of overhead by default. The old globalization method would always ignore globalization code if executing in SPMD mode. This wasn't strictly correct as data sharing is still possible in SPMD mode. The new interface is correct but introduces globalization code even when unnecessary. This optimization will use the existing HeapToStack transformation in the attributor to allow for unneeded globalization to be replaced with thread-private stack memory. This is done using the newly introduced library instances for the RTL functions added in D102087.

Depends on D97818

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D102197
2021-06-22 13:23:05 -04:00
Joseph Huber
76b50aa3c4 [OpenMP] Add new OpenMP globalization functions to library info
Summary:
The changes to globalization introduced in D97680 created two new functions to
push / pop shareable memory on the GPU, __kmpc_alloc_shared and
__kmpc_free_shared. This patch adds these new runtime functions to the
library info so they can be used by the HeapToStack attributor interface. This
optimization replaces malloc / free pairs with stack memory if legal.

Reviewed By: tianshilei1992

Differential Revision: https://reviews.llvm.org/D102087
2021-06-22 13:23:05 -04:00
Fangrui Song
61ffec433f Improve the diagnostic of DiagnosticInfoResourceLimit (and warn-stack-size in particular)
Before: `warning: stack size limit exceeded (888) in main`
After: `warning: stack frame size (888) exceeds limit (100) in function 'main'` (the -Wframe-larger-than limit will be mentioned)

Reviewed By: nickdesaulniers

Differential Revision: https://reviews.llvm.org/D104667
2021-06-22 09:55:20 -07:00
Joseph Huber
cbac628d6a [OpenMP] Internalize functions in OpenMPOpt to improve IPO passes
Summary:
Currently the attributor needs to give up if a function has external linkage.
This means that the optimization introduced in D97818 will only apply to static
functions. This change uses the Attributor to internalize OpenMP device
routines by making a copy of each function with private linkage and replacing
the uses in the module with it. This allows for the optimization to be applied
to any regular function.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D102824
2021-06-22 12:38:10 -04:00
Nikita Popov
80dafbe344 [OpaquePtr] Handle addrspacecasts in InstCombine
This adds support for addrspace casts involving opaque pointers to
InstCombine, as well as the isEliminableCastPair() helper
(otherwise the assertion failure would just move there).

Add PointerType::hasSameElementTypeAs() to hide the element type
details.

Differential Revision: https://reviews.llvm.org/D104668
2021-06-22 17:45:30 +02:00
Joseph Huber
3aea5cddbb [OpenMP] Simplify GPU memory globalization
Summary:
Memory globalization is required to maintain OpenMP standard semantics for data sharing between
worker and master threads. The GPU cannot share data between its threads so must allocate global or
shared memory to store the data in. Currently this is implemented fully in the frontend using the
`__kmpc_data_sharing_push_stack` and `__kmpc_data_sharing_pop_stack` functions to emulate standard
CPU stack sharing. The front-end scans the target region for variables that escape the region and
must be shared between the threads. Each variable then has a field created for it in a global record
type.

This patch replaces this functionality with a single allocation command, effectively mimicking an
alloca instruction for the variables that must be shared between the threads. This will be much
slower than the current solution, but makes it much easier to optimize as we can analyze each
variable independently and determine if it is not captured. In the future, we can replace these
calls with an `alloca` and small allocations can be pushed to shared memory.

Reviewed By: tianshilei1992

Differential Revision: https://reviews.llvm.org/D97680
2021-06-22 10:52:46 -04:00
Martin Storsjö
a307928fe5 [ADT] Add StringRef consume_front_lower and consume_back_lower
These serve as a convenient combination of consume_front/back and
startswith_lower/endswith_lower, consistent with other existing
case insensitive methods named <operation>_lower.

Differential Revision: https://reviews.llvm.org/D104218
2021-06-22 12:38:08 +03:00
Sander de Smalen
fd053d5ffe [GlobalISel] Add scalable property to LLT types.
This patch aims to add the scalable property to LLT. The rest of the
patch-series changes the interfaces to take/return ElementCount and
TypeSize, which both have the ability to represent the scalable property.

The changes are mostly mechanical and aim to be non-functional changes
for fixed-width vectors.

For scalable vectors some unit tests have been added, but no effort has
been put into making any of the GlobalISel algorithms work with scalable
vectors yet. That will be left as future work.

The work is split into a series of 5 patches to make reviews easier.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D104450
2021-06-22 08:43:34 +01:00
Heejin Ahn
68b04daf84 [WebAssembly] Make tag attribute's encoding uint8
This changes the encoding of the `attribute` field, which currently only
contains the value `0` denoting this tag is for an exception, from
`varuint32` to `uint8`. This field is effectively unused at the moment
and reserved for future use, and it is not likely to need `varuint32`
even in future.
See https://github.com/WebAssembly/exception-handling/pull/162.

This does not change any encoded binaries because `0` is encoded in the
same way both in `varuint32` and `uint8`.
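
The claim can be checked with a small unsigned LEB128 encoder (the encoding
behind `varuint32`). This is an illustrative sketch; `EncodedU32` and
`encodeULEB128` are hypothetical names rather than LLVM's own helpers:

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Illustrative ULEB128 encoder; the type and function names are
// hypothetical, not taken from LLVM.
struct EncodedU32 {
  std::array<std::uint8_t, 5> Bytes{}; // a 32-bit value needs at most 5 bytes
  std::size_t Size = 0;
};

constexpr EncodedU32 encodeULEB128(std::uint32_t Value) {
  EncodedU32 Out{};
  do {
    std::uint8_t Byte = static_cast<std::uint8_t>(Value & 0x7f); // low 7 bits
    Value >>= 7;
    if (Value != 0)
      Byte |= 0x80; // continuation bit: more bytes follow
    Out.Bytes[Out.Size++] = Byte;
  } while (Value != 0);
  return Out;
}

// 0 encodes as the single byte 0x00, identical to a uint8 holding 0.
static_assert(encodeULEB128(0).Size == 1, "one byte");
static_assert(encodeULEB128(0).Bytes[0] == 0x00, "byte value 0");
// Larger values need continuation bytes, e.g. 128 -> 0x80 0x01.
static_assert(encodeULEB128(128).Size == 2, "two bytes");
```

Any value below 128 fits in one ULEB128 byte with the continuation bit clear,
so for such values the two encodings produce the same byte.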

Reviewed By: tlively

Differential Revision: https://reviews.llvm.org/D104571
2021-06-21 21:22:39 -07:00
Eli Friedman
0c81356419 Rename MachineMemOperand::getOrdering -> getSuccessOrdering.
Since this method can apply to cmpxchg operations, make sure it's clear
what value we're actually retrieving.  This will help ensure we don't
accidentally ignore the failure ordering of cmpxchg in the future.
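
For readers less familiar with why cmpxchg has two orderings, the same split
is visible at the C++ source level. This is only an analogy using
`std::atomic`, not the MachineMemOperand API; `tryClaim` and
`tryClaimExample` are hypothetical names:

```cpp
#include <atomic>
#include <cassert>

// Hypothetical helper: try to move Slot from Expected to Desired.
// compare_exchange_strong takes two orderings: the first applies when the
// exchange succeeds, the second when the comparison fails.
inline bool tryClaim(std::atomic<int> &Slot, int Expected, int Desired) {
  return Slot.compare_exchange_strong(Expected, Desired,
                                      std::memory_order_acq_rel,  // success
                                      std::memory_order_acquire); // failure
}

inline void tryClaimExample() {
  std::atomic<int> Slot{0};
  assert(tryClaim(Slot, 0, 1));  // 0 -> 1 succeeds
  assert(!tryClaim(Slot, 0, 2)); // Slot is now 1, so expecting 0 fails
  assert(Slot.load() == 1);      // the failed attempt left the value alone
}
```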

We could potentially introduce a getOrdering() method on AtomicSDNode
that asserts the operation isn't cmpxchg, but not sure that's
worthwhile.

Differential Revision: https://reviews.llvm.org/D103338
2021-06-21 16:49:27 -07:00
Nick Desaulniers
2aca733d9e [IR] convert warn-stack-size from module flag to fn attr
Otherwise, this causes issues when building with LTO for object files
that use different values.

Link: https://github.com/ClangBuiltLinux/linux/issues/1395

Reviewed By: dblaikie, MaskRay

Differential Revision: https://reviews.llvm.org/D104342
2021-06-21 15:09:25 -07:00
Rong Xu
2d9e36a4c2 [SampleFDO] Make FSDiscriminator flag part of function parameters
Add a parameter of IsFSDiscriminator to function
getBaseDiscriminatorFromDiscriminator().

This function currently checks the internal flag of
--enable-fs-discriminator. This is not good because we might
change the default value of the internal flag.

Note that we have a default parameter. This is just
because create_afdo_tool has a call-site to it.
I will remove the default parameter in a later patch.

Differential Revision: https://reviews.llvm.org/D104584
2021-06-21 14:37:45 -07:00
Nikita Popov
1872603909 [LoopUnroll] Don't modify TripCount/TripMultiple in computeUnrollCount() (NFCI)
As these are no longer passed to UnrollLoop(), there is no need to
modify them in computeUnrollCount(). Make them non-reference parameters.

Differential Revision: https://reviews.llvm.org/D104590
2021-06-21 21:34:17 +02:00
Nikita Popov
7f56d08fc8 [OpaquePtr] Return opaque pointer from opaque pointer GEP
For a GEP on an opaque pointer, also return an opaque pointer (or
vector of opaque pointer) result.

This requires explicitly enumerating the GEP source element type,
because it is now no longer implicitly enumerated as part of either
the source or result pointer types.

Differential Revision: https://reviews.llvm.org/D104652
2021-06-21 18:36:32 +02:00
Sebastian Neubauer
a7a80ebf9c [NFC] Fix typo
2021-06-21 14:59:30 +02:00
Fangrui Song
9e8233e08c [llvm-cov gcov] Support GCC 12 format
GCC 12 will change the length field to represent the number of bytes instead of
32-bit words. This avoids padding for strings.
2021-06-19 22:51:20 -07:00
Fangrui Song
f02bea7812 [llvm-cov gcov] Change case to match the prevailing style && replace getString with readString
2021-06-19 22:50:52 -07:00
Michael Liao
d21f701c76 [MIRPrinter] Add machine metadata support.
- Distinct metadata needs generating in the codegen to attach correct
  AAInfo on the loads/stores after lowering, merging, and other relevant
  transformations.
- This patch adds 'MachineModuleSlotTracker' to help assign slot
  numbers to these newly generated unnamed metadata nodes.
- To help 'MachineModuleSlotTracker' track machine metadata, the
  original 'SlotTracker' is rebased from 'AbstractSlotTrackerStorage',
  which provides basic interfaces to create/retrieve metadata slots. In
  addition, once LLVM IR is processed, additional hooks are also
  introduced to help collect machine metadata and assign them slot
  numbers.
- Finally, if there is any such machine metadata, 'MIRPrinter' outputs
  an additional 'machineMetadataNodes' field containing all the
  definition of those nodes.

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D103205
2021-06-19 12:48:08 -04:00
Tomas Matheson
e797f6f6ee Allow building for release with EXPENSIVE_CHECKS
D97225 moved LazyCallGraph verify() calls behind EXPENSIVE_CHECKS,
but verify() is defined for debug builds only so this had the unintended
effect of breaking release builds with EXPENSIVE_CHECKS.

Fix by enabling verify() for both debug and EXPENSIVE_CHECKS.

Differential Revision: https://reviews.llvm.org/D104514
2021-06-19 17:02:11 +01:00
Nikita Popov
272334cfc0 [LoopUnroll] Push runtime unrolling decision up into tryToUnrollLoop()
Currently, UnrollLoop() is passed an AllowRuntime flag and decides
itself whether runtime unrolling should be used or not. This patch
pushes the decision into the caller and allows us to eliminate the
ULO.TripCount and ULO.TripMultiple parameters.

Differential Revision: https://reviews.llvm.org/D104487
2021-06-19 09:25:57 +02:00
Lang Hames
b6ca0c60bb [ORC][C-bindings] Add access to LLJIT IRTransformLayer, ThreadSafeModule utils.
This patch was derived from Valentin Churavy's work in
https://reviews.llvm.org/D104480. It adds support for setting the transform on
an IRTransformLayer, and for accessing the IRTransformLayer in LLJIT. It also
adds access to the ThreadSafeModule::withModuleDo method for thread-safe
access to modules.

A new example has been added to show how to use these APIs to optimize a module
during materialization.

Thanks Valentin!

Reviewed By: lhames

Differential Revision: https://reviews.llvm.org/D103855
2021-06-19 11:50:27 +10:00
Hongtao Yu
7fbb587058 [CSSPGO] Undoing the concept of dangling pseudo probe
As a follow-up to https://reviews.llvm.org/D104129, I'm cleaning up the dangling probe related code in both the compiler and llvm-profgen.

I'm seeing a 5% size win for the pseudo_probe section for SPEC2017 and 10% for Cinder. Certain benchmarks such as 602.gcc have a 20% size win. No obvious difference seen on build time for SPEC2017 and Cinder.

Reviewed By: wenlei

Differential Revision: https://reviews.llvm.org/D104477
2021-06-18 15:14:11 -07:00
Krzysztof Parzyszek
f7732c65f0 Revert "Delay initialization of OptBisect"
This reverts commit ec91df8d8195b8b759a89734dba227da1eaa729f.

It was committed by accident.
2021-06-18 13:16:45 -05:00
Krzysztof Parzyszek
de5add3739 Delay initialization of OptBisect
When LLVM is used in other projects, it may happen that global
constructors will execute before the call to ParseCommandLineOptions.
Since OptBisect is initialized via a constructor, and has no ability
to be updated at a later time, passing "-opt-bisect-limit" to the
parse function may have no effect.

To avoid this problem use a cl::cb (callback) to set the bisection
limit when the option is actually processed.
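
The ordering problem and the callback fix can be sketched without LLVM's
headers. This is only an illustration: `BisectState`, `IntOption` and
`parseArgs` are hypothetical stand-ins, not the real OptBisect or cl::opt
interfaces. The state object exists before any arguments are parsed, and the
option's callback updates it at the moment the option is processed.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <limits>
#include <string>
#include <vector>

// Hypothetical stand-in for OptBisect: constructed early, updated later.
struct BisectState {
  int Limit = std::numeric_limits<int>::max(); // effectively "no limit"
  void setLimit(int L) { Limit = L; }
  bool shouldRunPass(int PassNumber) const { return PassNumber <= Limit; }
};

// Hypothetical stand-in for a command-line option that carries a callback.
struct IntOption {
  std::string Name;
  std::function<void(int)> Callback; // invoked when the option is parsed
};

// Minimal parser for arguments of the form "-name=value". The callback runs
// here, at parse time, rather than in a global constructor.
inline void parseArgs(const std::vector<std::string> &Args,
                      const std::vector<IntOption> &Options) {
  for (const std::string &Arg : Args) {
    for (const IntOption &Opt : Options) {
      const std::string Prefix = "-" + Opt.Name + "=";
      if (Arg.compare(0, Prefix.size(), Prefix) == 0)
        Opt.Callback(std::stoi(Arg.substr(Prefix.size())));
    }
  }
}
```

Because the limit is pushed into the state object when parsing happens, it no
longer matters how early the state object itself was constructed.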

Differential Revision: https://reviews.llvm.org/D104551
2021-06-18 13:15:19 -05:00
Lang Hames
d2647ecc04 [ORC][C-bindings] Re-order object transform function arguments.
ObjInOut is an in-out parameter not a return value argument, so by convention
it should come after the context value (Ctx).
2021-06-18 22:12:39 +10:00
Lang Hames
eb023bd323 [ORC] Use uint8_t rather than char for RPC wrapper-function calls.
This partially reverts 838490de7ed, which broke some Solaris bots. Apparently
Solaris defines int8_t as char rather than signed char, which made the
SerializationTypeName<char> specialization a redefinition.

This partial revert isolates use of uint8_t buffers to ORC-RPC handling of
wrapper functions only. The TargetProcessControl::runWrapper method will
continue to use char buffers.
2021-06-18 21:56:09 +10:00
Lang Hames
04c9dcdc5d [ORC] Add support for dumping objects to the C API.
Provides ObjectTransformLayer APIs, a getter to access the
ObjectTransformLayer member of LLJIT, and the DumpObjects utility
to make construction of a dump-to-disk transform easy.

An example showing how the new APIs can be used has been added in
llvm/examples/OrcV2Examples/OrcV2CBindingsDumpObjects.
2021-06-18 20:56:45 +10:00
Johannes Doerfert
4b0523df3a [Attributor] Allow to skip the initial update for a new AA
Users might want to run initialize for a set of AAs without an
intermediate update step. Running update eagerly is not a requirement
anyway so we make it optional.
2021-06-18 01:07:53 -05:00
Johannes Doerfert
c50c80812f [Attributor] Use a centralized value simplification interface
To allow outside AAs that simplify values we need to ensure all value
simplification goes through the Attributor, not AAValueSimplify (or any
of the other AAs we have already like AAPotentialValues). This patch
also introduces an interface for the outside AAs to register
simplification callbacks for an IRPosition. To make this work as
expected we have to pass IRPositions instead of Values in
AAValueSimplify, which makes sense by itself.
2021-06-18 01:07:53 -05:00
Johannes Doerfert
99cca18714 [Attributor] Make sure Heap2Stack works properly on a GPU target
If the target stack is not accessible between different running
"threads" we have to make sure not to create allocas for mallocs
that might be used by multiple "threads". The "use check" is
sufficient to prevent this but if we apply the "free check" we have
to make sure the pointer is not communicated to others before
the free is reached.

Differential Revision: https://reviews.llvm.org/D98608
2021-06-18 01:07:52 -05:00
Johannes Doerfert
1d462a0452 [OpenMP][NFC] Expose AAExecutionDomain and rename its getter
The initial use for AAExecutionDomain was to determine if a single
thread executes a block. While this is sometimes informative, most
of the time, and for other reasons, we actually want to know if it
is the "initial thread", that is, the thread that started execution on
the current device. The deduction needs to be adjusted in a follow
up as the methods we use right now are looking for the OpenMP thread
id, which is reset whenever a thread enters a parallel region. What
we basically want is to look for `llvm.nvvm.read.ptx.sreg.ntid.x` and
equivalent functions.
2021-06-18 01:07:52 -05:00
Johannes Doerfert
ffbe2b6b3c [Attributor][NFC] AAReachability is currently stateless, don't invalidate it
We invalidated AAReachabilityImpl directly which is not helpful and
confusing as we still used it regardless. We now avoid invalidating it
(not needed anyway) and add checks for the state. This has by itself no
actual effect but prepares for later extensions.
2021-06-18 01:07:51 -05:00
Heejin Ahn
f7b0205560 [WebAssembly] Rename event to tag
We recently decided to change 'event' to 'tag', and 'event section' to
'tag section', out of the rationale that the section contains a
generalized tag that references a type, which may be used for something
other than exceptions, and the name 'event' can be confusing in the web
context.

See
- https://github.com/WebAssembly/exception-handling/issues/159#issuecomment-857910130
- https://github.com/WebAssembly/exception-handling/pull/161

Reviewed By: tlively

Differential Revision: https://reviews.llvm.org/D104423
2021-06-17 20:34:19 -07:00
Kuter Dinel
319a05fb4c [FIX][Attributor] Fix broken build due to missing virtual destructors.
The lack of some virtual destructors was causing some build bots to fail.
This patch fixes that.

Problematic commit:
https://reviews.llvm.org/rGeaf1b6810ce0f40008b2b1d902750eafa3e198d3

Build bot:
https://lab.llvm.org/buildbot/#/builders/18/builds/1741
2021-06-18 07:32:51 +03:00
Kuter Dinel
5e7d306b6b [Attributor] Derive AACallEdges attribute
This attribute computes the optimistic live call edges using the attributor
liveness information. This attribute will be used for deriving an
inter-procedural function reachability attribute.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D104059
2021-06-18 03:29:22 +03:00
Jon Roelofs
921bd72ae7 [GISel] Eliminate redundant bitmasking
This was a GISel vs SDAG regression that showed up at -Os on arm64 in:
SingleSource/Benchmarks/Adobe-C++/simple_types_constant_folding.test

https://llvm.godbolt.org/z/aecjodsjG

Differential revision: https://reviews.llvm.org/D103334
2021-06-17 12:53:00 -07:00
Jorge Gorbe Moya
70dce4754b Revert "[NFC] Remove checking pointee type for byval/preallocated type"
This reverts commit 738abfdbea21acd2597d83ad3390daf5696b6d07.
2021-06-17 12:29:23 -07:00
Saleem Abdulrasool
f56e4f6d3d RISCV: adjust handling of relocation emission for RISCV
This re-architects the RISCV relocation handling to bring the
implementation closer in line with the implementation in binutils.  We
would previously aggressively resolve the relocation.  With this
restructuring, we will always emit a paired relocation for any symbolic
difference of the form S±T[±C] where S and T are labels and C is a
constant.

GAS has a special target hook controlled by `RELOC_EXPANSION_POSSIBLE`
which indicates that a fixup may be expanded into multiple relocations.
This is used by the RISCV backend to always emit a paired relocation -
either ADD[WIDTH] + SUB[WIDTH] for text relocations or SET[WIDTH] +
SUB[WIDTH] for a debug info relocation.  Irrespective of whether linker
relaxation support is enabled, symbolic difference is always emitted as
a paired relocation.

This change also sinks the target specific behaviour down into the
target specific area rather than exposing it to the shared relocation
handling.  In the process, we also sink the "special" handling for debug
information down into the RISCV target.  Although this improves the path
for the other targets, this is not necessarily entirely ideal either.
The changes in the debug info emission could be done through another
type of hook as this functionality would be required by any other target
which wishes to do linker relaxation.  However, as there are no other
targets in LLVM which currently do this, this is a reasonable thing to
do until such time as the code needs to be shared.

Improve the handling of the relocation (and add a reduced test case from
the Linux kernel) to ensure that we handle complex expressions for
symbolic difference.  This ensures that we correctly relocate symbols with
the addends normalized and associated with the addition portion of the
paired relocation.
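
A minimal sketch of how such a pair evaluates, assuming the usual ADD/SUB/SET
semantics (ADD adds S + A to the field, SUB subtracts S + A, SET stores
S + A). The `Reloc` struct and both helper functions are hypothetical and are
not the MC layer's API; only the 32-bit width is modelled.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

enum class RelocKind { Add32, Sub32, Set32 };

// Hypothetical relocation record with an already-resolved symbol address.
struct Reloc {
  RelocKind Kind;
  uint32_t SymbolAddr; // address of the referenced label
  int32_t Addend;      // constant addend
};

// Emit the pair for the expression S - T + C. The constant is attached to
// the additive half of the pair, the subtractive half carries no addend.
inline std::vector<Reloc> emitPairedDifference(uint32_t S, uint32_t T,
                                               int32_t C, bool IsDebugInfo) {
  RelocKind First = IsDebugInfo ? RelocKind::Set32 : RelocKind::Add32;
  return {{First, S, C}, {RelocKind::Sub32, T, 0}};
}

// Apply the relocations in order to a 32-bit field, as a linker would.
inline uint32_t applyRelocs(uint32_t Field, const std::vector<Reloc> &Relocs) {
  for (const Reloc &R : Relocs) {
    uint32_t Value = R.SymbolAddr + static_cast<uint32_t>(R.Addend);
    switch (R.Kind) {
    case RelocKind::Add32: Field += Value; break;
    case RelocKind::Sub32: Field -= Value; break;
    case RelocKind::Set32: Field = Value; break;
    }
  }
  return Field;
}
```

With the ADD form the field must start at zero for the result to equal
S - T + C, whereas the SET form overwrites whatever was in the field.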

This change also addresses some review comments from Alex Bradbury about
the relocations meant for use in the DWARF CFA being named incorrectly
(using ADD6 instead of SET6) in the original change which introduced the
relocation type.

This resolves the issues with the symbolic difference emission
sufficiently to enable building the Linux kernel with clang+IAS+lld
(without linker relaxation).

Resolves PR50153, PR50156!
Fixes: ClangBuiltLinux/linux#1023, ClangBuiltLinux/linux#1143

Reviewed By: nickdesaulniers, maskray

Differential Revision: https://reviews.llvm.org/D103539
2021-06-17 08:20:02 -07:00
Stephen Tozer
84970078e4 Reapply "[DebugInfo] Prevent non-determinism when updating DIArgList users of a value"
Reapply the commit which previously caused build failures due to the
mismatched template arguments between the return type and the returned
SmallVector.

This reverts commit e8991caea8690ec2d17b0b7e1c29bf0da6609076.
2021-06-17 16:16:55 +01:00
Guillaume Chatelet
913b337ddc [llvm] fix typo in comment 2021-06-17 14:30:52 +00:00
David Green
751ee64aee [InterleaveAccess] Copy fast math flags when adjusting binary operators in interleave access pass
The Interleave Access pass will convert shuffle(binop(load, load)) to
binop(shuffle(load), shuffle(load)), in order to create more
interleaving load patterns (VLD2/3/4) that might have been messed up by
instcombine. As shown in D104247 we were missing copying IR flags to the
new instruction though, which should just be kept the same as the
original instruction.

Differential Revision: https://reviews.llvm.org/D104255
2021-06-17 09:53:33 +01:00
Bjorn Pettersson
29ffba4b56 Update @llvm.powi to handle different int sizes for the exponent
This can be seen as a follow up to commit 0ee439b705e82a4fe20e2,
that changed the second argument of __powidf2, __powisf2 and
__powitf2 in compiler-rt from si_int to int. That was to align with
how those runtimes are defined in libgcc.
One thing that seems to have been missing in that patch was to make
sure that the rest of LLVM also handles that the argument now depends
on the size of int (not using the si_int machine mode for 32-bit).
When using __builtin_powi for a target with 16-bit int clang crashed.
And when emitting libcalls to those rtlib functions (typically when
lowering @llvm.powi), the backend would always prepare the exponent
argument as an i32 which caused miscompiles when the rtlib was
compiled with 16-bit int.

The solution used here is to use an overloaded type for the second
argument in @llvm.powi. This way clang can use the "correct" type
when lowering __builtin_powi, and then later when emitting the libcall
it is assumed that the type used in @llvm.powi matches the rtlib
function.

One thing that needed some extra attention was that when vectorizing
calls several passes did not support that several arguments could
be overloaded in the intrinsics. This patch allows overload of a
scalar operand by adding hasVectorInstrinsicOverloadedScalarOpd, with
an entry for powi.

Differential Revision: https://reviews.llvm.org/D99439
2021-06-17 09:38:28 +02:00
Lang Hames
9d62b78ea1 [ORC] Switch from uint8_t to char buffers for TargetProcessControl::runWrapper.
This matches WrapperFunctionResult's char buffer, cutting down on the number of
pointer casts needed.
2021-06-17 13:27:09 +10:00
Adrian Prantl
be89f7fc66 Move the definition of LLVM_SUPPORT_XCODE_SIGNPOSTS into llvm-config.h
since it is now used by a public header file (Signposts.h).
This fixes the standalone LLDB build.
2021-06-16 14:40:37 -07:00
Min-Yih Hsu
341aadfd56 [MCA] Anchoring the vtable of CustomBehaviour
Put the dtor of mca::CustomBehaviour into the cpp file to avoid
undefined vtable when linking libLLVMMCACustomBehaviourAMDGPU as shared
library.

Differential Revision: https://reviews.llvm.org/D104401
2021-06-16 12:43:58 -07:00
Hongtao Yu
45a66978f9 [CSSPGO] Report zero-count probe in profile instead of dangling probes.
Previously dangling samples were represented by INT64_MAX in sample profile while probes never executed were not reported. This was based on an observation that dangling probes were only a smaller portion than zero-count probes. However, with compiler optimizations, dangling probes end up becoming a large portion of all probes in general and reporting them does not make sense from a profile size point of view. This change flips sample reporting by reporting zero-count probes instead. This enables dangling probes to be represented by none (missing entry in profile). This has a couple of benefits:

1. Reducing sample profile size in optimized mode, even when the number of non-executed probes exceeds the number of dangling probes, since INT64_MAX takes more space than 0 to encode.

2. Binary size savings. No need to encode dangling probe anymore, since missing probes are treated as dangling in the profile reader.

3. Reducing compiler work to track dangling probes. However, for probes that are really dead and removed, we still need the compiler to identify them so that they can be reported as zero-count, instead of being mistreated as dangling probes.

4. Improving counts quality by respecting the counts already collected on the non-dangling copy of a probe. A probe, when duplicated, gets two copies at runtime. If one of them is dangling while the other is not, merging the two probes at profile generation time will cause the real samples collected on the non-dangling one to be discarded. Not reporting the dangling counterpart will keep the real samples.

5. Better readability.

6. Be consistent with non-CS dwarf line number based profile. Zero counts are trusted by the compiler counts inferencer while missing counts will be inferred by the compiler.

Note that the current patch does not include any work for #3. There will be follow-up changes.

For #1, I've seen for a large Facebook service, the text profile is reduced by 7%. For extbinary profile, the size of  LBRProfileSection is reduced by 35%.

For #4, I have seen general counts quality for SPEC2017 is improved by 10%.

Reviewed By: wenlei, wlei, wmi

Differential Revision: https://reviews.llvm.org/D104129
2021-06-16 11:45:29 -07:00
Sushma Unnibhavi
900cbf02d0 [M68k][GlobalISel] Adding initial GlobalISel infrastructure
Wiring up GlobalISel for the M68k backend

Differential Revision: https://reviews.llvm.org/D101819
2021-06-16 10:48:38 -06:00
Patrick Holland
449e2cbd5e Reapply "[MCA] Adding the CustomBehaviour class to llvm-mca".
The original change was pushed in main as commit f7a23ecece52.
It was then reverted by commit a04f01bab2 because it caused linker failures
on buildbots that don't build the AMDGPU target.

--

Some instructions are not defined well enough within the target’s scheduling
model for llvm-mca to be able to properly simulate its behaviour. The ideal
solution to this situation is to modify the scheduling model, but that’s not
always a viable strategy. Maybe other parts of the backend depend on that
instruction being modelled the way that it is. Or maybe the instruction is quite
complex and it’s difficult to fully capture its behaviour with tablegen. The
CustomBehaviour class (which I will refer to as CB frequently) is designed to
provide intuitive scaffolding for developers to implement the correct modelling
for these instructions.

More details are available in the original commit log message (f7a23ecece52).

Differential Revision: https://reviews.llvm.org/D104149
2021-06-16 16:54:48 +01:00
David Spickett
0a8120a8f5 [llvm][AArch64] Handle arrays of struct properly (from IR)
This only applies to FastISel. GlobalISel seems to sidestep
the issue.

This fixes https://bugs.llvm.org/show_bug.cgi?id=46996

One of the things we do in llvm is decide if a type needs
consecutive registers. Previously, we just checked if it
was an array or not.
(plus an SVE specific check that is not changing here)

This causes some confusion when you have arbitrary IR like:
```
%T1 = type { double, i1 };
define [ 1 x %T1 ] @foo() {
entry:
  ret [ 1 x %T1 ] zeroinitializer
}
```

We see it is an array so we call CC_AArch64_Custom_Block
which bails out when it sees the i1, a type we don't want
to put into a block.

This leaves the location of the double in some kind of
intermediate state and leads to odd codegen. Which then crashes
the backend because it doesn't know how to implement
what it's been asked for.

You get this:
```
  renamable $d0 = FMOVD0
  $w0 = COPY killed renamable $d0
```

Rather than this:
```
  $d0 = FMOVD0
  $w0 = COPY $wzr
```

The backend knows how to copy 64 bit to 64 bit registers,
but not 64 to 32. It can certainly be taught how but the real
issue seems to be us even trying to assign a register block
in the first place.

This change makes the logic of
AArch64TargetLowering::functionArgumentNeedsConsecutiveRegisters
a bit more in depth. If we find an array, also check that all the
nested aggregates in that array have a single member type.

Then CC_AArch64_Custom_Block's assumption of a type that looks
like [ N x type ] will be valid and we get the expected codegen.
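
The refined check can be sketched on a toy type model. `ToyType` and the
helpers below are hypothetical and do not mirror llvm::Type or the real
functionArgumentNeedsConsecutiveRegisters signature. The idea is that an array
only qualifies if every scalar reachable through its nested aggregates has the
same type.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Toy type: a named scalar, an array (one element type) or a struct.
struct ToyType {
  enum Kind { Scalar, Array, Struct };
  Kind K;
  std::string Name;                              // scalar name, e.g. "double"
  std::vector<std::shared_ptr<ToyType>> Members; // element type or fields
};
using TyPtr = std::shared_ptr<ToyType>;

inline TyPtr scalar(std::string Name) {
  return std::make_shared<ToyType>(
      ToyType{ToyType::Scalar, std::move(Name), {}});
}
inline TyPtr structOf(std::vector<TyPtr> Fields) {
  return std::make_shared<ToyType>(
      ToyType{ToyType::Struct, "", std::move(Fields)});
}
inline TyPtr arrayOf(TyPtr Elem) {
  return std::make_shared<ToyType>(
      ToyType{ToyType::Array, "", {std::move(Elem)}});
}

// True if every scalar leaf under T has the same name. Leaf remembers the
// first scalar name seen so far (nullptr until one is found).
inline bool sameLeafType(const ToyType &T, const std::string *&Leaf) {
  if (T.K == ToyType::Scalar) {
    if (!Leaf) {
      Leaf = &T.Name;
      return true;
    }
    return *Leaf == T.Name;
  }
  for (const TyPtr &M : T.Members)
    if (!sameLeafType(*M, Leaf))
      return false;
  return true;
}

// Arrays qualify only when they flatten to a single scalar member type.
inline bool needsConsecutiveRegisters(const ToyType &T) {
  if (T.K != ToyType::Array)
    return false;
  const std::string *Leaf = nullptr;
  return sameLeafType(T, Leaf) && Leaf != nullptr;
}
```

Under this model the `[ 1 x %T1 ]` example above is rejected because `%T1`
mixes `double` and `i1`, while an array of `{ double, double }` is accepted.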

New tests have been added to exercise these situations. Note that
some of the output is not ABI compliant. The aim of this change is
to simply handle these situations and not to make our processing
of arbitrary IR ABI compliant.

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D104123
2021-06-16 13:56:01 +00:00
Andrea Di Biagio
a8b232ce81 [MCA][InstrBuilder] Always check for implicit uses of resource units (PR50725).
When instructions are issued to the underlying pipeline resources, the
mca::ResourceManager should also check for the presence of extra uses induced by
the explicit consumption of multiple partially overlapping group resources.

Fixes PR50725
2021-06-16 14:51:12 +01:00
James Henderson
a516a14043 [yaml2obj][obj2yaml] Support custom ELF section header string table name
This patch adds support for a new field in the FileHeader, which states
the name to use for the section header string table. This also allows
combining the string table with another string table in the object, e.g.
the symbol name string table. The field is optional. By default,
.shstrtab will continue to be used.

This partially fixes https://bugs.llvm.org/show_bug.cgi?id=50506.

Reviewed by: Higuoxing

Differential Revision: https://reviews.llvm.org/D104035
2021-06-16 10:02:23 +01:00
Lang Hames
5fecb17e96 [ORC] Switch to WrapperFunction utility for calls to registration functions.
Addresses FIXMEs in TPC-based EH-frame and debug object registration code by
replacing manual argument serialization with WrapperFunction utility calls.
2021-06-16 18:05:58 +10:00
Rong Xu
022ca8be28 [SampleFDO] Place the discriminator flag variable into the used list.
We create flag variable "__llvm_fs_discriminator__" in the binary
to indicate that FSAFDO hierarchical discriminators are used.

This variable might be GC'ed by the linker since it is not explicitly
referenced. I initially added the var to the used list in pass
MIRFSDiscriminator but it did not work. It turned out the used global
list is collected in lowering (before MIR pass) and then emitted at
the end of the pass pipeline.

Here I add the variable to the used list in the IR level's AddDiscriminators
pass. The machine level code is still kept in case the IR's
AddDiscriminators is not invoked. If this is the case, just use
-Wl,--export-dynamic-symbol=__llvm_fs_discriminator__
to force the emit.

Differential Revision: https://reviews.llvm.org/D103988
2021-06-15 21:51:04 -07:00
Andrea Di Biagio
f1dc7da2e3 Revert "[MCA] Adding the CustomBehaviour class to llvm-mca"
This reverts commit f7a23ecece524564a0c3e09787142cc6061027bb.

It appears to break buildbots that don't build the AMDGPU backend.
2021-06-15 21:41:36 +01:00
Patrick Holland
e52d4f2208 [MCA] Adding the CustomBehaviour class to llvm-mca
Some instructions are not defined well enough within the target’s scheduling
model for llvm-mca to be able to properly simulate its behaviour. The ideal
solution to this situation is to modify the scheduling model, but that’s not
always a viable strategy. Maybe other parts of the backend depend on that
instruction being modelled the way that it is. Or maybe the instruction is quite
complex and it’s difficult to fully capture its behaviour with tablegen. The
CustomBehaviour class (which I will refer to as CB frequently) is designed to
provide intuitive scaffolding for developers to implement the correct modelling
for these instructions.

Implementation details:

llvm-mca does its best to extract relevant register, resource, and memory
information from every MCInst when lowering them to an mca::Instruction. It then
uses this information to detect dependencies and simulate stalls within the
pipeline. For some instructions, the information that gets captured within the
mca::Instruction is not enough for mca to simulate them properly. In these
cases, there are two main possibilities:

1. The instruction has a dependency that isn’t detected by mca.
2. mca is incorrectly enforcing a dependency that shouldn’t exist.

For the rest of this discussion, I will be focusing on (1), but I have put some
thought into (2) and I may revisit it in the future.

So we have an instruction that has dependencies that aren’t picked up by mca.
The basic idea for both pipelines in mca is that when an instruction wants to be
dispatched, we first check for register hazards and then we check for resource
hazards. This is where CB is injected. If no register or resource hazards have
been detected, we make a call to CustomBehaviour::checkCustomHazard() to give
the target specific CB the chance to detect and enforce any custom dependencies.

The return value for checkCustomHazard() is an unsigned int representing the
(minimum) number of cycles that the instruction needs to stall for. It’s fine to
underestimate this value because when StallCycles gets down to 0, we’ll end up
checking for all the hazards again before the instruction is actually
dispatched. However, it’s important not to overestimate the value and the more
accurate your estimate is, the more efficient mca’s execution can be.

In general, for checkCustomHazard() to be able to detect these custom
dependencies, it needs information about the current instruction and also all of
the instructions that are still executing within the pipeline. The mca pipeline
uses mca::Instruction rather than MCInst and the current information encoded
within each mca::Instruction isn’t sufficient for my use cases. I had to add a
few extra attributes to the mca::Instruction class and have them get set by the
MCInst during instruction building. For example, the current mca::Instruction
doesn’t know its opcode, and it also doesn’t know anything about its immediate
operands (both of which I had to add to the class).

With information about the current instruction, a list of all currently
executing instructions, and some target specific objects (MCSubtargetInfo and
MCInstrInfo which the base CB class has references to), developers should be
able to detect and enforce most custom dependencies within checkCustomHazard. If
you need more information than is present in the mca::Instruction, feel free to
add attributes to that class and have them set during the lowering sequence from
MCInst.

Fortunately, in the in-order pipeline, it’s very convenient for us to pass these
arguments to checkCustomHazard. The hazard checking is taken care of within
InOrderIssueStage::canExecute(). This function takes a const InstRef as a
parameter (representing the instruction that currently wants to be dispatched)
and the InOrderIssueStage class maintains a SmallVector<InstRef, 4> which holds
all of the currently executing instructions. For the out-of-order pipeline, it’s
a bit trickier to get the list of executing instructions and this is why I have
held off on implementing it myself. This is the main topic I will bring up when
I eventually make a post to discuss and ask for feedback.

CB is a base class where targets implement their own derived classes. If a
target specific CB does not exist (or we pass in the -disable-cb flag), the base
class is used. This base class trivially returns 0 from its checkCustomHazard()
implementation (meaning that the current instruction needs to stall for 0 cycles
aka no hazard is detected). For this reason, targets or users who choose not to
use CB shouldn’t see any negative impacts to accuracy or performance (in
comparison to pre-patch llvm-mca).
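
The base-class and derived-class split described above might look roughly like
this. It is a simplified sketch: `ToyInst`, the opcode constants and the
load/wait rule are hypothetical, and the parameter types do not match the real
mca::CustomBehaviour interface.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Toy stand-in for mca::Instruction; the fields are illustrative only.
struct ToyInst {
  unsigned Opcode;     // the patch adds the opcode to mca::Instruction
  unsigned CyclesLeft; // cycles until this instruction finishes executing
};

// Base class: detects no custom hazards, so the stall is always 0 cycles.
class ToyCustomBehaviour {
public:
  virtual ~ToyCustomBehaviour() = default;
  virtual unsigned checkCustomHazard(const std::vector<ToyInst> &,
                                     const ToyInst &) const {
    return 0;
  }
};

// Hypothetical target rule: a "wait" instruction may not be dispatched while
// any "load" is still executing.
class ToyTargetBehaviour : public ToyCustomBehaviour {
public:
  static constexpr unsigned LoadOp = 1;
  static constexpr unsigned WaitOp = 2;

  unsigned checkCustomHazard(const std::vector<ToyInst> &Executing,
                             const ToyInst &Current) const override {
    if (Current.Opcode != WaitOp)
      return 0;
    // Stall until the slowest in-flight load has finished.
    unsigned Stall = 0;
    for (const ToyInst &I : Executing)
      if (I.Opcode == LoadOp)
        Stall = std::max(Stall, I.CyclesLeft);
    return Stall;
  }
};
```

A caller would consult this only after the register and resource hazard
checks have passed, and would check every hazard again once the returned
stall count reaches zero.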

Differential Revision: https://reviews.llvm.org/D104149
2021-06-15 21:30:48 +01:00
Vitaly Buka
5f8c02be72 [NFC] Fix "unused variable" warning 2021-06-15 12:59:05 -07:00
Jinsong Ji
344955e439 [NFC] Update renamed option in comments
c98ebda325c996b3a12f4fded0368734dc0fe28a Rename fp-op fusion option (yet
again) for compatibility with GCC option.

The comment in the header should be updated too to avoid confusion.
2021-06-15 19:44:31 +00:00
Duncan P. N. Exon Smith
99e33d3043 Support: Remove F_{None,Text,Append} compatibility synonyms, NFC
Remove the compatibility spellings of `OF_{None,Text,Append}` that
were left behind by 1f67a3cba9b09636c56e2109d8a35ae96dc15782.

No functionality change here, just an API cleanup.

Differential Revision: https://reviews.llvm.org/D101506
2021-06-15 12:04:09 -07:00
Roman Lebedev
10ca53ce65 [NewPM] Remove SpeculateAroundPHIs pass
Addition of this pass has been botched.
There is no particular reason why it had to be sold as an inseparable part
of the new-pm transition. It was added when old-pm was still the default,
and very *very* few users were actually tracking new-pm,
so its effects weren't measured.

Which means, some of the turmoil of the new-pm transition
is actually likely due to regressions from this pass.

Likewise, there have been a number of post-commit feedback comments
(post new-pm switch), namely
* https://reviews.llvm.org/D37467#2787157 (regresses HW-loops)
* https://reviews.llvm.org/D37467#2787259 (should not be in middle-end, should run after LSR, not before)
* https://reviews.llvm.org/D95789 (an attempt to fix bad loop backedge metadata)
and in the past half year, the pass authors (google) still haven't found time to respond to any of that.

Hereby it is proposed to backout the pass from the pipeline,
until someone who cares about it can address the issues reported,
and properly start the process of adding a new pass into the pipeline,
with proper performance evaluation.

Furthermore, neither google nor facebook reports any perf changes
from this change, so I'm dropping the pass completely.
It can always be re-reverted should anyone want to pick it up again.

Reviewed By: aeubanks

Differential Revision: https://reviews.llvm.org/D104099
2021-06-15 20:35:55 +03:00
Lang Hames
792e37b227 [ORC] Fix missing std::move. 2021-06-15 21:42:58 +10:00
Lang Hames
0c8cd2ec6f [ORC] Fix narrowing-in-initializer-list warnings. 2021-06-15 21:39:16 +10:00
Lang Hames
8b99b0a62e [ORC] Fix missing function in unit test. 2021-06-15 21:39:00 +10:00
Lang Hames
8d05185ee9 [ORC] Make WrapperFunctionResult's ValuePtr member non-const.
The const qualifier was a hangover from an earlier iteration that allowed
wrapper functions to return pointers to const memory. This feature has
been removed, so there's no reason for this to be const any more, and
removing it eliminates const-cast warnings.
2021-06-15 21:24:12 +10:00
Lang Hames
e11b1aca83 [ORC] Port WrapperFunctionUtils and SimplePackedSerialization from ORC runtime.
Replace the existing WrapperFunctionResult type in
llvm/include/ExecutionEngine/Orc/Shared/TargetProcessControlTypes.h with a
version adapted from the ORC runtime's implementation.

Also introduce the SimplePackedSerialization scheme (also adapted from the ORC
runtime's implementation) for wrapper functions to avoid manual serialization
and deserialization for calls to runtime functions involving common types.
2021-06-15 21:13:57 +10:00
Neil Henning
8521aa2a65 ABI breaking changes fixes.
This commit mostly just replaces bad uses of `NDEBUG` with uses of
`LLVM_ENABLE_ABI_BREAKING_CHANGES` - the safe way to include ABI
breaking changes (normally extra struct elements in headers).

Differential Revision: https://reviews.llvm.org/D104216
2021-06-15 11:08:13 +01:00
Jay Foad
c3a38401e0 [IR] Remove forward declaration of GraphTraits from Type.h
This has been unnecessary since r352353 removed GraphTraits
specializations for Type, except that a couple of other headers were
accidentally relying on this declaration.

Differential Revision: https://reviews.llvm.org/D104119
2021-06-15 09:23:45 +01:00
Adrian Prantl
dfb1691713 Allow signposts to take advantage of deferred string substitution
One nice feature of the os_signpost API is that format string
substitutions happen in the consumer, not the logging
application. LLVM's current Signpost class doesn't take advantage of
this though and instead always uses a static "Begin/End %s" format
string.

This patch uses variadic macros to allow the API to be used as
intended. Unfortunately, the primary use-case I had in mind (the
LLDB_SCOPED_TIMER() macro) does not get much better from this, because
__PRETTY_FUNCTION__ is *not* a macro, but a static string, so
signposts created by LLDB_SCOPED_TIMER() still use a static "%s"
format string. At least LLDB_SCOPED_TIMERF() works as intended.

This reapplies the previously reverted patch with additional include
order fixes for non-modular builds of LLDB.

Differential Revision: https://reviews.llvm.org/D103575
2021-06-14 16:53:41 -07:00
Adrian Prantl
299552b38b Revert "Allow signposts to take advantage of deferred string substitution"
This reverts commit 03841edde7eee21d1d450041ab9a113a7e1be869.

Unfortunately this still breaks the LLDB standalone bot.
2021-06-14 16:09:04 -07:00
Adrian Prantl
770268a3e9 Allow signposts to take advantage of deferred string substitution
One nice feature of the os_signpost API is that format string
substitutions happen in the consumer, not the logging
application. LLVM's current Signpost class doesn't take advantage of
this though and instead always uses a static "Begin/End %s" format
string.

This patch uses variadic macros to allow the API to be used as
intended. Unfortunately, the primary use-case I had in mind (the
LLDB_SCOPED_TIMER() macro) does not get much better from this, because
__PRETTY_FUNCTION__ is *not* a macro, but a static string, so
signposts created by LLDB_SCOPED_TIMER() still use a static "%s"
format string. At least LLDB_SCOPED_TIMERF() works as intended.

This reapplies the previously reverted patch with additional MachO.h
macro #undefs.

Differential Revision: https://reviews.llvm.org/D103575
2021-06-14 14:19:41 -07:00
wlei
c4ed78c10b [CSSPGO] Aggregation by the last K context frames for cold profiles
This change provides the option to merge and aggregate cold context by the last k frames instead of context-less name. By default K = 1 means the context-less one.

This is for better perf tuning. The more selective merging and trimming will rely on llvm-profgen's preinliner.
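
As a rough illustration of merging by the last K frames, assuming a context is
an ordered list of frame names with the leaf function last. The `Context`
alias and `mergeColdByLastKFrames` are hypothetical names and do not reflect
llvm-profgen's actual data structures.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// A calling context: outermost caller first, leaf function last.
using Context = std::vector<std::string>;

// Merge cold context profiles that share the same last K frames, summing
// their sample counts. With K = 1 only the leaf function remains, which
// corresponds to the context-less profile.
inline std::map<Context, uint64_t>
mergeColdByLastKFrames(const std::map<Context, uint64_t> &Cold,
                       std::size_t K) {
  std::map<Context, uint64_t> Merged;
  for (const auto &Entry : Cold) {
    const Context &Full = Entry.first;
    std::size_t Skip = Full.size() > K ? Full.size() - K : 0;
    Context Key(Full.begin() + static_cast<std::ptrdiff_t>(Skip), Full.end());
    Merged[Key] += Entry.second;
  }
  return Merged;
}
```

Larger values of K keep more of the calling context distinct, at the cost of
less aggregation.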

Reviewed By: wenlei, hoy

Differential Revision: https://reviews.llvm.org/D104131
2021-06-14 10:33:43 -07:00
zhijian
ad7e1ecf68 [AIX][XCOFF] emit vector info of traceback table.
Summary:

emit vector info of traceback table.

Reviewers: Jason Liu,Hubert Tong
Differential Revision: https://reviews.llvm.org/D93659
2021-06-14 11:15:22 -04:00
Florian Hahn
bc6a656349 [ADT] Use unnamed argument for unused arg in StringMapEntryStorage.
This silences an 'unused argument' warning.

Similar to c2006f857d80f54b90ed7d911d3e7acf4f46001b.
2021-06-14 15:54:57 +01:00
Jeroen Dobbelaere
c08eaddde6 Intrinsic::getName: require a Module argument
Ensure that we provide a `Module` when checking if a rename of an intrinsic is necessary.

This fixes the issue that was detected by https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=32288
(as mentioned by @fhahn), after committing D91250.

Note that the `LLVMIntrinsicCopyOverloadedName` is being deprecated in favor of `LLVMIntrinsicCopyOverloadedName2`.

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D99173
2021-06-14 14:52:29 +02:00
Guillaume Chatelet
9aa4a5f77d [llvm] remove Sequence::asSmallVector()
There's no need for `asSmallVector()` as `SmallVector.h` already provides a `to_vector` free function that takes a range.

Reviewed By: Quuxplusone

Differential Revision: https://reviews.llvm.org/D104024
2021-06-14 08:28:05 +00:00
Simon Moll
91d4645488 [VP] Binary floating-point intrinsics.
This patch implements vector-predicated intrinsics on IR level for fadd,
fsub, fmul, fdiv and frem.  They operate in the default floating-point
environment. We will use constrained fp operand bundles for constrained
vector-predicated fp math (D93455).
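
As a rough scalar model of what a vector-predicated fadd computes, assuming the usual VP convention of a per-lane mask plus an explicit vector length (neither is spelled out in this message): the struct Lane and the function modelVPFAdd are hypothetical names, and lanes that are masked off or beyond the explicit length are simply marked as having no defined value.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// One result lane: Defined is false when the lane was disabled, in which
// case Value carries no meaning.
struct Lane {
  float Value;
  bool Defined;
};

// Scalar model of a predicated floating-point add. A lane is computed only
// if its mask bit is set and its index is below the explicit vector length.
std::vector<Lane> modelVPFAdd(const std::vector<float> &A,
                              const std::vector<float> &B,
                              const std::vector<bool> &Mask,
                              std::size_t EVL) {
  std::vector<Lane> Result(A.size(), Lane{0.0f, false});
  for (std::size_t I = 0; I < A.size(); ++I) {
    if (I < EVL && Mask[I]) {
      Result[I].Value = A[I] + B[I];
      Result[I].Defined = true;
    }
  }
  return Result;
}
```

The same model applies to fsub, fmul, fdiv and frem by swapping the scalar operation.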

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D93470
2021-06-14 08:51:41 +02:00
Xuanda Yang
a66f237758 [LLParser] Remove outdated deplibs
The comment mentions deplibs should be removed in 4.0. Removing it in this patch.

Reviewed By: compnerd, dexonsmith, lattner

Differential Revision: https://reviews.llvm.org/D102763
2021-06-14 12:46:12 +08:00
RamNalamothu
a2306da6e0 Implement DW_CFA_LLVM_* for Heterogeneous Debugging
Add support in MC/MIR for writing/parsing, and DebugInfo.

This is part of the Extensions for Heterogeneous Debugging defined at
https://llvm.org/docs/AMDGPUDwarfExtensionsForHeterogeneousDebugging.html

Specifically the CFI instructions implemented here are defined at
https://llvm.org/docs/AMDGPUDwarfExtensionsForHeterogeneousDebugging.html#cfa-definition-instructions

Reviewed By: clayborg

Differential Revision: https://reviews.llvm.org/D76877
2021-06-14 08:51:50 +05:30
Simon Pilgrim
0b64dd4442 RawError.h - remove unused <string> include. NFCI. 2021-06-13 17:32:57 +01:00
Simon Pilgrim
836026294d DIPrinter.h - tidy implicit header dependencies. NFCI.
We don't use <string> but we do use std::unique_ptr (<memory>) and llvm::Optional<>
2021-06-13 17:00:15 +01:00
Simon Pilgrim
fdadecc8f8 ProfiledCallGraph.h - remove unused <string> include. NFCI. 2021-06-13 15:19:25 +01:00
Florian Hahn
a3f4e168f5 Revert "Allow signposts to take advantage of deferred string substitution"
This reverts commit 4fc93a3a1f95ef5a0a57750fc621f2411ea445a8 because it
breaks LLDB builds on certain macOS platform & SDK combinations, e.g.
http://green.lab.llvm.org/green/job/lldb-cmake-standalone/3288/consoleFull#-195476041949ba4694-19c4-4d7e-bec5-911270d8a58c
2021-06-12 12:08:25 +01:00
Florian Hahn
f2662d35c8 Revert "[X86FixupLEAs] Transform the sequence LEA/SUB to SUB/SUB"
This reverts commit 1b748faf2bae246e2fc77d88420df13c2e60f4df because it
breaks building the llvm-test-suite with -verify-machineinstrs on X86:
http://green.lab.llvm.org/green/job/test-suite-verify-machineinstrs-x86_64-O3/9585/

Running llc -verify-machineinstrs on X86 crashes on the IR below:

    target datalayout = "e-m:o-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"

    %struct.widget = type { i32, i32, i32, i32, i32*, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, [16 x [16 x i16]], [6 x [32 x i32]], [16 x [16 x i32]], [4 x [12 x [4 x [4 x i32]]]], [16 x i32], i8**, i32*, i32***, i32**, i32, i32, i32, i32, %struct.baz*, %struct.wobble.1*, i32, i32, i32, i32, i32, i32, %struct.quux.2*, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, [3 x i32], i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32***, i32***, i32****, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, [3 x [2 x i32]], [3 x [2 x i32]], i32, i32, i64, i64, %struct.zot.3, %struct.zot.3, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32 }
    %struct.baz = type { i32, i32, i32, i32, i32, i32, i32, i32, i32, %struct.snork*, %struct.wombat.0*, %struct.wobble*, i32, i32*, i32*, i32*, i32, i32*, i32*, i32*, i32 (%struct.widget*, %struct.eggs*)*, i32, i32, i32, i32 }
    %struct.snork = type { %struct.spam*, %struct.zot, i32 (%struct.wombat*, %struct.widget*, %struct.snork*)* }
    %struct.spam = type { i32, i32, i32, i32, i8*, i32 }
    %struct.zot = type { i32, i32, i32, i32, i32, i8*, i32* }
    %struct.wombat = type { i32, i32, i32, i32, i32, i32, i32, i32, void (i32, i32, i32*, i32*)*, void (%struct.wombat*, %struct.widget*, %struct.zot*)* }
    %struct.wombat.0 = type { [4 x [11 x %struct.quux]], [2 x [9 x %struct.quux]], [2 x [10 x %struct.quux]], [2 x [6 x %struct.quux]], [4 x %struct.quux], [4 x %struct.quux], [3 x %struct.quux] }
    %struct.quux = type { i16, i8 }
    %struct.wobble = type { [2 x %struct.quux], [4 x %struct.quux], [3 x [4 x %struct.quux]], [10 x [4 x %struct.quux]], [10 x [15 x %struct.quux]], [10 x [15 x %struct.quux]], [10 x [5 x %struct.quux]], [10 x [5 x %struct.quux]], [10 x [15 x %struct.quux]], [10 x [15 x %struct.quux]] }
    %struct.eggs = type { [1000 x i8], [1000 x i8], [1000 x i8], i32, i32, i32, i32, i32, i32, i32, i32 }
    %struct.wobble.1 = type { i32, [2 x i32], i32, i32, %struct.wobble.1*, %struct.wobble.1*, i32, [2 x [4 x [4 x [2 x i32]]]], i32, i64, i64, i32, i32, [4 x i8], [4 x i8], i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32 }
    %struct.quux.2 = type { i32, i32, i32, i32, i32, %struct.quux.2* }
    %struct.zot.3 = type { i64, i16, i16, i16 }

    define void @blam(%struct.widget* %arg, i32 %arg1) local_unnamed_addr {
    bb:
      %tmp = load i32, i32* undef, align 4
      %tmp2 = sdiv i32 %tmp, 6
      %tmp3 = sdiv i32 undef, 6
      %tmp4 = load i32, i32* undef, align 4
      %tmp5 = icmp eq i32 %tmp4, 4
      %tmp6 = select i1 %tmp5, i32 %tmp3, i32 %tmp2
      %tmp7 = getelementptr inbounds [4 x [4 x i32]], [4 x [4 x i32]]* undef, i64 0, i64 0, i64 0
      %tmp8 = zext i16 undef to i32
      %tmp9 = zext i16 undef to i32
      %tmp10 = load i16, i16* undef, align 2
      %tmp11 = zext i16 %tmp10 to i32
      %tmp12 = zext i16 undef to i32
      %tmp13 = zext i16 undef to i32
      %tmp14 = zext i16 undef to i32
      %tmp15 = load i16, i16* undef, align 2
      %tmp16 = zext i16 %tmp15 to i32
      %tmp17 = zext i16 undef to i32
      %tmp18 = sub nsw i32 %tmp8, %tmp9
      %tmp19 = shl nsw i32 undef, 1
      %tmp20 = add nsw i32 %tmp19, %tmp18
      %tmp21 = sub nsw i32 %tmp11, %tmp12
      %tmp22 = shl nsw i32 undef, 1
      %tmp23 = add nsw i32 %tmp22, %tmp21
      %tmp24 = sub nsw i32 %tmp13, %tmp14
      %tmp25 = shl nsw i32 undef, 1
      %tmp26 = add nsw i32 %tmp25, %tmp24
      %tmp27 = sub nsw i32 %tmp16, %tmp17
      %tmp28 = shl nsw i32 undef, 1
      %tmp29 = add nsw i32 %tmp28, %tmp27
      %tmp30 = sub nsw i32 %tmp20, %tmp29
      %tmp31 = sub nsw i32 %tmp23, %tmp26
      %tmp32 = shl nsw i32 %tmp30, 1
      %tmp33 = add nsw i32 %tmp32, %tmp31
      store i32 %tmp33, i32* undef, align 4
      %tmp34 = mul nsw i32 %tmp31, -2
      %tmp35 = add nsw i32 %tmp34, %tmp30
      store i32 %tmp35, i32* undef, align 4
      %tmp36 = select i1 %tmp5, i32 undef, i32 undef
      br label %bb37

    bb37:                                             ; preds = %bb
      %tmp38 = load i32, i32* undef, align 4
      %tmp39 = ashr i32 %tmp38, %tmp6
      %tmp40 = load i32, i32* undef, align 4
      %tmp41 = sdiv i32 %tmp39, %tmp40
      store i32 %tmp41, i32* undef, align 4
      ret void
    }
2021-06-12 11:41:38 +01:00
spupyrev
09c0e58fa9 A post-processing for BFI inference
The current implementation for computing relative block frequencies does
not correctly handle control-flow graphs containing irreducible loops. This
results in suboptimally generated binaries, whose perf can be up to 5%
worse than optimal.

To resolve the problem, we apply a post-processing step, which iteratively
updates block frequencies based on the frequencies of their predecessors.
This corresponds to finding the stationary point of the Markov chain by
an iterative method aka "PageRank computation". The algorithm takes at
most O(|E| * IterativeBFIMaxIterations) steps but typically converges faster.

It is turned on by passing option `use-iterative-bfi-inference`
and applied only for functions containing profile data and irreducible loops.
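
The iteration described above can be sketched as a plain fixed-point loop. This is an illustration under simplifying assumptions, not the algorithm from the patch: the names Edge and iterateFrequencies are hypothetical, edge probabilities are given directly, the entry block receives one unit of frequency, and each round recomputes every block from its predecessors' previous values.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// A CFG edge with its branch probability.
struct Edge {
  std::size_t From, To;
  double Prob;
};

// Repeatedly recompute each block's frequency from its predecessors until
// the total change drops below Epsilon or MaxIterations rounds have run.
// Each round visits every edge once, so the cost is O(|E| * MaxIterations).
std::vector<double> iterateFrequencies(std::size_t NumBlocks, std::size_t Entry,
                                       const std::vector<Edge> &Edges,
                                       std::size_t MaxIterations,
                                       double Epsilon) {
  std::vector<double> Freq(NumBlocks, 0.0);
  Freq[Entry] = 1.0;
  for (std::size_t Iter = 0; Iter < MaxIterations; ++Iter) {
    std::vector<double> Next(NumBlocks, 0.0);
    Next[Entry] = 1.0; // one unit of flow enters at the entry block
    for (const Edge &E : Edges)
      Next[E.To] += Freq[E.From] * E.Prob;
    double Delta = 0.0;
    for (std::size_t I = 0; I < NumBlocks; ++I)
      Delta += std::fabs(Next[I] - Freq[I]);
    Freq.swap(Next);
    if (Delta < Epsilon)
      break;
  }
  return Freq;
}
```

For a CFG where block 0 branches to blocks 1 and 2 with equal probability, 1 always jumps to 2, and 2 goes back to 1 or exits to 3 with equal probability, the loop {1, 2} has two entries and is therefore irreducible; solving the flow equations by hand gives frequencies 1, 1.5, 2 and 1, which the iteration approaches.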

Tested on SPEC06/17, where it is helping to get correct profile counts for one of
the binaries (403.gcc). In prod binaries, we've seen a speedup of up to 2%-5%
for binaries containing functions with hot irreducible loops.

Reviewed By: hoy, wenlei, davidxl

Differential Revision: https://reviews.llvm.org/D103289
2021-06-11 21:46:04 -07:00
Adrian Prantl
5f69f29b5c Allow signposts to take advantage of deferred string substitution
One nice feature of the os_signpost API is that format string
substitutions happen in the consumer, not the logging
application. LLVM's current Signpost class doesn't take advantage of
this though and instead always uses a static "Begin/End %s" format
string.

This patch uses variadic macros to allow the API to be used as
intended. Unfortunately, the primary use-case I had in mind (the
LLDB_SCOPED_TIMER() macro) does not get much better from this, because
__PRETTY_FUNCTION__ is *not* a macro, but a static string, so
signposts created by LLDB_SCOPED_TIMER() still use a static "%s"
format string. At least LLDB_SCOPED_TIMERF() works as intended.

This reapplies the previously reverted patch with support for
platforms where signposts are unavailable.

Differential Revision: https://reviews.llvm.org/D103575
2021-06-11 16:52:34 -07:00
Adrian Prantl
5ed934f9e8 Revert "Allow signposts to take advantage of deferred string substitution"
I forgot to make the LLDB macro conditional on Linux.

This reverts commit 541ccd1c1bb23e1e20a382844b35312c0caffd79.
2021-06-11 16:46:34 -07:00
Andrew Litteken
77b6ee14d4 [IRSim] Strip out the findSimilarity call from the constructor
Both doInitialize and runOnModule were running the entire analysis
due to the actual work being done in the constructor. Strip it out here
and only get the similarity during runOnModule.

Author: lanza
Reviewers: AndrewLitteken, paquette, plofti

Differential Revision: https://reviews.llvm.org/D92524
2021-06-11 18:41:28 -05:00
Adrian Prantl
111b1fef7a Allow signposts to take advantage of deferred string substitution
One nice feature of the os_signpost API is that format string
substitutions happen in the consumer, not the logging
application. LLVM's current Signpost class doesn't take advantage of
this though and instead always uses a static "Begin/End %s" format
string.

This patch uses variadic macros to allow the API to be used as
intended. Unfortunately, the primary use-case I had in mind (the
LLDB_SCOPED_TIMER() macro) does not get much better from this, because
__PRETTY_FUNCTION__ is *not* a macro, but a static string, so
signposts created by LLDB_SCOPED_TIMER() still use a static "%s"
format string. At least LLDB_SCOPED_TIMERF() works as intended.

Differential Revision: https://reviews.llvm.org/D103575
2021-06-11 16:35:43 -07:00
Daniil Fukalov
3d8b8a6451 [NFC][CostModel] Fixed comment that comparisons work regardless of the state.
Reviewed By: sdesmalen

Differential Revision: https://reviews.llvm.org/D104068
2021-06-11 23:48:49 +03:00
Kevin Athey
cd621c234a [clang-cl][sanitizer] Add -fsanitize-address-use-after-return to clang.
Also:
  - add driver test (fsanitize-use-after-return.c)
  - add basic IR test (asan-use-after-return.cpp)
  - (NFC) cleaned up logic for generating table of __asan_stack_malloc
    depending on flag.

for issue: https://github.com/google/sanitizers/issues/1394

Reviewed By: vitalybuka

Differential Revision: https://reviews.llvm.org/D104076
2021-06-11 12:07:35 -07:00
Matt Arsenault
f133d28419 CodeGen: Fix missing const 2021-06-11 13:45:24 -04:00
Simon Pilgrim
b871c5c8a1 APInt.h - add missing <utility> header.
Some buildbots are complaining about std::move() after rG61cdaf66fe22be2b5942ddee4f46a998b4f3ee29
2021-06-11 13:35:12 +01:00
Simon Pilgrim
165132af1b [ADT] Remove APInt/APSInt toString() std::string variants
<string> is currently the highest impact header in a clang+llvm build:

https://commondatastorage.googleapis.com/chromium-browser-clang/llvm-include-analysis.html

One of the most common places this is being included is the APInt.h header, which needs it for an old toString() implementation that returns std::string - an inefficient method compared to the SmallString versions that it actually wraps.

This patch replaces these APInt/APSInt methods with a pair of llvm::toString() helpers inside StringExtras.h, adjusts users accordingly and removes the <string> from APInt.h - I was hoping that more of these users could be converted to use the SmallString methods, but it appears that most end up creating a std::string anyhow. I avoided trying to use the raw_ostream << operators as well as I didn't want to lose having the integer radix explicit in the code.

Differential Revision: https://reviews.llvm.org/D103888
2021-06-11 13:19:15 +01:00
Fraser Cormack
13db9330f8 [VP][NFC] Format comment to 80 columns 2021-06-11 12:53:48 +01:00
Bing1 Yu
c7250ce1db [X86] Support __tile_stream_loadd intrinsic for new AMX interface
Adding support for __tile_stream_loadd intrinsic.

Reviewed By: LuoYuanke

Differential Revision: https://reviews.llvm.org/D103784
2021-06-11 17:28:43 +08:00
Simon Pilgrim
5c5e621290 SampleProf.h - fix spelling mistake in assert message. NFC. 2021-06-11 10:24:14 +01:00
Simon Pilgrim
cac15052d4 [Analysis] Pass RecurrenceDescriptor as const reference. NFCI.
We were passing the RecurrenceDescriptor by value to most of the reduction analysis methods, despite it being rather bulky with TrackingVH members (that can be costly to copy). In all these cases we're only using the RecurrenceDescriptor for rather basic purposes (access to types/kinds etc.).

Differential Revision: https://reviews.llvm.org/D104029
2021-06-11 10:24:14 +01:00
Sjoerd Meijer
6a49dbd1a3 Function Specialization Pass
This adds a function specialization pass to LLVM. Constant parameters
like function pointers and constant globals are propagated to the callee by
specializing the function.
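
To make the transformation concrete, here is a source-level sketch of what specializing on a constant function-pointer argument amounts to. The pass works on LLVM IR, not C++; the functions addOne, applyN and applyN_addOne are hypothetical and only show the before and after shapes.

```cpp
#include <cassert>

int addOne(int X) { return X + 1; }

// General version: Fn is unknown here, so every iteration makes an indirect
// call that cannot be inlined.
int applyN(int (*Fn)(int), int X, int N) {
  for (int I = 0; I < N; ++I)
    X = Fn(X);
  return X;
}

// Roughly what a specialised clone for call sites passing addOne looks like:
// the constant argument is propagated into the body, the call becomes direct,
// and later passes are free to inline and simplify it.
int applyN_addOne(int X, int N) {
  for (int I = 0; I < N; ++I)
    X = addOne(X);
  return X;
}
```

A call such as applyN(addOne, 0, 5) could then be redirected to applyN_addOne(0, 5), at the cost of an extra function body, which is where the compile-time and code-size concerns discussed in this message come from.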

This is a first version with a number of limitations:
- The pass is off by default, so needs to be enabled on the command line,
- It does not handle specialization of recursive functions,
- It does not yet handle constants and constant ranges,
- Only 1 argument per function is specialised,
- The cost-model could be further looked into, and perhaps related,
- We are not yet caching analysis results.

This is based on earlier work by Matthew Simpson (D36432) and Vinay Madhusudan.
More recently this was also discussed on the list, see:

https://lists.llvm.org/pipermail/llvm-dev/2021-March/149380.html.

The motivation for this work is that function specialisation often comes up as
a reason for performance differences of generated code between LLVM and GCC,
which has this enabled by default from optimisation level -O3 and up. And while
this certainly helps a few cpu benchmark cases, this also triggers in real
world codes and is thus a generally useful transformation to have in LLVM.

Function specialisation has great potential to increase compile-times and
code-size.  The summary from some investigations with this patch is:
- Compile-time increases for short compile jobs are relatively high, but the
  increase in absolute numbers is still low.
- For longer compile-jobs, the extra compile time is around 1%, and very much
  in line with GCC.
- It is difficult to blame one thing for compile-time increases: it looks like
  everywhere a little bit more time is spent processing more functions and
  instructions.
- But the function specialisation pass itself is not very expensive; it doesn't
  show up very high in the profile of the optimisation passes.

The goal of this work is to reach parity with GCC which means that eventually
we would like to get this enabled by default. But first we would like to address
some of the limitations before that.

Differential Revision: https://reviews.llvm.org/D93838
2021-06-11 09:11:29 +01:00
Carl Ritson
15d8bd80ce [ValueTypes] Define MVTs for v6i32, v6f32, v7i32, v7f32
For use in AMDGPU selection DAG.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D103881
2021-06-11 08:58:16 +09:00
Nick Desaulniers
e9e1661fa2 [IR] make -warn-frame-size into a module attr
-Wframe-larger-than= is an interesting warning; we can't know the frame
size until PrologueEpilogueInsertion (PEI); very late in the compilation
pipeline.

-Wframe-larger-than= was propagated through CC1 as an -mllvm flag, then
was a cl::opt in LLVM's PEI pass; this meant it was dropped during LTO
and needed to be re-specified via -plugin-opt.

Instead, make it part of the IR proper as a module level attribute,
similar to D103048. Introduce -fwarn-stack-size CC1 option.

Reviewed By: rsmith, qcolombet

Differential Revision: https://reviews.llvm.org/D103928
2021-06-10 16:15:27 -07:00
Jessica Paquette
2a84a282fe [AArch64][GlobalISel] Legalize scalar G_CTTZ + G_CTTZ_ZERO_UNDEF
This adds legalization for scalar G_CTTZ and G_CTTZ_ZERO_UNDEF. Vector support
requires handling vector G_BITREVERSE, which I haven't gotten around to yet.

For G_CTTZ_ZERO_UNDEF, we just lower it to G_CTTZ.

For G_CTTZ, we match SelectionDAG's lowering to a G_BITREVERSE + G_CTLZ.

e.g. https://godbolt.org/z/nPEseYh1s
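
The identity behind this lowering is that counting trailing zeros of a value equals counting leading zeros of its bit-reversed form. A small sketch on 32-bit integers follows; the helper names are hypothetical and the loops are written for clarity, not speed.

```cpp
#include <cassert>
#include <cstdint>

// Reverse the order of the 32 bits in X.
std::uint32_t bitReverse32(std::uint32_t X) {
  std::uint32_t R = 0;
  for (int I = 0; I < 32; ++I) {
    R = (R << 1) | (X & 1u);
    X >>= 1;
  }
  return R;
}

// Count leading zero bits; returns 32 for an input of zero.
unsigned countLeadingZeros32(std::uint32_t X) {
  unsigned N = 0;
  for (std::uint32_t Bit = 0x80000000u; Bit != 0 && (X & Bit) == 0; Bit >>= 1)
    ++N;
  return N;
}

// cttz(x) == ctlz(bitreverse(x)): the lowest set bit of X becomes the
// highest set bit of the reversed value.
unsigned cttzViaBitReverse(std::uint32_t X) {
  return countLeadingZeros32(bitReverse32(X));
}
```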

(With this patch, we have slightly worse codegen than SDAG for types smaller
than s32; it seems like we're missing a combine.)

Also, this adds in a function to build G_BITREVERSE to MachineIRBuilder.

Differential Revision: https://reviews.llvm.org/D104065
2021-06-10 15:29:51 -07:00
Joachim Meyer
6b73f118b0 [LV] Parallel annotated loop does not imply all loads can be hoisted.
As noted in https://bugs.llvm.org/show_bug.cgi?id=46666, the current behavior of assuming if-conversion safety if a loop is annotated parallel (`!llvm.loop.parallel_accesses`) is unexpected, the documentation for this behavior has since been removed from the LangRef, and it can lead to invalid reads.
This was observed in POCL (https://github.com/pocl/pocl/issues/757) and would require similar workarounds in current work at hipSYCL.

The question remains why this was initially added and what the implications of removing this optimization would be.
Do we need an alternative mechanism to propagate the information about legality of if-conversion?
Or is the idea that conditional loads in `#pragma clang loop vectorize(assume_safety)` can be executed unmasked without additional checks flawed in general?
I think this implication is not part of what a user of that pragma (and corresponding metadata) would expect and thus dangerous.

Only two additional tests failed, which are adapted in this patch. Depending on the further direction force-ifcvt.ll should be removed or further adapted.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D103907
2021-06-10 23:37:57 +02:00
Philip Reames
f2dafac922 [LI] Add a cover function for checking if a loop is mustprogress [nfc]
Essentially, the cover function simply combines the loop level check and the function level scope into one call.  This simplifies several callers and is (subjectively) less error prone.
2021-06-10 13:37:32 -07:00
Philip Reames
52d05589ca Move code for checking loop metadata into Analysis [nfc]
I need the mustprogress loop metadata in ScalarEvolution and it makes sense to keep all the accessors for querying loop metadata together.
2021-06-10 13:01:22 -07:00
Michael Kruse
4460a4c76f [OpenMP] Implement '#pragma omp unroll'.
Implementation of the unroll directive introduced in OpenMP 5.1. Follows the approach from D76342 for the tile directive (i.e. AST-based, not using the OpenMPIRBuilder). Tries to use `llvm.loop.unroll.*` metadata where possible, but has to fall back to an AST representation of the outer loop if the partially unrolled generated loop is associated with another directive (because it needs to compute the number of iterations).
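
For orientation, a minimal use of the directive might look like the following. The function sumFirstN is a hypothetical example; the pragma only takes effect with a compiler that supports OpenMP 5.1 and is invoked with OpenMP enabled (for example -fopenmp), and is otherwise ignored, so the function computes the same result either way.

```cpp
#include <cassert>

// Sum the first N elements of Data. The directive asks the compiler to
// partially unroll the loop by a factor of 4.
int sumFirstN(const int *Data, int N) {
  int Sum = 0;
#pragma omp unroll partial(4)
  for (int I = 0; I < N; ++I)
    Sum += Data[I];
  return Sum;
}
```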

Reviewed By: ABataev

Differential Revision: https://reviews.llvm.org/D99459
2021-06-10 14:30:17 -05:00
Guillaume Chatelet
434526729b [llvm] Make Sequence reverse-iterable
This is a roll forward of D102679.
This patch simplifies the implementation of Sequence and makes it compatible with llvm::reverse.
It exposes the reverse iterators through rbegin/rend which prevents a dangling reference in std::reverse_iterator::operator++().
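
A stripped-down sketch of the approach, not LLVM's Sequence class: IntRange and collectReversed are hypothetical names. The range provides its own reverse iterator that yields values by value, so reverse iteration never goes through std::reverse_iterator.

```cpp
#include <cassert>
#include <vector>

// Hypothetical half-open integer range [Begin, End).
class IntRange {
  int Begin, End;

public:
  // Forward iterator over the values; dereferencing returns a copy.
  struct iterator {
    int Value;
    int operator*() const { return Value; }
    iterator &operator++() { ++Value; return *this; }
    bool operator!=(const iterator &O) const { return Value != O.Value; }
  };

  // Hand-written reverse iterator: incrementing walks downwards.
  struct reverse_iterator {
    int Value;
    int operator*() const { return Value; }
    reverse_iterator &operator++() { --Value; return *this; }
    bool operator!=(const reverse_iterator &O) const {
      return Value != O.Value;
    }
  };

  IntRange(int B, int E) : Begin(B), End(E) {}
  iterator begin() const { return iterator{Begin}; }
  iterator end() const { return iterator{End}; }
  reverse_iterator rbegin() const { return reverse_iterator{End - 1}; }
  reverse_iterator rend() const { return reverse_iterator{Begin - 1}; }
};

// Collect the values of R from last to first using rbegin/rend.
std::vector<int> collectReversed(const IntRange &R) {
  std::vector<int> Out;
  for (IntRange::reverse_iterator I = R.rbegin(), E = R.rend(); I != E; ++I)
    Out.push_back(*I);
  return Out;
}
```

Any reverse adaptor that looks for rbegin/rend members would pick up these iterators directly.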

Note: Compared to D102679, this patch introduces an `asSmallVector()` member function and fixes a compilation issue with GCC 5.

Differential Revision: https://reviews.llvm.org/D103948
2021-06-10 11:15:28 +00:00
Esme-Yi
10a92ed3bb [NFC][XCOFF] Replace structs FileHeader32/SectionHeader32 with constants.
Summary: Some structs like FileHeader32/SectionHeader32
defined in llvm/include/llvm/BinaryFormat/XCOFF.h seem
unnecessary, because we only need their size. So this
patch removes them and defines size constants directly.

Reviewed By: jhenderson

Differential Revision: https://reviews.llvm.org/D103901
2021-06-10 11:10:45 +00:00
David Spickett
17058150e4 Revert "Implementation of global.get/set for reftypes in LLVM IR"
This reverts commit 31859f896cf90d64904134ce7b31230f374c3fcc.

Causing SVE and RISCV-V test failures on bots.
2021-06-10 10:11:17 +00:00
Simon Pilgrim
ffc4cc1eae [TargetLowering] getABIAlignmentForCallingConv - pass DataLayout by const reference. NFCI.
Avoid unnecessary copies and match every other method in TargetLowering that takes DataLayout as an argument.
2021-06-10 10:55:24 +01:00
Paulo Matos
fb9c8fd3dd Implementation of global.get/set for reftypes in LLVM IR
This change implements new DAG nodes GLOBAL_GET/GLOBAL_SET, and
lowering methods for load and stores of reference types from IR
globals. Once the lowering creates the new nodes, tablegen pattern
matches those and converts them to Wasm global.get/set.

Reviewed By: tlively

Differential Revision: https://reviews.llvm.org/D95425
2021-06-10 10:07:45 +02:00
Esme-Yi
af137d3a70 [XCOFF][llvm-objdump] Dump the debug type in --section-headers option.
Summary: Add XCOFF recognition of debug section types
under `--section-headers` option.

Reviewed By: jhenderson

Differential Revision: https://reviews.llvm.org/D103079
2021-06-10 07:08:23 +00:00
Esme-Yi
0c72329384 [llvm-objdump][XCOFF] Enable the -l (--line-numbers) option.
Summary: Add support for dumping line number
information for XCOFF object files in llvm-objdump.

Reviewed By: jhenderson

Differential Revision: https://reviews.llvm.org/D101272
2021-06-10 04:37:06 +00:00
Sam Powell
edcfd20e4e Reland "[llvm] llvm-tapi-diff"
This is relanding commit d1d36f7ad2ae82bea8a6fcc40d6c42a72e21f096 .
This patch additionally addresses failures found in buildbots due to unstable build ordering & post review comments.

This patch introduces a new tool, llvm-tapi-diff, that compares and returns the diff of two TBD files.

Reviewed By: ributzka, JDevlieghere

Differential Revision: https://reviews.llvm.org/D101835
2021-06-09 21:17:34 -07:00
Jinsong Ji
34315a15e8 [AIX] Add traceback ssp canary bit support
We will need to set the ssp canary bit in traceback table to communicate
with unwinder about the canary.

Reviewed By: #powerpc, shchenz

Differential Revision: https://reviews.llvm.org/D103202
2021-06-10 02:40:02 +00:00
Justin Lebar
4a2e3ac704 Save/restore OuterTemplateParams in AbstractManglingParser::parseEncoding.
Previously we were only saving plain TemplateParams.

Differential Revision: https://reviews.llvm.org/D103996
2021-06-09 17:56:23 -07:00
Cyndy Ishida
1ad231f52e Revert "Reland "[llvm] llvm-tapi-diff""
This reverts commit 20126c9fd4afe2fe11510becccaa769332da302f.
The sorting fixes failed to have stable output on different platforms.
2021-06-09 13:48:09 -07:00
Sam Powell
6baaf00ed9 Reland "[llvm] llvm-tapi-diff"
This is relanding commit d1d36f7ad2ae82bea8a6fcc40d6c42a72e21f096 .
This patch additionally addresses failures found in buildbots & post review comments.

This patch introduces a new tool, llvm-tapi-diff, that compares and returns the diff of two TBD files.

Reviewed By: ributzka, JDevlieghere

Differential Revision: https://reviews.llvm.org/D101835
2021-06-09 10:35:41 -07:00
Fraser Cormack
cb0fa6245f [ValueTypes][RISCV] Cap RVV fixed-length vectors by size
This patch changes RVV's policy for its supported list of fixed-length
vector types by capping by vector size rather than element count. Now
all 1024-byte vectors (of supported element types) are supported, rather
than all 256-element vectors.

This is a more natural fit for the architecture, and allows us to, for
example, improve the support for vector bitcasts.

This change necessitated the adding of some new simple types to avoid
"regressing" on the number of currently-supported vectors. We round out
the 1024-byte types by adding `v512i8`, `v1024i8`, `v512i16` and
`v512f16`.
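
The difference between the two policies can be sketched with two predicates. Both function names are hypothetical, and the sketch deliberately ignores whether a matching simple value type exists and whether the element type is supported; it only compares a cap of 256 elements with a cap of 1024 bytes.

```cpp
#include <cassert>

// Old policy as described above: cap on the number of elements.
bool supportedByElementCount(unsigned NumElts) { return NumElts <= 256; }

// New policy as described above: cap on the total vector size in bytes,
// compared in bits to avoid rounding for sub-byte element types.
bool supportedBySize(unsigned NumElts, unsigned EltBits) {
  return NumElts * EltBits <= 1024u * 8u;
}
```

Under this model `v512i16` (512 x 16 bits = 1024 bytes) is rejected by the element-count cap but accepted by the size cap, which is why the new 512- and 1024-element simple types were needed.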

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D103884
2021-06-09 12:15:37 +01:00
Florian Hahn
37c3bfd1ce [LTO] Support new PM in ThinLTOCodeGenerator.
This patch adds initial support for using the new pass manager when
doing ThinLTO via libLTO.

Reviewed By: steven_wu

Differential Revision: https://reviews.llvm.org/D102627
2021-06-09 10:05:14 +01:00
Jan Kratochvil
c90315c834 Revert "[llvm] Sync DebugInfo.h with DebugInfoFlags.def"
This reverts commit 093750dd0be6b0729f8e817766c3d5849545e10c.

It broke buildbots; going to investigate it more.
2021-06-09 10:39:57 +02:00
Jan Kratochvil
191a799f53 [llvm] Sync DebugInfo.h with DebugInfoFlags.def
Command to see the differences:
  diff -u <(sed -n 's#^HANDLE_DI_FLAG *([^,]*, *\([^()]*\)) *\(//.*\)\?$#\1#p' <llvm/include/llvm/IR/DebugInfoFlags.def | grep -vw Largest) <(sed -n 's#^ *LLVMDIFlag\([^ ]*\) *= (\?[0-9].*$#\1#p' <llvm/include/llvm-c/DebugInfo.h)

OCaml binding is more seriously out of sync but I have not tried to sync it.

Reviewed By: aprantl

Differential Revision: https://reviews.llvm.org/D103910
2021-06-09 10:11:23 +02:00
Guillaume Chatelet
06979257cf [NFC] Reformat MachineValueType
This is a follow up patch based on https://reviews.llvm.org/D103251#2804016.

Differential Revision: https://reviews.llvm.org/D103893
2021-06-09 07:20:51 +00:00
Sterling Augustine
59049d1a79 Add Twine support for std::string_view.
With Twine now ubiquitous after rG92a79dbe91413f685ab19295fc7a6297dbd6c824,
it needs support for string_view when building clang with newer C++ standards.

This is similar to how StringRef is handled.

Differential Revision: https://reviews.llvm.org/D103935
2021-06-08 20:19:04 -07:00
Brendon Cahoon
3a664dba6e Reland "[AMDGPU] Add gfx1013 target"
This reverts commit 211e584fa2a4c032e4d573e7cdbffd622aad0a8f.

Fixed a use-after-free error that caused the sanitizers to fail.
2021-06-08 21:15:35 -04:00
Quinn Pham
6ce8448c70 [NFC] In the future, all intrinsics defined for compatibility with the XL
compiler will be placed in this collection.

This patch has no functional changes.

Differential revision: https://reviews.llvm.org/D103921
2021-06-08 17:58:02 -05:00
Whitney Tsang
c40da26524 Revert "Revert "[LoopNest] Fix Wdeprecated-copy warnings""
This reverts commit 07ef5805abe5d4576eb5528eab63e75505bfd0bd.

The breakage of the sanitizer-windows bot:
https://lab.llvm.org/buildbot/#/builders/127/builds/12064
is not caused by the original commit.

Differential Revision: https://reviews.llvm.org/D103752
2021-06-08 21:51:53 +00:00
Whitney Tsang
d1d5a06d3d Revert "[LoopNest] Fix Wdeprecated-copy warnings"
This reverts commit dee1f0cb348b0a56375d9b563fb4d6918c431ed1.

It appears that this change broke the sanitizer-windows bot:
https://lab.llvm.org/buildbot/#/builders/127/builds/12064

Differential Revision: https://reviews.llvm.org/D103752
2021-06-08 20:46:12 +00:00
Brendon Cahoon
8238dc695f Revert "[AMDGPU] Add gfx1013 target"
This reverts commit ea10a86984ea73fcec3b12d22404a15f2f59b219.

A sanitizer buildbot reports an error.
2021-06-08 16:29:41 -04:00
Abhina Sreeskantharajan
f48a352265 [SystemZ][z/OS] Pass OpenFlags when creating tmp files
This patch https://reviews.llvm.org/D102876 caused some lit regressions on z/OS because tmp files were no longer being opened based on binary/text mode. This patch passes OpenFlags when creating tmp files so we can open files in different modes.

Reviewed By: amccarth

Differential Revision: https://reviews.llvm.org/D103806
2021-06-08 14:45:34 -04:00
Nick Desaulniers
42052632ff reland [IR] make -stack-alignment= into a module attr
Relands commit 433c8d950cb3a1fa0977355ce0367e8c763a3f13 with fixes for
MIPS.

Similar to D102742, specifying the stack alignment via CodegenOpts means
that this flag gets dropped during LTO, unless the command line is
re-specified as a plugin opt. Instead, encode this information as a
module level attribute so that we don't have to expose this llvm
internal flag when linking the Linux kernel with LTO.

Looks like external dependencies might need a fix:
* https://github.com/llvm-hs/llvm-hs/issues/345
* https://github.com/halide/Halide/issues/6079

Link: https://github.com/ClangBuiltLinux/linux/issues/1377

Reviewed By: tejohnson

Differential Revision: https://reviews.llvm.org/D103048
2021-06-08 10:59:46 -07:00
Mehdi Amini
ff780e6f68 Revert "[llvm] Make Sequence reverse-iterable"
This reverts commit e772216e708937988c039420d2c559568f91ae27
(and fixup 7f6c878a2c035eb6325ab228d9bc2d257509d959).

The build is broken with gcc5 host compiler:

In file included from
                 from mlir/lib/Dialect/Utils/StructuredOpsUtils.cpp:9:
tools/mlir/include/mlir/IR/BuiltinAttributes.h.inc:424:57: error: type/value mismatch at argument 1 in template parameter list for 'template<class ItTy, class FuncTy, class FuncReturnTy> class llvm::mapped_iterator'
                               std::function<T(ptrdiff_t)>>;
                                                         ^
tools/mlir/include/mlir/IR/BuiltinAttributes.h.inc:424:57: note:   expected a type, got 'decltype (seq<ptrdiff_t>(0, 0))::const_iterator'
2021-06-08 17:03:10 +00:00
Brendon Cahoon
c9fa68e102 [AMDGPU] Add gfx1013 target
Differential Revision: https://reviews.llvm.org/D103663
2021-06-08 12:49:49 -04:00
Nick Desaulniers
579f298a64 Revert "[IR] make -stack-alignment= into a module attr"
This reverts commit 433c8d950cb3a1fa0977355ce0367e8c763a3f13.

Breaks the MIPS build.
2021-06-08 08:55:50 -07:00
Nick Desaulniers
5c936095e3 [IR] make -stack-alignment= into a module attr
Similar to D102742, specifying the stack alignment via CodegenOpts means
that this flag gets dropped during LTO, unless the command line is
re-specified as a plugin opt. Instead, encode this information as a
module level attribute so that we don't have to expose this llvm
internal flag when linking the Linux kernel with LTO.

Looks like external dependencies might need a fix:
* https://github.com/llvm-hs/llvm-hs/issues/345
* https://github.com/halide/Halide/issues/6079

Link: https://github.com/ClangBuiltLinux/linux/issues/1377

Reviewed By: tejohnson

Differential Revision: https://reviews.llvm.org/D103048
2021-06-08 08:31:04 -07:00
Whitney Tsang
8193fc7976 [LoopNest] Fix Wdeprecated-copy warnings
error: definition of implicit copy constructor for 'LoopNest' is
deprecated because it has a user-declared copy assignment operator
[-Werror,-Wdeprecated-copy]
  LoopNest &operator=(const LoopNest &) = delete;
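
A minimal sketch of the situation behind this warning. `Nest` is a hypothetical stand-in for `LoopNest`, and defaulting the copy constructor is only one way to silence the diagnostic; the actual patch may have resolved it differently.

```cpp
#include <type_traits>

// Hypothetical stand-in for LoopNest; not the real class.
class Nest {
public:
  Nest() = default;
  // Declaring the copy constructor explicitly avoids relying on the
  // deprecated implicit definition that -Wdeprecated-copy complains about.
  Nest(const Nest &) = default;
  // User-declared (deleted) copy assignment, as in the diagnostic above.
  Nest &operator=(const Nest &) = delete;
};

static_assert(std::is_copy_constructible<Nest>::value,
              "copy construction is still available");
static_assert(!std::is_copy_assignable<Nest>::value,
              "copy assignment stays deleted");
```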

Reviewed By: Meinersbur

Differential Revision: https://reviews.llvm.org/D103752
2021-06-08 14:48:13 +00:00
Guillaume Chatelet
5c5f9ef7cf Fix missing header and namespace qualifier in ADT Sequence 2021-06-08 14:11:54 +00:00
Guillaume Chatelet
83dd05c1f3 [llvm] Make Sequence reverse-iterable
This patch simplifies the implementation of Sequence and makes it compatible with llvm::reverse.
It exposes the reverse iterators through rbegin/rend which prevents a dangling reference in std::reverse_iterator::operator++().

Differential Revision: https://reviews.llvm.org/D102679
2021-06-08 13:18:57 +00:00
Hans Wennborg
5697956ae9 Revert "3rd Reapply "[DebugInfo] Use variadic debug values to salvage BinOps and GEP instrs with non-const operands""
> This reapplies c0f3dfb9, which was reverted following the discovery of
> crashes on linux kernel and chromium builds - these issues have since
> been fixed, allowing this patch to re-land.

This reverts commit 36ec97f76ac0d8be76fb16ac521f55126766267d.

The change caused non-determinism in the compiler, see comments on the code
review at https://reviews.llvm.org/D91722.

Reverting to unbreak people's builds until that can be addressed.

This also reverts the follow-up "[DebugInfo] Limit the number of values
that may be referenced by a dbg.value" in
a0bd6105d80698c53ceaa64bbe6e3b7e7bbf99ee.
2021-06-08 14:54:08 +02:00
Simon Moll
64d5c9acc6 [VP] getDeclarationForParams
`VPIntrinsic::getDeclarationForParams` creates a vp intrinsic
declaration for parameters you want to call it with.  This is in
preparation of a new builder class that makes emitting vp intrinsic code
nearly as convenient as using a plain ir builder (aka `VectorBuilder`,
to be used by D99750).

Reviewed By: frasercrmck, craig.topper, vkmr

Differential Revision: https://reviews.llvm.org/D102686
2021-06-08 14:21:28 +02:00
Timm Bäder
a146be49ac [NFC] Remove some include cycles
These files include themselves directly.
2021-06-08 14:00:39 +02:00
maekawatoshiki
aeb6fd7377 [LoopUnrollAndJam] Change LoopUnrollAndJamPass to LoopNest pass
This patch changes LoopUnrollAndJamPass from FunctionPass to LoopNest pass.
The next patch will utilize LoopNest to effectively handle loop nests.

Also, a crash problem on legacy pass manager is fixed.

Reviewed By: Whitney

Differential Revision: https://reviews.llvm.org/D99149
2021-06-08 20:30:02 +09:00
Lang Hames
49b1fd668d [JITLink] Clarify LinkGraph::splitBlock contract in comment. 2021-06-08 18:51:12 +10:00
Tomasz Miąsko
1dcdf007c2 [Demangle][Rust] Parse path backreferences
Reviewed By: dblaikie

Differential Revision: https://reviews.llvm.org/D103459
2021-06-08 10:01:49 +02:00
hsmahesha
b3154f6617 [IR] Add utility to convert constant expression operands (of an instruction) to instructions.
In the situation where we need to replace a constant operand C from a constant expression CE
by an instruction NI, it is not possible without converting CE itself into an instruction. This
utility helps to convert the given set of constant expression operands from an instruction I
into a corresponding set of instructions.

The current use-case for this utility is from the patches - https://reviews.llvm.org/D103225
and https://reviews.llvm.org/D103655.

Reviewed By: rampitec

Differential Revision: https://reviews.llvm.org/D103661
2021-06-08 03:22:32 +05:30
Amir Ayupov
af66a21d42 [ELF] getRelocatedSection: remove the check for ET_REL object file
getRelocatedSection interface should not check that the object file is
relocatable, as executable files may have relocations preserved with
`--emit-relocs` linker flag. The relocations are useful in context of post-link
binary analysis for function reference identification. For example, BOLT relies
on relocations to perform function reordering.

Reviewed By: MaskRay, jhenderson

Differential Revision: https://reviews.llvm.org/D102296
2021-06-07 13:17:00 -07:00
Harald van Dijk
9e8b1c8ad9 [X32] Add Triple::isX32(), use it.
So far, support for x86_64-linux-gnux32 has been handled by explicit
comparisons of Triple.getEnvironment() to GNUX32. This worked as long as
x86_64-linux-gnux32 was the only X32 environment to worry about, but we
now have x86_64-linux-muslx32 as well. To support this, this change adds
an isX32() function and uses it. It replaces all checks for GNUX32 or
MuslX32 by isX32(), except for the following:

- Triple::isGNUEnvironment() and Triple::isMusl() are supposed to treat
  GNUX32 and MuslX32 differently.
- computeTargetTriple() needs to be able to transform triples to add or
  remove X32 from the environment and needs to map GNU to GNUX32, and
  Musl to MuslX32.
- getMultiarchTriple() completely lacks any Musl support and retains the
  explicit check for GNUX32 as it can only return x86_64-linux-gnux32.
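
The idea of folding the scattered environment comparisons into one predicate can be sketched with a standalone mock. `MockTriple` and `Environment` are hypothetical names for illustration; only the enumerator names GNUX32 and MuslX32 come from the message above, and this is not the real Triple class.

```cpp
// Standalone mock for illustration; not the real Triple class.
enum class Environment { GNU, GNUX32, Musl, MuslX32, Other };

struct MockTriple {
  Environment Env;

  // One predicate replaces explicit comparisons against GNUX32 / MuslX32.
  bool isX32() const {
    return Env == Environment::GNUX32 || Env == Environment::MuslX32;
  }
};
```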

Reviewed By: MaskRay

Differential Revision: https://reviews.llvm.org/D103777
2021-06-07 20:48:39 +01:00
Philip Reames
8ae2dfce83 [SCEV] Compute exit counts for unsigned IVs using mustprogress semantics
The motivation here is simple loops with unsigned induction variables w/non-one steps. A toy example would be:
for (unsigned i = 0; i < N; i += 2) { body; }

Given C/C++ semantics, we do not get the nuw flag on the induction variable. Given that lack, we currently can't compute a bound for this loop. We can do better for many cases, depending on the contents of "body".

The basic intuition behind this patch is as follows:
* A step which evenly divides the iteration space must wrap through the same numbers repeatedly. And thus, we can ignore potential corner cases where we exit after the n-th wrap through uint32_max.
* Per C++ rules, infinite loops without side effects are UB. We already have code in SCEV which relies on this.  In LLVM, this is tied to the mustprogress attribute.

Together, these let us conclude that the trip count of this loop must come before unsigned overflow unless the body would form a well defined infinite loop.
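
The intuition above can be checked exhaustively on a small type. The sketch below is an illustration only, not SCEV code: it uses an 8-bit induction variable so every bound can be simulated, and all function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Simulates `for (uint8_t i = 0; i < n; i += step) {}` on an 8-bit IV.
// Returns the number of iterations, or -1 if the loop revisits a value
// (and therefore never exits).
int simulateTripCount(uint8_t n, uint8_t step) {
  assert(step != 0 && "a zero step never makes progress");
  bool seen[256] = {false};
  int count = 0;
  uint8_t i = 0;
  while (i < n) {
    if (seen[i])
      return -1;
    seen[i] = true;
    ++count;
    i = static_cast<uint8_t>(i + step); // wraps modulo 256
  }
  return count;
}

// Closed form that ignores wrapping: ceil(n / step).
int closedFormTripCount(uint8_t n, uint8_t step) {
  return (n + step - 1) / step;
}

// Checks, for every bound n, that the loop either never exits or exits
// after exactly ceil(n / step) iterations.
bool exitsOnlyBeforeWrap(uint8_t step) {
  for (int n = 0; n < 256; ++n) {
    int simulated = simulateTripCount(static_cast<uint8_t>(n), step);
    if (simulated != -1 &&
        simulated != closedFormTripCount(static_cast<uint8_t>(n), step))
      return false;
  }
  return true;
}
```

For a step of 2, which divides 256, the check holds for every bound. For a step of 7, which does not, the loop can exit after wrapping, so the closed form is no longer safe.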

A couple notes for those reading along:
* I reused the loop properties code which is overly conservative for this case. I may follow up in another patch to generalize it for the actual UB rules.
* We could cache the n(s/u)w facts. I left that out because doing a pre-patch which cached existing inference showed a lot of diffs I had trouble fully explaining. I plan to get back to this, but I don't want it on the critical path.

Differential Revision: https://reviews.llvm.org/D103118
2021-06-07 11:24:00 -07:00
jasonliu
c03aed5d1b [XCOFF][AIX] Enable tooling support for 64 bit symbol table parsing
Add the ability to parse the symbol table for 64-bit objects.

Reviewed By: jhenderson, DiggerLin

Differential Revision: https://reviews.llvm.org/D85774
2021-06-07 17:24:13 +00:00
Raphael Isemann
001e5b68a5 [NFC] Add missing include to LaneBitmask.h to fix modules build 2021-06-07 18:43:00 +02:00
Sander de Smalen
052444882c [CostModel] Return Invalid cost in getArithmeticCost instead of crashing for scalable vectors.
This fixes an issue in BasicTTIImpl.h where it tries to do a
cast<FixedVectorType> on a scalable vector type in order to get the
scalarization cost. Because scalarization of scalable vectors is not
supported, we return Invalid instead.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D103798
2021-06-07 17:26:23 +01:00
Tomasz Miąsko
1d2a2e625f [Demangle][Rust] Parse dyn-trait-assoc-binding
Reviewed By: dblaikie

Differential Revision: https://reviews.llvm.org/D103364
2021-06-07 18:18:31 +02:00
Tomasz Miąsko
5534f83077 [Demangle][Rust] Parse dyn-trait
Reviewed By: dblaikie

Differential Revision: https://reviews.llvm.org/D103361
2021-06-07 18:18:31 +02:00
Tomasz Miąsko
5d985b77fe [Demangle][Rust] Parse dyn-bounds
Reviewed By: dblaikie

Differential Revision: https://reviews.llvm.org/D103151
2021-06-07 18:18:30 +02:00
Bradley Smith
6fa37c2af6 [AArch64][SVE] Improve codegen for dupq SVE ACLE intrinsics
Use llvm.experimental.vector.insert instead of storing into an alloca
when generating code for these intrinsics. This defers the codegen of
the generated vector to instruction selection, allowing existing
shufflevector style optimizations to apply.

Additionally, introduce a new target transform that can recognise fixed
predicate patterns in the svbool variants of these intrinsics.

Differential Revision: https://reviews.llvm.org/D103082
2021-06-07 12:21:38 +01:00
Guillaume Chatelet
517914e3e7 [NFC] Fix semantic discrepancy for MVT::LAST_VALUETYPE
Differential Revision: https://reviews.llvm.org/D103251
2021-06-07 10:04:16 +00:00
Jingu Kang
7721ad79a0 [SimpleLoopBoundSplit] Split Bound of Loop which has conditional branch with IV
This pass transforms loops that contain a conditional branch with induction
variable. For example, it transforms the code on the left into the code on the right:

                             newbound = min(n, c)
 while (iv < n) {            while(iv < newbound) {
   A                           A
   if (iv < c)                 B
     B                         C
   C                         }
 }                           if (iv != n) {
                               while (iv < n) {
                                 A
                                 C
                               }
                             }
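
The equivalence of the two shapes can be sketched in plain C++. This is an illustration, not the pass itself; it assumes the induction variable starts at 0 and increments by 1 (the diagram does not say), and it records A, B and C as characters in a trace.

```cpp
#include <algorithm>
#include <string>

// Left-hand shape: a single loop with a branch on the induction variable.
std::string originalLoop(int n, int c) {
  std::string trace;
  for (int iv = 0; iv < n; ++iv) {
    trace += 'A';
    if (iv < c)
      trace += 'B';
    trace += 'C';
  }
  return trace;
}

// Right-hand shape: the bound is split at min(n, c), so neither loop
// needs the inner branch.
std::string splitLoop(int n, int c) {
  std::string trace;
  int iv = 0;
  int newbound = std::min(n, c);
  for (; iv < newbound; ++iv)
    trace += "ABC";
  if (iv != n) {
    for (; iv < n; ++iv)
      trace += "AC";
  }
  return trace;
}
```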

Differential Revision: https://reviews.llvm.org/D102234
2021-06-07 10:55:25 +01:00
Esme-Yi
4a3c52750c [yaml2obj] Initial the support of yaml2obj for 32-bit XCOFF.
Summary: The patch implements the mapping of the Yaml
information to XCOFF object file to enable the yaml2obj
tool for XCOFF. Currently only 32-bit is supported.

Reviewed By: jhenderson, shchenz

Differential Revision: https://reviews.llvm.org/D95505
2021-06-07 04:14:44 +00:00
maekawatoshiki
c253e5477f Revert "[LoopUnrollAndJam] Change LoopUnrollAndJamPass to LoopNest pass"
This reverts commit 21653600034084e8335374ddc1eb8d362158d9a8.

To fix the crash problem in legacy pass manager
2021-06-07 01:26:47 +09:00
Nikita Popov
6fa17c5cd4 [TargetLowering] Use IRBuilderBase instead of IRBuilder<> (NFC)
Don't require a specific kind of IRBuilder for TargetLowering hooks.
This allows us to drop the IRBuilder.h include from TargetLowering.h.

Differential Revision: https://reviews.llvm.org/D103759
2021-06-06 16:29:50 +02:00
Nikita Popov
435f6cc870 [TargetLowering] Move methods out of line (NFC)
Move methods using IRBuilder out of line, so we can drop the
dependency on the header.
2021-06-06 16:02:10 +02:00
Simon Pilgrim
2640520cf0 BreadthFirstIterator.h - fix uninitialized variable warning in default constructor. NFCI. 2021-06-06 14:13:08 +01:00
Simon Pilgrim
e1a45194ff PatternMatch.h - wrap WrapFlags tests inside brackets to stop static analysis warning about & vs && usage. NFCI. 2021-06-06 13:38:39 +01:00
Simon Pilgrim
ae1d08ce66 SmallVector.h - remove unused MathExtras.h header. NFCI. 2021-06-06 12:05:39 +01:00
Simon Pilgrim
d12d7f93a7 Fix uninitialized variable warnings. NFCI. 2021-06-06 11:09:55 +01:00
Simon Pilgrim
e01b869231 Revert rG0b18c4c0ec03f0321ee83b9976da5777d0e4f53f "SmallVector.h - remove unused MathExtras.h header (REAPPLIED). NFCI."
Buildbots still seem to find implicit header dependencies that I can't locally....
2021-06-06 09:39:19 +01:00
Simon Pilgrim
e9d779eb95 SmallVector.h - remove unused MathExtras.h header (REAPPLIED). NFCI.
Try again to remove this header - I think I've found the implicit dependencies (mainly for <cmath>) on linux builds now.
2021-06-06 09:30:15 +01:00
Simon Pilgrim
b46820ae2c Revert rG7b839b3542983a313a9bf9f8d8039ceeea35c4d7 - "SmallVector.h - remove unused MathExtras.h header. NFCI."
Breaks on linux buildbots as I seem to have missed some implicit header dependencies....
2021-06-05 20:59:46 +01:00
Simon Pilgrim
a2c0915a26 SmallVector.h - remove unused MathExtras.h header. NFCI. 2021-06-05 20:19:58 +01:00
Simon Pilgrim
9f2f5ad5b4 BitstreamWriter.h - add missing implicit MathExtras.h header dependency. NFCI.
Noticed while investigating if we can remove an unnecessary MathExtras.h include from SmallVector.h
2021-06-05 19:20:14 +01:00
Simon Pilgrim
10c6d871b5 ELFTypes.h - add missing implicit MathExtras.h header dependency. NFCI.
Noticed while investigating if we can remove an unnecessary MathExtras.h include from SmallVector.h
2021-06-05 19:11:40 +01:00
Simon Pilgrim
8916359679 [MCA] Support.h - add missing implicit MathExtras.h header dependency. NFCI.
Noticed while investigating if we can remove an unnecessary MathExtras.h include from SmallVector.h
2021-06-05 19:10:49 +01:00
Simon Pilgrim
1cd1249a42 EndianStream.h - add missing implicit MathExtras.h header dependency. NFCI.
Noticed while investigating if we can remove an unnecessary MathExtras.h include from SmallVector.h
2021-06-05 18:05:40 +01:00
Roman Lebedev
46218a275e [NFC] Promote willNotOverflow() / getStrengthenedNoWrapFlagsFromBinOp() from IndVars into SCEV proper
We might want to use it when creating SCEV proper in createSCEV(),
now that we don't `forgetValue()` in `SimplifyIndvar::strengthenOverflowingOperation()`,
which might have caused us to lose some optimization potential.
2021-06-05 12:17:51 +03:00
Nikita Popov
a17845f6c0 [LoopUnroll] Separate peeling from unrolling
Loop peeling is currently performed as part of UnrollLoop().
Outside test scenarios, it is always performed with an unroll
count of 1. This means that unrolling doesn't actually do anything
apart from performing post-unroll simplification.

When testing, it's currently possible to specify both an explicit
peel count and an explicit unroll count. This doesn't perform any
sensible operation and may result in miscompiles, see
https://bugs.llvm.org/show_bug.cgi?id=45939.

This patch moves peeling from UnrollLoop() into tryToUnrollLoop(),
so that peeling does not also perform a subsequent unroll. We only
run the post-unroll simplifications. Specifying both an explicit
peel count and unroll count is forbidden.

In the future, we may want to support both (non-PGO) peeling a
loop and unrolling it, but this needs to be done by first performing
the peel and then recalculating unrolling heuristics on a now
possibly analyzable loop.
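
For readers unfamiliar with peeling, here is a source-level sketch of peeling two iterations off a loop. It is an illustration only (the real transform works on IR), and the function names are hypothetical. The point is that the peeled copies plus the remaining loop must visit exactly the original iterations.

```cpp
#include <vector>

// Reference loop: visits iterations 0..n-1 in order.
std::vector<int> plainLoop(int n) {
  std::vector<int> visited;
  for (int i = 0; i < n; ++i)
    visited.push_back(i);
  return visited;
}

// The same loop with its first two iterations peeled into guarded
// straight-line copies of the body; the loop handles the remainder.
std::vector<int> peeledByTwo(int n) {
  std::vector<int> visited;
  int i = 0;
  if (i < n) {
    visited.push_back(i); // peeled iteration 1
    ++i;
    if (i < n) {
      visited.push_back(i); // peeled iteration 2
      ++i;
      for (; i < n; ++i) // remaining iterations
        visited.push_back(i);
    }
  }
  return visited;
}
```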

Differential Revision: https://reviews.llvm.org/D103362
2021-06-05 10:32:00 +02:00
Amir Ayupov
1d32182997 [MC] Add getLSDASection interface
This diff adds getLSDASection method to MCObjectFileInfo.

Test plan: make check-all

Differential revision: https://reviews.llvm.org/D102298
2021-06-05 00:28:20 -07:00
Rong Xu
559805b594 [SampleFDO] New hierarchical discriminator for FS SampleFDO (llvm-profdata part)
This patch was split from https://reviews.llvm.org/D102246
[SampleFDO] New hierarchical discriminator for Flow Sensitive SampleFDO
This is for llvm-profdata part of change. It sets the bit masks for the
profile reader in llvm-profdata. Also add an internal option
"-fs-discriminator-pass" for show and merge command to process the profile
offline.

This patch also moved setDiscriminatorMaskedBitFrom() to
SampleProfileReader::create() to simplify the interface.

Differential Revision: https://reviews.llvm.org/D103550
2021-06-04 11:22:06 -07:00
Joseph Huber
6028348946 [Attributor] Check HeapToStack's state for isKnownHeapToStack
This patch changes the `isKnownHeapToStack` and `isAssumedHeapToStack`
member functions to return whether a function call is going to be altered by
HeapToStack.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D103574
2021-06-04 12:38:33 -04:00
Joseph Huber
d685caa27a [Attributor] Allow lookupAAFor to return null on invalid state
This patch adds an option to `lookupAAFor` that allows it to return a
nullptr if the state of the looked up attribute is invalid. This is so
future passes can use this to query other attributes with the guarantee
that they are valid.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D103556
2021-06-04 12:29:15 -04:00
Alexey Bataev
fe6e3a2893 [OPENMP]Fix PR50129: omp cancel parallel not working as expected.
Need to emit a call for __kmpc_cancel_barrier in the exit block for
__kmpc_cancel function call if cancellation of the parallel block is
requested.

Differential Revision: https://reviews.llvm.org/D103646
2021-06-04 08:24:55 -07:00
Mirko Brkusanin
2afa995d4a [AMDGPU][GlobalISel] Legalize G_ABS
Legalize and select G_ABS so that we can use llvm.abs intrinsic

Differential Revision: https://reviews.llvm.org/D102391
2021-06-04 14:46:43 +02:00
Cyndy Ishida
35b9c30842 Revert "[llvm] llvm-tapi-diff"
This reverts commit d1d36f7ad2ae82bea8a6fcc40d6c42a72e21f096.
Reverting this patch to investigate linux bot failures
 + fix with author offline
2021-06-03 21:10:51 -07:00
Arthur Eubanks
ad23e89deb [NFC] Remove checking pointee type for byval/preallocated type
These currently always require a type parameter. The bitcode reader
already upgrades old bitcode without the type parameter to use the
pointee type.
2021-06-03 19:09:09 -07:00
zero9178
d052166c91 [NFC] Add missing includes for LLVM_ENABLE_MODULES builds
Building LLVM with the LLVM_ENABLE_MODULES cmake option fails when the modules are being compiled due to missing includes. This is a side effect of some transitive includes that changed recently.

Differential Revision: https://reviews.llvm.org/D103645
2021-06-03 23:29:03 +02:00
Philip Reames
8c571feefd [LoopUnroll] Eliminate PreserveCondBr parameter and fix a bug in the process
This builds on D103584. The change eliminates the coupling between unroll heuristic and implementation w.r.t. knowing when the passed in trip count is an exact trip count or a max trip count. In theory the new code is slightly less powerful (since it relies on exact computable trip counts), but in practice, it appears to cover all the same cases. It can also be extended if needed.

The test change shows what appears to be a bug in the existing code around the interaction of peeling and unrolling. The original loop only ran 8 iterations. The previous output had the loop peeled by 2, and then an exact unroll of 8. This meant the loop ran a total of 10 iterations which appears to have been a miscompile.

Differential Revision: https://reviews.llvm.org/D103620
2021-06-03 14:09:16 -07:00
Sam Powell
270d9dda87 [llvm] llvm-tapi-diff
This patch introduces a new tool, llvm-tapi-diff, that compares and returns the diff of two TBD files.

Reviewed By: ributzka, JDevlieghere

Differential Revision: https://reviews.llvm.org/D101835
2021-06-03 11:38:00 -07:00
Eli Friedman
3474ad7aad [AtomicExpand] Merge cmpxchg success and failure ordering when appropriate.
If we're not emitting separate fences for the success/failure cases, we
need to pass the merged ordering to the target so it can emit the
correct instructions.

For the PowerPC testcase, we end up with extra fences, but that seems
like an improvement over missing fences.  If someone wants to improve
that, the PowerPC backend could be taught to emit the fences after isel,
instead of depending on fences emitted by AtomicExpand.
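
One plausible way to merge a success and a failure ordering is sketched below. The enum and `mergeOrderings` are hypothetical and simplified for illustration; they are not LLVM's types or necessarily the logic this patch uses.

```cpp
// Hypothetical, simplified ordering enum, declared weakest to strongest.
enum class Ordering {
  Monotonic,
  Acquire,
  Release,
  AcquireRelease,
  SequentiallyConsistent
};

// Returns the weakest ordering that is at least as strong as both inputs.
Ordering mergeOrderings(Ordering a, Ordering b) {
  // Acquire and release are incomparable; acq_rel is the weakest ordering
  // that covers both.
  if ((a == Ordering::Acquire && b == Ordering::Release) ||
      (a == Ordering::Release && b == Ordering::Acquire))
    return Ordering::AcquireRelease;
  // Every other pair is comparable, so the declaration order decides.
  return static_cast<int>(a) >= static_cast<int>(b) ? a : b;
}
```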

Fixes https://bugs.llvm.org/show_bug.cgi?id=33332 .

Differential Revision: https://reviews.llvm.org/D103342
2021-06-03 11:34:35 -07:00
Artur Pilipenko
781215bf15 NFC. Mark DOTFuncInfo getters as const
This is a preparatory refactoring for introducing new
types of hidden blocks.
2021-06-03 11:27:06 -07:00
Artur Pilipenko
b3d2f04a04 NFC. Refactor DOTGraphTraits::isNodeHidden
Restructure handling of cfg-hide-unreachable-paths and
cfg-hide-deoptimize-paths options so as to make it easier
to introduce new types of hidden blocks.
2021-06-03 11:27:06 -07:00
Philip Reames
40fae0516b [LoopUnroll] Eliminate PreserveOnlyFirst parameter [nfc]
This is a first step towards simplifying the transform interface to be less error prone. The basic idea is that querying SCEV is cheap (since it's cached) and we can just check for properties related to branch folding in the transform method instead of relying on the heuristic part to pass everything in correctly.

Differential Revision: https://reviews.llvm.org/D103584
2021-06-03 10:33:14 -07:00
Nikita Popov
df6a63d5f1 [MC] Add missing include (NFC)
Try to fix buildbots after 983565a6fe4a9f40c7caf82b65c650c20dbcc104.
2021-06-03 18:50:00 +02:00
Nikita Popov
1c866d4e4f [ADT] Move DenseMapInfo for ArrayRef/StringRef into respective headers (NFC)
This is a followup to D103422. The DenseMapInfo implementations for
ArrayRef and StringRef are moved into the ArrayRef.h and StringRef.h
headers, which means that these two headers no longer need to be
included by DenseMapInfo.h.

This required adding a few additional includes, as many files were
relying on various things pulled in by ArrayRef.h.

Differential Revision: https://reviews.llvm.org/D103491
2021-06-03 18:34:36 +02:00
David Spickett
b2313e8eb4 [clang][ARM] Remove arm2/3/6/7m CPU names
These legacy CPUs are known to clang but not llvm.
Their use was ignored by llvm and it would print a
warning saying it did not recognise them.

However because some of them are default CPUs for their
architecture, you would get those warnings even if you didn't
choose a cpu explicitly.
(now those architectures will default to a "generic" CPU)

Information is thin on the ground for these older chips
so this is the best I could find:
https://en.wikichip.org/wiki/acorn/microarchitectures/arm2
https://en.wikichip.org/wiki/acorn/microarchitectures/arm3
https://en.wikichip.org/wiki/arm_holdings/microarchitectures/arm6
https://en.wikichip.org/wiki/arm_holdings/microarchitectures/arm7

Final part of fixing https://bugs.llvm.org/show_bug.cgi?id=50454.

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D103028
2021-06-03 08:55:44 +00:00
Min-Yih Hsu
fc07b73b83 [CodeGen][NFC] Remove unused virtual function
`TargetFrameLowering::emitCalleeSavedFrameMoves` with 4 arguments is not
used anywhere in CodeGen. Thus it shouldn't be exposed as a virtual
function. NFC.

Differential Revision: https://reviews.llvm.org/D103328
2021-06-02 13:11:12 -07:00
Stefan Pintilie
7e5fc9d052 [NFC] Remove variable that was set but not used.
The buildbot ppc64le-lld-multistage-test has been failing because the variable
Tag in Waymaking.h is set but not used. This patch removes that variable.
2021-06-02 13:20:32 -05:00
Rong Xu
f505b894a2 [SampleFDO] New hierarchical discriminator for FS SampleFDO (ProfileData part)
This patch was split from https://reviews.llvm.org/D102246
[SampleFDO] New hierarchical discriminator for Flow Sensitive SampleFDO
This is mainly for ProfileData part of change. It will load
FS Profile when such profile is detected. For an extbinary format profile,
create_llvm_prof tool will add a flag to profile summary section.
For other format profiles, the users need to use an internal option
(-profile-isfs) to tell the compiler that the profile uses FS discriminators.

This patch also simplified the bit API used by FS discriminators.

Differential Revision: https://reviews.llvm.org/D103041
2021-06-02 10:32:52 -07:00
Qunyan Mangus
037a1994f1 Add getDemandedBits for uses.
Add getDemandedBits method for uses so we can query demanded bits for each use.  This can help getting better use information. For example, for the code below
define i32 @test_use(i32 %a) {
  %1 = and i32 %a, -256
  %2 = or i32 %1, 1
  %3 = trunc i32 %2 to i8 (didn't optimize this to 1 for illustration purpose)
  ... some use of %3
  ret %2
}
if we look at the demanded bit of %2 (which is all 32 bits because of the return), we would conclude that %a is used regardless of how its return is used. However, if we look at each use separately, we will see that the demanded bit of %2 in trunc only uses the lower 8 bits of %a which is redefined, therefore %a's usage depends on how the function return is used.
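
The arithmetic in this example can be replayed in plain C++. This is an illustration of the values involved, not the DemandedBits API; the function names are hypothetical.

```cpp
#include <cstdint>

// %2 from the IR above: clear the low 8 bits of %a (and with -256),
// then set bit 0 (or with 1).
uint32_t value2(uint32_t a) { return (a & 0xFFFFFF00u) | 1u; }

// The trunc use only looks at the low 8 bits of %2, which no longer
// depend on %a.
uint8_t truncUse(uint32_t a) { return static_cast<uint8_t>(value2(a)); }
```

Whatever `a` is, `truncUse` yields 1, while the full 32-bit `value2` (the returned value) still carries the upper 24 bits of `a`.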

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D97074
2021-06-02 10:07:40 -04:00
Sander de Smalen
9fbb47ea3c [LV] Build and cost VPlans for scalable VFs.
This patch uses the calculated maximum scalable VFs to build VPlans,
cost them and select a suitable scalable VF.

Reviewed By: paulwalker-arm

Differential Revision: https://reviews.llvm.org/D98722
2021-06-02 14:47:47 +01:00
Sean Fertile
ffbb13fbef [PowerPC][AIX] Fix AIX bootstrap build.
A recent patch:
https://reviews.llvm.org/rGe0921655b1ff8d4ba7c14be59252fe05b705920e
changed clang's AIX bitfield handling to use 4-byte bitfield containers,
matching XL's behavior. This change triggers static assert failures when
bootstrapping. Change the macro we check to enable bitfield packing on
AIX to `__clang__` which is defined by both xlclang and clang.

Differential Revision: https://reviews.llvm.org/D103474
2021-06-02 09:31:11 -04:00
Daniil Fukalov
ac8a0d3041 [TTI] NFC: Change getIntImmCodeSizeCost to return InstructionCost.
This patch migrates the TTI cost interfaces to return an InstructionCost.

See this patch for the introduction of the type: https://reviews.llvm.org/D91174
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2020-November/146408.html

Reviewed By: sdesmalen

Differential Revision: https://reviews.llvm.org/D102915
2021-06-02 16:04:11 +03:00
Bjorn Pettersson
e9cfe99a84 [CodeGen] Refactor libcall lookups for RTLIB::POWI_*
Use RuntimeLibcalls to get a common way to pick correct RTLIB::POWI_*
libcall for a given value type.

This includes a small refactoring of ExpandFPLibCall and
ExpandArgFPLibCall in SelectionDAGLegalize to share a bit of code,
plus adding an ExpandFPLibCall version that can be called directly
when expanding FPOWI/STRICT_FPOWI to ensure that we actually use
the same RTLIB::Libcall when expanding the libcall as we used when
checking the legality of such a call by doing a getLibcallName check.

Differential Revision: https://reviews.llvm.org/D103050
2021-06-02 11:40:34 +02:00
Bjorn Pettersson
1f980badb2 [SimplifyLibCalls] Take size of int into consideration when emitting ldexp/ldexpf
When rewriting
  powf(2.0, itofp(x)) -> ldexpf(1.0, x)
  exp2(sitofp(x)) -> ldexp(1.0, sext(x))
  exp2(uitofp(x)) -> ldexp(1.0, zext(x))

the wrong type was used for the second argument in the ldexp/ldexpf
libc call, for target architectures with 16 bit "int" type.
The transform incorrectly used a bitcasted function pointer with
a 32-bit argument when emitting the ldexp/ldexpf call for such
targets.

The fault is solved by using the correct function prototype
in the call, by asking TargetLibraryInfo about the size of "int".
TargetLibraryInfo by default derives the size of the int type by
assuming that it is 16 bits for 16-bit architectures, and
32 bits otherwise. If this isn't true for a target it should be
possible to override that default in the TargetLibraryInfo
initializer.
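
As a reminder of why the width of "int" matters here: the exponent parameter of the C library's ldexp is a plain `int`, whose size depends on the target. A small host-side sketch (`powerOfTwo` is a hypothetical helper name):

```cpp
#include <cmath>

// ldexp(1.0, x) computes 1.0 * 2^x. The exponent is passed as a C `int`,
// so a call emitted for a target must match that target's int width.
double powerOfTwo(int x) { return std::ldexp(1.0, x); }
```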

Differential Revision: https://reviews.llvm.org/D99438
2021-06-02 11:40:34 +02:00
Tomasz Miąsko
8a32a3f1a2 [Demangle][Rust] Parse binders
Reviewed By: dblaikie

Differential Revision: https://reviews.llvm.org/D102729
2021-06-02 10:36:45 +02:00
Arthur Eubanks
2a26a5c713 [OpaquePtr] Create API to make a copy of a PointerType with some address space
Some existing places use getPointerElementType() to create a copy of a
pointer type with some new address space.

Reviewed By: dblaikie

Differential Revision: https://reviews.llvm.org/D103429
2021-06-01 16:52:32 -07:00
Daniel Sanders
8c8a0a3167 fixup: Missing operator in [globalisel][legalizer] Separate the deprecated LegalizerInfo from the current one
My local compiler was fine with it but the bots complain about ambiguous types.
2021-06-01 13:58:03 -07:00
Daniel Sanders
71a22fb7f8 [globalisel][legalizer] Separate the deprecated LegalizerInfo from the current one
It's still in use in a few places so we can't delete it yet but there's not
many at this point.

Differential Revision: https://reviews.llvm.org/D103352
2021-06-01 13:23:48 -07:00
Andrew Kelley
08aca8b420 WindowsSupport.h: do not depend on private config header
WindowsSupport.h is a public header, but including it causes a compile error indicating that llvm/Config/config.h cannot be found, because config.h is a private header. Since the header has no actual dependency on anything private, it can use the public config header instead.

Reviewed By: amccarth

Differential Revision: https://reviews.llvm.org/D103370
2021-06-01 23:05:03 +03:00
Jessica Paquette
852a8449e7 [GlobalISel][AArch64] Combine and (lshr x, cst), mask -> ubfx x, cst, width
Also add a target hook which allows us to get around custom legalization on
AArch64.
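
The pattern in the subject line can be sketched as follows, assuming the usual meaning of an unsigned bitfield extract (take `width` bits starting at bit `lsb`) and a mask made of `width` low set bits. The function and parameter names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Computes `and (lshr x, lsb), mask` where mask has `width` low bits set,
// i.e. the value an unsigned bitfield extract would produce.
uint64_t shiftThenMask(uint64_t x, unsigned lsb, unsigned width) {
  assert(width >= 1 && lsb < 64 && width <= 64 - lsb &&
         "field must fit in 64 bits");
  // Avoid an out-of-range shift when the field covers the whole register.
  uint64_t mask = width == 64 ? ~0ull : ((1ull << width) - 1);
  return (x >> lsb) & mask;
}
```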

Differential Revision: https://reviews.llvm.org/D99283
2021-06-01 10:56:17 -07:00
Guozhi Wei
b2dfe60e88 [X86FixupLEAs] Transform the sequence LEA/SUB to SUB/SUB
This patch transforms the sequence

    lea (reg1, reg2), reg3
    sub reg3, reg4

to two sub instructions

    sub reg1, reg4
    sub reg2, reg4

Similar optimization can also be applied to LEA/ADD sequence.
The modifications to TwoAddressInstructionPass ensure that the operands of the ADD
instruction have the expected order (the dest register of LEA should be the src register of ADD).
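
The register arithmetic behind the rewrite can be checked with wrapping 64-bit integers. This sketch models only the resulting value of reg4, not flags or scheduling, and the function names are hypothetical.

```cpp
#include <cstdint>

// lea (reg1, reg2), reg3 ; sub reg3, reg4
uint64_t leaThenSub(uint64_t reg1, uint64_t reg2, uint64_t reg4) {
  uint64_t reg3 = reg1 + reg2; // lea computes the sum
  return reg4 - reg3;          // AT&T syntax: reg4 -= reg3
}

// sub reg1, reg4 ; sub reg2, reg4
uint64_t subThenSub(uint64_t reg1, uint64_t reg2, uint64_t reg4) {
  reg4 -= reg1;
  reg4 -= reg2;
  return reg4;
}
```

Because unsigned arithmetic wraps modulo 2^64, both sequences leave the same value in reg4 even when the intermediate sum overflows.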

Differential Revision: https://reviews.llvm.org/D101970
2021-06-01 10:31:30 -07:00
Eli Friedman
a4632c066a [polly] Fix SCEVLoopAddRecRewriter to avoid invalid AddRecs.
When we're remapping an AddRec, the AddRec constructed by a partial
rewrite might not make sense.  This triggers an assertion complaining
it's not loop-invariant.

Instead of constructing the partially rewritten AddRec, just skip
straight to calling evaluateAtIteration.

Testcase was automatically reduced using llvm-reduce, so it's a little
messy, but hopefully makes sense.

Differential Revision: https://reviews.llvm.org/D102959
2021-06-01 09:51:05 -07:00
Nikita Popov
0d55b59b6a [ADT] Move DenseMapInfo for APInt into APInt.h (PR50527)
As suggested in https://bugs.llvm.org/show_bug.cgi?id=50527, this
moves the DenseMapInfo for APInt and APSInt into the respective
headers, removing the need to include APInt.h and APSInt.h from
DenseMapInfo.h.

We could probably do the same for StringRef and ArrayRef as well.

Differential Revision: https://reviews.llvm.org/D103422
2021-06-01 18:31:41 +02:00
Daniil Seredkin
764783428b [InstCombine] Relax constraints of uses for exp(X) * exp(Y) -> exp(X + Y)
InstCombine didn't perform the transformation when fmul's operands were
the same instruction, because it required each operand to have exactly one
use, which is false in that case. This patch fixes this, adds tests, and
introduces a new function, isOnlyUserOfAnyOperand, to check these cases
in a single place.
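
The identity behind the fold, including the same-operand case exp(X) * exp(X), can be checked numerically. In floating point the two sides agree only up to rounding, so this sketch compares them with a relative tolerance; `foldIsClose` and the tolerance value are hypothetical choices for illustration.

```cpp
#include <cmath>

// Compares exp(x) * exp(y) against exp(x + y) with a relative tolerance,
// since the two forms round differently.
bool foldIsClose(double x, double y) {
  double product = std::exp(x) * std::exp(y);
  double folded = std::exp(x + y);
  return std::fabs(product - folded) <= 1e-12 * std::fabs(folded);
}
```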

This patch is a result of discussion in D102574.

Differential Revision: https://reviews.llvm.org/D102698
2021-06-01 08:33:23 -04:00
Andy Wingo
a2b88794ad [WebAssembly][CodeGen] IR support for WebAssembly local variables
This patch adds TargetStackID::WasmLocal.  This stack holds locations of
values that are only addressable by name -- not via a pointer to memory.
For the WebAssembly target, these objects are lowered to WebAssembly
local variables, which are managed by the WebAssembly run-time and are
not addressable by linear memory.

For the WebAssembly target, IR indicates that an AllocaInst should be put
on TargetStackID::WasmLocal by putting it in the non-integral address
space WASM_ADDRESS_SPACE_WASM_VAR, with value 1.  SROA will mostly lift
these allocations to SSA locals, but any alloca that reaches instruction
selection (usually in non-optimized builds) will be assigned the new
TargetStackID there.  Loads and stores to those values are transformed
to new WebAssemblyISD::LOCAL_GET / WebAssemblyISD::LOCAL_SET nodes,
which then lower to the type-specific LOCAL_GET_I32 etc instructions via
tablegen patterns.

Differential Revision: https://reviews.llvm.org/D101140
2021-06-01 11:31:39 +02:00
Arthur Eubanks
8f3353aa63 [OpaquePtr] Remove some uses of PointerType::getElementType() 2021-05-31 16:11:25 -07:00
Florian Hahn
fd8a91542c [LV] Try to sink users recursively for first-order recurrences.
Update isFirstOrderRecurrence to explore all uses of a recurrence phi
and check if we can sink them. If there are multiple users to sink, they
are all mapped to the previous instruction.

Fixes PR44286 (and another PR or two).

Reviewed By: Ayal

Differential Revision: https://reviews.llvm.org/D84951
2021-05-31 19:55:33 +01:00
Andrea Di Biagio
44ebb7579c [MCA][NFCI] Minor changes to InstrBuilder and Instruction.
This is based on the assumption that most simulated instructions don't define
more than one or two registers. This is true for example on x86, where
most instruction definitions don't declare more than one register write.

The default code region size has been increased from 8 to 16. This is based on
the assumption that, for small microbenchmarks, the typical code snippet size is
often less than 16 instructions.

mca::Instruction now uses bitfields to pack flags.
No functional change intended.
2021-05-31 17:05:13 +01:00
Daniil Fukalov
47fca931a0 [NFC] MemoryDependenceAnalysis cleanup.
1. Removed redundant includes.
2. Removed `releaseMemory()`, which was declared but never defined or used.
3. Fixed the case of the first letter of member function names.
4. Renamed the duplicate name `NonLocalDeps` (in the nested struct
   `NonLocalPointerInfo`) to `NonLocalDepsMap`.

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D102358
2021-05-31 18:07:55 +03:00
Roman Lebedev
471f397407 [NFC] ScalarEvolution: apply SSO to the ExprValueMap value
ExprValueMap is a map from SCEV * to a set-vector of (Value *, ConstantInt *) pairs,
and while the map itself will likely be big-ish (have many keys),
it is a reasonable assumption that each key will refer to a small-ish
number of pairs.

In particular, looking at the n=512 case from
https://bugs.llvm.org/show_bug.cgi?id=50384,
a small size of 4 appears to be the sweet spot:
it results in nearly the fewest allocations while minimizing the memory footprint.
```
$ for i in $(ls heaptrack.opt.*.gz); do echo $i; heaptrack_print $i | tail -n 6; echo ""; done
heaptrack.opt.0-orig.gz
total runtime: 14.32s.
calls to allocation functions: 8222442 (574192/s)
temporary memory allocations: 2419000 (168924/s)
peak heap memory consumption: 190.98MB
peak RSS (including heaptrack overhead): 239.65MB
total memory leaked: 67.58KB

heaptrack.opt.1-n1.gz
total runtime: 13.72s.
calls to allocation functions: 7184188 (523705/s)
temporary memory allocations: 2419017 (176338/s)
peak heap memory consumption: 191.38MB
peak RSS (including heaptrack overhead): 239.64MB
total memory leaked: 67.58KB

heaptrack.opt.2-n2.gz
total runtime: 12.24s.
calls to allocation functions: 6146827 (502355/s)
temporary memory allocations: 2418997 (197695/s)
peak heap memory consumption: 163.31MB
peak RSS (including heaptrack overhead): 211.01MB
total memory leaked: 67.58KB

heaptrack.opt.3-n4.gz
total runtime: 12.28s.
calls to allocation functions: 6068532 (494260/s)
temporary memory allocations: 2418985 (197017/s)
peak heap memory consumption: 155.43MB
peak RSS (including heaptrack overhead): 201.77MB
total memory leaked: 67.58KB

heaptrack.opt.4-n8.gz
total runtime: 12.06s.
calls to allocation functions: 6068042 (503321/s)
temporary memory allocations: 2418992 (200646/s)
peak heap memory consumption: 166.03MB
peak RSS (including heaptrack overhead): 213.55MB
total memory leaked: 67.58KB

heaptrack.opt.5-n16.gz
total runtime: 12.14s.
calls to allocation functions: 6067993 (499958/s)
temporary memory allocations: 2418999 (199307/s)
peak heap memory consumption: 187.24MB
peak RSS (including heaptrack overhead): 233.69MB
total memory leaked: 67.58KB
```

While that test may be a worst-case edge scenario,
https://llvm-compile-time-tracker.com/compare.php?from=dee85d47d9f15fc268f7b18f279dac2774836615&to=98a57e31b1947d5bcdf4a5605ac2ab32b4bd5f63&stat=instructions
agrees that this also results in improvements in the usual situations.
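
The comparison can be made explicit with a little arithmetic. The sketch below
(Python; the figures are copied from the heaptrack output above, and the
variable names are only illustrative) compares each small size by allocation
calls and peak heap.

```python
# (calls to allocation functions, peak heap in MB) per run, from the log above.
runs = {
    "orig": (8222442, 190.98),
    "n1": (7184188, 191.38),
    "n2": (6146827, 163.31),
    "n4": (6068532, 155.43),
    "n8": (6068042, 166.03),
    "n16": (6067993, 187.24),
}

base_calls, base_peak = runs["orig"]
n4_calls, n4_peak = runs["n4"]

# Which configuration has the lowest peak heap consumption?
lowest_peak = min(runs, key=lambda name: runs[name][1])

# Savings of n=4 relative to the original, in percent.
calls_saved_pct = 100.0 * (base_calls - n4_calls) / base_calls
peak_saved_pct = 100.0 * (base_peak - n4_peak) / base_peak

# How far n=4 is from the run with the fewest allocation calls, in percent.
fewest_calls = min(calls for calls, _ in runs.values())
extra_calls_pct = 100.0 * (n4_calls - fewest_calls) / fewest_calls

print(lowest_peak, round(calls_saved_pct, 1), round(peak_saved_pct, 1))
```

With these figures, n=4 has the lowest peak heap and roughly a quarter fewer
allocation calls than the original, while making under 0.01% more calls than
the run with the fewest.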
2021-05-31 15:34:03 +03:00
Andy Wingo
34f735fb88 Revert "[WebAssembly][CodeGen] IR support for WebAssembly local variables"
This reverts commit bf35f4af51cddd743435bb6b94a45592c967891a.  There was
an error in a shared-library build.
2021-05-31 10:55:15 +02:00
Andy Wingo
6faf61e8ac [WebAssembly][CodeGen] IR support for WebAssembly local variables
This patch adds TargetStackID::WasmLocal.  This stack holds locations of
values that are only addressable by name -- not via a pointer to memory.
For the WebAssembly target, these objects are lowered to WebAssembly
local variables, which are managed by the WebAssembly run-time and are
not addressable by linear memory.

For the WebAssembly target, IR indicates that an AllocaInst should be put
on TargetStackID::WasmLocal by putting it in the non-integral address
space WASM_ADDRESS_SPACE_WASM_VAR, with value 1.  SROA will mostly lift
these allocations to SSA locals, but any alloca that reaches instruction
selection (usually in non-optimized builds) will be assigned the new
TargetStackID there.  Loads and stores to those values are transformed
to new WebAssemblyISD::LOCAL_GET / WebAssemblyISD::LOCAL_SET nodes,
which then lower to the type-specific LOCAL_GET_I32 etc instructions via
tablegen patterns.

Differential Revision: https://reviews.llvm.org/D101140
2021-05-31 10:40:38 +02:00
Sanjay Patel
c5aaaaa9b9 [InstCombine] fix miscompile from vector select substitution
This is similar to the fix in c590a9880d7a (PR49832), but
we missed handling the pattern for select of bools (no compare
inst).

We can't substitute a vector value because the equality condition
replacement that we are attempting requires that the condition
is true/false for the entire value. Vector select can be partly
true/false.
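
A small Python model shows why the substitution is unsafe for vectors. This is
a sketch only: `vselect` and `reverse` are hypothetical helpers, and the
cross-lane `reverse` operation is chosen for illustration, not taken from the
bug report.

```python
def vselect(cond, if_true, if_false):
    """Lane-wise select: each lane picks from if_true or if_false."""
    return [t if c else f for c, t, f in zip(cond, if_true, if_false)]


def reverse(vec):
    """A cross-lane operation: lane i reads from lane n-1-i."""
    return vec[::-1]


x = [True, False]   # the condition is only partly true
y = [False, False]

# Original: select x, reverse(x), y
original = vselect(x, reverse(x), y)

# Unsound rewrite: replace x with all-true inside the true arm.
substituted = vselect(x, reverse([True] * len(x)), y)

# Scalar case for comparison: when x is true, f(x) and f(True) always agree.
scalar_x = True
scalar_ok = (not scalar_x) == (not True)
```

Lane 0 is selected from the true arm, but that lane's value came from lane 1
of `x`, which is false, so the two results differ in lane 0.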

I added an assert for vector types, so we shouldn't hit this again.
Fixed formatting while auditing the callers.

https://llvm.org/PR50500
2021-05-30 07:11:58 -04:00
Arthur Eubanks
85767d0682 Revert "[NFC] Use ArgListEntry indirect types more in ISel lowering"
This reverts commit bc7d15c61da78864b35e3c114294d6e4db645611.

Dependent change is to be reverted.
2021-05-29 22:40:33 -07:00
Fangrui Song
df99c4fbee [Internalize] Simplify comdat renaming with noduplicates after D103043
I realized that we can use `comdat noduplicates` which is available on ELF.
Add a special case for wasm which doesn't support the feature.
2021-05-28 16:58:38 -07:00
Eli Friedman
1638fc9086 [AArch64][RISCV] Make sure isel correctly honors failure orderings.
If a cmpxchg specifies acquire or seq_cst on failure, make sure we
generate code consistent with that ordering even if the success ordering
is not acquire/seq_cst.

At one point, it was ambiguous whether this sort of construct was valid,
but the C++ standard and LLVM now accept arbitrary combinations of
success/failure orderings.
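
One way to picture "code consistent with the failure ordering" is to select
instructions for the stronger of the two orderings. The helper below is a
hypothetical Python sketch of that idea, not the code in this patch; modeling
each ordering as a set of guarantees is an assumption made for illustration.

```python
# Toy lattice: each ordering is modeled as the set of guarantees it carries.
GUARANTEES = {
    "monotonic": frozenset(),
    "acquire": frozenset({"acq"}),
    "release": frozenset({"rel"}),
    "acq_rel": frozenset({"acq", "rel"}),
    "seq_cst": frozenset({"acq", "rel", "sc"}),
}
BY_GUARANTEES = {guarantees: name for name, guarantees in GUARANTEES.items()}


def merged_ordering(success, failure):
    """Weakest ordering that provides everything both orderings require."""
    return BY_GUARANTEES[GUARANTEES[success] | GUARANTEES[failure]]


# A cmpxchg with a weak success ordering but an acquire failure ordering
# still needs acquire semantics on the load.
weak_success = merged_ordering("monotonic", "acquire")
release_plus_acquire = merged_ordering("release", "acquire")
```

Under this model a `monotonic`/`acquire` pair is treated as `acquire`, and a
`release`/`acquire` pair as `acq_rel`.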

This doesn't address the corresponding issue in AtomicExpand. (This was
reported as https://bugs.llvm.org/show_bug.cgi?id=33332 .)

Fixes https://bugs.llvm.org/show_bug.cgi?id=50512.

Differential Revision: https://reviews.llvm.org/D103284
2021-05-28 12:47:40 -07:00
Craig Topper
22fc6f8fbe [VP] Make getMaskParamPos/getVectorLengthParamPos return unsigned. Lowercase function names.
Parameter positions seem like they should be unsigned.

While there, make function names lowercase per coding standards.

Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D103224
2021-05-28 11:28:47 -07:00
eopXD
00f9d45052 [LoopNest][LoopFlatten] Change LoopFlattenPass to LoopNest pass
This patch changes LoopFlattenPass from FunctionPass to LoopNestPass.

Utilize LoopNest and let function 'Flatten' generate information from it.

Reviewed By: Whitney

Differential Revision: https://reviews.llvm.org/D102904
2021-05-28 15:43:12 +00:00
Andy Wingo
89bbb9e629 Revert "[WebAssembly][CodeGen] IR support for WebAssembly local variables"
This reverts commit 00ecf18979e3326b3afee8af3dc701c53ffdc93f, as it
broke the AMDGPU build.  Will reland later with a fix.
2021-05-28 12:42:12 +02:00
Andy Wingo
6446c1dcda [WebAssembly][CodeGen] IR support for WebAssembly local variables
This patch adds TargetStackID::WasmLocal.  This stack holds locations of
values that are only addressable by name -- not via a pointer to memory.
For the WebAssembly target, these objects are lowered to WebAssembly
local variables, which are managed by the WebAssembly run-time and are
not addressable by linear memory.

For the WebAssembly target, IR indicates that an AllocaInst should be put
on TargetStackID::WasmLocal by putting it in the non-integral address
space WASM_ADDRESS_SPACE_WASM_VAR, with value 1.  SROA will mostly lift
these allocations to SSA locals, but any alloca that reaches instruction
selection (usually in non-optimized builds) will be assigned the new
TargetStackID there.  Loads and stores to those values are transformed
to new WebAssemblyISD::LOCAL_GET / WebAssemblyISD::LOCAL_SET nodes,
which then lower to the type-specific LOCAL_GET_I32 etc instructions via
tablegen patterns.

Differential Revision: https://reviews.llvm.org/D101140
2021-05-28 11:07:41 +02:00
eopXD
ed545893e8 Revert "[LoopNest][LoopFlatten] Change LoopFlattenPass to LoopNest pass"
This reverts commit 7952ddb21fb7e086d5a6f97767f235d2f6ae2176.

Differential Revision: https://reviews.llvm.org/D103302
2021-05-28 07:58:06 +00:00
eopXD
7ef7d942e2 Revert "[LoopNest][LoopFlatten] Change LoopFlattenPass to LoopNest pass"
This reverts commit ffc4d3e06855550a8bd2a691f6d05828d5bf4ddf.
2021-05-28 07:48:04 +00:00
eopXD
0dfabb08f1 [LoopNest][LoopFlatten] Change LoopFlattenPass to LoopNest pass
This patch changes LoopFlattenPass from FunctionPass to LoopNestPass.

Utilize LoopNest and let function 'Flatten' generate information from it.

Reviewed By: Whitney

Differential Revision: https://reviews.llvm.org/D102904
2021-05-28 07:25:53 +00:00
eopXD
94aa05ba01 [LoopNest][LoopFlatten] Change LoopFlattenPass to LoopNest pass
This patch changes LoopFlattenPass from FunctionPass to LoopNestPass.

Utilize LoopNest and let function 'Flatten' generate information from it.

Reviewed By: Whitney

Differential Revision: https://reviews.llvm.org/D102904
2021-05-28 07:11:26 +00:00
Jordan Rupprecht
b50ce5b1a4 [NFC][libObject] clang-format Archive{.h,.cpp}
In preparation for D100651
2021-05-27 16:48:40 -07:00
Andrea Di Biagio
f7539d249a [MCA] Minor changes to the InOrderIssueStage. NFC
The constructor of InOrderIssueStage no longer takes as input a reference to the
target scheduling model. The stage can always query the subtarget to obtain a
reference to the scheduling model.
The ResourceManager is no longer stored internally as a unique_ptr.
Moved a couple of method definitions to the .cpp file.
2021-05-28 00:33:59 +01:00
Andrea Di Biagio
8de4c75a32 [MCA] Refactor the InOrderIssueStage stage. NFCI
Moved the logic that checks for RAW hazards from the InOrderIssueStage to the
RegisterFile.

Changed how the InOrderIssueStage keeps track of backend stalls. Stall events
are now generated from method notifyStallEvent().

No functional change intended.
2021-05-27 22:28:04 +01:00
Quinn Pham
085fb33d86 [PowerPC] Added multiple PowerPC builtins
This is the first in a series of patches to provide builtins for
compatibility with the XL compiler. Most of the builtins already had
intrinsics and only needed to be implemented in the front end.
Intrinsics were created for the three iospace builtins, eieio, and icbt.
Pseudo instructions were created for eieio and iospace_eieio to
ensure that nops were inserted before the eieio instruction.

Reviewed By: nemanjai, #powerpc

Differential Revision: https://reviews.llvm.org/D102443
2021-05-27 16:23:03 -05:00
Adrian Prantl
dc45fffd37 Support stripping indirectly referenced DILocations from !llvm.loop metadata
in stripDebugInfo().  This patch fixes an oversight in
https://reviews.llvm.org/D96181 and also takes into account loop
metadata pointing to other MDNodes that point into the debug info.

rdar://78487175

Differential Revision: https://reviews.llvm.org/D103220
2021-05-27 13:23:33 -07:00
Saleem Abdulrasool
20ebf3a03d MC: mark dump with LLVM_DUMP_METHOD
Mark the `ELFRelocationEntry::dump` method as `LLVM_DUMP_METHOD` so that
it is annotated as used, which prevents the function from being dead
stripped.  This allows use of `dump` in the debugger.  This is purely to
improve the developer experience.
2021-05-27 10:47:39 -07:00
maekawatoshiki
45a93367af [LoopUnrollAndJam] Change LoopUnrollAndJamPass to LoopNest pass
This patch changes LoopUnrollAndJamPass from FunctionPass to LoopNest pass.
The next patch will utilize LoopNest to effectively handle loop nests.

Reviewed By: Whitney

Differential Revision: https://reviews.llvm.org/D99149
2021-05-28 01:17:23 +09:00
Fraser Cormack
58a6d02787 [VP][SelectionDAG] Add a target-configurable EVL operand type
This patch adds a way for the target to configure the type it uses for
the explicit vector length operands of VP SDNodes. The type must be a
legal integer type (there is still no target-independent legalization of
this operand) and must currently be at least as big as i32, the type
used by the IR intrinsics. An implicit zero-extension takes place on
targets which choose a larger type. All VP nodes should be created with
this type used for the EVL operand.
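
The choice of zero-extension matters when the i32 value has its top bit set.
The Python sketch below assumes the EVL is treated as an unsigned count;
`zext` and `sext` are hypothetical helpers written for this illustration.

```python
def zext(value, from_bits, to_bits):
    """Zero-extend: keep the low from_bits bits, fill the rest with zeros."""
    assert 0 < from_bits <= to_bits
    return value & ((1 << from_bits) - 1)


def sext(value, from_bits, to_bits):
    """Sign-extend: replicate the top bit of the narrow value upward."""
    assert 0 < from_bits <= to_bits
    value &= (1 << from_bits) - 1
    if value >> (from_bits - 1):
        value -= 1 << from_bits
    return value & ((1 << to_bits) - 1)


evl_i32 = 0x8000_0000  # a large unsigned length with the top bit set

as_zext = zext(evl_i32, 32, 64)  # same count when read as an unsigned i64
as_sext = sext(evl_i32, 32, 64)  # a different, much larger unsigned i64
```

Zero-extension preserves the unsigned value, whereas sign-extension would turn
the same bit pattern into a different 64-bit count.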

This allows 64-bit RISC-V to avoid custom legalization of all VP nodes,
keeping them in their target-independent form for that bit longer.

Reviewed By: simoll

Differential Revision: https://reviews.llvm.org/D103027
2021-05-27 15:27:36 +01:00
Mats Petersson
caa14ae743 [OpenMP]Add support for workshare loop modifier in lowering
When lowering the dynamic, guided, auto and runtime types of scheduling,
there is an optional monotonic or non-monotonic modifier. This patch
adds support in the OMP IR Builder to pass this down to the runtime
functions.

Also implements tests for the variants.

Differential Revision: https://reviews.llvm.org/D102008
2021-05-27 15:33:05 +01:00
Simon Giesecke
f35a5b956e Add --quiet option to llvm-gsymutil to suppress output of warnings.
Differential Revision: https://reviews.llvm.org/D102829
2021-05-27 12:36:34 +00:00
Mats Petersson
ffafbe5131 Revert "[OpenMP]Add support for workshare loop modifier in lowering"
This reverts commit ea4c5fb04c6d9618d451fb2d2c360dc95c6d9131.
2021-05-27 13:09:47 +01:00
Mats Petersson
ae07366301 [OpenMP]Add support for workshare loop modifier in lowering
When lowering the dynamic, guided, auto and runtime types of scheduling,
there is an optional monotonic or non-monotonic modifier. This patch
adds support in the OMP IR Builder to pass this down to the runtime
functions.

Also implements tests for the variants.

Differential Revision: https://reviews.llvm.org/D102008
2021-05-27 12:28:27 +01:00
Amara Emerson
d5383816bc [GlobalISel] Implement splitting of G_SHUFFLE_VECTOR.
This is a port from the DAG legalization. We're still missing some of the
canonicalizations of shuffles but it's a start.

Differential Revision: https://reviews.llvm.org/D102828
2021-05-27 00:28:38 -07:00
Hasyimi Bahrudin
aa98e6ea8a Fix non-global-value-max-name-size not considered by LLParser
`non-global-value-max-name-size` is used by `Value` to cap the length of local value names. However, this flag is not considered by `LLParser`, which leads to an unexpected `use of undefined value` error. The fix is to move the responsibility of capping the length to `ValueSymbolTable`.
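
The mismatch can be pictured with a toy symbol table in Python. This is a
sketch under assumptions, not the LLVM implementation: `ToySymbolTable` and
`lookup_when_only_value_caps` are hypothetical, and name uniquing on collision
is ignored.

```python
def lookup_when_only_value_caps(name, max_name_size):
    """Old arrangement as described: the stored name is capped, the lookup is not."""
    table = {name[:max_name_size]: "definition"}
    return table.get(name)


class ToySymbolTable:
    """New arrangement as described: the table caps names on insert and lookup."""

    def __init__(self, max_name_size):
        self.max_name_size = max_name_size
        self._values = {}

    def _cap(self, name):
        return name[: self.max_name_size]

    def insert(self, name, value):
        self._values[self._cap(name)] = value

    def lookup(self, name):
        return self._values.get(self._cap(name))


long_name = "a_very_long_local_value_name"

missing = lookup_when_only_value_caps(long_name, 8)  # the full name is not found

table = ToySymbolTable(max_name_size=8)
table.insert(long_name, "definition")
found = table.lookup(long_name)  # both sides agree on the capped name
```

When a single component owns the capping, a reference by the full name and the
stored definition resolve to the same key.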

The test is the one provided by Mikael in the bug report (https://bugs.llvm.org/show_bug.cgi?id=45899).

Reviewed By: mehdi_amini

Differential Revision: https://reviews.llvm.org/D102707
2021-05-27 04:20:03 +00:00
Yevgeny Rouban
0a4cf978a7 [RS4GC] Introduce intrinsics to get base ptr and offset
Some optimizations may need to get (base, offset) for any GC
pointer. The base can be calculated by generating the needed
instructions, as is done by the
RewriteStatepointsForGC::findBasePointer() function. The offset
can be calculated in the same way. However, to avoid exposing the
base calculation, and to keep the offset calculation as simple as
ptrtoint(derived_ptr) - ptrtoint(base_ptr), which is illegal
outside RS4GC, this patch introduces two intrinsics:

 @llvm.experimental.gc.get.pointer.base(%derived_ptr)
 @llvm.experimental.gc.get.pointer.offset(%derived_ptr)

These intrinsics are inlined by RS4GC along with generation of
statepoint sequences.

With these new intrinsics the GC parseable lowering for atomic
memcpy intrinsics (6ec2c5e402a724ba99bce82a9cac7a3006d660f4)
could be implemented as a separate pass.

Reviewed By: reames
Differential Revision: https://reviews.llvm.org/D100445
2021-05-27 09:14:14 +07:00
Jessica Paquette
d821abe3ce [GlobalISel] Don't emit lost debug location remarks when legalizing tail calls
There were a bunch of lost debug location remarks that showed up when legalizing
tail calls on AArch64.

This would happen because we drop the return in the block where we emit the
tail call. So, we end up dropping the debug location, which makes the
LostDebugLocObserver report a missing debug location.

Although it's *true* that we lose these debug locations, this isn't
a particularly useful remark. We expect to drop these debug locations when
emitting tail calls. Suppressing remarks in this case is preferable, since the
amount of noise could hide actual debug location related bugs.

To do this, I just plumbed the LostDebugLocObserver through the relevant
LegalizerHelper functions. This is the only case I can think of where we need
the LostDebugLocObserver in the LegalizerHelper. So, rather than storing it
in the LegalizerHelper proper and mucking around with the constructors, I
figured it'd be cleanest to take the simplest path for now.

This clears up ~20 noisy lost debug location remarks on CTMark in AArch64 at
-Os.

Differential Revision: https://reviews.llvm.org/D103128
2021-05-26 17:16:11 -07:00
Jacob Hegna
324ccfabc2 Update documentation for InlineModel features.
Reviewed By: mtrofin

Differential Revision: https://reviews.llvm.org/D103193
2021-05-26 12:52:28 -07:00