1
0
mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-11-22 02:33:06 +01:00
Commit Graph

4758 Commits

Author SHA1 Message Date
Jon Chesterfield
47e63cb60e [openmp] Apply code change from D109500
(cherry picked from commit 71052ea1e3c63b7209731fdc1726d10640d97480)
2021-09-13 20:57:17 -07:00
Sami Tolvanen
8a19c7f52b ThinLTO: Fix inline assembly references to static functions with CFI
Create an internal alias with the original name for static functions
that are renamed in promoteInternals to avoid breaking inline
assembly references to them.

Relands 700d07f8ce6f2879610fd6b6968b05c6f17bb915 with -msvc targets
fixed.

Link: https://github.com/ClangBuiltLinux/linux/issues/1354

Reviewed By: nickdesaulniers, pcc

Differential Revision: https://reviews.llvm.org/D104058

(cherry picked from commit 7ce1c4da7726577986535cb7766d782f325145fe)
2021-08-24 18:49:13 -07:00
Johannes Doerfert
0bbcb191e3 [Attributor][FIX] Guard constant casts with type size checks
(cherry picked from commit 5f543919b2646d36f2ddc1424acdd555bfcebe4f)
2021-08-16 11:36:30 -07:00
Johannes Doerfert
64a6596867 [Attributor][NFC] Try to make the windows build bots happy
Failed for some reason, potentially because of the inner type
declaration in combination with the `using`. This might help.

Failure:
https://lab.llvm.org/buildbot/#/builders/127/builds/15432
(cherry picked from commit fc32a5c87d9d5aef2c0b27715153fdd45cebd3f3)
2021-08-11 09:15:37 -07:00
Johannes Doerfert
d2cc939747 [Attributor][FIX] Handle recurrences (PHIs) in AAPointerInfo explicitly
PHI nodes are not pass through but change their value, we have to
account for that to avoid missing stores.

Follow up for D107798 to fix PR51249 for good.

Differential Revision: https://reviews.llvm.org/D107808

(cherry picked from commit e7e3585cde0b08152a8cbf54029794d07c15963d)
2021-08-11 09:15:33 -07:00
Johannes Doerfert
e9a9a807dd [Attributor][FIX] Only avoid visiting PHI uses multiple times (PR51249)
AAPointerInfoFloating needs to visit all uses and some multiple times if
we go through PHI nodes. Attributor::checkForAllUses keeps a visited set
so we don't recurs endlessly. We now allow recursion for non-phi uses so
we track all pointer offsets via PHI nodes properly without endless
recursion.

This replaces the first attempt D107579.

Differential Revision: https://reviews.llvm.org/D107798

(cherry picked from commit 96da6dd6ba53bce5dbe822fe968c2b67ba9bc221)
2021-08-11 09:15:31 -07:00
Joseph Huber
4f1fd1c209 [Attributor] Change function internalization to not replace uses in internalized callers
The current implementation of function internalization creats a copy of each
function and replaces every use. This has the downside that the external
versions of the functions will call into the internalized versions of the
functions. This prevents them from being fully independent of eachother. This
patch replaces the current internalization scheme with a method that creates
all the copies of the functions intended to be internalized first and then
replaces the uses as long as their caller is not already internalized.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D106931

(cherry picked from commit adbaa39dfce7a8361d89b6a3b382fd8f50b94727)
2021-08-04 16:35:01 -07:00
Jose M Monsalve Diaz
5b7208da36 [OpenMP] Folding threadLimit and numThreads when single value in kernels
The device runtime contains several calls to `__kmpc_get_hardware_num_threads_in_block`
and `__kmpc_get_hardware_num_blocks`. If the thread_limit and the num_teams are constant,
these calls can be folded to the constant value.

In this patch we use the already introduced `AAFoldRuntimeCall` and the `NumTeams` and
`NumThreads` kernel attributes (to be introduced in a different patch) to fold these functions.
The code checks all the kernels, and if their attributes match, the functions are folded.

In the future we will explore specializing for multiple values of NumThreads and NumTeams.

Depends on D106390

Reviewed By: jdoerfert, JonChesterfield

Differential Revision: https://reviews.llvm.org/D106033
2021-07-27 21:47:12 -04:00
Johannes Doerfert
9b53e594b4 Reapply "[Attributor] Disable simplification AAs if a callback is present""
This reapplies commit cbb709e25124dc38ee593882051fc88c987fe591 and
includes the use of the lookup method instead of operator[] to avoid
accidentally setting (empty) simplification callbacks.

This reverts commit aa27430a625b2fd059707a87f8ba2df8f480ff11.
2021-07-27 19:14:50 -05:00
Johannes Doerfert
96f821bd7b Revert "[Attributor] Disable simplification AAs if a callback is present"
This reverts commit cbb709e25124dc38ee593882051fc88c987fe591 as it
breaks the tests, which was not supposed to happen. Investigating now.
2021-07-27 18:09:42 -05:00
Johannes Doerfert
0042081f58 [Attributor] Verify checkForAllUses return value properly
Also do not emit more than one remark after Heap2Stack failed.
2021-07-27 17:50:27 -05:00
Johannes Doerfert
0b6fb7b1ef [Attributor] Disable simplification AAs if a callback is present
AAValueSimplify, AAValueConstantRange, and AAPotentialValues all look at
the IR by default. If queried for a IR position which has a
simplification callback we should either look at the callback return, or
give up. We do the latter for now.
2021-07-27 17:50:26 -05:00
Alexey Zhikhartsev
6516543c4b Add jump-threading optimization for deterministic finite automata
The current JumpThreading pass does not jump thread loops since it can
result in irreducible control flow that harms other optimizations. This
prevents switch statements inside a loop from being optimized to use
unconditional branches.

This code pattern occurs in the core_state_transition function of
Coremark. The state machine can be implemented manually with goto
statements resulting in a large runtime improvement, and this transform
makes the switch implementation match the goto version in performance.

This patch specifically targets switch statements inside a loop that
have the opportunity to be threaded. Once it identifies an opportunity,
it creates new paths that branch directly to the correct code block.
For example, the left CFG could be transformed to the right CFG:

```
          sw.bb                        sw.bb
        /   |   \                    /   |   \
   case1  case2  case3          case1  case2  case3
        \   |   /                /       |       \
        latch.bb             latch.2  latch.3  latch.1
         br sw.bb              /         |         \
                           sw.bb.2     sw.bb.3     sw.bb.1
                            br case2    br case3    br case1
```

Co-author: Justin Kreiner @jkreiner
Co-author: Ehsan Amiri @amehsan

Reviewed By: SjoerdMeijer

Differential Revision: https://reviews.llvm.org/D99205
2021-07-27 14:34:04 -04:00
Johannes Doerfert
e9073c9b78 [OpenMP] Try to simplify all loads in device code
Eliminating loads/stores in the device code is worth the extra effort,
especially for the new device runtime.

At the same time we do not compute AAExecutionDomain for non-device code
anymore, there is no point.

Differential Revision: https://reviews.llvm.org/D106845
2021-07-27 01:44:15 -05:00
Johannes Doerfert
756885168c [Attributor][FIX] Copy all members in the assignment operator
Also improve debug output slightly.
2021-07-27 01:44:13 -05:00
Johannes Doerfert
7e9917e379 [Attributor] Utilize the InstSimplify interface to simplify instructions
When we simplify at least one operand in the Attributor simplification
we can use the InstSimplify to work on the simplified operands. This
allows us to avoid duplication of the logic.

Depends on D106189

Differential Revision: https://reviews.llvm.org/D106190
2021-07-27 00:56:23 -05:00
wlei
eca1a76d00 [CSSPGO] Tweak ICP threshold in top-down inliner
This change slightly relaxed the current ICP threshold in top-down inliner, specifically always allow one ICP for it. It shows some perf improvements on SPEC and our internal benchmarks. Also renamed the previous flag. We can also try to turn off PGO ICP in the future.

Reviewed By: wenlei, hoy, wmi

Differential Revision: https://reviews.llvm.org/D106588
2021-07-26 21:49:20 -07:00
Johannes Doerfert
4ce16022d3 [Local] Do not introduce a new llvm.trap before unreachable
This is the second attempt to remove the `llvm.trap` insertion after
https://reviews.llvm.org/rGe14e7bc4b889dfaffb7180d176a03311df2d4ae6
reverted the first one. It is not clear what the exact issue was back
then and it might already be gone by now, it has been >5 years after
all.

Replaces D106299.

Differential Revision: https://reviews.llvm.org/D106308
2021-07-26 23:33:36 -05:00
Johannes Doerfert
95f415bd34 [Attributor] Delete dead stores
D106185 allows us to determine if a store is needed easily. Using that
knowledge we can start to delete dead stores.

In AAIsDead we now track more state as an instruction can be dead (= the
old optimisitc state) or just "removable". A store instruction can be
removable while being very much alive, e.g., if it stores a constant
into an alloca or internal global. If we would pretend it was dead
instead of only removablewe we would ignore it when we determine what
values a load can see, so that is not what we want.

Differential Revision: https://reviews.llvm.org/D106188
2021-07-26 23:33:36 -05:00
Johannes Doerfert
5ee3942ae7 [Attributor] Introduce getPotentialCopiesOfStoredValue and use it
This patch introduces `getPotentialCopiesOfStoredValue` which uses
AAPointerInfo to determine all "aliases" or "potential copies" of a
value that is stored into memory. This operation can fail but if it
succeeds it means we can visit all "uses" of a value even if it is
temporarily stored in memory.

There are two users for the function:
  1) `Attributor::checkForAllUses` which will now ignore the value use
     in a store if all "potential copies" can be identified and instead
     be visited. This allows various AAs, including AAPointerInfo
     itself, to look through memory.
  2) `AANoCapture` which uses a custom use tracking through the
     CaptureTracker interface and therefore needs to be thought
     explicitly.

Differential Revision: https://reviews.llvm.org/D106185
2021-07-26 23:33:36 -05:00
Shilei Tian
14491b35e0 [AbstractAttributor] Fold __kmpc_parallel_level if possible
Similar to D105787, this patch tries to fold `__kmpc_parallel_level` if possible.
Note that `__kmpc_parallel_level` doesn't take activeness into consideration,
based on current `deviceRTLs`, its return value can be such as 0, 1, 2, instead
of 0, 129, 130, etc. that also indicate activeness.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D106154
2021-07-26 22:46:19 -04:00
Johannes Doerfert
6a849a320d [OpenMP] Run rewriteDeviceCodeStateMachine in the Module not CGSCC pass
While rewriteDeviceCodeStateMachine should probably be folded into
buildCustomStateMachine, we at least need the optimization to happen.
This was not reliably the case in the CGSCC pass but in the Module pass
it seems to work reliably.

This also ports a test to the new kernel encoding (target_init/deinit),
and makes sure we cannot run the kernel in SPMD mode.

Differential Revision: https://reviews.llvm.org/D106345
2021-07-26 21:26:07 -05:00
Johannes Doerfert
d40841959b [Attributor][FIX] Do not return CHANGED unconditionally
This caused us to rerun AAMemoryBehaviorFloating::updateImpl over and
over again. Unfortunately it turned out to be hard to reproduce the
behavior in a reasonable way.
2021-07-26 21:22:02 -05:00
Johannes Doerfert
04130ef1fa [Attributor][FIX] Track change status for AAIsDead properly
If we add a new live edge we need to indicate a change or otherwise the
new live block is not shown to users. Similarly, new known dead ends and
a changed `ToBeExploredFrom` set need to cause us to return CHANGED.
2021-07-26 21:21:59 -05:00
Joseph Huber
71556de73f [OpenMP][NFC] Remove unncessary capture in RAII struct
Summary:
There was an unnecessary variable assigned to the information cache when we
only need it in the constructor to extract the function declaration.
2021-07-26 15:05:55 -04:00
Joseph Huber
daea2dd14a [OpenMP] Introduce RAII to protect certain RTL calls from DCE
This patch introduces a new RAII struct that will temporarily make an OpenMP
RTL function have external linkage. This is done before the attributor is
invoked to prevent it from incorrectly removing some function definitions that
we will use later. For example, if we determine all calls to one function are
dead, because it has internal linkage it can safely be removed. Later when we
try to get an instance to that function to modify the source using
`getOrCreateRuntimeFunction` we will then get an empty declaration for that
function that won't be defined anywhere. This patch prevents this from
occurring.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D106707
2021-07-25 14:15:47 -04:00
Nikita Popov
ed5a269ff1 [Attributes] Clean up handling of UB implying attributes (NFC)
Rather than adding methods for dropping these attributes in
various places, add a function that returns an AttrBuilder with
these attributes, which can then be used with existing methods
for dropping attributes. This is with an eye on D104641, which
also needs to drop them from returns, not just parameters.

Also be more explicit about the semantics of the method in the
documentation. Refer to UB rather than Undef, which is what this
is actually about.
2021-07-25 18:21:13 +02:00
Kuter Dinel
a502a514d2 [AMDGPU] Deduce attributes with the Attributor
This patch introduces a pass that uses the Attributor to deduce AMDGPU specific attributes.

Reviewed By: jdoerfert, arsenm

Differential Revision: https://reviews.llvm.org/D104997
2021-07-24 06:07:15 +03:00
Kuter Dinel
bb892cf39f [Attributor][FIX] checkForAllInstructions, correctly handle declarations
checkForAllInstructions was not handling declarations correctly.
It should have been returning false when it gets called on a declaration

The patch also fixes a test case for AAFunctionReachability for it to be able
to pass after the changes to the checkForAllinstructions.

Differential Revision: https://reviews.llvm.org/D106625
2021-07-24 02:21:29 +03:00
Shilei Tian
89d6157c2c [AbstractAttributor] Refine logic to indicate pessimistic fixed point when folding __kmpc_is_spmd_exec_mode
Since we are using assumed information now, the logic should be refined to avoid
unncessary assertion.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D106630
2021-07-23 13:36:47 -04:00
Giorgis Georgakoudis
85a5c6ecfe [OpenMPOpt] Move dedup runtime calls after init for target regions
Deduplication in OpenMPOpt finds redundant OpenMP runtime calls and replaces them with a single call placed in the earliest safe location in the IR. When deduplication happens in a target region this patch makes sure replacement calls are put after target_init.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D106556
2021-07-23 05:54:01 -07:00
Johannes Doerfert
2051c9341b [Attributor] If provided, only look at simplification callbacks not IR
A simplification callback can mean that the IR value is modified beyond
the apparent IR semantics. That is, a `i1 true` could be replaced by an
`i1 false` based on high-level domain-specific information. If a user
provides a simplification callback we will not look at the IR but
instead give up if the callback returns a nullptr.
2021-07-22 23:57:37 -05:00
Giorgis Georgakoudis
6805993080 [Attributor][Fix] Add overrides for AA2HS analysis 2021-07-22 18:20:14 -07:00
Giorgis Georgakoudis
d1dd1d3743 [OpenMP] Use AAHeapToStack/AAHeapToShared analysis in SPMDization
SPMDization D102307 detects incompatible OpenMP runtime calls to abort converting a target region to SPMD mode. Calls to memory allocation/de-allocation routines kmpc_alloc_shared, kmpc_free_shared are incompatible unless they are removed by AAHeapToStack/AAHeapToShared analysis. This patch extends SPMDization detection to include AAHeapToStack/AAHeapToShared analysis results for enlarging the scope of possible SPMDized regions detected.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D105634
2021-07-22 18:08:37 -07:00
Hongtao Yu
03b82baf95 [CSSPGO] Fix a typo in SampleContextTracker
Fixing a typo in SampleContextTracker to use debug name when debug linkage name is no present. This should only affect C programs.

Saw 0.6% perf win on Cinder which is mostly C code.

Reviewed By: wenlei, wmi

Differential Revision: https://reviews.llvm.org/D106599
2021-07-22 16:44:50 -07:00
Shilei Tian
04b998f247 [OpenMPOpt] Add support for BooleanStateWithSetVector
D101977 added `BooleanStateWithPtrSetVector` to store pointers to a set meanwhile
tracking boolean state. One of the limitation is that it can only store pointer.
We might want it to store other types of values, such as integer for parallel
level. This patch generalizes the idea and create `BooleanStateWithSetVector`.
`BooleanStateWithPtrSetVector` therefore becomes a type alias of `BooleanStateWithSetVector`.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D106149
2021-07-22 13:12:29 -04:00
Johannes Doerfert
f303e248b9 [Attributor][FIX] Improve call graph updating
If we remove a non-intrinsic instruction we need to tell the (old) call
graph about it. This caused problems with some features down the line as
they allowed to removed calls more aggressively.
2021-07-22 00:07:56 -05:00
Johannes Doerfert
54c73c71f7 [Attributor][FIX] Do not introduce multiple instances of SSA values
If we have a recursive function we could create multiple instantiations
of an SSA value, one per recursive invocation of the function. This is a
problem as we use SSA value equality in various places. The basic idea
follows from this test:

```
static int r(int c, int *a) {
  int X;
  return c ? r(false, &X) : a == &X;
}

int test(int c) {
  return r(c, undef);
}
```

If we look through the argument `a` we will end up with `X`. Using SSA
value equality we will fold `a == &X` to true and return true even
though it should have been false because `a` and `&X` are from different
instantiations of the function.

Various tests for this  have been placed in value-simplify-instances.ll
and this commit fixes them all by avoiding to produce simplified values
that could be non-unique at runtime. Thus, the result of a simplify
value call will always be unique at runtime or the original value, both
do not allow to accidentally compare two instances of a value with each
other and conclude they are equal statically (pointer equivalence) while
they are unequal at runtime.
2021-07-22 00:07:55 -05:00
Johannes Doerfert
b93c50b86a [Attributor] Improve the Attributor::getAssumedConstant interface
Similar to Attributor::getAssumedSimplified we need to allow IRPs
directly to get the right simplification callback (and context).
2021-07-22 00:07:55 -05:00
Johannes Doerfert
76691e60d6 [OpenMP][FIX] Use name + type checks not only name checks for calls
A call that is analyzed in an optimization needs to be verified against
the name and type of the runtime function to avoid that we look at
arguments that do not exist (anymore). This can happen if the signature
was rewritten. Since we will not set RFI.Declaration if the type doesn't
match we can use it (if it's not null) to determine if the signature is
as expected.

Differential Revision: https://reviews.llvm.org/D106341
2021-07-21 22:51:05 -05:00
Johannes Doerfert
59bc220605 [Attributor][NFC] Clang format 2021-07-21 22:51:05 -05:00
Joseph Huber
191a71d3e8 [OpenMP] Strip NoInline from known OpenMP runtime functions
This patch strips the NoInline attribute from known OpenMP runtime functions.
This is done so that we can denote certain runtime functions as NoInline to
ensure their call sites are intact so they can be checked by OpenMPOpt. We
don't wan't this noinline attribute to remain for any functions after OpenMPOpt
has been run however.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D106482
2021-07-21 21:18:26 -04:00
Joseph Huber
4323f25227 [OpenMP] Fold __kmpc_is_generic_main_thread_id if possible
This patch adds the ability to fold `__kmpc_is_generic_main_thread_id` if we
know for a fact that it is executed by the initial thread using
AAExecutionDomain. This combined with folding `__kmpc_is_spmd_exec_mode` will
allow us to fully fold `__kmpc_is_generic_main_thread`.

Depends on D106438 D106437

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D106439
2021-07-21 21:18:22 -04:00
Joseph Huber
e6c2e59f71 [OpenMP] Add an option to disable function internalization
Function internalization can sometimes occur in situations where we want to
keep the call sites intact. This patch adds an option to disable function
internalization and prevents the device runtime from being internalized while
creating the bitcode library.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D106438
2021-07-21 21:18:18 -04:00
Joseph Huber
2c3ddf5d6f [OpenMP] Add new execution mode for SPMD execution with Generic semantics
Qualified kernels can be transformed from generic-mode to SPMD mode using an
optimization in OpenMPOpt. This patch introduces a new execution mode to
indicate kernels that have been transformed from generic-mode to SPMD-mode.
These kernels have SPMD-mode execution, but need generic-mode semantics for
scheduling the blocks and threads. Without this far too few blocks will be
scheduled for a generic region as SPMD mode expects the trip count to be
divided by the number of threads.

Reviewed By: ggeorgakoudis

Differential Revision: https://reviews.llvm.org/D106460
2021-07-21 20:57:28 -04:00
Giorgis Georgakoudis
7e612fb3a1 [Attributor] Preserve BBs and instructions added in AA manifests
Manifesting AbstractAttributes may add new BBs in the IR. This patch provides an interface to register those BBs in the Attributor so that those BBs and containing instructions are not deleted as dead.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D106383
2021-07-21 11:27:00 -07:00
Giorgis Georgakoudis
2b07724890 [Attributor][NFC] Modify isAssumedHeapToStack for const argument
There is no need for a non-const argument interface and the const argument modification covers existing and upcoming use cases.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D106418
2021-07-21 10:28:21 -07:00
Arthur Eubanks
f3b675c071 [NewPM][Inliner] Check if deleted function is in current SCC
In weird cases, the inliner will inline internal recursive functions,
sometimes causing them to have no more uses, in which case the
inliner will mark the function to be deleted. The function is
actually deleted after the call to
updateCGAndAnalysisManagerForCGSCCPass(). In
updateCGAndAnalysisManagerForCGSCCPass(), UR.UpdatedC may be set to
the SCC containing the function to be deleted. Then the inliner calls
CG.removeDeadFunction() which can cause that SCC to be deleted, even
though it's still stored in UR.UpdatedC.

We could potentially check in the wrappers/pass managers if UR.UpdatedC
is in UR.InvalidatedSCCs before doing anything with it, but it's safer
to do this as close to possible to the call to CG.removeDeadFunction()
to avoid issues with allocating a new SCC in the same address as
the deleted one.

It's hard to find a small test case since we need to have recursive
internal functions be reachable from non-internal functions, yet they
need to become non-recursive and not referenced by other functions when
inlined.

Similar to https://reviews.llvm.org/D106306.

Fixes PR50788.

Reviewed By: asbirlea

Differential Revision: https://reviews.llvm.org/D106405
2021-07-21 08:47:45 -07:00
Sami Tolvanen
515d9b0199 Revert "ThinLTO: Fix inline assembly references to static functions with CFI"
This reverts commit 700d07f8ce6f2879610fd6b6968b05c6f17bb915.

Reverting due to a ThinLTO+CFI breakage on -msvc targets.
2021-07-20 13:59:46 -07:00
Fangrui Song
dd6e19a41c [IR] Rename comdat noduplicates to comdat nodeduplicate
In the textual format, `noduplicates` means no COMDAT/section group
deduplication is performed. Therefore, if both sets of sections are retained, and
they happen to define strong external symbols with the same names,
there will be a duplicate definition linker error.

In PE/COFF, the selection kind lowers to `IMAGE_COMDAT_SELECT_NODUPLICATES`.
The name describes the corollary instead of the immediate semantics.  The name
can cause confusion to other binary formats (ELF, wasm) which have implemented/
want to implement the "no deduplication" selection kind. Rename it to be clearer.

Reviewed By: rnk

Differential Revision: https://reviews.llvm.org/D106319
2021-07-20 12:47:10 -07:00