1
0
mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-11-26 04:32:44 +01:00
Commit Graph

212161 Commits

Author SHA1 Message Date
Hongtao Yu
b26882e846 [CSSPGO] Introducing dangling pseudo probes.
Dangling probes are the probes associated to an empty block. This usually happens when all real instructions are optimized away from the block. There is a problem with dangling probes during the offline counts processing. The way the sample profiler works is that samples collected on the first physical instruction following a probe will be counted towards the probe. This logically equals to treating the instruction next to a probe as if it is from the same block of the probe. In the dangling probe case, the real instruction following a dangling probe actually starts a new block, and samples collected on the new block may cause issues when counted towards the empty block.

To mitigate this issue, we first try to move around a dangling probe inside its owning block. If there are still native instructions preceding the probe in the same block, we can then use them as a place holder to collect samples for the probe. A pass is added to walk each block backwards looking for probes not followed by any real instruction and moving them before the first real instruction. This is done right before the object emission.

If we are unlucky to find such in-block preceding instructions for a probe, the solution we are taking is to tag such probe as dangling so that the samples reported for them will not be trusted by the compiler. We leave it up to the counts inference algorithm to get such probes a reasonable count. The number `UINT64_MAX` is used to mark sample count as collected for a dangling probe.

Reviewed By: wmi

Differential Revision: https://reviews.llvm.org/D95962
2021-03-03 22:44:41 -08:00
Johannes Doerfert
97088b4db9 [Docs] Remove no-aa from the alias analysis documentation
The `no-aa` pass has been removed with 7b560d40bddf.

Differential Revision: https://reviews.llvm.org/D95416
2021-03-04 00:35:52 -06:00
Johannes Doerfert
c5b0326e2f [Attributor] Make DepClass a required argument
We often used a sub-optimal dependence class in the past because we
didn't see the argument. Let's make it explicit so we remember to think
about it.
2021-03-04 00:35:52 -06:00
Johannes Doerfert
fa902e7024 [Attributor] Fold "TrackDependence" into the DepClassTy enum
We don't need a bool and an enum to express the three options we
currently have. This makes the interface nicer and much easier to
use optional dependencies. Also avoids mistakes where the bool is
false and enum ignored.
2021-03-04 00:35:52 -06:00
Johannes Doerfert
45e74bf95c [Attributor] Avoid work for GEPs and wait till the users are visited 2021-03-04 00:35:52 -06:00
Johannes Doerfert
d5abc69606 [Attributor] Use known alignment as lower bound to avoid work
If we know already more than available from a use, we don't need to
invest time on it.
2021-03-04 00:35:52 -06:00
Johannes Doerfert
c92d3580fd [Attributor][NFC] Move some trivial checks up 2021-03-04 00:35:52 -06:00
Johannes Doerfert
a8bee9c76c [Attributor] Use sensible initialization in AANoCaptureCallSiteReturned 2021-03-04 00:35:51 -06:00
Evgeniy Brevnov
f31243de74 [DSE] Add support for not aligned begin/end
This is an attempt to improve handling of partial overlaps in case of unaligned begin\end.

Existing implementation just bails out if it encounters such cases. Even when it doesn't I believe existing code checking alignment constraints is not quite correct. It tries to ensure alignment of the "later" start/end offset while should be preserving relative alignment between earlier and later start/end.

The idea behind the change is simple. When start/end is not aligned as we wish instead of bailing out let's adjust it as necessary to get desired alignment.

I'll update with performance results as measured by the test-suite...it's still running...

Reviewed By: fhahn

Differential Revision: https://reviews.llvm.org/D93530
2021-03-04 12:24:23 +07:00
Serguei Katkov
36bf8f45fb [InstCombine] Move statepoint intrinsic handling from visitCall to visitCallBase
statepoint intrinsic can be used in invoke context,
so it should be handled in visitCallBase to cover both call and invoke.

Reviewers: reames, dantrushin
Reviewed By: reames
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D97833
2021-03-04 11:00:22 +07:00
Wang, Pengfei
2e2e287013 Add Windows ehcont section support (/guard:ehcont).
Add option /guard:ehcont

Reviewed By: rnk

Differential Revision: https://reviews.llvm.org/D96709
2021-03-04 11:47:29 +08:00
Juneyoung Lee
09d28f9368 [LangRef] fix undefined label 2021-03-04 10:12:57 +09:00
Juneyoung Lee
f861c629c1 [LangRef] Make lifetime intrinsic's semantics consistent with StackColoring's comment
This patch is an update to LangRef by describing lifetime intrinsics' behavior
by following the description of MIR's LIFETIME_START/LIFETIME_END markers
at StackColoring.cpp (eb44682d67/llvm/lib/CodeGen/StackColoring.cpp (L163)) and the discussion in llvm-dev.

In order to explicitly define the meaning of an object lifetime, I added 'Object Lifetime' subsection.

Reviewed By: nlopes

Differential Revision: https://reviews.llvm.org/D94002
2021-03-04 09:58:06 +09:00
Fangrui Song
25d19a9795 [IRSymTab] Set FB_used on llvm.compiler.used symbols
IR symbol table does not parse inline asm. A symbol only referenced by inline
asm is not in the IR symbol table, so LTO does not know that the definition (in
another translation unit) is referenced and may internalize it, even if that
definition has `__attribute__((used))` (which lowers to `llvm.compiler.used` on
ELF targets since D97446).

```
// cabac.c
__attribute__((used)) const uint8_t ff_h264_cabac_tables[...] = {...};

// h264_cabac.c
  asm("lea ff_h264_cabac_tables(%rip), %0" : ...);
```

`__attribute__((used))` is the recommended way to tell the compiler there may
be inline asm references, so the usage is perfectly fine. This patch
conservatively sets the `FB_used` bit on `llvm.compiler.used` symbols to work
around the IR symbol table limitation. Note: before D97446, Clang never emitted
symbols in the `llvm.compiler.used` list, so this change does not punish any
Clang emitted global object.

Without the patch, `ff_h264_cabac_tables` may be assigned to a non-external
partition and get internalized. Then we will get a linker error because the
`cabac.c` definition is not exposed.

Differential Revision: https://reviews.llvm.org/D97755
2021-03-03 16:22:30 -08:00
Xun Li
a7cf9dd738 [LICM][Coroutine] Don't sink stores from loops with coro.suspend instructions
See pr46990(https://bugs.llvm.org/show_bug.cgi?id=46990). LICM should not sink store instructions to loop exit blocks which cross coro.suspend intrinsics. This breaks semantic of coro.suspend intrinsic which return to caller directly. Also this leads to use-after-free if the coroutine is freed before control returns to the caller in multithread environment.

This patch disable promotion by check whether loop contains coro.suspend intrinsics.
This is a resubmit of D86190.
Disabling LICM for loops with coroutine suspension is a better option not only for correctness purpose but also for performance purpose.
In most cases LICM sinks memory operations. In the case of coroutine, sinking memory operation out of the loop does not improve performance since coroutien needs to get data from the frame anyway. In fact LICM would hurt coroutine performance since it adds more entries to the frame.

Differential Revision: https://reviews.llvm.org/D96928
2021-03-03 15:21:57 -08:00
Fangrui Song
e0a172f86d [test] Fix profiling.ll
`__llvm_prf_nm` is compressed if zlib is available. In addition, its size may not be that stable.
2021-03-03 15:18:44 -08:00
Jin Lin
b34db73957 Add the use of register r for outlined function when register r is live in and defined later.
The compiler needs to mark register $x0 as live in for the following case.

    $x1 = ADDXri $sp, 16, 0
    BL @spam, csr_darwin_aarch64_aapcs, implicit-def dead $lr, implicit $sp, implicit $x0, implicit killed $x1, implicit-def $sp, implicit-def dead $x0

Reviewed By: paquette

Differential Revision: https://reviews.llvm.org/D95267
2021-03-03 15:14:11 -08:00
George Balatsouras
a99b153a84 [dfsan] Remove hard-coded shadow width in more tests
As a preparation step for fast8 support, we need to update the tests
to pass in both modes. That requires generalizing the shadow width
and remove any hard coded references that assume it's always 2 bytes.

Reviewed By: stephan.yichao.zhao

Differential Revision: https://reviews.llvm.org/D97884
2021-03-03 15:05:16 -08:00
Sanjay Patel
f2517ef43e [Analysis] simplify propagation of FMF in recurrences; NFC
This is a mess, but this is hopefully no-functional-change.
The 'Prev' descriptor is only used for min/max recurrences
or when starting a match from a phi, so it should not be a
factor when propagating FMF for fmul/fadd.

The API is confusing (and should be reduced in subsequent steps)
because the "UnsafeAlgebraInst" appears to actually be a placeholder
for a recurrence that does NOT have FMF, but we still want to
treat it as reassociative.
2021-03-03 17:28:10 -05:00
Stefan Gränitz
ee07f5b459 [lli] Add JITLink link component after 99a6d003edbe 2021-03-03 23:14:26 +01:00
Florian Hahn
4b2c4a9325 [AArch64] Add implicit uses for operands when expanding BLR_RVMARKER.
Make sure we preserve info about passed arguments as implicit uses, to
make sure later passes still have access to this information.

This fixes a mis-compile where the machine-combiner would pick an
incorrect free register.
2021-03-03 21:56:05 +00:00
Stefan Gränitz
4479241a3b Revert "hack to unbreak check-llvm on win after D97335" in attempt for actual fix
This reverts commit 900f076113302e26e1939541b546b0075e3e9721 and attempts an actual fix: All failing tests for llvm-jitlink use the `-noexec` flag. The inputs they operate on are not meant for execution on the host system. Looking e.g. at the MachO_test_harness_harnesss.s test, llvm-mc generates input machine code with "x86_64-apple-macosx10.9".

My previous attempt in bbdb4c8c9bcef0e8db751630accc04ad874f54e7 disabled the debug support plugin for Windows targets, but what we would actually want is to disable it on Windows HOSTS.

With the new patch here, I don't do exactly that, but instead follow the approach for the EH frame plugin and include the `-noexec` flag in the condition. It should have the desired effect when it comes to the test suite. It appears a little workaround'ish, but should work reliably for now. I will discuss the issue with Lang and see if we can do better. Thanks @thakis again for the temporary fix.
2021-03-03 22:35:36 +01:00
Whitney Tsang
9ec6c6eade [LoopUnrollRuntime] Add option to assume the non latch exit block to be
predictable.

Reviewed By: Meinersbur, bmahjour

Differential Revision: https://reviews.llvm.org/D97747
2021-03-03 20:43:31 +00:00
Alexey Bataev
73ba017fcc [Cost]Add tests for boolean and/or reductions, NFC.
Tests with the default costs for boolean and/or reductions.

Differential Revision: https://reviews.llvm.org/D97793
2021-03-03 12:34:30 -08:00
Philip Reames
db90a04542 Address review comment from D97219 (follow up to 8051156)
Probably should have done this before landing, but I forgot.

Basic idea is to avoid using the SCEV predicate when it doesn't buy us anything.  Also happens to set us up for handling non-add recurrences in the future if desired.
2021-03-03 12:20:27 -08:00
Philip Reames
5c3ba57c9e Sink routine for replacing a operand bundle to CallBase [NFC]
We had equivalent code for both CallInst and InvokeInst, but never cared about the result type.
2021-03-03 12:07:55 -08:00
Philip Reames
d2f05a31bd [LSR] Unify scheduling of existing and inserted addrecs
LSR goes to some lengths to schedule IV increments such that %iv and %iv.next never need to overlap. This is fairly fundamental to LSRs cost model. LSR assumes that an addrec can be represented with a single register. If %iv and %iv.next have to overlap, then that assumption does not hold.

The bug - which this patch is fixing - is that LSR only does this scheduling for IVs which it inserts, but it's cost model assumes the same for existing IVs that it reuses. It will rewrite existing IV users such that the no-overlap property holds, but will not actually reschedule said IV increment.

As you can see from the relatively lack of test updates, this doesn't actually impact codegen much. The main reason for doing it is to make a follow up patch series which improves post-increment use and scheduling easier to follow.

Differential Revision: https://reviews.llvm.org/D97219
2021-03-03 12:07:55 -08:00
Jonas Paulsson
d57759f053 [SystemZ] Reimplement the i8/i16 compare-and-swap logic.
Even though the implementation in emitAtomicCmpSwapW() was correct, it made
Valgrind report an error. Instead of using a RISBG on CmpVal, an LL[CH]R can
be made on the OldVal, and the problem is avoided.

Review: Ulrich Weigand

Differential Revision: https://reviews.llvm.org/D97604
2021-03-03 14:04:32 -06:00
Florian Hahn
620203451b [AArch64] Move CALL_RVMARKER definition after CALL.
This is a NFC with respect to the generated code. But it fixes a crash
when using -debug, because of the position in the enum CALL_RVMARKER
nodes were treated as memops. That caused a crash when printing
CALL_RVMARKER nodes.
2021-03-03 19:42:16 +00:00
Fangrui Song
6ef5900ddb [InstrProfiling] Place __llvm_prf_vnodes and __llvm_prf_names in llvm.used on ELF
`__llvm_prf_vnodes` and `__llvm_prf_names` are used by runtime but not
referenced via relocation in the translation unit.

With `-z start-stop-gc` (LLD 13 (D96914); GNU ld 2.37 https://sourceware.org/bugzilla/show_bug.cgi?id=27451),
the linker does not let `__start_/__stop_` references retain their sections.

Place `__llvm_prf_vnodes` and `__llvm_prf_names` in `llvm.used` to make
them retained by the linker.

This patch changes most existing `UsedVars` cases to `CompilerUsedVars`
to reflect the ideal state - if the binary format properly supports
section based GC (dead stripping), `llvm.compiler.used` should be sufficient.

`__llvm_prf_vnodes` and `__llvm_prf_names` are switched to `UsedVars`
since we want them to be unconditionally retained by both compiler and linker.

Behaviors on COFF/Mach-O are not affected.

Reviewed By: davidxl

Differential Revision: https://reviews.llvm.org/D97649
2021-03-03 11:32:24 -08:00
Fangrui Song
d5a2734874 [test] Improve PGO tests 2021-03-03 11:32:24 -08:00
George Balatsouras
000167f2ef [dfsan] Remove hardcoded shadow width in abilist_aggregate.ll
As a preparation step for fast8 support, we need to update the tests
to pass in both modes. That requires generalizing the shadow width
and remove any hard coded references that assume it's always 2 bytes.

Reviewed By: stephan.yichao.zhao

Differential Revision: https://reviews.llvm.org/D97723
2021-03-03 11:12:59 -08:00
Stanislav Mekhanoshin
2b0a0e51ae [AMDGPU] Exclude always_inline from max bb threshold
Honor always_inline attribute when processing -amdgpu-inline-max-bb.
It was lost during the ports of the heuristic. There is no reason
to honor inline hint, but not always inline.

Differential Revision: https://reviews.llvm.org/D97790
2021-03-03 10:21:56 -08:00
Hongtao Yu
6997a9ef05 [CSSPGO][llvm-profgen] Continue disassembling after illegal instruction is seen.
Previously we errored out when disassembling illegal instructions and there would be no profile generated. In fact illegal instructions are not uncommon and we'd better skip them and print "unknown" instead of erroring out. This matches the behavior of llvm-objdump (see disassembleObject in llvm-objdump.cpp).

Reviewed By: wlei, wenlei

Differential Revision: https://reviews.llvm.org/D97776
2021-03-03 10:14:10 -08:00
Choongwoo Han
c78f1b079e [llvm-cov] Cache file status information
Currently, getSourceFile accesses file system to check if two paths are
the same file with a thread lock, which is a huge performance bottleneck
in some cases. Currently, it's accessing file system size(files) * size(files) times.

Thus, cache file status information, which reduces file system access to size(files) times.

When I tested it with two binaries and 16 cpu cores,
it saved over 70% of time.

Binary 1: 56 secs -> 3 secs
Binary 2: 17 hours -> 4 hours

Differential Revision: https://reviews.llvm.org/D97061
2021-03-03 10:04:07 -08:00
Philip Reames
238c95fc20 Fix a build warning from ea7d208 2021-03-03 09:16:56 -08:00
Philip Reames
05b6860c6e [basicaa] Rewrite isGEPBaseAtNegativeOffset in terms of index difference [mostly NFC]
This is almost purely NFC, it just fits more obviously in the flow of the code now that we've standardized on the index different approach.  The non-NFC bit is that because of canceling the VariableOffsets in the subtract, we can now handle the case where both sides involve a common variable offset.  This isn't an "interesting" improvement; it just happens to fall out of the natural code structure.

One subtle point - the placement of this above the BaseAlias check is important in the original code as this can return NoAlias even when we can't find a relation between the bases otherwise.

Also added some enhancement TODOs noticed while understanding the existing code.

Note: This is slightly different than the LGTMed version.  I fixed the "inbounds" issue Nikita noticed with the original code in e6e5ef4 and rebased this to include the same fix.

Differential Revision: https://reviews.llvm.org/D97520
2021-03-03 09:03:28 -08:00
Baptiste Saleil
8c45c9b8dd [VirtRegRewriter] Insert missing killed flags when tracking subregister liveness
VirtRegRewriter may sometimes fail to correctly apply the kill flag where necessary,
which causes unecessary code gen on PowerPC. This patch fixes the way masks for
defined lanes are computed and the way mask for used lanes is computed.

Contact albion.fung@ibm.com instead of author for problems related to this commit.

Differential Revision: https://reviews.llvm.org/D92405
2021-03-03 12:02:04 -05:00
Philip Reames
a40e17a3fc [basicaa] Fix a latent bug in isGEPBaseAtNegativeOffset
This was pointed out in review of D97520 by Nikita, but existed in the original code as well.

The basic issue is that a decomposed GEP expression describes (potentially) more than one getelementptr.  The "inbounds" derived UB which justifies this aliasing rule requires that the entire offset be composed of "inbounds" geps.  Otherwise, as can be seen in the recently added and changes in this patch test, we can end up with a large commulative offset with only a small sub-offset actually being "inbounds".  If that small sub-offset lies within the object, the result was unsound.

We could potentially be fancier here, but for the moment, simply be conservative when any of the GEPs parsed aren't inbounds.
2021-03-03 08:43:32 -08:00
Philip Reames
ab2305f7e6 [basicaa] Minor indentation fix 2021-03-03 08:33:19 -08:00
Philip Reames
a36885fad4 [tests] Add tests for cases brought up during review of D97520 2021-03-03 08:30:54 -08:00
Arnold Schwaighofer
df3b1a1359 [coro async] Allow a coro.suspend.async to specify which argument is the context argument
Before we used the same argument as the entry point. The resume partial
function might want to use a different ABI for its context argument

Differential Revision: https://reviews.llvm.org/D97333
2021-03-03 08:27:37 -08:00
Simon Pilgrim
45ef6e16a0 [X86] Fold scalar_to_vector(x) -> extract_subvector(broadcast(x),0) iff broadcast(x) exists
Add handling for reusing an existing broadcast(x) to a wider vector.
2021-03-03 15:50:37 +00:00
Nico Weber
2d6479022a Revert "[InstrProfiling] Place __llvm_prf_vnodes and __llvm_prf_names in llvm.used on ELF"
This reverts commit 04c3040f417683e7c31b3ee3381a3263106f48c5.
Breaks instrprof-value-merge.c in bootstrap builds.
2021-03-03 10:21:17 -05:00
Hans Wennborg
c2ea8c4219 Revert "[ObjC][ARC] Use operand bundle 'clang.arc.attachedcall' instead of explicitly emitting retainRV or claimRV calls in the IR"
This caused miscompiles of Chromium tests for iOS due clobbering of live
registers. See discussion on the code review for details.

> Background:
>
> This fixes a longstanding problem where llvm breaks ARC's autorelease
> optimization (see the link below) by separating calls from the marker
> instructions or retainRV/claimRV calls. The backend changes are in
> https://reviews.llvm.org/D92569.
>
> https://clang.llvm.org/docs/AutomaticReferenceCounting.html#arc-runtime-objc-autoreleasereturnvalue
>
> What this patch does to fix the problem:
>
> - The front-end adds operand bundle "clang.arc.attachedcall" to calls,
>   which indicates the call is implicitly followed by a marker
>   instruction and an implicit retainRV/claimRV call that consumes the
>   call result. In addition, it emits a call to
>   @llvm.objc.clang.arc.noop.use, which consumes the call result, to
>   prevent the middle-end passes from changing the return type of the
>   called function. This is currently done only when the target is arm64
>   and the optimization level is higher than -O0.
>
> - ARC optimizer temporarily emits retainRV/claimRV calls after the calls
>   with the operand bundle in the IR and removes the inserted calls after
>   processing the function.
>
> - ARC contract pass emits retainRV/claimRV calls after the call with the
>   operand bundle. It doesn't remove the operand bundle on the call since
>   the backend needs it to emit the marker instruction. The retainRV and
>   claimRV calls are emitted late in the pipeline to prevent optimization
>   passes from transforming the IR in a way that makes it harder for the
>   ARC middle-end passes to figure out the def-use relationship between
>   the call and the retainRV/claimRV calls (which is the cause of
>   PR31925).
>
> - The function inliner removes an autoreleaseRV call in the callee if
>   nothing in the callee prevents it from being paired up with the
>   retainRV/claimRV call in the caller. It then inserts a release call if
>   claimRV is attached to the call since autoreleaseRV+claimRV is
>   equivalent to a release. If it cannot find an autoreleaseRV call, it
>   tries to transfer the operand bundle to a function call in the callee.
>   This is important since the ARC optimizer can remove the autoreleaseRV
>   returning the callee result, which makes it impossible to pair it up
>   with the retainRV/claimRV call in the caller. If that fails, it simply
>   emits a retain call in the IR if retainRV is attached to the call and
>   does nothing if claimRV is attached to it.
>
> - SCCP refrains from replacing the return value of a call with a
>   constant value if the call has the operand bundle. This ensures the
>   call always has at least one user (the call to
>   @llvm.objc.clang.arc.noop.use).
>
> - This patch also fixes a bug in replaceUsesOfNonProtoConstant where
>   multiple operand bundles of the same kind were being added to a call.
>
> Future work:
>
> - Use the operand bundle on x86-64.
>
> - Fix the auto upgrader to convert call+retainRV/claimRV pairs into
>   calls with the operand bundles.
>
> rdar://71443534
>
> Differential Revision: https://reviews.llvm.org/D92808

This reverts commit ed4718eccb12bd42214ca4fb17d196d49561c0c7.
2021-03-03 15:51:40 +01:00
Hans Wennborg
145fdab545 revert llvm/include/llvm/Analysis/ObjCARCUtil.h part of 1cc558bd4fa1acd1462226ef4796c712f80ea8e8 2021-03-03 15:50:02 +01:00
Ayke van Laethem
8dc0516b44 [AVR] Fix def state of operands
Some instructions (especially mov+pop instructions) were setting the
wrong operands. For example, the pop instruction had the register set as
a source operand while it is a destination operand (the value is loaded
into the register).

I have found these issues using the machine verifier and using manual
code inspection.

Differential Revision: https://reviews.llvm.org/D97159
2021-03-03 15:36:05 +01:00
Ayke van Laethem
3a40dba5a3 [AVR] Fix expansion of NEGW
The previous expansion used SBCI, which is incorrect because the NEGW
pseudo instruction accepts a DREGS operand (2xGPR8) and SBCI only allows
LD8 registers. One solution could be to correct the NEGW pseudo
instruction, but another solution is to use a different instruction
(sbc) that does accept a GPR8 register and therefore allows more freedom
to the register allocator.

The output now matches avr-gcc for the following code:

    int foo(int n) {
        return -n;
    }

I've found this issue using the machine instruction verifier: it was
complaining about the wrong register class in NEGWRd.mir.

Differential Revision: https://reviews.llvm.org/D97131
2021-03-03 15:36:05 +01:00
Ayke van Laethem
67e3f1aa81 [AVR] Add register aliases XL, YH, etc
These aliases are sometimes used in assembly code and make the code more
readable. They are supported by avr-gcc too.

Differential Revision: https://reviews.llvm.org/D96492
2021-03-03 15:36:05 +01:00
Matt Arsenault
c5b90ec153 GlobalISel: Add default implementation of assignValueToReg
Refactor insertion of the asserting ops. This enables using them for
AMDGPU.

This code should essentially be the same for every target. Mips, X86
and ARM all have different code there now, but this seems to be an
accident. The assignment functions are called with different types
than they would be in the DAG, so this is all likely an assortment of
hacks to get around that.
2021-03-03 09:29:53 -05:00