Internally the pass skips any function with the optnone attribute. But that still requires checking each function. If the opt level is set to None we might as well just skip putting in the pipeline at all. This what is already done for many of the passes added by TargetPassConfig.
Differential Revision: https://reviews.llvm.org/D92511
When MemCpyOpt performs call slot optimization it will concatenate the `alias.scope` metadata between the function call and the memcpy. However, scoped AA relies on the domains in metadata to be maintained in a caller-callee relationship. Naive concatenation breaks this assumption leading to bad AA results.
The fix is to take the intersection of domains then union the scopes within those domains.
The original bug came from a case of rust bad codegen which uses this bad aliasing to perform additional memcpy optimizations. As show in the added test case `%src` got forwarded past its lifetime leading to a dereference of garbage data.
Testing
ninja check-llvm
Reviewed By: jeroen.dobbelaere
Differential Revision: https://reviews.llvm.org/D91576
This patch removes the variants of DecodeVPERMVMask and
DecodeVPERMV3Mask that take "const Constant *C" as they are not used
anymore.
They were introduced on Sep 8, 2015 in commit
e88038f23517ffc741acfd307ff92e2b1af136d8.
The last use of DecodeVPERMVMask(const Constant *C, ...) was removed
on Feb 7, 2016 in commit 73fc26b44a8591b15f13eaffef17e67161c69388.
The last use of DecodeVPERMV3Mask(const Constant *C, ...) was removed
on May 28, 2018 in commit dcfcfdb0d166fff8388bdd2edc5a2948054c9da1.
Differential Revision: https://reviews.llvm.org/D91926
This is needed for cross-compiling LLVM from Cygwin, but it had gotten
deleted in rG2724d9e12960cc1d93eeabbfc9aa1bffffa041cc
Reviewed By: compnerd
Differential Revision: https://reviews.llvm.org/D92336
Add unit tests for functions in LLVMFrontendOpenACC. As notice in D91470 these functions were not tested
as well as the ones for OpenMP (D91643). This patch add tests for the OpenACC part.
Reviewed By: Meinersbur
Differential Revision: https://reviews.llvm.org/D91653
This also teaches MachO writers/readers about the MachO cpu subtype,
beyond the minimal subtype reader support present at the moment.
This also defines a preprocessor macro to allow users to distinguish
__arm64__ from __arm64e__.
arm64e defaults to an "apple-a12" CPU, which supports v8.3a, allowing
pointer-authentication codegen.
It also currently defaults to ios14 and macos11.
Differential Revision: https://reviews.llvm.org/D87095
When using accumulators in loops, they are passed around in PHI nodes of unprimed
accumulators, causing the generation of additional prime/unprime instructions.
This patch detects these cases and changes these PHI nodes to primed accumulator
PHI nodes. We also add IR and MIR test cases for several PHI node cases.
Differential Revision: https://reviews.llvm.org/D91391
Implement fetch_<op>/fetch_and_<op>/exchange/compare-and-exchange
instructions for BPF. Specially, the following gcc intrinsics
are implemented.
__sync_fetch_and_add (32, 64)
__sync_fetch_and_sub (32, 64)
__sync_fetch_and_and (32, 64)
__sync_fetch_and_or (32, 64)
__sync_fetch_and_xor (32, 64)
__sync_lock_test_and_set (32, 64)
__sync_val_compare_and_swap (32, 64)
For __sync_fetch_and_sub, internally, it is implemented as
a negation followed by __sync_fetch_and_add.
For __sync_lock_test_and_set, despite its name, it actually
does an atomic exchange and return the old content.
https://gcc.gnu.org/onlinedocs/gcc-4.1.1/gcc/Atomic-Builtins.html
For intrinsics like __sync_{add,sub}_and_fetch and
__sync_bool_compare_and_swap, the compiler is able to generate
codes using __sync_fetch_and_{add,sub} and __sync_val_compare_and_swap.
Similar to xadd, atomic xadd, xor and xxor (atomic_<op>)
instructions are added for atomic operations which do not
have return values. LLVM will check the return value for
__sync_fetch_and_{add,and,or,xor}.
If the return value is used, instructions atomic_fetch_<op>
will be used. Otherwise, atomic_<op> instructions will be used.
All new instructions only support 64bit and 32bit with alu32 mode.
old xadd instruction still supports 32bit without alu32 mode.
For encoding, please take a look at test atomics_2.ll.
Differential Revision: https://reviews.llvm.org/D72184
1. Removed #include "...AliasAnalysis.h" in other headers and modules.
2. Cleaned up includes in AliasAnalysis.h.
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D92489
New tes cases added. Change var names to avoid the following warning from update_test_checks.py:
WARNING: Change IR value name 'tmp5' to prevent possible conflict with scripted FileCheck name.
Reviewed By: ebrevnov
Differential Revision: https://reviews.llvm.org/D92566
Generate checks with update_test_checks.py in order to simplify upcoming updates.
Reviewed By: mkazantsev
Differential Revision: https://reviews.llvm.org/D92561
This reverts commit 0c9c6ddf17bb01ae350a899b3395bb078aa0c62e.
We are seeing some failures with this patch locally. Not clear
if it's causing them or just triggering a problem in another
place. Reverting while investigating.
This adds a test for the bug
https://bugs.llvm.org/show_bug.cgi?id=47591
Previously, selection dag has a bug which may incorrectly
assume no alias when crossing a lifetime boundary and this
may generate incorrect code as demonstrated in the above bug.
It looks the bug is fixed by https://reviews.llvm.org/D91833.
Basically, when comparing two potential memory access dag nodes,
a store and a lifetime.start,
with the same frame index.
Previously, it may be decided no alias. With the above fix,
these two will be considered aliasing which will prevent
incorrect code scheduling.
Differential Revision: https://reviews.llvm.org/D92451
This is a child diff of D92261.
After supporting field/index-level shadow, the existing shadow with type
i16 works for only primitive types.
Reviewed-by: morehouse
Differential Revision: https://reviews.llvm.org/D92459
PowerPC ISA support the input test for vector type v4f32 and v2f64.
Replace the software compare with hw test will improve the perf.
Reviewed By: ChenZheng
Differential Revision: https://reviews.llvm.org/D90914
llvm-link should not rely on the '.a' file extension when deciding if input file
should be loaded as archive. Archives may have other extensions (f.e. .lib) or no
extensions at all. This patch changes llvm-link to use llvm::file_magic to check
if input file is an archive.
Reviewed By: RaviNarayanaswamy
Differential Revision: https://reviews.llvm.org/D92376
While I was adding a new intrinsic instruction (not overloaded), I accidentally used CreateUnaryIntrinsic to create the intrinsics, which turns out to be passing the type list to getName, and ended up naming the intrinsics function with type suffix, which leads to wierd bugs latter on. It took me a long time to debug.
It seems a good idea to add an assertion in getName so that it fails if types are passed but it's not a overloaded function.
Also, the overloade version of getName is less efficient because it creates an std::string. We should avoid calling it if we know that there are no types provided.
Differential Revision: https://reviews.llvm.org/D92523
llvm-link should not rely on the '.a' file extension when deciding if input file
should be loaded as archive. Archives may have other extensions (f.e. .lib) or no
extensions at all. This patch changes llvm-link to use llvm::file_magic to check
if input file is an archive.
Reviewed By: RaviNarayanaswamy
Differential Revision: https://reviews.llvm.org/D92376
Instead of computing the alignment and size of the `char` buffer in
`AlignedCharArrayUnion`, rely on the math in `std::aligned_union_t`.
Because some users of this rely on the `buffer` field existing with a
type convertible to `char *`, we can't change the field type, but we can
still avoid duplicating the logic.
A potential follow up would be to delete `AlignedCharArrayUnion` after
updating its users to use `std::aligned_union_t` directly; or if we like
our template parameters better, could update users to stop peeking
inside and then replace the definition with:
```
template <class T, class... Ts>
using AlignedCharArrayUnion = std::aligned_union_t<1, T, Ts...>;
```
Differential Revision: https://reviews.llvm.org/D92500
`AlignedArrayCharUnion` is now using `alignas`, which is properly
supported now by all the host toolchains we support. As a result, the
extra `alignas` on `IntervalMap` isn't needed anymore.
This is effectively a revert of 379daa29744cd96b0a87ed0d4a010fa4bc47ce73.
Differential Revision: https://reviews.llvm.org/D92509
Revert "Delete llvm::is_trivially_copyable and CMake variable HAVE_STD_IS_TRIVIALLY_COPYABLE"
This reverts commit 4d4bd40b578d77b8c5bc349ded405fb58c333c78.
This reverts commit 557b00e0afb2dc1776f50948094ca8cc62d97be4.
LLVM has TLS_(base_)addr32 for 32-bit TLS addresses in 32-bit mode, and
TLS_(base_)addr64 for 64-bit TLS addresses in 64-bit mode. x32 mode wants 32-bit
TLS addresses in 64-bit mode, which were not yet handled. This adds
TLS_(base_)addrX32 as copies of TLS_(base_)addr64, except that they use
tls32(base)addr rather than tls64(base)addr, and then restricts
TLS_(base_)addr64 to 64-bit LP64 mode, TLS_(base_)addrX32 to 64-bit ILP32 mode.
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D92346
Since x32 supports PC-relative address, it shouldn't use EBX for TLS
address. Instead of checking N.getValueType(), we should check
Subtarget->is32Bit(). This fixes PR 22676.
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D16474
An indirect call site needs to be probed for its potential call targets. With CSSPGO a direct call also needs a probe so that a calling context can be represented by a stack of callsite probes. Unlike pseudo probes for basic blocks that are in form of standalone intrinsic call instructions, pseudo probes for callsites have to be attached to the call instruction, thus a separate instruction would not work.
One possible way of attaching a probe to a call instruction is to use a special metadata that carries information about the probe. The special metadata will have to make its way through the optimization pipeline down to object emission. This requires additional efforts to maintain the metadata in various places. Given that the `!dbg` metadata is a first-class metadata and has all essential support in place , leveraging the `!dbg` metadata as a channel to encode pseudo probe information is probably the easiest solution.
With the requirement of not inflating `!dbg` metadata that is allocated for almost every instruction, we found that the 32-bit DWARF discriminator field which mainly serves AutoFDO can be reused for pseudo probes. DWARF discriminators distinguish identical source locations between instructions and with pseudo probes such support is not required. In this change we are using the discriminator field to encode the ID and type of a callsite probe and the encoded value will be unpacked and consumed right before object emission. When a callsite is inlined, the callsite discriminator field will go with the inlined instructions. The `!dbg` metadata of an inlined instruction is in form of a scope stack. The top of the stack is the instruction's original `!dbg` metadata and the bottom of the stack is for the original callsite of the top-level inliner. Except for the top of the stack, all other elements of the stack actually refer to the nested inlined callsites whose discriminator field (which actually represents a calliste probe) can be used together to represent the inline context of an inlined PseudoProbeInst or CallInst.
To avoid collision with the baseline AutoFDO in various places that handles dwarf discriminators where a check against the `-pseudo-probe-for-profiling` switch is not available, a special encoding scheme is used to tell apart a pseudo probe discriminator from a regular discriminator. For the regular discriminator, if all lowest 3 bits are non-zero, it means the discriminator is basically empty and all higher 29 bits can be reversed for pseudo probe use.
Callsite pseudo probes are inserted in `SampleProfileProbePass` and a target-independent MIR pass `PseudoProbeInserter` is added to unpack the probe ID/type from `!dbg`.
Note that with this work the switch -debug-info-for-profiling will not work with -pseudo-probe-for-profiling anymore. They cannot be used at the same time.
Reviewed By: wmi
Differential Revision: https://reviews.llvm.org/D91756
At D92261, this type will be used to cache both combined shadow and
converted shadow values.
Reviewed-by: morehouse
Differential Revision: https://reviews.llvm.org/D92458