This test case requires unrolling against a non-latch exit in
a multiple-exit loop with exiting latch. It's not covered by
exiting heuristics or the extension in D102635.
If exiting using _Exit or ExitProcess, DLLs are still unloaded
cleanly before exiting, running destructors and other cleanup in those
DLLs. When the caller expects to exit without cleanup, running
destructors in some loaded DLLs (which can be either libLLVM.dll or
e.g. libc++.dll) can cause deadlocks occasionally.
This is an alternative to D102684.
Differential Revision: https://reviews.llvm.org/D102944
These functions were marked unavailable for MSVC targets before,
within an "T.isOSWindows() && !T.isOSCygMing()" block, but these ones
are unavailable on MinGW targets too.
This avoids generating calls to stpcpy for MinGW targets, which has
been happening since 6dbf0cfcf789365493f70ae69df8a7a59be41c75 (in
some cases).
This fixes https://github.com/mstorsjo/llvm-mingw/issues/201.
Differential Revision: https://reviews.llvm.org/D102946
Based on worst case of sandybridge (vs btver2 + bdver2) llvm-mca analysis - which is a lot less than what we were predicting (I think based off total uop count).
When removing an AttrBuilder from an AttributeSet, first check
whether there is any overlap. If nothing is being removed, we can
directly return the original set.
Don't run tasks until their corresponding thread has been added to the running
threads vector. This is an extention to fda4300da82, which doesn't seem to have
been enough to fix the synchronization issues on its own.
Keeping these bitfields from Block to Addressable allows them to be packed with
the bitfields at the end of Addressable, reducing the size of Block by eight
bytes.
Add options -[no-]offload-lto and -foffload-lto=[thin,full] for controlling
LTO for offload compilation. Allow LTO for AMDGPU target.
AMDGPU target does not support codegen of object files containing
call of external functions, therefore the LLVM module passed to
AMDGPU backend needs to contain definitions of all the callees.
An LLVM option is added to allow function importer to import
functions with noinline attribute.
HIP toolchain passes proper LLVM options to lld to make sure
function importer imports definitions of all the callees.
Reviewed by: Teresa Johnson, Artem Belevich
Differential Revision: https://reviews.llvm.org/D99683
This was reverted due to performance regressions in ARM benchmarks,
which have since been addressed by D101196 (SCEV analysis improvement)
and D101778 (CGP reverse transform).
-----
The single-use case is handled implicity by converting the icmp
into a mask check first. When comparing with zero in particular,
we don't need the one-use restriction, as we only produce a single
icmp.
https://alive2.llvm.org/ce/z/MSixcmhttps://alive2.llvm.org/ce/z/GwpG0M
If there are no matrix intrinsics in a function, we can directly bail
out, as there's nothing left to do.
Reviewed By: anemet
Differential Revision: https://reviews.llvm.org/D102931
BTVER2 has a 2 cycle throughput for v4i32 multiplies (same as SSE41 targets), which is only partially hidden by the subvector extracts/insert when splitting v8i32.
Now that getMemoryOpCost() correctly handles all the vector variants,
we should no longer hand-roll our own version of it, but use it directly.
The AVX512 variant probably needs a similar change,
but there it is less obvious.
This was initially landed in 69ed93a4355123a45c1d7216aea7cd53d07a361b,
but was reverted in 6b95fd199d96e3ba5c28a23b17b74203522bdaa8
because the patch it depends on was reverted.
Instead of handling power-of-two sized vector chunks,
try handling the large vector in a stream mode,
decreasing the operational vector size
once it no longer works for the elements left to process.
Notably, this improves costs for overaligned loads - loading padding is fine.
This more directly tracks when we need to insert/extract the YMM/XMM subvector,
some costs fluctuate because of that.
This was initially landed in c02476f3158f2908ef0a6f628210b5380bd33695,
but reverted in 5fddc3312bad7e62493f1605385fad5e589e6450,
because the code made some very optimistic assumptions about invariants
that didn't hold in practice.
Reviewed By: RKSimon, ABataev
Differential Revision: https://reviews.llvm.org/D100684
D29668 enabled to avoid a useless copy of the argument value into an alloca if the caller places it in memory (as it often happens on x86) by directly forwarding the pointer to it. This optimization is illegal if the type contains padding bytes: if a truncating store into the alloca is replaced the upper bits are filled with garbage and produce code misbehaving at runtime.
Reviewed By: rnk
Differential Revision: https://reviews.llvm.org/D102153
Replace use of host floating types with operations on APFloat when it is
possible. Use of APFloat makes analysis more convenient and facilitates
constant folding in the case of non-default FP environment.
Differential Revision: https://reviews.llvm.org/D102672
True is a bad default: the useful symbol names and `@GOTPCREL` are scrubbed.
Change the default and add global variable tests to x86-basic.ll
(renamed from x86_function_name.ll since we now also test variables).
I updated some tests to show the differences.
Updated LCPI regex to include Darwin style `LCPI_[0-9]+_[0-9]+` (no
leading dot).
Reviewed By: pengfei
Differential Revision: https://reviews.llvm.org/D102588
The implementation and intent behind freeing the triple string here is the same
as LLVMGetDefaultTargetTriple (and any other owned c string returned from the C
API), so we should use LLVMDisposeMessage for to free the string for
consistency.
Patch by Mats Larsen -- thanks Mats!
Reviewed By: lhames
Differential Revision: https://reviews.llvm.org/D102957
D88631 added initial support for:
- -mstack-protector-guard=
- -mstack-protector-guard-reg=
- -mstack-protector-guard-offset=
flags, and D100919 extended these to AArch64. Unfortunately, these flags
aren't retained for LTO. Make them module attributes rather than
TargetOptions.
Link: https://github.com/ClangBuiltLinux/linux/issues/1378
Reviewed By: tejohnson
Differential Revision: https://reviews.llvm.org/D102742
This adds the {s,u,m}badaddr CSR aliases as well as the sptbr alias.
These are for compatibility with binutils. Furthermore, these are used
by the RISC-V Proxy Kernel and are required to enable building the Proxy
Kernel with the LLVM IAS.
The aliases here are deprecated. These are being introduced in order to
provide a compatibility story for the existing GNU toolchain, which
still supports the deprecated spelling in the assembler. However, in
order to encourage the migration of existing coding, we provide warnings
indicating that the aliased CSRs are deprecated and should be replaced.
Differential Revision: https://reviews.llvm.org/D101919
Reviewed By: Craig Topper
These checks already exist as asserts when creating the corresponding
instruction. Anybody creating these instructions already need to take
care to not break these checks.
Move the checks for success/failure ordering in cmpxchg from the
verifier to the LLParser and BitcodeReader plus an assert.
Add some tests for cmpxchg ordering. The .bc files are created from the
.ll files with an llvm-as with these checks disabled.
Reviewed By: dblaikie
Differential Revision: https://reviews.llvm.org/D102803
The option was used during the initial bringup, but it does not add any
value at this point. Remove it.
Reviewed By: anemet
Differential Revision: https://reviews.llvm.org/D102930
Now that gtest has been updated to 1.10 which supports GTEST_SKIP, we can use
that over return;
Patch by Mats Larsen. Thanks Mats!
Reviewed By: lhames, ikudrin
Differential Revision: https://reviews.llvm.org/D102710
BTVER2 has a weaker f64 multiplier that other AVX1-era targets, so we need to bump the worst case cost slightly - llvm-mca reports the new vectorization in simplebb is beneficial on btver2, bdver2 and sandybridge AVX1 targets
Add link to documentation for "AMD Instinct MI100 Instruction Set
Architecture" to AMDGPUUsage.rst.
Reviewed By: kzhuravl, rampitec, dp
Differential Revision: https://reviews.llvm.org/D102859