`TargetFrameLowering::emitCalleeSavedFrameMoves` with 4 arguments is not
used anywhere in CodeGen. Thus it shouldn't be exposed as a virtual
function. NFC.
Differential Revision: https://reviews.llvm.org/D103328
This patch updates llvm-dwp to include rnglists and loclists
when parsing debug sections.
Reviewed By: dblaikie
Differential Revision: https://reviews.llvm.org/D101894
This patch adds support for DWARFv5 type units: parsing from
the .debug_info section, and writing index to the type unit index.
Previously, the type units were part of the .debug_types section
which is no longer used in DWARFv5.
Reviewed By: dblaikie
Differential Revision: https://reviews.llvm.org/D101818
This patch adds general support for DWARFv5 index writing.
In particular, this means only allowing inputs with one version,
either DWARFv5 or DWARFv4.
This patch adds the .debug_macro section as an example,
but the DWARFv5 type support and loc and rangelists are still
missing (and upcoming).
Reviewed By: dblaikie
Differential Revision: https://reviews.llvm.org/D102315
This extends 434c8e013a2c and ede3982792df to handle signed
predicates by sign-extending the setcc operands.
This is not shown directly in https://llvm.org/PR50055 ,
but the pattern is visible by changing the unsigned convert
to signed in the source code.
This patch makes llvm-dwp skip debug info sections that may not be encoding a compile unit.
In DWARF5, debug info sections are also used for type units. As in preparation to support type units,
make llvm-dwp aware of other uses of debug info sections but skip them for now.
The patch first records all .debug_info sections, then goes through them one by one and records
the cu debug info section for writing the index unit, and copies that section to the final dwp output
info section. If it's not a compile unit, skip.
Reviewed By: dblaikie
Differential Revision: https://reviews.llvm.org/D102312
Without this change, a callsite like:
[[clang::musttail]] return func_call(x);
will cause an error like:
fatal error: error in backend: failed to perform tail call elimination
on a call site marked musttail
due to DFSan inserting instrumentation between the musttail call and
the return.
Reviewed By: stephan.yichao.zhao
Differential Revision: https://reviews.llvm.org/D103542
The buildbot ppc64le-lld-multistage-test has been failing because the variable
Tag in Waymaking.h is set but not used. This patch removes that varaible.
This patch was split from https://reviews.llvm.org/D102246
[SampleFDO] New hierarchical discriminator for Flow Sensitive SampleFDO
This is mainly for ProfileData part of change. It will load
FS Profile when such profile is detected. For an extbinary format profile,
create_llvm_prof tool will add a flag to profile summary section.
For other format profiles, the users need to use an internal option
(-profile-isfs) to tell the compiler that the profile uses FS discriminators.
This patch also simplified the bit API used by FS discriminators.
Differential Revision: https://reviews.llvm.org/D103041
This is a follow-up to D103280 that eases the use restrictions,
so we can handle the motivating case from:
https://llvm.org/PR50055
The loop code is adapted from similar use checks in
ExtendUsesToFormExtLoad() and SliceUpLoad(). I did not see an
easier way to filter out non-chain uses of load values.
Differential Revision: https://reviews.llvm.org/D103462
Also adds support for live_support sections, no_dead_strip sections,
.no_dead_strip symbols.
Chromium Framework 345MB unstripped -> 250MB stripped
(vs 290MB unstripped -> 236M stripped with ld64).
Doing dead stripping is a bit faster than not, because so much less
data needs to be processed:
% ministat lld_*
x lld_nostrip.txt
+ lld_strip.txt
N Min Max Median Avg Stddev
x 10 3.929414 4.07692 4.0269079 4.0089678 0.044214794
+ 10 3.8129408 3.9025559 3.8670411 3.8642573 0.024779651
Difference at 95.0% confidence
-0.144711 +/- 0.0336749
-3.60967% +/- 0.839989%
(Student's t, pooled s = 0.0358398)
This interacts with many parts of the linker. I tried to add test coverage
for all added `isLive()` checks, so that some test will fail if any of them
is removed. I checked that the test expectations for the most part match
ld64's behavior (except for live-support-iterations.s, see the comment
in the test). Interacts with:
- debug info
- export tries
- import opcodes
- flags like -exported_symbol(s_list)
- -U / dynamic_lookup
- mod_init_funcs, mod_term_funcs
- weak symbol handling
- unwind info
- stubs
- map files
- -sectcreate
- undefined, dylib, common, defined (both absolute and normal) symbols
It's possible it interacts with more features I didn't think of,
of course.
I also did some manual testing:
- check-llvm check-clang check-lld work with lld with this patch
as host linker and -dead_strip enabled
- Chromium still starts
- Chromium's base_unittests still pass, including unwind tests
Implemenation-wise, this is InputSection-based, so it'll work for
object files with .subsections_via_symbols (which includes all
object files generated by clang). I first based this on the COFF
implementation, but later realized that things are more similar to ELF.
I think it'd be good to refactor MarkLive.cpp to look more like the ELF
part at some point, but I'd like to get a working state checked in first.
Mechanical parts:
- Rename canOmitFromOutput to wasCoalesced (no behavior change)
since it really is for weak coalesced symbols
- Add noDeadStrip to Defined, corresponding to N_NO_DEAD_STRIP
(`.no_dead_strip` in asm)
Fixes PR49276.
Differential Revision: https://reviews.llvm.org/D103324
During Loop Strength Reduce, if the terminating condition for the loop
is not immediately adjacent to the terminating branch and it has more
than one use, a clone of the condition will be created just before the
terminating branch and will be used as the branch condition. Currently,
whether the instructions are "immediately adjacent" is determined by
checking whether the next instruction after the condition is the
terminating branch; this is incorrect however, as the presence of a
debug intrinsic between the two will result in a change to the output.
This is fixed by using getNextNonDebugInstruction() instead.
Differential Revision: https://reviews.llvm.org/D103033
Transfer the swiftasync attribute to the resume partial function according to
suspend.async specification. It's first argument denotes which argument is the
async context.
rdar://71499498
Differential Revision: https://reviews.llvm.org/D103285
Add getDemandedBits method for uses so we can query demanded bits for each use. This can help getting better use information. For example, for the code below
define i32 @test_use(i32 %a) {
%1 = and i32 %a, -256
%2 = or i32 %1, 1
%3 = trunc i32 %2 to i8 (didn't optimize this to 1 for illustration purpose)
... some use of %3
ret %2
}
if we look at the demanded bit of %2 (which is all 32 bits because of the return), we would conclude that %a is used regardless of how its return is used. However, if we look at each use separately, we will see that the demanded bit of %2 in trunc only uses the lower 8 bits of %a which is redefined, therefore %a's usage depends on how the function return is used.
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D97074
This patch uses the calculated maximum scalable VFs to build VPlans,
cost them and select a suitable scalable VF.
Reviewed By: paulwalker-arm
Differential Revision: https://reviews.llvm.org/D98722
A recent patch:
https://reviews.llvm.org/rGe0921655b1ff8d4ba7c14be59252fe05b705920e
changed clangs AIX bitfield handling to use 4-byte bitfield containers,
matching XLs behavior. This change triggers static assert failures when
bootstrapping. Change the macro we check to enable bitfield packing on
AIX to `__clang__` which is defined by both xlclang and clang.
Differential Revision: https://reviews.llvm.org/D103474
llvm::getLoadStoreType was added recently and has the same implementation
as 'getMemInstValueType' in LoopVectorize.cpp. Since there is no
value in having two implementations, this patch removes the custom LV
implementation in favor of the generic one defined in Instructions.h.
This is a pre-commit of a test case D99439 which is a patch that
updates @llvm.powi to handle different int sizes for the exponent.
Problem is that @llvm.powi is used as an IR construct that maps
to RT libcalls to __powi* functions, and those lib functions depend
on sizeof(int) to use correct type for the exponent.
The test cases show that we use i32 for the powi expenent, which
later would result in wrong type being used in libcalls (miscompile).
But there are also a couple of the negative test cases that show
that we rewrite into using powi when having a uitofp conversion
from i16, which would be wrong when doing the libcall as an
"unsigned int" isn't guaranteed to fit inside the "int" argument
in the called libcall function.
Differential Revision: https://reviews.llvm.org/D102919
Use RuntimeLibcalls to get a common way to pick correct RTLIB::POWI_*
libcall for a given value type.
This includes a small refactoring of ExpandFPLibCall and
ExpandArgFPLibCall in SelectionDAGLegalize to share a bit of code,
plus adding an ExpandFPLibCall version that can be called directly
when expanding FPOWI/STRICT_FPOWI to ensure that we actually use
the same RTLIB::Libcall when expanding the libcall as we used when
checking the legality of such a call by doing a getLibcallName check.
Differential Revision: https://reviews.llvm.org/D103050
The FPOWI DAG node is normally lowered to a libcall to one of the
RTLIB::POWI* runtime functions and the exponent should normally
have a type matching sizeof(int) when making the call. Thus,
type promotion of the exponent could lead to an FPOWI with a type
for the second operand that would be incorrect when doing the
libcall (a situation which would be hard to detect post-legalization
if we allow such FPOWI nodes).
This patch is changing DAGTypeLegalizer::PromoteIntOp_FPOWI to
do the rewrite into a libcall directly instead of promoting the
operand. This way we can check that the exponent is smaller than
sizeof(int) and we can let TargetLowering handle promotion as
part of making the libcall. It could be noticed here that makeLibCall
has some knowledge about targets such as 64-bit RISCV, for which the
libcall argument should be extended to a type larger than sizeof(int).
Differential Revision: https://reviews.llvm.org/D102950
When rewriting
powf(2.0, itofp(x)) -> ldexpf(1.0, x)
exp2(sitofp(x)) -> ldexp(1.0, sext(x))
exp2(uitofp(x)) -> ldexp(1.0, zext(x))
the wrong type was used for the second argument in the ldexp/ldexpf
libc call, for target architectures with 16 bit "int" type.
The transform incorrectly used a bitcasted function pointer with
a 32-bit argument when emitting the ldexp/ldexpf call for such
targets.
The fault is solved by using the correct function prototype
in the call, by asking TargetLibraryInfo about the size of "int".
TargetLibraryInfo by default derives the size of the int type by
assuming that it is 16 bits for 16-bit architectures, and
32 bits otherwise. If this isn't true for a target it should be
possible to override that default in the TargetLibraryInfo
initializer.
Differential Revision: https://reviews.llvm.org/D99438
RVV vectors must be aligned to their element types, so anything less is
unaligned.
For regular loads and stores, our custom-lowering of fixed-length
vectors meant that we opted out of LegalizeDAG's built-in unaligned
expansion. This patch adds that logic in to our custom lower function.
For masked intrinsics, we declare that anything unaligned is not legal,
leaving the ScalarizeMaskedMemIntrin pass to do the expansion for us.
Note that neither of these methods can handle the expansion of
scalable-vector memory ops, so those cases are left alone by this patch.
Scalable loads and stores already go through expansion by default but
hit an assertion, and scalable masked intrinsics will silently generate
incorrect code. It may be prudent to return an error in both of these
cases.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D102493
D85085 was pushed earlier but broke tests on mac and win:
http://lab.llvm.org:8080/green/job/clang-stage1-RA/21182/consoleFull#-706149783d489585b-5106-414a-ac11-3ff90657619c
Recommitting it after adding mtriple to the llc commands.
Emit correct location lists with basic block sections.
This patch addresses multiple things:
1) It ensures that const_value is emitted when possible with basic block
sections.
2) It emits location lists such that the labels are always within the
section boundary.
3) It fixes a bug when the parameter is first used in a non-entry block
which is in a different section from the entry block.
Differential Revision: https://reviews.llvm.org/D85085
The first source has the same EEW as the destination, but we're
using earlyclobber which prevents them from ever being the same
register.
To workaround this, add a special TIED pseudo to use whenever the
first source and merge operand are the same value. This allows
us to use a single operand for the merge operand and first source
which we can then tie to the destination. A tied source disables
earlyclobber for that operand.
Reviewed By: arcbbb
Differential Revision: https://reviews.llvm.org/D103211
This patch uses the `getSymbolIndexForFunctionAddress` helper function to print function names for BB address map entries.
Reviewed By: jhenderson
Differential Revision: https://reviews.llvm.org/D102900
These tests will show how (and r i) will be optimized to
(BCLRI (BCLRI r, i0), i1) or (BCLRI (ANDI r, i0), i1) by future
commits.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D103359
Clang writes object files by first writing to a .tmp file and then
renaming to the final .obj name. On Windows, if a compile is killed
partway through the .tmp files don't get deleted.
Currently it seems like RemoveFileOnSignal takes care of deleting the
tmp files on Linux, but on Windows we need to call
setDeleteDisposition on tmp files so that they are deleted when
closed.
This patch switches to using TempFile to create the .tmp files we write
when creating object files, since it uses setDeleteDisposition on Windows.
This change applies to both Linux and Windows for consistency.
Differential Revision: https://reviews.llvm.org/D102876
Some existing places use getPointerElementType() to create a copy of a
pointer type with some new address space.
Reviewed By: dblaikie
Differential Revision: https://reviews.llvm.org/D103429
We can look through invariant group intrinsics for the purposes of
simplifying the result of a load.
Since intrinsics can't be constants, but we also don't want to
completely rewrite load constant folding, we convert the load operand to
a constant. For GEPs and bitcasts we just treat them as constants. For
invariant group intrinsics, we treat them as a bitcast.
Reviewed By: lebedev.ri
Differential Revision: https://reviews.llvm.org/D101103