If .gcda is corrupted, gcov continues to produce a .gcov and just
assumes execution counts are zeros. This is reasonable, because the
program can corrupt its .gcda output. The code path should be similar to
the code path without .gcda.
Summary:
Adds the ability to add members to a generated combiner via
a State base class. In the current AArch64PreLegalizerCombiner
this is used to make Helper available without having to
provide it to every call.
As part of this, split the command line processing into a
separate object so that it still only runs once even though
the generated combiner is constructed more frequently.
Depends on D81862
Reviewers: aditya_nandakumar, bogner, volkan, aemerson, paquette, arsenm
Reviewed By: arsenm
Subscribers: jvesely, wdng, nhaehnle, kristof.beyls, hiraditya, kerbowa, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D81863
This patch introduces the heat coloring of the Call Printer which is based
on the relative "hotness" of each function. The patch is a part of sequence of
three patches, related to graphs Heat Coloring.
Another feature added is the flag similar to "-cfg-dot-filename-prefix",
which allows to write the graph into a named .pdf
Reviewers: rcorcs, apilipenko, davidxl, sfertile, fedor.sergeev, eraman, bollu
Differential Revision: https://reviews.llvm.org/D77172
Between gcov 4.9~8, `gcov -i $file` prints coverage information to
$file.gcov in an intermediate text format (single file, instead of
$source.gcov for each source file).
lcov newer than 2019-05-24 detects -i support and uses it to increase
processing speed. gcov 9 (GCC r265587) removed --intermediate-format
and -i was changed to mean --json-format. However, we consider this
format still useful and support it. geninfo (part of lcov) supports this
format even if we announce that we are compatible with gcov 9.0.0
Summary:
Move the bail out logic to before constructing the Result and Lane
vectors. This is both potentially faster, and avoids calling
getNumElements on a potentially scalable vector
Reviewers: efriedma, sunfish, chandlerc, c-rhodes, fpetrogalli
Reviewed By: fpetrogalli
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D81619
Summary:
simplifyDivRem attempts to walk a VectorType elementwise. Ensure that it
only does so for FixedVectorType
Reviewers: efriedma, spatel, lebedev.ri, david-arm, kmclaughlin
Reviewed By: spatel, david-arm
Subscribers: tschuett, hiraditya, rkruppe, psnobl, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D81856
Generalize scalarization (recently enhanced with D80885)
to allow compares as well as binops.
Similar to binops, we are avoiding scalarization of a loaded
value because that could avoid a register transfer in codegen.
This requires 1 extra predicate that I am aware of: we do not
want to scalarize the condition value of a vector select. That
might also invert a transform that we do in instcombine that
prefers a vector condition operand for a vector select.
I think this is the final step in solving PR37463:
https://bugs.llvm.org/show_bug.cgi?id=37463
Differential Revision: https://reviews.llvm.org/D81661
When selecting 32 b -> 64 b G_ZEXTs, we don't have to always emit the extend.
If the instruction feeding into the G_ZEXT implicitly zero extends the high
half of the register, we can just emit a SUBREG_TO_REG instead.
Differential Revision: https://reviews.llvm.org/D81897
Compiling assembly files when newlines are reduced to line markers within a `.macro` context will generate wrong information in `.debug_line` section.
This patch fixes this issue by evaluating line markers within the macro scope but not when they are used and evaluated.
Reviewed By: probinson
Differential Revision: https://reviews.llvm.org/D80381
This patch upstreams support for BFloat Matrix Multiplication Intrinsics
and Code Generation from __bf16 to AArch64. This includes IR intrinsics. Unittests are
provided as needed. AArch32 Intrinsics + CodeGen will come after this
patch.
This patch is part of a series implementing the Bfloat16 extension of
the
Armv8.6-a architecture, as detailed here:
https://community.arm.com/developer/ip-products/processors/b/processors-ip-blog/posts/arm-architecture-developments-armv8-6-a
The bfloat type, and its properties are specified in the Arm
Architecture
Reference Manual:
https://developer.arm.com/docs/ddi0487/latest/arm-architecture-reference-manual-armv8-for-armv8-a-architecture-profile
The following people contributed to this patch:
Luke Geeson
- Momchil Velikov
- Mikhail Maltsev
- Luke Cheeseman
Reviewers: SjoerdMeijer, t.p.northover, sdesmalen, labrinea, miyuki,
stuij
Reviewed By: miyuki, stuij
Subscribers: kristof.beyls, hiraditya, danielkiss, cfe-commits,
llvm-commits, miyuki, chill, pbarrio, stuij
Tags: #clang, #llvm
Differential Revision: https://reviews.llvm.org/D80752
Change-Id: I174f0fd0f600d04e3799b06a7da88973c6c0703f
An instruction like this will need to allocate some stack space for the
last parameter:
%x = call addrspace(1) i16 @bar(i64 undef, i64 undef, i16 undef, i16 0)
This worked fine when passing an actual value (in this case 0). However,
when passing undef, no value was pushed to the stack and therefore no
push instructions were created. This caused an unbalanced stack leading
to interesting results.
This commit fixes that by replacing the push logic with a regular stack
adjustment and stack-relative load/stores. This is less efficient but at
least it correctly compiles the code.
I can think of a few improvements in the future:
* The stack should have been adjusted in the function prologue when
there are no allocas in the function.
* Many (if not most) stack adjustments can be replaced by
pushing/popping the values directly. Exactly like the previous code
attempted but didn't do correctly.
* Small stack adjustments can be done more efficiently with a few
push/pop instructions (pushing/popping bogus values), both for code
size and for speed.
All in all, as long as there are no allocas in the function I think that
it is almost always more efficient to emit regular push/pop
instructions. This is however left for future optimizations.
Differential Revision: https://reviews.llvm.org/D78581
This patch fixes a bug in stack save/restore code. Because the frame
pointer was saved/restored manually (not by marking it as clobbered) the
StackSize variable was not updated accordingly. Most code still worked,
but code that tried to load a parameter passed on the stack did not.
This commit fixes this by marking the frame pointer as a
callee-clobbered register. This will let it be saved without any effort
in prolog/epilog code and will make sure the correct address is
calculated for loading parameters that are passed on the stack.
This approach is used by most other targets (such as X86, AArch64 and
RISC-V).
Differential Revision: https://reviews.llvm.org/D78579
These code patterns attempt to call isVMOVModifiedImm on a splat of i1
values, leading to an unreachable being hit. I've guarded the call on a
more specific set of sizes, as i1 vectors are legal under MVE.
Differential Revision: https://reviews.llvm.org/D81860
Summary:
this reduces significantly the number of assumes generated without aftecting too much
the information that is preserved. this improves the compile-time cost
of enable-knowledge-retention significantly.
Reviewers: jdoerfert, sstefan1
Reviewed By: jdoerfert
Subscribers: hiraditya, asbirlea, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D79650
GCC 7 was reporting "enumeral and non-enumeral type in conditional expression"
as a warning.
The code casts an instruction opcode enum to unsigned implicitly, in
line with intentions; so this commit silences the warning by making the
cast to unsigned explicit.
We are planning to add the bf16 value type in the HPR register class
and this will make the codegen patterns ambiguous.
Differential Revision: https://reviews.llvm.org/D81505
Note that .eh_frame sections are generated in the 32-bit format even
when debug sections are 64-bit, for compatibility reasons. They use
relative references between entries, so they hardly benefit from the
64-bit format.
Differential Revision: https://reviews.llvm.org/D81149
DW_FORM_sec_offset was introduced in DWARFv4, so, for 64-bit DWARFv3,
DW_FORM_data8 should be used instead.
Differential Revision: https://reviews.llvm.org/D81148
The patch enables producing DWARF64 compilation units and fixes
generating references to .debug_abbrev and .debug_line sections.
A similar change for .debug_ranges/.debug_rnglists will be added
in a forthcoming patch.
Differential Revision: https://reviews.llvm.org/D81145
The patch adds an option `--dwarf64` to instruct a tool to generate
debug information in the 64-bit DWARF format. There is no real
implementation yet, only a few compatibility checks.
Differential Revision: https://reviews.llvm.org/D81143
This patch extends MatchVectorAllZeroTest to handle OR vector reduction patterns where the result is compared against zero.
Fixes PR45378
Differential Revision: https://reviews.llvm.org/D81547
Have TTI::getInstructionThroughput call getUserCost for Br, Ret and
PHI. This now means that eveything in getInstructionThroughput is
handled by getUserCost.
Differential Revision: https://reviews.llvm.org/D79849
Only functions with floating-point return type accepts fast-math flags.
When adding such flags to function returning integer, we'll see a crash,
because there's still an undeleted value referencing the argument. This
patch manually removes the temporary instruction when error occurs.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D78355
Summary:
Normally, the Origin is passed over TLS, which seems like it introduces unnecessary overhead. It's in the (extremely) cold path though, so the only overhead is in code size.
But with eager-checks, calls to __msan_warning functions are extremely common, so this becomes a useful optimization.
This can save ~5% code size.
Reviewers: eugenis, vitalybuka
Reviewed By: eugenis, vitalybuka
Subscribers: hiraditya, #sanitizers, llvm-commits
Tags: #sanitizers, #llvm
Differential Revision: https://reviews.llvm.org/D81700