The functions lookForDIEsToKeep and keepDIEAndDependencies are mutually
recursive and can cause a stackoverflow for large projects. While this
has always been the case, it became a bigger issue when we parallelized
dsymutil, because threads get only a fraction of the stack space.
This patch removes the final recursive call from lookForDIEsToKeep. The
call was used to look at the current DIE's parent chain and mark
everything as kept.
This was tested by running dsymutil on clang built in debug (both with
and without modules) and comparing the MD5 hash of the generated dSYM
companion file.
Differential revision: https://reviews.llvm.org/D70994
The functions lookForDIEsToKeep and keepDIEAndDependencies are mutually
recursive and can cause a stackoverflow for large projects. While this
has always been the case, it became a bigger issue when we parallelized
dsymutil, because threads get only a fraction of the stack space.
In an attempt to tackle this issue, we removed part of the recursion in
r338536 by introducing a worklist. Processing of child DIEs was no
longer recursive. However, we still received bug reports where we'd run
out of stack space.
This patch removes another recursive call from lookForDIEsToKeep. The
call was used to look at DIEs that reference the current DIE. To make
this possible, we inlined keepDIEAndDependencies and added this work to
the existing worklist. Because the function is not tail recursive, we
needed to add two more types of worklist entries to perform the
subsequent work.
This was tested by running dsymutil on clang built in debug (both with
and without modules) and comparing the MD5 hash of the generated dSYM
companion file.
Differential revision: https://reviews.llvm.org/D70990
Revise the coverage mapping format to reduce binary size by:
1. Naming function records and marking them `linkonce_odr`, and
2. Compressing filenames.
This shrinks the size of llc's coverage segment by 82% (334MB -> 62MB)
and speeds up end-to-end single-threaded report generation by 10%. For
reference the compressed name data in llc is 81MB (__llvm_prf_names).
Rationale for changes to the format:
- With the current format, most coverage function records are discarded.
E.g., more than 97% of the records in llc are *duplicate* placeholders
for functions visible-but-not-used in TUs. Placeholders *are* used to
show under-covered functions, but duplicate placeholders waste space.
- We reached general consensus about giving (1) a try at the 2017 code
coverage BoF [1]. The thinking was that using `linkonce_odr` to merge
duplicates is simpler than alternatives like teaching build systems
about a coverage-aware database/module/etc on the side.
- Revising the format is expensive due to the backwards compatibility
requirement, so we might as well compress filenames while we're at it.
This shrinks the encoded filenames in llc by 86% (12MB -> 1.6MB).
See CoverageMappingFormat.rst for the details on what exactly has
changed.
Fixes PR34533 [2], hopefully.
[1] http://lists.llvm.org/pipermail/llvm-dev/2017-October/118428.html
[2] https://bugs.llvm.org/show_bug.cgi?id=34533
Differential Revision: https://reviews.llvm.org/D69471
This test was failing on non-X86 targets because the gold invocation did not
have the necessary -m flag.
Differential Revision: https://reviews.llvm.org/D70982
The PHI node checks for inner loop exits are too permissive currently.
As indicated by an existing comment, we should only allow LCSSA PHI
nodes that are part of reductions or are only used outside of the loop
nest. We ensure this by checking the users of the LCSSA PHIs.
Specifically, it is not safe to use an exiting value from the inner loop in the latch of the outer
loop.
It also moves the inner loop exit check before the outer loop exit
check.
Fixes PR43473.
Reviewers: efriedma, mcrosier
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D68144
Summary:
This is one more prep step necessary before the code gen pass instrumentation
code could go in.
Reviewers: davidxl
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D70988
Variant on D70103. The caching is switched to always use a BB to
cache entry map, which then contains per-value caches. A separate
set contains value handles with a deletion callback. This allows us
to properly invalidate overdefined values.
A possible alternative would be to always cache by value first and
have per-BB maps/sets in the each cache entry. In that case we could
use a ValueMap and would avoid the separate value handle set. I went
with the BB indexing at the top level to make it easier to integrate
D69914, but possibly that's not the right choice.
Differential Revision: https://reviews.llvm.org/D70376
Summary:
Implement emitTCEntry for PPCTargetXCOFFStreamer.
Add TC csects to TOCCsects for object file writing.
Note:
1. I did not include any raw data testing for this object file generation
because TC entries raw data will all be 0 without relocation implemented.
I will add raw data testing as part of relocation testing later.
2. I removed "Symbol->setFragment(F);" for common symbols because we
don't need it, and if we have it then we would hit assertions below:
Assertion `(SymbolContents == SymContentsUnset ||
SymbolContents == SymContentsOffset) &&
"Cannot get offset for a common/variable symbol"' failed.
3.Fixed incorrect TOC-base alignment.
Differential Revision: https://reviews.llvm.org/D70798
This diff adds test coverage for thin archives including additions to
existing tests. In some cases I have updated the formats of these tests
to better match other tests in the archive.
Differential Revision: https://reviews.llvm.org/D70969
When basic blocks are killed, either due to being empty or to being an if.then
or if.else block whose complement contains identical instructions, some of the
debug intrinsics in that block are lost. This patch sinks those intrinsics
into the single successor block, setting them Undef if necessary to
prevent debug info from falling out-of-date.
Differential Revision: https://reviews.llvm.org/D70318
The PT_GNU_PROPERTY is generated by a linker to describe the
.note.gnu.property section. The Linux kernel uses this program header to
locate the .note.gnu.property section.
It is described in "The Linux gABI extension"
Include support for llvm-readelf, llvm-readobj and the yaml reader and
writers.
Differential Revision: https://reviews.llvm.org/D70959
Summary:
This is a follow-up to D70769 and D70222, which allows propagation of
current directory down to ExpandResponseFiles for handling of relative paths.
Previously clients had to mutate FS to achieve that, which is not thread-safe
and can even be thread-hostile in the case of real file system.
Reviewers: sammccall
Subscribers: hiraditya, cfe-commits, llvm-commits
Tags: #clang, #llvm
Differential Revision: https://reviews.llvm.org/D70857
Summary:
Adds intrinsics for the following:
* rbit
* revb
* revh
* revw
Patterns are also defined to map the 'llvm.bswap.*' intrinsic to the SVE
revb instruction.
Reviewers: sdesmalen, huntergr, dancgr, rengolin, efriedma, rovka
Reviewed By: sdesmalen
Subscribers: tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D70960
SCEV caches the exiting blocks when computing exit counts. In
SimpleLoopUnswitch, we split the exit block of the loop to unswitch.
Currently we only invalidate the loop containing that exit block, but if
that block is the exiting block for a parent loop, we have stale cache
entries. We have to invalidate the top-most loop that contains the exit
block as exiting block. We might also be able to skip invalidating the
loop containing the exit block, if the exit block is not an exiting
block of that loop.
There are also 2 more places in SimpleLoopUnswitch, that use a similar
problematic approach to get the loop to invalidate. If the patch makes
sense, I will also update those places to a similar approach (they deal
with multiple exit blocks, so we cannot directly re-use
getTopMostExitingLoop).
Fixes PR43972.
Reviewers: skatkov, reames, asbirlea, chandlerc
Reviewed By: asbirlea
Differential Revision: https://reviews.llvm.org/D70786
Summary:
We rely on this in our CHERI backend to address the GOT by generating a
$pc-relative addresses. For this we emit the following code sequence:
lui $1, %pcrel_hi(_CHERI_CAPABILITY_TABLE_-8)
daddiu $1, $1, %pcrel_lo(_CHERI_CAPABILITY_TABLE_-4)
cgetpccincoffset $c1, $1
However, without this change the addend is implicitly converted to
UINT32_MAX and an invalid pointer value is generated.
Reviewers: atanasyan
Reviewed By: atanasyan
Subscribers: merge_guards_bot, sdardis, hiraditya, jrtc27, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D70953
Summary:
In our CHERI fork we use BUNDLE instructions to ensure that a
three-instruction sequence to generate a program-counter-relative value is
emitted without reordering or insertions (since that would break the 32-bit
offset computation).
Currently MipsAsmPrinter asserts when it encounters a pseudo instruction.
To handle BUNDLE we can simply skip the instruction which will then make
EmitInstruction() process the contents of the bundle in order.
Reviewers: atanasyan
Reviewed By: atanasyan
Subscribers: merge_guards_bot, sdardis, hiraditya, jrtc27, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D70945
Summary:
In our CHERI fork we use BUNDLE instructions to ensure that a
three-instruction sequence to generate a program-counter-relative value is
emitted without reordering or insertions (since that would break the 32-bit
offset computation). This sequence is created in MipsExpandPseudo and we use
finalizeBundle() to create the BUNDLE instruction.
However, the delay slot filler currently breaks this pattern since the BUNDLE
will be removed and so all instructions are moved into the delay slot.
Since the delay slot only executes the first instruction, this results in
incorrect computations (and run-time crashes) if the branch is taken.
The original test cases uses CHERI instructions, so for the test case here
I simple filled a BUNDLE with a no-op DADDiu $sp_64, -16 and DADDiu $sp_64, 16.
Reviewers: atanasyan
Reviewed By: atanasyan
Subscribers: merge_guards_bot, sdardis, hiraditya, jrtc27, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D70944
Summary:
I was tracking down a code-generation bug in this pass and found that the
added output was useful. It is also helpful to find out why a delay slot
could not be filled even though there is clearly a valid instruction (which
appears to mostly be caused by CFI instructions).
Reviewers: atanasyan
Reviewed By: atanasyan
Subscribers: merge_guards_bot, sdardis, hiraditya, jrtc27, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D70940
Currently, getIntImmCost returns TCC_Free for almost all intrinsics.
For most AArch64 specific intrinsics however, it looks like integer
constants cannot be folded into most of them (at least the ones I checked).
Unless we know that we can fold integer operands with the intrinsic, we
handle more cases correctly by returning the cost to materialize the
immediate than return TCC_Free.
Reviewers: SjoerdMeijer, dmgreen, t.p.northover, ributzka
Reviewed By: SjoerdMeijer
Differential Revision: https://reviews.llvm.org/D70669
Summary:
The default case handles the majority of MVTs so most of the individual
cases can be removed. Also added a case for floating point types.
Reviewed By: sdesmalen
Differential Revision: https://reviews.llvm.org/D70955
Summary:
Catch the (admittedly unusual) case where SIFoldOperands attempts to fold 2
constant operands into the same SALU operation, with neither operand able to be
encoded as an inline constant.
Change-Id: Ibc48d662c9ffd8bbacd154976b0b1c257ace0927
Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, tpr, t-tye, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D70896
We already have Symbols property to list regular symbols and
it is currently Optional<>. This patch makes DynamicSymbols to be optional
too. With this there is no need to define a dummy symbol anymore to trigger
creation of the .dynsym and it is now possible to define an empty .dynsym using
just the following line:
DynamicSymbols: []
(it is important to have when you do not want to have dynamic symbols,
but want to have a .dynsym)
Now the code is consistent and it helped to fix a bug: previously we
did not report an error when both Content/Size and an empty
Symbols/DynamicSymbols list were specified.
Differential revision: https://reviews.llvm.org/D70956
Constructor invocations such as `APFloat(APFloat::IEEEdouble(), 0.0)`
may seem like they accept a FP (floating point) value, but the overload
they reach is actually the `integerPart` one, not a `float` or `double`
overload (which only exists when `fltSemantics` isn't passed).
This may lead to possible loss of data, by the conversion from `float`
or `double` to `integerPart`.
To prevent future mistakes, a new constructor overload, which accepts
any FP value and marked with `delete`, to prevent its usage.
Fixes PR34095.
Differential Revision: https://reviews.llvm.org/D70425
Summary:
lldb's loclists parser has support for DW_LLE_start_end(x) encodings. To
avoid regressing when switching the implementation to llvm's, I add
parsing support for all previously unsupported location list encodings.
Reviewers: dblaikie, JDevlieghere, aprantl, SouraVX
Subscribers: hiraditya, probinson, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D70949
Summary:
The dump() function already accepts a callback. This makes
getAbsoluteRanges do the same. The existing DWARFUnit overload is
implemented on top of the new function.
This enables usage of the debug_rnglists parser from within lldb (which
has it's own dwarf parser).
Reviewers: dblaikie, JDevlieghere, aprantl
Subscribers: hiraditya, probinson, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D70952
InstCombine may synthesize FMINNUM/FMAXNUM nodes from fcmp+select
sequences (where the fcmp is marked nnan). Currently, if the
target does not otherwise handle these nodes, they'll get expanded
to libcalls to fmin/fmax. However, these functions may reside in
libm, which may introduce a library dependency that was not originally
present in the source code, potentially resulting in link failures.
To fix this problem, add code to TargetLowering::expandFMINNUM_FMAXNUM
to expand FMINNUM/FMAXNUM to a compare+select sequence instead of the
libcall. This is done only if the node is marked as "nnan"; in this case,
the expansion to compare+select is always correct. This also suffices to
catch all cases where FMINNUM/FMAXNUM was synthesized as above.
Differential Revision: https://reviews.llvm.org/D70965
This is the example:
int foo(int a, int b, int c, int d) {
return a + b + c + d;
}
And this is the Dependency Graph:
+------+ +------+ +------+ +------+
| A | | B | | C | | D |
+--+--++ +---+--+ +--+---+ +--+---+
^ ^ ^ ^ ^ ^
| | | | | |
| | | |New1 +--------------+
| | | | |
| | | | +--+---+
| |New2 | +-------+ ADD1 |
| | | +--+---+
| | | Fuse ^
| | +-------------+
| +------------+
| |
| Fuse +--+---+
+----------->+ ADD2 |
| +------+
+--+---+
| ADD3 |
+------+
We need also create an artificial edge from ADD1 to A if
https://reviews.llvm.org/D69998 is landed. That will force the Node A scheduled
before the ADD1 and ADD2. But in fact, it is ok to schedule the Node A
in-between ADD3 and ADD2, as ADD3 and ADD2 are NOT a fusion pair because
ADD2 has been matched to ADD1. We are creating these unnecessary dependency
edges that override the heuristics.
Differential Revision: https://reviews.llvm.org/D70066
Summary: This is a follow-up of D70881. It models DAZ and FTZ for releated instructions.
Reviewers: craig.topper, RKSimon, andrew.w.kaylor
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D70938
Summary: There is a discussion of git-format-patch in GettingStarted guide, but no mention of it in the Phabricator.html page.
Reviewers: jyknight, delcypher
Reviewed By: delcypher
Subscribers: llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D69323
Jenkins sometimes starts a new working directory by appending @2 (or
incrementing the number if the @n suffix is already there). This causes
several clang tests to fail as:
s@INPUT_DIR@%/S/Inputs@g
gets expanded to the invalid:
s@INPUT_DIR@/path/to/workdir@2/Inputs@g
~~~~~~~~~~
where the part marked with ~'s is interpreted as the flags. These are
invalid and the test fails.
Previous fixes simply exchanged the @ character for another like | but
that's just moving the problem. Address it by adding an expansion that
escapes the @ character we're using as a delimiter as well as other magic
characters in the replacement of sed's s@@@.
There's still room for expansions to cause trouble though. One I ran into
while testing this was that having a directory called foo@bar causes lots
of `CHECK-NOT: foo` directives to match. There's also things like
directories containing `\1`