Remove dead virtual functions from vtables with
replaceNonMetadataUsesWith, so that CGProfile metadata gets cleaned up
correctly.
Original commit message:
Currently, it is hard for the compiler to remove unused C++ virtual
functions, because they are all referenced from vtables, which are referenced
by constructors. This means that if the constructor is called from any live
code, then we keep every virtual function in the final link, even if there
are no call sites which can use it.
This patch allows unused virtual functions to be removed during LTO (and
regular compilation in limited circumstances) by using type metadata to match
virtual function call sites to the vtable slots they might load from. This
information can then be used in the global dead code elimination pass instead
of the references from vtables to virtual functions, to more accurately
determine which functions are reachable.
To make this transformation safe, I have changed clang's code-generation to
always load virtual function pointers using the llvm.type.checked.load
intrinsic, instead of regular load instructions. I originally tried writing
this using clang's existing code-generation, which uses the llvm.type.test
and llvm.assume intrinsics after doing a normal load. However, it is possible
for optimisations to obscure the relationship between the GEP, load and
llvm.type.test, causing GlobalDCE to fail to find virtual function call
sites.
The existing linkage and visibility types don't accurately describe the scope
in which a virtual call could be made which uses a given vtable. This is
wider than the visibility of the type itself, because a virtual function call
could be made using a more-visible base class. I've added a new
!vcall_visibility metadata type to represent this, described in
TypeMetadata.rst. The internalization pass and libLTO have been updated to
change this metadata when linking is performed.
This doesn't currently work with ThinLTO, because it needs to see every call
to llvm.type.checked.load in the linkage unit. It might be possible to
extend this optimisation to be able to use the ThinLTO summary, as was done
for devirtualization, but until then that combination is rejected in the
clang driver.
To test this, I've written a fuzzer which generates random C++ programs with
complex class inheritance graphs, and virtual functions called through object
and function pointers of different types. The programs are spread across
multiple translation units and DSOs to test the different visibility
restrictions.
I've also tried doing bootstrap builds of LLVM to test this. This isn't
ideal, because only classes in anonymous namespaces can be optimised with
-fvisibility=default, and some parts of LLVM (plugins and bugpoint) do not
work correctly with -fvisibility=hidden. However, there are only 12 test
failures when building with -fvisibility=hidden (and an unmodified compiler),
and this change does not cause any new failures for either value of
-fvisibility.
On the 7 C++ sub-benchmarks of SPEC2006, this gives a geomean code-size
reduction of ~6%, over a baseline compiled with "-O2 -flto
-fvisibility=hidden -fwhole-program-vtables". The best cases are reductions
of ~14% in 450.soplex and 483.xalancbmk, and there are no code size
increases.
I've also run this on a set of 8 mbed-os examples compiled for Armv7M, which
show a geomean size reduction of ~3%, again with no size increases.
I had hoped that this would have no effect on performance, which would allow
it to awlays be enabled (when using -fwhole-program-vtables). However, the
changes in clang to use the llvm.type.checked.load intrinsic are causing ~1%
performance regression in the C++ parts of SPEC2006. It should be possible to
recover some of this perf loss by teaching optimisations about the
llvm.type.checked.load intrinsic, which would make it worth turning this on
by default (though it's still dependent on -fwhole-program-vtables).
Differential revision: https://reviews.llvm.org/D63932
llvm-svn: 375094
Summary:
Currently when computing a GEP offset using the function EmitGEPOffset
for the following instruction
getelementptr inbounds i32, i32* %p, i64 %offs
we get
mul nuw i64 %offs, 4
Unfortunately we cannot assume that unsigned wrapping won't happen
here because %offs is allowed to be negative.
Making such assumptions can lead to miscompilations: see the new test
test24_neg_offs in InstCombine/icmp.ll. Without the patch InstCombine
would generate the following comparison:
icmp eq i64 %offs, 4611686018427387902; 0x3ffffffffffffffe
Whereas the correct value to compare with is -2.
This patch replaces the NUW flag with NSW in the multiplication
instructions generated by EmitGEPOffset and adjusts the test suite.
https://bugs.llvm.org/show_bug.cgi?id=42699
Reviewers: chandlerc, craig.topper, ostannard, lebedev.ri, spatel, efriedma, nlopes, aqjune
Reviewed By: lebedev.ri
Subscribers: reames, lebedev.ri, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D68342
llvm-svn: 375089
This broke llvm-objdump in 32-bit builds, see e.g.
http://lab.llvm.org:8011/builders/clang-cmake-armv7-quick/builds/10925
> Summary:
> When listing the index in `llvm-objdump -h`, use a zero-based counter instead of the actual section index (e.g. shdr->sh_index for ELF).
>
> While this is effectively a noop for now (except one unit test for XCOFF), the index values will change in a future patch that filters certain sections out (e.g. symbol tables). See D68669 for more context. Note: the test case in `test/tools/llvm-objdump/X86/section-index.s` already covers the case of incrementing the section index counter when sections are skipped.
>
> Reviewers: grimar, jhenderson, espindola
>
> Reviewed By: grimar
>
> Subscribers: emaste, sbc100, arichardson, aheejin, arphaman, seiya, llvm-commits, MaskRay
>
> Tags: #llvm
>
> Differential Revision: https://reviews.llvm.org/D68848
llvm-svn: 375088
Summary:
This is a NFC change that removes the NFA->DFA construction and emission logic from DFAPacketizerEmitter and instead uses the generic DFAEmitter logic. This allows DFAPacketizer to use the Automaton class from Support and remove a bunch of logic there too.
After this patch, DFAPacketizer is mostly logic for grepping Itineraries and collecting functional units, with no state machine logic. This will allow us to modernize by removing the 16-functional-unit limit and supporting non-itinerary functional units. This is all for followup patches.
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D68992
llvm-svn: 375086
Add generic DAG combine for extending masked loads.
Allow us to generate sext/zext masked loads which can access v4i8,
v8i8 and v4i16 memory to produce v4i32, v8i16 and v4i32 respectively.
Differential Revision: https://reviews.llvm.org/D68337
llvm-svn: 375085
Summary:
This is patch is part of a series to introduce an Alignment type.
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133851.html
See this patch for the introduction of the type: https://reviews.llvm.org/D64790
Reviewers: courbet
Subscribers: jholewinski, arsenm, dschuff, jyknight, dylanmckay, sdardis, nemanjai, jvesely, nhaehnle, sbc100, jgravelle-google, hiraditya, aheejin, kbarton, fedor.sergeev, asb, rbar, johnrusso, simoncook, apazos, sabuasal, niosHD, jrtc27, MaskRay, zzheng, edward-jones, atanasyan, rogfer01, MartinMosbeck, brucehoult, the_o, PkmX, jocewei, jsji, Jim, lenary, s.egerton, pzheng, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D68993
llvm-svn: 375084
Summary:
Each generated helper can be configured to generate an option that disables
rules in that helper. This can be used to bisect rulesets.
The disable bits are stored in a SparseVector as this is very cheap for the
common case where nothing is disabled. It gets more expensive the more rules
are disabled but you're generally doing that for debug purposes where
performance is less of a concern.
Depends on D68426
Reviewers: volkan, bogner
Reviewed By: volkan
Subscribers: hiraditya, Petar.Avramovic, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D68438
llvm-svn: 375067
Teach the combiner helper how to flatten concat_vectors of build_vectors
into a build_vector.
Add this combine as part of AArch64 pre-legalizer combiner.
Differential Revision: https://reviews.llvm.org/D69071
llvm-svn: 375066
Summary:
This is just moving the existing C++ code around and will be NFC w.r.t
AArch64. Renamed 'CombineBr' to something more descriptive
('ElideByByInvertingCond') at the same time.
The remaining combines in AArch64PreLegalizeCombiner require features that
aren't implemented at this point and will be hoisted as they are added.
Depends on D68424
Reviewers: bogner, volkan
Subscribers: kristof.beyls, hiraditya, Petar.Avramovic, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D68426
llvm-svn: 375057
This reverts r375051 (git commit a409afaad64ce83ea44cc30ee5f96b6e613a6e98)
The patch does not work on Windows due to `\` in filenames being interpreted as escaping rather than literal path separators when used by lld linker scripts.
llvm-svn: 375052
Summary: Update GlobPattern in libSupport to handle a few more cases. It does not fully match the `fnmatch` used by GNU objcopy since named character classes (e.g. `[[:digit:]]`) are not supported, but this should support most existing use cases (mostly just `*` is what's used anyway).
This will be used to implement the `--wildcard` flag in llvm-objcopy to be more compatible with GNU objcopy.
This is split off of D66613 to land the libSupport changes separately. The llvm-objcopy part will land soon.
Reviewers: jhenderson, MaskRay, evgeny777, espindola, alexshap
Reviewed By: MaskRay
Subscribers: nickdesaulniers, emaste, arichardson, hiraditya, jakehehrlich, abrachet, seiya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D66613
undo objcopy changes to make this libsupport only
llvm-svn: 375051
Summary:
There are two cases where a block is merged into its predecessor and the
MergeBlockIntoPredecessor API is not used. Update the API so it can be
reused in the other cases, in order to avoid code duplication.
Cleanup motivated by D68659.
Reviewers: chandlerc, sanjoy.google, george.burgess.iv
Subscribers: llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D68670
llvm-svn: 375050
* Remove outdated precautions for Python versions < 2.7
* Remove dead code related to `maxIndividualTestTime` option
* Move printing of test and result summary out of main into its own
function
Reviewed By: rnk
Differential Revision: https://reviews.llvm.org/D68847
llvm-svn: 375046
After changing dsymutil to use libOption, we lost error reporting for
missing required arguments (input files). Additionally, we stopped
complaining about unknown arguments. This patch fixes both and adds a
test.
llvm-svn: 375044
r374772 changed Offset to be an int64_t but left NewOffset as an int.
Scale is unsigned, so in the calculation `Offset - NewOffset * Scale`,
`NewOffset * Scale` was promoted to unsigned and was then zero-extended
to 64 bits, leading to an incorrect computation which manifested as an
out-of-memory when building the Swift standard library for Android
aarch64. Promote NewOffset to int64_t to fix this, and promote
EmittableOffset as well, since its one user passes it to a function
which takes an int64_t anyway.
Test case based on a suggestion by Sander de Smalen!
Differential Revision: https://reviews.llvm.org/D69018
llvm-svn: 375043
The problem is that we can have two loop exits, 'a' and 'b', where 'a' and 'b' would exit at the same iteration, 'a' precedes 'b' along some path, and 'b' is predicated while 'a' is not. In this case (see the previously submitted test case), we causing the loop to exit through 'b' whereas it should have exited through 'a'.
This only applies to loop exits where the exit counts are not provably inequal, but that isn't as much of a restriction as it appears. If we could order the exit counts, we'd have already removed one of the two exits. In theory, we might be able to prove inequality w/o ordering, but I didn't really explore that piece. Instead, I went for the obvious restriction and ensured we didn't predicate exits following non-predicateable exits.
Credit goes to Evgeny Brevnov for figuring out the problematic case. Fuzzing probably also found it (failures seen), but due to some silly infrastructure problems I hadn't gotten to the results before Evgeny hand reduced it from a benchmark (he manually enabled the transform). Once this is fixed, I'll try to filter through the fuzzer failures to see if there's anything additional lurking.
Differential Revision https://reviews.llvm.org/D68956
llvm-svn: 375038
Summary: Also update the help modifier (h) so that it works as a modifier and not just as a standalone `h`. For example, `llvm-ar h` prints the help message, but `llvm-ar xh` currently prints `unknown option h`.
Reviewers: MaskRay, gbreynoo
Subscribers: llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D69007
llvm-svn: 375028
The 1st attempt at this modified the cost model in a bad way to avoid the vectorization,
but that caused problems for other users (the loop vectorizer) of the cost model.
I don't see an ideal solution to these 2 related, potentially large, perf regressions:
https://bugs.llvm.org/show_bug.cgi?id=42708https://bugs.llvm.org/show_bug.cgi?id=43146
We decided that load combining was unsuitable for IR because it could obscure other
optimizations in IR. So we removed the LoadCombiner pass and deferred to the backend.
Therefore, preventing SLP from destroying load combine opportunities requires that it
recognizes patterns that could be combined later, but not do the optimization itself (
it's not a vector combine anyway, so it's probably out-of-scope for SLP).
Here, we add a cost-independent bailout with a conservative pattern match for a
multi-instruction sequence that can probably be reduced later.
In the x86 tests shown (and discussed in more detail in the bug reports), SDAG combining
will produce a single instruction on these tests like:
movbe rax, qword ptr [rdi]
or:
mov rax, qword ptr [rdi]
Not some (half) vector monstrosity as we currently do using SLP:
vpmovzxbq ymm0, dword ptr [rdi + 1] # ymm0 = mem[0],zero,zero,..
vpsllvq ymm0, ymm0, ymmword ptr [rip + .LCPI0_0]
movzx eax, byte ptr [rdi]
movzx ecx, byte ptr [rdi + 5]
shl rcx, 40
movzx edx, byte ptr [rdi + 6]
shl rdx, 48
or rdx, rcx
movzx ecx, byte ptr [rdi + 7]
shl rcx, 56
or rcx, rdx
or rcx, rax
vextracti128 xmm1, ymm0, 1
vpor xmm0, xmm0, xmm1
vpshufd xmm1, xmm0, 78 # xmm1 = xmm0[2,3,0,1]
vpor xmm0, xmm0, xmm1
vmovq rax, xmm0
or rax, rcx
vzeroupper
ret
Differential Revision: https://reviews.llvm.org/D67841
llvm-svn: 375025
The name of ControlSections is not expressive enough to convey what they really are.
CsectGroup can better communicate the concept of grouping csects together since they have similar property.
Reviewer: daltenty
Differential Revision: https://reviews.llvm.org/D69001
llvm-svn: 375021
Using GNU diff, `--strip-trailing-cr` removes a `\r` appearing before
a `\n` at the end of a line. Without this patch, lit's internal diff
only removes `\r` if it appears as the last character. That seems
useless. This patch fixes that.
This patch also adds `--strip-trailing-cr` to some tests that fail on
Windows bots when D68664 is applied. Based on what I see in the bot
logs, I think the following is happening. In each test there, lit
diff is comparing a file with `\r\n` line endings to a file with `\n`
line endings. Without D68664, lit diff reads those files in text
mode, which in Windows causes `\r\n` to be replaced with `\n`.
However, with D68664, lit diff reads the files in binary mode instead
and thus reports that every line is different, just as GNU diff does
(at least under Ubuntu). Adding `--strip-trailing-cr` to those tests
restores the previous behavior while permitting the behavior of lit
diff to be more like GNU diff.
Reviewed By: rnk
Differential Revision: https://reviews.llvm.org/D68839
llvm-svn: 375020
As suggested by rnk at D67643#1673043, instead of reading files
multiple times until an appropriate encoding is found, read them once
as binary, and then try to decode what was read.
For Python >= 3.5, don't fail when attempting to decode the
`diff_bytes` output in order to print it.
Avoid failures for Python 2.7 used on some Windows bots by
transforming diff output with `lit.util.to_string` before writing it
to stdout.
Finally, add some tests for encoding handling.
Reviewed By: rnk
Differential Revision: https://reviews.llvm.org/D68664
llvm-svn: 375018
Do not generate non-existing sdwa instructions. It reduces the
number of generated instructions by 185.
Differential Revision: https://reviews.llvm.org/D69010
llvm-svn: 375016