Corrected and simplified the help text.
It was clearly too difficult to maintain before (see e.g. @227296) making it
simpler and more consistent it should help people keep it up to date.
Differential Revision: https://reviews.llvm.org/D48577
llvm-svn: 338703
We don't need to use a map to store ResourceState objects. The number of
processor resources is known statically from the scheduling model. We can
therefore use a vector, and reserve a slot for each processor resource that we
want to simulate.
Every time the ResourceManager queries the ResourceState vector, the index to
the vector of ResourceState objects can be easily computed from the processor
resource mask.
This drastically reduces the time complexity of method ResourceManager::use() and
method ResourceManager::release(). This patch gives an average speedup of 12%.
llvm-svn: 338702
This is useful for understanding how our demangler processes
back references and for investigating issues related to
back references. But it's a feature only useful for debugging
the demangling process itself, so I'm marking it hidden.
llvm-svn: 338609
Summary:
Add support for --rename-section flags from gnu objcopy.
Not all flags appear to have an effect for ELF objects, but allowing them would allow easier drop-in replacement. Other unrecognized flags are rejected.
This was only tested by comparing flags printed by "readelf -e <.o>" against the output of gnu vs llvm objcopy, it hasn't been tested to be valid beyond that.
Reviewers: jakehehrlich, alexshap
Subscribers: llvm-commits, paulsemel, alexshap
Differential Revision: https://reviews.llvm.org/D49870
llvm-svn: 338582
The functions `lookForDIEsToKeep` and `keepDIEAndDependencies` can have
some very deep recursion. This tackles part of this problem by removing
the recursion from `lookForDIEsToKeep` by turning it into a worklist.
The difficulty in doing so is the computation of incompleteness, which
depends on the incompleteness of its children. To compute this, we
insert "continuation markers" into the worklist. This informs the work
loop to (re)compute the incompleteness property of the DIE associated
with it (i.e. the parent of the previously processed DIE).
This patch should generate byte-identical output. Unfortunately it also
has some impact of performance, regressing by about 4% when processing
clang on my machine.
Differential revision: https://reviews.llvm.org/D48899
llvm-svn: 338536
Summary:
See binutils-gdb/bfd/elf.c, GNU objcopy also strips .stab* (STABS)
.line* (DWARF 1) .gnu.linkonce.wi.* (linkonce section for .debug_info) but
I'm not sure we need to be compatible with it.
Reviewers: dblaikie, alexshap, jakehehrlich, jhenderson
Reviewed By: alexshap, jakehehrlich
Subscribers: aprantl, JDevlieghere, jakehehrlich, llvm-commits
Differential Revision: https://reviews.llvm.org/D50100
llvm-svn: 338443
A detailed description of the tool has been recently added by Matt to
CommandGuide/llvm-mca.rst. File README.txt is now redundant and can be removed;
all the relevant user-guide information has been improved and then moved to
llvm-mca.rst.
In future, we should add another .rst for the "llvm-mca developer manual" to
provide infromation about:
- llvm-mca internals.
- How to add custom stages to the simulated pipeline.
- How to provide extra processor info in the scheduling model to improve the
analysis performed by llvm-mca.
llvm-svn: 338386
This patch teaches llvm-mca how to identify dependency breaking instructions on
btver2.
An example of dependency breaking instructions is the zero-idiom XOR (example:
`XOR %eax, %eax`), which always generates zero regardless of the actual value of
the input register operands.
Dependency breaking instructions don't have to wait on their input register
operands before executing. This is because the computation is not dependent on
the inputs.
Not all dependency breaking idioms are also zero-latency instructions. For
example, `CMPEQ %xmm1, %xmm1` is independent on
the value of XMM1, and it generates a vector of all-ones.
That instruction is not eliminated at register renaming stage, and its opcode is
issued to a pipeline for execution. So, the latency is not zero.
This patch adds a new method named isDependencyBreaking() to the MCInstrAnalysis
interface. That method takes as input an instruction (i.e. MCInst) and a
MCSubtargetInfo.
The default implementation of isDependencyBreaking() conservatively returns
false for all instructions. Targets may override the default behavior for
specific CPUs, and return a value which better matches the subtarget behavior.
In future, we should teach to Tablegen how to automatically generate the body of
isDependencyBreaking from scheduling predicate definitions. This would allow us
to expose the knowledge about dependency breaking instructions to the machine
schedulers (and, potentially, other codegen passes).
Differential Revision: https://reviews.llvm.org/D49310
llvm-svn: 338372
Dsymutil's update functionality was broken on Windows because we tried
to rename a file while we're holding open handles to that file. TempFile
provides a solution for this through its keep(Twine) method. This patch
changes dsymutil to make use of that functionality.
Differential revision: https://reviews.llvm.org/D49860
llvm-svn: 338216
Summary:
These two cases will trigger a dereference on a nullptr, since the
SymbolTable can be nonexistent for a given library, in addition to just
being empty.
Reviewers: alexshap
Reviewed By: alexshap
Subscribers: meikeb, kongyi, chh, jakehehrlich, llvm-commits, pirama
Differential Revision: https://reviews.llvm.org/D49534
llvm-svn: 338062
The standard library functions ::isprint/std::isprint have platform-
and locale-dependent behavior which makes LLVM's output less
predictable. In particular, regression tests my fail depending on the
implementation of these functions.
Implement llvm::isPrint in StringExtras.h with a standard behavior and
replace all uses of ::isprint/std::isprint by a call it llvm::isPrint.
The function is inlined and does not look up language settings so it
should perform better than the standard library's version.
Such a replacement has already been done for isdigit, isalpha, isxdigit
in r314883. gtest does the same in gtest-printers.cc using the following
justification:
// Returns true if c is a printable ASCII character. We test the
// value of c directly instead of calling isprint(), which is buggy on
// Windows Mobile.
inline bool IsPrintableAscii(wchar_t c) {
return 0x20 <= c && c <= 0x7E;
}
Similar issues have also been encountered by Julia:
https://github.com/JuliaLang/julia/issues/7416
I noticed the problem myself when on Windows isprint('\t') started to
evaluate to true (see https://stackoverflow.com/questions/51435249) and
thus caused several unit tests to fail. The result of isprint doesn't
seem to be well-defined even for ASCII characters. Therefore I suggest
to replace isprint by a platform-independent version.
Differential Revision: https://reviews.llvm.org/D49680
llvm-svn: 338034
This patch add support for emitting DWARF5 accelerator tables
(.debug_names) from dsymutil. Just as with the Apple style accelerator
tables, it's possible to update existing dSYMs. This patch includes a
test that show how you can convert back and forth between the two types.
If no kind of table is specified, dsymutil will default to generating
Apple-style accelerator tables whenever it finds those in its input. The
same is true when there are no accelerator tables at all. Finally, in
the remaining case, where there's at least one DWARF v5 table and no
Apple ones, the output will contains a DWARF accelerator tables
(.debug_names).
Differential revision: https://reviews.llvm.org/D49137
llvm-svn: 337980
Helpers are available to make this option file format independant. This
patch adds the feature for Wasm file format. It doesn't change the
behavior of the other file format handling.
Differential Revision: https://reviews.llvm.org/D49545
llvm-svn: 337896
This new JIT event listener supports generating profiling data for
the linux 'perf' profiling tool, allowing it to generate function and
instruction level profiles.
Currently this functionality is not enabled by default, but must be
enabled with LLVM_USE_PERF=yes. Given that the listener has no
dependencies, it might be sensible to enable by default once the
initial issues have been shaken out.
I followed existing precedent in registering the listener by default
in lli. Should there be a decision to enable this by default on linux,
that should probably be changed.
Please note that until https://reviews.llvm.org/D47343 is resolved,
using this functionality with mcjit rather than orcjit will not
reliably work.
Disregarding the previous comment, here's an example:
$ cat /tmp/expensive_loop.c
bool stupid_isprime(uint64_t num)
{
if (num == 2)
return true;
if (num < 1 || num % 2 == 0)
return false;
for(uint64_t i = 3; i < num / 2; i+= 2) {
if (num % i == 0)
return false;
}
return true;
}
int main(int argc, char **argv)
{
int numprimes = 0;
for (uint64_t num = argc; num < 100000; num++)
{
if (stupid_isprime(num))
numprimes++;
}
return numprimes;
}
$ clang -ggdb -S -c -emit-llvm /tmp/expensive_loop.c -o
/tmp/expensive_loop.ll
$ perf record -o perf.data -g -k 1 ./bin/lli -jit-kind=mcjit /tmp/expensive_loop.ll 1
$ perf inject --jit -i perf.data -o perf.jit.data
$ perf report -i perf.jit.data
- 92.59% lli jitted-5881-2.so [.] stupid_isprime
stupid_isprime
main
llvm::MCJIT::runFunction
llvm::ExecutionEngine::runFunctionAsMain
main
__libc_start_main
0x4bf6258d4c544155
+ 0.85% lli ld-2.27.so [.] do_lookup_x
And line-level annotations also work:
│ for(uint64_t i = 3; i < num / 2; i+= 2) {
│1 30: movq $0x3,-0x18(%rbp)
0.03 │1 38: mov -0x18(%rbp),%rax
0.03 │ mov -0x10(%rbp),%rcx
│ shr $0x1,%rcx
3.63 │ ┌──cmp %rcx,%rax
│ ├──jae 6f
│ │ if (num % i == 0)
0.03 │ │ mov -0x10(%rbp),%rax
│ │ xor %edx,%edx
89.00 │ │ divq -0x18(%rbp)
│ │ cmp $0x0,%rdx
0.22 │ │↓ jne 5f
│ │ return false;
│ │ movb $0x0,-0x1(%rbp)
│ │↓ jmp 73
│ │ }
3.22 │1 5f:│↓ jmp 61
│ │ for(uint64_t i = 3; i < num / 2; i+= 2) {
Subscribers: mgorny, llvm-commits
Differential Revision: https://reviews.llvm.org/D44892
llvm-svn: 337789
Add a -debugify-export option to opt. This exports per-pass `debugify`
loss statistics to a file in CSV format.
For some interesting numbers on debug value loss during an -O2 build
of the sqlite3 amalgamation, see the review thread.
Differential Revision: https://reviews.llvm.org/D49003
llvm-svn: 337787
This is a minor cleanup in preparation for a change to export DI
statistics from -check-debugify. To do that, it would be cleaner to have
a dedicated header for the debugify interface.
llvm-svn: 337786
Dynamic section holds a table, so the sh_entsize might be set. As the
dynamic section entry size never changes, we can default it to the size
of a dynamic entry.
Differential Revision: https://reviews.llvm.org/D49619
llvm-svn: 337725
If an error occurs and we write it to stderr, it could appear
before we wrote the mangled name which we're undecorating.
By flushing stdout first, we ensure that the messages are always
sequenced in the correct order.
llvm-svn: 337645
Summary:
Add basic support for --rename-section=old=new to llvm-objcopy.
A full replacement for GNU objcopy requires also modifying flags (i.e. --rename-section=old=new,flag1,flag2); I'd like to keep that in a separate change to keep this simple.
Reviewers: jakehehrlich, alexshap
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D49576
llvm-svn: 337604
This adds initial support for a demangling library (LLVMDemangle)
and tool (llvm-undname) for demangling Microsoft names. This
doesn't cover 100% of cases and there are some known limitations
which I intend to address in followup patches, at least until such
time that we have (near) 100% test coverage matching up with all
of the test cases in clang/test/CodeGenCXX/mangle-ms-*.
Differential Revision: https://reviews.llvm.org/D49552
llvm-svn: 337584
This is a new modernized VS integration installer. It adds a
Visual Studio .sln file which, when built, outputs a VSIX that can
be used to install ourselves as a "real" Visual Studio Extension.
We can even upload this extension to the visual studio marketplace.
This fixes a longstanding problem where we didn't support installing
into VS 2017 and higher. In addition to supporting VS 2017, due
to the way this is written we now longer need to do anything special
to support future versions of VS as well. Everything should
"just work". This also fixes several bugs with our old integration,
such as MSBuild triggering full rebuilds when /Zi was used.
Finally, we add a new UI page called "LLVM" which becomes visible
when the LLVM toolchain is selected. For now this only contains
one option which is the path to clang-cl.exe, but in the future
we can add more things here.
Differential Revision: https://reviews.llvm.org/D42762
llvm-svn: 337572
When output style is GNU and amount of sections is >= SHN_LORESERVE,
llvm-readobj reports zero number of sections instead of actual value.
The patch fixes that.
Differential revision: https://reviews.llvm.org/D49544
llvm-svn: 337462
Imagine we have a file with few sections, and one of them is .foo
with index N != 0.
Problem is that when llvm-objdump is given a -section=.foo parameter
it lists .foo as a section at index 0. That makes impossible to write
test cases which needs to find the index of the particular section,
while ignoring dumping of others.
The patch fixes that.
Differential revision: https://reviews.llvm.org/D49372
llvm-svn: 337361
http://www.sco.com/developers/gabi/2003-12-17/ch4.eheader.html
says that e_shnum and/or e_shstrndx may have special values if
"the number of sections is greater than or equal to SHN_LORESERVE" or
"the section name string table section index is greater than or equal to SHN_LORESERVE (0xff00)"
Previously llvm-readobj was unable to dump such files, patch changes that.
I had to add a precompiled test case because it does not seem possible to
prepare a test using yaml2obj or llvm-mc (not clear how to make .shstrtab
to have index >= SHN_LORESERVE).
Differential revision: https://reviews.llvm.org/D49369
llvm-svn: 337360
Nest any classes not used outside of a file into anon. Nest any classes used
across files in llvm-objcopy into namespace llvm::objcopy.
Differential Revision: https://reviews.llvm.org/D49449
llvm-svn: 337337
This support was partial and temporary. Now that we have
wasm object file support its no longer needed.
Differential Revision: https://reviews.llvm.org/D48744
llvm-svn: 337222
Anywhere in tools/llvm-objcopy where functions or classes are not referenced
outside of a given file, we change things to make the function or class static
or put inside an anonymous namespace.
llvm-svn: 337220
This patch is an update of an older patch that never landed
(see here: https://reviews.llvm.org/D42516)
Recently various users have run into this issue and it just 100%
has to be solved at this point. The main difference in this patch
is that I use gunzip instead of unzip which should hopefully allow
tests to pass. Please review this as if it is a new patch however.
I found some issues along the way and made some minor modifications.
The binary used in this patch for testing (a zip file to make it small)
can be found here:
https://drive.google.com/file/d/1UjsnTO9edLttZibbr-2T1bJl92KEQFAO/view?usp=sharing
Differential Revision: https://reviews.llvm.org/D49206
llvm-svn: 337204
This reverts commit r337081, therefore restoring r337050 (and fix in
r337059), with test fix for bot failure described after the original
description below.
In order to always import the same copy of a linkonce function,
even when encountering it with different thresholds (a higher one then a
lower one), keep track of the summary we decided to import.
This ensures that the backend only gets a single definition to import
for each GUID, so that it doesn't need to choose one.
Move the largest threshold the GUID was considered for import into the
current module out of the ImportMap (which is part of a larger map
maintained across the whole index), and into a new map just maintained
for the current module we are computing imports for. This saves some
memory since we no longer have the thresholds maintained across the
whole index (and throughout the in-process backends when doing a normal
non-distributed ThinLTO build), at the cost of some additional
information being maintained for each invocation of ComputeImportForModule
(the selected summary pointer for each import).
There is an additional map lookup for each callee being considered for
importing, however, this was able to subsume a map lookup in the
Worklist iteration that invokes computeImportForFunction. We also are
able to avoid calling selectCallee if we already failed to import at the
same or higher threshold.
I compared the run time and peak memory for the SPEC2006 471.omnetpp
benchmark (running in-process ThinLTO backends), as well as for a large
internal benchmark with a distributed ThinLTO build (so just looking at
the thin link time/memory). Across a number of runs with and without
this change there was no significant change in the time and memory.
(I tried a few other variations of the change but they also didn't
improve time or peak memory).
The new commit removes a test that no longer makes sense
(Transforms/FunctionImport/hotness_based_import2.ll), as exposed by the
reverse-iteration bot. The test depends on the order of processing the
summary call edges, and actually depended on the old problematic
behavior of selecting more than one summary for a given GUID when
encountered with different thresholds. There was no guarantee even
before that we would eventually pick the linkonce copy with the hottest
call edges, it just happened to work with the test and the old code, and
there was no guarantee that we would end up importing the selected
version of the copy that had the hottest call edges (since the backend
would effectively import only one of the selected copies).
Reviewers: davidxl
Subscribers: mehdi_amini, inglorion, llvm-commits
Differential Revision: https://reviews.llvm.org/D48670
llvm-svn: 337184
As suggested in the review for r337007, this makes cfi-verify abort on unsupported targets instead of producing incorrect results. It also updates the design document to reflect this.
Differential Revision: https://reviews.llvm.org/D49304
llvm-svn: 337181
registers.
The goal of this patch is to improve the throughput analysis in llvm-mca for the
case where instructions perform partial register writes.
On x86, partial register writes are quite difficult to model, mainly because
different processors tend to implement different register merging schemes in
hardware.
When the code contains partial register writes, the IPC (instructions per
cycles) estimated by llvm-mca tends to diverge quite significantly from the
observed IPC (using perf).
Modern AMD processors (at least, from Bulldozer onwards) don't rename partial
registers. Quoting Agner Fog's microarchitecture.pdf:
" The processor always keeps the different parts of an integer register together.
For example, AL and AH are not treated as independent by the out-of-order
execution mechanism. An instruction that writes to part of a register will
therefore have a false dependence on any previous write to the same register or
any part of it."
This patch is a first important step towards improving the analysis of partial
register updates. It changes the semantic of RegisterFile descriptors in
tablegen, and teaches llvm-mca how to identify false dependences in the presence
of partial register writes (for more details: see the new code comments in
include/Target/TargetSchedule.h - class RegisterFile).
This patch doesn't address the case where a write to a part of a register is
followed by a read from the whole register. On Intel chips, high8 registers
(AH/BH/CH/DH)) can be stored in separate physical registers. However, a later
(dirty) read of the full register (example: AX/EAX) triggers a merge uOp, which
adds extra latency (and potentially affects the pipe usage).
This is a very interesting article on the subject with a very informative answer
from Peter Cordes:
https://stackoverflow.com/questions/45660139/how-exactly-do-partial-registers-on-haswell-skylake-perform-writing-al-seems-to
In future, the definition of RegisterFile can be extended with extra information
that may be used to identify delays caused by merge opcodes triggered by a dirty
read of a partial write.
Differential Revision: https://reviews.llvm.org/D49196
llvm-svn: 337123