This patch builds on some existing code to do CFG reconstruction from
a disassembled binary:
- MCModule represents the binary, and has a list of MCAtoms.
- MCAtom represents either disassembled instructions (MCTextAtom), or
contiguous data (MCDataAtom), and covers a specific range of addresses.
- MCBasicBlock and MCFunction form the reconstructed CFG. An MCBB is
backed by an MCTextAtom, and has the usual successors/predecessors.
- MCObjectDisassembler creates a module from an ObjectFile using a
disassembler. It first builds an atom for each section. It can also
construct the CFG, and this splits the text atoms into basic blocks.
MCModule and MCAtom were only sketched out; MCFunction and MCBB were
implemented under the experimental "-cfg" llvm-objdump -macho option.
This cleans them up for further use; llvm-objdump -d -cfg now generates
graphviz files for each function found in the binary.
In the future, MCObjectDisassembler may be the right place to do
"intelligent" disassembly: for example, handling constant islands is just
a matter of splitting the atom, using information that may be available
in the ObjectFile. Also, better initial atom formation than just using
sections is possible using symbols (and things like Mach-O's
function_starts load command).
This brings two minor regressions in llvm-objdump -macho -cfg:
- The printing of a relocation's referenced symbol.
- An annotation on loop BBs, i.e., which are their own successor.
Relocation printing is replaced by the MCSymbolizer; the basic CFG
annotation will be superseded by more related functionality.
llvm-svn: 182628
This is a basic first step towards symbolization of disassembled
instructions. This used to be done using externally provided (C API)
callbacks. This patch introduces:
- the MCSymbolizer class, that mimics the same functions that were used
in the X86 and ARM disassemblers to symbolize immediate operands and
to annotate loads based off PC (for things like c string literals).
- the MCExternalSymbolizer class, which implements the old C API.
- the MCRelocationInfo class, which provides a way for targets to
translate relocations (either object::RelocationRef, or disassembler
C API VariantKinds) to MCExprs.
- the MCObjectSymbolizer class, which does symbolization using what it
finds in an object::ObjectFile. This makes simple symbolization (with
no fancy relocation stuff) work for all object formats!
- x86-64 Mach-O and ELF MCRelocationInfos.
- A basic ARM Mach-O MCRelocationInfo, that provides just enough to
support the C API VariantKinds.
Most of what works in otool (the only user of the old symbolization API
that I know of) for x86-64 symbolic disassembly (-tvV) works, namely:
- symbol references: call _foo; jmp 15 <_foo+50>
- relocations: call _foo-_bar; call _foo-4
- __cf?string: leaq 193(%rip), %rax ## literal pool for "hello"
Stub support is the main missing part (because libObject doesn't know,
among other things, about mach-o indirect symbols).
As for the MCSymbolizer API, instead of relying on the disassemblers
to call the tryAdding* methods, maybe this could be done automagically
using InstrInfo? For instance, even though PC-relative LEAs are used
to get the address of string literals in a typical Mach-O file, a MOV
would be used in an ELF file. And right now, the explicit symbolization
only recognizes PC-relative LEAs. InstrInfo should have already have
most of what is needed to know what to symbolize, so this can
definitely be improved.
I'd also like to remove object::RelocationRef::getValueString (it seems
only used by relocation printing in objdump), as simply printing the
created MCExpr is definitely enough (and cleaner than string concats).
llvm-svn: 182625
Move the processing of the command line options to right before we create the
TargetMachine instead of after.
<rdar://problem/13468287>
llvm-svn: 182611
On 32-bit hosts %p can print garbage when given a uint64_t, we should
use %llx instead. This only affects the output of the debugging text
produced by lli.
llvm-svn: 182209
BitVector/SmallBitVector::reference::operator bool remain implicit since
they model more exactly a bool, rather than something else that can be
boolean tested.
The most common (non-buggy) case are where such objects are used as
return expressions in bool-returning functions or as boolean function
arguments. In those cases I've used (& added if necessary) a named
function to provide the equivalent (or sometimes negative, depending on
convenient wording) test.
One behavior change (YAMLParser) was made, though no test case is
included as I'm not sure how to reach that code path. Essentially any
comparison of llvm::yaml::document_iterators would be invalid if neither
iterator was at the end.
This helped uncover a couple of bugs in Clang - test cases provided for
those in a separate commit along with similar changes to `operator bool`
instances in Clang.
llvm-svn: 181868
EngineBuilder interface required a JITMemoryManager even if it was being used
to construct an MCJIT. But the MCJIT actually wants a RTDyldMemoryManager.
Consequently, the SectionMemoryManager, which is meant for MCJIT, derived
from the JITMemoryManager and then stubbed out a bunch of JITMemoryManager
methods that weren't relevant to the MCJIT.
This patch fixes the situation: it teaches the EngineBuilder that
RTDyldMemoryManager is a supertype of JITMemoryManager, and that it's
appropriate to pass a RTDyldMemoryManager instead of a JITMemoryManager if
we're using the MCJIT. This allows us to remove the stub methods from
SectionMemoryManager, and make SectionMemoryManager a direct subtype of
RTDyldMemoryManager.
llvm-svn: 181820
It was just a less powerful and more confusing version of
MCCFIInstruction. A side effect is that, since MCCFIInstruction uses
dwarf register numbers, calls to getDwarfRegNum are pushed out, which
should allow further simplifications.
I left the MachineModuleInfo::addFrameMove interface unchanged since
this patch was already fairly big.
llvm-svn: 181680
- requires existing debug information to be present
- fixes up file name and line number information in metadata
- emits a "<orig_filename>-debug.ll" succinct IR file (without !dbg metadata
or debug intrinsics) that can be read by a debugger
- initialize pass in opt tool to enable the "-debug-ir" flag
- lit tests to follow
llvm-svn: 181467
The alignment is just a byte in the middle of Characteristics, not an
independent flag. Making it an independent field in the yaml
representation makes it more yamlio friendly.
llvm-svn: 181243
Update comments, fix * placement, fix method names that are not
used in clang, add a linkInModule that takes a Mode and put it
in Linker.cpp.
llvm-svn: 181099
Build attribute sections can now be read if they exist via ELFObjectFile, and
the llvm-readobj tool has been extended with an option to dump this information
if requested. Regression tests are also included which exercise these features.
Also update the docs with a fixed ARM ABI link and a new link to the Addenda
which provides the build attributes specification.
llvm-svn: 181009
For Mach-O there were 2 implementations for parsing object files. A
standalone llvm/Object/MachOObject.h and llvm/Object/MachO.h which
implements the generic interface in llvm/Object/ObjectFile.h.
This patch adds the missing features to MachO.h, moves macho-dump to
use MachO.h and removes ObjectFile.h.
In addition to making sure that check-all is clean, I checked that the
new version produces exactly the same output in all Mach-O files in a
llvm+clang build directory (including executables and shared
libraries).
To test the performance, I ran macho-dump over all the files in a
llvm+clang build directory again, but this time redirecting the output
to /dev/null. Both the old and new versions take about 4.6 seconds
(2.5 user) to finish.
llvm-svn: 180624
getRelocationAddress is for dynamic libraries and executables,
getRelocationOffset for relocatable objects.
Mark the getRelocationAddress of COFF and MachO as not implemented yet. Add a
test of ELF's. llvm-readobj -r now prints the same values as readelf -r.
llvm-svn: 180259
While here, don't report a dummy symbol for relocations that don't have symbols.
We used to says such relocations were for the first defined symbol, but now we
return end_symbols(). The llvm-readobj output change agrees with otool.
llvm-svn: 180214
LTO was always creating an empty llvm.compiler.used. With this patch we
now first check if there is anything to be added first.
Unfortunately, there is no good way to test libLTO in isolation as it needs gold
or ld64, but there are bots doing LTO builds that found this problem.
llvm-svn: 180202
The COFFParser now contains only a COFFYAML::Object and the string table
(which is recomputed, not serialized).
The structs in COFFParser now all begin with a Header field with what is
actually on the COFF object. The other fields are things that are semantically
part of the struct (relocations in a section for exmaple), but are not actually
represented that way in the object file.
llvm-svn: 180134
This is part of a future patch to use yamlio that incorrectly ended up in a
cleanup patch.
Thanks to Benjamin Kramer for reporting it.
llvm-svn: 179938
Instead, use MappingNormalization to directly parse COFF::header. Also change
the naming convention of the helper classes to be a bit shorter.
llvm-svn: 179917
Thanks to Evgeniy Stepanov for reporting this.
It might be a good idea to add a command iterator abstraction to MachO.h, but
this fixes the bug for now.
llvm-svn: 179848