mirror of
https://github.com/RPCS3/llvm-mirror.git
synced 2024-11-23 19:23:23 +01:00
669dd9e6f5
llvm-svn: 155199
2859 lines
96 KiB
HTML
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
|
|
"http://www.w3.org/TR/html4/strict.dtd">
|
|
<html>
|
|
<head>
|
|
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
|
|
<title>Source Level Debugging with LLVM</title>
|
|
<link rel="stylesheet" href="_static/llvm.css" type="text/css">
|
|
</head>
|
|
<body>
|
|
|
|
<h1>Source Level Debugging with LLVM</h1>
|
|
|
|
<table class="layout" style="width:100%">
|
|
<tr class="layout">
|
|
<td class="left">
|
|
<ul>
|
|
<li><a href="#introduction">Introduction</a>
|
|
<ol>
|
|
<li><a href="#phil">Philosophy behind LLVM debugging information</a></li>
|
|
<li><a href="#consumers">Debug information consumers</a></li>
|
|
<li><a href="#debugopt">Debugging optimized code</a></li>
|
|
</ol></li>
|
|
<li><a href="#format">Debugging information format</a>
|
|
<ol>
|
|
<li><a href="#debug_info_descriptors">Debug information descriptors</a>
|
|
<ul>
|
|
<li><a href="#format_compile_units">Compile unit descriptors</a></li>
|
|
<li><a href="#format_files">File descriptors</a></li>
|
|
<li><a href="#format_global_variables">Global variable descriptors</a></li>
|
|
<li><a href="#format_subprograms">Subprogram descriptors</a></li>
|
|
<li><a href="#format_blocks">Block descriptors</a></li>
|
|
<li><a href="#format_basic_type">Basic type descriptors</a></li>
|
|
<li><a href="#format_derived_type">Derived type descriptors</a></li>
|
|
<li><a href="#format_composite_type">Composite type descriptors</a></li>
|
|
<li><a href="#format_subrange">Subrange descriptors</a></li>
|
|
<li><a href="#format_enumeration">Enumerator descriptors</a></li>
|
|
<li><a href="#format_variables">Local variables</a></li>
|
|
</ul></li>
|
|
<li><a href="#format_common_intrinsics">Debugger intrinsic functions</a>
|
|
<ul>
|
|
<li><a href="#format_common_declare">llvm.dbg.declare</a></li>
|
|
<li><a href="#format_common_value">llvm.dbg.value</a></li>
|
|
</ul></li>
|
|
</ol></li>
|
|
<li><a href="#format_common_lifetime">Object lifetimes and scoping</a></li>
|
|
<li><a href="#ccxx_frontend">C/C++ front-end specific debug information</a>
|
|
<ol>
|
|
<li><a href="#ccxx_compile_units">C/C++ source file information</a></li>
|
|
<li><a href="#ccxx_global_variable">C/C++ global variable information</a></li>
|
|
<li><a href="#ccxx_subprogram">C/C++ function information</a></li>
|
|
<li><a href="#ccxx_basic_types">C/C++ basic types</a></li>
|
|
<li><a href="#ccxx_derived_types">C/C++ derived types</a></li>
|
|
<li><a href="#ccxx_composite_types">C/C++ struct/union types</a></li>
|
|
<li><a href="#ccxx_enumeration_types">C/C++ enumeration types</a></li>
|
|
</ol></li>
|
|
<li><a href="#llvmdwarfextension">LLVM Dwarf Extensions</a>
|
|
<ol>
|
|
<li><a href="#objcproperty">Debugging Information Extension
|
|
for Objective C Properties</a>
|
|
<ul>
|
|
<li><a href="#objcpropertyintroduction">Introduction</a></li>
|
|
<li><a href="#objcpropertyproposal">Proposal</a></li>
|
|
<li><a href="#objcpropertynewattributes">New DWARF Attributes</a></li>
|
|
<li><a href="#objcpropertynewconstants">New DWARF Constants</a></li>
|
|
</ul>
|
|
</li>
|
|
<li><a href="#acceltable">Name Accelerator Tables</a>
|
|
<ul>
|
|
<li><a href="#acceltableintroduction">Introduction</a></li>
|
|
<li><a href="#acceltablehashes">Hash Tables</a></li>
|
|
<li><a href="#acceltabledetails">Details</a></li>
|
|
<li><a href="#acceltablecontents">Contents</a></li>
|
|
<li><a href="#acceltableextensions">Language Extensions and File Format Changes</a></li>
|
|
</ul>
|
|
</li>
|
|
</ol>
|
|
</li>
|
|
</ul>
|
|
</td>
|
|
</tr></table>
|
|
|
|
<div class="doc_author">
|
|
<p>Written by <a href="mailto:sabre@nondot.org">Chris Lattner</a>
|
|
and <a href="mailto:jlaskey@mac.com">Jim Laskey</a></p>
|
|
</div>
|
|
|
|
|
|
<!-- *********************************************************************** -->
|
|
<h2><a name="introduction">Introduction</a></h2>
|
|
<!-- *********************************************************************** -->
|
|
|
|
<div>
|
|
|
|
<p>This document is the central repository for all information pertaining to
|
|
debug information in LLVM. It describes the <a href="#format">actual format
|
|
that the LLVM debug information</a> takes, which is useful for those
|
|
interested in creating front-ends or dealing directly with the information.
|
|
Further, this document provides specific examples of what debug information
|
|
for C/C++ looks like.</p>
|
|
|
|
<!-- ======================================================================= -->
|
|
<h3>
|
|
<a name="phil">Philosophy behind LLVM debugging information</a>
|
|
</h3>
|
|
|
|
<div>
|
|
|
|
<p>The idea of the LLVM debugging information is to capture how the important
|
|
pieces of the source-language's Abstract Syntax Tree map onto LLVM code.
|
|
Several design aspects have shaped the solution that appears here. The
|
|
important ones are:</p>
|
|
|
|
<ul>
|
|
<li>Debugging information should have very little impact on the rest of the
|
|
compiler. No transformations, analyses, or code generators should need to
|
|
be modified because of debugging information.</li>
|
|
|
|
<li>LLVM optimizations should interact in <a href="#debugopt">well-defined and
|
|
easily described ways</a> with the debugging information.</li>
|
|
|
|
<li>Because LLVM is designed to support arbitrary programming languages,
|
|
LLVM-to-LLVM tools should not need to know anything about the semantics of
|
|
the source-level-language.</li>
|
|
|
|
<li>Source-level languages are often <b>widely</b> different from one another.
LLVM should not put any restrictions on the flavor of the source-language,
and the debugging information should work with any language.</li>
|
|
|
|
<li>With code generator support, it should be possible to use an LLVM compiler
|
|
to compile a program to native machine code and standard debugging
|
|
formats. This allows compatibility with traditional machine-code level
|
|
debuggers, like GDB or DBX.</li>
|
|
</ul>
|
|
|
|
<p>The approach used by the LLVM implementation is to use a small set
|
|
of <a href="#format_common_intrinsics">intrinsic functions</a> to define a
|
|
mapping between LLVM program objects and the source-level objects. The
|
|
description of the source-level program is maintained in LLVM metadata
|
|
in an <a href="#ccxx_frontend">implementation-defined format</a>
|
|
(the C/C++ front-end currently uses working draft 7 of
|
|
the <a href="http://www.eagercon.com/dwarf/dwarf3std.htm">DWARF 3
|
|
standard</a>).</p>
|
|
|
|
<p>When a program is being debugged, a debugger interacts with the user and
|
|
turns the stored debug information into source-language specific information.
|
|
As such, a debugger must be aware of the source-language, and is thus tied to
|
|
a specific language or family of languages.</p>
|
|
|
|
</div>
|
|
|
|
<!-- ======================================================================= -->
|
|
<h3>
|
|
<a name="consumers">Debug information consumers</a>
|
|
</h3>
|
|
|
|
<div>
|
|
|
|
<p>The role of debug information is to provide meta information normally
|
|
stripped away during the compilation process. This meta information provides
|
|
an LLVM user a relationship between generated code and the original program
|
|
source code.</p>
|
|
|
|
<p>Currently, debug information is consumed by DwarfDebug to produce DWARF
information used by the GDB debugger. Other targets could use the same
information to produce stabs or other debug forms.</p>
|
|
|
|
<p>It would also be reasonable to use debug information to feed profiling tools
for analysis of generated code, or tools for reconstructing the original
source from generated code.</p>
|
|
|
|
<p>TODO - expound a bit more.</p>
|
|
|
|
</div>
|
|
|
|
<!-- ======================================================================= -->
|
|
<h3>
|
|
<a name="debugopt">Debugging optimized code</a>
|
|
</h3>
|
|
|
|
<div>
|
|
|
|
<p>An extremely high priority of LLVM debugging information is to make it
|
|
interact well with optimizations and analysis. In particular, the LLVM debug
|
|
information provides the following guarantees:</p>
|
|
|
|
<ul>
|
|
<li>LLVM debug information <b>always provides information to accurately read
|
|
the source-level state of the program</b>, regardless of which LLVM
|
|
optimizations have been run, and without any modification to the
|
|
optimizations themselves. However, some optimizations may impact the
|
|
ability to modify the current state of the program with a debugger, such
|
|
as setting program variables, or calling functions that have been
|
|
deleted.</li>
|
|
|
|
<li>As desired, LLVM optimizations can be upgraded to be aware of the LLVM
|
|
debugging information, allowing them to update the debugging information
|
|
as they perform aggressive optimizations. This means that, with effort,
|
|
the LLVM optimizers could optimize debug code just as well as non-debug
|
|
code.</li>
|
|
|
|
<li>LLVM debug information does not prevent optimizations from
|
|
happening (for example inlining, basic block reordering/merging/cleanup,
|
|
tail duplication, etc).</li>
|
|
|
|
<li>LLVM debug information is automatically optimized along with the rest of
|
|
the program, using existing facilities. For example, duplicate
|
|
information is automatically merged by the linker, and unused information
|
|
is automatically removed.</li>
|
|
</ul>
|
|
|
|
<p>Basically, the debug information allows you to compile a program with
"<tt>-O0 -g</tt>" and get full debug information, allowing you to arbitrarily
modify the program as it executes from a debugger. Compiling a program with
"<tt>-O3 -g</tt>" gives you full debug information that is always available
and accurate for reading (e.g., you get accurate stack traces despite tail
call elimination and inlining), but you might lose the ability to modify the
program and call functions which were optimized out of the program, or
inlined away completely.</p>
|
|
|
|
<p>The <a href="TestingGuide.html#quicktestsuite">LLVM test suite</a> provides a
framework to test the optimizer's handling of debugging information. It can be
run like this:</p>
|
|
|
|
<div class="doc_code">
|
|
<pre>
|
|
% cd llvm/projects/test-suite/MultiSource/Benchmarks # or some other level
|
|
% make TEST=dbgopt
|
|
</pre>
|
|
</div>
|
|
|
|
<p>This will test the impact of debugging information on optimization passes. If
debugging information influences optimization passes, it will be reported
as a failure. See the <a href="TestingGuide.html">TestingGuide</a> for more
information on the LLVM test infrastructure and how to run various tests.</p>
|
|
|
|
</div>
|
|
|
|
</div>
|
|
|
|
<!-- *********************************************************************** -->
|
|
<h2>
|
|
<a name="format">Debugging information format</a>
|
|
</h2>
|
|
<!-- *********************************************************************** -->
|
|
|
|
<div>
|
|
|
|
<p>LLVM debugging information has been carefully designed to make it possible
|
|
for the optimizer to optimize the program and debugging information without
|
|
necessarily having to know anything about debugging information. In
|
|
particular, the use of metadata avoids duplicated debugging information from
|
|
the beginning, and the global dead code elimination pass automatically
|
|
deletes debugging information for a function if it decides to delete the
|
|
function. </p>
|
|
|
|
<p>To do this, most of the debugging information (descriptors for types,
|
|
variables, functions, source files, etc) is inserted by the language
|
|
front-end in the form of LLVM metadata. </p>
|
|
|
|
<p>Debug information is designed to be agnostic about the target debugger and
|
|
debugging information representation (e.g. DWARF/Stabs/etc). It uses a
|
|
generic pass to decode the information that represents variables, types,
|
|
functions, namespaces, etc: this allows for arbitrary source-language
|
|
semantics and type-systems to be used, as long as there is a module
|
|
written for the target debugger to interpret the information. </p>
|
|
|
|
<p>To provide basic functionality, the LLVM debugger does have to make some
|
|
assumptions about the source-level language being debugged, though it keeps
|
|
these to a minimum. The only common features that the LLVM debugger assumes
|
|
exist are <a href="#format_files">source files</a>,
|
|
and <a href="#format_global_variables">program objects</a>. These abstract
|
|
objects are used by a debugger to form stack traces, show information about
|
|
local variables, etc.</p>
|
|
|
|
<p>This section of the documentation first describes the representation aspects
|
|
common to any source-language. The <a href="#ccxx_frontend">next section</a>
|
|
describes the data layout conventions used by the C and C++ front-ends.</p>
|
|
|
|
<!-- ======================================================================= -->
|
|
<h3>
|
|
<a name="debug_info_descriptors">Debug information descriptors</a>
|
|
</h3>
|
|
|
|
<div>
|
|
|
|
<p>In consideration of the complexity and volume of debug information, LLVM
|
|
provides a specification for well formed debug descriptors. </p>
|
|
|
|
<p>Consumers of LLVM debug information expect the descriptors for program
objects to start in a canonical format, but the descriptors can include
additional information appended at the end that is source-language
specific. All LLVM debugging information is versioned, allowing backwards
compatibility in the case that the core structures need to change in some
way. Also, all debugging information objects start with a tag to indicate
what type of object it is. The source-language is allowed to define its own
objects, by using unreserved tag numbers. We recommend using tags in
the range 0x1000 through 0x2000 (there is a defined enum DW_TAG_user_base =
0x1000).</p>
|
|
|
|
<p>The fields of debug descriptors used internally by LLVM
|
|
are restricted to only the simple data types <tt>i32</tt>, <tt>i1</tt>,
|
|
<tt>float</tt>, <tt>double</tt>, <tt>mdstring</tt> and <tt>mdnode</tt>. </p>
|
|
|
|
<div class="doc_code">
|
|
<pre>
|
|
!1 = metadata !{
|
|
i32, ;; A tag
|
|
...
|
|
}
|
|
</pre>
|
|
</div>
|
|
|
|
<p><a name="LLVMDebugVersion">The first field of a descriptor is always an
<tt>i32</tt> containing a tag value identifying the content of the
descriptor. The remaining fields are specific to the descriptor. The values
of tags are loosely bound to the tag values of DWARF information entries.
However, that does not restrict the use of the information supplied to DWARF
targets. To facilitate versioning of debug information, the tag is augmented
with the current debug version (LLVMDebugVersion = 8 &lt;&lt; 16 or
0x80000 or 524288).</a></p>
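<p>As a worked example of this arithmetic, using only values stated in this
document, the tag field of a compile unit descriptor can be derived as shown
here. The masking step at the end is a sketch that assumes the version occupies
the bits above the low 16, which is what the shift by 16 suggests.</p>

<div class="doc_code">
<pre>
LLVMDebugVersion     = 8 &lt;&lt; 16      = 0x80000 = 524288
DW_TAG_compile_unit  = 17           = 0x00011
Tag field (i32)      = 17 + 524288  = 0x80011 = 524305

;; Splitting a tag field back into its parts (assumed layout, see above):
524305 &amp; 0x0000FFFF  = 17       ;; DWARF tag
524305 &amp; 0xFFFF0000  = 524288   ;; debug version
</pre>
</div>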
|
|
|
|
<p>The details of the various descriptors follow.</p>
|
|
|
|
<!-- ======================================================================= -->
|
|
<h4>
|
|
<a name="format_compile_units">Compile unit descriptors</a>
|
|
</h4>
|
|
|
|
<div>
|
|
|
|
<div class="doc_code">
|
|
<pre>
|
|
!0 = metadata !{
  i32,       ;; Tag = 17 + <a href="#LLVMDebugVersion">LLVMDebugVersion</a>
             ;; (DW_TAG_compile_unit)
  i32,       ;; Unused field.
  i32,       ;; DWARF language identifier (ex. DW_LANG_C89)
  metadata,  ;; Source file name
  metadata,  ;; Source file directory (includes trailing slash)
  metadata,  ;; Producer (ex. "4.0.1 LLVM (LLVM research group)")
  i1,        ;; True if this is a main compile unit.
  i1,        ;; True if this is optimized.
  metadata,  ;; Flags
  i32,       ;; Runtime version
  metadata,  ;; List of enum types
  metadata,  ;; List of retained types
  metadata,  ;; List of subprograms
  metadata   ;; List of global variables
}
|
|
</pre>
|
|
</div>
|
|
|
|
<p>These descriptors contain a source language ID for the file (we use the DWARF
|
|
3.0 ID numbers, such as <tt>DW_LANG_C89</tt>, <tt>DW_LANG_C_plus_plus</tt>,
|
|
<tt>DW_LANG_Cobol74</tt>, etc), three strings describing the filename,
|
|
working directory of the compiler, and an identifier string for the compiler
|
|
that produced it.</p>
|
|
|
|
<p>Compile unit descriptors provide the root context for objects declared in a
specific compilation unit. File descriptors are defined using this context.
These descriptors are collected by a named metadata
<tt>!llvm.dbg.cu</tt>. Each compile unit descriptor keeps track of subprograms,
global variables and type information.</p>
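<p>A minimal sketch of a filled-in compile unit descriptor, following the field
order listed above. The file name, directory and producer string are
hypothetical, and the way empty lists are encoded (<tt>!1</tt> here) is an
assumption made for illustration only.</p>

<div class="doc_code">
<pre>
!llvm.dbg.cu = !{!0}               ;; Named metadata collecting compile units

!0 = metadata !{
  i32 524305,                      ;; Tag = 17 + LLVMDebugVersion (DW_TAG_compile_unit)
  i32 0,                           ;; Unused field
  i32 1,                           ;; DW_LANG_C89
  metadata !"hello.c",             ;; Source file name (hypothetical)
  metadata !"/home/user/src/",     ;; Source file directory (hypothetical)
  metadata !"example producer",    ;; Producer (hypothetical)
  i1 true,                         ;; Main compile unit
  i1 false,                        ;; Not optimized
  metadata !"",                    ;; Flags
  i32 0,                           ;; Runtime version
  metadata !1,                     ;; List of enum types
  metadata !1,                     ;; List of retained types
  metadata !1,                     ;; List of subprograms
  metadata !1                      ;; List of global variables
}
!1 = metadata !{i32 0}             ;; Stand-in for an empty list (assumption)
</pre>
</div>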
|
|
|
|
</div>
|
|
|
|
<!-- ======================================================================= -->
|
|
<h4>
|
|
<a name="format_files">File descriptors</a>
|
|
</h4>
|
|
|
|
<div>
|
|
|
|
<div class="doc_code">
|
|
<pre>
|
|
!0 = metadata !{
|
|
i32, ;; Tag = 41 + <a href="#LLVMDebugVersion">LLVMDebugVersion</a>
|
|
;; (DW_TAG_file_type)
|
|
metadata, ;; Source file name
|
|
metadata, ;; Source file directory (includes trailing slash)
|
|
metadata ;; Unused
|
|
}
|
|
</pre>
|
|
</div>
|
|
|
|
<p>These descriptors contain information for a file. Global variables and top
level functions would be defined using this context. File descriptors also
provide context for source line correspondence. </p>
|
|
|
|
<p>Each input file is encoded as a separate file descriptor in LLVM debugging
|
|
information output. </p>
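<p>As an illustration, a file descriptor for a hypothetical source file might
look like the following sketch. The file name and directory are invented, and
<tt>null</tt> is assumed to be an acceptable value for the unused field.</p>

<div class="doc_code">
<pre>
!1 = metadata !{
  i32 524329,                   ;; Tag = 41 + LLVMDebugVersion (DW_TAG_file_type)
  metadata !"hello.c",          ;; Source file name (hypothetical)
  metadata !"/home/user/src/",  ;; Source file directory (hypothetical)
  null                          ;; Unused (assumed to be null here)
}
</pre>
</div>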
|
|
|
|
</div>
|
|
|
|
<!-- ======================================================================= -->
|
|
<h4>
|
|
<a name="format_global_variables">Global variable descriptors</a>
|
|
</h4>
|
|
|
|
<div>
|
|
|
|
<div class="doc_code">
|
|
<pre>
|
|
!1 = metadata !{
|
|
i32, ;; Tag = 52 + <a href="#LLVMDebugVersion">LLVMDebugVersion</a>
|
|
;; (DW_TAG_variable)
|
|
i32, ;; Unused field.
|
|
metadata, ;; Reference to context descriptor
|
|
metadata, ;; Name
|
|
metadata, ;; Display name (fully qualified C++ name)
|
|
metadata, ;; MIPS linkage name (for C++)
|
|
metadata, ;; Reference to file where defined
|
|
i32, ;; Line number where defined
|
|
metadata, ;; Reference to type descriptor
|
|
i1, ;; True if the global is local to compile unit (static)
|
|
i1, ;; True if the global is defined in the compile unit (not extern)
|
|
{}* ;; Reference to the global variable
|
|
}
|
|
</pre>
|
|
</div>
|
|
|
|
<p>These descriptors provide debug information about global variables. They
provide details such as name, type and where the variable is defined. All
global variables are collected inside the named metadata
<tt>!llvm.dbg.cu</tt>.</p>
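<p>A sketch of a global variable descriptor for a hypothetical C global
<tt>int counter;</tt>. The descriptors <tt>!1</tt> (a file descriptor) and
<tt>!4</tt> (a basic type descriptor for <tt>int</tt>) are assumed to exist
elsewhere in the module.</p>

<div class="doc_code">
<pre>
@counter = global i32 0

!2 = metadata !{
  i32 524340,            ;; Tag = 52 + LLVMDebugVersion (DW_TAG_variable)
  i32 0,                 ;; Unused field
  metadata !1,           ;; Context (here, the file descriptor)
  metadata !"counter",   ;; Name
  metadata !"counter",   ;; Display name
  metadata !"",          ;; Linkage name (empty for C)
  metadata !1,           ;; File where defined
  i32 3,                 ;; Line number where defined (hypothetical)
  metadata !4,           ;; Type descriptor
  i1 false,              ;; Not local to the compile unit (not static)
  i1 true,               ;; Defined in this compile unit (not extern)
  i32* @counter          ;; Reference to the global variable
}
</pre>
</div>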
|
|
|
|
</div>
|
|
|
|
<!-- ======================================================================= -->
|
|
<h4>
|
|
<a name="format_subprograms">Subprogram descriptors</a>
|
|
</h4>
|
|
|
|
<div>
|
|
|
|
<div class="doc_code">
|
|
<pre>
|
|
!2 = metadata !{
  i32,       ;; Tag = 46 + <a href="#LLVMDebugVersion">LLVMDebugVersion</a>
             ;; (DW_TAG_subprogram)
  i32,       ;; Unused field.
  metadata,  ;; Reference to context descriptor
  metadata,  ;; Name
  metadata,  ;; Display name (fully qualified C++ name)
  metadata,  ;; MIPS linkage name (for C++)
  metadata,  ;; Reference to file where defined
  i32,       ;; Line number where defined
  metadata,  ;; Reference to type descriptor
  i1,        ;; True if the subprogram is local to compile unit (static)
  i1,        ;; True if the subprogram is defined in the compile unit (not extern)
  i32,       ;; Line number where the scope of the subprogram begins
  i32,       ;; Virtuality, e.g. dwarf::DW_VIRTUALITY__virtual
  i32,       ;; Index into a virtual function
  metadata,  ;; Indicates which base type contains the vtable pointer for the
             ;; derived class
  i32,       ;; Flags - Artificial, Private, Protected, Explicit, Prototyped.
  i1,        ;; isOptimized
  Function *,;; Pointer to LLVM function
  metadata,  ;; Lists function template parameters
  metadata,  ;; Function declaration descriptor
  metadata   ;; List of function variables
}
|
|
</pre>
|
|
</div>
|
|
|
|
<p>These descriptors provide debug information about functions, methods and
|
|
subprograms. They provide details such as name, return types and the source
|
|
location where the subprogram is defined.
|
|
</p>
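<p>The following sketch fills in the fields in the order listed above for a
hypothetical C function <tt>int main()</tt>. The referenced descriptors
(<tt>!1</tt> for the file, <tt>!5</tt> for a subroutine type) and the line
numbers are assumptions, as is the use of <tt>null</tt> for the references
that do not apply.</p>

<div class="doc_code">
<pre>
define i32 @main() {
entry:
  ret i32 0
}

!3 = metadata !{
  i32 524334,        ;; Tag = 46 + LLVMDebugVersion (DW_TAG_subprogram)
  i32 0,             ;; Unused field
  metadata !1,       ;; Context
  metadata !"main",  ;; Name
  metadata !"main",  ;; Display name
  metadata !"",      ;; Linkage name (empty for C)
  metadata !1,       ;; File where defined
  i32 5,             ;; Line number where defined (hypothetical)
  metadata !5,       ;; Type descriptor (a DW_TAG_subroutine_type composite)
  i1 false,          ;; Not local to the compile unit
  i1 true,           ;; Defined in this compile unit
  i32 5,             ;; Line number where the scope begins
  i32 0,             ;; Virtuality (none)
  i32 0,             ;; Virtual function index
  null,              ;; No base type holding a vtable pointer
  i32 0,             ;; Flags (none set)
  i1 false,          ;; isOptimized
  i32 ()* @main,     ;; Pointer to the LLVM function
  null,              ;; No template parameters
  null,              ;; No separate declaration descriptor
  null               ;; No list of function variables
}
</pre>
</div>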
|
|
|
|
</div>
|
|
|
|
<!-- ======================================================================= -->
|
|
<h4>
|
|
<a name="format_blocks">Block descriptors</a>
|
|
</h4>
|
|
|
|
<div>
|
|
|
|
<div class="doc_code">
|
|
<pre>
|
|
!3 = metadata !{
|
|
i32, ;; Tag = 11 + <a href="#LLVMDebugVersion">LLVMDebugVersion</a> (DW_TAG_lexical_block)
|
|
metadata,;; Reference to context descriptor
|
|
i32, ;; Line number
|
|
i32, ;; Column number
|
|
metadata,;; Reference to source file
|
|
i32 ;; Unique ID to identify blocks from a template function
|
|
}
|
|
</pre>
|
|
</div>
|
|
|
|
<p>This descriptor provides debug information about nested blocks within a
subprogram. The line number and column numbers are used to distinguish
two lexical blocks at the same depth. </p>
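<p>For instance, two sibling blocks inside the same subprogram could be
described as in this sketch. The context <tt>!3</tt> (a subprogram), the file
<tt>!1</tt> and the source positions are hypothetical. The two descriptors
differ only in their line and column fields, which is what keeps them
distinct.</p>

<div class="doc_code">
<pre>
!6 = metadata !{
  i32 524299,    ;; Tag = 11 + LLVMDebugVersion (DW_TAG_lexical_block)
  metadata !3,   ;; Context: the enclosing subprogram
  i32 7,         ;; Line number of the first block
  i32 3,         ;; Column number
  metadata !1,   ;; Source file
  i32 0          ;; Unique ID
}
!7 = metadata !{
  i32 524299,    ;; Same tag and same context as !6
  metadata !3,
  i32 12,        ;; A different line distinguishes the second block
  i32 3,
  metadata !1,
  i32 0
}
</pre>
</div>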
|
|
|
|
<div class="doc_code">
|
|
<pre>
|
|
!3 = metadata !{
  i32,       ;; Tag = 11 + <a href="#LLVMDebugVersion">LLVMDebugVersion</a> (DW_TAG_lexical_block)
  metadata,  ;; Reference to the scope we're annotating with a file change
  metadata   ;; Reference to the file the scope is enclosed in.
}
|
|
</pre>
|
|
</div>
|
|
|
|
<p>This descriptor provides a wrapper around a lexical scope to handle file
|
|
changes in the middle of a lexical block.</p>
|
|
|
|
</div>
|
|
|
|
<!-- ======================================================================= -->
|
|
<h4>
|
|
<a name="format_basic_type">Basic type descriptors</a>
|
|
</h4>
|
|
|
|
<div>
|
|
|
|
<div class="doc_code">
|
|
<pre>
|
|
!4 = metadata !{
|
|
i32, ;; Tag = 36 + <a href="#LLVMDebugVersion">LLVMDebugVersion</a>
|
|
;; (DW_TAG_base_type)
|
|
metadata, ;; Reference to context
|
|
metadata, ;; Name (may be "" for anonymous types)
|
|
metadata, ;; Reference to file where defined (may be NULL)
|
|
i32, ;; Line number where defined (may be 0)
|
|
i64, ;; Size in bits
|
|
i64, ;; Alignment in bits
|
|
i64, ;; Offset in bits
|
|
i32, ;; Flags
|
|
i32 ;; DWARF type encoding
|
|
}
|
|
</pre>
|
|
</div>
|
|
|
|
<p>These descriptors define primitive types used in the code, for example int,
bool and float. The context provides the scope of the type, which is usually
the top level. Since basic types are not usually user defined, the context
and line number can be left as NULL and 0. The size, alignment and offset
are expressed in bits and can be 64 bit values. The alignment is used to
round the offset when embedded in a
<a href="#format_composite_type">composite type</a> (for example, to keep
doubles on 64 bit boundaries). The offset is the bit offset if embedded in
a <a href="#format_composite_type">composite type</a>.</p>
|
|
|
|
<p>The type encoding provides the details of the type. The values are typically
|
|
one of the following:</p>
|
|
|
|
<div class="doc_code">
|
|
<pre>
|
|
DW_ATE_address = 1
|
|
DW_ATE_boolean = 2
|
|
DW_ATE_float = 4
|
|
DW_ATE_signed = 5
|
|
DW_ATE_signed_char = 6
|
|
DW_ATE_unsigned = 7
|
|
DW_ATE_unsigned_char = 8
|
|
</pre>
|
|
</div>
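<p>Putting the layout and the encodings together, a basic type descriptor for
a C <tt>int</tt> might look like the sketch below. It assumes a target where
<tt>int</tt> is 32 bits wide and 32-bit aligned, and it assumes <tt>null</tt>
is how the NULL context and file are written.</p>

<div class="doc_code">
<pre>
!4 = metadata !{
  i32 524324,        ;; Tag = 36 + LLVMDebugVersion (DW_TAG_base_type)
  null,              ;; Context (left as NULL for a basic type)
  metadata !"int",   ;; Name
  null,              ;; File where defined (NULL)
  i32 0,             ;; Line number where defined (0)
  i64 32,            ;; Size in bits (assumed target)
  i64 32,            ;; Alignment in bits (assumed target)
  i64 0,             ;; Offset in bits
  i32 0,             ;; Flags
  i32 5              ;; DWARF type encoding: DW_ATE_signed
}
</pre>
</div>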
|
|
|
|
</div>
|
|
|
|
<!-- ======================================================================= -->
|
|
<h4>
|
|
<a name="format_derived_type">Derived type descriptors</a>
|
|
</h4>
|
|
|
|
<div>
|
|
|
|
<div class="doc_code">
|
|
<pre>
|
|
!5 = metadata !{
|
|
i32, ;; Tag (see below)
|
|
metadata, ;; Reference to context
|
|
metadata, ;; Name (may be "" for anonymous types)
|
|
metadata, ;; Reference to file where defined (may be NULL)
|
|
i32, ;; Line number where defined (may be 0)
|
|
i64, ;; Size in bits
|
|
i64, ;; Alignment in bits
|
|
i64, ;; Offset in bits
|
|
i32, ;; Flags to encode attributes, e.g. private
|
|
metadata, ;; Reference to type derived from
|
|
metadata, ;; (optional) Name of the Objective C property associated with
;; an Objective-C ivar
|
|
metadata, ;; (optional) Name of the Objective C property getter selector.
|
|
metadata, ;; (optional) Name of the Objective C property setter selector.
|
|
i32 ;; (optional) Objective C property attributes.
|
|
}
|
|
</pre>
|
|
</div>
|
|
|
|
<p>These descriptors are used to define types derived from other types. The
|
|
value of the tag varies depending on the meaning. The following are possible
|
|
tag values:</p>
|
|
|
|
<div class="doc_code">
|
|
<pre>
|
|
DW_TAG_formal_parameter = 5
|
|
DW_TAG_member = 13
|
|
DW_TAG_pointer_type = 15
|
|
DW_TAG_reference_type = 16
|
|
DW_TAG_typedef = 22
|
|
DW_TAG_const_type = 38
|
|
DW_TAG_volatile_type = 53
|
|
DW_TAG_restrict_type = 55
|
|
</pre>
|
|
</div>
|
|
|
|
<p><tt>DW_TAG_member</tt> is used to define a member of
|
|
a <a href="#format_composite_type">composite type</a>
|
|
or <a href="#format_subprograms">subprogram</a>. The type of the member is
|
|
the <a href="#format_derived_type">derived
|
|
type</a>. <tt>DW_TAG_formal_parameter</tt> is used to define a member which
|
|
is a formal argument of a subprogram.</p>
|
|
|
|
<p><tt>DW_TAG_typedef</tt> is used to provide a name for the derived type.</p>
|
|
|
|
<p><tt>DW_TAG_pointer_type</tt>, <tt>DW_TAG_reference_type</tt>,
|
|
<tt>DW_TAG_const_type</tt>, <tt>DW_TAG_volatile_type</tt> and
|
|
<tt>DW_TAG_restrict_type</tt> are used to qualify
|
|
the <a href="#format_derived_type">derived type</a>. </p>
|
|
|
|
<p><a href="#format_derived_type">Derived type</a> location can be determined
from the context and line number. The size, alignment and offset are
expressed in bits and can be 64 bit values. The alignment is used to round
the offset when embedded in a <a href="#format_composite_type">composite
type</a> (for example, to keep doubles on 64 bit boundaries). The offset is
the bit offset if embedded in a <a href="#format_composite_type">composite
type</a>.</p>
|
|
|
|
<p>Note that the <tt>void *</tt> type is expressed as a type derived from NULL.
|
|
</p>
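<p>A sketch of two pointer type descriptors, assuming a target with 64-bit
pointers and omitting the optional Objective C fields. The first derives from
a hypothetical basic type descriptor <tt>!4</tt> for <tt>int</tt>. The second
shows <tt>void *</tt> as a pointer derived from NULL, written here as
<tt>null</tt>.</p>

<div class="doc_code">
<pre>
;; int *
!8 = metadata !{
  i32 524303,    ;; Tag = 15 + LLVMDebugVersion (DW_TAG_pointer_type)
  null,          ;; Context
  metadata !"",  ;; Name (anonymous)
  null,          ;; File where defined
  i32 0,         ;; Line number where defined
  i64 64,        ;; Size in bits (assumed target)
  i64 64,        ;; Alignment in bits (assumed target)
  i64 0,         ;; Offset in bits
  i32 0,         ;; Flags
  metadata !4    ;; Type derived from: int
}

;; void *
!9 = metadata !{
  i32 524303, null, metadata !"", null, i32 0,
  i64 64, i64 64, i64 0, i32 0,
  null           ;; Derived from NULL
}
</pre>
</div>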
|
|
|
|
</div>
|
|
|
|
<!-- ======================================================================= -->
|
|
<h4>
|
|
<a name="format_composite_type">Composite type descriptors</a>
|
|
</h4>
|
|
|
|
<div>
|
|
|
|
<div class="doc_code">
|
|
<pre>
|
|
!6 = metadata !{
|
|
i32, ;; Tag (see below)
|
|
metadata, ;; Reference to context
|
|
metadata, ;; Name (may be "" for anonymous types)
|
|
metadata, ;; Reference to file where defined (may be NULL)
|
|
i32, ;; Line number where defined (may be 0)
|
|
i64, ;; Size in bits
|
|
i64, ;; Alignment in bits
|
|
i64, ;; Offset in bits
|
|
i32, ;; Flags
|
|
metadata, ;; Reference to type derived from
|
|
metadata, ;; Reference to array of member descriptors
|
|
i32 ;; Runtime languages
|
|
}
|
|
</pre>
|
|
</div>
|
|
|
|
<p>These descriptors are used to define types that are composed of 0 or more
|
|
elements. The value of the tag varies depending on the meaning. The following
|
|
are possible tag values:</p>
|
|
|
|
<div class="doc_code">
|
|
<pre>
|
|
DW_TAG_array_type = 1
|
|
DW_TAG_enumeration_type = 4
|
|
DW_TAG_structure_type = 19
|
|
DW_TAG_union_type = 23
|
|
DW_TAG_vector_type = 259
|
|
DW_TAG_subroutine_type = 21
|
|
DW_TAG_inheritance = 28
|
|
</pre>
|
|
</div>
|
|
|
|
<p>The vector flag indicates that an array type is a native packed vector.</p>
|
|
|
|
<p>The members of array types (tag = <tt>DW_TAG_array_type</tt>) or vector types
|
|
(tag = <tt>DW_TAG_vector_type</tt>) are <a href="#format_subrange">subrange
|
|
descriptors</a>, each representing the range of subscripts at that level of
|
|
indexing.</p>
|
|
|
|
<p>The members of enumeration types (tag = <tt>DW_TAG_enumeration_type</tt>) are
<a href="#format_enumeration">enumerator descriptors</a>, each representing
the definition of an enumeration value for the set. All enumeration type
descriptors are collected inside the named metadata
<tt>!llvm.dbg.cu</tt>.</p>
|
|
|
|
<p>The members of structure (tag = <tt>DW_TAG_structure_type</tt>) or union (tag
|
|
= <tt>DW_TAG_union_type</tt>) types are any one of
|
|
the <a href="#format_basic_type">basic</a>,
|
|
<a href="#format_derived_type">derived</a>
|
|
or <a href="#format_composite_type">composite</a> type descriptors, each
|
|
representing a field member of the structure or union.</p>
|
|
|
|
<p>For C++ classes (tag = <tt>DW_TAG_structure_type</tt>), member descriptors
provide information about base classes, static members and member
functions. If a member is a <a href="#format_derived_type">derived type
descriptor</a> and has a tag of <tt>DW_TAG_inheritance</tt>, then the type
represents a base class. If the member is
a <a href="#format_global_variables">global variable descriptor</a> then it
represents a static member. And, if the member is
a <a href="#format_subprograms">subprogram descriptor</a> then it represents
a member function. For static members and member
functions, <tt>getName()</tt> returns the member's linkage or C++ mangled
name. <tt>getDisplayName()</tt> returns the simplified version of the name.</p>
|
|
|
|
<p>The first member of subroutine (tag = <tt>DW_TAG_subroutine_type</tt>) type
|
|
elements is the return type for the subroutine. The remaining elements are
|
|
the formal arguments to the subroutine.</p>
|
|
|
|
<p><a href="#format_composite_type">Composite type</a> location can be
determined from the context and line number. The size, alignment and
offset are expressed in bits and can be 64 bit values. The alignment is used
to round the offset when embedded in
a <a href="#format_composite_type">composite type</a> (as an example, to keep
doubles on 64 bit boundaries). The offset is the bit offset if embedded
in a <a href="#format_composite_type">composite type</a>.</p>
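<p>To show how size, alignment and offset fit together, here is a sketch for a
hypothetical <tt>struct Point { int x; int y; };</tt> on a target with 32-bit
<tt>int</tt>. The descriptors <tt>!1</tt> (file) and <tt>!4</tt> (basic type
for <tt>int</tt>) are assumed to exist. Each field is a
<tt>DW_TAG_member</tt> derived type whose offset is its bit position inside
the struct.</p>

<div class="doc_code">
<pre>
!10 = metadata !{
  i32 524307,          ;; Tag = 19 + LLVMDebugVersion (DW_TAG_structure_type)
  metadata !1,         ;; Context
  metadata !"Point",   ;; Name
  metadata !1,         ;; File where defined
  i32 1,               ;; Line number where defined (hypothetical)
  i64 64,              ;; Size in bits: two 32-bit members
  i64 32,              ;; Alignment in bits
  i64 0,               ;; Offset in bits
  i32 0,               ;; Flags
  null,                ;; Not derived from another type
  metadata !11,        ;; Array of member descriptors
  i32 0                ;; Runtime languages
}
!11 = metadata !{metadata !12, metadata !13}

;; Members: tag = 13 + LLVMDebugVersion (DW_TAG_member)
!12 = metadata !{i32 524301, metadata !10, metadata !"x", metadata !1, i32 1,
                 i64 32, i64 32, i64 0,  i32 0, metadata !4}  ;; x at bit 0
!13 = metadata !{i32 524301, metadata !10, metadata !"y", metadata !1, i32 1,
                 i64 32, i64 32, i64 32, i32 0, metadata !4}  ;; y at bit 32
</pre>
</div>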
|
|
|
|
</div>
|
|
|
|
<!-- ======================================================================= -->
|
|
<h4>
|
|
<a name="format_subrange">Subrange descriptors</a>
|
|
</h4>
|
|
|
|
<div>
|
|
|
|
<div class="doc_code">
|
|
<pre>
|
|
!42 = metadata !{
|
|
i32, ;; Tag = 33 + <a href="#LLVMDebugVersion">LLVMDebugVersion</a> (DW_TAG_subrange_type)
|
|
i64, ;; Low value
|
|
i64 ;; High value
|
|
}
|
|
</pre>
|
|
</div>
|
|
|
|
<p>These descriptors are used to define ranges of array subscripts for an array
<a href="#format_composite_type">composite type</a>. The low value defines
the lower bound, typically zero for C/C++. The high value is the upper
bound. Values are 64 bit. High - low + 1 is the size of the array. If low
&gt; high, the array bounds are not included in generated debugging information.
</p>
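<p>As a worked example, a hypothetical C array <tt>int a[10]</tt> has
subscripts 0 through 9, so its subrange descriptor could be written as in this
sketch.</p>

<div class="doc_code">
<pre>
!14 = metadata !{
  i32 524321,  ;; Tag = 33 + LLVMDebugVersion (DW_TAG_subrange_type)
  i64 0,       ;; Low value
  i64 9        ;; High value: 9 - 0 + 1 = 10 elements
}
</pre>
</div>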
|
|
|
|
</div>
|
|
|
|
<!-- ======================================================================= -->
|
|
<h4>
|
|
<a name="format_enumeration">Enumerator descriptors</a>
|
|
</h4>
|
|
|
|
<div>
|
|
|
|
<div class="doc_code">
|
|
<pre>
|
|
!6 = metadata !{
|
|
i32, ;; Tag = 40 + <a href="#LLVMDebugVersion">LLVMDebugVersion</a>
|
|
;; (DW_TAG_enumerator)
|
|
metadata, ;; Name
|
|
i64 ;; Value
|
|
}
|
|
</pre>
|
|
</div>
|
|
|
|
<p>These descriptors are used to define members of an
enumeration <a href="#format_composite_type">composite type</a>. Each one
associates a name with a value.</p>
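<p>For a hypothetical <tt>enum Color { Red, Green = 5 };</tt>, the enumerator
descriptors could look like this sketch.</p>

<div class="doc_code">
<pre>
;; Tag = 40 + LLVMDebugVersion (DW_TAG_enumerator)
!15 = metadata !{i32 524328, metadata !"Red",   i64 0}
!16 = metadata !{i32 524328, metadata !"Green", i64 5}
</pre>
</div>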
|
|
|
|
</div>
|
|
|
|
<!-- ======================================================================= -->
|
|
<h4>
|
|
<a name="format_variables">Local variables</a>
|
|
</h4>
|
|
|
|
<div>
|
|
|
|
<div class="doc_code">
|
|
<pre>
|
|
!7 = metadata !{
|
|
i32, ;; Tag (see below)
|
|
metadata, ;; Context
|
|
metadata, ;; Name
|
|
metadata, ;; Reference to file where defined
|
|
i32, ;; 24 bit - Line number where defined
|
|
;; 8 bit - Argument number. 1 indicates 1st argument.
|
|
metadata, ;; Type descriptor
|
|
i32, ;; flags
|
|
metadata ;; (optional) Reference to inline location
|
|
}
|
|
</pre>
|
|
</div>
|
|
|
|
<p>These descriptors are used to define variables local to a subprogram. The
value of the tag depends on the usage of the variable:</p>
|
|
|
|
<div class="doc_code">
|
|
<pre>
|
|
DW_TAG_auto_variable   = 256
DW_TAG_arg_variable    = 257
DW_TAG_return_variable = 258
|
|
</pre>
|
|
</div>
|
|
|
|
<p>An auto variable is any variable declared in the body of the function. An
|
|
argument variable is any variable that appears as a formal argument to the
|
|
function. A return variable is used to track the result of a function and
|
|
has no source correspondent.</p>
|
|
|
|
<p>The context is either the subprogram or block where the variable is defined.
Name is the source variable name. The file reference and line number indicate
where the variable was defined. Type descriptor defines the declared type of
the variable.</p>
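<p>The fifth field of a local variable descriptor shares one <tt>i32</tt>
between the line number (24 bits) and the argument number (8 bits). A minimal
sketch of that packing follows. It assumes the line number sits in the low 24
bits and the argument number in the high 8 bits; the function names are
hypothetical.</p>

```cpp
#include <cstdint>

// Assumption: low 24 bits hold the line number, high 8 bits the argument
// number (0 for a variable that is not an argument, 1 for the 1st argument).
std::uint32_t packLineAndArg(std::uint32_t line, std::uint32_t argNo) {
  return (line & 0x00ffffffu) | ((argNo & 0xffu) << 24);
}

// Recover the line number from the packed field.
std::uint32_t lineNumber(std::uint32_t field) { return field & 0x00ffffffu; }

// Recover the argument number from the packed field.
std::uint32_t argNumber(std::uint32_t field) { return field >> 24; }
```

<p>For an auto variable the argument number is zero, so the field is simply the
line number, as in the <tt>i32 2</tt> used for variable <tt>X</tt> in the
example later in this document.</p>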
|
|
|
|
</div>
|
|
|
|
</div>
|
|
|
|
<!-- ======================================================================= -->
|
|
<h3>
|
|
<a name="format_common_intrinsics">Debugger intrinsic functions</a>
|
|
</h3>
|
|
|
|
<div>
|
|
|
|
<p>LLVM uses several intrinsic functions (name prefixed with "llvm.dbg") to
|
|
provide debug information at various points in generated code.</p>
|
|
|
|
<!-- ======================================================================= -->
|
|
<h4>
|
|
<a name="format_common_declare">llvm.dbg.declare</a>
|
|
</h4>
|
|
|
|
<div>
|
|
<pre>
|
|
void %<a href="#format_common_declare">llvm.dbg.declare</a>(metadata, metadata)
|
|
</pre>
|
|
|
|
<p>This intrinsic provides information about a local element (e.g., variable). The
|
|
first argument is metadata holding the alloca for the variable. The
|
|
second argument is metadata containing a description of the variable.</p>
|
|
</div>
|
|
|
|
<!-- ======================================================================= -->
|
|
<h4>
|
|
<a name="format_common_value">llvm.dbg.value</a>
|
|
</h4>
|
|
|
|
<div>
|
|
<pre>
|
|
void %<a href="#format_common_value">llvm.dbg.value</a>(metadata, i64, metadata)
|
|
</pre>
|
|
|
|
<p>This intrinsic provides information when a user source variable is set to a
|
|
new value. The first argument is the new value (wrapped as metadata). The
|
|
second argument is the offset in the user source variable where the new value
|
|
is written. The third argument is metadata containing a description of the
|
|
user source variable.</p>
|
|
</div>
|
|
|
|
</div>
|
|
|
|
<!-- ======================================================================= -->
|
|
<h3>
|
|
<a name="format_common_lifetime">Object lifetimes and scoping</a>
|
|
</h3>
|
|
|
|
<div>
|
|
<p>In many languages, the local variables in functions can have their lifetimes
|
|
or scopes limited to a subset of a function. In the C family of languages,
|
|
for example, variables are only live (readable and writable) within the
|
|
source block that they are defined in. In functional languages, values are
|
|
only readable after they have been defined. Though this is a very obvious
|
|
concept, it is non-trivial to model in LLVM, because it has no notion of
|
|
scoping in this sense, and does not want to be tied to a language's scoping
|
|
rules.</p>
|
|
|
|
<p>In order to handle this, the LLVM debug format uses the metadata attached to
LLVM instructions to encode line number and scoping information. Consider
the following C fragment, for example:</p>
|
|
|
|
<div class="doc_code">
|
|
<pre>
|
|
1.  void foo() {
2.    int X = 21;
3.    int Y = 22;
4.    {
5.      int Z = 23;
6.      Z = X;
7.    }
8.    X = Y;
9.  }
|
|
</pre>
|
|
</div>
|
|
|
|
<p>Compiled to LLVM, this function would be represented like this:</p>
|
|
|
|
<div class="doc_code">
|
|
<pre>
|
|
define void @foo() nounwind ssp {
|
|
entry:
|
|
%X = alloca i32, align 4 ; <i32*> [#uses=4]
|
|
%Y = alloca i32, align 4 ; <i32*> [#uses=4]
|
|
%Z = alloca i32, align 4 ; <i32*> [#uses=3]
|
|
%0 = bitcast i32* %X to {}* ; <{}*> [#uses=1]
|
|
call void @llvm.dbg.declare(metadata !{i32 * %X}, metadata !0), !dbg !7
|
|
store i32 21, i32* %X, !dbg !8
|
|
%1 = bitcast i32* %Y to {}* ; <{}*> [#uses=1]
|
|
call void @llvm.dbg.declare(metadata !{i32 * %Y}, metadata !9), !dbg !10
|
|
store i32 22, i32* %Y, !dbg !11
|
|
%2 = bitcast i32* %Z to {}* ; <{}*> [#uses=1]
|
|
call void @llvm.dbg.declare(metadata !{i32 * %Z}, metadata !12), !dbg !14
|
|
store i32 23, i32* %Z, !dbg !15
|
|
%tmp = load i32* %X, !dbg !16 ; <i32> [#uses=1]
|
|
%tmp1 = load i32* %Y, !dbg !16 ; <i32> [#uses=1]
|
|
%add = add nsw i32 %tmp, %tmp1, !dbg !16 ; <i32> [#uses=1]
|
|
store i32 %add, i32* %Z, !dbg !16
|
|
%tmp2 = load i32* %Y, !dbg !17 ; <i32> [#uses=1]
|
|
store i32 %tmp2, i32* %X, !dbg !17
|
|
ret void, !dbg !18
|
|
}
|
|
|
|
declare void @llvm.dbg.declare(metadata, metadata) nounwind readnone
|
|
|
|
!0 = metadata !{i32 459008, metadata !1, metadata !"X",
|
|
metadata !3, i32 2, metadata !6}; [ DW_TAG_auto_variable ]
|
|
!1 = metadata !{i32 458763, metadata !2}; [DW_TAG_lexical_block ]
|
|
!2 = metadata !{i32 458798, i32 0, metadata !3, metadata !"foo", metadata !"foo",
|
|
metadata !"foo", metadata !3, i32 1, metadata !4,
|
|
i1 false, i1 true}; [DW_TAG_subprogram ]
|
|
!3 = metadata !{i32 458769, i32 0, i32 12, metadata !"foo.c",
|
|
metadata !"/private/tmp", metadata !"clang 1.1", i1 true,
|
|
i1 false, metadata !"", i32 0}; [DW_TAG_compile_unit ]
|
|
!4 = metadata !{i32 458773, metadata !3, metadata !"", null, i32 0, i64 0, i64 0,
|
|
i64 0, i32 0, null, metadata !5, i32 0}; [DW_TAG_subroutine_type ]
|
|
!5 = metadata !{null}
|
|
!6 = metadata !{i32 458788, metadata !3, metadata !"int", metadata !3, i32 0,
|
|
i64 32, i64 32, i64 0, i32 0, i32 5}; [DW_TAG_base_type ]
|
|
!7 = metadata !{i32 2, i32 7, metadata !1, null}
|
|
!8 = metadata !{i32 2, i32 3, metadata !1, null}
|
|
!9 = metadata !{i32 459008, metadata !1, metadata !"Y", metadata !3, i32 3,
|
|
metadata !6}; [ DW_TAG_auto_variable ]
|
|
!10 = metadata !{i32 3, i32 7, metadata !1, null}
|
|
!11 = metadata !{i32 3, i32 3, metadata !1, null}
|
|
!12 = metadata !{i32 459008, metadata !13, metadata !"Z", metadata !3, i32 5,
|
|
metadata !6}; [ DW_TAG_auto_variable ]
|
|
!13 = metadata !{i32 458763, metadata !1}; [DW_TAG_lexical_block ]
|
|
!14 = metadata !{i32 5, i32 9, metadata !13, null}
|
|
!15 = metadata !{i32 5, i32 5, metadata !13, null}
|
|
!16 = metadata !{i32 6, i32 5, metadata !13, null}
|
|
!17 = metadata !{i32 8, i32 3, metadata !1, null}
|
|
!18 = metadata !{i32 9, i32 1, metadata !2, null}
|
|
</pre>
|
|
</div>
|
|
|
|
<p>This example illustrates a few important details about LLVM debugging
|
|
information. In particular, it shows how the <tt>llvm.dbg.declare</tt>
|
|
intrinsic and location information, which are attached to an instruction,
|
|
are applied together to allow a debugger to analyze the relationship between
|
|
statements, variable definitions, and the code used to implement the
|
|
function.</p>
|
|
|
|
<div class="doc_code">
|
|
<pre>
|
|
call void @llvm.dbg.declare(metadata, metadata !0), !dbg !7
|
|
</pre>
|
|
</div>
|
|
|
|
<p>The first intrinsic
|
|
<tt>%<a href="#format_common_declare">llvm.dbg.declare</a></tt>
|
|
encodes debugging information for the variable <tt>X</tt>. The metadata
|
|
<tt>!dbg !7</tt> attached to the intrinsic provides scope information for the
|
|
variable <tt>X</tt>.</p>
|
|
|
|
<div class="doc_code">
|
|
<pre>
|
|
!7 = metadata !{i32 2, i32 7, metadata !1, null}
|
|
!1 = metadata !{i32 458763, metadata !2}; [DW_TAG_lexical_block ]
|
|
!2 = metadata !{i32 458798, i32 0, metadata !3, metadata !"foo",
|
|
metadata !"foo", metadata !"foo", metadata !3, i32 1,
|
|
metadata !4, i1 false, i1 true}; [DW_TAG_subprogram ]
|
|
</pre>
|
|
</div>
|
|
|
|
<p>Here <tt>!7</tt> is metadata providing location information. It has four
|
|
fields: line number, column number, scope, and original scope. The original
|
|
scope represents inline location if this instruction is inlined inside a
|
|
caller, and is null otherwise. In this example, scope is encoded by
|
|
<tt>!1</tt>. <tt>!1</tt> represents a lexical block inside the scope
|
|
<tt>!2</tt>, where <tt>!2</tt> is a
|
|
<a href="#format_subprograms">subprogram descriptor</a>. This way the
|
|
location information attached to the intrinsics indicates that the
|
|
variable <tt>X</tt> is declared at line number 2 at a function level scope in
|
|
function <tt>foo</tt>.</p>
|
|
|
|
<p>Now let's take another example.</p>
|
|
|
|
<div class="doc_code">
|
|
<pre>
|
|
call void @llvm.dbg.declare(metadata, metadata !12), !dbg !14
|
|
</pre>
|
|
</div>
|
|
|
|
<p>The second intrinsic
|
|
<tt>%<a href="#format_common_declare">llvm.dbg.declare</a></tt>
|
|
encodes debugging information for variable <tt>Z</tt>. The metadata
|
|
<tt>!dbg !14</tt> attached to the intrinsic provides scope information for
|
|
the variable <tt>Z</tt>.</p>
|
|
|
|
<div class="doc_code">
|
|
<pre>
|
|
!13 = metadata !{i32 458763, metadata !1}; [DW_TAG_lexical_block ]
|
|
!14 = metadata !{i32 5, i32 9, metadata !13, null}
|
|
</pre>
|
|
</div>
|
|
|
|
<p>Here <tt>!14</tt> indicates that <tt>Z</tt> is declared at line number 5 and
|
|
column number 9 inside of lexical scope <tt>!13</tt>. The lexical scope
|
|
itself resides inside of lexical scope <tt>!1</tt> described above.</p>
|
|
|
|
<p>The scope information attached to each instruction provides a
straightforward way to find instructions covered by a scope.</p>
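<p>As an illustration of that lookup, the sketch below models scopes and
locations with plain structs and walks the chain of enclosing scopes. It is a
hypothetical in-memory model for explanation only and does not use the LLVM
API; <tt>Scope</tt>, <tt>Location</tt>, <tt>isCoveredBy</tt> and
<tt>linesCoveredBy</tt> are names invented here.</p>

```cpp
#include <cstddef>
#include <vector>

// Hypothetical model of a scope descriptor: a lexical block or subprogram.
struct Scope {
  int id;               // metadata node number, e.g. 13 for !13
  const Scope *parent;  // enclosing scope, or nullptr for the subprogram
};

// Hypothetical model of the location metadata attached to an instruction.
struct Location {
  unsigned line;
  unsigned column;
  const Scope *scope;
};

// True if loc lies in target or in any scope nested inside target.
bool isCoveredBy(const Location &loc, const Scope &target) {
  for (const Scope *s = loc.scope; s != nullptr; s = s->parent)
    if (s == &target)
      return true;
  return false;
}

// Line numbers of all locations covered by target, in input order.
std::vector<unsigned> linesCoveredBy(const std::vector<Location> &locs,
                                     const Scope &target) {
  std::vector<unsigned> lines;
  for (std::size_t i = 0; i < locs.size(); ++i)
    if (isCoveredBy(locs[i], target))
      lines.push_back(locs[i].line);
  return lines;
}
```

<p>Modelling <tt>!2</tt>, <tt>!1</tt> and <tt>!13</tt> from the example as
nested scopes, the locations on lines 5 and 6 are the ones covered by
<tt>!13</tt>, while <tt>!1</tt> also covers lines 2 and 8.</p>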
|
|
|
|
</div>
|
|
|
|
</div>
|
|
|
|
<!-- *********************************************************************** -->
|
|
<h2>
|
|
<a name="ccxx_frontend">C/C++ front-end specific debug information</a>
|
|
</h2>
|
|
<!-- *********************************************************************** -->
|
|
|
|
<div>
|
|
|
|
<p>The C and C++ front-ends represent information about the program in a format
that is effectively identical
to <a href="http://www.eagercon.com/dwarf/dwarf3std.htm">DWARF 3.0</a> in
terms of information content. This allows code generators to trivially
support native debuggers by generating standard DWARF information, and
contains enough information for non-DWARF targets to translate it as
needed.</p>
|
|
|
|
<p>This section describes the forms used to represent C and C++ programs. Other
|
|
languages could pattern themselves after this (which itself is tuned to
|
|
representing programs in the same way that DWARF 3 does), or they could
|
|
choose to provide completely different forms if they don't fit into the DWARF
|
|
model. As support for debugging information gets added to the various LLVM
|
|
source-language front-ends, the information used should be documented
|
|
here.</p>
|
|
|
|
<p>The following sections provide examples of various C/C++ constructs and the
|
|
debug information that would best describe those constructs.</p>
|
|
|
|
<!-- ======================================================================= -->
|
|
<h3>
|
|
<a name="ccxx_compile_units">C/C++ source file information</a>
|
|
</h3>
|
|
|
|
<div>
|
|
|
|
<p>Given the source files <tt>MySource.cpp</tt> and <tt>MyHeader.h</tt> located
|
|
in the directory <tt>/Users/mine/sources</tt>, the following code:</p>
|
|
|
|
<div class="doc_code">
|
|
<pre>
|
|
#include "MyHeader.h"

int main(int argc, char *argv[]) {
  return 0;
}
|
|
</pre>
|
|
</div>
|
|
|
|
<p>a C/C++ front-end would generate the following descriptors:</p>
|
|
|
|
<div class="doc_code">
|
|
<pre>
|
|
...
|
|
;;
|
|
;; Define the compile unit for the main source file "/Users/mine/sources/MySource.cpp".
|
|
;;
|
|
!2 = metadata !{
|
|
i32 524305, ;; Tag
|
|
i32 0, ;; Unused
|
|
i32 4, ;; Language Id
|
|
metadata !"MySource.cpp",
|
|
metadata !"/Users/mine/sources",
|
|
metadata !"4.2.1 (Based on Apple Inc. build 5649) (LLVM build 00)",
|
|
i1 true, ;; Main Compile Unit
|
|
i1 false, ;; Optimized compile unit
|
|
metadata !"", ;; Compiler flags
|
|
i32 0} ;; Runtime version
|
|
|
|
;;
|
|
;; Define the file for the file "/Users/mine/sources/MySource.cpp".
|
|
;;
|
|
!1 = metadata !{
|
|
i32 524329, ;; Tag
|
|
metadata !"MySource.cpp",
|
|
metadata !"/Users/mine/sources",
|
|
metadata !2 ;; Compile unit
|
|
}
|
|
|
|
;;
|
|
;; Define the file for the file "/Users/mine/sources/MyHeader.h".
|
|
;;
|
|
!3 = metadata !{
|
|
i32 524329, ;; Tag
|
|
metadata !"MyHeader.h",
|
|
metadata !"/Users/mine/sources",
|
|
metadata !2 ;; Compile unit
|
|
}
|
|
|
|
...
|
|
</pre>
|
|
</div>
|
|
|
|
<p><tt>llvm::Instruction</tt> provides easy access to metadata attached to an
instruction. One can extract line number information encoded in LLVM IR
using <tt>Instruction::getMetadata()</tt> and
<tt>DILocation::getLineNumber()</tt>.</p>
|
|
<pre>
|
|
if (MDNode *N = I->getMetadata("dbg")) {  // Here I is an LLVM instruction
  DILocation Loc(N);                      // DILocation is in DebugInfo.h
  unsigned Line = Loc.getLineNumber();
  StringRef File = Loc.getFilename();
  StringRef Dir = Loc.getDirectory();
}
|
|
</pre>
|
|
</div>
|
|
|
|
<!-- ======================================================================= -->
|
|
<h3>
|
|
<a name="ccxx_global_variable">C/C++ global variable information</a>
|
|
</h3>
|
|
|
|
<div>
|
|
|
|
<p>Given an integer global variable declared as follows:</p>
|
|
|
|
<div class="doc_code">
|
|
<pre>
|
|
int MyGlobal = 100;
|
|
</pre>
|
|
</div>
|
|
|
|
<p>a C/C++ front-end would generate the following descriptors:</p>
|
|
|
|
<div class="doc_code">
|
|
<pre>
|
|
;;
|
|
;; Define the global itself.
|
|
;;
|
|
@MyGlobal = global i32 100
|
|
...
|
|
;;
|
|
;; List of compile units, each of which lists the debug info of its globals
|
|
;;
|
|
!llvm.dbg.cu = !{!0}
|
|
|
|
;; Define the compile unit.
|
|
!0 = metadata !{
|
|
i32 786449, ;; Tag
|
|
i32 0, ;; Context
|
|
i32 4, ;; Language
|
|
metadata !"foo.cpp", ;; File
|
|
metadata !"/Volumes/Data/tmp", ;; Directory
|
|
metadata !"clang version 3.1 ", ;; Producer
|
|
i1 true, ;; Deprecated field
|
|
i1 false, ;; "isOptimized"?
|
|
metadata !"", ;; Flags
|
|
i32 0, ;; Runtime Version
|
|
metadata !1, ;; Enum Types
|
|
metadata !1, ;; Retained Types
|
|
metadata !1, ;; Subprograms
|
|
metadata !3 ;; Global Variables
|
|
} ; [ DW_TAG_compile_unit ]
|
|
|
|
;; The Array of Global Variables
|
|
!3 = metadata !{
|
|
metadata !4
|
|
}
|
|
|
|
!4 = metadata !{
|
|
metadata !5
|
|
}
|
|
|
|
;;
|
|
;; Define the global variable itself.
|
|
;;
|
|
!5 = metadata !{
|
|
i32 786484, ;; Tag
|
|
i32 0, ;; Unused
|
|
null, ;; Unused
|
|
metadata !"MyGlobal", ;; Name
|
|
metadata !"MyGlobal", ;; Display Name
|
|
metadata !"", ;; Linkage Name
|
|
metadata !6, ;; File
|
|
i32 1, ;; Line
|
|
metadata !7, ;; Type
|
|
i32 0, ;; IsLocalToUnit
|
|
i32 1, ;; IsDefinition
|
|
i32* @MyGlobal ;; LLVM-IR Value
|
|
} ; [ DW_TAG_variable ]
|
|
|
|
;;
|
|
;; Define the file
|
|
;;
|
|
!6 = metadata !{
|
|
i32 786473, ;; Tag
|
|
metadata !"foo.cpp", ;; File
|
|
metadata !"/Volumes/Data/tmp", ;; Directory
|
|
null ;; Unused
|
|
} ; [ DW_TAG_file_type ]
|
|
|
|
;;
|
|
;; Define the type
|
|
;;
|
|
!7 = metadata !{
|
|
i32 786468, ;; Tag
|
|
null, ;; Unused
|
|
metadata !"int", ;; Name
|
|
null, ;; Unused
|
|
i32 0, ;; Line
|
|
i64 32, ;; Size in Bits
|
|
i64 32, ;; Align in Bits
|
|
i64 0, ;; Offset
|
|
i32 0, ;; Flags
|
|
i32 5 ;; Encoding
|
|
} ; [ DW_TAG_base_type ]
|
|
|
|
</pre>
|
|
</div>
|
|
|
|
</div>
|
|
|
|
<!-- ======================================================================= -->
|
|
<h3>
|
|
<a name="ccxx_subprogram">C/C++ function information</a>
|
|
</h3>
|
|
|
|
<div>
|
|
|
|
<p>Given a function declared as follows:</p>
|
|
|
|
<div class="doc_code">
|
|
<pre>
|
|
int main(int argc, char *argv[]) {
  return 0;
}
|
|
</pre>
|
|
</div>
|
|
|
|
<p>a C/C++ front-end would generate the following descriptors:</p>
|
|
|
|
<div class="doc_code">
|
|
<pre>
|
|
;;
|
|
;; Define the descriptor for the subprogram. Note that the tag field is
;; 524334, which is 524288 (the debug version) + 46 (DW_TAG_subprogram).
|
|
;;
|
|
!6 = metadata !{
|
|
i32 524334, ;; Tag
|
|
i32 0, ;; Unused
|
|
metadata !1, ;; Context
|
|
metadata !"main", ;; Name
|
|
metadata !"main", ;; Display name
|
|
metadata !"main", ;; Linkage name
|
|
metadata !1, ;; File
|
|
i32 1, ;; Line number
|
|
metadata !4, ;; Type
|
|
i1 false, ;; Is local
|
|
i1 true, ;; Is definition
|
|
i32 0, ;; Virtuality attribute, e.g. pure virtual function
|
|
i32 0, ;; Index into virtual table for C++ methods
|
|
i32 0, ;; Type that holds virtual table.
|
|
i32 0, ;; Flags
|
|
i1 false, ;; True if this function is optimized
|
|
i32 (i32, i8**)* @main, ;; Pointer to llvm::Function
|
|
null ;; Function template parameters
|
|
}
|
|
;;
|
|
;; Define the subprogram itself.
|
|
;;
|
|
define i32 @main(i32 %argc, i8** %argv) {
|
|
...
|
|
}
|
|
</pre>
|
|
</div>
|
|
|
|
</div>
|
|
|
|
<!-- ======================================================================= -->
|
|
<h3>
|
|
<a name="ccxx_basic_types">C/C++ basic types</a>
|
|
</h3>
|
|
|
|
<div>
|
|
|
|
<p>The following are the basic type descriptors for C/C++ core types:</p>
|
|
|
|
<!-- ======================================================================= -->
|
|
<h4>
|
|
<a name="ccxx_basic_type_bool">bool</a>
|
|
</h4>
|
|
|
|
<div>
|
|
|
|
<div class="doc_code">
|
|
<pre>
|
|
!2 = metadata !{
|
|
i32 524324, ;; Tag
|
|
metadata !1, ;; Context
|
|
metadata !"bool", ;; Name
|
|
metadata !1, ;; File
|
|
i32 0, ;; Line number
|
|
i64 8, ;; Size in Bits
|
|
i64 8, ;; Align in Bits
|
|
i64 0, ;; Offset in Bits
|
|
i32 0, ;; Flags
|
|
i32 2 ;; Encoding
|
|
}
|
|
</pre>
|
|
</div>
|
|
|
|
</div>
|
|
|
|
<!-- ======================================================================= -->
|
|
<h4>
|
|
<a name="ccxx_basic_char">char</a>
|
|
</h4>
|
|
|
|
<div>
|
|
|
|
<div class="doc_code">
|
|
<pre>
|
|
!2 = metadata !{
|
|
i32 524324, ;; Tag
|
|
metadata !1, ;; Context
|
|
metadata !"char", ;; Name
|
|
metadata !1, ;; File
|
|
i32 0, ;; Line number
|
|
i64 8, ;; Size in Bits
|
|
i64 8, ;; Align in Bits
|
|
i64 0, ;; Offset in Bits
|
|
i32 0, ;; Flags
|
|
i32 6 ;; Encoding
|
|
}
|
|
</pre>
|
|
</div>
|
|
|
|
</div>
|
|
|
|
<!-- ======================================================================= -->
|
|
<h4>
|
|
<a name="ccxx_basic_unsigned_char">unsigned char</a>
|
|
</h4>
|
|
|
|
<div>
|
|
|
|
<div class="doc_code">
|
|
<pre>
|
|
!2 = metadata !{
|
|
i32 524324, ;; Tag
|
|
metadata !1, ;; Context
|
|
metadata !"unsigned char",
|
|
metadata !1, ;; File
|
|
i32 0, ;; Line number
|
|
i64 8, ;; Size in Bits
|
|
i64 8, ;; Align in Bits
|
|
i64 0, ;; Offset in Bits
|
|
i32 0, ;; Flags
|
|
i32 8 ;; Encoding
|
|
}
|
|
</pre>
|
|
</div>
|
|
|
|
</div>
|
|
|
|
<!-- ======================================================================= -->
|
|
<h4>
|
|
<a name="ccxx_basic_short">short</a>
|
|
</h4>
|
|
|
|
<div>
|
|
|
|
<div class="doc_code">
|
|
<pre>
|
|
!2 = metadata !{
|
|
i32 524324, ;; Tag
|
|
metadata !1, ;; Context
|
|
metadata !"short int",
|
|
metadata !1, ;; File
|
|
i32 0, ;; Line number
|
|
i64 16, ;; Size in Bits
|
|
i64 16, ;; Align in Bits
|
|
i64 0, ;; Offset in Bits
|
|
i32 0, ;; Flags
|
|
i32 5 ;; Encoding
|
|
}
|
|
</pre>
|
|
</div>
|
|
|
|
</div>
|
|
|
|
<!-- ======================================================================= -->
|
|
<h4>
|
|
<a name="ccxx_basic_unsigned_short">unsigned short</a>
|
|
</h4>
|
|
|
|
<div>
|
|
|
|
<div class="doc_code">
|
|
<pre>
|
|
!2 = metadata !{
|
|
i32 524324, ;; Tag
|
|
metadata !1, ;; Context
|
|
metadata !"short unsigned int",
|
|
metadata !1, ;; File
|
|
i32 0, ;; Line number
|
|
i64 16, ;; Size in Bits
|
|
i64 16, ;; Align in Bits
|
|
i64 0, ;; Offset in Bits
|
|
i32 0, ;; Flags
|
|
i32 7 ;; Encoding
|
|
}
|
|
</pre>
|
|
</div>
|
|
|
|
</div>
|
|
|
|
<!-- ======================================================================= -->
|
|
<h4>
|
|
<a name="ccxx_basic_int">int</a>
|
|
</h4>
|
|
|
|
<div>
|
|
|
|
<div class="doc_code">
|
|
<pre>
|
|
!2 = metadata !{
|
|
i32 524324, ;; Tag
|
|
metadata !1, ;; Context
|
|
metadata !"int", ;; Name
|
|
metadata !1, ;; File
|
|
i32 0, ;; Line number
|
|
i64 32, ;; Size in Bits
|
|
i64 32, ;; Align in Bits
|
|
i64 0, ;; Offset in Bits
|
|
i32 0, ;; Flags
|
|
i32 5 ;; Encoding
|
|
}
|
|
</pre></div>
|
|
|
|
</div>
|
|
|
|
<!-- ======================================================================= -->
|
|
<h4>
|
|
<a name="ccxx_basic_unsigned_int">unsigned int</a>
|
|
</h4>
|
|
|
|
<div>
|
|
|
|
<div class="doc_code">
|
|
<pre>
|
|
!2 = metadata !{
|
|
i32 524324, ;; Tag
|
|
metadata !1, ;; Context
|
|
metadata !"unsigned int",
|
|
metadata !1, ;; File
|
|
i32 0, ;; Line number
|
|
i64 32, ;; Size in Bits
|
|
i64 32, ;; Align in Bits
|
|
i64 0, ;; Offset in Bits
|
|
i32 0, ;; Flags
|
|
i32 7 ;; Encoding
|
|
}
|
|
</pre>
|
|
</div>
|
|
|
|
</div>
|
|
|
|
<!-- ======================================================================= -->
|
|
<h4>
|
|
<a name="ccxx_basic_long_long">long long</a>
|
|
</h4>
|
|
|
|
<div>
|
|
|
|
<div class="doc_code">
|
|
<pre>
|
|
!2 = metadata !{
|
|
i32 524324, ;; Tag
|
|
metadata !1, ;; Context
|
|
metadata !"long long int",
|
|
metadata !1, ;; File
|
|
i32 0, ;; Line number
|
|
i64 64, ;; Size in Bits
|
|
i64 64, ;; Align in Bits
|
|
i64 0, ;; Offset in Bits
|
|
i32 0, ;; Flags
|
|
i32 5 ;; Encoding
|
|
}
|
|
</pre>
|
|
</div>
|
|
|
|
</div>
|
|
|
|
<!-- ======================================================================= -->
|
|
<h4>
|
|
<a name="ccxx_basic_unsigned_long_long">unsigned long long</a>
|
|
</h4>
|
|
|
|
<div>
|
|
|
|
<div class="doc_code">
|
|
<pre>
|
|
!2 = metadata !{
|
|
i32 524324, ;; Tag
|
|
metadata !1, ;; Context
|
|
metadata !"long long unsigned int",
|
|
metadata !1, ;; File
|
|
i32 0, ;; Line number
|
|
i64 64, ;; Size in Bits
|
|
i64 64, ;; Align in Bits
|
|
i64 0, ;; Offset in Bits
|
|
i32 0, ;; Flags
|
|
i32 7 ;; Encoding
|
|
}
|
|
</pre>
|
|
</div>
|
|
|
|
</div>
|
|
|
|
<!-- ======================================================================= -->
|
|
<h4>
|
|
<a name="ccxx_basic_float">float</a>
|
|
</h4>
|
|
|
|
<div>
|
|
|
|
<div class="doc_code">
|
|
<pre>
|
|
!2 = metadata !{
|
|
i32 524324, ;; Tag
|
|
metadata !1, ;; Context
|
|
metadata !"float",
|
|
metadata !1, ;; File
|
|
i32 0, ;; Line number
|
|
i64 32, ;; Size in Bits
|
|
i64 32, ;; Align in Bits
|
|
i64 0, ;; Offset in Bits
|
|
i32 0, ;; Flags
|
|
i32 4 ;; Encoding
|
|
}
|
|
</pre>
|
|
</div>
|
|
|
|
</div>
|
|
|
|
<!-- ======================================================================= -->
|
|
<h4>
|
|
<a name="ccxx_basic_double">double</a>
|
|
</h4>
|
|
|
|
<div>
|
|
|
|
<div class="doc_code">
|
|
<pre>
|
|
!2 = metadata !{
|
|
i32 524324, ;; Tag
|
|
metadata !1, ;; Context
|
|
metadata !"double",;; Name
|
|
metadata !1, ;; File
|
|
i32 0, ;; Line number
|
|
i64 64, ;; Size in Bits
|
|
i64 64, ;; Align in Bits
|
|
i64 0, ;; Offset in Bits
|
|
i32 0, ;; Flags
|
|
i32 4 ;; Encoding
|
|
}
|
|
</pre>
|
|
</div>
|
|
|
|
</div>
|
|
|
|
</div>
|
|
|
|
<!-- ======================================================================= -->
|
|
<h3>
|
|
<a name="ccxx_derived_types">C/C++ derived types</a>
|
|
</h3>
|
|
|
|
<div>
|
|
|
|
<p>Given the following as an example of C/C++ derived type:</p>
|
|
|
|
<div class="doc_code">
|
|
<pre>
|
|
typedef const int *IntPtr;
|
|
</pre>
|
|
</div>
|
|
|
|
<p>a C/C++ front-end would generate the following descriptors:</p>
|
|
|
|
<div class="doc_code">
|
|
<pre>
|
|
;;
|
|
;; Define the typedef "IntPtr".
|
|
;;
|
|
!2 = metadata !{
|
|
i32 524310, ;; Tag
|
|
metadata !1, ;; Context
|
|
metadata !"IntPtr", ;; Name
|
|
metadata !3, ;; File
|
|
i32 0, ;; Line number
|
|
i64 0, ;; Size in bits
|
|
i64 0, ;; Align in bits
|
|
i64 0, ;; Offset in bits
|
|
i32 0, ;; Flags
|
|
metadata !4 ;; Derived From type
|
|
}
|
|
|
|
;;
|
|
;; Define the pointer type.
|
|
;;
|
|
!4 = metadata !{
|
|
i32 524303, ;; Tag
|
|
metadata !1, ;; Context
|
|
metadata !"", ;; Name
|
|
metadata !1, ;; File
|
|
i32 0, ;; Line number
|
|
i64 64, ;; Size in bits
|
|
i64 64, ;; Align in bits
|
|
i64 0, ;; Offset in bits
|
|
i32 0, ;; Flags
|
|
metadata !5 ;; Derived From type
|
|
}
|
|
;;
|
|
;; Define the const type.
|
|
;;
|
|
!5 = metadata !{
|
|
i32 524326, ;; Tag
|
|
metadata !1, ;; Context
|
|
metadata !"", ;; Name
|
|
metadata !1, ;; File
|
|
i32 0, ;; Line number
|
|
i64 32, ;; Size in bits
|
|
i64 32, ;; Align in bits
|
|
i64 0, ;; Offset in bits
|
|
i32 0, ;; Flags
|
|
metadata !6 ;; Derived From type
|
|
}
|
|
;;
|
|
;; Define the int type.
|
|
;;
|
|
!6 = metadata !{
|
|
i32 524324, ;; Tag
|
|
metadata !1, ;; Context
|
|
metadata !"int", ;; Name
|
|
metadata !1, ;; File
|
|
i32 0, ;; Line number
|
|
i64 32, ;; Size in bits
|
|
i64 32, ;; Align in bits
|
|
i64 0, ;; Offset in bits
|
|
i32 0, ;; Flags
|
|
i32 5 ;; Encoding
|
|
}
|
|
</pre>
|
|
</div>
|
|
|
|
</div>
|
|
|
|
<!-- ======================================================================= -->
|
|
<h3>
|
|
<a name="ccxx_composite_types">C/C++ struct/union types</a>
|
|
</h3>
|
|
|
|
<div>
|
|
|
|
<p>Given the following as an example of C/C++ struct type:</p>
|
|
|
|
<div class="doc_code">
|
|
<pre>
|
|
struct Color {
  unsigned Red;
  unsigned Green;
  unsigned Blue;
};
|
|
</pre>
|
|
</div>
|
|
|
|
<p>a C/C++ front-end would generate the following descriptors:</p>
|
|
|
|
<div class="doc_code">
|
|
<pre>
|
|
;;
|
|
;; Define basic type for unsigned int.
|
|
;;
|
|
!5 = metadata !{
|
|
i32 524324, ;; Tag
|
|
metadata !1, ;; Context
|
|
metadata !"unsigned int",
|
|
metadata !1, ;; File
|
|
i32 0, ;; Line number
|
|
i64 32, ;; Size in Bits
|
|
i64 32, ;; Align in Bits
|
|
i64 0, ;; Offset in Bits
|
|
i32 0, ;; Flags
|
|
i32 7 ;; Encoding
|
|
}
|
|
;;
|
|
;; Define composite type for struct Color.
|
|
;;
|
|
!2 = metadata !{
|
|
i32 524307, ;; Tag
|
|
metadata !1, ;; Context
|
|
metadata !"Color", ;; Name
|
|
metadata !1, ;; Compile unit
|
|
i32 1, ;; Line number
|
|
i64 96, ;; Size in bits
|
|
i64 32, ;; Align in bits
|
|
i64 0, ;; Offset in bits
|
|
i32 0, ;; Flags
|
|
null, ;; Derived From
|
|
metadata !3, ;; Elements
|
|
i32 0 ;; Runtime Language
|
|
}
|
|
|
|
;;
|
|
;; Define the Red field.
|
|
;;
|
|
!4 = metadata !{
|
|
i32 524301, ;; Tag
|
|
metadata !1, ;; Context
|
|
metadata !"Red", ;; Name
|
|
metadata !1, ;; File
|
|
i32 2, ;; Line number
|
|
i64 32, ;; Size in bits
|
|
i64 32, ;; Align in bits
|
|
i64 0, ;; Offset in bits
|
|
i32 0, ;; Flags
|
|
metadata !5 ;; Derived From type
|
|
}
|
|
|
|
;;
|
|
;; Define the Green field.
|
|
;;
|
|
!6 = metadata !{
|
|
i32 524301, ;; Tag
|
|
metadata !1, ;; Context
|
|
metadata !"Green", ;; Name
|
|
metadata !1, ;; File
|
|
i32 3, ;; Line number
|
|
i64 32, ;; Size in bits
|
|
i64 32, ;; Align in bits
|
|
i64 32, ;; Offset in bits
|
|
i32 0, ;; Flags
|
|
metadata !5 ;; Derived From type
|
|
}
|
|
|
|
;;
|
|
;; Define the Blue field.
|
|
;;
|
|
!7 = metadata !{
|
|
i32 524301, ;; Tag
|
|
metadata !1, ;; Context
|
|
metadata !"Blue", ;; Name
|
|
metadata !1, ;; File
|
|
i32 4, ;; Line number
|
|
i64 32, ;; Size in bits
|
|
i64 32, ;; Align in bits
|
|
i64 64, ;; Offset in bits
|
|
i32 0, ;; Flags
|
|
metadata !5 ;; Derived From type
|
|
}
|
|
|
|
;;
|
|
;; Define the array of fields used by the composite type Color.
|
|
;;
|
|
!3 = metadata !{metadata !4, metadata !6, metadata !7}
|
|
</pre>
|
|
</div>
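<p>The size, alignment and offset values in the descriptors above are expressed
in bits. As a rough cross-check, the sketch below derives the same numbers from
the C++ layout of <tt>Color</tt>. It assumes a target where <tt>unsigned</tt>
is 32 bits wide, which matches the example; the constant names are
illustrative.</p>

```cpp
#include <climits>
#include <cstddef>

struct Color {
  unsigned Red;
  unsigned Green;
  unsigned Blue;
};

// All values are in bits, as in the debug descriptors.
const std::size_t colorSizeInBits = sizeof(Color) * CHAR_BIT;
const std::size_t colorAlignInBits = alignof(Color) * CHAR_BIT;
const std::size_t redOffsetInBits = offsetof(Color, Red) * CHAR_BIT;
const std::size_t greenOffsetInBits = offsetof(Color, Green) * CHAR_BIT;
const std::size_t blueOffsetInBits = offsetof(Color, Blue) * CHAR_BIT;
```

<p>On such a target these constants are 96, 32, 0, 32 and 64, matching the
composite type descriptor <tt>!2</tt> and the member descriptors <tt>!4</tt>,
<tt>!6</tt> and <tt>!7</tt>.</p>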
|
|
|
|
</div>
|
|
|
|
<!-- ======================================================================= -->
|
|
<h3>
|
|
<a name="ccxx_enumeration_types">C/C++ enumeration types</a>
|
|
</h3>
|
|
|
|
<div>
|
|
|
|
<p>Given the following as an example of C/C++ enumeration type:</p>
|
|
|
|
<div class="doc_code">
|
|
<pre>
|
|
enum Trees {
  Spruce = 100,
  Oak = 200,
  Maple = 300
};
|
|
</pre>
|
|
</div>
|
|
|
|
<p>a C/C++ front-end would generate the following descriptors:</p>
|
|
|
|
<div class="doc_code">
|
|
<pre>
|
|
;;
|
|
;; Define composite type for enum Trees
|
|
;;
|
|
!2 = metadata !{
|
|
i32 524292, ;; Tag
|
|
metadata !1, ;; Context
|
|
metadata !"Trees", ;; Name
|
|
metadata !1, ;; File
|
|
i32 1, ;; Line number
|
|
i64 32, ;; Size in bits
|
|
i64 32, ;; Align in bits
|
|
i64 0, ;; Offset in bits
|
|
i32 0, ;; Flags
|
|
null, ;; Derived From type
|
|
metadata !3, ;; Elements
|
|
i32 0 ;; Runtime language
|
|
}
|
|
|
|
;;
|
|
;; Define the array of enumerators used by composite type Trees.
|
|
;;
|
|
!3 = metadata !{metadata !4, metadata !5, metadata !6}
|
|
|
|
;;
|
|
;; Define Spruce enumerator.
|
|
;;
|
|
!4 = metadata !{i32 524328, metadata !"Spruce", i64 100}
|
|
|
|
;;
|
|
;; Define Oak enumerator.
|
|
;;
|
|
!5 = metadata !{i32 524328, metadata !"Oak", i64 200}
|
|
|
|
;;
|
|
;; Define Maple enumerator.
|
|
;;
|
|
!6 = metadata !{i32 524328, metadata !"Maple", i64 300}
|
|
|
|
</pre>
|
|
</div>
|
|
|
|
</div>
|
|
|
|
</div>
|
|
|
|
|
|
<!-- *********************************************************************** -->
|
|
<h2>
|
|
<a name="llvmdwarfextension">Debugging information format</a>
|
|
</h2>
|
|
<!-- *********************************************************************** -->
|
|
<div>
|
|
<!-- ======================================================================= -->
|
|
<h3>
|
|
<a name="objcproperty">Debugging Information Extension for Objective C Properties</a>
|
|
</h3>
|
|
<div>
|
|
<!-- *********************************************************************** -->
|
|
<h4>
|
|
<a name="objcpropertyintroduction">Introduction</a>
|
|
</h4>
|
|
<!-- *********************************************************************** -->
|
|
|
|
<div>
|
|
<p>Objective C provides a simpler way to declare and define accessor methods
using declared properties. The language provides features to declare a
property and to let the compiler synthesize accessor methods.
|
|
</p>
|
|
|
|
<p>The debugger lets developers inspect Objective C interfaces and their
instance variables and class variables. However, the debugger does not know
anything about the properties defined in Objective C interfaces. The debugger
consumes information generated by the compiler in DWARF format. The format does
not support encoding of Objective C properties. This proposal describes DWARF
extensions to encode Objective C properties, which the debugger can use to let
developers inspect Objective C properties.
|
|
</p>
|
|
|
|
</div>
|
|
|
|
|
|
<!-- *********************************************************************** -->
|
|
<h4>
|
|
<a name="objcpropertyproposal">Proposal</a>
|
|
</h4>
|
|
<!-- *********************************************************************** -->
|
|
|
|
<div>
|
|
<p>Objective C properties exist separately from class members. A property
can be defined only by "setter" and "getter" selectors, and
be calculated anew on each access. Alternatively, a property can be a direct
access to some declared ivar. Finally, it can have an ivar "automatically
synthesized" for it by the compiler, in which case the property can be
referred to in user code directly using the standard C dereference syntax as
well as through the property "dot" syntax, but there is no entry in
the @interface declaration corresponding to this ivar.
|
|
</p>
|
|
<p>
|
|
To facilitate debugging these properties, we will add a new DWARF TAG into the
DW_TAG_structure_type definition for the class to hold the description of a
given property, and a set of DWARF attributes that provide said description.
The property tag will also contain the name and declared type of the property.
|
|
</p>
|
|
<p>
|
|
If there is a related ivar, there will also be a DWARF property attribute placed
|
|
in the DW_TAG_member DIE for that ivar referring back to the property TAG for
|
|
that property. And in the case where the compiler synthesizes the ivar directly,
|
|
the compiler is expected to generate a DW_TAG_member for that ivar (with the
|
|
DW_AT_artificial set to 1), whose name will be the name used to access this
|
|
ivar directly in code, and with the property attribute pointing back to the
|
|
property it is backing.
|
|
</p>
|
|
<p>
|
|
The following examples will serve as illustration for our discussion:
|
|
</p>
|
|
|
|
<div class="doc_code">
|
|
<pre>
|
|
@interface I1 {
  int n2;
}

@property int p1;
@property int p2;
@end

@implementation I1
@synthesize p1;
@synthesize p2 = n2;
@end
|
|
</pre>
|
|
</div>
|
|
|
|
<p>
|
|
This produces the following DWARF (this is a "pseudo dwarfdump" output):
|
|
</p>
|
|
<div class="doc_code">
|
|
<pre>
|
|
0x00000100: TAG_structure_type [7] *
|
|
AT_APPLE_runtime_class( 0x10 )
|
|
AT_name( "I1" )
|
|
AT_decl_file( "Objc_Property.m" )
|
|
AT_decl_line( 3 )
|
|
|
|
0x00000110: TAG_APPLE_property
|
|
AT_name ( "p1" )
|
|
AT_type ( {0x00000150} ( int ) )
|
|
|
|
0x00000120: TAG_APPLE_property
|
|
AT_name ( "p2" )
|
|
AT_type ( {0x00000150} ( int ) )
|
|
|
|
0x00000130: TAG_member [8]
|
|
AT_name( "_p1" )
|
|
AT_APPLE_property ( {0x00000110} "p1" )
|
|
AT_type( {0x00000150} ( int ) )
|
|
AT_artificial ( 0x1 )
|
|
|
|
0x00000140: TAG_member [8]
|
|
AT_name( "n2" )
|
|
AT_APPLE_property ( {0x00000120} "p2" )
|
|
AT_type( {0x00000150} ( int ) )
|
|
|
|
0x00000150: AT_type( ( int ) )
|
|
</pre>
|
|
</div>
|
|
|
|
<p> Note, the current convention is that the name of the ivar for an
|
|
auto-synthesized property is the name of the property from which it derives with
|
|
an underscore prepended, as is shown in the example.
|
|
But we actually don't need to know this convention, since we are given the name
|
|
of the ivar directly.
|
|
</p>
|
|
|
|
<p>
|
|
Also, it is common practice in ObjC to have different property declarations in
|
|
the @interface and @implementation - e.g. to provide a read-only property in
|
|
the interface, and a read-write property in the implementation. In that case,
|
|
the compiler should emit whichever property declaration will be in force in the
|
|
current translation unit.
|
|
</p>
|
|
|
|
<p> Developers can decorate a property with attributes which are encoded using
|
|
DW_AT_APPLE_property_attribute.
|
|
</p>
|
|
|
|
<div class="doc_code">
|
|
<pre>
|
|
@property (readonly, nonatomic) int pr;
|
|
</pre>
|
|
</div>
|
|
<p>
|
|
Which produces a property tag:
|
|
</p>
|
|
<div class="doc_code">
|
|
<pre>
|
|
TAG_APPLE_property [8]
|
|
AT_name( "pr" )
|
|
AT_type ( {0x00000147} (int) )
|
|
AT_APPLE_property_attribute (DW_APPLE_PROPERTY_readonly, DW_APPLE_PROPERTY_nonatomic)
|
|
</pre>
|
|
</div>
|
|
|
|
<p> The setter and getter method names are attached to the property using
|
|
DW_AT_APPLE_property_setter and DW_AT_APPLE_property_getter attributes.
|
|
</p>
|
|
<div class="doc_code">
|
|
<pre>
|
|
@interface I1
|
|
@property (setter=myOwnP3Setter:) int p3;
|
|
-(void)myOwnP3Setter:(int)a;
|
|
@end
|
|
|
|
@implementation I1
|
|
@synthesize p3;
|
|
-(void)myOwnP3Setter:(int)a{ }
|
|
@end
|
|
</pre>
|
|
</div>
|
|
|
|
<p>
|
|
The DWARF for this would be:
|
|
</p>
|
|
<div class="doc_code">
|
|
<pre>
|
|
0x000003bd: TAG_structure_type [7] *
|
|
AT_APPLE_runtime_class( 0x10 )
|
|
AT_name( "I1" )
|
|
AT_decl_file( "Objc_Property.m" )
|
|
AT_decl_line( 3 )
|
|
|
|
0x000003cd: TAG_APPLE_property
|
|
AT_name ( "p3" )
|
|
AT_APPLE_property_setter ( "myOwnP3Setter:" )
|
|
AT_type( {0x00000147} ( int ) )
|
|
|
|
0x000003f3: TAG_member [8]
|
|
AT_name( "_p3" )
|
|
AT_type ( {0x00000147} ( int ) )
|
|
AT_APPLE_property ( {0x000003cd} )
|
|
AT_artificial ( 0x1 )
|
|
</pre>
|
|
</div>
|
|
|
|
</div>
|
|
|
|
<!-- *********************************************************************** -->
|
|
<h4>
|
|
<a name="objcpropertynewtags">New DWARF Tags</a>
|
|
</h4>
|
|
<!-- *********************************************************************** -->
|
|
|
|
<div>
|
|
<table border="1" cellspacing="0">
|
|
<col width="200">
|
|
<col width="200">
|
|
<tr>
|
|
<th>TAG</th>
|
|
<th>Value</th>
|
|
</tr>
|
|
<tr>
|
|
<td>DW_TAG_APPLE_property</td>
|
|
<td>0x4200</td>
|
|
</tr>
|
|
</table>
|
|
|
|
</div>
|
|
|
|
<!-- *********************************************************************** -->
|
|
<h4>
|
|
<a name="objcpropertynewattributes">New DWARF Attributes</a>
|
|
</h4>
|
|
<!-- *********************************************************************** -->
|
|
|
|
<div>
|
|
<table border="1" cellspacing="0">
|
|
<col width="200">
|
|
<col width="200">
|
|
<col width="200">
|
|
<tr>
|
|
<th>Attribute</th>
|
|
<th>Value</th>
|
|
<th>Classes</th>
|
|
</tr>
|
|
<tr>
|
|
<td>DW_AT_APPLE_property</td>
|
|
<td>0x3fed</td>
|
|
<td>Reference</td>
|
|
</tr>
|
|
<tr>
|
|
<td>DW_AT_APPLE_property_getter</td>
|
|
<td>0x3fe9</td>
|
|
<td>String</td>
|
|
</tr>
|
|
<tr>
|
|
<td>DW_AT_APPLE_property_setter</td>
|
|
<td>0x3fea</td>
|
|
<td>String</td>
|
|
</tr>
|
|
<tr>
|
|
<td>DW_AT_APPLE_property_attribute</td>
|
|
<td>0x3feb</td>
|
|
<td>Constant</td>
|
|
</tr>
|
|
</table>
|
|
|
|
</div>
|
|
|
|
<!-- *********************************************************************** -->
|
|
<h4>
|
|
<a name="objcpropertynewconstants">New DWARF Constants</a>
|
|
</h4>
|
|
<!-- *********************************************************************** -->
|
|
|
|
<div>
|
|
<table border="1" cellspacing="0">
|
|
<col width="200">
|
|
<col width="200">
|
|
<tr>
|
|
<th>Name</th>
|
|
<th>Value</th>
|
|
</tr>
|
|
<tr>
|
|
<td>DW_AT_APPLE_PROPERTY_readonly</td>
|
|
<td>0x1</td>
|
|
</tr>
|
|
<tr>
|
|
<td>DW_AT_APPLE_PROPERTY_readwrite</td>
|
|
<td>0x2</td>
|
|
</tr>
|
|
<tr>
|
|
<td>DW_AT_APPLE_PROPERTY_assign</td>
|
|
<td>0x4</td>
|
|
</tr>
|
|
<tr>
|
|
<td>DW_AT_APPLE_PROPERTY_retain</td>
|
|
<td>0x8</td>
|
|
</tr>
|
|
<tr>
|
|
<td>DW_AT_APPLE_PROPERTY_copy</td>
|
|
<td>0x10</td>
|
|
</tr>
|
|
<tr>
|
|
<td>DW_AT_APPLE_PROPERTY_nonatomic</td>
|
|
<td>0x20</td>
|
|
</tr>
|
|
</table>
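<p>The values in the table above are distinct powers of two, and the earlier
"pr" example lists two of them on one attribute, so they appear to be meant as
bit flags. The following is a minimal sketch of how a consumer might turn a
DW_AT_APPLE_property_attribute constant back into text under that assumption.
The enumerator and function names are illustrative only and are not taken from
any LLVM header.</p>

```cpp
#include <cstdint>
#include <string>

// Bit values copied from the constants table above. The enumerator names are
// hypothetical and chosen only for this sketch.
enum PropertyAttribute : uint32_t {
    kReadonly  = 0x01,
    kReadwrite = 0x02,
    kAssign    = 0x04,
    kRetain    = 0x08,
    kCopy      = 0x10,
    kNonatomic = 0x20,
};

// Render the set bits of a DW_AT_APPLE_property_attribute value as a
// comma-separated list, in ascending bit order.
std::string describe_property_attributes(uint32_t value) {
    static const struct { uint32_t bit; const char* name; } kNames[] = {
        {kReadonly, "readonly"}, {kReadwrite, "readwrite"},
        {kAssign, "assign"},     {kRetain, "retain"},
        {kCopy, "copy"},         {kNonatomic, "nonatomic"},
    };
    std::string out;
    for (const auto& entry : kNames) {
        if (value & entry.bit) {
            if (!out.empty()) out += ", ";
            out += entry.name;
        }
    }
    return out;
}
```

<p>With these values, a property declared as (readonly, nonatomic) would carry
the constant 0x21.</p>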
|
|
|
|
</div>
|
|
</div>
|
|
|
|
<!-- ======================================================================= -->
|
|
<h3>
|
|
<a name="acceltable">Name Accelerator Tables</a>
|
|
</h3>
|
|
<!-- ======================================================================= -->
|
|
<div>
|
|
<!-- ======================================================================= -->
|
|
<h4>
|
|
<a name="acceltableintroduction">Introduction</a>
|
|
</h4>
|
|
<!-- ======================================================================= -->
|
|
<div>
|
|
<p>The .debug_pubnames and .debug_pubtypes formats are not what a debugger
|
|
needs. The "pub" in the section name indicates that the entries in the
|
|
table are publicly visible names only. This means no static or hidden
|
|
functions show up in the .debug_pubnames. No static variables or private class
|
|
variables are in the .debug_pubtypes. Many compilers add different things to
|
|
these tables, so we can't rely on their contents being consistent across gcc, icc, and clang.</p>
|
|
|
|
<p>The typical query given by users tends not to match up with the contents of
|
|
these tables. For example, the DWARF spec states that "In the case of the
|
|
name of a function member or static data member of a C++ structure, class or
|
|
union, the name presented in the .debug_pubnames section is not the simple
|
|
name given by the DW_AT_name attribute of the referenced debugging information
|
|
entry, but rather the fully qualified name of the data or function member."
|
|
So the only name in these tables for a complex C++ entry is its fully
qualified name. Debugger users tend not to enter their search strings as
"a::b::c(int,const Foo&) const", but rather as "c", "b::c", or "a::b::c". So
|
|
the name entered in the name table must be demangled in order to chop it up
|
|
appropriately and additional names must be manually entered into the table
|
|
to make it effective as a name lookup table for debuggers to use.</p>
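<p>As a rough illustration of the "chopping up" described above, the sketch
below produces every scope-qualified suffix of a name, so that "a::b::c" can
be found as "c", "b::c", or "a::b::c". The function name is hypothetical, and
the sketch assumes the parameter list and any template arguments have already
been stripped from the demangled name.</p>

```cpp
#include <string>
#include <vector>

// Return every "::"-separated suffix of a qualified name, shortest first.
// Assumes the input contains no parameter list or template arguments.
std::vector<std::string> lookup_names(const std::string& qualified) {
    std::vector<std::string> names;
    if (qualified.empty()) return names;
    std::string::size_type end = qualified.size();
    while (true) {
        // Find the last "::" that lies entirely before position `end`.
        std::string::size_type sep =
            (end >= 2) ? qualified.rfind("::", end - 2) : std::string::npos;
        if (sep == std::string::npos) break;
        names.push_back(qualified.substr(sep + 2));
        end = sep;
    }
    names.push_back(qualified);  // the full name is always a candidate
    return names;
}
```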
|
|
|
|
<p>All debuggers currently ignore the .debug_pubnames table as a result of
|
|
its inconsistent and useless public-only name content, making it a waste of
|
|
space in the object file. These tables, when they are written to disk, are
|
|
not sorted in any way, leaving every debugger to do its own parsing
|
|
and sorting. These tables also include an inlined copy of the string values
|
|
in the table itself making the tables much larger than they need to be on
|
|
disk, especially for large C++ programs.</p>
|
|
|
|
<p>Can't we just fix the sections by adding all of the names we need to this
|
|
table? No, because that is not what the tables are defined to contain and we
|
|
won't know the difference between the old bad tables and the new good tables.
|
|
At best we could make our own renamed sections that contain all of the data
|
|
we need.</p>
|
|
|
|
<p>These tables are also insufficient for what a debugger like LLDB needs.
|
|
LLDB uses clang for its expression parsing where LLDB acts as a PCH. LLDB is
|
|
then often asked to look for type "foo" or namespace "bar", or list items in
|
|
namespace "baz". Namespaces are not included in the pubnames or pubtypes
|
|
tables. Since clang asks a lot of questions when it is parsing an expression,
|
|
we need to be very fast when looking up names, as it happens a lot. Having new
|
|
accelerator tables that are optimized for very quick lookups will benefit
|
|
this type of debugging experience greatly.</p>
|
|
|
|
<p>We would like to generate name lookup tables that can be mapped into
|
|
memory from disk, and used as is, with little or no up-front parsing. We would
|
|
also be able to control the exact content of these different tables so they
|
|
contain exactly what we need. The Name Accelerator Tables were designed
|
|
to fix these issues. In order to solve these issues we need to:</p>
|
|
|
|
<ul>
|
|
<li>Have a format that can be mapped into memory from disk and used as is</li>
|
|
<li>Lookups should be very fast</li>
|
|
<li>Extensible table format so these tables can be made by many producers</li>
|
|
<li>Contain all of the names needed for typical lookups out of the box</li>
|
|
<li>Strict rules for the contents of tables</li>
|
|
</ul>
|
|
|
|
<p>Table size is important and the accelerator table format should allow the
|
|
reuse of strings from common string tables so the strings for the names are
|
|
not duplicated. We also want to make sure the table is ready to be used as-is
|
|
by simply mapping the table into memory with minimal header parsing.</p>
|
|
|
|
<p>The name lookups need to be fast and optimized for the kinds of lookups
|
|
that debuggers tend to do. Optimally we would like to touch as few parts of
|
|
the mapped table as possible when doing a name lookup and be able to quickly
|
|
find the name entry we are looking for, or discover there are no matches. In
|
|
the case of debuggers we optimized for lookups that fail most of the time.</p>
|
|
|
|
<p>Each table that is defined should have strict rules on exactly what is in
|
|
the accelerator tables, and these rules should be documented so clients can rely on the content.</p>
|
|
|
|
</div>
|
|
|
|
<!-- ======================================================================= -->
|
|
<h4>
|
|
<a name="acceltablehashes">Hash Tables</a>
|
|
</h4>
|
|
<!-- ======================================================================= -->
|
|
|
|
<div>
|
|
<h5>Standard Hash Tables</h5>
|
|
|
|
<p>Typical hash tables have a header, buckets, and each bucket points to the
|
|
bucket contents:
|
|
</p>
|
|
|
|
<div class="doc_code">
|
|
<pre>
|
|
.------------.
|
|
| HEADER |
|
|
|------------|
|
|
| BUCKETS |
|
|
|------------|
|
|
| DATA |
|
|
`------------'
|
|
</pre>
|
|
</div>
|
|
|
|
<p>The BUCKETS are an array of offsets to DATA for each hash:</p>
|
|
|
|
<div class="doc_code">
|
|
<pre>
|
|
.------------.
|
|
| 0x00001000 | BUCKETS[0]
|
|
| 0x00002000 | BUCKETS[1]
|
|
| 0x00002200 | BUCKETS[2]
|
|
| 0x000034f0 | BUCKETS[3]
|
|
| | ...
|
|
| 0xXXXXXXXX | BUCKETS[n_buckets]
|
|
'------------'
|
|
</pre>
|
|
</div>
|
|
|
|
<p>So for bucket[3] in the example above, we have an offset into the table
|
|
0x000034f0 which points to a chain of entries for the bucket. Each entry
|
|
must contain a next pointer, full 32 bit hash value, the string itself,
|
|
and the data for the current string value.</p>
|
|
|
|
<div class="doc_code">
|
|
<pre>
|
|
.------------.
|
|
0x000034f0: | 0x00003500 | next pointer
|
|
| 0x12345678 | 32 bit hash
|
|
| "erase" | string value
|
|
| data[n] | HashData for this bucket
|
|
|------------|
|
|
0x00003500: | 0x00003550 | next pointer
|
|
| 0x29273623 | 32 bit hash
|
|
| "dump" | string value
|
|
| data[n] | HashData for this bucket
|
|
|------------|
|
|
0x00003550: | 0x00000000 | next pointer
|
|
| 0x82638293 | 32 bit hash
|
|
| "main" | string value
|
|
| data[n] | HashData for this bucket
|
|
`------------'
|
|
</pre>
|
|
</div>
|
|
|
|
<p>The problem with this layout for debuggers is that we need to optimize for
|
|
the negative lookup case where the symbol we're searching for is not present.
|
|
So if we were to look up "printf" in the table above, we would make a 32 bit hash
for "printf", which might match bucket[3]. We would need to go to the offset
|
|
0x000034f0 and start looking to see if our 32 bit hash matches. To do so, we
|
|
need to read the next pointer, then read the hash, compare it, and skip to
|
|
the next bucket. Each time we are skipping many bytes in memory and touching
|
|
new cache pages just to do the compare on the full 32 bit hash. All of these
|
|
accesses then tell us that we didn't have a match.</p>
|
|
|
|
<h5>Name Hash Tables</h5>
|
|
|
|
<p>To solve the issues mentioned above we have structured the hash tables
|
|
a bit differently: a header, buckets, an array of all unique 32 bit hash
|
|
values, followed by an array of hash value data offsets, one for each hash
|
|
value, then the data for all hash values:</p>
|
|
|
|
<div class="doc_code">
|
|
<pre>
|
|
.-------------.
|
|
| HEADER |
|
|
|-------------|
|
|
| BUCKETS |
|
|
|-------------|
|
|
| HASHES |
|
|
|-------------|
|
|
| OFFSETS |
|
|
|-------------|
|
|
| DATA |
|
|
`-------------'
|
|
</pre>
|
|
</div>
|
|
|
|
<p>The BUCKETS in the name tables are an index into the HASHES array. By
|
|
making all of the full 32 bit hash values contiguous in memory, we allow
|
|
ourselves to efficiently check for a match while touching as little
|
|
memory as possible. Most often checking the 32 bit hash values is as far as
|
|
the lookup goes. If it does match, it usually is a match with no collisions.
|
|
So for a table with "n_buckets" buckets, and "n_hashes" unique 32 bit hash
|
|
values, we can clarify the contents of the BUCKETS, HASHES and OFFSETS as:</p>
|
|
|
|
<div class="doc_code">
|
|
<pre>
|
|
.-------------------------.
|
|
| HEADER.magic | uint32_t
|
|
| HEADER.version | uint16_t
|
|
| HEADER.hash_function | uint16_t
|
|
| HEADER.bucket_count | uint32_t
|
|
| HEADER.hashes_count | uint32_t
|
|
| HEADER.header_data_len | uint32_t
|
|
| HEADER_DATA | HeaderData
|
|
|-------------------------|
|
|
| BUCKETS | uint32_t[n_buckets] // 32 bit hash indexes
|
|
|-------------------------|
|
|
| HASHES | uint32_t[n_hashes] // 32 bit hash values
|
|
|-------------------------|
|
|
| OFFSETS | uint32_t[n_hashes] // 32 bit offsets to hash value data
|
|
|-------------------------|
|
|
| ALL HASH DATA |
|
|
`-------------------------'
|
|
</pre>
|
|
</div>
|
|
|
|
<p>So taking the exact same data from the standard hash example above we end up
|
|
with:</p>
|
|
|
|
<div class="doc_code">
|
|
<pre>
|
|
.------------.
|
|
| HEADER |
|
|
|------------|
|
|
| 0 | BUCKETS[0]
|
|
| 2 | BUCKETS[1]
|
|
| 5 | BUCKETS[2]
|
|
| 6 | BUCKETS[3]
|
|
| | ...
|
|
| ... | BUCKETS[n_buckets]
|
|
|------------|
|
|
| 0x........ | HASHES[0]
|
|
| 0x........ | HASHES[1]
|
|
| 0x........ | HASHES[2]
|
|
| 0x........ | HASHES[3]
|
|
| 0x........ | HASHES[4]
|
|
| 0x........ | HASHES[5]
|
|
| 0x12345678 | HASHES[6] hash for BUCKETS[3]
|
|
| 0x29273623 | HASHES[7] hash for BUCKETS[3]
|
|
| 0x82638293 | HASHES[8] hash for BUCKETS[3]
|
|
| 0x........ | HASHES[9]
|
|
| 0x........ | HASHES[10]
|
|
| 0x........ | HASHES[11]
|
|
| 0x........ | HASHES[12]
|
|
| 0x........ | HASHES[13]
|
|
| 0x........ | HASHES[n_hashes]
|
|
|------------|
|
|
| 0x........ | OFFSETS[0]
|
|
| 0x........ | OFFSETS[1]
|
|
| 0x........ | OFFSETS[2]
|
|
| 0x........ | OFFSETS[3]
|
|
| 0x........ | OFFSETS[4]
|
|
| 0x........ | OFFSETS[5]
|
|
| 0x000034f0 | OFFSETS[6] offset for BUCKETS[3]
|
|
| 0x00003500 | OFFSETS[7] offset for BUCKETS[3]
|
|
| 0x00003550 | OFFSETS[8] offset for BUCKETS[3]
|
|
| 0x........ | OFFSETS[9]
|
|
| 0x........ | OFFSETS[10]
|
|
| 0x........ | OFFSETS[11]
|
|
| 0x........ | OFFSETS[12]
|
|
| 0x........ | OFFSETS[13]
|
|
| 0x........ | OFFSETS[n_hashes]
|
|
|------------|
|
|
| |
|
|
| |
|
|
| |
|
|
| |
|
|
| |
|
|
|------------|
|
|
0x000034f0: | 0x00001203 | String offset into .debug_str ("erase")
|
|
| 0x00000004 | A 32 bit array count - number of HashData with name "erase"
|
|
| 0x........ | HashData[0]
|
|
| 0x........ | HashData[1]
|
|
| 0x........ | HashData[2]
|
|
| 0x........ | HashData[3]
|
|
| 0x00000000 | String offset into .debug_str (terminate data for hash)
|
|
|------------|
|
|
0x00003500: | 0x00001203 | String offset into .debug_str ("collision")
|
|
| 0x00000002 | A 32 bit array count - number of HashData with name "collision"
|
|
| 0x........ | HashData[0]
|
|
| 0x........ | HashData[1]
|
|
| 0x00001203 | String offset into .debug_str ("dump")
|
|
| 0x00000003 | A 32 bit array count - number of HashData with name "dump"
|
|
| 0x........ | HashData[0]
|
|
| 0x........ | HashData[1]
|
|
| 0x........ | HashData[2]
|
|
| 0x00000000 | String offset into .debug_str (terminate data for hash)
|
|
|------------|
|
|
0x00003550: | 0x00001203 | String offset into .debug_str ("main")
|
|
| 0x00000009 | A 32 bit array count - number of HashData with name "main"
|
|
| 0x........ | HashData[0]
|
|
| 0x........ | HashData[1]
|
|
| 0x........ | HashData[2]
|
|
| 0x........ | HashData[3]
|
|
| 0x........ | HashData[4]
|
|
| 0x........ | HashData[5]
|
|
| 0x........ | HashData[6]
|
|
| 0x........ | HashData[7]
|
|
| 0x........ | HashData[8]
|
|
| 0x00000000 | String offset into .debug_str (terminate data for hash)
|
|
`------------'
|
|
</pre>
|
|
</div>
|
|
|
|
<p>So we still have all of the same data, we just organize it more efficiently
|
|
for debugger lookup. If we repeat the same "printf" lookup from above, we
|
|
would hash "printf" and find it matches BUCKETS[3] by taking the 32 bit hash
|
|
value modulo n_buckets. BUCKETS[3] contains "6", which is the index
|
|
into the HASHES table. We would then compare any consecutive 32 bit hash
|
|
values in the HASHES array as long as the hashes would be in BUCKETS[3]. We
|
|
do this by verifying that each subsequent hash value modulo n_buckets is still
|
|
3. In the case of a failed lookup we would access the memory for BUCKETS[3], and
|
|
then compare a few consecutive 32 bit hashes before we know that we have no match.
|
|
We don't end up marching through multiple words of memory and we really keep the
|
|
number of processor data cache lines being accessed as small as possible.</p>
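<p>The lookup just described can be sketched as follows. This is a minimal
illustration, not the LLDB implementation: the NameTable struct and the
function name are assumptions made for the example, the three arrays are held
in std::vector rather than mapped from disk, and an empty bucket is marked
with UINT32_MAX as this document specifies.</p>

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical in-memory view of the BUCKETS, HASHES and OFFSETS arrays.
struct NameTable {
    std::vector<uint32_t> buckets;  // index into hashes, UINT32_MAX if empty
    std::vector<uint32_t> hashes;   // hashes of one bucket are contiguous
    std::vector<uint32_t> offsets;  // offsets[i] belongs to hashes[i]
};

// Look up a full 32 bit hash. On success, store the offset of its hash data
// in offset_out and return true; otherwise return false.
bool find_hash_data_offset(const NameTable& table, uint32_t hash,
                           uint32_t& offset_out) {
    assert(table.hashes.size() == table.offsets.size());
    if (table.buckets.empty()) return false;
    const uint32_t n_buckets = static_cast<uint32_t>(table.buckets.size());
    const uint32_t bucket = hash % n_buckets;
    const uint32_t first = table.buckets[bucket];
    if (first == UINT32_MAX) return false;  // empty bucket
    for (std::size_t i = first; i < table.hashes.size(); ++i) {
        const uint32_t candidate = table.hashes[i];
        if (candidate == hash) {
            offset_out = table.offsets[i];
            return true;
        }
        // Stop as soon as the hashes belong to a different bucket.
        if (candidate % n_buckets != bucket) break;
    }
    return false;
}
```

<p>A failed lookup reads one BUCKETS slot and then a short run of adjacent
HASHES entries, which is the access pattern the layout is meant to encourage.
A hash match still has to be confirmed by comparing the string stored in the
hash data.</p>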
|
|
|
|
<p>The string hash that is used for these lookup tables is the Daniel J.
|
|
Bernstein hash which is also used in the ELF GNU_HASH sections. It is a very
|
|
good hash for all kinds of names in programs with very few hash collisions.</p>
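<p>For reference, the commonly published form of the Bernstein hash starts
from 5381 and computes h * 33 + c for each byte. The sketch below assumes that
this is the variant meant here and that bytes are treated as unsigned; a
producer and a consumer must agree on both details, so check the actual
implementation before relying on it.</p>

```cpp
#include <cstdint>

// Classic DJB string hash: seed 5381, multiply by 33, add each byte.
// Arithmetic wraps modulo 2^32 because uint32_t is unsigned.
uint32_t djb_hash(const char* s) {
    uint32_t h = 5381;
    for (; *s != '\0'; ++s) {
        h = h * 33u + static_cast<unsigned char>(*s);
    }
    return h;
}
```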
|
|
|
|
<p>Empty buckets are designated by using an invalid hash index of UINT32_MAX.</p>
|
|
</div>
|
|
|
|
<!-- ======================================================================= -->
|
|
<h4>
|
|
<a name="acceltabledetails">Details</a>
|
|
</h4>
|
|
<!-- ======================================================================= -->
|
|
<div>
|
|
<p>These name hash tables are designed to be generic where specializations of
|
|
the table get to define additional data that goes into the header
|
|
("HeaderData"), how the string value is stored ("KeyType") and the content
|
|
of the data for each hash value.</p>
|
|
|
|
<h5>Header Layout</h5>
|
|
<p>The header has a fixed part, and the specialized part. The exact format of
|
|
the header is:</p>
|
|
<div class="doc_code">
|
|
<pre>
|
|
struct Header
|
|
{
|
|
uint32_t magic; // 'HASH' magic value to allow endian detection
|
|
uint16_t version; // Version number
|
|
uint16_t hash_function; // The hash function enumeration that was used
|
|
uint32_t bucket_count; // The number of buckets in this hash table
|
|
uint32_t hashes_count; // The total number of unique hash values and hash data offsets in this table
|
|
uint32_t header_data_len; // The bytes to skip to get to the hash indexes (buckets) for correct alignment
|
|
// Specifically the length of the following HeaderData field - this does not
|
|
// include the size of the preceding fields
|
|
HeaderData header_data; // Implementation specific header data
|
|
};
|
|
</pre>
|
|
</div>
|
|
<p>The header starts with a 32 bit "magic" value which must be 'HASH' encoded as
|
|
an ASCII integer. This allows the detection of the start of the hash table and
|
|
also allows the table's byte order to be determined so the table can be
|
|
correctly extracted. The "magic" value is followed by a 16 bit version number
|
|
which allows the table to be revised and modified in the future. The current
|
|
version number is 1. "hash_function" is a uint16_t enumeration that specifies
|
|
which hash function was used to produce this table. The current values for the
|
|
hash function enumerations include:</p>
|
|
<div class="doc_code">
|
|
<pre>
|
|
enum HashFunctionType
|
|
{
|
|
eHashFunctionDJB = 0u, // Daniel J Bernstein hash function
|
|
};
|
|
</pre>
|
|
</div>
|
|
<p>"bucket_count" is a 32 bit unsigned integer that represents how many buckets
|
|
are in the BUCKETS array. "hashes_count" is the number of unique 32 bit hash
|
|
values that are in the HASHES array, and is also the number of offsets
|
|
contained in the OFFSETS array. "header_data_len" specifies the size in
|
|
bytes of the HeaderData that is filled in by specialized versions of this
|
|
table.</p>
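<p>One way a consumer might use the magic value for byte order detection is
sketched below. The struct and function names are assumptions for this
example. The sketch reads only the fixed part of the header, treats 'HASH' as
the integer 0x48415348, and byte-swaps every field when the magic value is
found reversed.</p>

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

const uint32_t kHashMagic = 0x48415348u;  // 'H','A','S','H' as an ASCII integer

// Fixed part of the header, without the trailing HeaderData.
struct FixedHeader {
    uint32_t magic;
    uint16_t version;
    uint16_t hash_function;
    uint32_t bucket_count;
    uint32_t hashes_count;
    uint32_t header_data_len;
};
static_assert(sizeof(FixedHeader) == 20, "fixed header must be packed");

inline uint16_t swap16(uint16_t v) {
    return static_cast<uint16_t>((v >> 8) | (v << 8));
}
inline uint32_t swap32(uint32_t v) {
    return (v >> 24) | ((v >> 8) & 0x0000ff00u) |
           ((v << 8) & 0x00ff0000u) | (v << 24);
}

// Copy the fixed header out of a raw buffer. Returns false if the buffer is
// too small or the magic value is not recognized in either byte order.
bool read_header(const unsigned char* data, std::size_t size,
                 FixedHeader& header, bool& byte_swapped) {
    if (size < sizeof(FixedHeader)) return false;
    std::memcpy(&header, data, sizeof(FixedHeader));
    if (header.magic == kHashMagic) {
        byte_swapped = false;
        return true;
    }
    if (header.magic != swap32(kHashMagic)) return false;
    byte_swapped = true;  // table was written with the opposite byte order
    header.magic = swap32(header.magic);
    header.version = swap16(header.version);
    header.hash_function = swap16(header.hash_function);
    header.bucket_count = swap32(header.bucket_count);
    header.hashes_count = swap32(header.hashes_count);
    header.header_data_len = swap32(header.header_data_len);
    return true;
}
```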
|
|
|
|
<h5>Fixed Lookup</h5>
|
|
<p>The header is followed by the buckets, hashes, offsets, and hash value
|
|
data.</p>
|
|
<div class="doc_code">
|
|
<pre>
|
|
struct FixedTable
|
|
{
|
|
uint32_t buckets[Header.bucket_count]; // An array of hash indexes into the "hashes[]" array below
|
|
uint32_t hashes [Header.hashes_count]; // Every unique 32 bit hash for the entire table is in this table
|
|
uint32_t offsets[Header.hashes_count]; // An offset that corresponds to each item in the "hashes[]" array above
|
|
};
|
|
</pre>
|
|
</div>
|
|
<p>"buckets" is an array of 32 bit indexes into the "hashes" array. The
|
|
"hashes" array contains all of the 32 bit hash values for all names in the
|
|
hash table. Each hash in the "hashes" table has an offset in the "offsets"
|
|
array that points to the data for the hash value.</p>
|
|
|
|
<p>This table setup makes it very easy to repurpose these tables to contain
|
|
different data, while keeping the lookup mechanism the same for all tables.
|
|
This layout also makes it possible to save the table to disk and map it in
|
|
later and do very efficient name lookups with little or no parsing.</p>
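<p>Because every array is sized by a header field, the position of each
region can be computed without parsing anything else. The sketch below assumes
the regions are packed back to back with no padding and that positions are
counted from the first byte of the header; TableLayout and compute_layout are
names invented for this example.</p>

```cpp
#include <cstdint>

// Byte positions of each region, counted from the start of the header.
struct TableLayout {
    uint64_t buckets_offset;
    uint64_t hashes_offset;
    uint64_t offsets_offset;
    uint64_t hash_data_offset;
};

TableLayout compute_layout(uint32_t bucket_count, uint32_t hashes_count,
                           uint32_t header_data_len) {
    // magic (4) + version (2) + hash_function (2) + three 32 bit counts (12)
    const uint64_t kFixedHeaderSize = 20;
    TableLayout layout;
    layout.buckets_offset = kFixedHeaderSize + header_data_len;
    layout.hashes_offset = layout.buckets_offset + 4ull * bucket_count;
    layout.offsets_offset = layout.hashes_offset + 4ull * hashes_count;
    layout.hash_data_offset = layout.offsets_offset + 4ull * hashes_count;
    return layout;
}
```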
|
|
|
|
<p>DWARF lookup tables can be implemented in a variety of ways and can store
|
|
a lot of information for each name. We want to make the DWARF tables
|
|
extensible and able to store the data efficiently so we have used some of the
|
|
DWARF features that enable efficient data storage to define exactly what kind
|
|
of data we store for each name.</p>
|
|
|
|
<p>The "HeaderData" contains a definition of the contents of each HashData
|
|
chunk. We might want to store an offset to all of the debug information
|
|
entries (DIEs) for each name. To keep things extensible, we create a list of
|
|
items, or Atoms, that are contained in the data for each name. First comes the
|
|
type of the data in each atom:</p>
|
|
<div class="doc_code">
|
|
<pre>
|
|
enum AtomType
|
|
{
|
|
eAtomTypeNULL = 0u,
|
|
eAtomTypeDIEOffset = 1u, // DIE offset, check form for encoding
|
|
eAtomTypeCUOffset = 2u, // DIE offset of the compile unit header that contains the item in question
|
|
eAtomTypeTag = 3u, // DW_TAG_xxx value, should be encoded as DW_FORM_data1 (if no tags exceed 255) or DW_FORM_data2
|
|
eAtomTypeNameFlags = 4u, // Flags from enum NameFlags
|
|
eAtomTypeTypeFlags = 5u, // Flags from enum TypeFlags
|
|
};
|
|
</pre>
|
|
</div>
|
|
<p>The enumeration values and their meanings are:</p>
|
|
<div class="doc_code">
|
|
<pre>
|
|
eAtomTypeNULL - a termination atom that specifies the end of the atom list
|
|
eAtomTypeDIEOffset - an offset into the .debug_info section for the DWARF DIE for this name
|
|
eAtomTypeCUOffset - an offset into the .debug_info section for the CU that contains the DIE
|
|
eAtomTypeTag - The DW_TAG_XXX enumeration value so you don't have to parse the DWARF to see what it is
|
|
eAtomTypeNameFlags - Flags for functions and global variables (isFunction, isInlined, isExternal...)
|
|
eAtomTypeTypeFlags - Flags for types (isCXXClass, isObjCClass, ...)
|
|
</pre>
|
|
</div>
|
|
<p>Each atom then records its atom type and how the data for that atom type
is encoded:</p>
|
|
<div class="doc_code">
|
|
<pre>
|
|
struct Atom
|
|
{
|
|
uint16_t type; // AtomType enum value
|
|
uint16_t form; // DWARF DW_FORM_XXX defines
|
|
};
|
|
</pre>
|
|
</div>
|
|
<p>The "form" type above is from the DWARF specification and defines the
|
|
exact encoding of the data for the Atom type. See the DWARF specification for
|
|
the DW_FORM_ definitions.</p>
|
|
<div class="doc_code">
|
|
<pre>
|
|
struct HeaderData
|
|
{
|
|
uint32_t die_offset_base;
|
|
uint32_t atom_count;
|
|
Atom atoms[atom_count];
|
|
};
|
|
</pre>
|
|
</div>
|
|
<p>"HeaderData" defines the base DIE offset that should be added to any atoms
|
|
that are encoded using the DW_FORM_ref1, DW_FORM_ref2, DW_FORM_ref4,
|
|
DW_FORM_ref8 or DW_FORM_ref_udata. It also defines what is contained in
|
|
each "HashData" object -- Atom.form tells us how large each field will be in
|
|
the HashData and the Atom.type tells us how this data should be interpreted.</p>
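<p>To illustrate how Atom.form determines the size of a HashData entry, the
sketch below sums the sizes of the fixed-width forms. The DW_FORM codes are
the ones assigned by the DWARF specification; the helper names are
assumptions, and variable-length forms such as DW_FORM_udata are deliberately
reported as unsupported rather than guessed at.</p>

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

struct Atom {
    uint16_t type;  // AtomType enum value
    uint16_t form;  // DW_FORM_xxx code
};

// Fixed-width form codes as assigned by the DWARF specification.
enum : uint16_t {
    DW_FORM_data2 = 0x05, DW_FORM_data4 = 0x06, DW_FORM_data8 = 0x07,
    DW_FORM_data1 = 0x0b, DW_FORM_flag  = 0x0c,
    DW_FORM_ref1  = 0x11, DW_FORM_ref2  = 0x12,
    DW_FORM_ref4  = 0x13, DW_FORM_ref8  = 0x14,
};

// Size in bytes of a fixed-width form, or 0 if this sketch does not handle it.
std::size_t fixed_form_size(uint16_t form) {
    switch (form) {
    case DW_FORM_data1: case DW_FORM_flag: case DW_FORM_ref1: return 1;
    case DW_FORM_data2: case DW_FORM_ref2: return 2;
    case DW_FORM_data4: case DW_FORM_ref4: return 4;
    case DW_FORM_data8: case DW_FORM_ref8: return 8;
    default: return 0;
    }
}

// Compute the size of one HashData entry described by the atom list.
// Returns false if any atom uses a form without a fixed size.
bool hash_data_size(const std::vector<Atom>& atoms, std::size_t& size_out) {
    std::size_t total = 0;
    for (const Atom& atom : atoms) {
        const std::size_t n = fixed_form_size(atom.form);
        if (n == 0) return false;
        total += n;
    }
    size_out = total;
    return true;
}
```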
|
|
|
|
<p>For the current implementations of the ".apple_names" (all functions + globals),
|
|
the ".apple_types" (names of all types that are defined), and the
|
|
".apple_namespaces" (all namespaces), we currently set the Atom array to be:</p>
|
|
<div class="doc_code">
|
|
<pre>
|
|
HeaderData.atom_count = 1;
|
|
HeaderData.atoms[0].type = eAtomTypeDIEOffset;
|
|
HeaderData.atoms[0].form = DW_FORM_data4;
|
|
</pre>
|
|
</div>
|
|
<p>This defines the contents to be the DIE offset (eAtomTypeDIEOffset) that is
|
|
encoded as a 32 bit value (DW_FORM_data4). This allows a single name to have
|
|
multiple matching DIEs in a single file, which could come up with an inlined
|
|
function for instance. Future tables could include more information about the
|
|
DIE such as flags indicating if the DIE is a function, method, block,
|
|
or inlined.</p>
|
|
|
|
<p>The KeyType for the DWARF table is a 32 bit string table offset into the
|
|
".debug_str" table. The ".debug_str" is the string table for the DWARF which
|
|
may already contain copies of all of the strings. This helps make sure, with
|
|
help from the compiler, that we reuse the strings between all of the DWARF
|
|
sections and keeps the hash table size down. Another benefit to having the
|
|
compiler generate all strings as DW_FORM_strp in the debug info, is that
|
|
DWARF parsing can be made much faster.</p>
|
|
|
|
<p>After a lookup is made, we get an offset into the hash data. The hash data
|
|
needs to be able to deal with 32 bit hash collisions, so the chunk of data
|
|
at the offset in the hash data consists of a triple:</p>
|
|
<div class="doc_code">
|
|
<pre>
|
|
uint32_t str_offset
|
|
uint32_t hash_data_count
|
|
HashData[hash_data_count]
|
|
</pre>
|
|
</div>
|
|
<p>If "str_offset" is zero, then the bucket contents are done. 99.9% of the
|
|
hash data chunks contain a single item (no 32 bit hash collision):</p>
|
|
<div class="doc_code">
|
|
<pre>
|
|
.------------.
|
|
| 0x00001023 | uint32_t KeyType (.debug_str[0x00001023] => "main")
|
|
| 0x00000004 | uint32_t HashData count
|
|
| 0x........ | uint32_t HashData[0] DIE offset
|
|
| 0x........ | uint32_t HashData[1] DIE offset
|
|
| 0x........ | uint32_t HashData[2] DIE offset
|
|
| 0x........ | uint32_t HashData[3] DIE offset
|
|
| 0x00000000 | uint32_t KeyType (end of hash chain)
|
|
`------------'
|
|
</pre>
|
|
</div>
|
|
<p>If there are collisions, you will have multiple valid string offsets:</p>
|
|
<div class="doc_code">
|
|
<pre>
|
|
.------------.
|
|
| 0x00001023 | uint32_t KeyType (.debug_str[0x00001023] => "main")
|
|
| 0x00000004 | uint32_t HashData count
|
|
| 0x........ | uint32_t HashData[0] DIE offset
|
|
| 0x........ | uint32_t HashData[1] DIE offset
|
|
| 0x........ | uint32_t HashData[2] DIE offset
|
|
| 0x........ | uint32_t HashData[3] DIE offset
|
|
| 0x00002023 | uint32_t KeyType (.debug_str[0x00002023] => "print")
|
|
| 0x00000002 | uint32_t HashData count
|
|
| 0x........ | uint32_t HashData[0] DIE offset
|
|
| 0x........ | uint32_t HashData[1] DIE offset
|
|
| 0x00000000 | uint32_t KeyType (end of hash chain)
|
|
`------------'
|
|
</pre>
|
|
</div>
|
|
<p>Current testing with real world C++ binaries has shown that there is around 1
|
|
32 bit hash collision per 100,000 name entries.</p>
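<p>Walking one such chain might look like the sketch below. It assumes the
current atom layout described earlier (each HashData is a single 32 bit DIE
offset), that the hash data is already available as an array of 32 bit words,
and that the string table is held in a std::string with embedded NUL
terminators. The function name find_dies is hypothetical.</p>

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>

// Walk the chain that starts at word index `pos` and collect the DIE offsets
// stored for `name`. Returns false if the name is not in the chain.
bool find_dies(const std::vector<uint32_t>& data, std::size_t pos,
               const std::string& debug_str, const std::string& name,
               std::vector<uint32_t>& dies) {
    while (pos < data.size()) {
        const uint32_t str_offset = data[pos++];
        if (str_offset == 0) break;               // end of hash chain
        if (pos >= data.size()) break;            // truncated data
        const uint32_t count = data[pos++];
        if (count > data.size() - pos) break;     // truncated data
        // Confirm the match by comparing strings, since two different names
        // can share one 32 bit hash.
        if (str_offset < debug_str.size() &&
            std::strcmp(debug_str.c_str() + str_offset, name.c_str()) == 0) {
            dies.assign(data.begin() + pos, data.begin() + pos + count);
            return true;
        }
        pos += count;                             // skip this name's HashData
    }
    return false;
}
```

<p>The string comparison is what resolves a collision: in the second diagram
above, a lookup for "print" skips the four entries stored for "main" and
returns the two that follow.</p>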
|
|
</div>
|
|
<!-- ======================================================================= -->
|
|
<h4>
|
|
<a name="acceltablecontents">Contents</a>
|
|
</h4>
|
|
<!-- ======================================================================= -->
|
|
<div>
|
|
<p>As we said, we want to strictly define exactly what is included in the
|
|
different tables. For DWARF, we have 3 tables: ".apple_names", ".apple_types",
|
|
and ".apple_namespaces".</p>
|
|
|
|
<p>".apple_names" sections should contain an entry for each DWARF DIE whose
|
|
DW_TAG is a DW_TAG_label, DW_TAG_inlined_subroutine, or DW_TAG_subprogram that
|
|
has address attributes: DW_AT_low_pc, DW_AT_high_pc, DW_AT_ranges or
|
|
DW_AT_entry_pc. It also contains DW_TAG_variable DIEs that have a DW_OP_addr
|
|
in the location (global and static variables). All global and static variables
|
|
should be included, including those scoped within functions and classes. For
|
|
example using the following code:</p>
|
|
<div class="doc_code">
|
|
<pre>
|
|
static int var = 0;
|
|
|
|
void f ()
|
|
{
|
|
static int var = 0;
|
|
}
|
|
</pre>
|
|
</div>
|
|
<p>Both of the static "var" variables would be included in the table. All
|
|
functions should emit both their full names and their basenames. For C or C++,
|
|
the full name is the mangled name (if available) which is usually in the
|
|
DW_AT_MIPS_linkage_name attribute, and the DW_AT_name contains the function
|
|
basename. If global or static variables have a mangled name in a
|
|
DW_AT_MIPS_linkage_name attribute, this should be emitted along with the
|
|
simple name found in the DW_AT_name attribute.</p>
|
|
|
|
<p>".apple_types" sections should contain an entry for each DWARF DIE whose
|
|
tag is one of:</p>
|
|
<ul>
|
|
<li>DW_TAG_array_type</li>
|
|
<li>DW_TAG_class_type</li>
|
|
<li>DW_TAG_enumeration_type</li>
|
|
<li>DW_TAG_pointer_type</li>
|
|
<li>DW_TAG_reference_type</li>
|
|
<li>DW_TAG_string_type</li>
|
|
<li>DW_TAG_structure_type</li>
|
|
<li>DW_TAG_subroutine_type</li>
|
|
<li>DW_TAG_typedef</li>
|
|
<li>DW_TAG_union_type</li>
|
|
<li>DW_TAG_ptr_to_member_type</li>
|
|
<li>DW_TAG_set_type</li>
|
|
<li>DW_TAG_subrange_type</li>
|
|
<li>DW_TAG_base_type</li>
|
|
<li>DW_TAG_const_type</li>
|
|
<li>DW_TAG_constant</li>
|
|
<li>DW_TAG_file_type</li>
|
|
<li>DW_TAG_namelist</li>
|
|
<li>DW_TAG_packed_type</li>
|
|
<li>DW_TAG_volatile_type</li>
|
|
<li>DW_TAG_restrict_type</li>
|
|
<li>DW_TAG_interface_type</li>
|
|
<li>DW_TAG_unspecified_type</li>
|
|
<li>DW_TAG_shared_type</li>
|
|
</ul>
|
|
<p>Only entries with a DW_AT_name attribute are included, and the entry must
|
|
not be a forward declaration (DW_AT_declaration attribute with a non-zero value).
|
|
For example, using the following code:</p>
<div class="doc_code">
<pre>
int main ()
{
  int *b = 0;
  return *b;
}
</pre>
</div>
<p>We get a few type DIEs:</p>
<div class="doc_code">
<pre>
0x00000067:     TAG_base_type [5]
                AT_encoding( DW_ATE_signed )
                AT_name( "int" )
                AT_byte_size( 0x04 )

0x0000006e:     TAG_pointer_type [6]
                AT_type( {0x00000067} ( int ) )
                AT_byte_size( 0x08 )
</pre>
</div>
<p>Only the DW_TAG_base_type DIE for "int" is added to the table. The
DW_TAG_pointer_type is not included because it does not have a DW_AT_name.</p>
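<p>The three conditions above (tag in the list, DW_AT_name present, not a
forward declaration) can be sketched as a single predicate. The
<tt>TypeDie</tt> struct and <tt>belongsInAppleTypes</tt> function are
hypothetical; tags are represented as strings purely to keep the example
self-contained, whereas a real producer would compare numeric tag values.</p>

```cpp
#include <string>
#include <unordered_set>

// Hypothetical, simplified DIE record used only for this illustration.
struct TypeDie {
    std::string tag;     // tag name, e.g. "DW_TAG_base_type"
    std::string name;    // DW_AT_name, empty if the attribute is absent
    bool isDeclaration;  // true if DW_AT_declaration has a non-zero value
};

// Returns true if the DIE qualifies for an entry in ".apple_types".
bool belongsInAppleTypes(const TypeDie& die) {
    static const std::unordered_set<std::string> kTypeTags = {
        "DW_TAG_array_type",       "DW_TAG_class_type",
        "DW_TAG_enumeration_type", "DW_TAG_pointer_type",
        "DW_TAG_reference_type",   "DW_TAG_string_type",
        "DW_TAG_structure_type",   "DW_TAG_subroutine_type",
        "DW_TAG_typedef",          "DW_TAG_union_type",
        "DW_TAG_ptr_to_member_type", "DW_TAG_set_type",
        "DW_TAG_subrange_type",    "DW_TAG_base_type",
        "DW_TAG_const_type",       "DW_TAG_constant",
        "DW_TAG_file_type",        "DW_TAG_namelist",
        "DW_TAG_packed_type",      "DW_TAG_volatile_type",
        "DW_TAG_restrict_type",    "DW_TAG_interface_type",
        "DW_TAG_unspecified_type", "DW_TAG_shared_type"};
    // All three conditions from the text must hold.
    return kTypeTags.count(die.tag) != 0 && !die.name.empty() &&
           !die.isDeclaration;
}
```

<p>Applied to the two DIEs in the dump, the predicate accepts the "int" base
type and rejects the unnamed pointer type.</p>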

<p>The ".apple_namespaces" section should contain all DW_TAG_namespace DIEs. A
namespace that has no name is an anonymous namespace, and its name should be
output as "(anonymous namespace)" (without the quotes). This matches the output
of abi::__cxa_demangle(), the function in the standard C++ library that
demangles mangled names.</p>
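<p>A minimal sketch of the naming rule, assuming an empty string stands for a
DW_TAG_namespace DIE that has no DW_AT_name. The function name
<tt>namespaceAccelName</tt> is hypothetical.</p>

```cpp
#include <string>

// Returns the key under which a DW_TAG_namespace DIE would be stored.
// Assumption: an empty dwAtName means the DIE has no DW_AT_name attribute.
std::string namespaceAccelName(const std::string& dwAtName) {
    if (dwAtName.empty())
        return "(anonymous namespace)";  // same spelling the demangler prints
    return dwAtName;
}
```

<p>Using the demangler's spelling lets a consumer look up a namespace by the
same text that appears in demangled symbol names.</p>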
</div>

<!-- ======================================================================= -->
<h4>
  <a name="acceltableextensions">Language Extensions and File Format Changes</a>
</h4>
<!-- ======================================================================= -->
<div>
<h5>Objective-C Extensions</h5>
<p>The ".apple_objc" section should contain all DW_TAG_subprogram DIEs for an
Objective-C class. The name used in the hash table is the name of the
Objective-C class itself. If the Objective-C class has a category, then an
entry is made both for the class name without the category and for the class
name with the category. So if we have a DIE at offset 0x1234 for the method
"-[NSString(my_additions) stringWithSpecialString:]", we would add
an entry for "NSString" that points to DIE 0x1234, and an entry for
"NSString(my_additions)" that points to 0x1234. This allows us to quickly
track down all Objective-C methods for an Objective-C class when evaluating
expressions. It is needed because of the dynamic nature of Objective-C, where
anyone can add methods to a class. The DWARF for Objective-C methods is also
emitted differently from C++ classes: the methods are not usually contained in
the class definition, but are scattered across one or more compile units.
Categories can also be defined in different shared libraries. So we need to be
able to quickly find all of the methods and class functions given the
Objective-C class name, or quickly find all methods and class functions for a
class + category name. This table does not contain any selector names; it just
maps Objective-C class names (or class names + category) to all of the methods
and class functions. The selectors are added as function basenames in the
".apple_names" section.</p>

<p>In the ".apple_names" section for Objective-C functions, the full name is the
entire function name with the brackets ("-[NSString stringWithCString:]") and the
basename is the selector only ("stringWithCString:").</p>
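<p>The key derivation described above can be sketched as a small parser that
splits an Objective-C method name into its table keys. The
<tt>ObjCMethodKeys</tt> struct and <tt>splitObjCMethodName</tt> function are
hypothetical, and the sketch assumes the name has the form
"-[Class(Category) selector]" or "+[Class selector]" with a single space
separating the class part from the selector.</p>

```cpp
#include <string>
#include <vector>

// Hypothetical result type: the keys derived from one Objective-C method name.
struct ObjCMethodKeys {
    std::vector<std::string> classKeys;  // keys for ".apple_objc"
    std::string fullName;                // full-name key for ".apple_names"
    std::string selector;                // basename key for ".apple_names"
};

// Splits "-[Class(Category) selector]" or "+[Class selector]" into its keys.
// Returns false if the name does not have that shape.
bool splitObjCMethodName(const std::string& name, ObjCMethodKeys& out) {
    if (name.size() < 5 || (name[0] != '-' && name[0] != '+') ||
        name[1] != '[' || name.back() != ']')
        return false;
    const std::string::size_type space = name.find(' ');
    if (space == std::string::npos || space == 2)
        return false;  // no selector part, or empty class part
    const std::string selector = name.substr(space + 1, name.size() - space - 2);
    const std::string classWithCategory = name.substr(2, space - 2);
    const std::string::size_type paren = classWithCategory.find('(');
    if (selector.empty() || paren == 0)
        return false;  // empty selector, or category without a class name
    out.classKeys.clear();
    if (paren != std::string::npos)
        out.classKeys.push_back(classWithCategory.substr(0, paren));  // class only
    out.classKeys.push_back(classWithCategory);  // class, with category if any
    out.fullName = name;
    out.selector = selector;
    return true;
}
```

<p>For "-[NSString(my_additions) stringWithSpecialString:]" this produces the
class keys "NSString" and "NSString(my_additions)" and the selector
"stringWithSpecialString:".</p>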

<h5>Mach-O Changes</h5>
<p>The section names given above for the Apple hash tables apply to non-Mach-O
files. For Mach-O files, the sections should be contained in the "__DWARF"
segment with names as follows:</p>
<ul>
  <li>".apple_names" -> "__apple_names"</li>
  <li>".apple_types" -> "__apple_types"</li>
  <li>".apple_namespaces" -> "__apple_namespac" (16 character limit)</li>
  <li>".apple_objc" -> "__apple_objc"</li>
</ul>
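<p>The mapping in the list follows a simple pattern that can be sketched as
follows: replace the leading "." with "__" and truncate to the 16 characters
that a Mach-O section name can hold. The function name
<tt>machOSectionName</tt> is hypothetical, and the pattern is inferred from the
four entries above rather than taken from a formal rule.</p>

```cpp
#include <string>

// Mach-O stores a section name in a fixed 16-byte field.
const std::string::size_type kMachOSectionNameLimit = 16;

// Converts a name such as ".apple_names" to its Mach-O form "__apple_names".
std::string machOSectionName(const std::string& name) {
    std::string result = name;
    if (!result.empty() && result[0] == '.')
        result.replace(0, 1, "__");  // leading '.' becomes "__"
    if (result.size() > kMachOSectionNameLimit)
        result.resize(kMachOSectionNameLimit);  // drop characters past the limit
    return result;
}
```

<p>Only ".apple_namespaces" is long enough to be truncated, which is why its
Mach-O name ends in "namespac".</p>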
</div>
</div>
</div>

<!-- *********************************************************************** -->

<hr>
<address>
  <a href="http://jigsaw.w3.org/css-validator/check/referer"><img
  src="http://jigsaw.w3.org/css-validator/images/vcss-blue" alt="Valid CSS"></a>
  <a href="http://validator.w3.org/check/referer"><img
  src="http://www.w3.org/Icons/valid-html401-blue" alt="Valid HTML 4.01"></a>

  <a href="mailto:sabre@nondot.org">Chris Lattner</a><br>
  <a href="http://llvm.org/">LLVM Compiler Infrastructure</a><br>
  Last modified: $Date$
</address>

</body>
</html>