2020-02-04 21:44:09 +01:00
|
|
|
===================
|
|
|
|
Bisecting LLVM code
|
|
|
|
===================
|
|
|
|
|
|
|
|
Introduction
|
|
|
|
============
|
|
|
|
|
|
|
|
``git bisect`` is a useful tool for finding which revision caused a bug.
|
|
|
|
|
|
|
|
This document describes how to use ``git bisect``. In particular, while LLVM
|
|
|
|
has a mostly linear history, it has a few merge commits that added projects --
|
|
|
|
and these merged the linear history of those projects. As a consequence, the
|
|
|
|
LLVM repository has multiple roots: One "normal" root, and then one for each
|
|
|
|
toplevel project that was developed out-of-tree and then merged later.
|
|
|
|
As of early 2020, the only such merged project is MLIR, but flang will likely
|
|
|
|
be merged in a similar way soon.
|
|
|
|
|
|
|
|
Basic operation
|
|
|
|
===============
|
|
|
|
|
|
|
|
See https://git-scm.com/docs/git-bisect for a good overview. In summary:
|
|
|
|
|
|
|
|
.. code-block:: bash
|
|
|
|
|
|
|
|
git bisect start
|
2020-12-21 20:06:17 +01:00
|
|
|
git bisect bad main
|
2020-02-04 21:44:09 +01:00
|
|
|
git bisect good f00ba
|
|
|
|
|
|
|
|
git will check out a revision in between. Try to reproduce your problem at
|
|
|
|
that revision, and run ``git bisect good`` or ``git bisect bad``.
|
|
|
|
|
|
|
|
If you can't repro at the current commit (maybe the build is broken), run
|
|
|
|
``git bisect skip`` and git will pick a nearby alternate commit.
|
|
|
|
|
|
|
|
(To abort a bisect, run ``git bisect reset``, and if git complains about not
|
2020-12-21 20:06:17 +01:00
|
|
|
being able to reset, do the usual ``git checkout -f main; git reset --hard
|
|
|
|
origin/main`` dance and try again).
|
2020-02-04 21:44:09 +01:00
|
|
|
|
|
|
|
``git bisect run``
|
|
|
|
==================
|
|
|
|
|
|
|
|
A single bisect step often requires first building clang, and then compiling
|
|
|
|
a large code base with just-built clang. This can take a long time, so it's
|
|
|
|
good if it can happen completely automatically. ``git bisect run`` can do
|
|
|
|
this for you if you write a run script that reproduces the problem
|
|
|
|
automatically. Writing the script can take 10-20 minutes, but it's almost
|
|
|
|
always worth it -- you can do something else while the bisect runs (such
|
|
|
|
as writing this document).
|
|
|
|
|
|
|
|
Here's an example run script. It assumes that you're in ``llvm-project`` and
|
|
|
|
that you have a sibling ``llvm-build-project`` build directory where you
|
|
|
|
configured CMake to use Ninja. You have a file ``repro.c`` in the current
|
|
|
|
directory that makes clang crash at trunk, but it worked fine at revision
|
|
|
|
``f00ba``.
|
|
|
|
|
|
|
|
.. code-block:: bash
|
|
|
|
|
|
|
|
# Build clang. If the build fails, `exit 125` causes this
|
|
|
|
# revision to be skipped
|
|
|
|
ninja -C ../llvm-build-project clang || exit 125
|
|
|
|
|
|
|
|
../llvm-build-project/bin/clang repro.c
|
|
|
|
|
|
|
|
To make sure your run script works, it's a good idea to run ``./run.sh`` by
|
|
|
|
hand and tweak the script until it works, then run ``git bisect good`` or
|
|
|
|
``git bisect bad`` manually once based on the result of the script
|
|
|
|
(check ``echo $?`` after your script ran), and only then run ``git bisect run
|
|
|
|
./run.sh``. Don't forget to mark your run script as executable -- ``git bisect
|
|
|
|
run`` doesn't check for that, it just assumes the run script failed each time.
|
|
|
|
|
|
|
|
Once your run script works, run ``git bisect run ./run.sh`` and a few hours
|
|
|
|
later you'll know which commit caused the regression.
|
|
|
|
|
|
|
|
(This is a very simple run script. Often, you want to use just-built clang
|
|
|
|
to build a different project and then run a built executable of that project
|
|
|
|
in the run script.)
|
|
|
|
|
|
|
|
Bisecting across multiple roots
|
|
|
|
===============================
|
|
|
|
|
|
|
|
Here's how LLVM's history currently looks:
|
|
|
|
|
2020-02-10 21:18:15 +01:00
|
|
|
.. code-block:: none
|
2020-02-04 21:44:09 +01:00
|
|
|
|
|
|
|
A-o-o-......-o-D-o-o-HEAD
|
|
|
|
/
|
|
|
|
B-o-...-o-C-
|
|
|
|
|
|
|
|
``A`` is the first commit in LLVM ever, ``97724f18c79c``.
|
|
|
|
|
2020-02-10 21:47:46 +01:00
|
|
|
``B`` is the first commit in MLIR, ``aed0d21a62db``.
|
|
|
|
|
|
|
|
``D`` is the merge commit that merged MLIR into the main LLVM repository,
|
|
|
|
``0f0d0ed1c78f``.
|
|
|
|
|
|
|
|
``C`` is the last commit in MLIR before it got merged, ``0f0d0ed1c78f^2``. (The
|
|
|
|
``^n`` modifier selects the n'th parent of a merge commit.)
|
2020-02-04 21:44:09 +01:00
|
|
|
|
|
|
|
``git bisect`` goes through all parent revisions. Due to the way MLIR was
|
|
|
|
merged, at every revision at ``C`` or earlier, *only* the ``mlir/`` directory
|
|
|
|
exists, and nothing else does.
|
|
|
|
|
|
|
|
As of early 2020, there is no flag to ``git bisect`` to tell it to not
|
|
|
|
descend into all reachable commits. Ideally, we'd want to tell it to only
|
|
|
|
follow the first parent of ``D``.
|
|
|
|
|
|
|
|
The best workaround is to pass a list of directories to ``git bisect``:
|
|
|
|
If you know the bug is due to a change in llvm, clang, or compiler-rt, use
|
2020-02-10 21:47:46 +01:00
|
|
|
|
|
|
|
.. code-block:: bash
|
|
|
|
|
|
|
|
git bisect start -- clang llvm compiler-rt
|
|
|
|
|
|
|
|
That way, the commits in ``mlir`` are never evaluated.
|
2020-02-04 21:44:09 +01:00
|
|
|
|
|
|
|
Alternatively, ``git bisect skip aed0d21a6 aed0d21a6..0f0d0ed1c78f`` explicitly
|
|
|
|
skips all commits on that branch. It takes 1.5 minutes to run on a fast
|
|
|
|
machine, and makes ``git bisect log`` output unreadable. (``aed0d21a6`` is
|
|
|
|
listed twice because git ranges exclude the revision listed on the left,
|
|
|
|
so it needs to be ignored explicitly.)
|
|
|
|
|
|
|
|
More Resources
|
|
|
|
==============
|
|
|
|
|
|
|
|
https://git-scm.com/book/en/v2/Git-Tools-Revision-Selection
|