Add docs+a script for building clang/LLVM with PGO
Depending on who you ask, PGO grants a 15%-25% improvement in build
times when using clang. Sadly, hooking everything up properly to
generate a profile and apply it to clang isn't always straightforward.
This script (and the accompanying docs) aim to make this process easier;
ideally, a single invocation of the given script.
In terms of testing, I've got a cronjob on my Debian box that's meant to
run this a few times per week, and I tried manually running it on a puny
Gentoo box I have (four whole Atom cores!). Nothing obviously broke.
¯\_(ツ)_/¯
I don't know if we have a Python style guide, so I just shoved this
through yapf with all the defaults on.
Finally, though the focus is clang at the moment, the hope is that this
is easily applicable to other LLVM-y tools with minimal effort (e.g.
lld, opt, ...). Hence, this lives in llvm/utils and tries to be somewhat
ambiguous about naming.
Differential Revision: https://reviews.llvm.org/D53598
llvm-svn: 345427
2018-10-26 22:56:03 +02:00

=============================================================
How To Build Clang and LLVM with Profile-Guided Optimizations
=============================================================

Introduction
============

PGO (Profile-Guided Optimization) allows your compiler to better optimize code
for how it actually runs. Users report that applying this to Clang and LLVM can
decrease overall compile time by 20%.

This guide walks you through how to build Clang with PGO, though it also applies
to other subprojects, such as LLD.


Using the script
================

We have a script at ``utils/collect_and_build_with_pgo.py``. This script is
tested on a few Linux flavors, and requires a checkout of LLVM, Clang, and
compiler-rt. Despite the name, it performs four clean builds of Clang, so it
can take a while to run to completion. Please see the script's ``--help`` for
more information on how to run it, and the different options available to you.
If you want to get the most out of PGO for a particular use-case (e.g. compiling
a specific large piece of software), please do read the section below on
'benchmark' selection.

Please note that this script is only tested on a few Linux distros. Patches to
add support for other platforms, as always, are highly appreciated. :)

This script also supports a ``--dry-run`` option, which causes it to print
important commands instead of running them.
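
The following is a minimal Python sketch of a helper that assembles the command line for a dry run. It uses Python to match the script itself. It assumes ``llvm_dir`` is the root of your LLVM checkout, i.e. the directory that contains ``utils/``. The function name is hypothetical. ``extra_args`` stands in for whatever options ``--help`` says you need; this sketch deliberately does not guess them.

.. code-block:: python

   import sys
   from pathlib import Path


   def pgo_script_command(llvm_dir, extra_args=(), dry_run=True):
       """Return an argv list for running the PGO script (hypothetical helper)."""
       script = Path(llvm_dir) / "utils" / "collect_and_build_with_pgo.py"
       # Run the script with the same interpreter that runs this helper.
       cmd = [sys.executable, str(script)]
       if dry_run:
           # Ask the script to print the important commands instead of running them.
           cmd.append("--dry-run")
       cmd.extend(extra_args)
       return cmd

Passing the result to ``subprocess.run(..., check=True)`` would then preview the
build. Once the printed commands look right, you could call the helper again with
``dry_run=False``.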


Selecting 'benchmarks'
======================

PGO does best when the profiles gathered represent how the user plans to use the
compiler. Notably, highly accurate profiles of llc building x86_64 code aren't
incredibly helpful if you're going to be targeting ARM.

By default, the script above does two things to get solid coverage. It:

- runs all of Clang and LLVM's lit tests, and
- uses the instrumented Clang to build Clang, LLVM, and all of the other
  LLVM subprojects available to it.

Together, these should give you:

- solid coverage of building C++,
- good coverage of building C,
- great coverage of running optimizations,
- great coverage of the backend for your host's architecture, and
- some coverage of other architectures (if other arches are supported backends).

Altogether, this should cover a diverse set of uses for Clang and LLVM. If you
have very specific needs (e.g. your compiler is meant to compile a large browser
for four different platforms, or similar), you may want to do something else.
This is configurable in the script itself.
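
If you do go your own way, one way to think about a "benchmark" is as a list of commands run with the instrumented compiler. The Python sketch below is hypothetical throughout, and it is not how the script is configured. Its assumptions are: ``BenchmarkTask`` and the helper names are made up; the instrumented tools live in ``<dir>/bin``; and ``CC``/``CXX`` select the compiler for your own project. It starts from the default workload described above and appends a build of a project you care about.

.. code-block:: python

   import os
   import subprocess
   from dataclasses import dataclass, field
   from typing import Dict, List


   @dataclass
   class BenchmarkTask:
       """A single command to run while collecting profiles (hypothetical type)."""
       cwd: str
       argv: List[str]
       env: Dict[str, str] = field(default_factory=dict)


   def default_tasks(instrumented_build, benchmark_build):
       """Approximate the default workload: lit tests, then a full build."""
       return [
           # Clang's and LLVM's lit tests, run from the instrumented build dir.
           BenchmarkTask(instrumented_build, ["ninja", "check-clang", "check-llvm"]),
           # A full build in a directory configured to use the instrumented
           # clang/clang++ as its compilers.
           BenchmarkTask(benchmark_build, ["ninja", "all"]),
       ]


   def with_custom_task(tasks, project_dir, instrumented_build, argv):
       """Return a new list that also builds your own project."""
       bin_dir = os.path.join(instrumented_build, "bin")
       env = {"CC": os.path.join(bin_dir, "clang"),
              "CXX": os.path.join(bin_dir, "clang++")}
       return tasks + [BenchmarkTask(project_dir, list(argv), env)]


   def run_tasks(tasks, dry_run=True):
       """Print each task, or run it with its extra environment variables."""
       for task in tasks:
           if dry_run:
               print(f"(cd {task.cwd} && {' '.join(task.argv)})")
               continue
           env = {**os.environ, **task.env}
           subprocess.run(task.argv, cwd=task.cwd, env=env, check=True)

Whether ``CC``/``CXX`` are honored depends on your project's build system, so
check how it picks its compiler before relying on this.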


Building Clang with PGO
=======================

If you prefer not to use the script, this section briefly goes over how to build
Clang/LLVM with PGO.

First, you should have at least LLVM, Clang, and compiler-rt checked out
locally.

Next, at a high level, you're going to need to do the following:

1. Build a standard Release Clang and the relevant libclang_rt.profile library.
2. Build Clang using the Clang you built above, but with instrumentation.
3. Use the instrumented Clang to generate profiles, which consists of two steps:

   - Running the instrumented Clang/LLVM/lld/etc. on tasks that represent how
     users will use said tools.
   - Using a tool to convert the "raw" profiles generated above into a single,
     final PGO profile.

4. Build a final release Clang (along with whatever other binaries you need)
   using the profile collected from your benchmark.

In more detailed steps:

1. Configure a Clang build as you normally would. It's highly recommended that
   you use the Release configuration for this, since it will be used to build
   another Clang. Because you need Clang and supporting libraries, you'll want
   to build the ``all`` target (e.g. ``ninja all`` or ``make -j4 all``).

2. Configure a Clang build as above, but add the following CMake args:

   - ``-DLLVM_BUILD_INSTRUMENTED=IR`` -- This causes us to build everything
     with instrumentation.
   - ``-DLLVM_BUILD_RUNTIME=No`` -- A few projects have bad interactions when
     built with profiling, and aren't necessary to build. This flag turns them
     off.
   - ``-DCMAKE_C_COMPILER=/path/to/stage1/clang`` - Use the Clang we built in
     step 1.
   - ``-DCMAKE_CXX_COMPILER=/path/to/stage1/clang++`` - Same as above.

   In this build directory, you simply need to build the ``clang`` target (and
   whatever supporting tooling your benchmark requires).

3. As mentioned above, this has two steps: gathering profile data, and then
   massaging it into a useful form:

   a. Build your benchmark using the Clang generated in step 2. The recommended
      'standard' benchmark is to run ``check-clang`` and ``check-llvm`` in your
      instrumented Clang's build directory, and to do a full build of Clang/LLVM
      using your instrumented Clang. So, create yet another build directory,
      with the following CMake arguments:

      - ``-DCMAKE_C_COMPILER=/path/to/stage2/clang`` - Use the Clang we built in
        step 2.
      - ``-DCMAKE_CXX_COMPILER=/path/to/stage2/clang++`` - Same as above.

      If your users are fans of debug info, you may want to consider using
      ``-DCMAKE_BUILD_TYPE=RelWithDebInfo`` instead of
      ``-DCMAKE_BUILD_TYPE=Release``. This will grant better coverage of the
      debug-info-related parts of Clang, but will take longer to complete and
      will result in a much larger build directory.

      It's recommended to build the ``all`` target with your instrumented Clang,
      since more coverage is often better.

   b. You should now have a few ``*.profraw`` files in
      ``path/to/stage2/profiles/``. You need to merge these using
      ``llvm-profdata`` (even if you only have one! The profile merge transforms
      profraw into actual profile data, as well). This can be done with
      ``/path/to/stage1/llvm-profdata merge
      -output=/path/to/output/profdata.prof path/to/stage2/profiles/*.profraw``.

4. Now, build your final, PGO-optimized Clang. To do this, you'll want to pass
   the following additional arguments to CMake:

   - ``-DLLVM_PROFDATA_FILE=/path/to/output/profdata.prof`` - Use the PGO
     profile from the previous step.
   - ``-DCMAKE_C_COMPILER=/path/to/stage1/clang`` - Use the Clang we built in
     step 1.
   - ``-DCMAKE_CXX_COMPILER=/path/to/stage1/clang++`` - Same as above.

   From here, you can build whatever targets you need.

   .. note::

      You may see warnings about a mismatched profile in the build output. These
      are generally harmless. To silence them, you can add
      ``-DCMAKE_C_FLAGS='-Wno-backend-plugin'
      -DCMAKE_CXX_FLAGS='-Wno-backend-plugin'`` to your CMake invocation.
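
To keep the configurations straight, the CMake arguments from the steps above can be assembled in one place. The following Python sketch is not the implementation of ``utils/collect_and_build_with_pgo.py``, and the helper names are hypothetical. It makes three assumptions: the Ninja generator is used; each stage puts its tools in ``<build dir>/bin``, so ``llvm-profdata`` is looked up there; and the raw profiles end up in ``<stage 2 build dir>/profiles``. It only builds command lines, so you can print and review them before running anything.

.. code-block:: python

   import glob
   import os


   def configure_cmd(llvm_src, extra_args, build_type="Release"):
       """CMake configure command; run it from inside the new build directory."""
       return ["cmake", "-G", "Ninja", f"-DCMAKE_BUILD_TYPE={build_type}",
               *extra_args, llvm_src]


   def compilers_from(stage_dir):
       """Use the clang/clang++ produced by an earlier stage."""
       bin_dir = os.path.join(stage_dir, "bin")
       return [f"-DCMAKE_C_COMPILER={os.path.join(bin_dir, 'clang')}",
               f"-DCMAKE_CXX_COMPILER={os.path.join(bin_dir, 'clang++')}"]


   def stage2_args(stage1):
       """Step 2: instrumented build, compiled by the stage 1 Clang."""
       return ["-DLLVM_BUILD_INSTRUMENTED=IR", "-DLLVM_BUILD_RUNTIME=No",
               *compilers_from(stage1)]


   def benchmark_args(stage2):
       """Step 3a: a throwaway build compiled by the instrumented Clang."""
       return compilers_from(stage2)


   def merge_cmd(stage1, stage2, output):
       """Step 3b: merge all raw profiles (needed even if there is only one)."""
       raw = sorted(glob.glob(os.path.join(stage2, "profiles", "*.profraw")))
       if not raw:
           raise FileNotFoundError("no *.profraw files found; did step 3a run?")
       profdata = os.path.join(stage1, "bin", "llvm-profdata")
       return [profdata, "merge", f"-output={output}", *raw]


   def final_args(stage1, profile):
       """Step 4: optimized build using the merged profile."""
       return [f"-DLLVM_PROFDATA_FILE={profile}", *compilers_from(stage1)]

For instance, ``configure_cmd("/path/to/llvm", stage2_args("/path/to/stage1"))``
would give the step 2 configuration. Passing ``build_type="RelWithDebInfo"``
together with ``benchmark_args(...)`` would cover the debug-info variant of step 3a.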


Congrats! You now have a Clang built with profile-guided optimizations, and you
can delete all but the final build directory if you'd like.

If this worked well for you and you plan on doing it often, there's a slight
optimization that can be made: LLVM and Clang each have a TableGen tool
(``llvm-tblgen`` and ``clang-tblgen``) that's built and run during the build
process. While it's potentially nice to build these for coverage as part of
step 3, none of your other builds should benefit from building them. You can
pass the CMake options ``-DCLANG_TABLEGEN=/path/to/stage1/bin/clang-tblgen
-DLLVM_TABLEGEN=/path/to/stage1/bin/llvm-tblgen`` to step 2 and every build
after it to avoid these needless rebuilds.
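
The following is a minimal Python sketch of that tweak. The helper name is hypothetical, and it assumes the same ``/path/to/stage1/bin`` layout as the options above. It appends the two options to the CMake arguments of a build from step 2 onward:

.. code-block:: python

   import os


   def reuse_stage1_tablegen(cmake_args, stage1):
       """Return cmake_args plus options pointing at stage 1's TableGen tools."""
       bin_dir = os.path.join(stage1, "bin")
       return [*cmake_args,
               f"-DCLANG_TABLEGEN={os.path.join(bin_dir, 'clang-tblgen')}",
               f"-DLLVM_TABLEGEN={os.path.join(bin_dir, 'llvm-tblgen')}"]


   # Example: extend the final (step 4) configuration from this document.
   final = reuse_stage1_tablegen(
       ["-DLLVM_PROFDATA_FILE=/path/to/output/profdata.prof"], "/path/to/stage1")
   print(" ".join(final))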