mirror of https://github.com/RPCS3/llvm-mirror.git synced 2024-11-22 10:42:39 +01:00

[docs][scudo] Update Scudo documentation

Update the Scudo document to align with the standalone version.
Add some more verbiage about the various components of the
allocator, and rework things a bit.
The build instructions have been updated.
The options and their default values have been updated, and
the `mallopt` ones have been added.

Differential Revision: https://reviews.llvm.org/D100230
This commit is contained in:
Kostya Kortchinsky 2021-04-09 14:15:13 -07:00
parent d7da6c0d43
commit 6d9d112a9c


@ -4,100 +4,137 @@ Scudo Hardened Allocator
.. contents::
:local:
:depth: 1
:depth: 2
Introduction
============
The Scudo Hardened Allocator is a user-mode allocator based on LLVM Sanitizer's
CombinedAllocator, which aims at providing additional mitigations against heap
based vulnerabilities, while maintaining good performance.
The Scudo Hardened Allocator is a user-mode allocator, originally based on LLVM
Sanitizers'
`CombinedAllocator <https://github.com/llvm/llvm-project/blob/main/compiler-rt/lib/sanitizer_common/sanitizer_allocator_combined.h>`_.
It aims at providing additional mitigation against heap based vulnerabilities,
while maintaining good performance. Scudo is currently the default allocator in
`Fuchsia <https://fuchsia.dev/>`_, and in `Android <https://www.android.com/>`_
since Android 11.
Currently, the allocator supports (was tested on) the following architectures:
- i386 (& i686) (32-bit);
- x86_64 (64-bit);
- armhf (32-bit);
- AArch64 (64-bit);
- MIPS (32-bit & 64-bit).
The name "Scudo" has been retained from the initial implementation (Escudo
meaning Shield in Spanish and Portuguese).
The name "Scudo" comes from the Italian word for
`shield <https://www.collinsdictionary.com/dictionary/italian-english/scudo>`_
(and Escudo in Spanish).
Design
======
Allocator
---------
Scudo can be considered a Frontend to the Sanitizers' common allocator (later
referenced as the Backend). It is split between a Primary allocator, fast and
efficient, that services smaller allocation sizes, and a Secondary allocator
that services larger allocation sizes and is backed by the operating system
memory mapping primitives.
Scudo was designed with security in mind, but aims at striking a good balance
between security and performance. It is highly tunable and configurable.
between security and performance. It was designed to be highly tunable and
configurable, and while we provide some default configurations, we encourage
consumers to come up with the parameters that will work best for their use
cases.
Chunk Header
------------
Every chunk of heap memory will be preceded by a chunk header. This has two
purposes, the first one being to store various information about the chunk,
the second one being to detect potential heap overflows. In order to achieve
this, the header will be checksummed, involving the pointer to the chunk itself
and a global secret. Any corruption of the header will be detected when said
header is accessed, and the process terminated.
The allocator combines several components that serve distinct purposes:
- the Primary allocator: fast and efficient, it services smaller allocation
sizes by carving reserved memory regions into blocks of identical size. There
are currently two Primary allocators implemented, specific to 32 and 64 bit
architectures. It is configurable via compile time options.
- the Secondary allocator: slower, it services larger allocation sizes via the
memory mapping primitives of the underlying operating system. Secondary backed
allocations are surrounded by Guard Pages. It is also configurable via compile
time options.
- the thread specific data Registry: defines how local caches operate for each
thread. There are currently two models implemented: the exclusive model where
each thread holds its own caches (using the ELF TLS); or the shared model
where threads share a fixed size pool of caches.
- the Quarantine: offers a way to delay the deallocation operations, preventing
blocks from being immediately available for reuse. Blocks held will be recycled
once certain size criteria are reached. This is essentially a delayed freelist
which can help mitigate some use-after-free situations. This feature is fairly
costly in terms of performance and memory footprint, is mostly controlled by
runtime options and is disabled by default.
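The delayed freelist idea behind the Quarantine can be sketched as follows. This is a minimal illustration and not Scudo's implementation: the ``ToyQuarantine`` class, its byte budget, and the use of ``std::free`` as the "recycle" step are all assumptions made for the example.

```cpp
#include <cstddef>
#include <cstdlib>
#include <deque>
#include <utility>

// Hypothetical toy quarantine: freed blocks are held in FIFO order and only
// handed back (here, via std::free) once the held bytes exceed a budget.
class ToyQuarantine {
public:
  explicit ToyQuarantine(std::size_t MaxBytes) : MaxBytes(MaxBytes) {}
  ToyQuarantine(const ToyQuarantine &) = delete;
  ToyQuarantine &operator=(const ToyQuarantine &) = delete;

  // Recycle everything that is still held when the quarantine goes away.
  ~ToyQuarantine() {
    for (auto &Entry : Held)
      std::free(Entry.first);
  }

  // Quarantines a block instead of freeing it right away. Returns the number
  // of older blocks that were recycled to get back under the budget.
  std::size_t put(void *Ptr, std::size_t Size) {
    Held.emplace_back(Ptr, Size);
    HeldBytes += Size;
    std::size_t Recycled = 0;
    while (HeldBytes > MaxBytes) {
      std::pair<void *, std::size_t> Oldest = Held.front();
      Held.pop_front();
      HeldBytes -= Oldest.second;
      std::free(Oldest.first); // the block only becomes reusable here
      ++Recycled;
    }
    return Recycled;
  }

  std::size_t heldBytes() const { return HeldBytes; }

private:
  std::size_t MaxBytes;
  std::size_t HeldBytes = 0;
  std::deque<std::pair<void *, std::size_t>> Held;
};
```

A caller would route deallocations through ``put`` rather than freeing directly. A larger budget keeps freed blocks out of circulation for longer, at the cost of memory footprint, which is the trade-off described in the last bullet above.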
Allocations Header
------------------
Every chunk of heap memory returned to an application by the allocator will be
preceded by a header. This has two purposes:
- to store various information about the chunk, which can be leveraged to
ensure consistency of the heap operations;
- to detect potential corruption. For this purpose, the header is
checksummed and corruption of the header will be detected when said header is
accessed (note that if the corrupted header is not accessed, the corruption
will remain undetected).
The following information is stored in the header:
- the 16-bit checksum;
- the class ID for that chunk, which is the "bucket" where the chunk resides
for Primary backed allocations, or 0 for Secondary backed allocations;
- the size (Primary) or unused bytes amount (Secondary) for that chunk, which is
necessary for computing the size of the chunk;
- the class ID for that chunk, which identifies the region where the chunk
resides for Primary backed allocations, or 0 for Secondary backed allocations;
- the state of the chunk (available, allocated or quarantined);
- the allocation type (malloc, new, new[] or memalign), to detect potential
mismatches in the allocation APIs used;
- the size (Primary) or unused bytes amount (Secondary) for that chunk, which is
necessary for reallocation or sized-deallocation operations;
- the offset of the chunk, which is the distance in bytes from the beginning of
the returned chunk to the beginning of the Backend allocation;
the returned chunk to the beginning of the backend allocation (the "block");
This header fits within 8 bytes, on all platforms supported.
- the 16-bit checksum;
The checksum is computed as a CRC32 (made faster with hardware support)
This header fits within 8 bytes on all platforms supported, and contributes to a
small overhead for each allocation.
The checksum is computed using a CRC32 (made faster with hardware support)
of the global secret, the chunk pointer itself, and the 8 bytes of header with
the checksum field zeroed out. It is not intended to be cryptographically
strong.
strong.
The header is atomically loaded and stored to prevent races. This is important
as two consecutive chunks could belong to different threads. We also want to
avoid any type of double fetches of information located in the header, and use
local copies of the header for this purpose.
Delayed Freelist
-----------------
A delayed freelist allows us to not return a chunk directly to the Backend, but
to keep it aside for a while. Once a criterion is met, the delayed freelist is
emptied, and the quarantined chunks are returned to the Backend. This helps
mitigate use-after-free vulnerabilities by reducing the determinism of the
allocation and deallocation patterns.
This feature is using the Sanitizer's Quarantine as its base, and the amount of
memory that it can hold is configurable by the user (see the Options section
below).
as two consecutive chunks could belong to different threads. We work on local
copies and use compare-exchange primitives to update the headers in the heap
memory, and avoid any type of double-fetching.
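The header handling described in this section can be sketched as follows. This is an illustration under stated assumptions, not Scudo's code: the field widths of ``UnpackedHeader``, the software CRC32 loop, the XOR fold to 16 bits, and all function names are hypothetical choices made for the example.

```cpp
#include <atomic>
#include <cstdint>
#include <cstring>

// Hypothetical 8-byte header; the field widths are illustrative only.
struct UnpackedHeader {
  uint64_t ClassId : 8;       // 0 would mean a Secondary backed allocation
  uint64_t State : 2;         // 0 = available, 1 = allocated, 2 = quarantined
  uint64_t Origin : 2;        // malloc, new, new[] or memalign
  uint64_t SizeOrUnused : 20; // size (Primary) or unused bytes (Secondary)
  uint64_t Offset : 16;
  uint64_t Checksum : 16;
};
static_assert(sizeof(UnpackedHeader) == 8, "header must fit in 8 bytes");

inline uint64_t pack(const UnpackedHeader &H) {
  uint64_t Packed;
  std::memcpy(&Packed, &H, sizeof(Packed));
  return Packed;
}

inline UnpackedHeader unpack(uint64_t Packed) {
  UnpackedHeader H;
  std::memcpy(&H, &Packed, sizeof(H));
  return H;
}

// Bitwise software CRC32 step over 8 bytes (no hardware support assumed).
inline uint32_t crc32Update(uint32_t Crc, uint64_t Data) {
  for (int I = 0; I < 8; ++I) {
    Crc ^= static_cast<uint8_t>(Data >> (8 * I));
    for (int Bit = 0; Bit < 8; ++Bit)
      Crc = (Crc >> 1) ^ (0xEDB88320u & (0u - (Crc & 1u)));
  }
  return Crc;
}

// Checksum over the secret, the chunk pointer and the header with its
// checksum field zeroed out. H is taken by value, so this is a local copy.
inline uint16_t computeChecksum(uint32_t Secret, uintptr_t Ptr,
                                UnpackedHeader H) {
  H.Checksum = 0;
  uint32_t Crc = crc32Update(Secret, static_cast<uint64_t>(Ptr));
  Crc = crc32Update(Crc, pack(H));
  return static_cast<uint16_t>(Crc ^ (Crc >> 16)); // illustrative fold
}

inline void storeHeader(std::atomic<uint64_t> &Slot, uint32_t Secret,
                        uintptr_t Ptr, UnpackedHeader H) {
  H.Checksum = computeChecksum(Secret, Ptr, H);
  Slot.store(pack(H), std::memory_order_relaxed);
}

// Single atomic load, then validation of the local copy (no double fetch).
inline bool loadHeader(const std::atomic<uint64_t> &Slot, uint32_t Secret,
                       uintptr_t Ptr, UnpackedHeader &Out) {
  Out = unpack(Slot.load(std::memory_order_relaxed));
  return Out.Checksum == computeChecksum(Secret, Ptr, Out);
}

// Succeeds only if the header in memory is still exactly Old.
inline bool compareExchangeHeader(std::atomic<uint64_t> &Slot, uint32_t Secret,
                                  uintptr_t Ptr, UnpackedHeader New,
                                  const UnpackedHeader &Old) {
  New.Checksum = computeChecksum(Secret, Ptr, New);
  uint64_t Expected = pack(Old);
  return Slot.compare_exchange_strong(Expected, pack(New),
                                      std::memory_order_relaxed);
}
```

In this sketch, a deallocation would load the header once, check it, and then use ``compareExchangeHeader`` to flip the state. A second, racing deallocation of the same chunk would still hold the old header value and its compare-exchange would fail, which is one way such races can be surfaced.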
Randomness
----------
It is important for the allocator to not make use of fixed addresses. We use
the dynamic base option for the SizeClassAllocator, allowing us to benefit
from the randomness of the system memory mapping functions.
Randomness is a critical factor to the additional security provided by the
allocator. The allocator trusts the memory mapping primitives of the OS to
provide pages at (mostly) non-predictable locations in memory, as well as the
binaries to be compiled with ASLR. In the event one of those assumptions is
incorrect, the security will be greatly reduced. Scudo further randomizes how
blocks are allocated in the Primary, and can randomize how caches are assigned to
threads.
Memory reclaiming
-----------------
Primary and Secondary allocators have different behaviors with regard to
reclaiming. While Secondary mapped allocations can be unmapped on deallocation,
it isn't the case for the Primary, which could lead to a steady growth of the
RSS of a process. To counteract this, if the underlying OS allows it, pages
that are covered by contiguous free memory blocks in the Primary can be
released (this generally means they won't count towards the RSS of a process and
will be zero filled on subsequent accesses). This is done in the deallocation path,
and several options exist to tune this behavior.
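The "pages covered by contiguous free blocks" condition can be sketched as follows. This is a simplified model and not Scudo's release logic: the ``releasablePages`` function, the per-block free bitmap, and the assumption that the region starts page-aligned are all hypothetical.

```cpp
#include <cstddef>
#include <vector>

// Returns the indices of the pages that are entirely covered by free blocks.
// IsFree[i] says whether block i of a region carved into equally sized blocks
// is free. The region is assumed to start on a page boundary.
std::vector<std::size_t> releasablePages(const std::vector<bool> &IsFree,
                                         std::size_t BlockSize,
                                         std::size_t PageSize) {
  std::vector<std::size_t> Pages;
  if (BlockSize == 0 || PageSize == 0)
    return Pages;
  const std::size_t RegionSize = IsFree.size() * BlockSize;
  // Only pages that lie completely inside the region are considered.
  for (std::size_t Page = 0; (Page + 1) * PageSize <= RegionSize; ++Page) {
    const std::size_t FirstBlock = (Page * PageSize) / BlockSize;
    const std::size_t LastBlock = ((Page + 1) * PageSize - 1) / BlockSize;
    bool AllFree = true;
    for (std::size_t B = FirstBlock; B <= LastBlock; ++B) {
      if (!IsFree[B]) {
        AllFree = false;
        break;
      }
    }
    if (AllFree)
      Pages.push_back(Page);
  }
  return Pages;
}
```

A single in-use block that overlaps a page is enough to keep that page resident, which is why block sizes that do not divide the page size make reclaiming less effective in this model.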
Usage
=====
Platform
--------
If using Fuchsia or Android 11 or later, your memory allocations
are already serviced by Scudo (note that Android Svelte configurations still use
jemalloc).
Library
-------
The allocator static library can be built from the LLVM build tree thanks to
the ``scudo`` CMake rule. The associated tests can be exercised thanks to the
``check-scudo`` CMake rule.
The allocator static library can be built from the LLVM tree thanks to the
``scudo_standalone`` CMake rule. The associated tests can be exercised thanks to
the ``check-scudo_standalone`` CMake rule.
Linking the static library to your project can require the use of the
``whole-archive`` linker flag (or equivalent), depending on your linker.
@ -106,28 +143,32 @@ Additional flags might also be necessary.
Your linked binary should now make use of the Scudo allocation and deallocation
functions.
You may also build Scudo like this:
You may also build Scudo like this:
.. code:: console
cd $LLVM/projects/compiler-rt/lib
clang++ -fPIC -std=c++11 -msse4.2 -O2 -I. scudo/*.cpp \
$(\ls sanitizer_common/*.{cc,S} | grep -v "sanitizer_termination\|sanitizer_common_nolibc\|sancov_\|sanitizer_unwind\|sanitizer_symbol") \
-shared -o libscudo.so -pthread
cd $LLVM/compiler-rt/lib
clang++ -fPIC -std=c++17 -msse4.2 -O2 -pthread -shared \
-I scudo/standalone/include \
scudo/standalone/*.cpp \
-o $HOME/libscudo.so
and then use it with existing binaries as follows:
.. code:: console
LD_PRELOAD=`pwd`/libscudo.so ./a.out
LD_PRELOAD=$HOME/libscudo.so ./a.out
Clang
-----
With a recent version of Clang (post rL317337), the allocator can be linked with
a binary at compilation using the ``-fsanitize=scudo`` command-line argument, if
the target platform is supported. Currently, the only other Sanitizer Scudo is
compatible with is UBSan (eg: ``-fsanitize=scudo,undefined``). Compiling with
Scudo will also enforce PIE for the output binary.
With a recent version of Clang (post rL317337), the "old" version of the
allocator can be linked with a binary at compilation using the
``-fsanitize=scudo`` command-line argument, if the target platform is supported.
Currently, the only other sanitizer Scudo is compatible with is UBSan
(eg: ``-fsanitize=scudo,undefined``). Compiling with Scudo will also enforce
PIE for the output binary.
We will transition this to the standalone Scudo version in the future.
Options
-------
@ -146,61 +187,104 @@ through the following ways:
to be parsed. Options defined this way will override any definition made
through ``__scudo_default_options``.
The options string follows a syntax similar to ASan, where distinct options
can be assigned in the same string, separated by colons.
- via the standard ``mallopt`` `API <https://man7.org/linux/man-pages/man3/mallopt.3.html>`_,
using parameters that are Scudo specific.
When dealing with the options string, it follows a syntax similar to ASan, where
distinct options can be assigned in the same string, separated by colons.
For example, using the environment variable:
.. code:: console
SCUDO_OPTIONS="DeleteSizeMismatch=1:QuarantineSizeKb=64" ./a.out
SCUDO_OPTIONS="delete_size_mismatch=false:release_to_os_interval_ms=-1" ./a.out
Or using the function:
.. code:: cpp
extern "C" const char *__scudo_default_options() {
return "DeleteSizeMismatch=1:QuarantineSizeKb=64";
return "delete_size_mismatch=false:release_to_os_interval_ms=-1";
}
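To make the colon-separated syntax concrete, here is a small sketch that splits such a string into key/value pairs. It is only an illustration of the format shown above, not the parser used by Scudo: the ``parseOptions`` name is hypothetical, and other separators or quoting rules that the real flag parser may accept are not handled.

```cpp
#include <cstddef>
#include <map>
#include <string>

// Splits "key1=value1:key2=value2" into a map. Items without a key or
// without '=' are skipped; a later duplicate key overwrites an earlier one.
std::map<std::string, std::string> parseOptions(const std::string &Options) {
  std::map<std::string, std::string> Parsed;
  std::size_t Begin = 0;
  while (Begin <= Options.size()) {
    std::size_t End = Options.find(':', Begin);
    if (End == std::string::npos)
      End = Options.size();
    const std::string Item = Options.substr(Begin, End - Begin);
    const std::size_t Eq = Item.find('=');
    if (Eq != std::string::npos && Eq > 0)
      Parsed[Item.substr(0, Eq)] = Item.substr(Eq + 1);
    Begin = End + 1;
  }
  return Parsed;
}
```

For the string used in the examples above, this yields two entries: ``delete_size_mismatch`` mapped to ``false`` and ``release_to_os_interval_ms`` mapped to ``-1``.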
The following options are available:
The following "string" options are available:
+-----------------------------+----------------+----------------+------------------------------------------------+
| Option | 64-bit default | 32-bit default | Description |
+-----------------------------+----------------+----------------+------------------------------------------------+
| QuarantineSizeKb | 256 | 64 | The size (in Kb) of quarantine used to delay |
| | | | the actual deallocation of chunks. Lower value |
| | | | may reduce memory usage but decrease the |
| | | | effectiveness of the mitigation; a negative |
| | | | value will fallback to the defaults. Setting |
| | | | *both* this and ThreadLocalQuarantineSizeKb to |
| | | | zero will disable the quarantine entirely. |
+-----------------------------+----------------+----------------+------------------------------------------------+
| QuarantineChunksUpToSize | 2048 | 512 | Size (in bytes) up to which chunks can be |
| | | | quarantined. |
+-----------------------------+----------------+----------------+------------------------------------------------+
| ThreadLocalQuarantineSizeKb | 1024 | 256 | The size (in Kb) of per-thread cache use to |
| | | | offload the global quarantine. Lower value may |
| | | | reduce memory usage but might increase |
| | | | contention on the global quarantine. Setting |
| | | | *both* this and QuarantineSizeKb to zero will |
| | | | disable the quarantine entirely. |
+-----------------------------+----------------+----------------+------------------------------------------------+
| DeallocationTypeMismatch | true | true | Whether or not we report errors on |
| | | | malloc/delete, new/free, new/delete[], etc. |
+-----------------------------+----------------+----------------+------------------------------------------------+
| DeleteSizeMismatch | true | true | Whether or not we report errors on mismatch |
| | | | between sizes of new and delete. |
+-----------------------------+----------------+----------------+------------------------------------------------+
| ZeroContents | false | false | Whether or not we zero chunk contents on |
| | | | allocation and deallocation. |
+-----------------------------+----------------+----------------+------------------------------------------------+
+---------------------------------+----------------+----------------+-------------------------------------------------+
| Option | 64-bit default | 32-bit default | Description |
+---------------------------------+----------------+----------------+-------------------------------------------------+
| quarantine_size_kb | 0 | 0 | The size (in Kb) of quarantine used to delay |
| | | | the actual deallocation of chunks. Lower value |
| | | | may reduce memory usage but decrease the |
| | | | effectiveness of the mitigation; a negative |
|                                 |                |                | value will fall back to the defaults. Setting   |
| | | | *both* this and thread_local_quarantine_size_kb |
| | | | to zero will disable the quarantine entirely. |
+---------------------------------+----------------+----------------+-------------------------------------------------+
| quarantine_max_chunk_size | 0 | 0 | Size (in bytes) up to which chunks can be |
| | | | quarantined. |
+---------------------------------+----------------+----------------+-------------------------------------------------+
| thread_local_quarantine_size_kb | 0              | 0              | The size (in Kb) of per-thread cache used to    |
| | | | offload the global quarantine. Lower value may |
| | | | reduce memory usage but might increase |
| | | | contention on the global quarantine. Setting |
| | | | *both* this and quarantine_size_kb to zero will |
| | | | disable the quarantine entirely. |
+---------------------------------+----------------+----------------+-------------------------------------------------+
| dealloc_type_mismatch | false | false | Whether or not we report errors on |
| | | | malloc/delete, new/free, new/delete[], etc. |
+---------------------------------+----------------+----------------+-------------------------------------------------+
| delete_size_mismatch | true | true | Whether or not we report errors on mismatch |
| | | | between sizes of new and delete. |
+---------------------------------+----------------+----------------+-------------------------------------------------+
| zero_contents | false | false | Whether or not we zero chunk contents on |
| | | | allocation. |
+---------------------------------+----------------+----------------+-------------------------------------------------+
| pattern_fill_contents | false | false | Whether or not we fill chunk contents with a |
| | | | byte pattern on allocation. |
+---------------------------------+----------------+----------------+-------------------------------------------------+
| may_return_null | true | true | Whether or not a non-fatal failure can return a |
| | | | NULL pointer (as opposed to terminating). |
+---------------------------------+----------------+----------------+-------------------------------------------------+
| release_to_os_interval_ms | 5000 | 5000 | The minimum interval (in ms) at which a release |
| | | | can be attempted (a negative value disables |
| | | | reclaiming). |
+---------------------------------+----------------+----------------+-------------------------------------------------+
Allocator related common Sanitizer options can also be passed through Scudo
options, such as ``allocator_may_return_null`` or ``abort_on_error``. A detailed
list including those can be found here:
https://github.com/google/sanitizers/wiki/SanitizerCommonFlags.
Additional flags can be specified, for example if Scudo is compiled with
`GWP-ASan <https://llvm.org/docs/GwpAsan.html>`_ support.
The following "mallopt" options are available (options are defined in
``include/scudo/interface.h``):
+---------------------------+-------------------------------------------------------+
| Option | Description |
+---------------------------+-------------------------------------------------------+
| M_DECAY_TIME | Sets the release interval option to the specified |
| | value (Android only allows 0 or 1 to respectively set |
|                           | the interval to the minimum and maximum value as      |
| | specified at compile time). |
+---------------------------+-------------------------------------------------------+
| M_PURGE | Forces immediate memory reclaiming (value is unused). |
+---------------------------+-------------------------------------------------------+
| M_MEMTAG_TUNING | Tunes the allocator's choice of memory tags to make |
| | it more likely that a certain class of memory errors |
| | will be detected. The value argument should be one of |
| | the enumerators of ``scudo_memtag_tuning``. |
+---------------------------+-------------------------------------------------------+
| M_THREAD_DISABLE_MEM_INIT | Tunes the per-thread memory initialization, 0 being |
| | the normal behavior, 1 disabling the automatic heap |
| | initialization. |
+---------------------------+-------------------------------------------------------+
| M_CACHE_COUNT_MAX         | Sets the maximum number of entries that can be cached |
| | in the Secondary cache. |
+---------------------------+-------------------------------------------------------+
| M_CACHE_SIZE_MAX | Sets the maximum size of entries that can be cached |
| | in the Secondary cache. |
+---------------------------+-------------------------------------------------------+
| M_TSDS_COUNT_MAX | Increases the maximum number of TSDs that can be used |
| | up to the limit specified at compile time. |
+---------------------------+-------------------------------------------------------+
Error Types
===========
@ -251,3 +335,4 @@ Here is a list of the current error messages and their potential cause:
Several other error messages relate to parameter checking on the libc allocation
APIs and are fairly straightforward to understand.