<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
          "http://www.w3.org/TR/html4/strict.dtd">
<html>
<head>
<title>LLVM Atomic Instructions and Concurrency Guide</title>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<link rel="stylesheet" href="llvm.css" type="text/css">
</head>
<body>

<h1>
LLVM Atomic Instructions and Concurrency Guide
</h1>

<ol>
<li><a href="#introduction">Introduction</a></li>
<li><a href="#loadstore">Load and store</a></li>
<li><a href="#otherinst">Other atomic instructions</a></li>
<li><a href="#ordering">Atomic orderings</a></li>
<li><a href="#iropt">Atomics and IR optimization</a></li>
<li><a href="#codegen">Atomics and Codegen</a></li>
</ol>

<div class="doc_author">
<p>Written by Eli Friedman</p>
</div>

<!-- *********************************************************************** -->
<h2>
<a name="introduction">Introduction</a>
</h2>
<!-- *********************************************************************** -->

<div>

<p>Historically, LLVM has not had very strong support for concurrency; some
minimal intrinsics were provided, and <code>volatile</code> was used in some
cases to achieve rough semantics in the presence of concurrency. However, this
is changing; there are now new instructions which are well-defined in the
presence of threads and asynchronous signals, and the model for existing
instructions has been clarified in the IR.</p>

<p>The atomic instructions are designed specifically to provide readable IR and
optimized code generation for the following:</p>

<ul>
<li>The new C++0x <code>&lt;atomic&gt;</code> header.
(<a href="http://www.open-std.org/jtc1/sc22/wg21/">C++0x draft available here</a>.)
(<a href="http://www.open-std.org/jtc1/sc22/wg14/">C1x draft available here</a>)</li>
<li>Proper semantics for Java-style memory, for both <code>volatile</code> and
regular shared variables.
(<a href="http://java.sun.com/docs/books/jls/third_edition/html/memory.html">Java Specification</a>)</li>
<li>gcc-compatible <code>__sync_*</code> builtins.
(<a href="http://gcc.gnu.org/onlinedocs/gcc/Atomic-Builtins.html">Description</a>)</li>
<li>Other scenarios with atomic semantics, including <code>static</code>
variables with non-trivial constructors in C++.</li>
</ul>

<p>Atomic and volatile in the IR are orthogonal; "volatile" is the C/C++
volatile, which ensures that every volatile load and store happens and is
performed in the stated order. A couple of examples: if a
SequentiallyConsistent store is immediately followed by another
SequentiallyConsistent store to the same address, the first store can
be erased. This transformation is not allowed for a pair of volatile
stores. On the other hand, a non-volatile non-atomic load can be moved
across a volatile load freely, but not across an Acquire load.</p>
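
<p>As a minimal IR sketch of the store example above (the value names are
purely illustrative):</p>

<pre>
; The first seq_cst store is dead: it is immediately overwritten by another
; SequentiallyConsistent store to the same address, so it can be erased.
store atomic i32 1, i32* %p seq_cst, align 4
store atomic i32 2, i32* %p seq_cst, align 4

; Both volatile stores must be kept, in this order.
store volatile i32 1, i32* %q
store volatile i32 2, i32* %q
</pre>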

<p>This document is intended to give anyone writing a frontend for LLVM or
working on optimization passes for LLVM a guide for how to deal with
instructions with special semantics in the presence of concurrency. This
is not intended to be a precise guide to the semantics; the details can
get extremely complicated and unreadable, and are not usually necessary.</p>

</div>

<!-- *********************************************************************** -->
<h2>
<a name="loadstore">Load and store</a>
</h2>
<!-- *********************************************************************** -->

<div>

<p>The basic <code>'load'</code> and <code>'store'</code> allow a variety of
optimizations, but can have unintuitive results in a concurrent environment.
For a frontend writer, the rule is essentially that all memory accessed
with basic loads and stores by multiple threads should be protected by a
lock or other synchronization; otherwise, you are likely to run into
undefined behavior. (Do not use volatile as a substitute for atomics; it
might work on some platforms, but does not provide the necessary guarantees
in general.)</p>

<p>From the optimizer's point of view, the rule is that if there are no
instructions with atomic ordering involved, concurrency does not matter,
with one exception: if a variable might be visible to another thread or
signal handler, a store cannot be inserted along a path where it might not
execute otherwise. For example, suppose LICM wants to take all the loads
and stores in a loop to and from a particular address and promote them to
registers. LICM is not allowed to insert an unconditional store after the
loop with the computed value unless a store unconditionally executes within
the loop. Note that speculative loads are allowed; a load which is part of
a race returns <code>undef</code>, but does not have undefined behavior.</p>

<p>For cases where simple loads and stores are not sufficient, LLVM provides
atomic loads and stores with varying levels of guarantees.</p>
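
<p>A brief sketch of what these look like in the IR; the orderings themselves
are described in the <a href="#ordering">Atomic orderings</a> section:</p>

<pre>
; Plain load and store: no guarantees if another thread touches %p concurrently.
%x = load i32* %p
store i32 %y, i32* %p

; Atomic load and store, with an explicit ordering and a required alignment.
%x2 = load atomic i32* %p unordered, align 4
store atomic i32 %y, i32* %p monotonic, align 4
</pre>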

</div>

<!-- *********************************************************************** -->
<h2>
<a name="otherinst">Other atomic instructions</a>
</h2>
<!-- *********************************************************************** -->

<div>

<p><code>cmpxchg</code> and <code>atomicrmw</code> are essentially like an
atomic load followed by an atomic store (where the store is conditional for
<code>cmpxchg</code>), but no other memory operation can happen between
the load and store. Note that LLVM's cmpxchg does not provide quite as many
options for weakening the operation as the C++0x version does.</p>
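
<p>A minimal sketch of the two instructions; both return the value that was
in memory before the operation:</p>

<pre>
; Atomically compare *%p against %expected and store %new only if they match.
%old = cmpxchg i32* %p, i32 %expected, i32 %new seq_cst

; Atomically add 1 to *%p.
%prev = atomicrmw add i32* %p, i32 1 seq_cst
</pre>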

<p>A <code>fence</code> provides Acquire and/or Release ordering which is not
part of another operation; it is normally used along with Monotonic memory
operations. A Monotonic load followed by an Acquire fence is roughly
equivalent to an Acquire load.</p>
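
<p>A sketch of that rough equivalence:</p>

<pre>
; Roughly equivalent to:  %flag = load atomic i32* %p acquire, align 4
%flag = load atomic i32* %p monotonic, align 4
fence acquire
</pre>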

<p>Frontends generating atomic instructions generally need to be aware of the
target to some degree; atomic instructions are guaranteed to be lock-free,
and therefore an instruction which is wider than what the target natively
supports can be impossible to generate.</p>

</div>

<!-- *********************************************************************** -->
<h2>
<a name="ordering">Atomic orderings</a>
</h2>
<!-- *********************************************************************** -->

<div>

<p>In order to achieve a balance between performance and necessary guarantees,
there are six levels of atomicity. They are listed below in order of
increasing strength; each level includes all the guarantees of the levels
below it, with the exception that Acquire and Release do not include each
other's guarantees.</p>

<!-- ======================================================================= -->
<h3>
<a name="o_unordered">Unordered</a>
</h3>

<div>
<p>Unordered is the lowest level of atomicity. It essentially guarantees that
|
2011-08-12 01:44:25 +02:00
|
|
|
races produce somewhat sane results instead of having undefined behavior.
|
|
|
|
It also guarantees the operation to be lock-free, so it do not depend on
|
|
|
|
the data being part of a special atomic structure or depend on a separate
|
|
|
|
per-process global lock. Note that code generation will fail for
|
|
|
|
unsupported atomic operations; if you need such an operation, use explicit
|
|
|
|
locking.</p>

<dl>
<dt>Relevant standard</dt>
<dd>This is intended to match the Java memory model for shared
variables.</dd>
<dt>Notes for frontends</dt>
<dd>This cannot be used for synchronization, but is useful for Java and
other "safe" languages which need to guarantee that the generated
code never exhibits undefined behavior. Note that this guarantee
is cheap on common platforms for loads of a native width, but can
be expensive or unavailable for wider loads, like a 64-bit store
on ARM. (A frontend for Java or other "safe" languages would normally
split a 64-bit store on ARM into two 32-bit unordered stores.)</dd>
<dt>Notes for optimizers</dt>
<dd>In terms of the optimizer, this prohibits any transformation that
transforms a single load into multiple loads, transforms a store
into multiple stores, narrows a store, or stores a value which
would not be stored otherwise. Some examples of unsafe optimizations
are narrowing an assignment into a bitfield, rematerializing
a load, and turning loads and stores into a memcpy call. Reordering
unordered operations is safe, though, and optimizers should take
advantage of that because unordered operations are common in
languages that need them.</dd>
<dt>Notes for code generation</dt>
<dd>These operations are required to be atomic in the sense that if you
use unordered loads and unordered stores, a load cannot see a value
which was never stored. A normal load or store instruction is usually
sufficient, but note that an unordered load or store cannot
be split into multiple instructions (or an instruction which
does multiple memory operations, like <code>LDRD</code> on ARM).</dd>
</dl>

</div>

<!-- ======================================================================= -->
<h3>
<a name="o_monotonic">Monotonic</a>
</h3>

<div>
<p>Monotonic is the weakest level of atomicity that can be used in
|
|
|
|
synchronization primitives, although it does not provide any general
|
|
|
|
synchronization. It essentially guarantees that if you take all the
|
|
|
|
operations affecting a specific address, a consistent ordering exists.
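
<p>A sketch of the typical use mentioned in the notes below, an atomic counter
which does not protect any other memory:</p>

<pre>
; Increments cannot be lost, but no ordering with surrounding
; memory operations is implied.
%old = atomicrmw add i32* %counter, i32 1 monotonic
</pre>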

<dl>
<dt>Relevant standard</dt>
<dd>This corresponds to the C++0x/C1x <code>memory_order_relaxed</code>;
see those standards for the exact definition.</dd>
<dt>Notes for frontends</dt>
<dd>If you are writing a frontend which uses this directly, use with caution.
The guarantees in terms of synchronization are very weak, so make
sure these are only used in a pattern which you know is correct.
Generally, these would either be used for atomic operations which
do not protect other memory (like an atomic counter), or along with
a <code>fence</code>.</dd>
<dt>Notes for optimizers</dt>
<dd>In terms of the optimizer, this can be treated as a read+write on the
relevant memory location (and alias analysis will take advantage of
that). In addition, it is legal to reorder non-atomic and Unordered
loads around Monotonic loads. CSE/DSE and a few other optimizations
are allowed, but Monotonic operations are unlikely to be used in ways
which would make those optimizations useful.</dd>
<dt>Notes for code generation</dt>
<dd>Code generation is essentially the same as that for unordered loads
and stores. No fences are required. <code>cmpxchg</code> and
<code>atomicrmw</code> are required to appear as a single operation.</dd>
</dl>

</div>

<!-- ======================================================================= -->
<h3>
<a name="o_acquire">Acquire</a>
</h3>

<div>
<p>Acquire provides a barrier of the sort necessary to acquire a lock to access
|
2011-08-12 01:44:25 +02:00
|
|
|
other memory with normal loads and stores.
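
<p>A sketch of the consumer side of the usual pattern; it pairs with the
Release store shown in the <a href="#o_release">Release</a> section (names
illustrative):</p>

<pre>
; Once the Acquire load observes the flag set by the producer's Release
; store, the data written before that store is visible to ordinary loads.
%flag = load atomic i32* %ready acquire, align 4
; ... test %flag, then ...
%data = load i32* %payload
</pre>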

<dl>
<dt>Relevant standard</dt>
<dd>This corresponds to the C++0x/C1x <code>memory_order_acquire</code>. It
should also be used for C++0x/C1x <code>memory_order_consume</code>.</dd>
<dt>Notes for frontends</dt>
<dd>If you are writing a frontend which uses this directly, use with caution.
Acquire only provides a semantic guarantee when paired with a Release
operation.</dd>
<dt>Notes for optimizers</dt>
<dd>Optimizers not aware of atomics can treat this like a nothrow call.
It is also possible to move stores from before an Acquire load
or read-modify-write operation to after it, and move non-Acquire
loads from before an Acquire operation to after it.</dd>
<dt>Notes for code generation</dt>
<dd>Architectures with weak memory ordering (essentially everything relevant
today except x86 and SPARC) require some sort of fence to maintain
the Acquire semantics. The precise fences required vary widely by
architecture, but for a simple implementation, most architectures provide
a barrier which is strong enough for everything (<code>dmb</code> on ARM,
<code>sync</code> on PowerPC, etc.). Putting such a fence after the
equivalent Monotonic operation is sufficient to maintain Acquire
semantics for a memory operation.</dd>
</dl>

</div>

<!-- ======================================================================= -->
<h3>
<a name="o_release">Release</a>
</h3>

<div>
<p>Release is similar to Acquire, but with a barrier of the sort necessary to
|
|
|
|
release a lock.
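
<p>A sketch of the producer side of the pattern shown in the
<a href="#o_acquire">Acquire</a> section:</p>

<pre>
; Publish the data with an ordinary store, then set the flag with a
; Release store; an Acquire load of %ready pairs with it.
store i32 42, i32* %payload
store atomic i32 1, i32* %ready release, align 4
</pre>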

<dl>
<dt>Relevant standard</dt>
<dd>This corresponds to the C++0x/C1x <code>memory_order_release</code>.</dd>
<dt>Notes for frontends</dt>
<dd>If you are writing a frontend which uses this directly, use with caution.
Release only provides a semantic guarantee when paired with an Acquire
operation.</dd>
<dt>Notes for optimizers</dt>
<dd>Optimizers not aware of atomics can treat this like a nothrow call.
It is also possible to move loads from after a Release store
or read-modify-write operation to before it, and move non-Release
stores from after a Release operation to before it.</dd>
<dt>Notes for code generation</dt>
<dd>See the section on Acquire; a fence before the relevant operation is
usually sufficient for Release. Note that a store-store fence is not
sufficient to implement Release semantics; store-store fences are
generally not exposed to IR because they are extremely difficult to
use correctly.</dd>
</dl>

</div>

<!-- ======================================================================= -->
<h3>
<a name="o_acqrel">AcquireRelease</a>
</h3>

<div>
<p>AcquireRelease (<code>acq_rel</code> in IR) provides both an Acquire and a
|
|
|
|
Release barrier (for fences and operations which both read and write memory).
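
<p>A sketch of an operation which both reads and writes memory and therefore
wants both barriers:</p>

<pre>
; Acts as an Acquire for the load half and a Release for the store half.
%old = atomicrmw xchg i32* %p, i32 %new acq_rel
</pre>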

<dl>
<dt>Relevant standard</dt>
<dd>This corresponds to the C++0x/C1x <code>memory_order_acq_rel</code>.</dd>
<dt>Notes for frontends</dt>
<dd>If you are writing a frontend which uses this directly, use with caution.
Acquire only provides a semantic guarantee when paired with a Release
operation, and vice versa.</dd>
<dt>Notes for optimizers</dt>
<dd>In general, optimizers should treat this like a nothrow call; the
possible optimizations are usually not interesting.</dd>
<dt>Notes for code generation</dt>
<dd>This operation has Acquire and Release semantics; see the sections on
Acquire and Release.</dd>
</dl>

</div>

<!-- ======================================================================= -->
<h3>
<a name="o_seqcst">SequentiallyConsistent</a>
</h3>

<div>
<p>SequentiallyConsistent (<code>seq_cst</code> in IR) provides
|
|
|
|
Acquire semantics for loads and Release semantics for
|
|
|
|
stores. Additionally, it guarantees that a total ordering exists
|
|
|
|
between all SequentiallyConsistent operations.
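
<p>A sketch of the classic store/load pattern which actually needs that total
order, with <code>%x</code> and <code>%y</code> initially 0 (names
illustrative):</p>

<pre>
; Thread 1:
store atomic i32 1, i32* %x seq_cst, align 4
%r1 = load atomic i32* %y seq_cst, align 4

; Thread 2:
store atomic i32 1, i32* %y seq_cst, align 4
%r2 = load atomic i32* %x seq_cst, align 4

; With seq_cst, %r1 and %r2 cannot both read 0; with Acquire loads and
; Release stores alone, they could.
</pre>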

<dl>
<dt>Relevant standard</dt>
<dd>This corresponds to the C++0x/C1x <code>memory_order_seq_cst</code>,
Java volatile, and the gcc-compatible <code>__sync_*</code> builtins
which do not specify otherwise.</dd>
<dt>Notes for frontends</dt>
<dd>If a frontend is exposing atomic operations, these are much easier to
reason about for the programmer than other kinds of operations, and using
them is generally a practical performance tradeoff.</dd>
<dt>Notes for optimizers</dt>
<dd>Optimizers not aware of atomics can treat this like a nothrow call.
For SequentiallyConsistent loads and stores, the same reorderings are
allowed as for Acquire loads and Release stores, except that
SequentiallyConsistent operations may not be reordered.</dd>
<dt>Notes for code generation</dt>
<dd>SequentiallyConsistent loads minimally require the same barriers
as Acquire operations, and SequentiallyConsistent stores require
Release barriers. Additionally, the code generator must enforce
ordering between SequentiallyConsistent stores followed by
SequentiallyConsistent loads. This is usually done by emitting
either a full fence before the loads or a full fence after the
stores; which is preferred varies by architecture.</dd>
</dl>

</div>

</div>

<!-- *********************************************************************** -->
<h2>
<a name="iropt">Atomics and IR optimization</a>
</h2>
<!-- *********************************************************************** -->

<div>
<p>Predicates for optimizer writers to query:
|
|
|
|
<ul>
|
|
|
|
<li>isSimple(): A load or store which is not volatile or atomic. This is
|
|
|
|
what, for example, memcpyopt would check for operations it might
|
|
|
|
transform.
|
|
|
|
<li>isUnordered(): A load or store which is not volatile and at most
|
|
|
|
Unordered. This would be checked, for example, by LICM before hoisting
|
|
|
|
an operation.
|
|
|
|
<li>mayReadFromMemory()/mayWriteToMemory(): Existing predicate, but note
|
2011-08-10 22:17:43 +02:00
|
|
|
that they return true for any operation which is volatile or at least
|
2011-08-09 23:07:10 +02:00
|
|
|
Monotonic.
|
|
|
|
<li>Alias analysis: Note that AA will return ModRef for anything Acquire or
|
|
|
|
Release, and for the address accessed by any Monotonic operation.
|
|
|
|
</ul>

<p>There are essentially two components to supporting atomic operations. The
first is making sure to query isSimple() or isUnordered() instead
of isVolatile() before transforming an operation. The other piece is
making sure that a transform does not end up replacing, for example, an
Unordered operation with a non-atomic operation. Most of the other
necessary checks automatically fall out from existing predicates and
alias analysis queries.</p>

<p>Some examples of how optimizations interact with various kinds of atomic
operations:
<ul>
<li>memcpyopt: An atomic operation cannot be optimized into part of a
memcpy/memset, including unordered loads/stores. It can pull operations
across some atomic operations.
<li>LICM: Unordered loads/stores can be moved out of a loop. It just treats
monotonic operations like a read+write to a memory location, and anything
stricter than that like a nothrow call.
<li>DSE: Unordered stores can be DSE'ed like normal stores. Monotonic stores
can be DSE'ed in some cases, but it's tricky to reason about, and not
especially important.
<li>Folding a load: Any atomic load from a constant global can be
constant-folded, because it cannot be observed (see the sketch after this
list). Similar reasoning allows scalarrepl with atomic loads and stores.
</ul>

</div>

<!-- *********************************************************************** -->
<h2>
<a name="codegen">Atomics and Codegen</a>
</h2>
<!-- *********************************************************************** -->

<div>

<p>Atomic operations are represented in the SelectionDAG with
<code>ATOMIC_*</code> opcodes. On architectures which use barrier
instructions for all atomic ordering (like ARM), appropriate fences are
split out as the DAG is built.</p>

<p>The MachineMemOperand for all atomic operations is currently marked as
volatile; this is not correct in the IR sense of volatile, but CodeGen
handles anything marked volatile very conservatively. This should get
fixed at some point.</p>

<p>Common architectures have some way of representing at least a pointer-sized
lock-free <code>cmpxchg</code>; such an operation can be used to implement
all the other atomic operations which can be represented in IR up to that
size. Backends are expected to implement all those operations, but not
operations which cannot be implemented in a lock-free manner. It is
expected that backends will give an error when given an operation which
cannot be implemented. (The LLVM code generator is not very helpful here
at the moment, but hopefully that will change.)</p>

<p>The implementation of atomics on LL/SC architectures (like ARM) is currently
a bit of a mess; there is a lot of copy-pasted code across targets, and
the representation is relatively unsuited to optimization (it would be nice
to be able to optimize loops involving cmpxchg etc.).</p>

<p>On x86, all atomic loads generate a <code>MOV</code>.
SequentiallyConsistent stores generate an <code>XCHG</code>, other stores
generate a <code>MOV</code>. SequentiallyConsistent fences generate an
<code>MFENCE</code>, other fences do not cause any code to be generated.
cmpxchg uses the <code>LOCK CMPXCHG</code> instruction.
<code>atomicrmw xchg</code> uses <code>XCHG</code>,
<code>atomicrmw add</code> and <code>atomicrmw sub</code> use
<code>XADD</code>, and all other <code>atomicrmw</code> operations generate
a loop with <code>LOCK CMPXCHG</code>. Depending on the users of the
result, some <code>atomicrmw</code> operations can be translated into
operations like <code>LOCK AND</code>, but that does not work in
general.</p>

<p>On ARM, MIPS, and many other RISC architectures, Acquire, Release, and
SequentiallyConsistent semantics require barrier instructions
for every such operation. Loads and stores generate normal instructions.
<code>cmpxchg</code> and <code>atomicrmw</code> can be represented using
a loop with LL/SC-style instructions which take some sort of exclusive
lock on a cache line (<code>LDREX</code> and <code>STREX</code> on
ARM, etc.). At the moment, the IR does not provide any way to represent a
weak <code>cmpxchg</code> which would not require a loop.</p>

</div>

<!-- *********************************************************************** -->

<hr>
<address>
<a href="http://jigsaw.w3.org/css-validator/check/referer"><img
src="http://jigsaw.w3.org/css-validator/images/vcss-blue" alt="Valid CSS"></a>
<a href="http://validator.w3.org/check/referer"><img
src="http://www.w3.org/Icons/valid-html401-blue" alt="Valid HTML 4.01"></a>
<a href="http://llvm.org/">LLVM Compiler Infrastructure</a><br>
Last modified: $Date: 2011-08-09 02:07:00 -0700 (Tue, 09 Aug 2011) $
</address>

</body>
</html>