===============
Opaque Pointers
===============

The Opaque Pointer Type
=======================

Traditionally, LLVM IR pointer types have contained a pointee type. For example,
``i32 *`` is a pointer that points to an ``i32`` somewhere in memory. However,
pointee types carry no semantics of their own and cause a number of issues, so
there is a desire to remove them from pointer types.

The opaque pointer type project aims to replace all pointer types containing
pointee types in LLVM with an opaque pointer type. The new pointer type is
tentatively represented textually as ``ptr``.

Address spaces are still used to distinguish between different kinds of pointers
where the distinction is relevant for lowering (e.g. data vs function pointers
have different sizes on some architectures). Opaque pointers are not changing
anything related to address spaces and lowering. For more information, see
`DataLayout <LangRef.html#langref-datalayout>`_.

Issues with explicit pointee types
==================================

LLVM IR pointers can be cast back and forth between pointers with different
pointee types, so the pointee type does not necessarily represent the actual
underlying type in memory. In other words, the pointee type carries no real
semantics.

Many operations do not care about the underlying type. These operations,
typically intrinsics, usually end up taking an ``i8 *``. This causes many
redundant no-op bitcasts in the IR to and from pointers with different pointee
types. The extra bitcasts take up space and require extra work to look through
in optimizations. More bitcasts also increase the chance of incorrect bitcasts,
especially with regard to address spaces.
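
As a minimal sketch of the redundancy described above, consider passing an
``i32 *`` to a function that only wants raw bytes. The function names
``@consume_bytes`` and ``@pass_int`` are hypothetical and exist only for
illustration. With explicit pointee types, a no-op bitcast is needed:

.. code-block:: llvm

  ; Hypothetical callee that does not care about the pointee type.
  declare void @consume_bytes(i8*)

  define void @pass_int(i32* %p) {
    ; The bitcast changes only the pointee type, not the pointer value.
    %raw = bitcast i32* %p to i8*
    call void @consume_bytes(i8* %raw)
    ret void
  }

With an opaque pointer type, the same code would need no cast at all:

.. code-block:: llvm

  declare void @consume_bytes(ptr)

  define void @pass_int(ptr %p) {
    ; No bitcast: every pointer in this address space has the same type.
    call void @consume_bytes(ptr %p)
    ret void
  }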

Some instructions still need to know what type to treat the pointed-to memory
as. For example, a load needs to know how many bytes to load from memory. In
these cases, the instructions themselves contain a type argument. For example,
the load instruction from older versions of LLVM

.. code-block:: llvm

  load i64* %p

becomes

.. code-block:: llvm

  load i64, ptr %p
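
A slightly larger sketch, using a hypothetical function ``@read_then_clear``,
shows that the type on each instruction, not the pointer, determines how memory
is accessed. It assumes ``%p`` points to at least 8 bytes of valid memory:

.. code-block:: llvm

  define i32 @read_then_clear(ptr %p) {
    ; The load's own type argument says to read an i32 (4 bytes).
    %word = load i32, ptr %p
    ; The store's value type says to write an i64 (8 bytes) through the
    ; same pointer, with no cast in between.
    store i64 0, ptr %p
    ret i32 %word
  }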

An analogous transition that happened earlier in LLVM involves integer
signedness. Initially, LLVM IR distinguished between signed and unsigned integer
types. Today there is no such distinction; instead, the integer operations
themselves specify how to interpret their operands. The transition from
expressing signedness in types to expressing it in instructions happened early
in LLVM's life, to the betterment of LLVM IR.
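
To illustrate the analogy, the following sketch (with a hypothetical function
name) applies both a signed and an unsigned division to the same ``i32``
values. The operands have no signedness; each instruction decides how to
interpret the bits:

.. code-block:: llvm

  define i1 @same_quotient(i32 %a, i32 %b) {
    ; Interpret both operands as signed integers.
    %sq = sdiv i32 %a, %b
    ; Interpret the very same operands as unsigned integers.
    %uq = udiv i32 %a, %b
    ; The results agree only when both interpretations coincide.
    %same = icmp eq i32 %sq, %uq
    ret i1 %same
  }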

I Still Need Pointee Types!
===========================

The frontend should already know what type each operation operates on based on
the input source code. However, some frontends, like Clang, may end up relying
on LLVM pointee types to keep track of frontend pointee types. Such frontends
need to keep track of frontend pointee types on their own.

For optimizations around frontend types, pointee types are not useful due to their
lack of semantics. Rather, since LLVM IR works on untyped memory, for a frontend
to tell LLVM about frontend types for the purposes of alias analysis, extra
metadata is added to the IR. For more information, see `TBAA
<LangRef.html#tbaa-metadata>`_.
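
As a rough sketch of how such metadata might look, the example below tags
accesses through two opaque pointers with different TBAA types. The function
name is hypothetical, and the metadata node names merely mirror the style Clang
typically emits for C code; they are chosen for illustration:

.. code-block:: llvm

  define i32 @read_int(ptr %ip, ptr %fp) {
    %before = load i32, ptr %ip, !tbaa !3
    ; Tagged as a float access, so alias analysis may assume that this store
    ; does not modify the int read above and below.
    store float 1.0, ptr %fp, !tbaa !5
    %after = load i32, ptr %ip, !tbaa !3
    %sum = add i32 %before, %after
    ret i32 %sum
  }

  !0 = !{!"Simple C/C++ TBAA"}           ; root of the type tree
  !1 = !{!"omnipotent char", !0, i64 0}  ; may alias everything below it
  !2 = !{!"int", !1, i64 0}              ; scalar type descriptor
  !3 = !{!2, !2, i64 0}                  ; access tag: an int accessed as int
  !4 = !{!"float", !1, i64 0}            ; scalar type descriptor
  !5 = !{!4, !4, i64 0}                  ; access tag: a float accessed as float

Note that the type information lives entirely in the metadata; both pointers
are plain ``ptr`` values.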

Some specific operations still need to know what type a pointer points to. For
the most part, this is codegen and ABI specific. For example, `byval
<LangRef.html#parameter-attributes>`_ arguments are pointers, but backends need
to know the underlying type of the argument to properly lower it. In cases like
these, the attributes contain a type argument. For example,

.. code-block:: llvm

  call void @f(ptr byval(i32) %p)

signifies that ``%p`` as an argument should be lowered as an ``i32`` passed
indirectly.

If you have use cases that this sort of fix doesn't cover, please email
llvm-dev.

Transition Plan
===============

LLVM currently has many places that depend on pointee types. Each dependency on
pointee types needs to be resolved in some way or another.

Making everything use opaque pointers in one huge commit is infeasible. This
needs to be done incrementally. The following steps need to be done, in no
particular order:

* Introduce the opaque pointer type

* Various ABI attributes and instructions that need a type can be changed one at
  a time

  * This has already happened for many instructions like loads, stores, GEPs,
    and various attributes like ``byval``

* Fix up existing in-tree users of pointee types to not rely on LLVM pointer
  pointee types

* Allow bitcode auto-upgrade of legacy pointer type to the new opaque pointer
  type (not to be turned on until ready)

* Migrate frontends to not keep track of frontend pointee types via LLVM pointer
  pointee types

* Add an option to internally treat all pointer types as opaque pointers and see
  what breaks, starting with LLVM tests, then running Clang over large codebases

* Replace legacy pointer types in LLVM tests with opaque pointer types

Frontend Migration Steps
========================

If you have your own frontend, there are a couple of things to do after opaque
pointer types fully work.

* Don't rely on LLVM pointee types to keep track of frontend pointee types

* Migrate away from LLVM IR instruction builders that rely on pointee types

  * For example, ``IRBuilder::CreateGEP()`` has multiple overloads; make sure to
    use one where the source element type is explicitly passed in, not inferred
    from the pointer operand pointee type
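
In IR terms, the explicitly passed source element type corresponds to the first
type operand of ``getelementptr``. The following sketch, with a hypothetical
``%struct.Point`` type and function name, shows the kind of IR a frontend would
emit once it supplies that type itself:

.. code-block:: llvm

  %struct.Point = type { i32, i32 }

  define i32 @get_y(ptr %pt) {
    ; The source element type (%struct.Point) is stated on the instruction;
    ; nothing is inferred from the type of %pt.
    %y.addr = getelementptr inbounds %struct.Point, ptr %pt, i64 0, i32 1
    %y = load i32, ptr %y.addr
    ret i32 %y
  }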