[docs] Add page on opaque pointer types

Reviewed By: dblaikie, dexonsmith Differential Revision: https://reviews.llvm.org/D102292
2025-01-31 12:41:49 +01:00 · 2021-05-11 17:02:12 -07:00 · 2021-05-11 17:02:12 -07:00 · 88e8f8e83e
commit 88e8f8e83e
parent 475b09053f
2 changed files with 131 additions and 0 deletions
--- a/docs/OpaquePointers.rst
+++ b/docs/OpaquePointers.rst
@ -0,0 +1,130 @@
+===============
+Opaque Pointers
+===============
+
+The Opaque Pointer Type
+=======================
+
+Traditionally, LLVM IR pointer types have contained a pointee type. For example,
+``i32 *`` is a pointer that points to an ``i32`` somewhere in memory. However,
+due to a lack of pointee type semantics and various issues with having pointee
+types, there is a desire to remove pointee types from pointers.
+
+The opaque pointer type project aims to replace all pointer types containing
+pointee types in LLVM with an opaque pointer type. The new pointer type is
+tentatively represented textually as ``ptr``.
+
+Anything to do with pointer address spaces is unaffected.
+
+Issues with explicit pointee types
+==================================
+
+LLVM IR pointers can be cast back and forth between pointers with different
+pointee types. The pointee type does not necessarily actually represent the
+actual underlying type in memory. In other words, the pointee type contains no
+real semantics.
+
+Lots of operations do not actually care about the underlying type. These
+operations, typically intrinsics, usually end up taking an ``i8 *``. This causes
+lots of redundant no-op bitcasts in the IR to and from a pointer with a
+different pointee type. The extra bitcasts take up space and require extra work
+to look through in optimizations. And more bitcasts increases the chances of
+incorrect bitcasts, especially in regards to address spaces.
+
+Some instructions still need to know what type to treat the memory pointed to by
+the pointer as. For example, a load needs to know how many bytes to load from
+memory. In these cases, instructions themselves contain a type argument. For
+example the load instruction from older versions of LLVM
+
+.. code-block:: llvm
+
+  load i64* %p
+
+becomes
+
+.. code-block:: llvm
+
+  load i64, ptr %p
+
+A nice analogous transition that happened earlier in LLVM is integer signedness.
+There is no distinction between signed and unsigned integer types, rather the
+integer operations themselves contain what to treat the integer as. Initially,
+LLVM IR distinguished between unsigned and signed integer types. The transition
+from manifesting signedness in types to instructions happened early on in LLVM's
+life to the betterment of LLVM IR.
+
+I Still Need Pointee Types!
+===========================
+
+The frontend should already know what type each operation operates on based on
+the input source code. However, some frontends like Clang may end up relying on
+LLVM pointer pointee types to keep track of pointee types. The frontend needs to
+keep track of frontend pointee types on its own.
+
+For optimizations around frontend types, pointee types are not useful due their
+lack of semantics. Rather, since LLVM IR works on untyped memory, for a frontend
+to tell LLVM about frontend types for the purposes of alias analysis, extra
+metadata is added to the IR. For more information, see `TBAA
+<LangRef.html#tbaa-metadata>`_.
+
+Some specific operations still need to know what type a pointer types to. For
+the most part, this is codegen and ABI specific. For example, `byval
+<LangRef.html#parameter-attributes>`_ arguments are pointers, but backends need
+to know the underlying type of the argument to properly lower it. In cases like
+these, the attributes contain a type argument. For example,
+
+.. code-block:: llvm
+
+  call void @f(ptr byval(i32) %p)
+
+signifies that ``%p`` as an argument should be lowered as an ``i32`` passed
+indirectly.
+
+If you have use cases that this sort of fix doesn't cover, please email
+llvm-dev.
+
+Transition Plan
+===============
+
+LLVM currently has many places that depend on pointee types. Each dependency on
+pointee types needs to be resolved in some way or another.
+
+Making everything use opaque pointers in one huge commit is infeasible. This
+needs to be done incrementally. The following steps need to be done, in no
+particular order:
+
+* Introduce the opaque pointer type
+
+* Various ABI attributes and instructions that need a type can be changed one at
+  a time
+
+  * This has already happened for many instructions like loads, stores, GEPs,
+    and various attributes like ``byval``
+
+* Fix up existing in-tree users of pointee types to not rely on LLVM pointer
+  pointee types
+
+* Allow bitcode auto-upgrade of legacy pointer type to the new opaque pointer
+  type (not to be turned on until ready)
+
+* Migrate frontends to not keep track of frontend pointee types via LLVM pointer
+  pointee types
+
+* Add option to internally treat all pointer types opaque pointers and see what
+  breaks, starting with LLVM tests, then run Clang over large codebases
+
+* Replace legacy pointer types in LLVM tests with opaque pointer types
+
+Frontend Migration Steps
+========================
+
+If you have your own frontend, there are a couple of things to do after opaque
+pointer types fully work.
+
+* Don't rely on LLVM pointee types to keep track of frontend pointee types
+
+* Migrate away from LLVM IR instruction builders that rely on pointee types
+
+  * For example, ``IRBuilder::CreateGEP()`` has multiple overloads; make sure to
+    use one where the source element type is explicitly passed in, not inferred
+    from the pointer operand pointee type
--- a/docs/UserGuides.rst
+++ b/docs/UserGuides.rst
@ -44,6 +44,7 @@ intermediate LLVM representation.
   MergeFunctions
   MCJITDesignAndImplementation
   ORCv2
+   OpaquePointers
   JITLink
   NewPassManager
   NVPTXUsage