| ========= |
| MemorySSA |
| ========= |
| |
| .. contents:: |
| :local: |
| |
| Introduction |
| ============ |
| |
| ``MemorySSA`` is an analysis that allows us to cheaply reason about the |
| interactions between various memory operations. Its goal is to replace |
| ``MemoryDependenceAnalysis`` for most (if not all) use-cases. This is because, |
| unless you're very careful, use of ``MemoryDependenceAnalysis`` can easily |
| result in quadratic-time algorithms in LLVM. Additionally, ``MemorySSA`` doesn't |
| have as many arbitrary limits as ``MemoryDependenceAnalysis``, so you should get |
| better results, too. |
| |
| At a high level, one of the goals of ``MemorySSA`` is to provide an SSA based |
| form for memory, complete with def-use and use-def chains, which |
| enables users to quickly find may-def and may-uses of memory operations. |
| It can also be thought of as a way to cheaply give versions to the complete |
| state of heap memory, and associate memory operations with those versions. |
| |
| This document goes over how ``MemorySSA`` is structured, and some basic |
| intuition on how ``MemorySSA`` works. |
| |
| A paper on MemorySSA (with notes about how it's implemented in GCC) `can be |
| found here <http://www.airs.com/dnovillo/Papers/mem-ssa.pdf>`_. Though, it's |
| relatively out-of-date; the paper references multiple heap partitions, but GCC |
| eventually swapped to just using one, like we now have in LLVM. Like |
| GCC's, LLVM's MemorySSA is intraprocedural. |
| |
| |
| MemorySSA Structure |
| =================== |
| |
| MemorySSA is a virtual IR. After it's built, ``MemorySSA`` will contain a |
| structure that maps ``Instruction``\ s to ``MemoryAccess``\ es, which are |
| ``MemorySSA``'s parallel to LLVM ``Instruction``\ s. |
| |
| Each ``MemoryAccess`` can be one of three types: |
| |
| - ``MemoryPhi`` |
| - ``MemoryUse`` |
| - ``MemoryDef`` |
| |
| ``MemoryPhi``\ s are ``PhiNode``\ s, but for memory operations. If at any |
| point we have two (or more) ``MemoryDef``\ s that could flow into a |
| ``BasicBlock``, the block's top ``MemoryAccess`` will be a |
| ``MemoryPhi``. As in LLVM IR, ``MemoryPhi``\ s don't correspond to any |
| concrete operation. As such, ``BasicBlock``\ s are mapped to ``MemoryPhi``\ s |
| inside ``MemorySSA``, whereas ``Instruction``\ s are mapped to ``MemoryUse``\ s |
| and ``MemoryDef``\ s. |
| |
Note also that in regular SSA, Phi nodes merge must-reach definitions (that is,
definitions that *must* be new versions of variables). In MemorySSA, Phi nodes
merge may-reach definitions (that is, until disambiguated, the versions that
reach a phi node may or may not clobber a given variable).
| |
| ``MemoryUse``\ s are operations which use but don't modify memory. An example of |
| a ``MemoryUse`` is a ``load``, or a ``readonly`` function call. |
| |
| ``MemoryDef``\ s are operations which may either modify memory, or which |
| introduce some kind of ordering constraints. Examples of ``MemoryDef``\ s |
| include ``store``\ s, function calls, ``load``\ s with ``acquire`` (or higher) |
| ordering, volatile operations, memory fences, etc. |
| |
Every function has a special ``MemoryDef`` called ``liveOnEntry``. It dominates
every ``MemoryAccess`` in the function that ``MemorySSA`` is being run on, and
implies that we've hit the top of the function. It's the only ``MemoryDef`` that
maps to no ``Instruction`` in LLVM IR. Use of ``liveOnEntry`` implies that the
memory being used is either undefined or defined before the function begins.
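
As a rough sketch (not taken from the examples below; it assumes you already
have a ``MemorySSA`` analysis result named ``MSSA`` for the enclosing
function), this is how one might look up and classify the ``MemoryAccess`` for
an instruction and check whether its defining access is ``liveOnEntry``:

.. code-block:: c++

  #include "llvm/Analysis/MemorySSA.h"
  using namespace llvm;

  void inspect(MemorySSA &MSSA, Instruction &I) {
    // Memory-touching instructions map to a MemoryUse or MemoryDef; other
    // instructions map to nothing (nullptr).
    if (MemoryUseOrDef *MA = MSSA.getMemoryAccess(&I)) {
      MemoryAccess *DefiningAccess = MA->getDefiningAccess();
      if (MSSA.isLiveOnEntryDef(DefiningAccess)) {
        // The memory this instruction touches is either undefined or defined
        // before the function begins.
      }
    }
    // Blocks, not instructions, map to MemoryPhis.
    if (MemoryPhi *Phi = MSSA.getMemoryAccess(I.getParent())) {
      (void)Phi; // Two or more heap versions merge at this block.
    }
  }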
| |
| An example of all of this overlaid on LLVM IR (obtained by running ``opt |
| -passes='print<memoryssa>' -disable-output`` on an ``.ll`` file) is below. When |
| viewing this example, it may be helpful to view it in terms of clobbers. The |
| operands of a given ``MemoryAccess`` are all (potential) clobbers of said |
| MemoryAccess, and the value produced by a ``MemoryAccess`` can act as a clobber |
| for other ``MemoryAccess``\ es. Another useful way of looking at it is in |
| terms of heap versions. In that view, operands of a given |
| ``MemoryAccess`` are the version of the heap before the operation, and |
| if the access produces a value, the value is the new version of the heap |
| after the operation. |
| |
| .. code-block:: llvm |
| |
| define void @foo() { |
| entry: |
| %p1 = alloca i8 |
| %p2 = alloca i8 |
| %p3 = alloca i8 |
| ; 1 = MemoryDef(liveOnEntry) |
| store i8 0, i8* %p3 |
| br label %while.cond |
| |
| while.cond: |
    ; 6 = MemoryPhi({entry,1},{if.end,4})
| br i1 undef, label %if.then, label %if.else |
| |
| if.then: |
| ; 2 = MemoryDef(6) |
| store i8 0, i8* %p1 |
| br label %if.end |
| |
| if.else: |
| ; 3 = MemoryDef(6) |
| store i8 1, i8* %p2 |
| br label %if.end |
| |
| if.end: |
| ; 5 = MemoryPhi({if.then,2},{if.else,3}) |
| ; MemoryUse(5) |
| %1 = load i8, i8* %p1 |
| ; 4 = MemoryDef(5) |
| store i8 2, i8* %p2 |
| ; MemoryUse(1) |
| %2 = load i8, i8* %p3 |
| br label %while.cond |
| } |
| |
| The ``MemorySSA`` IR is shown in comments that precede the instructions they map |
| to (if such an instruction exists). For example, ``1 = MemoryDef(liveOnEntry)`` |
| is a ``MemoryAccess`` (specifically, a ``MemoryDef``), and it describes the LLVM |
| instruction ``store i8 0, i8* %p3``. Other places in ``MemorySSA`` refer to this |
| particular ``MemoryDef`` as ``1`` (much like how one can refer to ``load i8, i8* |
| %p1`` in LLVM with ``%1``). Again, ``MemoryPhi``\ s don't correspond to any LLVM |
| Instruction, so the line directly below a ``MemoryPhi`` isn't special. |
| |
| Going from the top down: |
| |
| - ``6 = MemoryPhi({entry,1},{if.end,4})`` notes that, when entering |
| ``while.cond``, the reaching definition for it is either ``1`` or ``4``. This |
| ``MemoryPhi`` is referred to in the textual IR by the number ``6``. |
| - ``2 = MemoryDef(6)`` notes that ``store i8 0, i8* %p1`` is a definition, |
| and its reaching definition before it is ``6``, or the ``MemoryPhi`` after |
| ``while.cond``. (See the `Build-time use optimization`_ and `Precision`_ |
| sections below for why this ``MemoryDef`` isn't linked to a separate, |
| disambiguated ``MemoryPhi``.) |
- ``3 = MemoryDef(6)`` notes that ``store i8 1, i8* %p2`` is a definition; its
| reaching definition is also ``6``. |
| - ``5 = MemoryPhi({if.then,2},{if.else,3})`` notes that the clobber before |
| this block could either be ``2`` or ``3``. |
| - ``MemoryUse(5)`` notes that ``load i8, i8* %p1`` is a use of memory, and that |
| it's clobbered by ``5``. |
- ``4 = MemoryDef(5)`` notes that ``store i8 2, i8* %p2`` is a definition; its
  reaching definition is ``5``.
- ``MemoryUse(1)`` notes that ``load i8, i8* %p3`` is just a user of memory,
  and the last thing that could clobber this use is above ``while.cond`` (i.e.
  the store to ``%p3``). In heap versioning parlance, it really only depends on
  heap version ``1``, and is unaffected by the new heap versions generated
  since then.
| |
| As an aside, ``MemoryAccess`` is a ``Value`` mostly for convenience; it's not |
| meant to interact with LLVM IR. |
| |
| Design of MemorySSA |
| =================== |
| |
| ``MemorySSA`` is an analysis that can be built for any arbitrary function. When |
| it's built, it does a pass over the function's IR in order to build up its |
| mapping of ``MemoryAccess``\ es. You can then query ``MemorySSA`` for things |
| like the dominance relation between ``MemoryAccess``\ es, and get the |
``MemoryAccess`` for any given ``Instruction``.
| |
| When ``MemorySSA`` is done building, it also hands you a ``MemorySSAWalker`` |
| that you can use (see below). |
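
As a sketch of what this looks like from a new-pass-manager function pass
(``MyPass`` is just a placeholder name for illustration), you'd typically grab
the analysis result and its walker like so:

.. code-block:: c++

  #include "llvm/Analysis/MemorySSA.h"
  #include "llvm/IR/PassManager.h"
  using namespace llvm;

  struct MyPass : PassInfoMixin<MyPass> {
    PreservedAnalyses run(Function &F, FunctionAnalysisManager &AM) {
      // Build (or fetch the cached) MemorySSA for this function.
      MemorySSA &MSSA = AM.getResult<MemorySSAAnalysis>(F).getMSSA();
      // The walker answers clobber queries (see the next section).
      MemorySSAWalker *Walker = MSSA.getWalker();
      (void)Walker;
      return PreservedAnalyses::all();
    }
  };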
| |
| |
| The walker |
| ---------- |
| |
| A structure that helps ``MemorySSA`` do its job is the ``MemorySSAWalker``, or |
| the walker, for short. The goal of the walker is to provide answers to clobber |
| queries beyond what's represented directly by ``MemoryAccess``\ es. For example, |
| given: |
| |
| .. code-block:: llvm |
| |
| define void @foo() { |
| %a = alloca i8 |
| %b = alloca i8 |
| |
| ; 1 = MemoryDef(liveOnEntry) |
| store i8 0, i8* %a |
| ; 2 = MemoryDef(1) |
| store i8 0, i8* %b |
| } |
| |
| The store to ``%a`` is clearly not a clobber for the store to ``%b``. It would |
| be the walker's goal to figure this out, and return ``liveOnEntry`` when queried |
| for the clobber of ``MemoryAccess`` ``2``. |
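
As a sketch, assuming ``MSSA`` is the analysis built for ``@foo`` above and
``SecondStore`` is the ``store i8 0, i8* %b`` instruction (how you obtained it
is left out here), asking the default walker that question might look like:

.. code-block:: c++

  #include "llvm/Analysis/MemorySSA.h"
  using namespace llvm;

  bool nothingClobbersBefore(MemorySSA &MSSA, Instruction *SecondStore) {
    MemoryAccess *Clobber =
        MSSA.getWalker()->getClobberingMemoryAccess(SecondStore);
    // With a reasonable alias analysis stack, the store to %a is skipped and
    // the query returns liveOnEntry.
    return MSSA.isLiveOnEntryDef(Clobber);
  }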
| |
| By default, ``MemorySSA`` provides a walker that can optimize ``MemoryDef``\ s |
| and ``MemoryUse``\ s by consulting whatever alias analysis stack you happen to |
| be using. Walkers were built to be flexible, though, so it's entirely reasonable |
| (and expected) to create more specialized walkers (e.g. one that specifically |
| queries ``GlobalsAA``, one that always stops at ``MemoryPhi`` nodes, etc). |
| |
| |
| Locating clobbers yourself |
| ^^^^^^^^^^^^^^^^^^^^^^^^^^ |
| |
| If you choose to make your own walker, you can find the clobber for a |
| ``MemoryAccess`` by walking every ``MemoryDef`` that dominates said |
| ``MemoryAccess``. The structure of ``MemoryDef``\ s makes this relatively simple; |
| they ultimately form a linked list of every clobber that dominates the |
| ``MemoryAccess`` that you're trying to optimize. In other words, the |
| ``definingAccess`` of a ``MemoryDef`` is always the nearest dominating |
| ``MemoryDef`` or ``MemoryPhi`` of said ``MemoryDef``. |
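
A bare-bones sketch of such a walk (it ignores alias analysis and stops at
``MemoryPhi``\ s, both of which a real walker would have to handle) might look
like:

.. code-block:: c++

  #include "llvm/Analysis/MemorySSA.h"
  using namespace llvm;

  void walkDominatingDefs(MemorySSA &MSSA, MemoryUseOrDef *Start) {
    MemoryAccess *Current = Start->getDefiningAccess();
    while (!MSSA.isLiveOnEntryDef(Current)) {
      auto *Def = dyn_cast<MemoryDef>(Current);
      if (!Def)
        break; // Hit a MemoryPhi; a smarter walker would look through it.
      // ... consult alias analysis here to decide whether Def actually
      // clobbers the location Start cares about ...
      Current = Def->getDefiningAccess();
    }
  }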
| |
| |
| Build-time use optimization |
| --------------------------- |
| |
| ``MemorySSA`` will optimize some ``MemoryAccess``\ es at build-time. |
| Specifically, we optimize the operand of every ``MemoryUse`` to point to the |
| actual clobber of said ``MemoryUse``. This can be seen in the above example; the |
| second ``MemoryUse`` in ``if.end`` has an operand of ``1``, which is a |
| ``MemoryDef`` from the entry block. This is done to make walking, |
| value numbering, etc, faster and easier. |
| |
It is not possible to optimize ``MemoryDef``\ s in the same way, as we restrict
``MemorySSA`` to one heap variable and, thus, one Phi node per block.
| |
| |
| Invalidation and updating |
| ------------------------- |
| |
Because ``MemorySSA`` keeps track of LLVM IR, it needs to be updated whenever
the IR is updated. "Update", in this case, includes the addition, deletion, and
motion of ``Instruction``\ s. The update API is being developed on an as-needed
basis. If you'd like examples, ``GVNHoist`` is a user of ``MemorySSA``'s update
API.
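
Most users go through the ``MemorySSAUpdater`` helper rather than poking
``MemorySSA`` directly. As a sketch (``DeadStore`` is a placeholder for
whatever instruction your pass is about to erase), keeping ``MemorySSA`` in
sync when deleting an instruction might look like:

.. code-block:: c++

  #include "llvm/Analysis/MemorySSA.h"
  #include "llvm/Analysis/MemorySSAUpdater.h"
  using namespace llvm;

  void eraseWithMemorySSA(MemorySSA &MSSA, Instruction *DeadStore) {
    MemorySSAUpdater Updater(&MSSA);
    // Remove the MemoryAccess first, so MemorySSA never points at a freed
    // instruction.
    if (MemoryAccess *MA = MSSA.getMemoryAccess(DeadStore))
      Updater.removeMemoryAccess(MA);
    DeadStore->eraseFromParent();
  }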
| |
| |
| Phi placement |
| ^^^^^^^^^^^^^ |
| |
| ``MemorySSA`` only places ``MemoryPhi``\ s where they're actually |
| needed. That is, it is a pruned SSA form, like LLVM's SSA form. For |
| example, consider: |
| |
| .. code-block:: llvm |
| |
| define void @foo() { |
| entry: |
| %p1 = alloca i8 |
| %p2 = alloca i8 |
| %p3 = alloca i8 |
| ; 1 = MemoryDef(liveOnEntry) |
| store i8 0, i8* %p3 |
| br label %while.cond |
| |
| while.cond: |
    ; 3 = MemoryPhi({entry,1},{if.end,2})
| br i1 undef, label %if.then, label %if.else |
| |
| if.then: |
| br label %if.end |
| |
| if.else: |
| br label %if.end |
| |
| if.end: |
| ; MemoryUse(1) |
| %1 = load i8, i8* %p1 |
| ; 2 = MemoryDef(3) |
| store i8 2, i8* %p2 |
| ; MemoryUse(1) |
| %2 = load i8, i8* %p3 |
| br label %while.cond |
| } |
| |
Because we removed the stores from ``if.then`` and ``if.else``, a ``MemoryPhi``
for ``if.end`` would be pointless, so we don't place one. This means that if
you place a ``MemoryDef`` in ``if.then`` or ``if.else``, you'll also need to
create a ``MemoryPhi`` for ``if.end``.
| |
If it turns out that this is a large burden, we can just place ``MemoryPhi``\ s
everywhere. Because we have walkers that are capable of optimizing past said
phis, doing so shouldn't prohibit optimizations.
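
If you're doing such an update through ``MemorySSAUpdater``, its ``insertDef``
interface is intended to handle this bookkeeping for you, including inserting
any ``MemoryPhi``\ s that become necessary. A rough sketch (``NewStore`` and
``InsertBB`` are placeholders for whatever your transform just created):

.. code-block:: c++

  #include "llvm/Analysis/MemorySSA.h"
  #include "llvm/Analysis/MemorySSAUpdater.h"
  using namespace llvm;

  void registerNewStore(MemorySSA &MSSA, Instruction *NewStore,
                        BasicBlock *InsertBB) {
    MemorySSAUpdater Updater(&MSSA);
    // Create a MemoryAccess for the new store...
    MemoryAccess *NewMA = Updater.createMemoryAccessInBB(
        NewStore, /*Definition=*/nullptr, InsertBB,
        MemorySSA::BeforeTerminator);
    // ...and hook it into the def chain; with RenameUses set, downstream uses
    // (and any newly required phis) are fixed up as well.
    Updater.insertDef(cast<MemoryDef>(NewMA), /*RenameUses=*/true);
  }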
| |
| |
| Non-Goals |
| --------- |
| |
| ``MemorySSA`` is meant to reason about the relation between memory |
| operations, and enable quicker querying. |
| It isn't meant to be the single source of truth for all potential memory-related |
| optimizations. Specifically, care must be taken when trying to use ``MemorySSA`` |
| to reason about atomic or volatile operations, as in: |
| |
| .. code-block:: llvm |
| |
| define i8 @foo(i8* %a) { |
| entry: |
| br i1 undef, label %if.then, label %if.end |
| |
| if.then: |
| ; 1 = MemoryDef(liveOnEntry) |
| %0 = load volatile i8, i8* %a |
| br label %if.end |
| |
| if.end: |
| %av = phi i8 [0, %entry], [%0, %if.then] |
| ret i8 %av |
| } |
| |
| Going solely by ``MemorySSA``'s analysis, hoisting the ``load`` to ``entry`` may |
| seem legal. Because it's a volatile load, though, it's not. |
| |
| |
| Design tradeoffs |
| ---------------- |
| |
| Precision |
| ^^^^^^^^^ |
| |
| ``MemorySSA`` in LLVM deliberately trades off precision for speed. |
Let us think about memory variables as if they were disjoint partitions of the
heap (that is, if you have one variable, as above, it represents the entire
heap, and if you have multiple variables, each one represents some disjoint
portion of the heap).
| |
First, because alias analysis results conflict with each other, and each
result may be what some optimization wants (i.e. TBAA may say no-alias, while
something else says must-alias), it is not possible to partition the heap in a
way that satisfies every optimization. Second, some alias analysis results are
not transitive (i.e. A noalias B and B noalias C does not imply A noalias C),
so it is not possible to come up with a precise partitioning in all cases
without variables to represent every pair of possible aliases. Thus,
partitioning precisely may require introducing at least N^2 new virtual
variables, phi nodes, etc.
| |
| Each of these variables may be clobbered at multiple def sites. |
| |
To give an example, if you were to split up struct fields into individual
variables, all aliasing operations that may-def multiple struct fields will
may-def more than one of them. This is pretty common (calls, copies, field
stores, etc.).
| |
| Experience with SSA forms for memory in other compilers has shown that |
| it is simply not possible to do this precisely, and in fact, doing it |
| precisely is not worth it, because now all the optimizations have to |
| walk tons and tons of virtual variables and phi nodes. |
| |
So we partition. And once you decide to partition, experience has again shown
that there is no point in partitioning to more than one variable: it simply
generates more IR, and optimizations still have to query something to
disambiguate further anyway.
| |
| As a result, LLVM partitions to one variable. |
| |
| Use Optimization |
| ^^^^^^^^^^^^^^^^ |
| |
Unlike other partitioned forms, LLVM's ``MemorySSA`` does make one useful
guarantee: all loads are optimized to point at the thing that actually
clobbers them. This gives some nice properties. For example,
| for a given store, you can find all loads actually clobbered by that |
| store by walking the immediate uses of the store. |
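
As a sketch (assuming ``MSSA`` covers the function containing ``TheStore``),
collecting those loads might look like:

.. code-block:: c++

  #include "llvm/ADT/SmallVector.h"
  #include "llvm/Analysis/MemorySSA.h"
  #include "llvm/IR/Instructions.h"
  using namespace llvm;

  SmallVector<Instruction *, 8> loadsClobberedBy(MemorySSA &MSSA,
                                                 StoreInst *TheStore) {
    SmallVector<Instruction *, 8> Loads;
    // A store always has a MemoryDef.
    MemoryDef *Def = cast<MemoryDef>(MSSA.getMemoryAccess(TheStore));
    // Because MemoryUses are optimized, every MemoryUse whose operand is this
    // MemoryDef is actually clobbered by the store.
    for (User *U : Def->users())
      if (auto *Use = dyn_cast<MemoryUse>(U))
        Loads.push_back(Use->getMemoryInst());
    return Loads;
  }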