| ========================================== |
| Design and Usage of the InAlloca Attribute |
| ========================================== |
| |
| Introduction |
| ============ |
| |
| The :ref:`inalloca <attr_inalloca>` attribute is designed to allow |
| taking the address of an aggregate argument that is being passed by |
| value through memory. Primarily, this feature is required for |
| compatibility with the Microsoft C++ ABI. Under that ABI, class |
| instances that are passed by value are constructed directly into |
| argument stack memory. Prior to the addition of inalloca, calls in LLVM |
| were indivisible instructions. There was no way to perform intermediate |
| work, such as object construction, between the first stack adjustment |
| and the final control transfer. With inalloca, all arguments passed in |
| memory are modelled as a single alloca, which can be stored to prior to |
| the call. Unfortunately, this complicated feature comes with a large |
| set of restrictions designed to bound the lifetime of the argument |
| memory around the call. |
| |
| For now, it is recommended that frontends and optimizers avoid producing |
| this construct, primarily because it forces the use of a base pointer. |
| This feature may grow in the future to allow general mid-level |
| optimization, but until then it should be regarded as less efficient than |
| passing by value with a copy. |
| |
| Intended Usage |
| ============== |
| |
| The example below is the intended LLVM IR lowering for some C++ code |
| that passes two default-constructed ``Foo`` objects to ``g`` in the |
| 32-bit Microsoft C++ ABI. |
| |
| .. code-block:: c++ |
| |
| // Foo is non-trivial. |
| struct Foo { int a, b; Foo(); ~Foo(); Foo(const Foo &); }; |
| void g(Foo a, Foo b); |
| void f() { |
| g(Foo(), Foo()); |
| } |
| |
| .. code-block:: text |
| |
| %struct.Foo = type { i32, i32 } |
| declare void @Foo_ctor(%struct.Foo* %this) |
| declare void @Foo_dtor(%struct.Foo* %this) |
| declare void @g(<{ %struct.Foo, %struct.Foo }>* inalloca %memargs) |
| |
| define void @f() { |
| entry: |
| %base = call i8* @llvm.stacksave() |
| %memargs = alloca <{ %struct.Foo, %struct.Foo }> |
| %b = getelementptr <{ %struct.Foo, %struct.Foo }>* %memargs, i32 0, i32 1 |
| call void @Foo_ctor(%struct.Foo* %b) |
| |
| ; If a's ctor throws, we must destruct b. |
| %a = getelementptr <{ %struct.Foo, %struct.Foo }>* %memargs, i32 0, i32 0 |
| invoke void @Foo_ctor(%struct.Foo* %a) |
| to label %invoke.cont unwind label %invoke.unwind |
| |
| invoke.cont: |
| call void @g(<{ %struct.Foo, %struct.Foo }>* inalloca %memargs) |
| call void @llvm.stackrestore(i8* %base) |
| ... |
| |
| invoke.unwind: |
| call void @Foo_dtor(%struct.Foo* %b) |
| call void @llvm.stackrestore(i8* %base) |
| ... |
| } |
| |
| To avoid stack leaks, the frontend saves the current stack pointer with |
| a call to :ref:`llvm.stacksave <int_stacksave>`. Then, it allocates the |
| argument stack space with alloca and constructs ``b`` in place. The |
| constructor call for ``a`` could throw an exception while ``b`` is |
| already live, so the frontend has to create a landing pad for it. The |
| landing pad has to destroy the already constructed argument ``b`` before |
| restoring the stack pointer. If the constructor does not unwind, ``g`` |
| is called. In the Microsoft C++ ABI, ``g`` will destroy its arguments, |
| and then the stack is restored in ``f``. |
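
The ordering described above can be mimicked in plain C++ with placement new. This is only an illustrative sketch, not what a frontend emits: ``MemArgs``, ``events``, and ``fail_next_ctor`` are hypothetical names introduced here, and the stack save and restore is modelled implicitly by ``memargs`` going out of scope.

```cpp
#include <cassert>
#include <new>
#include <stdexcept>
#include <string>

// Hypothetical event log; records the order of constructor/destructor calls.
static std::string events;
// Hypothetical switch that makes the next Foo constructor throw.
static bool fail_next_ctor = false;

// Stand-in for the document's non-trivial Foo.
struct Foo {
  int a = 0, b = 0;
  Foo() {
    if (fail_next_ctor) throw std::runtime_error("Foo() failed");
    events += "ctor;";
  }
  ~Foo() { events += "dtor;"; }
};

// Raw storage for both arguments, playing the role of %memargs.
struct MemArgs {
  alignas(Foo) unsigned char a[sizeof(Foo)];
  alignas(Foo) unsigned char b[sizeof(Foo)];
};

// Models the callee-destroys rule of the Microsoft C++ ABI.
void g(Foo *a, Foo *b) {
  events += "g;";
  a->~Foo();
  b->~Foo();
}

// Returns true if g was called, false if a's constructor threw.
bool f(bool a_ctor_throws) {
  MemArgs memargs;                 // the single argument allocation
  Foo *b = new (memargs.b) Foo();  // b is constructed first
  Foo *a = nullptr;
  fail_next_ctor = a_ctor_throws;
  try {
    a = new (memargs.a) Foo();     // corresponds to the invoke
  } catch (const std::runtime_error &) {
    fail_next_ctor = false;
    b->~Foo();                     // landing pad: destroy b, then leave
    return false;
  }
  g(a, b);                         // the callee destroys both arguments
  return true;
}
```

Calling ``f(false)`` records two constructions, the call to ``g``, and two destructions inside ``g``. Calling ``f(true)`` records only the construction and destruction of ``b``, matching the ``invoke.unwind`` path.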
| |
| Design Considerations |
| ===================== |
| |
| Lifetime |
| -------- |
| |
| The biggest design consideration for this feature is object lifetime. |
| We cannot model the arguments as static allocas in the entry block, |
| because all calls need to use the memory at the top of the stack to pass |
| arguments. We cannot vend pointers to that memory at function entry |
| because after code generation they will alias. |
| |
| The rule against allocas between argument allocations and the call site |
| avoids this problem, but it creates a cleanup problem. Cleanup and |
| lifetime are handled explicitly with stack save and restore calls. In |
| the future, we may want to introduce a new construct such as ``freea`` |
| or ``afree`` to make it clear that this stack-adjusting cleanup is less |
| powerful than a full stack save and restore. |
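
As a rough mental model of the save and restore discipline, the stack can be pictured as a single offset that allocations move downward and that a restore resets. The ``ToyStack`` class below is a hypothetical teaching aid with an assumed 4096-byte stack. It is not how LLVM implements ``llvm.stacksave`` or ``llvm.stackrestore``.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical toy model of a downward-growing stack.
class ToyStack {
 public:
  using Mark = std::size_t;

  // Models llvm.stacksave: remember the current stack level.
  Mark save() const { return top_; }

  // Models a dynamic alloca: new memory always lands at the top of the stack.
  std::size_t alloca_bytes(std::size_t n) {
    assert(n <= top_ && "toy stack overflow");
    top_ -= n;
    return top_;
  }

  // Models llvm.stackrestore: release everything allocated since the save.
  void restore(Mark m) { top_ = m; }

  std::size_t top() const { return top_; }

 private:
  std::size_t top_ = 4096;  // assumed initial stack level
};
```

In this model, an argument block is at the top of the stack only if nothing else is allocated after it and before the call, and a restore to an earlier mark releases every allocation made since that mark in one step.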
| |
| Nested Calls and Copy Elision |
| ----------------------------- |
| |
| We also want to be able to support copy elision into these argument |
| slots. This means we have to support multiple live argument |
| allocations. |
| |
| Consider the evaluation of: |
| |
| .. code-block:: c++ |
| |
| // Foo is non-trivial. |
| struct Foo { int a; Foo(); Foo(const Foo &); ~Foo(); }; |
| Foo bar(Foo b); |
| int main() { |
| bar(bar(Foo())); |
| } |
| |
| In this case, we want to be able to elide copies into ``bar``'s argument |
| slots. That means we need to have more than one set of argument frames |
| active at the same time. First, we need to allocate the frame for the |
| outer call to ``bar`` so we can pass its address as the hidden struct |
| return pointer to the inner call. Then we do the same for the inner |
| call, allocating a frame and passing its address to ``Foo``'s default |
| constructor. By wrapping the evaluation of the inner ``bar`` with stack |
| save and restore, we can have multiple overlapping active call frames. |
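
The effect of eliding copies into the argument slots can be observed from C++ by counting constructor calls. The sketch below assumes C++17 (or a compiler that performs the same elision), and the body of ``bar`` and the counters are hypothetical, since the example above only declares ``bar``.

```cpp
#include <cassert>

// Hypothetical counters for observing which constructors run.
static int default_ctors = 0;
static int copy_ctors = 0;

// Non-trivial Foo, shaped like the one in the example above.
struct Foo {
  int a = 0;
  Foo() { ++default_ctors; }
  Foo(const Foo &other) : a(other.a) { ++copy_ctors; }
  ~Foo() {}
};

// Assumed body: bump the field and return the parameter.
// Returning a by-value parameter cannot be elided, so this is one copy.
Foo bar(Foo b) {
  b.a += 1;
  return b;
}

// Evaluates the nested call from the example and returns the final field.
int run() { return bar(bar(Foo())).a; }
```

Under those assumptions, ``run()`` performs one default construction and two copy constructions. Both copies come from ``return b``; neither ``Foo()`` nor the result of the inner ``bar`` is copied into a parameter, because each is constructed directly in the argument slot of the call that consumes it.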
| |
| Callee-cleanup Calling Conventions |
| ---------------------------------- |
| |
| Another wrinkle is the existence of callee-cleanup conventions. On |
| Windows, all methods and many other functions adjust the stack to clear |
| the memory used to pass their arguments. In some sense, this means that |
| the allocas are automatically cleared by the call. However, LLVM |
| models this as a write of undef to all of the inalloca values passed to |
| the call rather than as a stack adjustment. Frontends should still |
| restore the stack pointer to avoid a stack leak. |
| |
| Exceptions |
| ---------- |
| |
| There is also the possibility of an exception. If argument evaluation |
| or copy construction throws an exception, the landing pad must do |
| cleanup, which includes adjusting the stack pointer to avoid a stack |
| leak. This means the cleanup of the stack memory cannot be tied to the |
| call itself. There needs to be a separate IR-level instruction that can |
| perform independent cleanup of arguments. |
| |
| Efficiency |
| ---------- |
| |
| Eventually, it should be possible to generate efficient code for this |
| construct. In particular, using inalloca should not require a base |
| pointer. If the backend can prove that all points in the CFG only have |
| one possible stack level, then it can address the stack directly from |
| the stack pointer. While this is not yet implemented, the plan is that |
| the inalloca attribute should not change much, but the frontend IR |
| generation recommendations may change. |