| Target-specific lowering in ICE |
| =============================== |
| |
| This document discusses several issues around generating target-specific ICE |
| instructions from high-level ICE instructions. |
| |
| Meeting register address mode constraints |
| ----------------------------------------- |
| |
| Target-specific instructions often require specific operands to be in physical |
| registers. Sometimes one specific register is required, but usually any |
| register in a particular register class will suffice, and that register class is |
| defined by the instruction/operand type. |
| |
| The challenge is that ``Variable`` represents an operand that is either a stack |
| location in the current frame, or a physical register. Register allocation |
| happens after target-specific lowering, so during lowering we generally don't |
| know whether a ``Variable`` operand will meet a target instruction's physical |
| register requirement. |
| |
| To this end, ICE allows certain directives: |
| |
| * ``Variable::setWeightInfinite()`` forces a ``Variable`` to get some |
| physical register (without specifying which particular one) from a |
| register class. |
| |
| * ``Variable::setRegNum()`` forces a ``Variable`` to be assigned a specific |
| physical register. |
| |
| These directives are described below in more detail. In most cases, though, |
| they don't need to be explicity used, as the routines that create lowered |
| instructions have reasonable defaults and simple options that control these |
| directives. |
| |
| The recommended ICE lowering strategy is to generate extra assignment |
| instructions involving extra ``Variable`` temporaries, using the directives to |
| force suitable register assignments for the temporaries, and then let the |
| register allocator clean things up. |
| |
| Note: There is a spectrum of *implementation complexity* versus *translation |
| speed* versus *code quality*. This recommended strategy picks a point on the |
| spectrum representing very low complexity ("splat-isel"), pretty good code |
| quality in terms of frame size and register shuffling/spilling, but perhaps not |
| the fastest translation speed since extra instructions and operands are created |
| up front and cleaned up at the end. |
| |
| Ensuring a non-specific physical register |
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ |
| |
| The x86 instruction:: |
| |
| mov dst, src |
| |
| needs at least one of its operands in a physical register (ignoring the case |
| where ``src`` is a constant). This can be done as follows:: |
| |
| mov reg, src |
| mov dst, reg |
| |
| so long as ``reg`` is guaranteed to have a physical register assignment. The |
| low-level lowering code that accomplishes this looks something like:: |
| |
| Variable *Reg; |
| Reg = Func->makeVariable(Dst->getType()); |
| Reg->setWeightInfinite(); |
| NewInst = InstX8632Mov::create(Func, Reg, Src); |
| NewInst = InstX8632Mov::create(Func, Dst, Reg); |
| |
| ``Cfg::makeVariable()`` generates a new temporary, and |
| ``Variable::setWeightInfinite()`` gives it infinite weight for the purpose of |
| register allocation, thus guaranteeing it a physical register (though leaving |
| the particular physical register to be determined by the register allocator). |
| |
| The ``_mov(Dest, Src)`` method in the ``TargetX8632`` class is sufficiently |
| powerful to handle these details in most situations. Its ``Dest`` argument is |
| an in/out parameter. If its input value is ``nullptr``, then a new temporary |
| variable is created, its type is set to the same type as the ``Src`` operand, it |
| is given infinite register weight, and the new ``Variable`` is returned through |
| the in/out parameter. (This is in addition to the new temporary being the dest |
| operand of the ``mov`` instruction.) The simpler version of the above example |
| is:: |
| |
| Variable *Reg = nullptr; |
| _mov(Reg, Src); |
| _mov(Dst, Reg); |
| |
| Preferring another ``Variable``'s physical register |
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ |
| |
| (An older version of ICE allowed the lowering code to provide a register |
| allocation hint: if a physical register is to be assigned to one ``Variable``, |
| then prefer a particular ``Variable``'s physical register if available. This |
| hint would be used to try to reduce the amount of register shuffling. |
| Currently, the register allocator does this automatically through the |
| ``FindPreference`` logic.) |
| |
| Ensuring a specific physical register |
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ |
| |
| Some instructions require operands in specific physical registers, or produce |
| results in specific physical registers. For example, the 32-bit ``ret`` |
| instruction needs its operand in ``eax``. This can be done with |
| ``Variable::setRegNum()``:: |
| |
| Variable *Reg; |
| Reg = Func->makeVariable(Src->getType()); |
| Reg->setWeightInfinite(); |
| Reg->setRegNum(Reg_eax); |
| NewInst = InstX8632Mov::create(Func, Reg, Src); |
| NewInst = InstX8632Ret::create(Func, Reg); |
| |
| Precoloring with ``Variable::setRegNum()`` effectively gives it infinite weight |
| for register allocation, so the call to ``Variable::setWeightInfinite()`` is |
| technically unnecessary, but perhaps documents the intention a bit more |
| strongly. |
| |
| The ``_mov(Dest, Src, RegNum)`` method in the ``TargetX8632`` class has an |
| optional ``RegNum`` argument to force a specific register assignment when the |
| input ``Dest`` is ``nullptr``. As described above, passing in ``Dest=nullptr`` |
| causes a new temporary variable to be created with infinite register weight, and |
| in addition the specific register is chosen. The simpler version of the above |
| example is:: |
| |
| Variable *Reg = nullptr; |
| _mov(Reg, Src, Reg_eax); |
| _ret(Reg); |
| |
| Disabling live-range interference |
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ |
| |
| (An older version of ICE allowed an overly strong preference for another |
| ``Variable``'s physical register even if their live ranges interfered. This was |
| risky, and currently the register allocator derives this automatically through |
| the ``AllowOverlap`` logic.) |
| |
| Call instructions kill scratch registers |
| ---------------------------------------- |
| |
| A ``call`` instruction kills the values in all scratch registers, so it's |
| important that the register allocator doesn't allocate a scratch register to a |
| ``Variable`` whose live range spans the ``call`` instruction. ICE provides the |
| ``InstFakeKill`` pseudo-instruction to compactly mark such register kills. For |
| each scratch register, a fake trivial live range is created that begins and ends |
| in that instruction. The ``InstFakeKill`` instruction is inserted after the |
| ``call`` instruction. For example:: |
| |
| CallInst = InstX8632Call::create(Func, ... ); |
| NewInst = InstFakeKill::create(Func, CallInst); |
| |
| The last argument to the ``InstFakeKill`` constructor links it to the previous |
| call instruction, such that if its linked instruction is dead-code eliminated, |
| the ``InstFakeKill`` instruction is eliminated as well. The linked ``call`` |
| instruction could be to a target known to be free of side effects, and therefore |
| safe to remove if its result is unused. |
| |
| Instructions producing multiple values |
| -------------------------------------- |
| |
| ICE instructions allow at most one destination ``Variable``. Some machine |
| instructions produce more than one usable result. For example, the x86-32 |
| ``call`` ABI returns a 64-bit integer result in the ``edx:eax`` register pair. |
| Also, x86-32 has a version of the ``imul`` instruction that produces a 64-bit |
| result in the ``edx:eax`` register pair. The x86-32 ``idiv`` instruction |
| produces the quotient in ``eax`` and the remainder in ``edx``, though generally |
| only one or the other is needed in the lowering. |
| |
| To support multi-dest instructions, ICE provides the ``InstFakeDef`` |
| pseudo-instruction, whose destination can be precolored to the appropriate |
| physical register. For example, a ``call`` returning a 64-bit result in |
| ``edx:eax``:: |
| |
| CallInst = InstX8632Call::create(Func, RegLow, ... ); |
| NewInst = InstFakeKill::create(Func, CallInst); |
| Variable *RegHigh = Func->makeVariable(IceType_i32); |
| RegHigh->setRegNum(Reg_edx); |
| NewInst = InstFakeDef::create(Func, RegHigh); |
| |
| ``RegHigh`` is then assigned into the desired ``Variable``. If that assignment |
| ends up being dead-code eliminated, the ``InstFakeDef`` instruction may be |
| eliminated as well. |
| |
| Managing dead-code elimination |
| ------------------------------ |
| |
| ICE instructions with a non-nullptr ``Dest`` are subject to dead-code |
| elimination. However, some instructions must not be eliminated in order to |
| preserve side effects. This applies to most function calls, volatile loads, and |
| loads and integer divisions where the underlying language and runtime are |
| relying on hardware exception handling. |
| |
| ICE facilitates this with the ``InstFakeUse`` pseudo-instruction. This forces a |
| use of its source ``Variable`` to keep that variable's definition alive. Since |
| the ``InstFakeUse`` instruction has no ``Dest``, it will not be eliminated. |
| |
| Here is the full example of the x86-32 ``call`` returning a 32-bit integer |
| result:: |
| |
| Variable *Reg = Func->makeVariable(IceType_i32); |
| Reg->setRegNum(Reg_eax); |
| CallInst = InstX8632Call::create(Func, Reg, ... ); |
| NewInst = InstFakeKill::create(Func, CallInst); |
| NewInst = InstFakeUse::create(Func, Reg); |
| NewInst = InstX8632Mov::create(Func, Result, Reg); |
| |
| Without the ``InstFakeUse``, the entire call sequence could be dead-code |
| eliminated if its result were unused. |
| |
| One more note on this topic. These tools can be used to allow a multi-dest |
| instruction to be dead-code eliminated only when none of its results is live. |
| The key is to use the optional source parameter of the ``InstFakeDef`` |
| instruction. Using pseudocode:: |
| |
| t1:eax = call foo(arg1, ...) |
| InstFakeKill // eax, ecx, edx |
| t2:edx = InstFakeDef(t1) |
| v_result_low = t1 |
| v_result_high = t2 |
| |
| If ``v_result_high`` is live but ``v_result_low`` is dead, adding ``t1`` as an |
| argument to ``InstFakeDef`` suffices to keep the ``call`` instruction live. |
| |
| Instructions modifying source operands |
| -------------------------------------- |
| |
| Some native instructions may modify one or more source operands. For example, |
| the x86 ``xadd`` and ``xchg`` instructions modify both source operands. Some |
| analysis needs to identify every place a ``Variable`` is modified, and it uses |
| the presence of a ``Dest`` variable for this analysis. Since ICE instructions |
| have at most one ``Dest``, the ``xadd`` and ``xchg`` instructions need special |
| treatment. |
| |
| A ``Variable`` that is not the ``Dest`` can be marked as modified by adding an |
| ``InstFakeDef``. However, this is not sufficient, as the ``Variable`` may have |
| no more live uses, which could result in the ``InstFakeDef`` being dead-code |
| eliminated. The solution is to add an ``InstFakeUse`` as well. |
| |
| To summarize, for every source ``Variable`` that is not equal to the |
| instruction's ``Dest``, append an ``InstFakeDef`` and ``InstFakeUse`` |
| instruction to provide the necessary analysis information. |