|  | # Reactor Debug Info Generation | 
|  |  | 
|  | ## Introduction | 
|  |  | 
|  | Reactor produces just-in-time (JIT) compiled executable code. It can be used to JIT high-performance functions specialized for runtime | 
|  | configurations, or even to build a compiler. | 
|  |  | 
|  | In order to debug executable code at a higher level than disassembly, source code files are required. | 
|  |  | 
|  | Reactor has two potential sources of source code: | 
|  |  | 
|  | 1. The C++ source code of the program that calls into Reactor. | 
|  | 2. External source files read by the program and passed to Reactor. | 
|  |  | 
|  | While case (2) is preferable for implementing a compiler, it is not currently | 
|  | implemented. | 
|  |  | 
|  | Reactor implements case (1), which GDB can use to single-step by source line and | 
|  | inspect variables. | 
|  |  | 
|  | ## Supported Platforms | 
|  |  | 
|  | Currently: | 
|  |  | 
|  | * Debug info generation is only supported on Linux with the LLVM 7 | 
|  | backend. | 
|  | * GDB is the only supported debugger. | 
|  | * The program must itself be compiled with debug info. | 
|  |  | 
|  | ## Enabling | 
|  |  | 
|  | Debug info generation is enabled with the `REACTOR_EMIT_DEBUG_INFO` CMake flag | 
|  | (disabled by default). | 
|  |  | 
|  | ## Implementation details | 
|  |  | 
|  | ### Source Location | 
|  |  | 
|  | All Reactor functions begin with a call to `RR_DEBUG_INFO_UPDATE_LOC()`, which calls into `rr::DebugInfo::EmitLocation()`. | 
|  |  | 
|  | `rr::DebugInfo::EmitLocation()` calls `rr::DebugInfo::getCallerBacktrace()`, | 
|  | which in turn uses [`libbacktrace`](https://github.com/ianlancetaylor/libbacktrace) | 
|  | to unwind the stack and find the file, function and line of the caller. | 
|  |  | 
|  | This information is passed to `llvm::IRBuilder<>::SetCurrentDebugLocation` | 
|  | to emit source line information for the next LLVM instructions to be built. | 
|  |  | 
|  | ### Variables | 
|  |  | 
|  | There are three aspects to generating variable debug information: | 
|  |  | 
|  | #### 1. Variable names | 
|  |  | 
|  | Consider constructing a Reactor `LValue`: | 
|  |  | 
|  | ```C++ | 
|  | rr::Int a = 1; | 
|  | ``` | 
|  |  | 
|  | This emits an LLVM `alloca` instruction to allocate the variable's storage, | 
|  | and another instruction to initialize it to the constant `1`. While the syntax is | 
|  | fluent, none of the Reactor calls see the name of the C++ local variable "`a`", | 
|  | and the LLVM `alloca` value gets a meaningless numerical name. | 
|  |  | 
|  | There are two potential ways that Reactor can obtain the variable name: | 
|  |  | 
|  | 1. Use the running executable's own debug information to examine the local | 
|  | declaration and extract the local variable's name. | 
|  | 2. Use the backtrace information to parse the name from the source file. | 
|  |  | 
|  | While (1) is arguably a cleaner and more robust solution, (2) is | 
|  | easier to implement and works for the majority of use cases. | 
|  |  | 
|  | Reactor currently implements (2). | 
|  |  | 
|  | `rr::DebugInfo::getOrParseFileTokens()` scans a source file line by line, and | 
|  | uses a regular expression to look for patterns of `<type> <name>`. Matching is not | 
|  | precise, but is adequate to find locals constructed with and without assignment. | 
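
A minimal sketch of this kind of line-by-line scan is shown below. It is not Reactor's implementation: the function name `parseFileTokens` and the regular expression are assumptions chosen for illustration. Like the matcher described above, it is deliberately imprecise (for example, `return x;` would also match).

```cpp
#include <map>
#include <regex>
#include <sstream>
#include <string>

// Hypothetical helper (not Reactor's actual code): maps 1-based line numbers
// to the first identifier that looks like a declared variable on that line.
std::map<int, std::string> parseFileTokens(const std::string &source)
{
    // Assumed pattern: an optionally namespace-qualified type name,
    // whitespace, an identifier, then '=' or ';'.
    static const std::regex decl(
        R"(([A-Za-z_][A-Za-z0-9_]*(?:::[A-Za-z_][A-Za-z0-9_]*)*)\s+([A-Za-z_][A-Za-z0-9_]*)\s*[=;])");

    std::map<int, std::string> tokens;
    std::istringstream stream(source);
    std::string line;
    int lineNumber = 0;
    while(std::getline(stream, line))
    {
        lineNumber++;
        std::smatch match;
        if(std::regex_search(line, match, decl))
        {
            tokens[lineNumber] = match[2].str();  // capture group 2 is the name
        }
    }
    return tokens;
}
```

Given a file whose third line is `rr::Int a = 1;`, the returned map would associate line 3 with the name `a`, which is the lookup a backtrace-derived line number needs.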
|  |  | 
|  | #### 2. Variable binding | 
|  |  | 
|  | Given that we can find a variable name for a given source line, we need a way of | 
|  | binding the LLVM values to the name. | 
|  |  | 
|  | Given our trivial example: | 
|  |  | 
|  | ```C++ | 
|  | rr::Int a = 1; | 
|  | ``` | 
|  |  | 
|  | The `rr::Int` constructor calls `RR_DEBUG_INFO_EMIT_VAR()`, passing the storage | 
|  | value as a single argument. `RR_DEBUG_INFO_EMIT_VAR()` performs the backtrace | 
|  | to find the source file and line, and uses the token information produced by | 
|  | `rr::DebugInfo::getOrParseFileTokens()` to identify the variable name. | 
|  |  | 
|  | However, things get a bit more complicated when there are multiple variables | 
|  | being constructed on the same line. | 
|  |  | 
|  | Take for example: | 
|  |  | 
|  | ```C++ | 
|  | rr::Int a = rr::Int(1) + rr::Int(2); | 
|  | ``` | 
|  |  | 
|  | Here we have 3 calls to the `rr::Int` constructor, each calling down | 
|  | to `RR_DEBUG_INFO_EMIT_VAR()`. | 
|  |  | 
|  | To disambiguate which of these should be bound to the variable name "`a`", | 
|  | `rr::DebugInfo::EmitVariable()` buffers the binding into | 
|  | `scope.pending` and the last binding for a given line is used by | 
|  | `DebugInfo::emitPending()`. For variable construction and assignment, C++ | 
|  | guarantees that the LHS is the last value to be constructed. | 
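
The "last binding for a line wins" rule can be modelled in a few lines. This is a simplified sketch, not Reactor's code: the `Binding` and `PendingBindings` types are hypothetical, and an `int` stands in for the LLVM storage value.

```cpp
#include <map>
#include <string>
#include <vector>

// Illustrative stand-in for a name-to-value binding (assumed layout).
struct Binding
{
    std::string name;  // variable name parsed from the source line
    int valueId;       // stand-in for the LLVM storage value
};

class PendingBindings
{
public:
    // Called once per constructor call. A later call for the same line
    // overwrites the earlier one, so the last constructed value wins.
    void emitVariable(int line, const std::string &name, int valueId)
    {
        pending[line] = Binding{ name, valueId };
    }

    // Flushes the buffered bindings, at most one per source line.
    std::vector<Binding> emitPending()
    {
        std::vector<Binding> out;
        for(const auto &entry : pending)
        {
            out.push_back(entry.second);
        }
        pending.clear();
        return out;
    }

private:
    std::map<int, Binding> pending;  // keyed by source line
};
```

For the three-constructor line above, three `emitVariable()` calls arrive with the same line number, and only the third (the storage for "`a`") survives to `emitPending()`.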
|  |  | 
|  | This solution is not perfect. | 
|  |  | 
|  | Multi-line expressions, multiple assignments on a single line, and macro obfuscation | 
|  | can all break variable bindings. However, the majority of typical cases work. | 
|  |  | 
|  | #### 3. Variable scope | 
|  |  | 
|  | `rr::DebugInfo` maintains a stack of `llvm::DIScope`s and `llvm::DILocation`s | 
|  | that mirrors the current backtrace for the function being called. | 
|  |  | 
|  | A synthetic call stack is produced by chaining `llvm::DILocation`s with | 
|  | `InlinedAt`s. | 
|  |  | 
|  | For example, at the declaration of `i`: | 
|  |  | 
|  | ```C++ | 
|  | void B() | 
|  | { | 
|  |     rr::Int i; // <- here | 
|  | } | 
|  |  | 
|  | void A() | 
|  | { | 
|  |     B(); | 
|  | } | 
|  |  | 
|  | int main(int argc, const char* argv[]) | 
|  | { | 
|  |     A(); | 
|  | } | 
|  | ``` | 
|  |  | 
|  | The `DIScope` hierarchy would be: | 
|  |  | 
|  | ```C++ | 
|  |                               DIFile: "foo.cpp" | 
|  | rr::DebugInfo::diScope[0].di: ↳ DISubprogram: "main" | 
|  | rr::DebugInfo::diScope[1].di: ↳ DISubprogram: "A" | 
|  | rr::DebugInfo::diScope[2].di: ↳ DISubprogram: "B" | 
|  | ``` | 
|  |  | 
|  | The `DILocation` hierarchy would be: | 
|  |  | 
|  | ```C++ | 
|  | rr::DebugInfo::diRootLocation:      DILocation(DISubprogram: "ReactorFunction") | 
|  | rr::DebugInfo::diScope[0].location: ↳ DILocation(DISubprogram: "main") | 
|  | rr::DebugInfo::diScope[1].location:   ↳ DILocation(DISubprogram: "A") | 
|  | rr::DebugInfo::diScope[2].location:     ↳ DILocation(DISubprogram: "B") | 
|  | ``` | 
|  |  | 
|  | Where '↳' represents an `InlinedAt`. | 
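
To illustrate how such a chain yields a call stack, here is a small self-contained model. It does not use LLVM: the `Location` struct and both functions are hypothetical stand-ins, with a plain pointer playing the role of `InlinedAt`.

```cpp
#include <memory>
#include <string>
#include <vector>

// Stand-in for a debug location: a scope name plus an optional link to the
// location it was "inlined at" (its caller).
struct Location
{
    std::string scope;
    std::shared_ptr<Location> inlinedAt;
};

// Builds the chain from outermost to innermost frame and returns the
// innermost location.
std::shared_ptr<Location> chainLocations(const std::vector<std::string> &frames)
{
    std::shared_ptr<Location> current;
    for(const auto &frame : frames)
    {
        current = std::make_shared<Location>(Location{ frame, current });
    }
    return current;
}

// Walks the inlinedAt links to recover the synthetic call stack,
// innermost frame first.
std::vector<std::string> syntheticCallStack(std::shared_ptr<Location> location)
{
    std::vector<std::string> stack;
    for(; location; location = location->inlinedAt)
    {
        stack.push_back(location->scope);
    }
    return stack;
}
```

Chaining `ReactorFunction`, `main`, `A`, `B` and walking back from the innermost location gives `B`, `A`, `main`, `ReactorFunction`, which is the order a debugger would present as frames.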
|  |  | 
|  |  | 
|  | `rr::DebugInfo::diScope` is updated by `rr::DebugInfo::syncScope()`. | 
|  |  | 
|  | `llvm::DIScope`s typically do not nest - there is usually a separate | 
|  | `llvm::DISubprogram` for each function in the callstack. All local variables | 
|  | within a function will typically share the same scope, regardless of whether | 
|  | they are declared within a sub-block. | 
|  |  | 
|  | Loops and jumps within a function add complexity. Consider: | 
|  |  | 
|  | ```C++ | 
|  | void B() | 
|  | { | 
|  |     rr::Int i = 0; | 
|  | } | 
|  |  | 
|  | void A() | 
|  | { | 
|  |     for (int i = 0; i < 3; i++) | 
|  |     { | 
|  |         rr::Int x = 0; | 
|  |     } | 
|  |     B(); | 
|  | } | 
|  |  | 
|  | int main(int argc, const char* argv[]) | 
|  | { | 
|  |     A(); | 
|  | } | 
|  | ``` | 
|  |  | 
|  | In this particular example Reactor will not be aware of the `for` loop, and will | 
|  | attempt to create three variables called "`x`" in the same function scope for `A()`. | 
|  | Duplicate symbols in the same `llvm::DIScope` result in undefined behavior. | 
|  |  | 
|  | To solve this, `rr::DebugInfo::syncScope()` observes when a function jumps | 
|  | backwards, and forks the current `llvm::DILexicalBlock` for the function. This | 
|  | results in a number of `llvm::DILexicalBlock` chains, each declaring variables | 
|  | that shadow the previous block. | 
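
The forking rule can be sketched as follows. This is a simplified model under stated assumptions, not Reactor's implementation: `FunctionScope` is a hypothetical type, a counter stands in for the `llvm::DILexicalBlock` chain, and "jumps backwards" is assumed to mean that the new line number is strictly lower than the previous one.

```cpp
#include <string>

// Simplified per-function scope state (assumed layout).
struct FunctionScope
{
    std::string function;  // name of the C++ function
    int lastLine = 0;      // last source line seen in this function
    int block = 0;         // 0 = function scope, N = N-th forked block
};

// Returns a label for the scope that a declaration at `line` should use,
// forking a new block whenever the line number moves backwards.
std::string syncScope(FunctionScope &scope, int line)
{
    if(line < scope.lastLine)
    {
        scope.block++;  // backwards jump: fork a new shadowing block
    }
    scope.lastLine = line;
    return scope.block == 0
               ? scope.function
               : scope.function + "." + std::to_string(scope.block);
}
```

With a loop body spanning lines 10 and 11 that runs three times, the labels would be `A`, `A`, `A.1`, `A.1`, `A.2`, `A.2`, so each iteration declares its variables in a fresh block rather than redefining them in the same scope.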
|  |  | 
|  | At the declaration of `i` in `B()`, the `DIScope` hierarchy would be: | 
|  |  | 
|  | ```C++ | 
|  |                               DIFile: "foo.cpp" | 
|  | rr::DebugInfo::diScope[0].di: ↳ DISubprogram: "main" | 
|  |                               ↳ DISubprogram: "A" | 
|  |                               | ↳ DILexicalBlock: "A".1 | 
|  | rr::DebugInfo::diScope[1].di: |   ↳ DILexicalBlock: "A".2 | 
|  | rr::DebugInfo::diScope[2].di: ↳ DISubprogram: "B" | 
|  | ``` | 
|  |  | 
|  | The `DILocation` hierarchy would be: | 
|  |  | 
|  | ```C++ | 
|  | rr::DebugInfo::diRootLocation:      DILocation(DISubprogram: "ReactorFunction") | 
|  | rr::DebugInfo::diScope[0].location: ↳ DILocation(DISubprogram: "main") | 
|  | rr::DebugInfo::diScope[1].location:   ↳ DILocation(DILexicalBlock: "A".2) | 
|  | rr::DebugInfo::diScope[2].location:     ↳ DILocation(DISubprogram: "B") | 
|  | ``` | 
|  |  | 
|  | ### Debugger integration | 
|  |  | 
|  | Once the debug information has been generated, it needs to be handed to the | 
|  | debugger. | 
|  |  | 
|  | Reactor uses [`llvm::JITEventListener::createGDBRegistrationListener()`](http://llvm.org/doxygen/classllvm_1_1JITEventListener.html#a004abbb5a0d48ac376dfbe3e3c97c306) | 
|  | to inform GDB of the JIT'd program and its debugging information. | 
|  | More information [can be found here](https://llvm.org/docs/DebuggingJITedCode.html). | 
|  |  | 
|  | LLDB should be able to support this same mechanism, but at the time of writing | 
|  | this does not appear to work. | 
|  |  |