Reactor produces Just In Time compiled dynamic executable code and can be used to JIT high performance functions specialized for runtime configurations, or to even build a compiler.
In order to debug executable code at a higher level than disassembly, source code files are required.
Reactor has two potential sources of source code:
While case (2) is preferable for implementing a compiler, this is currently not implemented.
Reactor implements case (1) and this can be used by GDB to single line step and inspect variables.
Currently:
Debug generation is enabled with REACTOR_EMIT_DEBUG_INFO
CMake flag (defaults to disabled).
All Reactor functions begin with a call to RR_DEBUG_INFO_UPDATE_LOC()
, which calls into rr::DebugInfo::EmitLocation()
.
rr::DebugInfo::EmitLocation()
calls rr::DebugInfo::getCallerBacktrace()
, which in turn uses libbacktrace
to unwind the stack and find the file, function and line of the caller.
This information is passed to llvm::IRBuilder<>::SetCurrentDebugLocation
to emit source line information for the next LLVM instructions to be built.
There are 3 aspects to generating variable debug information:
Constructing a Reactor LValue
:
rr::Int a = 1;
Will emit an LLVM alloca
instruction to allocate the storage of the variable, and emit another to initialize it to the constant 1
. While fluent, none of the Reactor calls see the name of the C++ local variable “a
”, and the LLVM alloca
value gets a meaningless numerical value.
There are two potential ways that Reactor can obtain the variable name:
While (1) is arguably a cleaner and more robust solution, (2) is easier to implement and can work for the majority of use cases.
(2) is the current solution implemented.
rr::DebugInfo::getOrParseFileTokens()
scans a source file line by line, and uses a regular expression to look for patterns of <type> <name>
. Matching is not precise, but is adequate to find locals constructed with and without assignment.
Given that we can find a variable name for a given source line, we need a way of binding the LLVM values to the name.
Given our trivial example:
rr::Int a = 1
The rr::Int
constructor calls RR_DEBUG_INFO_EMIT_VAR()
passing the storage value as single argument. RR_DEBUG_INFO_EMIT_VAR()
performs the backtrace to find the source file and line and uses the token information produced by rr::DebugInfo::getOrParseFileTokens()
to identify the variable name.
However, things get a bit more complicated when there are multiple variables being constructed on the same line.
Take for example:
rr::Int a = rr::Int(1) + rr::Int(2)
Here we have 3 calls to the rr::Int
constructor, each calling down to RR_DEBUG_INFO_EMIT_VAR()
.
To disambiguate which of these should be bound to the variable name “a
”, rr::DebugInfo::EmitVariable()
buffers the binding into scope.pending
and the last binding for a given line is used by DebugInfo::emitPending()
. For variable construction and assignment, C++ guarantees that the LHS is the last value to be constructed.
This solution is not perfect.
Multi-line expressions, multiple assignments on a single line, macro obfuscation can all break variable bindings - however the majority of typical cases work.
rr::DebugInfo
maintains a stack of llvm::DIScope
s and llvm::DILocation
s that mirrors the current backtrace for function being called.
A synthetic call stack is produced by chaining llvm::DILocation
s with InlinedAt
s.
For example, at the declaration of i
:
void B() { rr::Int i; // <- here } void A() { B(); } int main(int argc, const char* argv[]) { A(); }
The DIScope
hierarchy would be:
DIFile: "foo.cpp" rr::DebugInfo::diScope[0].di: ↳ DISubprogram: "main" rr::DebugInfo::diScope[1].di: ↳ DISubprogram: "A" rr::DebugInfo::diScope[2].di: ↳ DISubprogram: "B"
The DILocation
hierarchy would be:
rr::DebugInfo::diRootLocation: DILocation(DISubprogram: "ReactorFunction") rr::DebugInfo::diScope[0].location: ↳ DILocation(DISubprogram: "main") rr::DebugInfo::diScope[1].location: ↳ DILocation(DISubprogram: "A") rr::DebugInfo::diScope[2].location: ↳ DILocation(DISubprogram: "B")
Where ‘↳’ represents an InlinedAt
.
rr::DebugInfo::diScope
is updated by rr::DebugInfo::syncScope()
.
llvm::DIScope
s typically do not nest - there is usually a separate llvm::DISubprogram
for each function in the callstack. All local variables within a function will typically share the same scope, regardless of whether they are declared within a sub-block.
Loops and jumps within a function add complexity. Consider:
void B() { rr::Int i = 0; } void A() { for (int i = 0; i < 3; i++) { rr::Int x = 0; } B(); } int main(int argc, const char* argv[]) { A(); }
In this particular example Reactor will not be aware of the for
loop, and will attempt to create three variables called “x
” in the same function scope for A()
. Duplicate symbols in the same llvm::DIScope
result in undefined behavior.
To solve this, rr::DebugInfo::syncScope()
observes when a function jumps backwards, and forks the current llvm::DILexicalBlock
for the function. This results in a number of llvm::DILexicalBlock
chains, each declaring variables that shadow the previous block.
At the declaration of i
, the DIScope
hierarchy would be:
DIFile: "foo.cpp" rr::DebugInfo::diScope[0].di: ↳ DISubprogram: "main" ↳ DISubprogram: "A" | ↳ DILexicalBlock: "A".1 rr::DebugInfo::diScope[1].di: | ↳ DILexicalBlock: "A".2 rr::DebugInfo::diScope[2].di: ↳ DISubprogram: "B"
The DILocation
hierarchy would be:
rr::DebugInfo::diRootLocation: DILocation(DISubprogram: "ReactorFunction") rr::DebugInfo::diScope[0].location: ↳ DILocation(DISubprogram: "main") rr::DebugInfo::diScope[1].location: ↳ DILocation(DILexicalBlock: "A".2) rr::DebugInfo::diScope[2].location: ↳ DILocation(DISubprogram: "B")
Once the debug information has been generated, it needs to be handed to the debugger.
Reactor uses llvm::JITEventListener::createGDBRegistrationListener()
to inform GDB of the JIT'd program and its debugging information. More information can be found here.
LLDB should be able to support this same mechanism, but at the time of writing this does not appear to work.