Blame - docs/ReactorDebugInfo.md - SwiftShader

Ben Clayton, ac07ed8, 2019-03-26 14:17:41 +0000

# Reactor Debug Info Generation

## Introduction

Reactor produces Just In Time (JIT) compiled dynamic executable code. It can be
used to JIT high-performance functions specialized for runtime configurations,
or even to build a compiler.

Debugging executable code at a higher level than disassembly requires source
code files.

Reactor has two potential sources of source code:

1. The C++ source code of the program that calls into Reactor.
2. External source files read by the program and passed to Reactor.

While case (2) is preferable for implementing a compiler, it is not currently
implemented.

Reactor implements case (1), which GDB can use to single-step by line and
inspect variables.

## Supported Platforms

Currently:

* Debug info generation is only supported on Linux with the LLVM 7 backend.
* GDB is the only supported debugger.
* The program itself must be compiled with debug info.

## Enabling

Debug info generation is enabled with the `REACTOR_EMIT_DEBUG_INFO` CMake flag
(disabled by default).

## Implementation details

### Source Location

All Reactor functions begin with a call to `RR_DEBUG_INFO_UPDATE_LOC()`, which
calls into `rr::DebugInfo::EmitLocation()`.

`rr::DebugInfo::EmitLocation()` calls `rr::DebugInfo::getCallerBacktrace()`,
which in turn uses [`libbacktrace`](https://github.com/ianlancetaylor/libbacktrace)
to unwind the stack and find the file, function and line of the caller.

This information is passed to `llvm::IRBuilder<>::SetCurrentDebugLocation`
to emit source line information for the next LLVM instructions to be built.

### Variables

There are three aspects to generating variable debug information:

#### 1. Variable names

Constructing a Reactor `LValue`:

```C++
rr::Int a = 1;
```

emits an LLVM `alloca` instruction to allocate storage for the variable, and
another instruction to initialize it to the constant `1`. While fluent, none of
the Reactor calls see the name of the C++ local variable "`a`", and the LLVM
`alloca` value is given a meaningless numeric name.

There are two potential ways that Reactor can obtain the variable name:

1. Use the running executable's own debug information to examine the local
   declaration and extract the local variable's name.
2. Use the backtrace information to parse the name from the source file.

While (1) is arguably a cleaner and more robust solution, (2) is easier to
implement and works for the majority of use cases.

(2) is the solution currently implemented.

`rr::DebugInfo::getOrParseFileTokens()` scans a source file line by line, and
uses a regular expression to look for patterns of `<type> <name>`. Matching is not
precise, but is adequate to find locals constructed with and without assignment.
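
To make the line-by-line scan concrete, here is a minimal sketch using
`std::regex`. It is not Reactor's code: the helper name `parseLineTokens`, the
regular expression, and the returned line-to-name map are all assumptions made
for illustration. Like the approach described above, the pattern is
deliberately loose.

```cpp
#include <cassert>
#include <map>
#include <regex>
#include <sstream>
#include <string>

// Hypothetical helper (not part of Reactor): scans source text line by line
// and returns a map from 1-based line number to the declared name found there.
std::map<int, std::string> parseLineTokens(const std::string &source)
{
    // Loosely matches "<type> <name>" followed by '=', ';' or '('.
    // The type may be namespace-qualified, e.g. "rr::Int".
    // It does not understand comments, string literals or macros.
    static const std::regex decl(
        R"re(^\s*([A-Za-z_][\w:]*)\s+([A-Za-z_]\w*)\s*(=|;|\())re");

    std::map<int, std::string> names;
    std::istringstream stream(source);
    std::string line;
    int lineNumber = 0;
    while (std::getline(stream, line))
    {
        lineNumber++;
        std::smatch match;
        if (std::regex_search(line, match, decl))
        {
            assert(match.size() == 4);  // whole match plus three groups
            names[lineNumber] = match[2].str();
        }
    }
    return names;
}
```

With this pattern, both `rr::Int a = 1;` and `rr::Float b;` yield a name, which
covers locals constructed with and without assignment. It also matches
statements such as `return a;`, which illustrates why such matching is
imprecise.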

#### 2. Variable binding

Given that we can find a variable name for a given source line, we need a way of
binding the LLVM values to the name.

Given our trivial example:

```C++
rr::Int a = 1;
```

The `rr::Int` constructor calls `RR_DEBUG_INFO_EMIT_VAR()`, passing the storage
value as its single argument. `RR_DEBUG_INFO_EMIT_VAR()` performs the backtrace
to find the source file and line, and uses the token information produced by
`rr::DebugInfo::getOrParseFileTokens()` to identify the variable name.

However, things get more complicated when multiple variables are constructed
on the same line.

Take for example:

```C++
rr::Int a = rr::Int(1) + rr::Int(2);
```

Here we have three calls to the `rr::Int` constructor, each calling down
to `RR_DEBUG_INFO_EMIT_VAR()`.

To disambiguate which of these should be bound to the variable name "`a`",
`rr::DebugInfo::EmitVariable()` buffers the binding into `scope.pending`, and
`DebugInfo::emitPending()` uses the last binding for a given line. For variable
construction and assignment, C++ guarantees that the LHS is the last value to
be constructed.

This solution is not perfect.

Multi-line expressions, multiple assignments on a single line, and macro
obfuscation can all break variable bindings, but the majority of typical cases
work.
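
The "last binding per line wins" rule can be sketched as follows. This is a
simplified model, not Reactor's implementation: the `PendingBindings` class,
the `Binding` struct, and the integer `valueId` standing in for an LLVM storage
value are all hypothetical.

```cpp
#include <cassert>
#include <map>
#include <vector>

// Hypothetical stand-in for a resolved binding; not a Reactor type.
struct Binding
{
    int line;     // source line the constructor call was traced back to
    int valueId;  // stand-in for the LLVM storage value
};

class PendingBindings
{
public:
    // Buffer a binding. A later binding on the same line replaces the
    // earlier one, so only the last constructed value survives.
    void emitVariable(int line, int valueId)
    {
        assert(line > 0);
        pending[line] = valueId;
    }

    // Flush the buffer: one binding per line, in line order.
    std::vector<Binding> emitPending()
    {
        std::vector<Binding> out;
        for (const auto &entry : pending)
        {
            out.push_back(Binding{entry.first, entry.second});
        }
        pending.clear();
        return out;
    }

private:
    std::map<int, int> pending;  // line -> most recent value seen on that line
};
```

For `rr::Int a = rr::Int(1) + rr::Int(2);`, three constructor calls report the
same line. Only the third, which is the storage for `a`, is still buffered when
the pending bindings are flushed.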

#### 3. Variable scope

`rr::DebugInfo` maintains a stack of `llvm::DIScope`s and `llvm::DILocation`s
that mirrors the current backtrace for the function being called.

A synthetic call stack is produced by chaining `llvm::DILocation`s with
`InlinedAt`s.

For example, at the declaration of `i`:

```C++
void B()
{
    rr::Int i; // <- here
}

void A()
{
    B();
}

int main(int argc, const char* argv[])
{
    A();
}
```

The `DIScope` hierarchy would be:

```C++
                              DIFile: "foo.cpp"
rr::DebugInfo::diScope[0].di: ↳ DISubprogram: "main"
rr::DebugInfo::diScope[1].di: ↳ DISubprogram: "A"
rr::DebugInfo::diScope[2].di: ↳ DISubprogram: "B"
```

The `DILocation` hierarchy would be:

```C++
rr::DebugInfo::diRootLocation:      DILocation(DISubprogram: "ReactorFunction")
rr::DebugInfo::diScope[0].location: ↳ DILocation(DISubprogram: "main")
rr::DebugInfo::diScope[1].location: ↳ DILocation(DISubprogram: "A")
rr::DebugInfo::diScope[2].location: ↳ DILocation(DISubprogram: "B")
```

Where '↳' represents an `InlinedAt`.
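
As a rough model of how chained `InlinedAt` links form a synthetic call stack,
the sketch below uses a plain linked structure in place of
`llvm::DILocation`. The `Location` struct and both helper functions are
hypothetical and exist only for this illustration. No LLVM API is used.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <vector>

// Hypothetical stand-in for llvm::DILocation: a scope name plus an optional
// link to the location it is "inlined at".
struct Location
{
    std::string scope;
    std::shared_ptr<const Location> inlinedAt;  // null for the root location
};

// Builds the chain root <- frames[0] <- frames[1] <- ... and returns the
// innermost location.
std::shared_ptr<const Location> chainLocations(
    const std::string &root, const std::vector<std::string> &frames)
{
    assert(!root.empty());
    auto current = std::make_shared<const Location>(Location{root, nullptr});
    for (const auto &frame : frames)
    {
        current = std::make_shared<const Location>(Location{frame, current});
    }
    return current;
}

// Walks the inlinedAt links outwards, yielding the call stack innermost first.
std::vector<std::string> syntheticCallStack(
    std::shared_ptr<const Location> innermost)
{
    std::vector<std::string> stack;
    for (auto loc = innermost; loc; loc = loc->inlinedAt)
    {
        stack.push_back(loc->scope);
    }
    return stack;
}
```

Chaining `"ReactorFunction"` with the frames `main`, `A`, `B` and walking from
the innermost location yields `B`, `A`, `main`, `ReactorFunction`, which is the
order a debugger would present as the call stack.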

`rr::DebugInfo::diScope` is updated by `rr::DebugInfo::syncScope()`.

`llvm::DIScope`s typically do not nest; there is usually a separate
`llvm::DISubprogram` for each function in the callstack. All local variables
within a function will typically share the same scope, regardless of whether
they are declared within a sub-block.

Loops and jumps within a function add complexity. Consider:

```C++
void B()
{
    rr::Int i = 0;
}

void A()
{
    for (int i = 0; i < 3; i++)
    {
        rr::Int x = 0;
    }
    B();
}

int main(int argc, const char* argv[])
{
    A();
}
```

In this particular example Reactor will not be aware of the `for` loop, and will
attempt to create three variables called "`x`" in the same function scope for `A()`.
Duplicate symbols in the same `llvm::DIScope` result in undefined behavior.

To solve this, `rr::DebugInfo::syncScope()` observes when a function jumps
backwards, and forks the current `llvm::DILexicalBlock` for the function. This
results in a number of `llvm::DILexicalBlock` chains, each declaring variables
that shadow the previous block.
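
The backwards-jump detection can be modelled in a few lines. This is a sketch
under stated assumptions, not the logic of `syncScope()` itself: the
`FunctionScope` class is hypothetical, a "backwards jump" is assumed to mean
that the new source line is strictly lower than the previous one, and a block
is represented only by a counter. Repeated visits to the same line are not
treated as a jump in this sketch.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical model of one function's debug scope, for illustration only.
class FunctionScope
{
public:
    explicit FunctionScope(std::string name) : name(std::move(name)) {}

    // Called with the source line of each new location. If the line moves
    // backwards, assume a loop or jump and fork a new lexical block.
    void sync(int line)
    {
        assert(line > 0);
        if (line < lastLine)
        {
            block++;
        }
        lastLine = line;
    }

    // Name of the block that a variable declared now would belong to.
    std::string currentBlock() const
    {
        return block == 0 ? name : name + "." + std::to_string(block);
    }

private:
    std::string name;
    int lastLine = 0;  // no location seen yet
    int block = 0;     // number of forks so far
};
```

Feeding a two-line loop body through `sync()` three times forks twice, so
variables from the third iteration land in block 2 and shadow those declared in
the earlier blocks.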

At the declaration of `rr::Int i` in `B()`, the `DIScope` hierarchy would be:

```C++
                              DIFile: "foo.cpp"
rr::DebugInfo::diScope[0].di: ↳ DISubprogram: "main"
                              ↳ DISubprogram: "A"
                              | ↳ DILexicalBlock: "A".1
rr::DebugInfo::diScope[1].di: | ↳ DILexicalBlock: "A".2
rr::DebugInfo::diScope[2].di: ↳ DISubprogram: "B"
```

The `DILocation` hierarchy would be:

```C++
rr::DebugInfo::diRootLocation:      DILocation(DISubprogram: "ReactorFunction")
rr::DebugInfo::diScope[0].location: ↳ DILocation(DISubprogram: "main")
rr::DebugInfo::diScope[1].location: ↳ DILocation(DILexicalBlock: "A".2)
rr::DebugInfo::diScope[2].location: ↳ DILocation(DISubprogram: "B")
```

### Debugger integration

Once the debug information has been generated, it needs to be handed to the
debugger.

Reactor uses [`llvm::JITEventListener::createGDBRegistrationListener()`](http://llvm.org/doxygen/classllvm_1_1JITEventListener.html#a004abbb5a0d48ac376dfbe3e3c97c306)
to inform GDB of the JIT'd program and its debugging information.
More information [can be found here](https://llvm.org/docs/DebuggingJITedCode.html).

LLDB should be able to support this same mechanism, but at the time of writing
this does not appear to work.