Blame - docs/Reactor.md - SwiftShader

Reactor Documentation
=====================

Reactor is an embedded language for C++ to facilitate dynamic code generation and specialization.

Introduction
------------

To generate the code for an expression such as
```C++
float y = 1 - x;
```
using the LLVM compiler framework, one needs to execute
```C++
Value *valueY = BinaryOperator::CreateFSub(ConstantFP::get(Type::getFloatTy(Context), 1.0), valueX, "y", basicBlock);
```

For large expressions this quickly becomes hard to read, and tedious to write and modify.

With Reactor, it becomes as simple as writing
```C++
Float y = 1 - x;
```
Note the capital letter for the type. This is not the code to perform the calculation. It's the code that when executed will record the calculation to be performed.

This is possible through the use of C++ operator overloading. Reactor also supports control flow constructs and pointer arithmetic with C-like syntax.
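
To make the idea of "recording" a calculation concrete, here is a minimal sketch in plain C++ that does not use Reactor. The `Recorded` type and the `recordOneMinusX()` function are hypothetical names invented for this illustration. The sketch records text, whereas a real code generator would presumably record compiler instructions.

```C++
#include <cassert>
#include <string>
#include <utility>

// Hypothetical toy type (not part of Reactor): it records an expression
// as text instead of computing a value.
struct Recorded
{
    std::string text;

    // A constant operand, such as the 1 in "1 - x".
    Recorded(int constant) : text(std::to_string(constant)) {}

    // A named input or an already recorded sub-expression.
    explicit Recorded(std::string expression) : text(std::move(expression)) {}
};

// The overloaded operator performs no subtraction. It only notes that one
// should happen when the recorded expression is eventually evaluated.
Recorded operator-(const Recorded &lhs, const Recorded &rhs)
{
    return Recorded("(" + lhs.text + " - " + rhs.text + ")");
}

// Reads like ordinary arithmetic, but returns the recording "(1 - x)".
std::string recordOneMinusX()
{
    Recorded x("x");
    Recorded y = 1 - x;

    return y.text;
}
```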

Motivation
----------

Just-in-time (JIT) compiled code has the potential to be faster than statically compiled code, through [run-time specialization](http://en.wikipedia.org/wiki/Run-time_algorithm_specialisation). However, this is rarely achieved in practice.

Specialization in general is the use of a more optimal routine that is specific for a certain set of conditions. For example, when sorting two numbers, it is faster to swap them if they are out of order than to call a generic quicksort function. Specialization can be done statically, by explicitly writing each variant or by using metaprogramming to generate multiple variants at static compile time, or dynamically, by examining the parameters at run-time and generating a specialized path.
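
The two-number sorting example can be sketched as follows. This is an illustration of static specialization only. The function names are made up for this sketch, and `std::sort` stands in for the generic quicksort.

```C++
#include <algorithm>
#include <cassert>
#include <utility>

// Generic routine: handles any element count with a general-purpose sort.
void sortGeneric(int *values, int count)
{
    std::sort(values, values + count);
}

// Specialized routine for exactly two elements: one comparison and
// at most one swap.
void sortTwo(int &first, int &second)
{
    if(second < first)
    {
        std::swap(first, second);
    }
}
```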

Because specialization can be done statically, sometimes aided by metaprogramming, the ability of a JIT-compiler to do it at run-time is often disregarded. Specialized benchmarks show no advantage of JIT code over static code. However, a specialized benchmark does not take into account that a typical real-world application deals with many unpredictable conditions. Systems can have one core or several dozen cores, and many different ISA extensions. This alone can make it impractical to write fully specialized routines manually, and generating them with metaprogramming results in code bloat. Worse yet, any non-trivial application has a layered architecture in which lower layers (e.g. framework APIs) know very little or nothing about the usage by higher layers. Various parameters also depend on user input. Run-time specialization can have access to the full context in which each routine executes, and although the optimization contribution of specialization for a single parameter is small, the combined speedup can be huge. As an extreme example, interpreters can execute any kind of program in any language, but by specializing for a specific program you get a compiled version of that program. But you don't need a full-blown language to observe a huge difference between interpretation and specialization through compilation. Most applications process some form of list of commands in an interpreted fashion, and even the series of calls into a framework API can be compiled into a more efficient whole at run-time.

While the benefit of run-time specialization should now be apparent, JIT-compiled languages lack many of the practical advantages of static compilation. JIT-compilers are very constrained in how much time they can spend on compiling the bytecode into machine code. This limits their ability to even reach parity with static compilation, let alone attempt to exceed it by performing run-time specialization. Also, even if the compilation time were not as constrained, they can't specialize at every opportunity because it would result in an explosive growth of the amount of generated code. There's a need to be very selective in only specializing the hotspots for often recurring conditions, and to manage a cache of the different variants. Even just selecting the size of the set of variables that form the entire condition to specialize for can get immensely complicated.

Clearly we need a manageable way to benefit from run-time specialization where it would help significantly, while still resorting to static compilation for anything else. A crucial observation is that the developer has expectations about the application's behavior, and this is valuable information that can be exploited to choose between static and JIT-compilation. One way to do that is to use an API which JIT-compiles the commands provided by the application developer. An example of this is an advanced DBMS which compiles the query into an optimized sequence of routines, each specialized to the data types involved, the sizes of the CPU caches, etc. Another example is a modern graphics API, which takes shaders (routines executed per pixel or other element) and a set of parameters which affect their execution, and compiles them into GPU-specific code. However, these examples have a very hard divide between what goes on inside the API and outside. You can't exchange data between the statically compiled outside world and the JIT-compiled routines except through the API, and they have very different execution models. In other words, they are highly domain specific and not generic ways to exploit run-time specialization in arbitrary code.

This is becoming especially problematic for GPUs, as they are now just as programmable as CPUs but you can still only command them through an API. Attempts to disguise this by using a single language, such as C++AMP and SYCL, still have difficulties expressing how data is exchanged, don't actually provide control over the specialization, have hidden overhead, and have unpredictable performance characteristics across devices. Meanwhile CPUs gain ever more cores and wider SIMD vector units, but statically compiled languages don't readily exploit this and can't deal with the many code paths required to extract optimal performance. A different language and framework is required.

Concepts and Syntax
-------------------

### Routine and Function<>

Reactor allows you to create new functions at run-time. Their generation happens in C++, and after materializing them they can be called during the execution of the same C++ program. We call these dynamically generated functions "routines", to discern them from statically compiled functions and methods. Reactor's ```Routine``` class encapsulates a routine. Deleting a Routine object also frees the memory used to store the routine.

To declare the function signature of a routine, use the ```Function<>``` template. The template argument is the signature of a function, using Reactor variable types. Here's a complete definition of a routine taking no arguments and returning an integer:

```C++
Function<Int(Void)> function;
{
    Return(1);
}
```

The braces are superfluous. They just make the syntax look more like regular C++, and they offer a new scope for Reactor variables.

The Routine is obtained and materialized by "calling" the ```Function<>``` object to give it a name:

```C++
Routine *routine = function(L"one");
```

Finally, we can obtain the function pointer to the entry point of the routine, and call it:

```C++
int (*callable)() = (int(*)())routine->getEntry();

int result = callable();
assert(result == 1);
```

Note that ```Function<>``` objects are relatively heavyweight, since they have the entire JIT-compiler behind them, while ```Routine``` objects are lightweight and merely provide storage and lifetime management of generated routines. So we typically allow the ```Function<>``` object to be destroyed (by going out of scope), while the ```Routine``` object is retained until we no longer need to call the routine. Hence the distinction between them and the need for a couple of lines of boilerplate code.

### Arguments and Expressions

Routines can take various arguments. The following example illustrates the syntax for accessing the arguments of a routine which takes two integer arguments and returns their sum:

```C++
Function<Int(Int, Int)> function;
{
    Int x = function.Arg<0>();
    Int y = function.Arg<1>();

    Int sum = x + y;

    Return(sum);
}
```

Reactor supports various types which correspond to C++ types:

| Class name | C++ equivalent |
| ---------- | -------------- |
| Int        | int32_t        |
| UInt       | uint32_t       |
| Short      | int16_t        |
| UShort     | uint16_t       |
| Byte       | uint8_t        |
| SByte      | int8_t         |
| Long       | int64_t        |
| ULong      | uint64_t       |
| Float      | float          |

Note that bytes are unsigned unless prefixed with S, while larger integers are signed unless prefixed with U.

These scalar types support all of the C++ arithmetic operations.

Reactor also supports several vector types. For example ```Float4``` is a vector of four floats. They support a select number of C++ operators, and several "intrinsic" functions, such as ```Max()``` to compute the element-wise maximum, and comparison functions that return a bit mask. Check [Reactor.hpp](../src/Reactor/Reactor.hpp) for all the types, operators and intrinsics.

### Casting and Reinterpreting

Types can be cast using the constructor-style syntax:

```C++
Function<Int(Float)> function;
{
    Float x = function.Arg<0>();

    Int cast = Int(x);

    Return(cast);
}
```

You can reinterpret-cast a variable using ```As<>```:

```C++
Function<Int(Float)> function;
{
    Float x = function.Arg<0>();

    Int reinterpret = As<Int>(x);

    Return(reinterpret);
}
```

Note that this is a bitwise cast. Unlike C++'s ```reinterpret_cast<>```, it does not allow casting between types of different sizes. Think of it as storing the value in memory and then loading from that same address as the target type.
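
The difference between the two kinds of cast can be illustrated in plain C++, without Reactor. The helper names below (`bitwiseCast`, `convert`, `reinterpret`) are assumptions made for this sketch, and the bit pattern in the comment assumes IEEE 754 single-precision floats.

```C++
#include <cassert>
#include <cstdint>
#include <cstring>

// Plain C++ analogue of a bitwise cast (hypothetical helper, not Reactor code).
template<typename To, typename From>
To bitwiseCast(const From &value)
{
    static_assert(sizeof(To) == sizeof(From), "bitwise casts require equally sized types");

    To result;
    std::memcpy(&result, &value, sizeof(To));   // store, then load as the other type

    return result;
}

// Value conversion, comparable to the constructor-style Int(x): 1.0f becomes 1.
int convert(float x)
{
    return static_cast<int>(x);
}

// Bit reinterpretation, comparable to As<Int>(x): 1.0f becomes 0x3F800000
// on platforms using IEEE 754 single precision.
std::int32_t reinterpret(float x)
{
    return bitwiseCast<std::int32_t>(x);
}
```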

### Pointers

Pointers also use a template class:

```C++
Function<Int(Pointer<Int>)> function;
{
    Pointer<Int> x = function.Arg<0>();

    Int dereference = *x;

    Return(dereference);
}
```

Pointer arithmetic is only supported on ```Pointer<Byte>```, and can be used to access structure fields:

```C++
struct S
{
    int x;
    int y;
};

Function<Int(Pointer<Byte>)> function;
{
    Pointer<Byte> s = function.Arg<0>();

    Int y = *Pointer<Int>(s + offsetof(S, y));

    Return(y);
}
```

Reactor also defines an OFFSET() macro equivalent to the standard offsetof() macro.
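
For comparison, the same field access can be written in plain C++ with a byte pointer. This is only a sketch of the addressing scheme. The helper name `loadY` is made up for illustration.

```C++
#include <cassert>
#include <cstddef>
#include <cstring>

struct S
{
    int x;
    int y;
};

// Plain C++ counterpart of the byte-pointer access (illustrative helper name):
// advance a byte pointer by the field offset, then load an int from there.
int loadY(const unsigned char *s)
{
    int y;
    std::memcpy(&y, s + offsetof(S, y), sizeof(y));

    return y;
}
```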

### Conditionals

To generate for example the [unit step](https://en.wikipedia.org/wiki/Heaviside_step_function) function:

```C++
Function<Float(Float)> function;
{
    Float x = function.Arg<0>();

    If(x > 0.0f)
    {
        Return(1.0f);
    }
    Else If(x < 0.0f)
    {
        Return(0.0f);
    }
    Else
    {
        Return(0.5f);
    }
}
```

There's also an IfThenElse() intrinsic function which corresponds with the C++ ?: operator.
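
As a reference point, this is the plain C++ form of the unit step function written with the ?: operator. It is not Reactor code. It only shows the expression shape that a select-style intrinsic corresponds to.

```C++
#include <cassert>

// Unit step in regular C++, using nested ?: expressions instead of if/else.
float unitStep(float x)
{
    return (x > 0.0f) ? 1.0f
         : (x < 0.0f) ? 0.0f
                      : 0.5f;
}
```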

### Loops

Loops also have a syntax similar to C++:

```C++
Function<Int(Pointer<Int>, Int)> function;
{
    Pointer<Int> p = function.Arg<0>();
    Int n = function.Arg<1>();
    Int total = 0;

    For(Int i = 0, i < n, i++)
    {
        total += p[i];
    }

    Return(total);
}
```

Note the use of commas instead of semicolons to separate the loop expressions.

```While(expr) {}``` also works as expected, but there is no ```Do {} While(expr)``` equivalent because we can't discern between them. Instead, there's a ```Do {} Until(expr)``` where you can use the inverse expression to exit the loop.
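
In regular C++ terms, a loop that runs until a condition holds is a do-while loop on the negated condition. The sketch below shows that relationship with an ordinary function. The name `countDigits` is chosen for illustration only.

```C++
#include <cassert>

// Counts decimal digits. The body always runs at least once, and the loop
// exits once "value == 0" holds, i.e. it continues while !(value == 0).
int countDigits(unsigned int value)
{
    int digits = 0;

    do
    {
        digits++;
        value /= 10;
    }
    while(!(value == 0));   // reads as "until value == 0"

    return digits;
}
```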

Specialization
--------------

The above examples don't illustrate anything that can't be written as a regular C++ function. The real power of Reactor is to generate routines that are specialized for a certain set of conditions, or "state".

```C++
Function<Int(Pointer<Int>, Int)> function;
{
    Pointer<Int> p = function.Arg<0>();
    Int n = function.Arg<1>();
    Int total = 0;

    For(Int i = 0, i < n, i++)
    {
        if(state.operation == ADD)
        {
            total += p[i];
        }
        else if(state.operation == SUBTRACT)
        {
            total -= p[i];
        }
        else if(state.operation == AND)
        {
            total &= p[i];
        }
        else if(...)
        {
            ...
        }
    }

    Return(total);
}
```

Note that this example uses regular C++ ```if``` and ```else``` constructs. They only determine which code ends up in the generated routine, and don't end up in the generated code themselves. Thus the routine contains a loop with just one arithmetic or logical operation, making it more efficient than if this were written in regular C++.
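
The two phases involved, deciding once and then running many times, can be modeled in plain C++ as a rough analogy. In the sketch below, `generate()`, `Reducer` and `State` are hypothetical names. All variants are still compiled statically here, so it only models when the branch on the state is evaluated, not Reactor's actual code generation.

```C++
#include <cassert>
#include <functional>

enum Operation { ADD, SUBTRACT, AND };

struct State
{
    Operation operation;
};

using Reducer = std::function<int(const int *, int)>;

// The branch on state.operation runs once, when the reducer is created.
// Each returned loop contains a single operation and no branch on the state.
Reducer generate(const State &state)
{
    if(state.operation == ADD)
    {
        return [](const int *p, int n) {
            int total = 0;
            for(int i = 0; i < n; i++) total += p[i];
            return total;
        };
    }
    else if(state.operation == SUBTRACT)
    {
        return [](const int *p, int n) {
            int total = 0;
            for(int i = 0; i < n; i++) total -= p[i];
            return total;
        };
    }

    // AND: total starts at 0 as in the example above, so the result stays 0.
    return [](const int *p, int n) {
        int total = 0;
        for(int i = 0; i < n; i++) total &= p[i];
        return total;
    };
}
```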

Of course one could write an equivalent efficient function in regular C++ like this:

```C++
int function(int *p, int n)
{
    int total = 0;

    if(state.operation == ADD)
    {
        for(int i = 0; i < n; i++)
        {
            total += p[i];
        }
    }
    else if(state.operation == SUBTRACT)
    {
        for(int i = 0; i < n; i++)
        {
            total -= p[i];
        }
    }
    else if(state.operation == AND)
    {
        for(int i = 0; i < n; i++)
        {
            total &= p[i];
        }
    }
    else if(...)
    {
        ...
    }

    return total;
}
```

But now there's a lot of repeated code. It could be made more manageable using macros or templates, but that doesn't help reduce the binary size of the statically compiled code. That's fine when there are only a handful of state conditions to specialize for, but when you have multiple state variables with many possible values each, the total number of combinations can be prohibitive.
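
To put a number on this growth: the count of fully specialized variants is the product of the number of values each state variable can take. The helper below is a small sketch with a made-up name, and the figures in the note after it are hypothetical.

```C++
#include <cassert>
#include <cstdint>
#include <vector>

// Number of statically compiled variants needed to cover every combination
// of state values: the product of the value counts of all state variables.
std::uint64_t countVariants(const std::vector<std::uint64_t> &valuesPerVariable)
{
    std::uint64_t variants = 1;

    for(std::uint64_t values : valuesPerVariable)
    {
        variants *= values;
    }

    return variants;
}
```

For a hypothetical pipeline with ten state variables of eight possible values each, this gives 8^10 = 1,073,741,824 variants, far more than could reasonably be compiled ahead of time.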

This is especially the case when implementing APIs which offer a broad set of features, of which developers are likely to use only a select few. The quintessential example is graphics processing, where there are long pipelines of optional operations and both fixed-function and programmable stages. Applications configure the state of these stages between each draw call.

With Reactor, we can write the code for such pipelines in a syntax that is as easy to read as a naive unoptimized implementation, while at the same time specializing the code for exactly the operations required by the pipeline configuration.