SwiftShader provides high-performance graphics rendering on the CPU. It eliminates the dependency on graphics hardware capabilities.
SwiftShader provides shared libraries (DLLs) which implement standardized graphics APIs. Applications already using these APIs thus don't require any changes to use SwiftShader. It can run entirely in user space, or as a driver (for Android), and output to either a frame buffer, a window, or an offscreen buffer.
To achieve exceptional performance, SwiftShader is built around two major optimizations that shape its architecture: dynamic code generation, and parallel processing. Generating code at run-time makes it possible to eliminate branches and optimize register usage, specializing the processing routines for exactly the operations required by each draw call. Parallel processing means both utilizing the CPU's multiple cores and processing multiple elements across the width of the SIMD vector units.
Structurally there are four major layers:
The API layer is an implementation of a graphics API, such as OpenGL (ES) or Direct3D, on top of the Renderer interface. It is responsible for managing API-level resources and rendering state, as well as compiling high-level shaders to bytecode form.
The Renderer layer generates specialized processing routines for draw calls and coordinates the execution of rendering tasks. It defines the data structures used and how the processing is performed.
Reactor is an embedded language for C++ to dynamically generate code in a WYSIWYG fashion. It makes it possible to specialize the processing routines for the state and shaders used by each draw call. Its syntax closely resembles C and shading languages, which keeps the code-generating routines easy to read.
The JIT layer is a run-time compiler, such as LLVM's JIT or Subzero. Reactor records its operations in an in-memory intermediate form which can be materialized by the JIT into a function that can be called directly.
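As a minimal illustration of that flow, building such a function and calling it might look like this (a sketch following the example in Reactor.md; the header location, namespace, and the `getEntry()` cast depend on the SwiftShader version):

```cpp
#include "Reactor.hpp"   // Reactor's public header; its location varies by version

using namespace rr;      // Reactor's namespace in current sources (an assumption here)

int main()
{
    // Executing these statements records operations in Reactor's
    // in-memory intermediate form; nothing is compiled yet.
    Function<Int(Int, Int)> function;
    {
        Int x = function.Arg<0>();
        Int y = function.Arg<1>();
        Return(x + y);
    }

    // Materializing the function invokes the JIT and yields directly callable code.
    auto routine = function("sum");
    auto sum = (int (*)(int, int))routine->getEntry();

    return sum(1, 2);   // returns 3
}
```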
To generate code for an expression such as `float y = 1 - x;` directly with LLVM, we'd need code like `Value *valueY = BinaryOperator::CreateSub(ConstantInt::get(Type::getInt32Ty(Context), 1), valueX, "y", basicBlock);`. This is very verbose and becomes hard to read for longer expressions. Using C++ operator overloading, Reactor simplifies this to `Float y = 1 - x;`. Note that Reactor types have the same names as C types, but starting with a capital letter. Likewise `If()`, `Else`, and `For(,,)` implement their C counterparts.
While making Reactor's syntax so similar to the C++ in which it is written might cause some confusion at first, it provides a powerful abstraction for code specialization. For example, to produce the code for either an addition or a subtraction, one could write `x = addOrSub ? x + y : x - y;`. Since `addOrSub` is an ordinary C++ value that is fixed at code-generation time, only one of the two operations ends up in the generated code.
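To make that concrete, here is a minimal sketch of a generator function (the `Function`/`Return` interface follows Reactor.md; `generateAddOrSub` and the exact return type are illustrative assumptions):

```cpp
#include "Reactor.hpp"

using namespace rr;

// addOrSub is a plain C++ bool, fixed when the code is generated, so the
// ternary below is evaluated while recording and never appears in the output.
auto generateAddOrSub(bool addOrSub)
{
    Function<Int(Int, Int)> function;
    {
        Int x = function.Arg<0>();
        Int y = function.Arg<1>();
        x = addOrSub ? x + y : x - y;   // only one operation is recorded
        Return(x);
    }

    return function(addOrSub ? "add" : "sub");
}
```

Calling `generateAddOrSub(true)` and `generateAddOrSub(false)` yields two distinct specialized functions, each containing a single arithmetic instruction and no branch.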
We refer to the functions generated by Reactor code as Routines.
More details on Reactor can be found in Reactor.md.
The Renderer layer is implemented in three main parts: the VertexProcessor, SetupProcessor, and PixelProcessor. Each "processor" produces a corresponding Reactor routine and manages the relevant graphics state. Each also keeps a cache of previously generated routines, so that when a combination of states is encountered again, the routine that performs the desired processing is reused.
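The caching scheme can be sketched as follows (the `PixelState` fields and the `RoutineCache` interface are illustrative stand-ins, not SwiftShader's actual declarations):

```cpp
#include <cstddef>
#include <functional>
#include <memory>
#include <unordered_map>

struct Routine;   // opaque handle to a generated function

// Illustrative subset of render state; the real state structs are much larger.
struct PixelState
{
    bool depthTest;
    bool alphaBlend;
    int blendFactor;

    bool operator==(const PixelState &other) const
    {
        return depthTest == other.depthTest &&
               alphaBlend == other.alphaBlend &&
               blendFactor == other.blendFactor;
    }
};

struct PixelStateHash
{
    size_t operator()(const PixelState &state) const
    {
        return std::hash<int>()(state.blendFactor) ^
               (size_t(state.depthTest) << 1) ^
               (size_t(state.alphaBlend) << 2);
    }
};

class RoutineCache
{
public:
    std::shared_ptr<Routine> query(const PixelState &state) const
    {
        auto it = cache.find(state);
        return (it != cache.end()) ? it->second : nullptr;
    }

    void add(const PixelState &state, std::shared_ptr<Routine> routine)
    {
        cache[state] = std::move(routine);
    }

private:
    std::unordered_map<PixelState, std::shared_ptr<Routine>, PixelStateHash> cache;
};
```

On a cache hit the draw call skips code generation entirely, so an application cycling through a fixed set of render states pays the JIT cost only once per unique state combination.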
The VertexRoutine produces a function for processing a batch of vertices. The fixed-function T&L pipeline is implemented by VertexPipeline, while programmable vertex processing with a shader is implemented by VertexProgram. Note that the vertex routine also performs vertex attribute reading, vertex caching, viewport transform, and clip flag calculation all in the same function.
The SetupRoutine performs primitive setup. This comprises back-face culling, gradient computation, and rasterization.
The PixelRoutine takes a batch of primitives and performs per-pixel operations. The fixed-function texture stages and legacy integer shaders are implemented by PixelPipeline, while programmable pixel processing with a shader is implemented by PixelProgram. All other per-pixel operations, such as the depth test, alpha test, stenciling, and alpha blending, are also performed in the pixel routine. Together with the pixel traversal in QuadRasterizer, it forms a single function.
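Conceptually, a draw call then chains the three kinds of routines (a schematic sketch with stub types; the real Renderer batches and interleaves these stages rather than running them in a simple loop):

```cpp
#include <vector>

// Stub types standing in for SwiftShader's real data structures.
struct Vertex {};
struct Triangle { Vertex v0, v1, v2; };
struct Primitive {};

// Once materialized by the JIT, the generated routines are plain functions.
using VertexRoutine = void (*)(Vertex *out, const void *attributes, int count);
using SetupRoutine = bool (*)(Primitive *primitive, const Triangle *triangle);
using PixelRoutine = void (*)(const Primitive *primitive);

void drawTriangles(VertexRoutine vertexRoutine, SetupRoutine setupRoutine,
                   PixelRoutine pixelRoutine, const void *attributes, int vertexCount)
{
    std::vector<Vertex> vertices(vertexCount);
    vertexRoutine(vertices.data(), attributes, vertexCount);   // whole batch at once

    for(int i = 0; i + 2 < vertexCount; i += 3)
    {
        Triangle triangle = {vertices[i], vertices[i + 1], vertices[i + 2]};
        Primitive primitive;

        if(setupRoutine(&primitive, &triangle))   // false when the triangle is culled
        {
            pixelRoutine(&primitive);   // rasterize and shade
        }
    }
}
```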
The PixelProgram and VertexProgram share some common functionality in ShaderCore. Likewise, texture sampling is implemented by SamplerCore.
Aside from creating and managing the processing routines with the help of the Processor classes, the Renderer also subdivides and schedules rendering tasks onto multiple threads.
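A heavily simplified sketch of that subdivision, using std::thread directly (SwiftShader's actual scheduler is more elaborate, with persistent worker threads and finer-grained tasks):

```cpp
#include <algorithm>
#include <thread>
#include <vector>

// Split a range of primitives into contiguous chunks and process each chunk
// on its own thread. 'process' stands in for running the generated routines
// over one subset of the draw call.
template<typename ProcessChunk>
void processInParallel(int primitiveCount, int threadCount, ProcessChunk process)
{
    std::vector<std::thread> workers;
    int chunkSize = (primitiveCount + threadCount - 1) / threadCount;

    for(int t = 0; t < threadCount; t++)
    {
        int begin = t * chunkSize;
        int end = std::min(begin + chunkSize, primitiveCount);
        if(begin >= end) break;

        workers.emplace_back([=] { process(begin, end); });
    }

    for(auto &worker : workers)
    {
        worker.join();   // wait for all chunks before the next rendering task
    }
}
```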
The OpenGL (ES) and EGL APIs are implemented in src/OpenGL/.
The GLSL compiler is implemented in src/OpenGL/compiler/. It uses Flex and Bison to tokenize and parse GLSL shader source. It produces an abstract syntax tree (AST), which is then traversed to output assembly-level instructions in OutputASM.cpp.
The EGL API is implemented in src/OpenGL/libEGL/. Its entry functions are listed in libEGL.def (for Windows) and exports.map (for Linux), declared in main.cpp, and implemented in libEGL.cpp. The Display, Surface, and Config classes are the respective implementations of the abstract EGLDisplay, EGLSurface, and EGLConfig types.
OpenGL ES 1.1 is implemented in src/OpenGL/libGLES_CM/, while OpenGL ES 2.0 is implemented in src/OpenGL/libGLESv2/. Note that while OpenGL ES 3.0 functions are implemented in libGLESv3.cpp, they are compiled into the libGLESv2 library, as is standard among most implementations (some platforms make libGLESv3 a symbolic link to libGLESv2). We'll focus on OpenGL ES 2.0 in this documentation.
When the application calls an OpenGL function, it lands in the C entry functions in main.cpp. From there it gets dispatched to the libGLESv2.cpp functions in the es2 namespace. These functions obtain the thread's OpenGL context and validate the call's parameters. Most functions then call a corresponding Context method to perform the call's main operations (changing state or queuing a draw task).
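Schematically, the path from entry point to Context looks like this (a sketch, not SwiftShader's actual code; `getContext`, `IsValidBlendFactor`, `error`, and `setBlendFactors` are illustrative stand-ins):

```cpp
#include <GLES2/gl2.h>

namespace es2
{
class Context
{
public:
    void setBlendFactors(GLenum sourceFactor, GLenum destFactor);
};

Context *getContext();             // per-thread context lookup (stand-in)
bool IsValidBlendFactor(GLenum factor);
void error(GLenum errorCode);      // records the error for glGetError()

void BlendFunc(GLenum sfactor, GLenum dfactor)
{
    // Validate parameters before touching any state.
    if(!IsValidBlendFactor(sfactor) || !IsValidBlendFactor(dfactor))
    {
        return error(GL_INVALID_ENUM);
    }

    if(Context *context = getContext())
    {
        context->setBlendFactors(sfactor, dfactor);   // the call's main operation
    }
}
}

// The exported C entry function simply forwards into the es2 namespace.
extern "C" void GL_APIENTRY glBlendFunc(GLenum sfactor, GLenum dfactor)
{
    return es2::BlendFunc(sfactor, dfactor);
}
```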