The Armchair Engine is a relatively special-purpose rendering engine, targeted at planet-scale gameplay. The emphasis is on moving quickly and seamlessly from a view of the whole planet down to a flight-simulator level of detail (that is, tens of metres altitude) and everything in between. That said, the engine is modular, and components such as the terrain module don't have to be included in a specific implementation, so it can be used as a general-purpose engine just as well.
Threads with purpose
The core concept of the engine is a collection of special-purpose worker threads running asynchronously, based on demand. The rendering thread, which runs at vsync framerate (or whatever is possible if vsync is off), just traverses the scene graph and renders all the buffers that need to be rendered. All OpenGL state is batched automatically in order to minimise expensive state changes, and the batches are grouped by the relative cost of each state change; for instance, all "renderables" using a given shader are rendered together, to minimise costly shader switches. Within a given shader group, renderables are further grouped by their individual shader attributes, and so on with all remaining state changes.
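The grouping described above can be pictured as a sort over a composite key, most expensive state first. The following is only a minimal sketch under stated assumptions: the `Renderable` fields and both function names are hypothetical and are not taken from the engine's source.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <tuple>
#include <vector>

// Hypothetical renderable record; the field names are illustrative only.
struct Renderable {
    std::uint32_t shaderId;      // most expensive state to switch
    std::uint32_t attributeKey;  // cheaper state: which attribute set is bound
    std::uint32_t bufferId;      // cheapest: which vertex buffer is drawn
};

// Order renderables so that the costliest state changes happen least often:
// first by shader, then by attribute set, then by buffer.
inline void sortByStateCost(std::vector<Renderable>& items) {
    std::sort(items.begin(), items.end(),
              [](const Renderable& a, const Renderable& b) {
                  return std::tie(a.shaderId, a.attributeKey, a.bufferId) <
                         std::tie(b.shaderId, b.attributeKey, b.bufferId);
              });
}

// Count how many shader switches a given draw order would incur.
inline int countShaderSwitches(const std::vector<Renderable>& items) {
    int switches = 0;
    for (std::size_t i = 1; i < items.size(); ++i) {
        if (items[i].shaderId != items[i - 1].shaderId) ++switches;
    }
    return switches;
}
```

With four renderables alternating between two shaders, the unsorted order costs three shader switches and the sorted order costs one.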
This thread does not carry out any uploading to OpenGL whatsoever, and no game logic or physics, so it's very fast and predictable, with no stalls or slowdowns. Secondly, there is an asynchronous game logic thread, which runs at a lower frequency - that frequency can be set by whatever game implements the engine, but typical examples are 5 Hz or 10 Hz. This thread deals with all the game physics, AI and complex runtime calculations - all the heavy lifting in terms of game logic. It is also the thread that assembles what are referred to as "frames", and dispatches them to the renderer thread's queue.
A frame is basically a comprehensive temporal scene graph description, stating when it should begin and when it should end (as absolute timestamps), and all the entities it contains - just in terms of their 3D position and orientation, and their model buffer description. The model buffer description itself is a container structure describing in generic terms its vertex and index buffer IDs (if the buffer has already been uploaded to OpenGL), what shader it uses, what attributes that shader hooks to, etc.
At this point we get into what makes the Armchair Engine different from other game engines out there. The rendering thread always has a frame queued up from the game logic thread, so there's 1/10th of a second (or 1/5th of a second, or any other value the specific game chooses) delay from that thread completing a tick (and generating and passing the next frame), to rendering taking place. This is effectively a smoothing process, and the rendering thread (that runs at an arbitrary display framerate) simply interpolates between the previous frame and the next frame, for all entity positions, spherically interpolates ("slerps") all orientations, and interpolates all uniform attributes for the shaders. For some models, this also allows per-vertex interpolation without having to feed those in realtime to OpenGL: the logic thread can generate the start and end positions for each vertex in a model, and those are both uploaded to OpenGL along with a shader attribute for frame progress, allowing smooth movement of even very complex models.
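To make the interpolation step concrete, here is a minimal sketch of how a render thread might blend between two queued frames. The `Vec3` and `Quat` types and the function names are assumptions made for illustration; the engine's own maths types are not shown in this text.

```cpp
#include <cmath>

// Hypothetical minimal maths types for the sketch.
struct Vec3 { float x, y, z; };
struct Quat { float w, x, y, z; };

// Fraction of the way from the frame's begin timestamp to its end, clamped to [0, 1].
inline float frameProgress(double now, double begin, double end) {
    if (end <= begin) return 1.0f;
    const double t = (now - begin) / (end - begin);
    return static_cast<float>(t < 0.0 ? 0.0 : (t > 1.0 ? 1.0 : t));
}

// Linear interpolation for positions (uniform values can be blended the same way).
inline Vec3 lerp(const Vec3& a, const Vec3& b, float t) {
    return {a.x + (b.x - a.x) * t, a.y + (b.y - a.y) * t, a.z + (b.z - a.z) * t};
}

// Spherical linear interpolation between two unit quaternions.
inline Quat slerp(const Quat& a, Quat b, float t) {
    float dot = a.w * b.w + a.x * b.x + a.y * b.y + a.z * b.z;
    if (dot < 0.0f) {  // negate one end so we travel along the shorter arc
        b = {-b.w, -b.x, -b.y, -b.z};
        dot = -dot;
    }
    if (dot > 0.9995f) {  // nearly parallel: normalised lerp avoids dividing by ~0
        Quat r{a.w + (b.w - a.w) * t, a.x + (b.x - a.x) * t,
               a.y + (b.y - a.y) * t, a.z + (b.z - a.z) * t};
        const float len = std::sqrt(r.w * r.w + r.x * r.x + r.y * r.y + r.z * r.z);
        return {r.w / len, r.x / len, r.y / len, r.z / len};
    }
    const float theta = std::acos(dot);
    const float s = std::sin(theta);
    const float wa = std::sin((1.0f - t) * theta) / s;
    const float wb = std::sin(t * theta) / s;
    return {wa * a.w + wb * b.w, wa * a.x + wb * b.x,
            wa * a.y + wb * b.y, wa * a.z + wb * b.z};
}
```

For the per-vertex case, the same progress value would be handed to the shader, which blends the uploaded start and end positions itself, so the CPU only computes one number per frame.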
Threads passing functions
However, here's where the complexity increases. The game logic thread doesn't do the uploading to OpenGL either, as that would add unnecessary stalls and risk overrunning its allotted tick time, especially if heavy game physics or AI are in use. Instead, for each model the game wants to use in a future frame, it dynamically creates a function which will itself generate the model - creating the content for the vertex buffer using whatever process that particular game wants to use. That function is passed to the OpenGL uploader queue and wakes the uploader thread, which is only woken on demand when there are one or more entries in the queue - it then executes the stored function, whose purpose is to fill a vertex buffer.
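A queue of stored functions served by a thread that sleeps until work arrives can be sketched with a mutex and a condition variable. This is an illustration under assumptions, not the engine's code: `UploaderQueue`, `ModelTask` and `UploadFn` are hypothetical names, and the `upload` callback stands in for the real OpenGL upload call.

```cpp
#include <condition_variable>
#include <cstddef>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

using VertexData = std::vector<float>;
using ModelTask = std::function<VertexData()>;           // fills a vertex buffer
using UploadFn = std::function<void(const VertexData&)>; // stand-in for the GL upload

class UploaderQueue {
public:
    explicit UploaderQueue(UploadFn upload)
        : upload_(std::move(upload)), worker_([this] { run(); }) {}

    // Drains any remaining tasks, then stops the worker thread.
    ~UploaderQueue() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            stopping_ = true;
        }
        wake_.notify_one();
        worker_.join();
    }

    // Called from the game logic thread: enqueue a task and wake the uploader.
    void submit(ModelTask task) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            tasks_.push(std::move(task));
        }
        wake_.notify_one();
    }

private:
    void run() {
        std::unique_lock<std::mutex> lock(mutex_);
        for (;;) {
            // Sleep until there is at least one entry, or we are shutting down.
            wake_.wait(lock, [this] { return stopping_ || !tasks_.empty(); });
            if (tasks_.empty()) return;  // stopping and nothing left to do
            ModelTask task = std::move(tasks_.front());
            tasks_.pop();
            lock.unlock();               // never hold the lock while generating
            upload_(task());             // run the stored function, then upload
            lock.lock();
        }
    }

    UploadFn upload_;
    std::mutex mutex_;
    std::condition_variable wake_;
    std::queue<ModelTask> tasks_;
    bool stopping_ = false;
    std::thread worker_;  // declared last so every other member exists before it starts
};
```

The lock is released while a task runs, so the game logic thread can keep submitting work without waiting on a slow generator.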
It's worth noting that the Armchair Engine doesn't need to specify a single vertex format, or even be aware of the vertex format the game is choosing to use, or of the particular shader the model will be used with. The vertex format is described dynamically (avoiding polymorphism in this case) using a generic description structure, which can be passed around and decoded later by the rendering thread.
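One possible shape for such a generic description is a plain list of attribute records from which stride and offsets can be derived. The structure below is a guess for illustration only; `VertexFormat`, `VertexAttribute` and `ComponentType` are hypothetical names, and the engine's real structure is not given in this text.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical component types; a real description would likely support more.
enum class ComponentType { Float32, UInt8 };

struct VertexAttribute {
    std::string name;    // shader attribute this data hooks to
    ComponentType type;  // type of each component
    int components;      // number of components, 1 to 4
    bool normalised;     // whether integer data is mapped to [0, 1]
};

// A plain data description: no virtual functions, so it can be copied between threads.
struct VertexFormat {
    std::vector<VertexAttribute> attributes;

    static std::size_t sizeOf(ComponentType t) {
        return t == ComponentType::Float32 ? 4 : 1;
    }

    // Byte offset of attribute `index` within one interleaved vertex.
    std::size_t offsetOf(std::size_t index) const {
        std::size_t offset = 0;
        for (std::size_t i = 0; i < index && i < attributes.size(); ++i) {
            offset += sizeOf(attributes[i].type) *
                      static_cast<std::size_t>(attributes[i].components);
        }
        return offset;
    }

    // Total size in bytes of one interleaved vertex.
    std::size_t stride() const { return offsetOf(attributes.size()); }
};
```

A renderer decoding this could pass each attribute's component count, type, normalised flag, stride and offset to `glVertexAttribPointer` without knowing anything about the game's vertex struct.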
At this point the uploader thread uploads the resulting model to OpenGL on a second, hidden window's context, so it doesn't interfere with the rendering thread in any way, or cause any stalls. While this process is pretty fast, even for large buffers, we still want it to complete within 1/10th of a second in the worst-case scenario; otherwise we risk missing the deadline for the next frame we're interpolating to, and we hold up the queue of other models to upload, which should be ready to go right away. As a result, we can't have long-running model generation logic, or loading from disk or from network, executing in that function. We can't afford to include such long-running processes in the game logic thread directly, either. So in special cases like these, we want to carry out long-running tasks asynchronously again, separate from either of these threads - ideally on a dedicated long-lived thread, or a thread pool, if the tasks are recurring ones.
One example of how this is done is the procedural terrain generator, which is a long-running and intensive system; generating a chunk patch of terrain can take over a second, depending on its complexity. Of course that cannot be done synchronously, but as we can predict well ahead of time when a piece of terrain will be needed, it's a perfect candidate for asynchronous generation. In this case, the terrain generator has its own model generator thread, which belongs to the planet generator module; it is long-running and standalone, and woken only when it's wanted. Here too a function is generated first, but its job is in turn to produce another function to feed to the render uploader queue, as described above. This first function is passed to the model generator thread, which executes it asynchronously; the result is just a minimal function which owns the terrain vertex description data. That second function is the one executed by the OpenGL uploader thread, which uploads the data to an OpenGL vertex buffer as quickly as possible. The uploader thread is never held up generating the model, and may have uploaded many other buffers while the planet generator was running.
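The "function that returns a function" arrangement can be sketched with nested closures. Everything here is assumed for illustration: the type aliases, `makeTerrainTask`, and the flat placeholder grid that stands in for real terrain generation.

```cpp
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

using VertexData = std::vector<float>;
using UploadTask = std::function<VertexData()>;    // cheap: run by the uploader thread
using GenerateTask = std::function<UploadTask()>;  // slow: run by the generator thread

// Build the outer (slow) function. Calling it does the expensive generation and
// returns a minimal inner function that merely owns the finished vertex data.
inline GenerateTask makeTerrainTask(int resolution) {
    return [resolution]() -> UploadTask {
        // Slow part: placeholder grid of x, y, height triples (height is always 0 here).
        VertexData vertices;
        vertices.reserve(static_cast<std::size_t>(resolution) *
                         static_cast<std::size_t>(resolution) * 3);
        for (int y = 0; y < resolution; ++y) {
            for (int x = 0; x < resolution; ++x) {
                vertices.push_back(static_cast<float>(x));
                vertices.push_back(static_cast<float>(y));
                vertices.push_back(0.0f);
            }
        }
        // Fast part: a one-shot closure that hands the data over without regenerating it.
        return [data = std::move(vertices)]() mutable { return std::move(data); };
    };
}
```

The generator thread would call the outer function and push its result onto the uploader queue; the uploader then only pays for the move and the upload itself.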
Lifetime-based garbage collection
Once a buffer has been uploaded to OpenGL by the uploader thread, it's given a lifetime. Every time the buffer is used in a renderable that gets rendered, that lifetime timer is reset, and the buffer is kept alive. The lifetime can be arbitrary, set by the game depending on the purpose of the buffer - realistic values are anything from a hundred milliseconds to tens of minutes. This is the first level of caching used by the Armchair Engine.
An intermittent garbage collector runs on those buffers, and reaps those that have passed their expiry lifetime without being used. Alternatively, we may start to reap buffers if we begin to run out of OpenGL buffer slots; the number of slots is dependent on the graphics hardware, and determined at runtime. In this situation, the garbage collector will become more aggressive, and begin to reap those buffers with the longest idle lifetimes first - i.e., those buffers that have not been rendered to the user's screen for the longest time.
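The two reaping rules (expiry, then longest-idle-first under pressure) might look like the sketch below. The `BufferCache` class, its method names and the `maxBuffers` parameter are assumptions for illustration; a real collector would also delete the underlying OpenGL buffer objects.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical bookkeeping for one uploaded buffer; times are in seconds.
struct BufferEntry {
    double lastUsed;  // when the buffer was last rendered
    double lifetime;  // how long it may sit idle before expiring
};

class BufferCache {
public:
    void add(std::uint32_t id, double now, double lifetime) {
        entries_[id] = BufferEntry{now, lifetime};
    }

    // Called whenever the buffer is rendered: resets its idle timer.
    void touch(std::uint32_t id, double now) {
        auto it = entries_.find(id);
        if (it != entries_.end()) it->second.lastUsed = now;
    }

    bool contains(std::uint32_t id) const { return entries_.count(id) != 0; }

    // Reap expired buffers; if still over `maxBuffers`, evict the longest-idle first.
    std::vector<std::uint32_t> collect(double now, std::size_t maxBuffers) {
        std::vector<std::uint32_t> reaped;
        for (auto it = entries_.begin(); it != entries_.end();) {
            if (now - it->second.lastUsed > it->second.lifetime) {
                reaped.push_back(it->first);
                it = entries_.erase(it);
            } else {
                ++it;
            }
        }
        if (entries_.size() > maxBuffers) {
            std::vector<std::pair<double, std::uint32_t>> byLastUse;
            for (const auto& kv : entries_) {
                byLastUse.emplace_back(kv.second.lastUsed, kv.first);
            }
            std::sort(byLastUse.begin(), byLastUse.end());  // oldest use first
            const std::size_t excess = entries_.size() - maxBuffers;
            for (std::size_t i = 0; i < excess; ++i) {
                entries_.erase(byLastUse[i].second);
                reaped.push_back(byLastUse[i].second);
            }
        }
        return reaped;
    }

private:
    std::unordered_map<std::uint32_t, BufferEntry> entries_;
};
```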
Earlier we mentioned the buffer cache as the first level of caching used by the Armchair Engine. The planet terrain generator also has two further levels of caching, to improve efficiency through the whole pipeline.
Cache level 1
First, in the terrain generator thread, the generator creates a group of chunks which together form a chunk patch. Each chunk in the chunk patch is generated asynchronously, on a thread pool the size of which depends on the number of threads (virtual cores) available to the running system at the time. We use std::async and std::future to implement this.
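As a rough sketch of that fan-out with `std::async` and `std::future`: the `Chunk` type, `generateChunk` and `generatePatch` below are hypothetical placeholders. This simplified version launches one asynchronous task per chunk and does not cap the count to the available hardware threads, which the engine is described as doing.

```cpp
#include <future>
#include <utility>
#include <vector>

// Hypothetical result of generating one chunk.
struct Chunk {
    int index;
    std::vector<float> triangles;
};

// Placeholder for the expensive per-chunk generation work.
inline Chunk generateChunk(int index) {
    return Chunk{index, std::vector<float>(9, static_cast<float>(index))};
}

// Launch every chunk of a patch asynchronously, then collect the results in order.
inline std::vector<Chunk> generatePatch(int chunkCount) {
    std::vector<std::future<Chunk>> pending;
    pending.reserve(static_cast<std::size_t>(chunkCount));
    for (int i = 0; i < chunkCount; ++i) {
        pending.push_back(std::async(std::launch::async, generateChunk, i));
    }
    std::vector<Chunk> patch;
    patch.reserve(pending.size());
    for (auto& future : pending) {
        patch.push_back(future.get());  // blocks until that chunk is finished
    }
    return patch;
}
```

A version closer to the description would query `std::thread::hardware_concurrency()` and launch the chunks in batches of that size.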
Cache level 2
Normally in the process of generating a chunk, the generator will query a number of points - say 100,000 for example - distributed pseudo-randomly (but in a repeatable fashion) on the surface. It will then fetch the properties for each point, by calling a special terrain point info callback function. This callback is set by the game in a way specific to that game, and contains the noise functions and any other logic - often quite complex - required to generate the terrain procedurally, one point at a time. The generator then carries out a Delaunay triangulation on the resulting points, using the sweep-line algorithm, to generate the optimal terrain mesh for that chunk. The resulting mesh is cached; this is the second level of caching.
After that, the async thread pool is reaped, the resulting meshes are merged and the generator produces a single triangle list with properties for each triangle. But before trying to generate a chunk, each async future checks a cache map using a hash of its corner coordinates as a key, to see if we have a cache available of its resultant triangle data; if so, it just pulls that out of the cache map instead of generating it. The reap and merge then proceed as normal, regardless of partial cache hit, total cache hit or total cache miss.
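The cache check each future performs could be sketched as a map keyed on a hash of the corner coordinates. The names `ChunkCorners`, `hashCorners` and `TriangleCache` are hypothetical, integer grid coordinates are assumed, and the FNV-1a-style mixing is just one plausible choice of hash, not necessarily the engine's.

```cpp
#include <cstdint>
#include <functional>
#include <initializer_list>
#include <memory>
#include <mutex>
#include <unordered_map>
#include <vector>

// Hypothetical chunk corners, assumed here to be integer grid coordinates.
struct ChunkCorners { std::int32_t x0, y0, x1, y1; };
using TriangleData = std::vector<float>;

// FNV-1a-style mixing, applied per 32-bit coordinate rather than per byte.
inline std::uint64_t hashCorners(const ChunkCorners& c) {
    std::uint64_t h = 0xcbf29ce484222325ull;
    for (std::int32_t v : {c.x0, c.y0, c.x1, c.y1}) {
        h ^= static_cast<std::uint32_t>(v);
        h *= 0x100000001b3ull;
    }
    return h;
}

class TriangleCache {
public:
    // Return the cached triangles for these corners, or generate and store them.
    // Keyed on the hash alone, so hash collisions are not handled in this sketch.
    std::shared_ptr<const TriangleData> getOrGenerate(
            const ChunkCorners& corners,
            const std::function<TriangleData()>& generate) {
        const std::uint64_t key = hashCorners(corners);
        {
            std::lock_guard<std::mutex> lock(mutex_);
            auto it = map_.find(key);
            if (it != map_.end()) return it->second;  // cache hit
        }
        // Cache miss: generate outside the lock so other chunks are not blocked.
        auto data = std::make_shared<const TriangleData>(generate());
        std::lock_guard<std::mutex> lock(mutex_);
        return map_.emplace(key, data).first->second;
    }

private:
    std::mutex mutex_;
    std::unordered_map<std::uint64_t, std::shared_ptr<const TriangleData>> map_;
};
```

Because the lookup and the fallback generation sit behind one call, the reap and merge stage can treat hits and misses identically, as described above.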
Another garbage collector runs on these triangle caches, which also have an expiry lifetime - typically a couple of minutes. Again there is active resource monitoring - in this case the engine monitors system memory usage (each chunk can be 200 MB to 500 MB, so they can fill RAM fast), and will reap the oldest caches if it looks like we're close to exhausting physical system memory. The more RAM the player has, the more they can cache, and the faster regenerating terrain will be.
Cache level 3
Finally, the conglomerate chunk patch itself is cached; if a player returns to an area that the same chunk patch would be centred on, within that patch's lifetime, then we can simply call up the OpenGL buffer directly (assuming it hasn't expired). This third level of caching can avoid dispatch to each individual chunk generator altogether, and skip the mesh merging stage too - which can save a further hundred or so milliseconds on each chunk generation.
There is an additional level of complexity in the way the Armchair Engine handles models and rendering, because each model can have a "fallback". In fact, a fallback is just a reference ID for another model to be rendered in place of the first - selected if the buffer isn't available, for whatever reason, at the time of rendering. Perhaps it has taken too long to generate, or is still being loaded from disk or a network resource, and we need a stand-in. Or maybe the game logic thread has just increased the level of detail of a model. In that case the optimal path is to continue rendering the lower-detail model in its place until the asynchronous loader thread has finished loading the higher-detail buffer, and to render the higher-detail renderable in preference as soon as it's available. The fallback mechanism handles this case well - but it's also much more powerful than that. Each fallback is actually a list, or queue, of models to fall back to - and each of those models can also have a queue of fallbacks of arbitrary size, and each of those so on in turn, allowing for very complex render priority hierarchies, if desired.
The render thread has a configurable limit to how many fallback steps it'll follow, before falling through, and just not rendering that object at all - so regardless of the complexity of this fallback graph, there's a firmly bounded maximum stall time.
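A bounded walk over such a fallback graph might be sketched as follows. The `Model` record, `ModelTable` alias and both function names are hypothetical; a depth-first walk in priority order and a single shared step budget are assumptions about how the limit is applied.

```cpp
#include <cstdint>
#include <unordered_map>
#include <vector>

using ModelId = std::uint32_t;

// Hypothetical model record: whether its buffer is ready, plus ordered fallbacks.
struct Model {
    bool bufferReady;
    std::vector<ModelId> fallbacks;  // highest priority first
};

using ModelTable = std::unordered_map<ModelId, Model>;

// Depth-first walk in priority order. Each fallback followed costs one step from
// a shared budget, so the walk terminates even if the graph contains cycles.
inline bool resolve(const ModelTable& models, ModelId id, int& stepsLeft, ModelId& out) {
    auto it = models.find(id);
    if (it == models.end()) return false;
    if (it->second.bufferReady) {
        out = id;
        return true;
    }
    for (ModelId fallback : it->second.fallbacks) {
        if (stepsLeft <= 0) return false;  // budget exhausted: give up
        --stepsLeft;
        if (resolve(models, fallback, stepsLeft, out)) return true;
    }
    return false;
}

// Pick the model to draw in place of `wanted`, or return false to draw nothing.
inline bool pickRenderable(const ModelTable& models, ModelId wanted,
                           int maxSteps, ModelId& out) {
    int stepsLeft = maxSteps;
    return resolve(models, wanted, stepsLeft, out);
}
```

With a generous budget the walk finds the first ready model in priority order; with a budget too small to reach it, the object is simply skipped for that frame.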
In summary, the main innovative features of the Armchair Engine to date are:
- The ability to summon up huge, complex, procedurally generated models at runtime without causing any framerate reduction, and to switch between them and fall back instantly and seamlessly, without any blanks or loading gaps whatsoever.
- The ability to process complex game logic and physics without slowing down the render thread in any way.
- The ability to queue up massive numbers of dynamically moving objects to be smoothly interpolated between game logic ticks, without having to do that computation every frame, while still rendering smooth motion - including per-vertex animation - at any arbitrary framerate.
- The ability to smoothly interpolate between all properties, including shader uniforms and even vertex positions and colours, allowing fully dynamic models which are generated procedurally at runtime, but still don't take up any CPU time on the renderer thread.