Concept · Rendering
Instanced Rendering
A FloraForge forest can hold thousands of trees, and every one of them is a detailed procedural mesh. Drawn naively — one command per tree — that forest would bury the CPU in overhead before the GPU broke a sweat. Instancing is the technique that collapses it all: describe the tree once, hand the GPU a list of where and how big, and let it stamp out the whole forest in a single command.
The draw-call problem
Every draw command an engine submits carries a fixed cost that has nothing to do with how much it draws. The CPU has to validate the call, record it into a command buffer, and the GPU's front end has to set up state before the first triangle moves. For one big mesh that overhead is noise. For ten thousand tiny meshes it is the workload: the CPU spends the frame dictating commands while the GPU finishes each little tree instantly and waits for the next order.
The waste is especially galling because the trees are nearly identical. Two oaks differ only in position, a turn around their trunk, and a bit of size — yet the naive approach re-submits the full mesh-drawing ceremony for each. What you want to send is the shape once and the differences many times.
One mesh, a list of differences
That is exactly what instanced rendering does. The mesh — vertices, normals, colours — is uploaded to GPU memory a single time, as a prototype. Alongside it goes an instance buffer: a tightly packed array with one small record per copy, holding only what makes that copy unique. One draw call then says "draw this mesh n times", and the GPU walks the record list itself, re-running the mesh through the vertex shader once per record.
draw_indexed call replays the mesh once per
record.
What an instance record holds
FloraForge's record is deliberately tiny. Here it is, verbatim — the struct that every tree, shrub, house and road sign in the world is positioned by:
#[repr(C)]
#[derive(Clone, Copy, Debug, Zeroable, Pod)]
pub struct InstanceData {
pub position: [f32; 3],
pub rotation_y: f32,
pub scale: [f32; 3],
pub tilt: f32,
pub color: [f32; 4],
}
A position in the world, a rotation around the vertical axis, a scale, a
lean angle, and an RGBA tint — 48 bytes per copy. The
tilt field (which doubles as padding so the colour starts on
a 16-byte boundary) leans dead snags a few degrees off vertical; living
plants leave it at zero. #[repr(C)] plus the Pod
derive guarantee the struct's bytes can be copied straight into a GPU
buffer with no translation. The tint is how a shrub billboard gets its
species' leaf colour — and how a dead plant gets its weathered grey-brown —
while the scale is where the growth stage shows up: seedlings are stamped
at 15% size, young plants at 50%, dead snags at 85% and shrinking, from
meshes of the same species.
Two streams into one shader
The GPU is told about both buffers through vertex buffer layouts
with different step modes: slot 0 (the mesh) advances per
vertex, slot 1 (the records) advances per
instance. From the shader's point of view they simply merge into
one input struct — the @location numbers carry on from the
mesh attributes into the instance attributes:
struct VertexInput {
// Per-vertex (slot 0)
@location(0) position: vec3<f32>,
@location(1) normal: vec3<f32>,
@location(2) vert_color: vec3<f32>,
// Per-instance (slot 1)
@location(3) inst_position: vec3<f32>,
@location(4) inst_rotation_y: f32,
@location(5) inst_scale: vec3<f32>,
@location(6) inst_color: vec4<f32>,
};
When the GPU invokes the vertex shader for vertex 412 of copy 87, it has already paired the right mesh vertex with the right instance record. The shader's only job is to apply the record — scale, then spin around Y, then move into place:
let c = cos(input.inst_rotation_y);
let s = sin(input.inst_rotation_y);
// Scale, then rotate around Y, then translate
let scaled = input.position * input.inst_scale;
let rotated = vec3<f32>(
scaled.x * c - scaled.z * s,
scaled.y,
scaled.x * s + scaled.z * c,
);
let world_pos = rotated + input.inst_position;
out.clip_position =
frame.view_proj_no_translation * vec4<f32>(world_pos - frame.camera_position.xyz, 1.0);
That last line hides a subtlety worth noticing: the position is made camera-relative before projection. FloraForge's world is huge, and 32-bit floats lose precision far from the origin — subtracting the camera first keeps distant trees from jittering. The camera matrix itself arrives via the per-frame uniform described in Bind Groups.
One draw per species per chunk
FloraForge has eight plant species — oak, birch, spruce, willow, acacia, palm, cattail, shrub — and each gets its own prototype mesh, built by the engine's procedural plant generator rather than loaded from an art file (houses are the exception: a single GLB model). Instance records are grouped per species, per chunk, so when a chunk streams out, its vegetation's instance buffer is simply dropped with it. The render loop is then almost embarrassingly short:
// Draw each species, batching by LOD level to minimise state changes
for (i, key) in self.species_names.iter().enumerate() {
// …partition this species' visible chunks into near and far…
// Draw near chunks with hi-res mesh
if let Some(mesh) = self.models.get(key) {
pass.set_vertex_buffer(0, mesh.vertex_buffer.slice(..));
pass.set_index_buffer(mesh.index_buffer.slice(..), wgpu::IndexFormat::Uint32);
for inst in &near {
pass.set_vertex_buffer(1, inst.instance_buffer.slice(..));
pass.draw_indexed(0..mesh.index_count, 0, 0..inst.instance_count);
}
}
// …far chunks repeat the loop with a low-detail LOD mesh…
}
Each draw_indexed call's final argument is the instance range
— "this mesh, instance_count times". A chunk with two hundred
oaks costs one call. The loop also folds in two classic companions of
instancing: chunks more than 512 metres away swap to a cheaper
LOD mesh of the same species (dead plants get a third
bucket — a bark-only snag mesh cheap enough to serve every distance), and
chunks outside the camera's view are skipped entirely by
frustum culling —
the subject of the next page.
color field. Same instancing pipeline, roughly 300–400×
fewer vertices per copy.