Concept · Rendering
Bind Groups
A shader is a program marooned on another processor — it cannot see a single variable of the engine that launched it. Bind groups are how data reaches it: bundles of buffers and textures, plugged into numbered slots before each draw. How you carve those bundles up turns out to be one of the quiet performance decisions at the heart of every renderer, FloraForge included.
The two-processor problem
When the engine wants to draw a chunk of terrain, the shader doing the drawing needs to know things only the engine knows: where the camera is, what time of day it is, which way the sun points, what colour the fog should be. None of that can be passed as ordinary function arguments, because nobody calls the shader directly — the GPU does, thousands of times, on its own schedule, in its own memory. Everything the shader needs must be placed in GPU-visible resources ahead of time and wired to named slots in the shader's interface. WebGPU makes that wiring fully explicit, and the unit of wiring is the bind group.
Three kinds of resources
The things you can plug in come in a few flavours, each tuned for a different access pattern:
- Uniform buffers — small, fixed-size structs of constants, read-only and broadcast identically to every thread. The camera matrix and the clock live here. The GPU caches them aggressively precisely because they cannot change mid-draw.
- Storage buffers — big raw arrays the shader can index freely and, if declared read_write, write back to. This is what the terrain compute shader uses for its height inputs and vertex output.
- Textures and samplers — images plus the recipe for reading them (filtering, wrapping, comparison). They travel as a pair: FloraForge's terrain binds its biome colour atlas with an ordinary sampler, and its shadow map with a special comparison sampler that does depth tests in hardware.
Bundles, and the contracts behind them
You don't bind these resources one at a time. WebGPU asks you to define a bind group layout first — a contract that says "slot 0 is a uniform buffer visible to the vertex and fragment stages, slot 1 is a texture…" — and then create bind groups: immutable bundles of actual resources that satisfy that contract. The split looks bureaucratic but is the source of the speed: because the layout is known when the pipeline is built, and the group is validated once when it's created, binding a group at draw time is nearly free — the expensive checking already happened. Here is FloraForge creating its per-frame group, layout first, bundle second:
    let layout = device.create_bind_group_layout(&wgpu::BindGroupLayoutDescriptor {
        label: Some("frame-bind-group-layout"),
        entries: &[wgpu::BindGroupLayoutEntry {
            binding: 0,
            visibility: wgpu::ShaderStages::VERTEX_FRAGMENT,
            ty: wgpu::BindingType::Buffer {
                ty: wgpu::BufferBindingType::Uniform,
                // …
            },
            count: None,
        }],
    });

    // …create the uniform buffer holding a FrameUniform…

    let bind_group = device.create_bind_group(&wgpu::BindGroupDescriptor {
        label: Some("frame-bind-group"),
        layout: &layout,
        entries: &[wgpu::BindGroupEntry {
            binding: 0,
            resource: buffer.as_entire_binding(),
        }],
    });
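The "validate once, bind cheaply" idea can be sketched without a GPU at all. The toy model below is not wgpu and not FloraForge code: ResourceKind, Layout, BindGroup, create_bind_group and set_bind_group are hypothetical names invented for illustration. It only shows where the checking cost lands: in creation, not in the draw-time call.

```rust
// Toy model of the layout / bind-group split. No GPU and no wgpu types here:
// every name below is hypothetical and exists only for illustration.

#[derive(Clone, Copy, Debug, PartialEq)]
enum ResourceKind {
    UniformBuffer,
    Texture,
    Sampler,
}

/// The contract: which kind of resource each numbered binding expects.
struct Layout {
    slots: Vec<(u32, ResourceKind)>,
}

/// An immutable bundle that has already been checked against a layout.
#[derive(Debug)]
struct BindGroup {
    resources: Vec<(u32, ResourceKind)>,
}

/// All the checking happens here, once, when the bundle is created.
fn create_bind_group(
    layout: &Layout,
    resources: Vec<(u32, ResourceKind)>,
) -> Result<BindGroup, String> {
    if resources.len() != layout.slots.len() {
        return Err(format!(
            "layout has {} bindings, got {} resources",
            layout.slots.len(),
            resources.len()
        ));
    }
    for &(binding, expected) in &layout.slots {
        match resources.iter().find(|(b, _)| *b == binding) {
            Some(&(_, kind)) if kind == expected => {}
            Some(&(_, kind)) => {
                return Err(format!(
                    "binding {binding}: expected {expected:?}, got {kind:?}"
                ))
            }
            None => return Err(format!("binding {binding} is missing")),
        }
    }
    Ok(BindGroup { resources })
}

/// Draw-time binding does no validation: it only records which bundle
/// currently occupies the slot.
fn set_bind_group<'a>(
    slots: &mut [Option<&'a BindGroup>; 4],
    index: usize,
    group: &'a BindGroup,
) {
    slots[index] = Some(group);
}
```

A bundle that puts a sampler where the layout promised a texture is rejected by create_bind_group, so set_bind_group never has to look inside the bundle it is handed.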
The cost model: sort by how often it changes
A frame is a long sequence of draws, and between draws the engine swaps bind groups in and out of four numbered slots (WebGPU guarantees at least four). Each swap is cheap, but it isn't free, and the swaps add up across hundreds of draw calls. The classic answer is to sort your data by how often it changes and give each rate its own group. Things that are true for the whole frame — the camera, the clock — go in group 0, bound once and then left alone. Things that change per material — lighting and fog colours — go in group 1. Per-object textures sit higher still. The common case, "same camera, fifty different surfaces," then touches only the cheap, small, frequently-swapped groups while the stable ones stay plugged in.
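To put rough numbers on that, here is a small counting model, again with no GPU involved. SlotTracker, Draw, count_binds, naive_binds and example_frame are hypothetical names, and the frame of 50 draws with two materials is made up for the example. The tracker issues a bind only when a slot's contents actually change, which is what a frequency-sorted renderer achieves by construction.

```rust
/// Hypothetical stand-in for a render pass: remembers which group is plugged
/// into each of the four slots and counts the binds that change anything.
struct SlotTracker {
    bound: [Option<u32>; 4],
    binds_issued: usize,
}

impl SlotTracker {
    fn new() -> Self {
        SlotTracker { bound: [None; 4], binds_issued: 0 }
    }

    /// Bind `group_id` at `slot`, skipping the call if it is already there.
    fn set_bind_group(&mut self, slot: usize, group_id: u32) {
        if self.bound[slot] != Some(group_id) {
            self.bound[slot] = Some(group_id);
            self.binds_issued += 1;
        }
    }
}

/// One draw call, described by the ids of the groups it needs.
struct Draw {
    frame: u32,    // slot 0: changes once per frame
    material: u32, // slot 1: changes per material
    textures: u32, // slot 2: changes per object
}

/// Binds issued when stable groups are left plugged in between draws.
fn count_binds(draws: &[Draw]) -> usize {
    let mut tracker = SlotTracker::new();
    for draw in draws {
        tracker.set_bind_group(0, draw.frame);
        tracker.set_bind_group(1, draw.material);
        tracker.set_bind_group(2, draw.textures);
    }
    tracker.binds_issued
}

/// Binds issued when every draw rebinds all three groups regardless.
fn naive_binds(draws: &[Draw]) -> usize {
    draws.len() * 3
}

/// A made-up frame: one camera, two materials, fifty distinct surfaces,
/// already sorted so draws sharing a material are adjacent.
fn example_frame() -> Vec<Draw> {
    (0..50u32)
        .map(|i| Draw {
            frame: 0,
            material: if i < 25 { 10 } else { 11 },
            textures: 100 + i,
        })
        .collect()
}
```

For this frame the sorted scheme issues 1 + 2 + 50 = 53 binds against 150 for the rebind-everything approach. The saving depends on draws that share a material being adjacent, which is why the sort order matters as much as the grouping.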
FloraForge's split
Every material in the engine follows the same two-group convention for its uniform data. Group 0 is per-frame: a single FrameUniform struct holding the camera's view-projection matrices, the camera position, a packed time vector (elapsed seconds, hour of day, underwater factor) and the shadow parameters. Group 1 is per-material: a MaterialUniform with the light direction, ambient level, fog settings and the sun and sky colours that the day/night cycle recomputes as it goes. Textures and their samplers sit in groups 2 and 3. On the shader side the declarations read like a mirror of that design:
    struct FrameUniform {
        view_proj: mat4x4<f32>,
        inv_view_proj_no_translation: mat4x4<f32>,
        light_view_proj: mat4x4<f32>,
        camera_position: vec4<f32>,
        time: vec4<f32>,
        shadow_params: vec4<f32>,
        view_proj_no_translation: mat4x4<f32>,
    };

    struct MaterialUniform {
        light_direction: vec4<f32>,
        ambient: vec4<f32>,
        fog_color: vec4<f32>,
        fog_params: vec4<f32>,
        sun_color: vec4<f32>,
        sky_zenith: vec4<f32>,
        sky_horizon: vec4<f32>,
    };

    @group(0) @binding(0) var<uniform> frame: FrameUniform;
    @group(1) @binding(0) var<uniform> material: MaterialUniform;
    @group(2) @binding(0) var terrain_atlas: texture_2d<f32>;
    @group(2) @binding(1) var terrain_sampler: sampler;
    @group(3) @binding(0) var shadow_map: texture_depth_2d;
    @group(3) @binding(1) var shadow_sampler: sampler_comparison;
Those vec4 fields where a single float would do aren't waste; they're alignment padding. Uniform buffers follow strict layout rules (vec3 and vec4 values land on 16-byte boundaries, and a struct's size is rounded up to a multiple of its alignment), and the engine's matching Rust structs in src/renderer_wgpu/material.rs use the same padded shapes so the bytes copied across line up exactly with what the shader expects. The spare lanes get used, too: time packs three values (elapsed seconds, hour of day and the underwater factor) into one slot.
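As a sketch of what such a CPU-side mirror can look like, the structs below reproduce the WGSL field order using plain f32 arrays. The real definitions in src/renderer_wgpu/material.rs are not shown here, so the field types, the derives, and the helpers time_offset and pack_time (including its lane order and the unused fourth lane) are assumptions made for illustration.

```rust
use std::mem::size_of;

/// Assumed CPU-side mirror of the WGSL FrameUniform. `repr(C)` keeps the
/// fields in declaration order; every field is a multiple of 16 bytes, so
/// no hidden padding is needed to match the shader's layout.
#[repr(C)]
#[derive(Clone, Copy, Debug, Default)]
struct FrameUniform {
    view_proj: [[f32; 4]; 4],                    // mat4x4<f32>, 64 bytes
    inv_view_proj_no_translation: [[f32; 4]; 4], // mat4x4<f32>, 64 bytes
    light_view_proj: [[f32; 4]; 4],              // mat4x4<f32>, 64 bytes
    camera_position: [f32; 4],                   // vec4<f32>, 16 bytes
    time: [f32; 4],                              // vec4<f32>, 16 bytes
    shadow_params: [f32; 4],                     // vec4<f32>, 16 bytes
    view_proj_no_translation: [[f32; 4]; 4],     // mat4x4<f32>, 64 bytes
}

/// Assumed CPU-side mirror of the WGSL MaterialUniform: seven vec4 fields.
#[repr(C)]
#[derive(Clone, Copy, Debug, Default)]
struct MaterialUniform {
    light_direction: [f32; 4],
    ambient: [f32; 4],
    fog_color: [f32; 4],
    fog_params: [f32; 4],
    sun_color: [f32; 4],
    sky_zenith: [f32; 4],
    sky_horizon: [f32; 4],
}

/// Byte offset of `time` inside FrameUniform, measured on a real value.
fn time_offset() -> usize {
    let uniform = FrameUniform::default();
    let base = &uniform as *const FrameUniform as usize;
    let field = &uniform.time as *const [f32; 4] as usize;
    field - base
}

/// Hypothetical helper: pack the three time-related values into one vec4.
/// The lane order is an assumption; the fourth lane is left at zero.
fn pack_time(elapsed_seconds: f32, hour_of_day: f32, underwater: f32) -> [f32; 4] {
    [elapsed_seconds, hour_of_day, underwater, 0.0]
}

/// Total bytes each struct occupies on the CPU side.
fn uniform_sizes() -> (usize, usize) {
    (size_of::<FrameUniform>(), size_of::<MaterialUniform>())
}
```

With this layout FrameUniform comes to 4 × 64 + 3 × 16 = 304 bytes and MaterialUniform to 7 × 16 = 112 bytes, and time sits at byte offset 208, after three matrices and one vector. Size checks like these are cheap to keep in a unit test, so a field added on one side but not the other fails loudly instead of silently shifting every value after it.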
The render loop in src/renderer_wgpu/world.rs then plays the frequency game exactly as advertised. At the top of the frame the engine writes a fresh FrameUniform into group 0's buffer; each pass binds frame_bg at slot 0 and the relevant material at slot 1, and the terrain, water and river passes all share the same material bundle: one set_bind_group apiece, and on to the draw calls. A single queue.write_buffer of 304 bytes (four 64-byte matrices plus three 16-byte vectors) refreshes the camera, clock and shadows for every shader in the engine at once, because they all share the same group 0 bundle.