Particle Systems
Compute-driven particles
Why Particles
Fire, smoke, sparks, rain, explosions, magic effects—so much of what makes visuals feel alive comes from particle systems. Each particle is simple: a point with position, velocity, and a limited lifespan. But thousands of them, updated and rendered in parallel, create complex emergent behavior.
Particle systems are a natural fit for the GPU. Every particle is independent. Its update logic depends only on its own state and some global forces. This is an embarrassingly parallel workload—exactly what compute shaders excel at.
Interactive: Particle System
Each particle stores position, velocity, lifetime, and appearance. Every frame, we update physics and cull dead particles.
Each slider controls a different aspect of the simulation. Notice how changing parameters affects the visual character: high gravity creates waterfalls, negative gravity makes rising smoke, wide spread angles create explosions.
Particles as Data
A particle is just a struct—a bundle of numbers that define its current state:
struct Particle {
    position: vec2<f32>,
    velocity: vec2<f32>,
    lifetime: f32,
    max_lifetime: f32,
    size: f32,
    color: vec4<f32>,
}

On the GPU, particles live in a storage buffer. Each particle occupies a fixed number of bytes, laid out contiguously. Thread 0 processes particle 0, thread 1 processes particle 1, and so on.
The choice of what data to store depends on the effect you want. A simple fire might need only position, velocity, and lifetime. A complex magic effect might track rotation, scale animation curves, texture coordinates, and custom parameters.
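To make the struct usable from a shader, it needs a binding. A minimal sketch of the bindings an update pass might use—the group/binding indices and the `SimParams` name are illustrative, not from the original:

```wgsl
// Particle storage plus per-frame simulation parameters (bindings illustrative)
@group(0) @binding(0) var<storage, read_write> particles: array<Particle>;

struct SimParams {
    gravity: vec2<f32>,
    dt: f32,
    particle_count: u32,
}
@group(0) @binding(1) var<uniform> params: SimParams;
```

Note that layout rules matter here: `vec4<f32>` requires 16-byte alignment, so the `Particle` struct above occupies 48 bytes per particle (with 4 bytes of padding before `color`), not the 40 bytes its fields alone would suggest.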
Interactive: Particle Lifecycle
Each particle carries its own state through its lifecycle: spawn → active → fading → removed.
The lifecycle is straightforward: spawn with initial values, update each frame until lifetime reaches zero, then remove or recycle. The alpha value often derives from remaining lifetime—particles fade as they die.
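Recycling is often done directly in the update shader: when lifetime expires, the particle is re-initialized at the emitter instead of being removed. A sketch, where `emitter_pos` is an assumed uniform and `spawn_velocity` is a hypothetical helper that hashes the index into a randomized direction:

```wgsl
// Recycle an expired particle in place instead of removing it.
fn respawn_if_dead(p: ptr<function, Particle>, index: u32) {
    if ((*p).lifetime <= 0.0) {
        (*p).position = emitter_pos;             // assumed uniform
        (*p).velocity = spawn_velocity(index);   // hypothetical helper
        (*p).lifetime = (*p).max_lifetime;
    }
}
```

Recycling keeps the buffer fully occupied and avoids any compaction or CPU round-trip, at the cost of a fixed particle budget.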
The Compute Update
Each frame, a compute shader updates every particle. The pattern is simple but powerful:
@compute @workgroup_size(256)
fn update(@builtin(global_invocation_id) id: vec3<u32>) {
    let index = id.x;
    if (index >= particle_count) { return; }

    var p = particles[index];
    // Skip dead particles
    if (p.lifetime <= 0.0) { return; }

    // Physics integration
    p.velocity += gravity * dt;
    p.position += p.velocity * dt;

    // Age the particle
    p.lifetime -= dt;

    // Write back
    particles[index] = p;
}

Interactive: Compute Update Steps
In a compute shader, thousands of threads execute this logic in parallel—one thread per particle.
The key insight is that each particle update is independent. No particle reads another particle's data. This means thousands of updates happen simultaneously with no synchronization needed.
For more complex physics—collisions with geometry, inter-particle forces, fluid-like behavior—the compute shader grows but the pattern remains: read particle state, compute new state, write back.
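As one example of that growth, the physics section of the update might gain drag and a floor collision while the surrounding read/write structure stays untouched. A sketch, where `drag`, `floor_y`, and `bounce` are assumed uniforms:

```wgsl
// Extended physics inside the same update: drag plus a horizontal floor
p.velocity += gravity * dt;
p.velocity *= 1.0 - drag * dt;        // simple linear drag (assumed uniform)
p.position += p.velocity * dt;
if (p.position.y < floor_y) {         // bounce off an assumed floor plane
    p.position.y = floor_y;
    p.velocity.y = -p.velocity.y * bounce;
}
```

Everything else in the shader—the bounds check, the dead-particle skip, the write-back—remains exactly as before.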
Spawning Particles
Emitters control where and how particles are born. A point emitter spawns all particles at a single location. A line emitter spreads them along a segment. A mesh emitter spawns from surface points.
Interactive: Emitter Configuration
Click and drag to emit particles. The emitter shape determines spawn positions, velocity mode determines initial directions.
Spawn logic runs either on the CPU (uploading new particles each frame) or on the GPU (using atomic counters to claim slots in the particle buffer). GPU-side spawning is faster for high spawn rates, but CPU-side is simpler and sufficient for many effects.
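A sketch of the GPU-side approach: a spawning thread claims a slot with an atomic counter, then writes the new particle into it. The binding index and `max_particles` uniform are illustrative:

```wgsl
// GPU-side spawning: each spawner thread claims a unique slot
@group(0) @binding(2) var<storage, read_write> spawn_counter: atomic<u32>;

fn try_spawn(new_particle: Particle) {
    // atomicAdd returns the old value, so each caller gets a distinct slot
    let slot = atomicAdd(&spawn_counter, 1u);
    if (slot < max_particles) {       // assumed capacity uniform
        particles[slot] = new_particle;
    }
}
```

The bounds check matters: the counter keeps incrementing even when the buffer is full, so overflowing claims must be discarded (and the counter reset by the host each frame).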
Initial velocity is critical to the particle's behavior. Directional velocity creates streams. Randomized velocity creates bursts. Outward velocity from the spawn point creates explosions. The emitter defines not just where particles appear but how they begin their journey.
Rendering Particles
Once updated, particles need to reach the screen. The simplest approach is point sprites—each particle is a single vertex that the rasterizer expands to a screen-aligned square. Fast, but limited: you cannot rotate the sprite or control its shape.
Billboard quads give more control. Each particle becomes four vertices forming a camera-facing rectangle. You can rotate the quad, apply texture coordinates, and vary the shape. The vertex shader positions the corners; the fragment shader applies the texture.
Interactive: Render Mode Comparison
Different rendering approaches trade off quality against performance. Point sprites are simplest, textured billboards look best.
A soft circular texture with transparency creates the characteristic particle glow. Additive blending layers particles without hard edges—bright areas become brighter as particles overlap.
The rendering challenge is overdraw. A dense particle cloud might have dozens of particles per pixel, each requiring a texture sample and blend operation. Sorting particles back-to-front ensures correct transparency but costs performance. Often, additive blending lets you skip sorting entirely—the visual result is order-independent.
The Full Pipeline
A complete particle system runs two passes per frame:
The update pass is a compute dispatch. It reads the particle buffer, updates physics and lifetimes, and writes results. Dead particles are marked but not removed—removal would require compaction, which is expensive.
The render pass draws all particles. For point sprites, one draw call suffices. For billboards, you might use instancing: the instance ID indexes into the particle buffer to fetch position and appearance.
// Vertex shader for billboards
@vertex
fn vs(@builtin(instance_index) instance: u32,
      @builtin(vertex_index) vertex: u32) -> VertexOutput {
    let p = particles[instance];
    // Dead particles: emit a position outside the clip volume so they are culled
    if (p.lifetime <= 0.0) {
        return VertexOutput(vec4(-10.0, -10.0, -10.0, 1.0), vec2(0.0), 0.0);
    }
    // Billboard corner offsets (four vertices per instance, drawn as a strip)
    let corners = array<vec2<f32>, 4>(
        vec2(-1.0, -1.0), vec2(1.0, -1.0),
        vec2(-1.0, 1.0), vec2(1.0, 1.0)
    );
    let offset = corners[vertex] * p.size;
    // In 2D the offset is already screen-aligned; in 3D you would offset
    // along the camera's right and up vectors instead
    let world_pos = vec3(p.position + offset, 0.0);
    let clip_pos = view_proj * vec4(world_pos, 1.0);
    let alpha = p.lifetime / p.max_lifetime;
    return VertexOutput(clip_pos, (corners[vertex] + 1.0) * 0.5, alpha);
}

The fragment shader samples the particle texture and applies the computed alpha. Additive blending combines overlapping particles into the final image.
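A matching fragment shader sketch—the texture and sampler bindings are illustrative, and `VertexOutput` is assumed to carry `uv` and `alpha` fields:

```wgsl
@group(1) @binding(0) var particle_tex: texture_2d<f32>;
@group(1) @binding(1) var particle_samp: sampler;

@fragment
fn fs(in: VertexOutput) -> @location(0) vec4<f32> {
    let tex = textureSample(particle_tex, particle_samp, in.uv);
    // Scale by per-particle alpha; with additive blending (src + dst),
    // dimming the color is what makes a particle fade out
    return tex * in.alpha;
}
```

With additive blending configured in the render pipeline, the output color is simply added to the framebuffer, so there is no alpha-compositing order to get wrong.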
Double buffering is essential when the compute pass both reads and writes particles. Use two buffers and swap them each frame: read from buffer A, write to buffer B, then flip. This avoids race conditions where a thread reads a particle that another thread just modified.
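In shader terms, double buffering means binding two particle buffers with different access modes; the host swaps which resource sits at each binding every frame. A sketch of how the update entry point changes:

```wgsl
// Read last frame's state, write this frame's (host swaps bindings each frame)
@group(0) @binding(0) var<storage, read> particles_in: array<Particle>;
@group(0) @binding(1) var<storage, read_write> particles_out: array<Particle>;

@compute @workgroup_size(256)
fn update(@builtin(global_invocation_id) id: vec3<u32>) {
    let index = id.x;
    if (index >= particle_count) { return; }
    var p = particles_in[index];    // read from last frame's buffer
    // ... physics and aging as before ...
    particles_out[index] = p;       // write to this frame's buffer
}
```

The render pass then reads from whichever buffer was most recently written.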
Key Takeaways
- Particles are independent data points—position, velocity, lifetime, appearance
- Compute shaders update thousands of particles in parallel each frame
- Emitters control spawn location, rate, and initial velocity
- Point sprites are fast; billboard quads offer more control; textured billboards look best
- The pipeline: compute update → render with instancing → additive blending
- Double buffering prevents read/write conflicts in the compute pass