The earliest entry in the thesis-line: a toy DDPM that learns to generate a single training image — a 16 × 16 red square on a black background. Forward noising, learned reverse denoising, 100 timesteps, MLP denoiser. The point was not to generate anything useful; the point was to understand DDPM's noise schedule, the forward-Markov-chain → reverse-network coupling, and the role of the timestep embedding before using diffusion as a black box in any later work. Foundation for Topic 27 (JiT), Topic 25 (MNIST flow), Topic 24 (polyline diffusion).
Same motivation as Topic 13's Mini-LLM exercise, applied to diffusion. February 2025 was the first encounter with DDPM as a candidate generator family for the thesis line. Reading the paper once had communicated the broad strokes (forward-noise / reverse-denoise, Markov chain, schedule, MSE on noise prediction) but not the mechanics — the timestep embedding, the reparameterised forward closed form q(x_t | x₀), the variance scaling of the reverse step. The toy-prototype exercise filled that in.
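The closed form in question is compact enough to sketch directly. This is a minimal sketch under the toy's own settings (100 linear steps, 1×10⁻⁴ → 0.02); variable names are illustrative:

```python
import numpy as np

# Minimal sketch of the reparameterised forward closed form
#   q(x_t | x_0) = N( sqrt(alpha_bar_t) * x_0, (1 - alpha_bar_t) * I ),
# under the toy's 100-step linear beta schedule (1e-4 -> 0.02).
T = 100
betas = np.linspace(1e-4, 0.02, T)
alpha_bars = np.cumprod(1.0 - betas)

def q_sample(x0, t, rng):
    """Jump straight from x_0 to x_t in one shot -- no need to iterate t steps."""
    eps = rng.standard_normal(x0.shape)
    x_t = np.sqrt(alpha_bars[t]) * x0 + np.sqrt(1.0 - alpha_bars[t]) * eps
    return x_t, eps

rng = np.random.default_rng(0)
x0 = np.zeros(768)                  # stand-in for the flattened 3 x 16 x 16 image
x_T, eps = q_sample(x0, T - 1, rng)
```

One quirk the sketch surfaces: with only 100 linear steps the terminal ᾱ is ≈ 0.36, so the signal coefficient √ᾱ_T ≈ 0.60 and x_T is not fully noised — a side effect of the shortened toy schedule, unlike the 1 000-step paper schedule.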
The toy problem was the simplest possible non-trivial setup: a single training image, a 16 × 16 red square on a black background, encoded as 3 × 16 × 16 = 768 floats. Train a DDPM to reproduce this single image from noise. With one training image, the model cannot do anything except memorise — the success criterion is "the reverse pass produces something visually red-square-shaped when started from noise". The diagnostic value is in being able to inspect every step of the forward and reverse process by hand on a problem this small.
The denoiser is the smallest network that could plausibly work on 768-dim inputs: a 3-layer MLP with sinusoidal timestep embedding added to the first-layer activation.
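A sketch of that denoiser shape. The hidden width of 1 024 is an assumption (the entry fixes only the ~3 M parameter count, which this width lands near); weights are random stand-ins:

```python
import numpy as np

# Sketch of the toy denoiser: 3-layer MLP over the 768-dim flattened image,
# with a sinusoidal timestep embedding added at the first layer. Hidden width
# 1024 is an assumption, chosen to land near the ~3M quoted parameter count.

def timestep_embedding(t, dim):
    """Standard sinusoidal embedding: half sines, half cosines, log-spaced freqs."""
    half = dim // 2
    freqs = np.exp(-np.log(10000.0) * np.arange(half) / half)
    return np.concatenate([np.sin(t * freqs), np.cos(t * freqs)])

rng = np.random.default_rng(0)
D, H = 768, 1024
W1, W2, W3 = [rng.standard_normal(s) * 0.02 for s in [(D, H), (H, H), (H, D)]]

def denoiser(x_t, t):
    """Predict the noise eps that was mixed into x_t at step t (eps-prediction)."""
    h = np.maximum(x_t @ W1 + timestep_embedding(t, H), 0.0)  # embedding at layer 1
    h = np.maximum(h @ W2, 0.0)
    return h @ W3
```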
| Setting | Value | Note |
|---|---|---|
| Denoiser parameters | ~3 M | Massively over-parameterised for one training image; the point is mechanics, not efficiency |
| Diffusion timesteps | 100 | One-tenth of the DDPM paper's 1 000; sufficient for a toy |
| β schedule | Linear, β₁ = 1×10⁻⁴ → β₁₀₀ = 0.02 | Standard DDPM linear schedule |
| Loss | MSE on predicted noise | ε-prediction (the standard DDPM choice) |
| Optimiser | AdamW, lr 1×10⁻³ | Aggressive — one training image, no overfit risk |
| Epochs | 200 | ~5 min wallclock on M2 Mac CPU |
| Sampling | DDPM ancestral sampling, 100 steps | From x_T ∼ 𝒩(0, I) back to x_0 |
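The objective in the table can be sketched end to end. The denoiser here is a stand-in zero-predictor, just to show where the loss number comes from:

```python
import numpy as np

# One step of the epsilon-prediction objective: pick a uniform timestep,
# noise the single training image via the closed-form q(x_t | x_0), and
# score the denoiser's noise prediction with MSE.
T = 100
betas = np.linspace(1e-4, 0.02, T)
alpha_bars = np.cumprod(1.0 - betas)

def eps_mse_step(x0, denoiser, rng):
    t = int(rng.integers(T))                       # uniform timestep, as in DDPM
    eps = rng.standard_normal(x0.shape)
    x_t = np.sqrt(alpha_bars[t]) * x0 + np.sqrt(1.0 - alpha_bars[t]) * eps
    return float(np.mean((denoiser(x_t, t) - eps) ** 2))

rng = np.random.default_rng(0)
x0 = np.zeros(768)                                 # stand-in for the red-square image
loss = eps_mse_step(x0, lambda x, t: np.zeros_like(x), rng)
# A zero-predictor scores roughly E[eps^2] = 1; training drives this toward ~0.012.
```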
After 200 epochs of training, the reverse pass produced a recognisable red square from noise. The trained model is not interesting as a generator — it has only ever seen one image, so it can only "generate" that image — but the diagnostic signal is that the forward and reverse processes match: noising the training image for 100 steps and then denoising it for 100 steps reconstructs the original red square within MSE < 0.01.
| Metric | Value | Note |
|---|---|---|
| Final training loss (ε-MSE) | ~0.012 | Converged stably; no instability spikes |
| Reverse-pass reconstruction MSE | < 0.01 | vs training image after 100-step ancestral sampling |
| Sampling wallclock | ~0.8 s on M2 CPU | 100 forward passes through the 3 M-param MLP |
| Visual sanity | Recognisable red square | The success criterion the toy was set up to test |
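The round-trip diagnostic behind the "< 0.01" row can be sketched as a pair of loops. `denoiser` here is a placeholder for the trained ε-predictor; with the real trained model the returned MSE is the number in the table:

```python
import numpy as np

# Noise-then-denoise round trip: run the forward chain for all T steps, then
# the reverse (ancestral) chain, and compare the result to x_0 by MSE.
T = 100
betas = np.linspace(1e-4, 0.02, T)
alphas = 1.0 - betas
alpha_bars = np.cumprod(alphas)

def round_trip_mse(x0, denoiser, rng):
    x = x0.copy()
    for t in range(T):                              # forward: add scheduled noise
        x = np.sqrt(alphas[t]) * x + np.sqrt(betas[t]) * rng.standard_normal(x.shape)
    for t in reversed(range(T)):                    # reverse: denoise step by step
        mean = (x - betas[t] / np.sqrt(1.0 - alpha_bars[t]) * denoiser(x, t)) / np.sqrt(alphas[t])
        x = mean + (np.sqrt(betas[t]) * rng.standard_normal(x.shape) if t > 0 else 0.0)
    return float(np.mean((x - x0) ** 2))

rng = np.random.default_rng(0)
mse = round_trip_mse(np.zeros(768), lambda x, t: np.zeros_like(x), rng)  # placeholder model
```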
100 steps of noise. 100 steps of denoise.
The reverse pass is just the forward pass run with the network in the loop.
The mechanical insight DDPM papers under-communicate is that the forward and reverse processes are structurally identical — both are 100-step Markov chains, both apply a small per-step transition, both have closed-form variance scaling. The only difference is that the forward chain adds Gaussian noise from a fixed schedule while the reverse chain removes the network's learned estimate of that same noise. Watching this happen on a 16 × 16 image you can render and inspect at every step is what makes the architecture stop being mysterious.
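The symmetry reads directly off the per-step updates. A sketch of the two transitions side by side, using the σ_t² = β_t reverse variance (one of the two standard DDPM choices):

```python
import numpy as np

# Forward and reverse transitions have the same shape of update. The only
# structural difference: the forward step mixes in schedule noise; the reverse
# step first removes the network's noise estimate, then mixes schedule noise
# back in (except at the final step t = 0).
T = 100
betas = np.linspace(1e-4, 0.02, T)
alphas = 1.0 - betas
alpha_bars = np.cumprod(alphas)

def forward_step(x, t, rng):
    """q(x_t | x_{t-1}): scale down, add fixed-schedule Gaussian noise."""
    return np.sqrt(alphas[t]) * x + np.sqrt(betas[t]) * rng.standard_normal(x.shape)

def reverse_step(x, t, eps_hat, rng):
    """p(x_{t-1} | x_t): remove predicted noise, rescale, add schedule noise."""
    mean = (x - betas[t] / np.sqrt(1.0 - alpha_bars[t]) * eps_hat) / np.sqrt(alphas[t])
    noise = rng.standard_normal(x.shape) if t > 0 else np.zeros_like(x)
    return mean + np.sqrt(betas[t]) * noise          # sigma_t^2 = beta_t
```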
Step through the forward and reverse Markov chain. The left pane is the clean red square (t = 0). The middle pane is the noisy image at the current step. The right pane is the reverse-pass denoised image at the same step. Slide through the 20-step trajectory to watch noise added (forward) and removed (reverse). Click the colour buttons to swap the target — red square, blue square, green square.
White paper · DDPM from first principles · single-image training · sampling-cost analysis