Aditya Jain / Apple Maps · 3D Reconstruction
Topic 6 · Feb 2025 · DDPM · Toy Prototype · Learning Exercise

Red-Square —
First Diffusion Model From Scratch.

The earliest entry in the thesis-line: a toy DDPM that learns to generate a single training image — a 16 × 16 red square on a black background. Forward noising, learned reverse denoising, 100 timesteps, MLP denoiser. The point was not to generate anything useful; the point was to understand DDPM's noise schedule, the forward-Markov-chain → reverse-network coupling, and the role of the timestep embedding before using diffusion as a black box in any later work. Foundation for Topic 27 (JiT), Topic 25 (MNIST flow), Topic 24 (polyline diffusion).

00 — Motivation

Understand DDPM before using it in 3-D work.

Same motivation as Topic 13's Mini-LLM exercise, applied to diffusion. February 2025 was the first encounter with DDPM as a candidate generator family for the thesis line. Reading the paper once had communicated the broad strokes (forward-noise / reverse-denoise, Markov chain, schedule, MSE on noise prediction) but not the mechanics — the timestep embedding, the reparameterised forward closed form q(x_t | x₀), the variance scaling of the reverse step. The toy-prototype exercise filled that in.

The toy problem was the simplest possible non-trivial setup: a single training image, a 16 × 16 red square on a black background, encoded as 3 × 16 × 16 = 768 floats. Train a DDPM to reproduce this single image from noise. With one training image, the model cannot do anything except memorise — the success criterion is "the reverse pass produces something visually red-square-shaped when started from noise". The diagnostic value is in being able to inspect every step of the forward and reverse process by hand on a problem this small.
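In PyTorch the setup is a handful of lines. The sketch below is a reconstruction rather than the original code: the square's placement inside the 16 × 16 canvas is an assumption (the notes only say red square on black), and the β schedule is the one from the settings table in section 01. The closed form q(x_t | x₀) = 𝒩(√ᾱ_t · x₀, (1 − ᾱ_t) I) is the standard DDPM one.

import torch

T = 100
betas = torch.linspace(1e-4, 0.02, T)        # linear β schedule (section 01)
alphas = 1.0 - betas
alpha_bars = torch.cumprod(alphas, dim=0)    # ᾱ_t = Π_{s≤t} α_s

# Toy target: red channel lit inside a central square (placement assumed).
x0 = torch.zeros(3, 16, 16)
x0[0, 4:12, 4:12] = 1.0                      # channel 0 = red
x0 = x0.flatten()                            # 3×16×16 → 768-dim vector

def q_sample(x0, t, noise=None):
    """Closed-form forward sample x_t ~ q(x_t | x0); no chain needed."""
    noise = torch.randn_like(x0) if noise is None else noise
    x_t = alpha_bars[t].sqrt() * x0 + (1 - alpha_bars[t]).sqrt() * noise
    return x_t, noise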

What it informs
The understanding of DDPM mechanics from this exercise carries forward to every diffusion-family topic in the thesis line: Topic 27 (JiT — x-prediction DDPM at ImageNet-256), Topic 25 (MNIST flow matching — the alternative to DDPM motivated by sampling speed), Topic 24 (polyline diffusion design study — the comparative DDPM-vs-FM analysis). The "I have built one of these by hand" baseline made every subsequent paper read faster.
01 — Architecture

MLP denoiser, 100 timesteps, single training image.

The denoiser is the smallest network that could plausibly work on 768-dim inputs: a 3-layer MLP with a sinusoidal timestep embedding added to the input before the first layer.

Input   : x_t ∈ ℝ^768 (flattened 3×16×16 image at step t)
        + t ∈ ℤ ∩ [0, 100) (timestep)
  ↓
TimeMLP : t → 128-dim sinusoidal embedding → Linear 128 → 768
  ↓
Add     : x_t + time_embed (added, not concatenated)
  ↓
Layer 1 : Linear 768 → 1024 + ReLU
Layer 2 : Linear 1024 → 1024 + ReLU
Layer 3 : Linear 1024 → 768
  ↓
Output  : ε̂ (predicted noise, same shape as input)
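The same network as a PyTorch sketch. Layer sizes follow the diagram; the exact sinusoidal formula (transformer-style here) is an assumption.

import math
import torch
import torch.nn as nn

def timestep_embedding(t, dim=128):
    # Transformer-style sinusoidal embedding of the integer timestep
    half = dim // 2
    freqs = torch.exp(-math.log(10000.0) * torch.arange(half) / half)
    angles = t.float().unsqueeze(-1) * freqs                 # (B, half)
    return torch.cat([angles.sin(), angles.cos()], dim=-1)   # (B, dim)

class ToyDenoiser(nn.Module):
    """3-layer MLP ε-predictor; ~3 M parameters at these sizes."""
    def __init__(self, dim=768, hidden=1024, t_dim=128):
        super().__init__()
        self.time_proj = nn.Linear(t_dim, dim)   # 128-dim embedding → 768
        self.net = nn.Sequential(
            nn.Linear(dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, dim),
        )

    def forward(self, x_t, t):
        h = x_t + self.time_proj(timestep_embedding(t))  # added, not concatenated
        return self.net(h)                               # ε̂, same shape as x_t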
Setting | Value | Note
Denoiser parameters | ~3 M | Massively over-parameterised for one training image; the point is mechanics, not efficiency
Diffusion timesteps | 100 | Fewer than the DDPM paper's 1 000; sufficient for a toy
β schedule | Linear, β₁ = 1×10⁻⁴ → β₁₀₀ = 0.02 | Standard DDPM linear schedule
Loss | MSE on predicted noise | ε-prediction (the standard DDPM choice)
Optimiser | AdamW, lr 1×10⁻³ | Aggressive — one training image, no overfit risk
Epochs | 200 | ~5 min wallclock on M2 Mac CPU
Sampling | DDPM ancestral sampling, 100 steps | From x_T ∼ 𝒩(0, I) back to x₀
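With those settings, the training loop and the ancestral sampler are a few lines each. This sketch continues the snippets above (schedule, q_sample, ToyDenoiser); batch handling and the σ_t = √β_t reverse-variance choice are assumptions (the DDPM paper allows either √β_t or the posterior variance).

model = ToyDenoiser()
opt = torch.optim.AdamW(model.parameters(), lr=1e-3)

for epoch in range(200):
    t = torch.randint(0, T, (1,))            # random timestep, batch of 1
    x_t, eps = q_sample(x0, t)               # closed-form forward sample
    loss = nn.functional.mse_loss(model(x_t.unsqueeze(0), t), eps.unsqueeze(0))
    opt.zero_grad()
    loss.backward()
    opt.step()

@torch.no_grad()
def reverse_chain(x, t_start=T - 1):
    """Run the learned reverse chain from step t_start down to 0."""
    for t in reversed(range(t_start + 1)):
        eps_hat = model(x, torch.full((1,), t))
        x = (x - betas[t] / (1 - alpha_bars[t]).sqrt() * eps_hat) / alphas[t].sqrt()
        if t > 0:                            # σ_t = √β_t (assumed variance choice)
            x = x + betas[t].sqrt() * torch.randn_like(x)
    return x

sample = reverse_chain(torch.randn(1, 768))  # x_T ~ N(0, I) → red square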
02 — Result

100 noising steps in. 100 denoising steps out. Red square.

After 200 epochs of training, the reverse pass produced a recognisable red square from noise. The trained model is not interesting as a generator — it has only ever seen one image, so it can only "generate" that image — but the diagnostic signal is that the forward and reverse processes match: noising the training image for 100 steps and then denoising it for 100 steps reconstructs the original red square within MSE < 0.01.
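As a sketch, that round-trip diagnostic looks like this, reusing q_sample and reverse_chain from section 01 (a reconstruction, not the original check):

# Noise the training image to t = T-1 via the closed form, then run the
# learned reverse chain back down and compare against the clean target.
x_noisy, _ = q_sample(x0, torch.tensor([T - 1]))
x_rec = reverse_chain(x_noisy.unsqueeze(0))
print(((x_rec.flatten() - x0) ** 2).mean())  # expect < 0.01 per the table below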

Metric | Value | Note
Final training loss (ε-MSE) | ~0.012 | Converged stably; no instability spikes
Reverse-pass reconstruction MSE | < 0.01 | vs the training image after 100-step ancestral sampling
Sampling wallclock | ~0.8 s on M2 CPU | 100 forward passes through the 3 M-param MLP
Visual sanity | Recognisable red square | The success criterion the toy was set up to test
Core Insight

100 steps of noise. 100 steps of denoise.
The reverse pass is just the forward pass run with the network in the loop.

The mechanical insight DDPM papers under-communicate is that the forward and reverse processes are structurally identical: both are 100-step Markov chains, both apply a small per-step transition, both have closed-form variance scaling. The only difference is that the forward chain adds Gaussian noise drawn from a fixed schedule, while the reverse chain subtracts the network's learned estimate of that same noise. Watching this happen on a 16×16 image you can render and inspect at every step is what makes the architecture stop being mysterious.
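Side by side, the two one-step transitions make the symmetry explicit (same schedule and variance assumption as the sketches above):

def forward_step(x_prev, t):
    # q(x_t | x_{t-1}): scale down, add fixed-schedule Gaussian noise
    return alphas[t].sqrt() * x_prev + betas[t].sqrt() * torch.randn_like(x_prev)

def reverse_step(x_t, t, eps_hat):
    # p_θ(x_{t-1} | x_t): the same shape of transition, but the noise
    # removed is the network's estimate rather than a fresh draw
    mean = (x_t - betas[t] / (1 - alpha_bars[t]).sqrt() * eps_hat) / alphas[t].sqrt()
    return mean + (betas[t].sqrt() * torch.randn_like(x_t) if t > 0 else 0.0)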

Interactive Demo · Live

Step through the forward and reverse Markov chain. The left pane is the clean red square (t = 0). The middle pane is the noisy image at the current step. The right pane is the reverse-pass denoised image at the same step. Slide through the 20-step trajectory to watch noise added (forward) and removed (reverse). Click the colour buttons to swap the target — red square, blue square, green square.

01 — Clean target (t = 0)
02 — Forward x_t (noisy) at the slider step (0–20)
03 — Reverse x̂_0 denoised back from x_t

Full Technical Paper

White paper · DDPM from first principles · single-image training · sampling-cost analysis

Read Paper →
Related Thesis Chapters
JiT Diffusion — ImageNet-256
The grown-up version of this exercise. JiT is x-prediction DDPM at 86 M parameters on ImageNet-256; the mechanics learned here scale up directly.
MNIST Flow-Matching Validation
The alternative-to-DDPM topic. Flow matching is what you reach for after building a DDPM by hand and noticing the 100-step sampling cost.
Polyline-Diffusion Design Study
The DDPM-vs-flow-matching comparison in the polyline-diffusion design study uses the mechanics first understood here.
Appendix — Raw Materials
Transcripts & Source References
████████████████████████████████████████████████
███████████████████████████████████████

██████████████████████████████████████
█████████ · ████ · █████████████████████
█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████
Restricted Access