The earliest entry in the thesis-line: a toy DDPM that learns to generate a single training image — a 16 × 16 red square on a black background. Forward noising, learned reverse denoising, 100 timesteps, MLP denoiser. The point was not to generate anything useful; the point was to understand DDPM's noise schedule, the forward-Markov-chain → reverse-network coupling, and the role of the timestep embedding before using diffusion as a black box in any later work. Foundation for Topic 27 (JiT), Topic 25 (MNIST flow), Topic 24 (polyline diffusion).
Same motivation as Topic 13's Mini-LLM exercise, applied to diffusion. February 2025 was the first encounter with DDPM as a candidate generator family for the thesis line. Reading the paper once had communicated the broad strokes (forward-noise / reverse-denoise, Markov chain, schedule, MSE on noise prediction) but not the mechanics — the timestep embedding, the reparameterised forward closed form q(x_t | x₀), the variance scaling of the reverse step. The toy-prototype exercise filled that in.
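The closed form in question is compact enough to sketch directly. This is a minimal sketch under the toy's own settings (100 linear steps, 1×10⁻⁴ → 0.02); variable names are illustrative:

```python
import numpy as np

# Minimal sketch of the reparameterised forward closed form
#   q(x_t | x_0) = N( sqrt(alpha_bar_t) * x_0, (1 - alpha_bar_t) * I ),
# under the toy's 100-step linear beta schedule (1e-4 -> 0.02).
T = 100
betas = np.linspace(1e-4, 0.02, T)
alpha_bars = np.cumprod(1.0 - betas)

def q_sample(x0, t, rng):
    """Jump straight from x_0 to x_t in one shot -- no need to iterate t steps."""
    eps = rng.standard_normal(x0.shape)
    x_t = np.sqrt(alpha_bars[t]) * x0 + np.sqrt(1.0 - alpha_bars[t]) * eps
    return x_t, eps

rng = np.random.default_rng(0)
x0 = np.zeros(768)                  # stand-in for the flattened 3 x 16 x 16 image
x_T, eps = q_sample(x0, T - 1, rng)
```

One quirk the sketch surfaces: with only 100 linear steps the terminal ᾱ is ≈ 0.36, so the signal coefficient √ᾱ_T ≈ 0.60 and x_T is not fully noised — a side effect of the shortened toy schedule, unlike the 1 000-step paper schedule.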
The toy problem was the simplest possible non-trivial setup: a single training image, a 16 × 16 red square on a black background, encoded as 3 × 16 × 16 = 768 floats. Train a DDPM to reproduce this single image from noise. With one training image, the model cannot do anything except memorise — the success criterion is "the reverse pass produces something visually red-square-shaped when started from noise". The diagnostic value is in being able to inspect every step of the forward and reverse process by hand on a problem this small.
The denoiser is the smallest network that could plausibly work on 768-dim inputs: a 3-layer MLP with sinusoidal timestep embedding added to the first-layer activation.
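A sketch of that denoiser shape. The hidden width of 1 024 is an assumption (the entry fixes only the ~3 M parameter count, which this width lands near); weights are random stand-ins:

```python
import numpy as np

# Sketch of the toy denoiser: 3-layer MLP over the 768-dim flattened image,
# with a sinusoidal timestep embedding added at the first layer. Hidden width
# 1024 is an assumption, chosen to land near the ~3M quoted parameter count.

def timestep_embedding(t, dim):
    """Standard sinusoidal embedding: half sines, half cosines, log-spaced freqs."""
    half = dim // 2
    freqs = np.exp(-np.log(10000.0) * np.arange(half) / half)
    return np.concatenate([np.sin(t * freqs), np.cos(t * freqs)])

rng = np.random.default_rng(0)
D, H = 768, 1024
W1, W2, W3 = [rng.standard_normal(s) * 0.02 for s in [(D, H), (H, H), (H, D)]]

def denoiser(x_t, t):
    """Predict the noise eps that was mixed into x_t at step t (eps-prediction)."""
    h = np.maximum(x_t @ W1 + timestep_embedding(t, H), 0.0)  # embedding at layer 1
    h = np.maximum(h @ W2, 0.0)
    return h @ W3
```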
| Setting | Value | Note |
|---|---|---|
| Denoiser parameters | ~3 M | Massively over-parameterised for one training image; the point is mechanics, not efficiency |
| Diffusion timesteps | 100 | One-tenth of the DDPM paper's 1 000; sufficient for a toy |
| β schedule | Linear, β₁ = 1×10⁻⁴ → β₁₀₀ = 0.02 | Standard DDPM linear schedule |
| Loss | MSE on predicted noise | ε-prediction (the standard DDPM choice) |
| Optimiser | AdamW, lr 1×10⁻³ | Aggressive — one training image, no overfit risk |
| Epochs | 200 | ~5 min wallclock on M2 Mac CPU |
| Sampling | DDPM ancestral sampling, 100 steps | From x_T ∼ 𝒩(0, I) back to x_0 |
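The objective in the table can be sketched end to end. The denoiser here is a stand-in zero-predictor, just to show where the loss number comes from:

```python
import numpy as np

# One step of the epsilon-prediction objective: pick a uniform timestep,
# noise the single training image via the closed-form q(x_t | x_0), and
# score the denoiser's noise prediction with MSE.
T = 100
betas = np.linspace(1e-4, 0.02, T)
alpha_bars = np.cumprod(1.0 - betas)

def eps_mse_step(x0, denoiser, rng):
    t = int(rng.integers(T))                       # uniform timestep, as in DDPM
    eps = rng.standard_normal(x0.shape)
    x_t = np.sqrt(alpha_bars[t]) * x0 + np.sqrt(1.0 - alpha_bars[t]) * eps
    return float(np.mean((denoiser(x_t, t) - eps) ** 2))

rng = np.random.default_rng(0)
x0 = np.zeros(768)                                 # stand-in for the red-square image
loss = eps_mse_step(x0, lambda x, t: np.zeros_like(x), rng)
# A zero-predictor scores roughly E[eps^2] = 1; training drives this toward ~0.012.
```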
After 200 epochs of training, the reverse pass produced a recognisable red square from noise. The trained model is not interesting as a generator — it has only ever seen one image, so it can only "generate" that image — but the diagnostic signal is that the forward and reverse processes match: noising the training image for 100 steps and then denoising it for 100 steps reconstructs the original red square within MSE < 0.01.
| Metric | Value | Note |
|---|---|---|
| Final training loss (ε-MSE) | ~0.012 | Converged stably; no instability spikes |
| Reverse-pass reconstruction MSE | < 0.01 | vs training image after 100-step ancestral sampling |
| Sampling wallclock | ~0.8 s on M2 CPU | 100 forward passes through the 3 M-param MLP |
| Visual sanity | Recognisable red square | The success criterion the toy was set up to test |
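The round-trip diagnostic behind the "< 0.01" row can be sketched as a pair of loops. `denoiser` here is a placeholder for the trained ε-predictor; with the real trained model the returned MSE is the number in the table:

```python
import numpy as np

# Noise-then-denoise round trip: run the forward chain for all T steps, then
# the reverse (ancestral) chain, and compare the result to x_0 by MSE.
T = 100
betas = np.linspace(1e-4, 0.02, T)
alphas = 1.0 - betas
alpha_bars = np.cumprod(alphas)

def round_trip_mse(x0, denoiser, rng):
    x = x0.copy()
    for t in range(T):                              # forward: add scheduled noise
        x = np.sqrt(alphas[t]) * x + np.sqrt(betas[t]) * rng.standard_normal(x.shape)
    for t in reversed(range(T)):                    # reverse: denoise step by step
        mean = (x - betas[t] / np.sqrt(1.0 - alpha_bars[t]) * denoiser(x, t)) / np.sqrt(alphas[t])
        x = mean + (np.sqrt(betas[t]) * rng.standard_normal(x.shape) if t > 0 else 0.0)
    return float(np.mean((x - x0) ** 2))

rng = np.random.default_rng(0)
mse = round_trip_mse(np.zeros(768), lambda x, t: np.zeros_like(x), rng)  # placeholder model
```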
100 steps of noise. 100 steps of denoise.
The reverse pass is just the forward pass run with the network in the loop.
The mechanical insight DDPM papers under-communicate is that the forward and reverse processes are structurally identical — both are 100-step Markov chains, both apply a small per-step transition, both have closed-form variance scaling. The only difference is that the forward chain adds Gaussian noise from a fixed schedule while the reverse chain removes the network's learned estimate of that same noise. Watching this happen on a 16 × 16 image you can render and inspect at every step is what makes the architecture stop being mysterious.
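The symmetry reads directly off the per-step updates. A sketch of the two transitions side by side, using the σ_t² = β_t reverse variance (one of the two standard DDPM choices):

```python
import numpy as np

# Forward and reverse transitions have the same shape of update. The only
# structural difference: the forward step mixes in schedule noise; the reverse
# step first removes the network's noise estimate, then mixes schedule noise
# back in (except at the final step t = 0).
T = 100
betas = np.linspace(1e-4, 0.02, T)
alphas = 1.0 - betas
alpha_bars = np.cumprod(alphas)

def forward_step(x, t, rng):
    """q(x_t | x_{t-1}): scale down, add fixed-schedule Gaussian noise."""
    return np.sqrt(alphas[t]) * x + np.sqrt(betas[t]) * rng.standard_normal(x.shape)

def reverse_step(x, t, eps_hat, rng):
    """p(x_{t-1} | x_t): remove predicted noise, rescale, add schedule noise."""
    mean = (x - betas[t] / np.sqrt(1.0 - alpha_bars[t]) * eps_hat) / np.sqrt(alphas[t])
    noise = rng.standard_normal(x.shape) if t > 0 else np.zeros_like(x)
    return mean + np.sqrt(betas[t]) * noise          # sigma_t^2 = beta_t
```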
Step through the forward and reverse Markov chain. The left pane is the clean red square (t = 0). The middle pane is the noisy image at the current step. The right pane is the reverse-pass denoised image at the same step. Slide through the 20-step trajectory to watch noise added (forward) and removed (reverse). Click the colour buttons to swap the target — red square, blue square, green square.
White paper · DDPM from first principles · single-image training · sampling-cost analysis