Research Timeline — Aditya Jain / Apple Maps · 3D Reconstruction
Nov 2025
Topic 24 Nov 2025 Diffusion · Flow Matching · Polylines · Bridge DSL

Diffusion for Houdini Polylines.

Design study, not an implementation. Question: can the seq2seq transformer in PGN (Topic 40) be replaced by a diffusion or flow-matching model that operates directly on the polyline + attribute representation? This page documents the theory study (DDPM, LDM, flow matching), the Sparc3D / Sparcubes analysis as a related-but-different approach, and the architectural design questions that have to be answered before training. No training runs yet — the work here is the design specification.

00 — Motivation

Replace the PGN seq2seq head with a diffusion/FM head.

PGN (Topic 40, the September 2025 work) trains a transformer seq2seq model that maps a bridge polyline + per-segment semantic attribute string to an executable DSL program. The seq2seq head was appropriate for the 15-pair training corpus PGN had, but it has two structural limitations carried into the larger thesis line: (i) autoregressive decoding is slow at inference; (ii) the output is a discrete DSL token sequence, which means the model has to learn the executor's grammar perfectly rather than producing geometry it can directly evaluate.

A diffusion-style or flow-matching-style head sidesteps both. The output is a continuous geometric representation (polyline coordinates + attributes), the sampler runs in parallel over all positions rather than autoregressively, and the trained model is a denoiser over polyline configurations rather than a token-sequence generator. The design question Topic 24 sets up is which representation and which generator family works for variable-length polyline outputs.

The honest scope of this page: no training runs were completed. The work is theory study (DDPM, LDM, flow matching), comparative analysis of related 3-D generative approaches (Sparc3D, Sparcubes, FVDB), and architectural design questions that have to be answered before the first training run. The downstream consumer of these design decisions is the larger MambaFlow3D thesis line (Topic 26).

What it informs
The design study feeds three downstream decisions. (1) Whether to lift the polyline → DSL pipeline out of seq2seq entirely and replace it with a polyline-diffusion generator that emits geometry directly. (2) Whether to represent the polyline as a fixed maximum-length padded sequence (transformer/Mamba friendly) or a variable-length set (graph-NN friendly). (3) Whether the conditioning input (per-segment semantic attributes) enters via cross-attention, added embeddings, or a separate conditioning encoder. None of these are settled.
01 — Representation

Polyline as fixed-length padded tensor vs variable-length set.

A bridge polyline is a sequence of 3-D control points (x, y, z) plus per-segment semantic attributes (OPEN, CLOSED, RAILING, …). The PGN training corpus has polylines of 8–40 control points. The diffusion-on-polylines question is how to encode this as a fixed-shape tensor that a neural network can consume.

Option 1 — Padded fixed length
Encoding: pad to N_max = 64 with sentinel values plus a length mask.
Pros: standard transformer/Mamba consumption; supported by every architecture.
Cons: wasted capacity on short polylines; sentinel-value handling is fiddly.

Option 2 — Variable-length set
Encoding: point-cloud style; treat the polyline as a set of (xyz, attr, position-index) tuples.
Pros: no padding; scales naturally to long polylines.
Cons: needs a PointNet++ or graph-NN encoder; loses sequence order without positional encoding.

Option 3 — Length-conditioned sampling
Encoding: predict length L first, then sample an (L × 3) polyline conditioned on L.
Pros: sharp generation; no wasted capacity.
Cons: two-stage sampler; L-prediction is itself a small generative problem.

Working hypothesis: padded fixed length with explicit length mask, because it composes cleanly with the Mamba backbone already chosen for the MambaFlow3D thesis line and because the PGN corpus's polyline length distribution is tight enough (8–40 points) that the padding overhead is manageable. The length-conditioned option is the backup if the padded version fails on long polylines.
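The padded-with-mask hypothesis can be sketched in a few lines (illustrative NumPy; `pad_polyline` and the zero sentinel are assumptions for illustration, not PGN code):

```python
import numpy as np

N_MAX = 64          # fixed maximum polyline length (working hypothesis above)
PAD_SENTINEL = 0.0  # assumed sentinel coordinate for padded positions

def pad_polyline(points: np.ndarray):
    """Pad a variable-length (L, 3) polyline to (N_MAX, 3) plus a boolean mask.

    The mask marks which rows are real control points (True) vs padding
    (False), so downstream losses and the sampler can ignore pad positions.
    """
    L = points.shape[0]
    assert 1 <= L <= N_MAX, "polyline length must fit the fixed budget"
    padded = np.full((N_MAX, 3), PAD_SENTINEL, dtype=np.float32)
    padded[:L] = points
    mask = np.zeros(N_MAX, dtype=bool)
    mask[:L] = True
    return padded, mask

# PGN-corpus polylines are 8-40 points, so the worst case wastes 56 of 64 slots.
pts = np.random.randn(17, 3).astype(np.float32)
padded, mask = pad_polyline(pts)
```

The explicit mask is what keeps the padding cheap: every loss term and the sampler simply multiply by it, so the sentinel value itself never has to be learned.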

Pipeline

Proposed forward path for polyline diffusion.

Bridge polyline + per-segment attrs → pad to N_max = 64 + attribute embed → diffusion / FM body (Mamba ×N or ViT) → denoised polyline (N_max × 3) + mask → Houdini DSL eval

Open questions:
— Backbone choice (Mamba vs Transformer): inherits from the Topic 25 result → likely Pure Mamba.
— Conditioning entry point for the attributes: cross-attention, added embeddings, or a separate conditioning encoder.
— Diffusion vs FM head: lean FM (matches the Topic 25/26 architectural choice).
— Loss formulation: continuous MSE on coordinates; cross-entropy classification on attribute tokens.
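The proposed loss formulation (masked MSE on coordinates plus masked cross-entropy on attribute tokens) can be sketched as follows (illustrative NumPy; the function name and the additive `attr_weight` combination are assumptions, not a settled design):

```python
import numpy as np

def masked_polyline_loss(pred_xyz, target_xyz, attr_logits, attr_targets,
                         mask, attr_weight=1.0):
    """Combined polyline loss over (N, 3) coordinates and (N, C) attribute
    logits, with `mask` an (N,) bool marking real (non-padded) positions."""
    m = mask.astype(np.float64)                        # (N,)
    # Coordinate term: mean squared error over real positions only.
    sq = ((pred_xyz - target_xyz) ** 2).sum(axis=-1)   # (N,)
    coord_loss = (sq * m).sum() / m.sum()
    # Attribute term: softmax cross-entropy over real positions only.
    z = attr_logits - attr_logits.max(axis=-1, keepdims=True)
    log_probs = z - np.log(np.exp(z).sum(axis=-1, keepdims=True))
    nll = -log_probs[np.arange(len(attr_targets)), attr_targets]  # (N,)
    attr_loss = (nll * m).sum() / m.sum()
    return coord_loss + attr_weight * attr_loss
```

Masking both terms is the point: padded slots contribute nothing, so the network is never penalised for whatever it emits at sentinel positions.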
02 — Theory Study

DDPM → LDM → Flow Matching, building from PGN's transformer baseline.

Topic 24 sits inside a longer theory-study arc. The relevant predecessor work studied: DDPM [1] for the forward-noise / reverse-denoise formulation; LDM [2] for the latent-space diffusion idea and its VAE / U-Net split; flow matching [3] for the linear-interpolation formulation and the velocity-prediction target; and Sparc3D / Sparcubes [4] for a related-but-different problem (mesh reconstruction via deformable sparse marching cubes rather than direct polyline generation).
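For reference, the three formulations reduce to the following standard objectives (notation as usually written: $\bar\alpha_t$ is the DDPM noise schedule, $z_0$ the noise sample, $x_1$ the data sample):

```latex
% DDPM: noise the data, regress the noise
q(x_t \mid x_0) = \mathcal{N}\!\big(\sqrt{\bar\alpha_t}\, x_0,\; (1-\bar\alpha_t)\, I\big),
\qquad
\mathcal{L}_{\mathrm{DDPM}} = \mathbb{E}_{t,\,x_0,\,\epsilon}
  \big\| \epsilon - \epsilon_\theta(x_t, t) \big\|^2

% LDM: the same objective, applied to VAE latents z = E(x_0)
\mathcal{L}_{\mathrm{LDM}} = \mathbb{E}_{t,\,z,\,\epsilon}
  \big\| \epsilon - \epsilon_\theta(z_t, t) \big\|^2

% Flow matching: linear interpolant, regress the constant path velocity
x_t = (1-t)\, z_0 + t\, x_1,
\qquad
\mathcal{L}_{\mathrm{FM}} = \mathbb{E}_{t,\,z_0,\,x_1}
  \big\| v_\theta(x_t, t) - (x_1 - z_0) \big\|^2
```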

DDPM (pixel-space)
Forward process: Markov chain of Gaussian noise additions.
Target: ε-prediction or x₀-prediction.
Inference cost: ~100–250 steps.
Verdict for polylines: workable but slow; the high sampling-step count is the bottleneck.

LDM
Forward process: forward noising in latent space after a VAE encoder.
Target: same as DDPM, but in compressed space.
Inference cost: ~50–100 steps.
Verdict for polylines: VAE compression buys speed but adds reconstruction artefacts at the polyline scale.

Flow Matching (preferred)
Forward process: linear interpolation x_t = (1 − t) z₀ + t x₁.
Target: v-prediction (velocity field).
Inference cost: ~20–50 steps.
Verdict for polylines: best fit — fewer sampling steps, no VAE, operates directly over polyline coordinates.

The pre-decision: flow matching, matching the upstream choice in MambaFlow3D (Topic 26). The 20–50-step sampler composes with the Pure-Mamba backbone to make polyline generation interactive-rate on a consumer GPU — the single-image-to-3-D interactivity premise the larger thesis line cares about.
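The flow-matching training pair and the Euler sampler behind the 20–50-step budget can be sketched as follows (illustrative NumPy; function names are hypothetical, and in practice `v_theta` would be the trained Mamba denoiser):

```python
import numpy as np

def fm_training_pair(x1, rng):
    """One flow-matching training example: sample noise z0 and time t,
    build the linear interpolant x_t = (1 - t) z0 + t x1, and return the
    regression target v = x1 - z0 (the constant velocity of the straight path)."""
    z0 = rng.standard_normal(x1.shape)
    t = rng.uniform()
    x_t = (1 - t) * z0 + t * x1
    return x_t, t, x1 - z0

def fm_sample(v_theta, shape, steps=20, rng=None):
    """Euler-integrate the learned velocity field from noise (t=0) to data
    (t=1). All polyline positions update in parallel at every step, unlike
    autoregressive token decoding."""
    rng = rng or np.random.default_rng(0)
    x = rng.standard_normal(shape)
    dt = 1.0 / steps
    for i in range(steps):
        t = i * dt
        x = x + dt * v_theta(x, t)
    return x
```

With straight-line paths and 20 steps, the sampler is just 20 forward passes of the backbone, which is where the interactive-rate claim comes from.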

Core Insight

"Why not just train directly on meshes?"
Because meshes are hard for neural networks.

The variable-topology + irregular-connectivity + self-intersection problems that block direct mesh learning also partly apply to polylines — variable length is real, irregular semantic attribute placement is real. The Sparcubes deformable-vertex approach to meshes (deform vertices off a fixed grid) suggests an analogue for polylines: fix the maximum length, deform sentinel pad positions out of the sample space. The padding-with-mask decision in §01 is a more pragmatic version of that idea.

03 — Sparcubes Analogy

Deformable vertices for meshes ↔ deformable polyline control points.

Sparcubes [4] solves a different problem — mesh-to-watertight-mesh reconstruction — but its core technical move is informative. Each vertex of a fixed regular grid is allowed a learnable offset Δv so the surface can pass through positions the fixed grid cannot represent.

The polyline analogue: each control point of a fixed N_max-length polyline carries a learnable position offset, and the masking head decides which positions are "real" vs "padded". The architectural difference is that Sparcubes runs optimisation-by-rendering for the offsets, while a polyline diffusion model would learn the offsets through the velocity-prediction loss in flow matching. The conceptual similarity — fix the grid, deform off it — is the design inspiration.
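The fix-the-slots, deform-off-them idea can be made concrete with a toy sketch (illustrative only; `realised_polyline` and the threshold-at-zero mask rule are assumptions for illustration, not Sparcubes code):

```python
import numpy as np

def realised_polyline(base_points, offsets, keep_logits):
    """Sparcubes-style analogue for polylines: a fixed bank of N_max slots
    whose control points are base positions plus learned offsets, with a
    per-slot keep/pad decision from a masking head.

    base_points, offsets: (N_max, 3); keep_logits: (N_max,).
    A slot with keep_logits > 0 is treated as a real control point."""
    points = base_points + offsets   # deform each slot off its fixed position
    keep = keep_logits > 0.0         # masking head decides real vs padded
    return points[keep]              # variable-length output polyline
```

In flow matching the offsets and logits would be driven by the velocity-prediction loss rather than by Sparcubes-style optimisation-by-rendering; the sketch only shows how a fixed-size tensor yields a variable-length polyline.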

The Sparcubes vs FVDB comparison is also relevant for the larger thesis line. FVDB is the general-purpose differentiable sparse-voxel library; Sparcubes is one specific algorithm built on similar ideas. The polyline-diffusion design here is closer to the FVDB-as-substrate path — use Mamba blocks (linear-time, sequence-native) over the polyline tensor, rather than design a specialised deformable-polyline algorithm.

Interactive Demo · Live

Step a polyline through the flow trajectory. Pick a target bridge shape (arch, truss, beam) and a sampling step. The polyline starts as a random scatter and resolves toward the bridge silhouette as the step advances. Drag the output canvas to rotate the cloud.


Full Technical Paper

White paper · polyline-diffusion architecture specification · padded-with-mask + Pure-Mamba + flow matching · three open questions for first training run

Read Paper →
Related Thesis Chapters
PGN — Polyline → DSL seq2seq
The transformer baseline this design study targets to replace. PGN's autoregressive DSL-token decoding is the speed bottleneck the diffusion / FM head would lift.
MambaFlow3D — Sparse-Voxel Generation
The downstream architectural premise. Pure Mamba + FM is the substrate carried forward; the polyline-diffusion head would be a special case at 1-D polyline sequence length.
MNIST Flow-Matching Validation
The three-backbone validation that picked Pure Mamba over Transformer and Hybrid for flow-matching tasks. The same pick informs the polyline-diffusion backbone choice.
Appendix — Raw Materials
Transcripts & Source References
Restricted Access