Design study, not an implementation. Question: can the seq2seq transformer in PGN (Topic 40) be replaced by a diffusion or flow-matching model that operates directly on the polyline + attribute representation? This page documents the theory study (DDPM, LDM, flow matching), the Sparc3D / Sparcubes analysis as a related-but-different approach, and the architectural design questions that have to be answered before training. No training runs yet — the work here is the design specification.
PGN (Topic 40, the September 2025 work) trains a transformer seq2seq model that maps a bridge polyline + per-segment semantic attribute string to an executable DSL program. The seq2seq head was appropriate for the 15-pair training corpus PGN had, but it has two structural limitations carried into the larger thesis line: (i) autoregressive decoding is slow at inference; (ii) the output is a discrete DSL token sequence, which means the model has to learn the executor's grammar perfectly rather than producing geometry it can directly evaluate.
A diffusion-style or flow-matching-style head sidesteps both. The output is a continuous geometric representation (polyline coordinates + attributes), the sampler runs in parallel over all positions rather than autoregressively, and the trained model is a denoiser over polyline configurations rather than a token-sequence generator. The design question Topic 24 sets up is which representation and which generator family work for variable-length polyline outputs.
The honest scope of this page: no training runs were completed. The work is theory study (DDPM, LDM, flow matching), comparative analysis of related 3-D generative approaches (Sparc3D, Sparcubes, FVDB), and architectural design questions that have to be answered before the first training run. The downstream consumer of these design decisions is the larger MambaFlow3D thesis line (Topic 26).
A bridge polyline is a sequence of 3-D control points (x, y, z) plus per-segment semantic attributes (OPEN, CLOSED, RAILING, …). The PGN training corpus has polylines of 8–40 control points. The diffusion-on-polylines question is how to encode this as a fixed-shape tensor that a neural network can consume.
| Option | Encoding | Pros | Cons |
|---|---|---|---|
| Padded fixed length | Pad to N_max = 64 with sentinel + length mask | Standard transformer/Mamba consumption · supported by every architecture | Wasted capacity on short polylines · sentinel-value handling is fiddly |
| Variable-length set | Point-cloud-style: treat polyline as a set of (xyz, attr, position-index) | No padding · scales naturally to long polylines | Needs PointNet++ or graph-NN encoder · loses sequence order without positional encoding |
| Length-conditioned sampling | Predict length L first, then sample (L × 3) polyline conditioned on L | Sharp generation · no wasted capacity | Two-stage sampler · L-prediction is itself a small generative problem |
Working hypothesis: padded fixed length with explicit length mask, because it composes cleanly with the Mamba backbone already chosen for the MambaFlow3D thesis line and because the PGN corpus's polyline length distribution is tight enough (8–40 points) that the padding overhead is manageable. The length-conditioned option is the backup if the padded version fails on long polylines.
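A minimal sketch of the padded-with-mask encoding under this hypothesis, assuming a PyTorch pipeline; `N_MAX`, `ATTR_VOCAB`, and `encode_polyline` are illustrative names, not part of the PGN codebase:

```python
import torch

N_MAX = 64                                           # fixed cap from the working hypothesis
ATTR_VOCAB = {"OPEN": 0, "CLOSED": 1, "RAILING": 2}  # illustrative subset of the vocabulary

def encode_polyline(points, attrs):
    """Pad a variable-length polyline (L control points, 8 <= L <= N_MAX)
    to a fixed-shape tensor plus an explicit validity mask."""
    L = len(points)
    coords = torch.zeros(N_MAX, 3)               # sentinel value 0.0 on padded slots
    coords[:L] = torch.as_tensor(points, dtype=torch.float32)
    attr_ids = torch.zeros(N_MAX, dtype=torch.long)
    attr_ids[:L] = torch.tensor([ATTR_VOCAB[a] for a in attrs])
    mask = torch.zeros(N_MAX, dtype=torch.bool)  # True = real control point
    mask[:L] = True
    return coords, attr_ids, mask
```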
Topic 24 sits inside a longer theory-study arc. The relevant predecessor work studied: DDPM [1] for the forward-noise / reverse-denoise formulation; LDM [2] for the latent-space diffusion idea and its VAE / U-Net split; flow matching [3] for the linear-interpolation formulation and the velocity-prediction target; and Sparc3D / Sparcubes [4] for a related-but-different problem (mesh reconstruction via deformable sparse marching cubes rather than direct polyline generation).
| Formulation | Forward process | Target | Inference cost | Verdict for polylines |
|---|---|---|---|---|
| DDPM (pixel-space) | Markov chain of Gaussian noise additions | ε-prediction or x₀-prediction | ~100–250 steps | Workable but slow; high sampling-step count is the bottleneck |
| LDM | Forward noising in latent space after VAE encoder | Same as DDPM but in compressed space | ~50–100 steps | VAE compression buys speed but adds reconstruction artefacts at the polyline scale |
| Flow Matching (preferred) | Linear interp x_t = (1−t) z₀ + t x₁ | v-prediction (velocity field) | ~20–50 steps | Best fit — fewer sampling steps, no VAE, direct over polyline coordinates |
The provisional decision: flow matching, in line with the upstream choice in MambaFlow3D (Topic 26). The 20–50-step sampler composes with the Pure-Mamba backbone to make polyline generation interactive-rate on a consumer GPU — the single-image-to-3-D interactivity premise the larger thesis line cares about.
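To make the v-prediction target concrete, here is a sketch of one flow-matching training step and the few-step Euler sampler it implies, with padded positions masked out of the loss as per §01. The `denoiser(xt, t, cond)` signature is an assumption standing in for whatever backbone §05 settles on:

```python
import torch

def fm_training_step(denoiser, x1, mask, cond):
    """One training step: sample t, interpolate along the linear path,
    regress the velocity. x1: (B, N_MAX, 3) clean coords; mask: (B, N_MAX)."""
    z0 = torch.randn_like(x1)                        # noise endpoint
    t = torch.rand(x1.shape[0], 1, 1, device=x1.device)
    xt = (1 - t) * z0 + t * x1                       # x_t = (1-t) z0 + t x1
    v_target = x1 - z0                               # constant velocity along the path
    v_pred = denoiser(xt, t, cond)                   # (B, N_MAX, 3)
    m = mask.float()                                 # exclude padded slots from the loss
    se = ((v_pred - v_target) ** 2).sum(-1)
    return (se * m).sum() / m.sum()

@torch.no_grad()
def sample(denoiser, cond, n_steps=20, batch=1, n_max=64):
    """Euler integration of the learned velocity field from noise (t=0) to data (t=1)."""
    x = torch.randn(batch, n_max, 3)
    dt = 1.0 / n_steps
    for i in range(n_steps):
        t = torch.full((batch, 1, 1), i * dt)
        x = x + dt * denoiser(x, t, cond)
    return x
```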
"Why not just train directly on meshes?"
Because meshes are hard for neural networks.
The variable-topology + irregular-connectivity + self-intersection problems that block direct mesh learning also partly apply to polylines — variable length is real, irregular semantic attribute placement is real. The Sparcubes deformable-vertex approach to meshes (deform vertices off a fixed grid) suggests an analogue for polylines: fix the maximum length, deform sentinel pad positions out of the sample space. The padding-with-mask decision in §01 is a more pragmatic version of that idea.
Sparcubes [4] solves a different problem — mesh-to-watertight-mesh reconstruction — but its core technical move is informative. Each vertex of a fixed regular grid is allowed a learnable offset Δv so the surface can pass through positions the fixed grid cannot represent. The polyline analogue: each control point of a fixed N_max-length polyline carries a learnable position offset, and the masking head decides which positions are "real" vs "padded". The architectural difference is that Sparcubes runs optimisation-by-rendering for the offsets, while a polyline diffusion model would learn the offsets through the velocity-prediction loss in flow matching. The conceptual similarity — fix the grid, deform off it — is the design inspiration.
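One way the analogue could look as an output head: a per-position coordinate offset (the polyline counterpart of Sparcubes' Δv) alongside a real-vs-padded mask logit. Layer sizes and names here are a design sketch, not trained code:

```python
import torch.nn as nn

class PolylineHead(nn.Module):
    """Per-position output head: a coordinate offset learned through the
    velocity-prediction loss, plus a logit deciding real vs padded."""
    def __init__(self, d_model=256):
        super().__init__()
        self.offset = nn.Linear(d_model, 3)      # Δv analogue per control point
        self.mask_logit = nn.Linear(d_model, 1)  # which positions are "real"

    def forward(self, h):                        # h: (B, N_MAX, d_model)
        return self.offset(h), self.mask_logit(h).squeeze(-1)
```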
The Sparcubes vs FVDB comparison is also relevant for the larger thesis line. FVDB is the general-purpose differentiable sparse voxel library; Sparcubes is one specific algorithm built on similar ideas. The polyline-diffusion design here is closer to the FVDB-as-substrate path — use Mamba blocks (linear-time, sequence-native) over the polyline tensor, rather than design a specialised deformable-polyline algorithm.
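A sketch of that substrate path, assuming the `mamba-ssm` package's `Mamba` block (which maps a (B, L, D) tensor to the same shape); the conditioning scheme here (time as an extra feature channel, attributes as added embeddings) is one plausible choice among the open design questions, not a settled decision:

```python
import torch
import torch.nn as nn
from mamba_ssm import Mamba  # assumes the mamba-ssm package (CUDA build)

class PolylineDenoiser(nn.Module):
    """Linear-time denoiser over the padded polyline tensor: embed
    (coords, t), add attribute embeddings, run stacked Mamba blocks,
    emit a velocity field (v-prediction)."""
    def __init__(self, d_model=256, n_layers=6, n_attrs=8):
        super().__init__()
        self.in_proj = nn.Linear(3 + 1, d_model)   # xyz + time scalar per position
        self.attr_emb = nn.Embedding(n_attrs, d_model)
        self.blocks = nn.ModuleList(
            [Mamba(d_model=d_model) for _ in range(n_layers)])
        self.out = nn.Linear(d_model, 3)           # velocity per control point

    def forward(self, xt, t, attr_ids):
        # xt: (B, N_MAX, 3); t: (B, 1, 1); attr_ids: (B, N_MAX)
        t_feat = t.expand(-1, xt.shape[1], 1)
        h = self.in_proj(torch.cat([xt, t_feat], dim=-1))
        h = h + self.attr_emb(attr_ids)
        for blk in self.blocks:
            h = blk(h)                             # sequence-native, linear in N_MAX
        return self.out(h)
```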
[Interactive demo: step a polyline through the flow trajectory. Pick a target bridge shape (arch, truss, beam) and a sampling step; the polyline starts as a random scatter and resolves toward the bridge silhouette as the step advances.]
White paper · polyline-diffusion architecture specification · padded-with-mask + Pure-Mamba + flow matching · three open questions for first training run