PGN [1] trains a seq2seq transformer that maps a bridge polyline plus a per-segment semantic attribute string to an executable DSL program. That architecture was appropriate for PGN's 15-pair training corpus, but it carries two structural limitations into the larger thesis line: (i) autoregressive decoding is slow at inference; (ii) the discrete DSL-token output forces the model to learn the executor's grammar perfectly rather than producing geometry it can evaluate directly.
A diffusion-style or flow-matching-style head sidesteps both. The output is a continuous geometric representation (polyline coordinates + attributes), the sampler runs in parallel over all positions, and the trained model is a denoiser over polyline configurations. The design question this paper specifies is which representation and which generator family work for variable-length polyline outputs.
The honest scope: this is a design specification, not an implementation. No training runs are executed. The contribution is the documented design decisions and the open architectural questions enumerated for the first training run.
A bridge polyline is a sequence of 3-D control points (x, y, z) plus per-segment semantic attributes (OPEN, CLOSED, RAILING, …). The PGN corpus has polylines of 8–40 control points. Three encoding options are analysed:
| Option | Encoding | Pros | Cons |
|---|---|---|---|
| Padded fixed length | Pad to N_max = 64 with sentinel + length mask | Standard transformer / Mamba consumption | Wasted capacity on short polylines |
| Variable-length set | Treat polyline as set of (xyz, attr, position-index) | No padding; scales to long polylines | Needs PointNet++ / GNN encoder; loses order without positional encoding |
| Length-conditioned sampling | Predict length L first, then sample (L × 3) polyline conditioned on L | Sharp generation; no wasted capacity | Two-stage sampler; L-prediction itself is a small generative problem |
Working hypothesis: padded fixed length with length mask. Reasons: composes cleanly with the Pure-Mamba backbone [2]; PGN corpus length distribution is tight (8–40 points) so padding overhead is manageable; the length-conditioned alternative is the backup if padded fails on long polylines.
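To make the working hypothesis concrete, a minimal padding sketch in PyTorch (the `N_MAX` constant matches the table; the sentinel value, the reserved PAD attribute id, and the function name are illustrative assumptions, not part of the spec):

```python
import torch

N_MAX = 64          # padded length from the table above
PAD_SENTINEL = 0.0  # assumed sentinel written into padded coordinate slots

def pad_polyline(points: torch.Tensor, attrs: torch.Tensor):
    """points: (L, 3) control-point coords, attrs: (L,) integer attribute ids, L <= N_MAX.
    Returns (N_MAX, 3) coords, (N_MAX,) attribute ids, and a (N_MAX,) boolean length mask."""
    L = points.shape[0]
    coords = torch.full((N_MAX, 3), PAD_SENTINEL)
    coords[:L] = points
    attr_ids = torch.zeros(N_MAX, dtype=torch.long)  # id 0 reserved for padding (assumption)
    attr_ids[:L] = attrs
    mask = torch.zeros(N_MAX, dtype=torch.bool)
    mask[:L] = True                                   # True on real control points
    return coords, attr_ids, mask
```

The length mask is what would keep padded positions out of the training loss and out of any metric computed on samples, which is exactly the concern behind open question (i) in §6.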
Pure-Mamba state-space backbone, inheriting the Topic-25 MNIST backbone-validation decision [2]. The validation found Pure-Mamba beats Pure-Transformer and Hybrid-Mamba+Attention on the speed-quality trade-off at the 196-token regime. Polyline-diffusion's N_max = 64 sits inside that regime; the Mamba block scales linearly in token count where the transformer's attention is quadratic, so the advantage holds.
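A minimal sketch of the denoiser the backbone implies, assuming the reference `mamba_ssm` package's `Mamba` block; the layer count, model width, pre-norm residual wiring, and time-embedding MLP are placeholder choices, not validated decisions:

```python
import torch
import torch.nn as nn
from mamba_ssm import Mamba  # assumption: the reference Mamba implementation is installed

class PolylineDenoiser(nn.Module):
    """Maps a noisy (B, N_MAX, 4) polyline tensor and a timestep t to a same-shape velocity field."""
    def __init__(self, d_model: int = 256, n_layers: int = 6):
        super().__init__()
        self.in_proj = nn.Linear(4, d_model)   # 3 coordinates + 1 attribute channel per position
        self.t_embed = nn.Sequential(nn.Linear(1, d_model), nn.SiLU(), nn.Linear(d_model, d_model))
        self.blocks = nn.ModuleList([Mamba(d_model=d_model) for _ in range(n_layers)])
        self.norms = nn.ModuleList([nn.LayerNorm(d_model) for _ in range(n_layers)])
        self.out_proj = nn.Linear(d_model, 4)

    def forward(self, x_t: torch.Tensor, t: torch.Tensor) -> torch.Tensor:
        # x_t: (B, N_MAX, 4), t: (B,) in [0, 1]
        h = self.in_proj(x_t) + self.t_embed(t[:, None]).unsqueeze(1)  # broadcast time over positions
        for norm, block in zip(self.norms, self.blocks):
            h = h + block(norm(h))  # pre-norm residual state-space block
        return self.out_proj(h)
```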
| Family | Sampling steps (quality) | Manifold-aware target? | Verdict |
|---|---|---|---|
| DDPM (pixel/polyline-space) | ~100–250 | ε-prediction (no) | Workable but slow |
| LDM (latent-space) | ~50–100 | ε-prediction in latent (no) | VAE compression buys speed but adds reconstruction artefacts at polyline scale |
| Flow Matching | ~20–50 | v-prediction (yes, closer to x-pred) | Best fit |
Pre-decision: flow matching, matching the upstream MambaFlow3D choice [4] and the x-prediction manifold-hypothesis analysis [3].
Flow matching defines a continuous-time flow from a base distribution p_0 (Gaussian noise over the polyline tensor) to the data distribution p_1 (the training-set bridge polylines). Linear interpolation gives the simplest flow:
x_t = (1 − t) · z + t · x_1,   z ∼ 𝒩(0, I),   x_1 ∼ p_data,   t ∈ [0, 1]

The optimal velocity field along this path is the constant v(x_t, t) = x_1 − z; the network v̂(x_t, t) is trained to predict this velocity from x_t and t:

L = E_{t, z, x_1} ‖v̂(x_t, t) − (x_1 − z)‖²

At sampling time the network's predicted velocity is integrated by Euler (or Heun, RK4) from t = 0 to t = 1:

x_{t+Δt} = x_t + Δt · v̂(x_t, t)

with 20–50 Euler steps producing high-quality samples. For the polyline-tensor shape (N_max, 4) (3 coordinates + 1 attribute embedding per position), the flow operates per position, with the Pure-Mamba backbone providing the cross-position interaction.
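The velocity-matching objective and the Euler sampler translate directly into code. A minimal sketch, assuming the `PolylineDenoiser` and length mask from the sketches above; masking padded positions out of the loss is an assumption that open question (i) would test:

```python
import torch

def flow_matching_loss(model, x_1: torch.Tensor, mask: torch.Tensor) -> torch.Tensor:
    """x_1: (B, N_MAX, 4) clean padded polyline tensors, mask: (B, N_MAX) True on real points."""
    B = x_1.shape[0]
    t = torch.rand(B, device=x_1.device)                        # t ~ U[0, 1]
    z = torch.randn_like(x_1)                                    # z ~ N(0, I)
    x_t = (1 - t)[:, None, None] * z + t[:, None, None] * x_1    # linear interpolation path
    v_target = x_1 - z                                           # optimal constant velocity
    v_pred = model(x_t, t)
    per_pos = ((v_pred - v_target) ** 2).mean(dim=-1)            # (B, N_MAX)
    return (per_pos * mask).sum() / mask.sum()                   # assumption: exclude padded slots

@torch.no_grad()
def sample(model, batch_size: int, n_steps: int = 50, device: str = "cpu") -> torch.Tensor:
    """Euler integration of the learned velocity field from t = 0 (noise) to t = 1 (data)."""
    x = torch.randn(batch_size, 64, 4, device=device)            # (B, N_MAX, 4) Gaussian start
    dt = 1.0 / n_steps
    for i in range(n_steps):
        t = torch.full((batch_size,), i * dt, device=device)
        x = x + dt * model(x, t)                                  # x_{t+dt} = x_t + dt * v̂(x_t, t)
    return x
```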
Per-segment semantic attributes (OPEN, CLOSED, RAILING) are the conditioning input. Three entry points are considered: (i) added embeddings, where each attribute id is embedded and summed into the per-position token before the backbone; (ii) AdaLN-style modulation, where the attribute embedding scales and shifts the backbone's normalisation layers; (iii) cross-attention from the polyline tokens to an attribute token sequence.
Working hypothesis: added embeddings, with AdaLN-modulation as backup if added embeddings are insufficient.
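A sketch of the added-embedding entry point, assuming it is wired in right after the denoiser's input projection so attribute information reaches every block; the attribute vocabulary size and the reserved PAD id are assumptions:

```python
import torch
import torch.nn as nn

class AddedAttributeEmbedding(nn.Module):
    """Added-embedding conditioning: per-position attribute ids become vectors summed into the token stream."""
    def __init__(self, n_attrs: int = 4, d_model: int = 256):  # e.g. PAD, OPEN, CLOSED, RAILING
        super().__init__()
        self.embed = nn.Embedding(n_attrs, d_model, padding_idx=0)

    def forward(self, h: torch.Tensor, attr_ids: torch.Tensor) -> torch.Tensor:
        # h: (B, N_MAX, d_model) token stream after the input projection, attr_ids: (B, N_MAX)
        return h + self.embed(attr_ids)
```

If this proves insufficient for the OPEN / CLOSED / RAILING distinction, the AdaLN backup would replace the plain LayerNorm calls in the denoiser with attribute-modulated scale and shift.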
Three concrete questions the first training run is intended to answer. (i) Does padded fixed-length with length-mask train cleanly, or does the sentinel-padding distort the loss landscape? (ii) Is added-embedding attribute conditioning expressive enough for the OPEN / CLOSED / RAILING distinction, or is cross-attention needed? (iii) Does the 20–50-step flow-matching sampler preserve polyline coherence (no kinks, no self-intersections), or are explicit geometric losses needed?
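Question (iii) needs a measurable notion of coherence before the first run. One candidate diagnostic, sketched here, is the maximum turn angle between consecutive segments of a sampled polyline; the function name and any pass/fail threshold are illustrative, not part of the spec:

```python
import torch

def max_turn_angle_deg(coords: torch.Tensor, mask: torch.Tensor) -> torch.Tensor:
    """Largest angle (degrees) between consecutive segment directions of one sampled polyline.
    coords: (N_MAX, 3) sampled coordinates, mask: (N_MAX,) True on real control points."""
    pts = coords[mask]                                  # drop padded positions
    if pts.shape[0] < 3:                                # fewer than two segments: no kink possible
        return torch.tensor(0.0)
    seg = pts[1:] - pts[:-1]                            # segment vectors
    seg = seg / (seg.norm(dim=-1, keepdim=True) + 1e-8)
    cos = (seg[1:] * seg[:-1]).sum(dim=-1).clamp(-1.0, 1.0)
    return torch.rad2deg(torch.acos(cos)).max()
```

A self-intersection count over the projected (x, y) segments would be the companion check, and together they would decide whether explicit geometric losses are needed.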
Polyline-diffusion is specified: padded fixed-length with length mask; Pure-Mamba backbone; flow-matching velocity prediction; added-embedding attribute conditioning. No training runs executed. The first training run's job is to answer the three open questions in §6.