Design Specification · cs.LG · cs.GR · Nov 2025
Diffusion Generator over Houdini Bridge Polylines: An Architecture Design Specification Lifting PGN's Seq2seq Head into Flow-Matching over a Padded-with-Mask Polyline Tensor
Aaditya Jain
Diffusion Models · Procedural Geometry · Thesis-Line Architecture Specification
Submitted: November 2025 · Subject: cs.LG · cs.GR · Keywords: polyline diffusion, flow matching, variable-length sequence, PGN replacement, bridge DSL generation, design study
Abstract
We specify a polyline-diffusion architecture intended to replace the autoregressive seq2seq DSL head in PGN [1]. The motivation is twofold: (i) the autoregressive token-by-token decoding in PGN's seq2seq head is slow at inference (each generated token requires a full forward pass); (ii) the discrete DSL-token output requires the model to learn the executor's grammar perfectly, which limits the training-data efficiency on PGN's small 15-pair corpus. The proposed alternative: a continuous-output flow-matching generator that operates directly over the polyline-coordinate-plus-attribute representation, with all sampling steps parallel-over-positions rather than autoregressive. We document the architectural decisions made: representation — padded fixed-length tensor with explicit length mask (working hypothesis, two alternatives — variable-length set, length-conditioned two-stage — also analysed); backbone — Pure-Mamba inheriting the Topic-25 MNIST validation decision [2]; diffusion target — flow-matching velocity prediction inheriting the manifold-hypothesis argument [3]; conditioning — attribute embeddings added per position, or cross-attention from a separate conditioning encoder. We compare DDPM, LDM, and flow matching as candidate diffusion families and pre-decide flow-matching on sampling-step-count grounds. No training runs are executed in this work — the contribution is the design specification and the open architectural questions enumerated for the first training run. Keywords: polyline diffusion, flow matching, padded sequence, design specification, PGN replacement.
1. Introduction

PGN [1] trains a seq2seq transformer that maps a bridge polyline + per-segment semantic attribute string to an executable DSL program. The architecture choice was appropriate for the 15-pair training corpus PGN had, but it carries two structural limitations into the larger thesis line: (i) autoregressive decoding is slow at inference; (ii) the discrete DSL-token output requires the model to learn the executor's grammar perfectly rather than producing geometry it can directly evaluate.

A diffusion-style or flow-matching-style head sidesteps both. The output is a continuous geometric representation (polyline coordinates + attributes), the sampler runs in parallel over all positions, and the trained model is a denoiser over polyline configurations. The design question this paper addresses is which representation and which generator family work for variable-length polyline outputs.

The honest scope: this is a design specification, not an implementation. No training runs are executed. The contribution is the documented design decisions and the open architectural questions enumerated for the first training run.

2. Representation

A bridge polyline is a sequence of 3-D control points (x, y, z) plus per-segment semantic attributes (OPEN, CLOSED, RAILING, …). The PGN corpus has polylines of 8–40 control points. Three encoding options are analysed:

Table 1 — Representation alternatives.
| Option | Encoding | Pros | Cons |
| --- | --- | --- | --- |
| Padded fixed length | Pad to N_max = 64 with sentinel + length mask | Standard transformer / Mamba consumption | Wasted capacity on short polylines |
| Variable-length set | Treat polyline as a set of (xyz, attr, position-index) tuples | No padding; scales to long polylines | Needs a PointNet++ / GNN encoder; loses order without positional encoding |
| Length-conditioned sampling | Predict length L first, then sample an (L × 3) polyline conditioned on L | Sharp generation; no wasted capacity | Two-stage sampler; L-prediction is itself a small generative problem |

Working hypothesis: padded fixed length with length mask. Reasons: it composes cleanly with the Pure-Mamba backbone [2]; the PGN corpus length distribution is tight (8–40 points), so padding overhead is manageable; and the length-conditioned alternative remains the backup if the padded representation fails on long polylines.
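
A minimal sketch of this encoding, assuming per-position attributes and an illustrative attribute vocabulary; the sentinel value, the PAD token, and the exact channel layout are placeholders rather than committed choices:

```python
import numpy as np

N_MAX = 64                     # padded length (working hypothesis)
PAD_SENTINEL = 0.0             # sentinel value for padded rows (assumed)
ATTR_VOCAB = {"OPEN": 0, "CLOSED": 1, "RAILING": 2, "PAD": 3}   # illustrative

def encode_polyline(points, attrs):
    """Pad a variable-length polyline to (N_MAX, 4) and build its length mask.

    points : (L, 3) float array of xyz control points, 8 <= L <= 40 in the PGN corpus
    attrs  : length-L list of per-position attribute names
    returns (tensor, mask): tensor[:, :3] = xyz, tensor[:, 3] = attribute id,
            mask[i] = 1.0 for real positions and 0.0 for padding.
    """
    L = len(points)
    assert L <= N_MAX, "polyline exceeds the padded budget"
    tensor = np.full((N_MAX, 4), PAD_SENTINEL, dtype=np.float32)
    tensor[:L, :3] = points
    tensor[:L, 3] = [ATTR_VOCAB[a] for a in attrs]
    tensor[L:, 3] = ATTR_VOCAB["PAD"]
    mask = np.zeros(N_MAX, dtype=np.float32)
    mask[:L] = 1.0
    return tensor, mask
```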

3. Backbone

Pure-Mamba state-space backbone, inheriting the Topic-25 MNIST backbone-validation decision [2]. The validation found Pure-Mamba beats Pure-Transformer and Hybrid-Mamba+Attention on the speed-quality trade-off at the 196-token regime. Polyline-diffusion's N_max = 64 sits inside that regime; the Mamba block scales linearly in token count where the transformer's attention is quadratic, so the advantage holds.
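
A sketch of how the backbone might consume the padded tensor, assuming the mamba-ssm package's Mamba block; the pre-norm residual stacking and the width/depth values are assumptions for illustration, not a validated configuration:

```python
import torch.nn as nn
from mamba_ssm import Mamba   # pip install mamba-ssm (assumed dependency)

class PolylineMambaBackbone(nn.Module):
    """Pre-norm residual stack of Mamba blocks over the (B, N_MAX, d_model) tensor."""
    def __init__(self, d_model=256, depth=8):             # width / depth are placeholders
        super().__init__()
        self.blocks = nn.ModuleList(Mamba(d_model=d_model) for _ in range(depth))
        self.norms = nn.ModuleList(nn.LayerNorm(d_model) for _ in range(depth))

    def forward(self, x):                                  # x: (B, N_MAX, d_model)
        for norm, block in zip(self.norms, self.blocks):
            x = x + block(norm(x))                         # cost linear in sequence length
        return x
```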

4. Diffusion Family Choice
Table 2 — DDPM vs LDM vs Flow Matching for polyline output.
| Family | Sampling steps (quality) | Manifold-aware target? | Verdict |
| --- | --- | --- | --- |
| DDPM (pixel/polyline-space) | ~100–250 | ε-prediction (no) | Workable but slow |
| LDM (latent-space) | ~50–100 | ε-prediction in latent space (no) | VAE compression buys speed but adds reconstruction artefacts at polyline scale |
| Flow matching | ~20–50 | v-prediction (yes, closer to x-prediction) | Best fit |

Pre-decision: flow matching, matching the upstream MambaFlow3D choice [4] and the x-prediction manifold-hypothesis analysis [3].

4.1 Flow-matching specifics for polylines

Flow matching defines a continuous-time flow from a base distribution p_0 (Gaussian noise over the polyline tensor) to the data distribution p_1 (the training-set bridge polylines). Linear interpolation gives the simplest flow:

x_t = (1 − t) · z + t · x_1, z ∼ 𝒩(0, I), x_1 ∼ p_data, t ∈ [0, 1]

Along each interpolation path the conditional target velocity is the constant dx_t/dt = x_1 − z; the network v̂(x_t, t) is trained to regress this target from x_t and t:

L = E_{t, z, x_1} ‖v̂(x_t, t) − (x_1 − z)‖²

At sampling time the network's predicted velocity is integrated by Euler (or Heun, RK4) from t = 0 to t = 1:

x_{t + Δt} = x_t + Δt · v̂(x_t, t)

with 20–50 Euler steps expected to produce high-quality samples. For the polyline-tensor shape (N_max, 4) (3 coordinates + 1 attribute channel per position), the flow operates per-position, with the Pure-Mamba backbone providing the cross-position interaction.
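
A sketch of the training loss and Euler sampler under these definitions. The velocity network's call signature, the masked-loss handling, and the 32-step default are assumptions for illustration, not settled interfaces:

```python
import torch

def flow_matching_loss(model, x1, attr_ids, mask):
    """Conditional flow-matching loss over a batch of padded polyline tensors.

    model    : velocity network, model(x_t, t, attr_ids) -> (B, N_MAX, 4) (assumed signature)
    x1       : (B, N_MAX, 4) clean tensors (3 coordinates + attribute channel)
    attr_ids : (B, N_MAX) per-position attribute ids, the conditioning input of §5
    mask     : (B, N_MAX) length mask; the loss is averaged over real positions only
    """
    B = x1.shape[0]
    t = torch.rand(B, device=x1.device)                 # t ~ U[0, 1]
    t_ = t.view(B, 1, 1)
    z = torch.randn_like(x1)                            # z ~ N(0, I)
    x_t = (1.0 - t_) * z + t_ * x1                      # linear interpolation path
    v_target = x1 - z                                   # constant conditional velocity
    v_pred = model(x_t, t, attr_ids)
    err = ((v_pred - v_target) ** 2).sum(dim=-1)        # (B, N_MAX)
    return (err * mask).sum() / mask.sum()              # masked mean

@torch.no_grad()
def euler_sample(model, attr_ids, n_steps=32, n_max=64, device="cpu"):
    """Integrate the learned velocity field from t = 0 (noise) to t = 1 (polyline)."""
    B = attr_ids.shape[0]
    x = torch.randn(B, n_max, 4, device=device)         # start at the base distribution
    dt = 1.0 / n_steps
    for i in range(n_steps):
        t = torch.full((B,), i * dt, device=device)
        x = x + dt * model(x, t, attr_ids)              # x_{t+dt} = x_t + dt · v(x_t, t)
    return x
```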

5. Conditioning

Per-segment semantic attributes (OPEN, CLOSED, RAILING) are the conditioning input. Three entry points considered:

  • Added embeddings. Look up an attribute embedding per segment and add it to the polyline-coordinate embedding at the matching position. Simple; works only if the attribute's effect is local.
  • Cross-attention. Encode the attribute sequence via a small transformer; the Mamba backbone cross-attends. More expressive; cross-attention adds a quadratic cost the Mamba backbone is trying to avoid.
  • Separate conditioning encoder. A separate encoder produces a single conditioning vector that AdaLN-modulates every Mamba block. DiT-style conditioning pattern.

Working hypothesis: added embeddings, with AdaLN-modulation as backup if added embeddings are insufficient.
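
A sketch of the added-embedding path, assuming the (N_max, 4) tensor layout above; the vocabulary size and model width are illustrative:

```python
import torch.nn as nn

class AddedAttributeEmbedding(nn.Module):
    """Per-position attribute embedding added to the coordinate embedding
    before the Mamba backbone (the working-hypothesis conditioning path)."""
    def __init__(self, n_attrs=4, d_model=256):           # vocab size / width are assumptions
        super().__init__()
        self.coord_proj = nn.Linear(3, d_model)            # project (x, y, z) per position
        self.attr_embed = nn.Embedding(n_attrs, d_model)   # OPEN / CLOSED / RAILING / PAD

    def forward(self, coords, attr_ids):
        # coords: (B, N_MAX, 3) float; attr_ids: (B, N_MAX) long
        return self.coord_proj(coords) + self.attr_embed(attr_ids)
```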

6. Open Questions for First Training Run

Three concrete questions the first training run is intended to answer: (i) Does padded fixed-length with length-mask train cleanly, or does the sentinel-padding distort the loss landscape? (ii) Is added-embedding attribute conditioning expressive enough for the OPEN / CLOSED / RAILING distinction, or is cross-attention needed? (iii) Does the 20–50-step flow-matching sampler preserve polyline coherence (no kinks, no self-intersections), or are explicit geometric losses needed?
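
On question (iii), one possible coherence check, sketched here under stated assumptions: a turning-angle scan over the sampled control points. Whether this stays a post-hoc metric or becomes an explicit geometric loss, and what angle counts as a kink, is left to the first training run.

```python
import numpy as np

def max_turning_angle(points, mask=None):
    """Largest turning angle (radians) along a sampled polyline; a crude kink detector.

    points : (N, 3) array of control points
    mask   : optional (N,) array marking real (non-padded) positions
    """
    if mask is not None:
        points = points[mask.astype(bool)]
    if len(points) < 3:
        return 0.0
    a, b, c = points[:-2], points[1:-1], points[2:]
    u, v = b - a, c - b
    cos = (u * v).sum(-1) / (np.linalg.norm(u, axis=-1) * np.linalg.norm(v, axis=-1) + 1e-8)
    return float(np.arccos(np.clip(cos, -1.0, 1.0)).max())
```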

7. Conclusion

Polyline-diffusion is specified: padded fixed-length with length mask; Pure-Mamba backbone; flow-matching velocity prediction; added-embedding attribute conditioning. No training runs executed. The first training run's job is to answer the three open questions in §6.

References
[1] Jain, A. "PGN: A Transformer-Based Procedural Generator Network." Thesis research, Sep 2025. /whitepaper/pgn
[2] Jain, A. "MNIST Flow-Matching Backbone Validation." Thesis research, Nov 2025. /whitepaper/mnist-flow-validation
[3] Jain, A. "Manifold-Aware Diffusion Targets (x-Prediction Analysis)." Thesis research, Nov 2025. /whitepaper/x-prediction
[4] Jain, A. "MambaFlow3D." Thesis research, Nov 2025. /whitepaper/mambaflow3d
[5] Lipman, Y. et al. "Flow Matching for Generative Modeling." ICLR, 2023.
[6] Qi, C. R. et al. "PointNet++: Deep Hierarchical Feature Learning on Point Sets in a Metric Space." NeurIPS, 2017. The set-encoder alternative to padded sequences.
[7] Ho, J., Jain, A., Abbeel, P. "Denoising Diffusion Probabilistic Models." NeurIPS, 2020.
[8] Gu, A., Dao, T. "Mamba: Linear-Time Sequence Modeling with Selective State Spaces." arXiv preprint, 2023.