arXiv Preprint · cs.GR · cs.LG · Oct 2025
Graph Grammars for Automatic 3D Procedural Modeling: Implementing Merrell's Boundary String Method with Neural Rule Prediction
Aaditya Jain
Procedural Generation · Graph Grammars · Thesis Research, Unpublished Preprint
Submitted: October 2025 · Subject: cs.GR · cs.LG · Keywords: graph grammar, procedural generation, r-similarity, boundary string, rule extraction, 3D synthesis
Abstract
We implement Paul Merrell's 2023 example-based procedural modeling system using graph grammars, extending the original 2D formulation to 3D polygonal meshes. The method automatically extracts production rules from a single example shape by decomposing it into per-vertex primitives, encoding connectivity as boundary strings ∂G, and identifying repeated substring patterns as grammar rules. We implement the full pipeline from scratch: primitive extraction, branch-gluing and loop-gluing operations, boundary string computation, and frequency-based rule identification. On a standard chair mesh, the extractor correctly identifies 4-fold leg symmetry (rule frequency = 4) and bilateral back symmetry (frequency = 2) without any supervision. The 3D extension applies simplified boundary representations due to the combinatorial explosion of gluing operations in higher dimensions. Grammar-driven generation produces over 50 novel shapes locally r-similar to the input. We identify the non-differentiable executor as the primary barrier to end-to-end learning and propose a DiT-based neural rule predictor — using DINOv2 image features and flow matching over rule-selection sequences — as the natural next stage.
1. Introduction

Procedural 3D generation in Maps requires producing geometrically valid assets — bridges, buildings, terrain features — from structured inputs like polylines and semantic boundary attributes. The dominant approach is direct neural generation: diffusion over meshes, SDF fields, or triplane representations. These methods produce plausible geometry but encode no structural knowledge of what makes a valid assembly. A bridge generated by mesh diffusion may look correct from a distance but contain degenerate topology, non-manifold faces, or physically implausible structural relationships.

Graph grammars offer a fundamentally different inductive bias. Rather than learning a smooth distribution over geometry, they learn the combinatorial rules that govern valid assemblies. A grammar extracted from a real bridge knows that piers support decks, that railings appear at closed boundaries, that span count scales with polyline length. These constraints are implicit in the grammar rather than injected as post-hoc regularisers.

Merrell's 2023 method [1] is, to our knowledge, the only published system that automatically extracts such grammars from example shapes without manual rule authoring. It handles the hardest structural case — shapes with closed cycles — that previous grammar methods explicitly avoided. This paper describes our implementation, our 3D extension, and the architectural path toward neural rule prediction.

2. Background: r-Similarity and Boundary Strings
2.1 r-Similarity

Two shapes are r-similar if every ball of radius r centred on one shape contains a neighbourhood that appears somewhere in the other. As r→0 this captures exact local topology. A grammar extracted from shape S can generate any shape that is r-similar to S — same local rules, arbitrarily different global form.
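The definition above can be made concrete with a toy discrete version. The sketch below (names ours, not Merrell's) treats a shape as a point set and a neighbourhood as the set of offsets within radius r of a centre; two shapes are declared r-similar when every such neighbourhood of each shape occurs somewhere in the other. True r-similarity quantifies over all balls on the shape, not only those centred at sample points, so this is an illustration, not a faithful test.

```python
# Toy discrete r-similarity check. Shapes are sets of 2D integer points;
# a neighbourhood is the frozenset of offsets within radius r of a centre.
# Function names (neighbourhood, r_similar) are illustrative assumptions.

def neighbourhood(shape, centre, r):
    """Offsets of all shape points within radius r of `centre`."""
    cx, cy = centre
    return frozenset(
        (x - cx, y - cy)
        for x, y in shape
        if (x - cx) ** 2 + (y - cy) ** 2 <= r * r
    )

def r_similar(a, b, r):
    """Every r-neighbourhood of each shape must occur in the other."""
    def covered(src, dst):
        dst_nbhds = {neighbourhood(dst, q, r) for q in dst}
        return all(neighbourhood(src, p, r) in dst_nbhds for p in src)
    return covered(a, b) and covered(b, a)

# A 4-point line and a 5-point line share the same local structure
# (two end types, one interior type), so they are 1-similar; a line
# and a 2x2 square are not.
line4 = {(x, 0) for x in range(4)}
line5 = {(x, 0) for x in range(5)}
square = {(x, y) for x in range(2) for y in range(2)}
```

The line example mirrors the claim in the text: the grammar of a short line can generate arbitrarily long lines, since all local neighbourhoods match.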

2.2 Boundary Strings

Merrell decomposes a shape graph G into per-vertex primitives by cutting every edge into two directed half-edges. The boundary ∂G of a graph encodes its open half-edges as a string of symbols, with turn angles tracking curvature. A completed shape, with no open half-edges, has the empty boundary ∂G = "∧". Production rules are string transformations on ∂G.

Branch gluing: ∂G₁ + ∂G₂ → ∂G₃, merging at a complementary half-edge pair (a, ā).
Loop gluing: "…a…ā…" → "…", requiring a ±360° turn between a and ā.
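A minimal sketch of these two operations on symbol lists follows. It is our toy formulation, not Merrell's: complements are marked with a trailing apostrophe (so "a" and "a'" form a complementary pair), and the turn-angle bookkeeping required by loop gluing is omitted.

```python
# Toy boundary-string gluing. Boundaries are lists of symbols; "a" and
# "a'" are a complementary half-edge pair. Turn angles are NOT tracked.

def complement(sym):
    return sym[:-1] if sym.endswith("'") else sym + "'"

def branch_glue(b1, b2, sym):
    """Merge two boundaries at the pair (sym, complement(sym)).
    The glued half-edges vanish; the remainders are spliced together."""
    i, j = b1.index(sym), b2.index(complement(sym))
    r1 = b1[i + 1:] + b1[:i]   # rotate glued symbol to front, drop it
    r2 = b2[j + 1:] + b2[:j]
    return r1 + r2

def loop_glue(b, sym):
    """Close a loop: remove a complementary pair from one boundary."""
    i, j = b.index(sym), b.index(complement(sym))
    lo, hi = sorted((i, j))
    return b[:lo] + b[lo + 1:hi] + b[hi + 1:]
```

For example, gluing ["a", "b"] to ["a'", "c"] at "a" yields ["b", "c"], and loop-gluing ["a", "b", "a'"] yields ["b"]; repeated gluing drives ∂G toward the empty boundary.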
Figure 1: Merrell grammar pipeline. Input mesh → primitive extraction → boundary string ∂G → rule mining (freq ≥ 2) → grammar-driven generation until ∂G = "∧". Output shapes are r-similar to input.
3. Implementation
3.1 2D Foundation

We begin with 2D shapes to validate the boundary string machinery. The extractor represents shapes as planar graphs with directed half-edges. Each vertex becomes a primitive with outgoing half-edges labelled by edge direction. Boundary computation follows Merrell Section 3.3, tracking open half-edges and cumulative turn angles.
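The decomposition step can be sketched as follows. This is a simplified illustration under our own naming, not the implementation itself: each undirected edge is cut into two directed half-edges, each vertex keeps its outgoing ones, and a subgraph's open boundary is the set of half-edges whose twin leaves the vertex set (turn angles again omitted).

```python
# Per-vertex primitive decomposition and open-boundary computation.
# Names (extract_primitives, boundary) are illustrative assumptions.

from collections import defaultdict

def extract_primitives(edges):
    """Map vertex -> list of outgoing half-edges (u, v)."""
    prims = defaultdict(list)
    for u, v in edges:
        prims[u].append((u, v))
        prims[v].append((v, u))
    return dict(prims)

def boundary(subgraph_vertices, prims):
    """Open half-edges of a subgraph: those leaving its vertex set."""
    sub = set(subgraph_vertices)
    return sorted(
        (a, b) for u in sub for (a, b) in prims[u] if b not in sub
    )

# Triangle (0, 1, 2) with a dangling leg to vertex 3:
edges = [(0, 1), (1, 2), (2, 0), (0, 3)]
prims = extract_primitives(edges)
```

Taking the triangle alone leaves exactly one open half-edge, (0, 3); taking the whole graph leaves none, i.e. the empty boundary ∂G = "∧".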

3.2 3D Extension

In 3D, ∂G is a 2D surface graph — half-edges become half-faces. We implement a simplified version that discretises edge-direction vectors into 26 directions and uses these as the matching alphabet. Full 3D boundary matching (graph isomorphism on ∂G) is deferred: the number of gluing combinations grows as O(d!) in the vertex degree d, making high-valence vertices intractable.
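A plausible reading of the 26-direction alphabet is the set of nonzero offsets of a 3×3×3 neighbourhood, with each continuous edge direction snapped to the nearest one by cosine similarity. The sketch below follows that assumption; function and variable names are ours.

```python
# Discretise a 3D edge-direction vector into one of 26 symbols:
# the nonzero offsets of {-1, 0, 1}^3, matched by cosine similarity.

import itertools
import math

DIRECTIONS = [
    d for d in itertools.product((-1, 0, 1), repeat=3) if d != (0, 0, 0)
]

def discretise(vec):
    """Index (0..25) of the discrete direction closest to `vec`."""
    vx, vy, vz = vec
    norm = math.sqrt(vx * vx + vy * vy + vz * vz)
    best, best_dot = 0, -2.0
    for i, (dx, dy, dz) in enumerate(DIRECTIONS):
        dn = math.sqrt(dx * dx + dy * dy + dz * dz)
        dot = (vx * dx + vy * dy + vz * dz) / (norm * dn)
        if dot > best_dot:
            best, best_dot = i, dot
    return best
```

Two edges then match exactly when their discretised indices agree, which sidesteps the factorial blow-up of exact half-face matching at the cost of angular resolution.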

3.3 Symmetry Detection as Emergent Property

Frequency-based rule mining detects symmetry without any explicit symmetry analysis. For a chair mesh, the four leg-attachment subgraphs produce identical boundary strings — frequency = 4. The extractor identifies rule R_leg with multiplicity 4, encoding 4-fold leg symmetry automatically. n-fold rotational symmetry always produces rules with frequency exactly n.
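The mining step reduces to counting identical boundary strings, as in the toy sketch below (names ours; real boundary strings replace the placeholder labels). Any pattern observed at least twice becomes a rule, and its count is its multiplicity.

```python
# Frequency-based rule mining: group candidate subgraphs by boundary
# string and keep patterns seen at least twice. Labels are placeholders
# standing in for actual boundary strings.

from collections import Counter

def mine_rules(boundary_strings, min_freq=2):
    """Return {boundary_string: frequency} for patterns with freq >= min_freq."""
    counts = Counter(boundary_strings)
    return {s: n for s, n in counts.items() if n >= min_freq}

# Chair-like input: four identical leg boundaries, two back-post
# boundaries, one unique seat boundary.
parts = ["leg"] * 4 + ["back"] * 2 + ["seat"]
rules = mine_rules(parts)
```

The unique seat pattern is discarded, while the leg rule emerges with multiplicity 4 and the back rule with multiplicity 2 — exactly the frequency-equals-symmetry-order behaviour described above.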

Table 1: Grammar extraction results by shape category
Shape             Primitives   Rules   Max freq   Symmetry
Chair             12           7       4          4-fold legs, 2-fold back
Arch (cycle)      5            3       4          Loop-closing rule
Grid lattice      9            4       6          Translational H+V
Tree (no cycle)   7            3       2          Binary branching
4. The Neural Rule Predictor

Grammar extraction gives rule set R. Generation applies rules stochastically — unguided, no conditioning on image or semantic intent. The natural extension replaces stochastic selection with learned prediction: given an image, predict which rule to apply at each step.

We propose a DiT-based sequence model [4] trained on (image, rule-sequence) pairs. DINOv2 [2] encodes the image into 768-dimensional features. Flow matching [3] over a continuous relaxation of the discrete rule-selection space trains the sequence model. The grammar extractor serves as training data generator, not just an end system.

p(rₜ | r₁,...,rₜ₋₁, x_img) ≈ DiT(DINOv2(x_img), [r₁,...,rₜ₋₁])
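One concrete way to set up the flow-matching objective over the relaxed rule space — our proposed construction, not an implemented component — is to embed each discrete rule id as a one-hot target x₁, draw Gaussian noise x₀, and train the model to regress the straight-line velocity x₁ − x₀ at an interpolated point xₜ:

```python
# Flow-matching training target over a continuous relaxation of rule
# selection (proposed setup, names ours). Each rule id becomes a one-hot
# vector; the regression target is the constant linear-path velocity.

import numpy as np

def flow_matching_target(rule_id, num_rules, t, rng):
    """Return (x_t, v_target) for one training example at time t in [0, 1]."""
    x1 = np.zeros(num_rules)
    x1[rule_id] = 1.0                      # one-hot relaxation of the rule
    x0 = rng.standard_normal(num_rules)    # noise sample
    xt = (1.0 - t) * x0 + t * x1           # linear interpolant
    v = x1 - x0                            # velocity target along the path
    return xt, v

rng = np.random.default_rng(0)
xt, v = flow_matching_target(rule_id=3, num_rules=8, t=0.5, rng=rng)
```

Integrating the learned velocity from t = 0 to t = 1 and taking an argmax recovers a discrete rule choice; by construction xₜ + (1 − t)·v lands exactly on the one-hot target, which makes the objective easy to sanity-check.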
5. Discussion & Conclusion

Grammar extraction works reliably for shapes with repetitive structure. Shapes without repeated substructure produce grammars with near-zero rule reuse — the method is a poor fit for freeform organic geometry. The non-differentiable executor gap (no gradient through rule-application) prevents end-to-end training; the neural predictor with black-box executor is the viable path.

This work feeds directly into SculptNet, which applies a structurally similar inductive bias to coarse-to-fine primitive assembly, and into building elevation reconstruction, where grammar-like constraints govern the 6-plane mesh executor. The architectural template — structured input → rule extraction → program → geometry — recurs across the entire thesis.

References
[1] Merrell, P. "Example-Based Procedural Modeling Using Graph Grammars." ACM Trans. Graph., 2023.
[2] Oquab, M. et al. "DINOv2: Learning Robust Visual Features without Supervision." TMLR, 2023.
[3] Lipman, Y. et al. "Flow Matching for Generative Modeling." ICLR, 2023.
[4] Peebles, W. & Xie, S. "Scalable Diffusion Models with Transformers (DiT)." ICCV, 2023.
[5] Chang, A. et al. "ShapeNet: An Information-Rich 3D Model Repository." arXiv, 2015. shapenet.org
[6] Deitke, M. et al. "Objaverse: A Universe of Annotated 3D Objects." CVPR, 2023. objaverse.allenai.org