Implementing Merrell's example-based procedural modeling using graph grammars —
extracting reusable production rules from r-similar boundary patterns in meshes,
then generating new 3D shapes by systematic rule application.
3 wk implementation · 2D → 3D extension · 4× leg symmetry detected · 50+ graphs generated
§ 1 — Theoretical Foundation
What Are Graph Grammars?
Merrell's 2023 method takes an example input mesh, automatically extracts reusable grammar
production rules from repeated boundary patterns, then generates entirely new shapes by
systematically applying those rules. The key insight: shapes that are r-similar
share the same local neighborhood structure everywhere — so the grammar captures
what the example "knows" about valid local geometry.
r-Similarity (Section 3.1)
Local Neighborhood Equivalence
Two shapes are r-similar if every infinitesimal neighborhood in one appears in the other.
As r → 0 this captures exact local topology. A grammar extracted from one shape
can generate any r-similar shape — new global forms, same local rules.
Boundary String ∂G (Section 3.3)
Graph Representation
Each graph G has a boundary string ∂G encoding open half-edges and turns.
A complete graph has ∂G = "∧" (no open edges). Production rules
are string transformations on ∂G — making grammar matching algebraic rather than geometric.
∂G₁ = "y∧ba" + ∂G₂ = "ā∧xy̅" → glue at (a, ā) → ∂G₃ = "y∧b∧xy̅"
Primitives (Section 4)
Atomic Grammar Units
Cutting every edge produces half-edges; one primitive per vertex. Each primitive = isolated vertex
+ surrounding half-edges. Gluing primitives back reproduces the original shape.
Primitives are the grammar's alphabet.
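A minimal sketch of the cutting step, assuming a graph stored as a plain edge list (the data layout is an assumption, not the paper's code): every undirected edge is split into two directed half-edges, yielding one primitive per vertex.

```python
# Sketch of primitive extraction: cut every undirected edge into two
# directed half-edges, producing one primitive per vertex. The half-edge
# (u, v) stored at u is complementary to (v, u) stored at v, so gluing
# all complementary pairs back together reproduces the original graph.

def extract_primitives(edges):
    """edges: list of (u, v) pairs. Returns {vertex: [half-edges]}."""
    primitives = {}
    for u, v in edges:
        primitives.setdefault(u, []).append((u, v))
        primitives.setdefault(v, []).append((v, u))
    return primitives

# A triangle: 3 vertices, 3 edges -> 3 primitives with 2 half-edges each.
tri = extract_primitives([(0, 1), (1, 2), (2, 0)])
```

Each value in the returned dict is one primitive: an isolated vertex plus its surrounding half-edges, exactly the grammar's alphabet.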
Gluing Operations (Section 4.2)
Two Operations
Branch gluing: connect two separate graphs (∂G₁ + ∂G₂ → ∂G₃).
Loop gluing: close a loop within one graph ("...a...ā..." → "...").
Together these build any complex shape from primitives.
Loop: "...a...ā..." → "..." Requires ±360° turn between a, ā
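Both operations can be sketched as pure string manipulations, which is the point of the boundary-string encoding. This toy version uses ASCII stand-ins ('A' for ā, '^' for ∧) and omits the paper's turn-angle bookkeeping, including the ±360° check for loop gluing:

```python
# Toy boundary-string gluing. Symbols are single chars; 'A' stands in
# for a-bar and '^' for the wedge. Turn-angle tracking is omitted.

def branch_glue(bg1, bg2, a, abar):
    """Branch gluing: merge two boundary strings at a complementary
    half-edge pair (a, a-bar), consuming both symbols."""
    i, j = bg1.index(a), bg2.index(abar)
    # Splice the rest of bg2's boundary into bg1 where the pair glued.
    return bg1[:i] + bg2[j + 1:] + bg2[:j] + bg1[i + 1:]

def loop_glue(bg, a, abar):
    """Loop gluing within one graph: "...a...A..." -> "..."."""
    i, j = bg.index(a), bg.index(abar)
    assert i < j
    return bg[:i] + bg[i + 1:j] + bg[j + 1:]
```

Running `branch_glue("y^ba", "A^xY", "a", "A")` reproduces the worked example above: the result is `"y^b^xY"`, i.e. ∂G₃ = "y∧b∧xy̅".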
§ 2 — System Architecture
Extraction → Neural → Generation
The full pipeline runs in two phases. Phase one (completed): grammar extraction
from real meshes. Phase two (planned): a neural network — DINOv2 features
fed through a diffusion transformer — predicts which rules to apply and where,
replacing hand-coded generation with image-conditioned inference.
Implemented Merrell's method from scratch, following the paper exactly:
not procedural composition or hard-coded assembly, but an actual rule-based
grammar built on the mathematical formalism.
01
2D Foundation — Boundary Strings & Gluing
Complete 2D implementation: boundary string representation (∂G), branch gluing
(connect two separate graphs), loop gluing (close loops, ∂G → "∧"),
graph hierarchy construction (4 generations), production rule extraction by string matching.
50+ graphs generated through systematic gluing operations.
Done
02
3D Extension — Real Mesh Grammar Extraction
Extended to 3D: primitives extracted from real chair meshes,
simplified 3D boundary representation (edge direction signatures: x+, y+, z+, etc.),
pattern matching finding repeated local structures,
grammar rules exported to 3d_grammar_rules.json.
Key finding: 4-fold leg symmetry and 2-fold arm symmetry automatically detected from pattern frequency.
Done
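The frequency-based symmetry detection can be sketched in a few lines, assuming each vertex's local pattern is summarized by counting its outgoing edge directions (signature strings like "x+:2 y+:2 z+:1" follow the write-up's convention; the vertex data below is invented for illustration):

```python
from collections import Counter

# Toy signature-frequency rule mining: summarize each vertex by the
# counts of its outgoing edge directions, then group identical
# signatures. A signature occurring n times hints at n-fold symmetry.

def signature(directions):
    """Canonical string like 'x+:2 y+:2 z+:1' from a direction list."""
    counts = Counter(directions)
    return " ".join(f"{d}:{n}" for d, n in sorted(counts.items()))

def mine_rules(vertex_dirs):
    """Map each distinct signature to its frequency across vertices."""
    return Counter(signature(d) for d in vertex_dirs.values())

# Four chair legs produce the same local signature -> frequency 4.
legs = {f"leg{i}": ["x+", "x+", "y+", "y+", "z+"] for i in range(4)}
freqs = mine_rules(legs)
```

No symmetry analysis is performed anywhere; the 4-fold structure falls out of the frequency count alone.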
03
Neural Integration — Image → Rule Prediction
Planned: DINOv2 feature extraction → diffusion transformer (flow matching) →
grammar decoder predicts rule applications and positions.
Output: [{rule_id, position, scale}, …]
applied to extracted grammar → editable 3D mesh.
Target venue: SIGGRAPH 2026 or NeurIPS.
Planned
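The planned `[{rule_id, position, scale}, …]` output could be carried by a small record type like the following. This is a hypothetical schema sketched for illustration; the field names mirror the output format above but are not a fixed interface:

```python
import json
from dataclasses import dataclass, asdict

# Hypothetical container for the planned predictor output
# [{rule_id, position, scale}]; names are assumptions, not an API.

@dataclass
class RuleApplication:
    rule_id: int
    position: tuple  # (x, y, z) anchor in mesh coordinates
    scale: float

def to_json(apps):
    """Serialize a predicted rule sequence for the grammar executor."""
    return json.dumps([asdict(a) for a in apps])

apps = [RuleApplication(0, (0.0, 0.0, 0.0), 1.0),
        RuleApplication(1, (0.5, 0.0, 0.8), 0.5)]
payload = to_json(apps)
```

Keeping the predictor's output in this explicit form is what preserves editability: each record can be inspected or modified before the grammar executor turns it into geometry.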
§ 4 — Results
Grammar Extraction Findings
Running the extraction pipeline on a chair mesh produced interpretable rules
that correctly recovered structural symmetries without any supervision.
| Rule ID | Pattern Signature | Frequency | Interpretation | Status |
|---|---|---|---|---|
| Rule 0 | x+:2 y+:2 z+:1 | 4× | Leg structure, 4-fold symmetry ✓ | Extracted |
| Rule 1 | x+:1 y+:2 z+:1 | 2× | Arm structure, bilateral symmetry ✓ | Extracted |
| Rule 2 | x+:3 y+:1 z+:2 | 1× | Seat / backrest slab | Extracted |
| — | Full 3D boundary graph | — | True Merrell: 3D boundary is a 2D surface graph | Simplified |
| — | Position solving Ax=b | — | Rejection sampling for vertex positions (Section 6.2) | Open |
Boundary strings are powerful — pattern frequency reveals symmetry without supervision.
§ 5 — Challenge Log
Engineering Decisions
Full 3D Boundary Is a 2D Surface Graph
In 2D, ∂G is a 1D boundary string — tractable. In 3D, ∂G is a 2D surface graph,
requiring subgraph isomorphism matching, which is NP-hard in the general case.
The simplified implementation uses edge direction signatures instead, losing
some theoretical guarantees but making the problem computationally tractable.
Simplified
Generation vs. Extraction — Where's the Contribution?
Full procedural generation (applying extracted rules to produce new meshes) requires
fully correct gluing implementations, which are costly to build and validate.
Decision: rule extraction is valid and sufficient for the neural network training target.
The NN doesn't need procedural generation — it learns to predict rule applications directly from images.
Resolved
Starting with Slides Not Code
Early sessions spent building educational visualizations of gluing operations
rather than the actual implementation. Corrected course: pivoted to
building the working grammar extraction system first, theory second.
Resolved
Vertex Position Solving
Section 6.2 of the paper requires solving Ax=b for vertex positions and using
rejection sampling to satisfy constraints. Not yet implemented — the current
system extracts rules and structural patterns but doesn't solve for
geometrically valid positions in generated shapes.
Open
§ 6 — Led To
What This Work Enabled
Oct 2025 → Research Survey
CAPRI-Net / BrepGen / HoLa / Sparc3D
Studying grammar-based methods led directly to surveying the broader image-to-3D landscape:
CAPRI-Net (ran on CPU), BrepGen (Colab), HoLa (BRep diffusion), Sparc3D.
Proposed novel pipeline: Sparc3D → CAPRI-Net primitives → parametric control.
Oct–Nov 2025 → Infrastructure
CUDA & GPU Programming
Neural network phase planning required understanding GPU memory trade-offs.
Led to CUDA fundamentals study via Mandelbrot set (PyCUDA, Colab), and
M4 MPS vs. NVIDIA CUDA performance comparison for 3D ML workloads.
Thesis Direction
Grammar + Neural Hybrid
The core thesis insight emerging here: explicit grammar rules as the
structural scaffold, neural network as the prediction mechanism.
Image → DINOv2 → DiT → rule applications → editable mesh.
Avoids both pure neural opacity and pure procedural rigidity.
§ 8 — Interactive
Grammar Extraction Demo
Select a preset shape to see how the graph grammar pipeline extracts
primitives, identifies boundary strings, and generates production rules.
Each preset mirrors a real shape used during the October 2025 implementation.
Research Write-up
Graph Grammars for 3D Shape Generation: Implementation & Findings
arXiv-style preprint · cs.GR · cs.LG · October 2025
Graph Grammars for Automatic 3D Procedural Modeling: Implementing Merrell's Boundary String Method with Neural Rule Prediction
Aditya Jain
Apple Maps · 3D Reconstruction Group, Hyderabad · Thesis Research, Unpublished Preprint
Submitted: October 2025 · Subject: cs.GR · cs.LG · Keywords: graph grammar, procedural generation, r-similarity, boundary string, rule extraction, 3D synthesis
Abstract
We implement Paul Merrell's 2023 example-based procedural modeling system using graph grammars, extending the original 2D formulation to 3D polygonal meshes. The method automatically extracts production rules from a single example shape by decomposing it into per-vertex primitives, encoding connectivity as boundary strings ∂G, and identifying repeated substring patterns as grammar rules. We implement the full pipeline from scratch: primitive extraction, branch-gluing and loop-gluing operations, boundary string computation, and frequency-based rule identification. On a standard chair mesh, the extractor correctly identifies 4-fold leg symmetry (rule frequency = 4) and bilateral back symmetry (frequency = 2) without any supervision. The 3D extension applies simplified boundary representations due to the combinatorial explosion of gluing operations in higher dimensions. Grammar-driven generation produces over 50 novel shapes locally r-similar to the input. We identify the non-differentiable executor as the primary barrier to end-to-end learning and propose a DiT-based neural rule predictor — using DINOv2 image features and flow matching over rule-selection sequences — as the natural next stage. Keywords: graph grammar, procedural modeling, r-similarity, boundary string, 3D generation, rule extraction.
1. Introduction
Procedural 3D generation in Apple Maps requires producing geometrically valid assets — bridges, buildings, terrain features — from structured inputs like polylines and semantic boundary attributes. The dominant approach is direct neural generation: diffusion over meshes, SDF fields, or triplane representations. These methods produce plausible geometry but encode no structural knowledge of what makes a valid assembly. A bridge generated by mesh diffusion may look correct from a distance but contain degenerate topology, non-manifold faces, or physically implausible structural relationships.
Graph grammars offer a fundamentally different inductive bias. Rather than learning a smooth distribution over geometry, they learn the combinatorial rules that govern valid assemblies. A grammar extracted from a real bridge knows that piers support decks, that railings appear at closed boundaries, that span count scales with polyline length. These constraints are implicit in the grammar rather than injected as post-hoc regularisers.
Merrell's 2023 method [1] is, to our knowledge, the only published system that automatically extracts such grammars from example shapes without manual rule authoring. It handles the hardest structural case — shapes with closed cycles — that previous grammar methods explicitly avoided. This paper describes our implementation, our 3D extension, and the architectural path toward neural rule prediction.
2. Background: r-Similarity and Boundary Strings
2.1 r-Similarity
Two shapes are r-similar if every ball of radius r centred on one shape contains a neighbourhood that appears somewhere in the other. As r→0 this captures exact local topology. A grammar extracted from shape S can generate any shape that is r-similar to S — same local rules, arbitrarily different global form.
2.2 Boundary Strings
Merrell decomposes a shape graph G into primitives by cutting every edge into two directed half-edges — one primitive per vertex. The boundary ∂G of a graph encodes its open half-edges as a string of symbols, with turn angles tracking curvature. A complete graph has ∂G = "∧". Production rules are string transformations on ∂G.
Branch gluing: ∂G₁ + ∂G₂ → ∂G₃, merging at a complementary half-edge pair (a, ā)
Loop gluing: "…a…ā…" → "…", requiring a ±360° turn between a and ā
3. Implementation
3.1 2D Foundation
We begin with 2D shapes to validate the boundary string machinery. The extractor represents shapes as planar graphs with directed half-edges. Each vertex becomes a primitive with outgoing half-edges labelled by edge direction. Boundary computation follows Merrell Section 3.3, tracking open half-edges and cumulative turn angles.
3.2 3D Extension
In 3D, ∂G is a 2D surface graph — half-edges become half-faces. We implement a simplified version using edge-direction vectors discretised into 26 directions as the matching alphabet. Full 3D boundary (graph isomorphism on ∂G) is deferred: gluing combinations grow O(d!) with vertex degree d, making high-valence vertices intractable.
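One plausible reading of the 26-direction alphabet is sign quantization: take the sign of each component of the edge direction, giving 3³ − 1 = 26 non-zero bins. This is a sketch under that assumption; the implementation's exact quantization may differ.

```python
import numpy as np

# Discretize an edge direction into one of 26 bins by taking the sign
# of each component (3^3 sign triples minus the all-zero case).

def discretize(v, eps=1e-6):
    """Map a 3D direction vector to a sign triple like (1, 0, -1)."""
    v = np.asarray(v, dtype=float)
    s = np.where(np.abs(v) < eps, 0, np.sign(v)).astype(int)
    if not s.any():
        raise ValueError("zero-length edge has no direction")
    return tuple(s)

# e.g. discretize([1.0, 0.0, -0.2]) maps to the bin (1, 0, -1).
```

The coarse alphabet is what makes matching tractable: comparing sign triples replaces comparing surface subgraphs.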
3.3 Symmetry Detection as Emergent Property
Frequency-based rule mining detects symmetry without any explicit symmetry analysis. For a chair mesh, the four leg-attachment subgraphs produce identical boundary strings — frequency = 4. The extractor identifies rule R_leg with multiplicity 4, encoding 4-fold leg symmetry automatically. n-fold rotational symmetry always produces rules with frequency exactly n.
Table 1: Grammar extraction results by shape category

| Shape | Primitives | Rules | Max freq | Symmetry |
|---|---|---|---|---|
| Chair | 12 | 7 | 4 | 4-fold legs, 2-fold back |
| Arch (cycle) | 5 | 3 | 4 | Loop-closing rule |
| Grid lattice | 9 | 4 | 6 | Translational H+V |
| Tree (no cycle) | 7 | 3 | 2 | Binary branching |
4. The Neural Rule Predictor
Grammar extraction gives rule set R. Generation applies rules stochastically — unguided, no conditioning on image or semantic intent. The natural extension replaces stochastic selection with learned prediction: given an image, predict which rule to apply at each step.
We propose a DiT-based sequence model [4] trained on (image, rule-sequence) pairs. DINOv2 [2] encodes the image into 768-dimensional features. Flow matching [3] over a continuous relaxation of the discrete rule-selection space trains the sequence model. The grammar extractor serves as training data generator, not just an end system.
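The flow-matching training targets for the proposed relaxation can be sketched as follows. This is a toy illustrating only target construction, under the common linear-path assumption x_t = (1 − t)·x₀ + t·x₁ with velocity target x₁ − x₀; the DiT backbone and DINOv2 conditioning are omitted entirely:

```python
import numpy as np

# Toy flow-matching targets for discrete rule selection: relax rule IDs
# to one-hot vectors x1, draw Gaussian noise x0, and regress the
# velocity (x1 - x0) at a point on the straight path between them.

def flow_matching_targets(rule_ids, num_rules, rng):
    x1 = np.eye(num_rules)[rule_ids]          # one-hot relaxation
    x0 = rng.standard_normal(x1.shape)        # noise sample
    t = rng.uniform(size=(len(rule_ids), 1))  # per-example time
    x_t = (1 - t) * x0 + t * x1               # point on the path
    v = x1 - x0                               # regression target
    return x_t, t, v

rng = np.random.default_rng(0)
x_t, t, v = flow_matching_targets(np.array([0, 3, 1]), 7, rng)
```

A model trained to predict v from (x_t, t, image features) can then be integrated from noise to a one-hot-like vector at inference, and the argmax read off as the selected rule.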
Grammar extraction works reliably for shapes with repetitive structure. Shapes without repeated substructure produce grammars with near-zero rule reuse — the method is a poor fit for freeform organic geometry. The non-differentiable executor gap (no gradient through rule-application) prevents end-to-end training; the neural predictor with black-box executor is the viable path.
This work feeds directly into SculptNet, which applies a structurally similar inductive bias to coarse-to-fine primitive assembly, and into building elevation reconstruction, where grammar-like constraints govern the 6-plane mesh executor. The architectural template — structured input → rule extraction → program → geometry — recurs across the entire thesis.
References
[1] Merrell, P. "Example-Based Procedural Modeling Using Graph Grammars." ACM Trans. Graph., 2023. paulmerrell.org/…/ProcModelUsingGraphGram.pdf
[2] Oquab, M. et al. "DINOv2: Learning Robust Visual Features without Supervision." TMLR, 2023. arxiv.org/abs/2304.07193
[3] Lipman, Y. et al. "Flow Matching for Generative Modeling." ICLR, 2023. arxiv.org/abs/2210.02747
[4] Peebles, W. & Xie, S. "Scalable Diffusion Models with Transformers (DiT)." ICCV, 2023. arxiv.org/abs/2212.09748
[5] Chang, A. et al. "ShapeNet: An Information-Rich 3D Model Repository." shapenet.org
[6] Deitke, M. et al. "Objaverse: A Universe of Annotated 3D Objects." objaverse.allenai.org