By October 2025 the thesis line had committed to procedural-modelling-aware ML [6,7] as its long-term direction. The obvious question before further commitment was whether the target capability — image input, parametric output — was already published. The five papers surveyed here are the strongest representatives of the two adjacent capabilities at the time.
The survey methodology was end-to-end: read the paper, clone the GitHub repository (when available), and attempt local inference on accessible hardware (Intel iMac for CPU work; rented RTX 3060 via Vast.ai for GPU work; Google Colab T4 as a fallback). The local-inference attempt distinguishes "I read the paper" from "I understand the failure modes": the former is fast and shallow, while the latter is slower but produces the field-gap identification this survey turns on.
| Paper | Approach | Input | Output | Parametric? |
|---|---|---|---|---|
| CAPRI-Net [1] | CSG primitive composition via learned program | Point cloud | CSG program | Yes |
| BrepGen [2] | Diffusion on B-rep graph | Latent code | B-rep CAD model | Yes |
| HoLa [3] | Hierarchical learned B-rep with topology | Latent code | B-rep with full topology | Yes |
| SparC3D [4] | Sparse-cube transformer over voxel tokens | Single image | Sparse-voxel mesh | No |
| TRELLIS [5] | Structured latent + dual decoder | Single image | Sparse voxels + Gaussian splats | No |
CAPRI-Net takes a 3-D point cloud as input and produces a CSG program: a sequence of primitive instantiations (cuboids, cylinders, spheres) with parameters and a sequence of boolean operations that combine them into the target shape. The learning problem is decomposed into two stages: a primitive-predictor network predicts the parameters of each primitive; a sequence-predictor network predicts the order and operation type for combining them. The output is a fully parametric CAD-class program that can be edited or re-evaluated at different parameters. Limitations: requires a clean point cloud (no image input); the primitive vocabulary is fixed at three classes.
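To make "parametric program" concrete, here is a minimal sketch of a CSG program as data: a list of (boolean operation, primitive) steps evaluated left to right against a query point. It is an illustration only, not CAPRI-Net's actual representation or code. The class names, the `evaluate` function, the z-aligned cylinder, and the plate-with-hole example are all assumptions introduced here.

```python
from dataclasses import dataclass, replace
from typing import List, Tuple, Union

Point = Tuple[float, float, float]


@dataclass(frozen=True)
class Cuboid:
    center: Point
    half_extents: Point

    def contains(self, p: Point) -> bool:
        return all(abs(p[i] - self.center[i]) <= self.half_extents[i] for i in range(3))


@dataclass(frozen=True)
class Cylinder:
    """Cylinder with its axis parallel to z (a simplifying assumption)."""
    center: Point
    radius: float
    half_height: float

    def contains(self, p: Point) -> bool:
        dx, dy, dz = (p[i] - self.center[i] for i in range(3))
        return dx * dx + dy * dy <= self.radius ** 2 and abs(dz) <= self.half_height


@dataclass(frozen=True)
class Sphere:
    center: Point
    radius: float

    def contains(self, p: Point) -> bool:
        return sum((p[i] - self.center[i]) ** 2 for i in range(3)) <= self.radius ** 2


Primitive = Union[Cuboid, Cylinder, Sphere]
Step = Tuple[str, Primitive]  # (boolean operation, primitive)


def evaluate(program: List[Step], p: Point) -> bool:
    """Return True if point p lies inside the shape described by the program."""
    inside = False  # start from the empty shape
    for op, prim in program:
        hit = prim.contains(p)
        if op == "union":
            inside = inside or hit
        elif op == "intersect":
            inside = inside and hit
        elif op == "subtract":
            inside = inside and not hit
        else:
            raise ValueError(f"unknown operation: {op}")
    return inside


# Hypothetical example: a flat plate with a cylindrical hole through it.
plate_with_hole = [
    ("union", Cuboid(center=(0.0, 0.0, 0.0), half_extents=(2.0, 2.0, 0.5))),
    ("subtract", Cylinder(center=(0.0, 0.0, 0.0), radius=1.0, half_height=1.0)),
]

# Editing one parameter re-evaluates the same program to a different shape.
narrow_hole = [
    plate_with_hole[0],
    ("subtract", replace(plate_with_hole[1][1], radius=0.5)),
]
```

The point of the sketch is the last few lines: shrinking the hole is a one-parameter edit to the program, whereas the same change to a mesh would require re-modelling the geometry.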
BrepGen operates on boundary-representation (B-rep) graphs — the standard format for engineering CAD. A B-rep is a graph of faces, edges, and vertices with topological constraints; BrepGen trains a diffusion model over this graph structure, generating B-rep graphs from a latent-code conditioning. The output is a fully parametric, edit-friendly CAD model. Limitations: input is a latent code (not an image); the diffusion is over a non-Euclidean graph space, which adds architectural complexity.
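The graph view of a B-rep can be sketched in a few lines. The example below is a topology-only toy, assuming a hypothetical `BRep` class and `unit_cube` helper. Real B-reps also attach surface and curve geometry to faces and edges, and this is not BrepGen's data format. It builds a unit cube and checks two topological constraints: the Euler characteristic V - E + F = 2 for a closed genus-0 solid, and every edge being shared by exactly two faces.

```python
from collections import Counter
from dataclasses import dataclass
from itertools import product
from typing import List, Tuple


@dataclass
class BRep:
    vertices: List[Tuple[float, float, float]]
    edges: List[Tuple[int, int]]   # each edge is a pair of vertex indices
    faces: List[Tuple[int, ...]]   # each face is a tuple of edge indices

    def euler_characteristic(self) -> int:
        return len(self.vertices) - len(self.edges) + len(self.faces)

    def is_closed(self) -> bool:
        """True if every edge is shared by exactly two faces."""
        uses = Counter(e for face in self.faces for e in face)
        return all(uses[e] == 2 for e in range(len(self.edges)))


def unit_cube() -> BRep:
    vertices = list(product((0.0, 1.0), repeat=3))
    # Two corners are joined by an edge when they differ in exactly one coordinate.
    edges = [
        (i, j)
        for i in range(len(vertices))
        for j in range(i + 1, len(vertices))
        if sum(a != b for a, b in zip(vertices[i], vertices[j])) == 1
    ]
    # One face per (axis, value) pair: the edges lying entirely in that plane.
    faces = []
    for axis in range(3):
        for value in (0.0, 1.0):
            faces.append(tuple(
                e for e, (i, j) in enumerate(edges)
                if vertices[i][axis] == value and vertices[j][axis] == value
            ))
    return BRep(vertices, edges, faces)


cube = unit_cube()
print(len(cube.vertices), len(cube.edges), len(cube.faces))
print(cube.euler_characteristic(), cube.is_closed())
```

A generative model over this structure has to produce outputs that satisfy such constraints, which is one reason generation over B-rep graphs is architecturally harder than generation over a regular voxel grid.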
HoLa extends BrepGen with hierarchical topology generation: the output B-rep includes the full topological structure (which faces share which edges, which edges share which vertices) rather than just the local-geometry parameters. The hierarchical decomposition gives better topological coherence on complex models. HoLa shares BrepGen's input limitation: it is conditioned on a latent code, not an image.
SparC3D takes a single image as input and produces a sparse-voxel 3-D mesh as output. The image is encoded via a Vision Transformer; the resulting tokens are decoded by a sparse-cube transformer into ~100–200 active voxels per scene, each carrying SDF and deformation values; marching cubes then extracts the final mesh. The output is a high-fidelity reconstruction but completely non-parametric: the user gets a mesh, not editable parameters. The paper's central technical contribution is the Sparcubes representation (deformable vertex offsets on top of a sparse grid).
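The idea of a sparse grid whose entries carry an SDF value plus a deformation offset can be illustrated with a toy example. This is a generic sketch under stated assumptions, not the Sparcubes implementation. The analytic sphere SDF, the half-voxel band, the grid resolution, and all function names are chosen here for illustration, and no mesh extraction is performed.

```python
import math
from itertools import product


def sphere_sdf(p, radius=1.0):
    """Signed distance from point p to a sphere centred at the origin."""
    return math.sqrt(sum(c * c for c in p)) - radius


def build_sparse_grid(resolution=16, extent=1.5):
    """Keep only grid vertices within half a voxel of the surface.

    Returns (grid, step), where grid maps an integer index (i, j, k) to
    (sdf_value, offset) and step is the voxel edge length.
    """
    step = 2.0 * extent / (resolution - 1)
    band = 0.5 * step
    grid = {}
    for idx in product(range(resolution), repeat=3):
        p = tuple(-extent + i * step for i in idx)
        d = sphere_sdf(p)
        if abs(d) < band:
            norm = math.sqrt(sum(c * c for c in p))
            # For a sphere the SDF gradient is the radial unit vector, so
            # moving by -d along it lands the vertex exactly on the surface.
            offset = tuple(-d * c / norm for c in p)
            grid[idx] = (d, offset)
    return grid, step


def deformed_position(idx, offset, step, extent=1.5):
    """World-space position of a grid vertex after applying its offset."""
    return tuple(-extent + i * step + o for i, o in zip(idx, offset))


grid, step = build_sparse_grid()
print(f"active vertices: {len(grid)} of {16 ** 3}")
```

Only vertices near the surface are stored, and each stored vertex can slide by less than half a voxel to sit on the surface. Neither the SDF values nor the offsets correspond to anything a user could edit as a design parameter, which is what non-parametric means here.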
TRELLIS uses a structured-latent encoder with a dual-decoder architecture. The input image is encoded to a structured latent (a combination of sparse-voxel features and Gaussian-splat parameters); two parallel decoders produce sparse voxels (for geometric fidelity) and Gaussian splats (for photo-realistic rendering). The dual output makes TRELLIS the closest competitor in scope to the thesis-line target (image input plus multiple output representations), but the outputs are still non-parametric raw geometry.
The structural pattern visible at the table level — three papers with parametric output, two papers with image input, zero papers with both — is the survey's central finding. The bifurcation is not coincidental; it reflects a missing encoder-and-output-head architectural combination that no surveyed paper has shipped.
Plot the five papers on a 2-D plane (X = image-input capability, Y = parametric-output capability). The procedural-CAD cluster (CAPRI-Net, BrepGen, HoLa) sits at high Y, low X. The neural-reconstruction cluster (SparC3D, TRELLIS) sits at high X, low Y. The high-X, high-Y quadrant, which combines image input with parametric output, is empty.
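The quadrant assignment can be reproduced mechanically from the comparison table. The sketch below treats both axes as booleans taken from the table's Input and Parametric? columns. The `Paper` class and `quadrants` function are names introduced here for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Paper:
    name: str
    image_input: bool        # X axis
    parametric_output: bool  # Y axis


# Values transcribed from the comparison table.
PAPERS = [
    Paper("CAPRI-Net", image_input=False, parametric_output=True),
    Paper("BrepGen", image_input=False, parametric_output=True),
    Paper("HoLa", image_input=False, parametric_output=True),
    Paper("SparC3D", image_input=True, parametric_output=False),
    Paper("TRELLIS", image_input=True, parametric_output=False),
]


def quadrants(papers: List[Paper]) -> Dict[Tuple[bool, bool], List[str]]:
    """Group paper names by (image_input, parametric_output)."""
    cells = {(x, y): [] for x in (False, True) for y in (False, True)}
    for paper in papers:
        cells[(paper.image_input, paper.parametric_output)].append(paper.name)
    return cells


cells = quadrants(PAPERS)
for (x, y), names in cells.items():
    print(f"image_input={x!s:5} parametric_output={y!s:5} -> {names}")
```

The cell keyed `(True, True)` comes back as an empty list, which is the gap the rest of the survey builds on.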
The empty quadrant is not empty by accident. The procedural-CAD methods all take inputs that are themselves abstract (latent codes, point clouds with extracted features) because going from image pixels to a parametric program requires a non-trivial encoder stage that no surveyed paper has shipped. The neural-reconstruction methods all produce raw geometry because there is no widely-deployed encoder that produces parametric output. The empty quadrant is the missing encoder-and-output-head combination.
This is the thesis-line opportunity. PGN [6] targets the gap with polyline input (a half-step from image input, since polylines are extractable from images) and DSL output (parametric). SculptNet [7] targets the gap with image input and primitive-program output (parametric). Both follow directly from the survey's gap identification.
| Paper | Hardware tried | Outcome |
|---|---|---|
| CAPRI-Net | Intel iMac → GPU Jupyter notebook | Succeeded after dependency-pin fixes (numpy / scipy / torch versions) |
| BrepGen | Google Colab T4 | Succeeded after CUDA-version compatibility fix (paper targets 11.x; Colab ships 12.x) |
| HoLa | Intel iMac CPU | Too slow on CPU; queued for RTX 3060 retry |
| SparC3D | Hugging Face Space (cloud) | Succeeded — used the HF demo rather than local install |
| TRELLIS | Microsoft research release · A100 required | Read-only — local hardware insufficient |
The pattern: of the three papers run outside a hosted demo, two (CAPRI-Net and BrepGen) needed environment fixes before inference succeeded, either dependency pins or a CUDA-version workaround, and the third (HoLa) is still blocked on hardware. The defensive habit this produced: always create a fresh conda env from the paper's own yaml, never reuse an existing env. A second pattern: Hugging Face Spaces are an underused survey tool. For SparC3D the HF demo provided sufficient understanding without the weeks of engineering a local install would have cost.
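The fresh-environment habit can be wrapped in a small helper. This is a sketch, assuming the repository ships its environment file as `environment.yml` (the actual filename varies by paper). The function names and the example paths are hypothetical. The command it builds is the standard `conda env create` invocation with `--file` and `--name`.

```python
import subprocess
from pathlib import Path
from typing import List


def fresh_env_command(repo_dir: str, env_name: str,
                      yaml_name: str = "environment.yml") -> List[str]:
    """Build the conda command that creates a new env from the repo's own yaml."""
    yaml_path = Path(repo_dir) / yaml_name
    return ["conda", "env", "create", "--file", str(yaml_path), "--name", env_name]


def create_fresh_env(repo_dir: str, env_name: str,
                     yaml_name: str = "environment.yml") -> None:
    """Create the env, failing loudly if the yaml is missing or conda errors out."""
    yaml_path = Path(repo_dir) / yaml_name
    if not yaml_path.is_file():
        raise FileNotFoundError(f"no environment file at {yaml_path}")
    subprocess.run(fresh_env_command(repo_dir, env_name, yaml_name), check=True)


# Hypothetical usage: one dedicated env per surveyed paper.
print(" ".join(fresh_env_command("repos/capri-net", "survey-capri-net")))
```

Naming the env after the paper keeps each paper's pinned numpy / scipy / torch versions isolated from the others.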
Three honest limitations. (i) Small sample. Five papers are a representative selection, not an exhaustive one. The closest competitors not surveyed (InstantMesh, GS-LRM, Gamba) are covered in the G-Splats-vs-VDB follow-up survey [9] and do not change the gap finding. (ii) Snapshot in time. The survey is dated October 2025. A bridge paper between the two clusters could appear at any time; the gap claim is timestamped. (iii) Use-case bias. The field-gap-as-opportunity framing reflects the thesis line's specific use case (Houdini-integrable parametric output). For a use case where raw-geometry output is acceptable, the SparC3D / TRELLIS branch is already sufficient.
Five 3-D-generation methods were surveyed. The field bifurcates into image-input + raw-geometry-output and latent-input + parametric-output clusters; the image-input + parametric-output quadrant is empty. This quadrant is the thesis-line opportunity, operationalised by PGN and SculptNet.