Field-Gap Survey — White Paper

Image-to-3-D With Parametric Output: A Field-Gap Survey of Five 3-D-Generation Methods (CAPRI-Net, BrepGen, HoLa, SparC3D, TRELLIS) and the Thesis-Line Opportunity

Aaditya Jain

ad_jain@icloud.com · orcid.org/0009-0005-5534-5641

3-D Generation · Field Survey · Thesis-Line Scope Decision

Submitted: October 2025
Subject: cs.GR · cs.CV · cs.LG
Keywords: 3-D generation survey, CAPRI-Net, BrepGen, HoLa, SparC3D, TRELLIS, parametric CAD generation, image-to-3-D, field-gap

Abstract

We survey five representative 3-D-generation methods (CAPRI-Net [1], BrepGen [2], HoLa [3], SparC3D [4], and TRELLIS [5]) and identify a structural field gap: no published method combines image input with parametric (procedural) output. The field bifurcates cleanly. Neural-reconstruction methods (SparC3D, TRELLIS) take images as input and produce raw geometry (sparse voxels, Gaussian splats, meshes) with no parametric interface. Procedural-CAD methods (CAPRI-Net, BrepGen, HoLa) take latent codes or point clouds as input and produce parametric CAD output (CSG programs, B-rep models) but cannot accept image input. The intersection, image input with parametric output, is empty. This intersection is the capability the Maps procedural-modelling thesis line needs most: given an image of a bridge, building, or venue, produce a parametric description that integrates with the existing Houdini-based procedural pipeline. The contribution of this survey is the field-gap identification and its operationalisation in the subsequent thesis line as the scope of PGN [6] (polyline input → DSL output, the first concrete entry into the gap) and SculptNet [7] (image input → primitive-program output, the second entry). For each surveyed method we report the architecture, the outcome of local-inference attempts on Intel iMac, RTX 3060, and Google Colab T4 hardware (or a hosted demo where a local install was impractical), and the dependency-pinning issues encountered. The local-inference report is intended as a practical companion to the paper-reading survey for researchers attempting reproductions on consumer hardware. Keywords: 3-D generation survey, field-gap analysis, image-to-3-D, parametric CAD, procedural modelling, thesis-line scope.

1. Introduction

By October 2025 the thesis line had committed to procedural-modelling-aware ML [6,7] as its long-term direction. The obvious question before further commitment was whether the target capability — image input, parametric output — was already published. The five papers surveyed here are the strongest representatives of the two adjacent capabilities at the time.

The survey methodology was end-to-end: read the paper, clone the GitHub repository (when available), attempt local inference on accessible hardware (Intel iMac for CPU work; rented RTX 3060 via Vast.ai for GPU work; Google Colab T4 as a fallback). The local-inference attempt distinguishes "I read the paper" from "I understand the failure modes" — the former is fast and shallow; the latter is slower but produces the field-gap identification this survey turns on.

2. Per-Paper Summary

Table 1 — Five papers surveyed.
Paper	Approach	Input	Output	Parametric?
CAPRI-Net [1]	CSG assembly of learned quadric primitives	Point cloud or voxel grid	CSG assembly (quadrics + booleans)	Yes
BrepGen [2]	Diffusion over a structured latent tree of B-rep faces and edges	Latent code	B-rep CAD model	Yes
HoLa [3]	Holistic latent unifying B-rep geometry and topology	Latent code	B-rep with full topology	Yes
SparC3D [4]	Sparse-cube transformer over voxel tokens	Single image	Sparse-voxel mesh	No
TRELLIS [5]	Structured latent (SLAT) + format-specific decoders	Single image	3-D Gaussians, radiance fields, or meshes	No

2.1 CAPRI-Net (Yu et al., CVPR 2022)

CAPRI-Net takes a 3-D shape, given as a point cloud or a voxel grid, and reconstructs it as a compact assembly of quadric-surface primitives combined by constructive solid geometry (CSG) operations. The assembly has a fixed three-level structure: primitives are intersected into convex parts, the convex parts are unioned into two groups, and the second group is subtracted from the first. The network is trained self-supervised with a reconstruction loss (no ground-truth CSG trees are needed) and is fine-tuned adaptively on each test shape. The output is a parametric, CAD-style program that can be edited or re-evaluated at different parameters. Limitations: requires a clean 3-D input (no image input); the primitive family is restricted to quadric surfaces, and the assembly depth is fixed.
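Why a CSG assembly stays editable can be seen by evaluating one directly. In the minimal sketch below, each primitive is an implicit function that is negative inside; intersection takes the maximum, union the minimum, and difference is max(a, -b). It assumes simplified axis-aligned quadrics (spheres and cylinders are special cases) and a fixed intersect, union, difference order. The class and function names are ours; this illustrates implicit CSG evaluation, not CAPRI-Net's network or its exact primitive parameterisation.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple

Point = Tuple[float, float, float]


@dataclass(frozen=True)
class Quadric:
    """Simplified axis-aligned quadric; negative values mean "inside".

    q(x, y, z) = a*x^2 + b*y^2 + c*z^2 + d*x + e*y + f*z + g
    (an illustrative assumption, not CAPRI-Net's exact parameterisation).
    """
    a: float
    b: float
    c: float
    d: float = 0.0
    e: float = 0.0
    f: float = 0.0
    g: float = 0.0

    def __call__(self, p: Point) -> float:
        x, y, z = p
        return (self.a * x * x + self.b * y * y + self.c * z * z
                + self.d * x + self.e * y + self.f * z + self.g)


def convex_value(p: Point, primitives: Sequence[Quadric]) -> float:
    # Intersection: inside every primitive <=> the largest value is negative.
    return max(q(p) for q in primitives)


def group_value(p: Point, convexes: Sequence[Sequence[Quadric]]) -> float:
    # Union: inside any convex part <=> the smallest value is negative.
    return min(convex_value(p, cvx) for cvx in convexes)


def assembly_value(p: Point, group_a, group_b) -> float:
    # Difference A minus B: inside A and outside B.
    return max(group_value(p, group_a), -group_value(p, group_b))


# Usage: a unit sphere with a radius-0.5 hole drilled along z.
sphere = Quadric(1, 1, 1, g=-1.0)
drill = Quadric(1, 1, 0, g=-0.25)
print(assembly_value((0.75, 0.0, 0.0), [[sphere]], [[drill]]) < 0)  # True: solid
print(assembly_value((0.0, 0.0, 0.0), [[sphere]], [[drill]]) < 0)   # False: in the hole
# Editing one parameter (drill radius 0.8) re-evaluates with no retraining.
wide_drill = Quadric(1, 1, 0, g=-0.64)
print(assembly_value((0.75, 0.0, 0.0), [[sphere]], [[wide_drill]]) < 0)  # False
```

The last line is the point of parametric output: changing one number changes the shape predictably, which a mesh or voxel output cannot offer.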

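2.2 BrepGen (Xu et al., SIGGRAPH 2024)

BrepGen operates on boundary representations (B-reps), the standard format for engineering CAD. A B-rep is a graph of faces, edges, and vertices with topological constraints. Rather than diffusing over this graph directly, BrepGen re-encodes it as a hierarchical tree of structured latent geometry (bounding boxes and shape latents for faces and edges, plus vertex positions), encodes topology implicitly by duplicating nodes that are shared between faces, and trains transformer-based diffusion models to generate the tree level by level from noise. Duplicated nodes are merged afterwards to recover the B-rep. The output is a fully parametric, editable CAD model. Limitations: generation starts from a latent code, not an image; the multi-stage tree generation and the topology-recovery step add architectural complexity.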
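The "topological constraints" of a B-rep can be made concrete with a small sketch that stores only face/edge/vertex incidence for a closed solid and checks two invariants: in a closed manifold solid every edge bounds exactly two faces, and a solid without handles has Euler characteristic V - E + F = 2. The cube data and function names are hypothetical and are not BrepGen's data format; a real B-rep also attaches surface and curve geometry to these entities.

```python
from collections import Counter
from typing import Dict, List

# Hypothetical topology-only B-rep of a unit cube: face name -> ordered
# boundary loop of vertex ids, where vertex id = x + 2*y + 4*z for the
# corner (x, y, z) in {0, 1}^3.
CUBE: Dict[str, List[int]] = {
    "z0": [0, 1, 3, 2], "z1": [4, 5, 7, 6],
    "y0": [0, 1, 5, 4], "y1": [2, 3, 7, 6],
    "x0": [0, 2, 6, 4], "x1": [1, 3, 7, 5],
}


def edge_use_counts(faces: Dict[str, List[int]]) -> Counter:
    """How many face loops use each undirected edge."""
    counts: Counter = Counter()
    for loop in faces.values():
        for i, v in enumerate(loop):
            counts[frozenset((v, loop[(i + 1) % len(loop)]))] += 1
    return counts


def is_closed_manifold(faces: Dict[str, List[int]]) -> bool:
    # In a closed manifold solid every edge bounds exactly two faces.
    return all(n == 2 for n in edge_use_counts(faces).values())


def euler_characteristic(faces: Dict[str, List[int]]) -> int:
    # V - E + F; equals 2 for a closed solid without handles (genus 0).
    vertices = {v for loop in faces.values() for v in loop}
    return len(vertices) - len(edge_use_counts(faces)) + len(faces)


print(is_closed_manifold(CUBE), euler_characteristic(CUBE))  # True 2
open_box = {name: loop for name, loop in CUBE.items() if name != "z1"}
print(is_closed_manifold(open_box), euler_characteristic(open_box))  # False 1
```

A generator that emits faces and edges independently can easily violate these invariants, which is why B-rep generators spend so much architecture on topology.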

2.3 HoLa (HolaBRep, 2024)

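HoLa also generates B-reps, but represents them in a holistic latent space that unifies the continuous geometry of surfaces and curves with their discrete topology. It builds on the observation that each B-rep curve arises as the intersection of two surfaces, so edges and their adjacency (which faces share which edges, which edges share which vertices) can be recovered from a learned surface intersection instead of being generated as a separate level. The result is better topological coherence on complex models. Same image-input limitation as BrepGen.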
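The face/edge relations above have a simple geometric counterpart when faces are planar: the edge shared by two faces lies on the line where their planes meet. HoLa builds on this surface-intersection view of curves (as we read the paper); the sketch below only illustrates the underlying geometry for two planes n·x = h, and its function names are our own.

```python
from typing import Optional, Tuple

Vec = Tuple[float, float, float]


def cross(u: Vec, v: Vec) -> Vec:
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])


def dot(u: Vec, v: Vec) -> float:
    return u[0] * v[0] + u[1] * v[1] + u[2] * v[2]


def plane_intersection(n1: Vec, h1: float, n2: Vec, h2: float,
                       eps: float = 1e-12) -> Optional[Tuple[Vec, Vec]]:
    """Line shared by planes n1.x = h1 and n2.x = h2 as (point, direction).

    Returns None when the planes are (nearly) parallel and share no line.
    """
    d = cross(n1, n2)          # the line direction is normal to both normals
    dd = dot(d, d)
    if dd < eps:
        return None
    a, b = cross(n2, d), cross(d, n1)
    # p = (h1 (n2 x d) + h2 (d x n1)) / |d|^2 satisfies both plane equations.
    p = tuple((h1 * ai + h2 * bi) / dd for ai, bi in zip(a, b))
    return p, d


# Faces on z = 0 and x = 1 share the edge {x = 1, z = 0}, running along y.
print(plane_intersection((0, 0, 1), 0.0, (1, 0, 0), 1.0))  # ((1.0, 0.0, 0.0), (0, 1, 0))
```

For curved surfaces the intersection has no closed form in general, which is where a learned intersection becomes useful.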

2.4 SparC3D (2024)

SparC3D takes a single image as input and produces a sparse-voxel 3-D mesh as output. The image is encoded by a Vision Transformer; the resulting tokens are decoded by a sparse-cube transformer into a sparse set of active voxels near the surface, each carrying SDF and deformation values, and marching cubes extracts the final mesh. The output is a high-fidelity reconstruction but completely non-parametric: the user gets a mesh, not editable parameters. The paper's central technical contribution is the Sparcubes representation (deformable vertex offsets on top of a sparse grid).
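The sparse-active-voxel idea can be sketched without any learning: sample a signed distance function (SDF) at grid corners and keep only the cells whose corner values change sign, i.e. the cells the surface passes through. The sphere SDF, the grid bounds, and the function names are assumptions for illustration, and the learned per-vertex deformation offsets that Sparcubes adds are omitted. Running it at two resolutions also shows how the active set scales: surface-crossing cells grow roughly with the square of the grid resolution.

```python
import math
from typing import Callable, List, Tuple

Cell = Tuple[int, int, int]
SDF = Callable[[float, float, float], float]


def sphere_sdf(x: float, y: float, z: float, r: float = 0.6) -> float:
    # Hypothetical test shape: exact signed distance to a sphere of radius r.
    return math.sqrt(x * x + y * y + z * z) - r


def active_cells(sdf: SDF, n: int, lo: float = -1.0, hi: float = 1.0) -> List[Cell]:
    """Cells of an n^3 grid over [lo, hi]^3 whose corner SDF values change sign."""
    h = (hi - lo) / n
    # Evaluate the SDF once per grid corner: (n + 1)^3 samples.
    corner = [[[sdf(lo + i * h, lo + j * h, lo + k * h)
                for k in range(n + 1)] for j in range(n + 1)] for i in range(n + 1)]
    cells = []
    for i in range(n):
        for j in range(n):
            for k in range(n):
                vals = [corner[i + di][j + dj][k + dk]
                        for di in (0, 1) for dj in (0, 1) for dk in (0, 1)]
                if min(vals) < 0.0 < max(vals):  # the surface crosses this cell
                    cells.append((i, j, k))
    return cells


for n in (16, 32):
    cells = active_cells(sphere_sdf, n)
    print(n, len(cells), f"{len(cells) / n**3:.1%} of cells active")
```

Only a small fraction of cells is active, which is what makes sparse representations affordable at high resolution; storing per-corner offsets on these cells is where a deformable variant would add detail.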

2.5 TRELLIS (Microsoft Research, 2024)

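TRELLIS uses a structured-latent encoder with multiple output decoders. The structured latent (SLAT) is a sparse voxel grid whose active voxels each carry a local feature vector (aggregated during encoding from DINOv2 features of multi-view renders). Image-conditioned generation runs in two stages with rectified-flow transformers, first the sparse structure and then the per-voxel latents, and format-specific decoders map the same latent to 3-D Gaussians (for photo-realistic rendering), radiance fields, or meshes (for geometric fidelity). This multi-format output makes TRELLIS the closest competitor in scope to the thesis-line target (image input, multiple output representations), but every output is still non-parametric raw geometry.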
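The architectural point, one sparse latent read by several output-specific decoders, can be sketched as a data structure. The decoders here are hypothetical stand-ins (fixed rules, not learned networks), and the class, field, and function names are ours rather than TRELLIS's.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Tuple

Coord = Tuple[int, int, int]
Point = Tuple[float, float, float]


@dataclass
class StructuredLatent:
    """Sparse latent: active voxel coordinate -> local feature vector."""
    resolution: int
    features: Dict[Coord, List[float]]

    def centre(self, c: Coord) -> Point:
        s = 1.0 / self.resolution
        return ((c[0] + 0.5) * s, (c[1] + 0.5) * s, (c[2] + 0.5) * s)


def decode_geometry(latent: StructuredLatent) -> List[Point]:
    """Stand-in geometry decoder: one occupied point per active voxel."""
    return [latent.centre(c) for c in sorted(latent.features)]


def decode_gaussians(latent: StructuredLatent, per_voxel: int = 2) -> List[dict]:
    """Stand-in appearance decoder: Gaussians anchored at each active voxel.

    Opacity is a sigmoid of feature channel 0, a hypothetical mapping; a
    learned decoder would predict offsets, scales, colours, and opacities.
    """
    size = 1.0 / latent.resolution
    out = []
    for c, f in sorted(latent.features.items()):
        for _ in range(per_voxel):
            out.append({"mean": latent.centre(c), "scale": size / 2,
                        "opacity": 1.0 / (1.0 + math.exp(-f[0]))})
    return out


latent = StructuredLatent(4, {(0, 0, 0): [0.0, 1.0], (3, 3, 3): [2.0, 0.0]})
print(len(decode_geometry(latent)), len(decode_gaussians(latent)))  # 2 4
```

Note what neither decoder returns: a small set of named, editable parameters. Adding another raw-geometry decoder to a shared latent is cheap; a parametric head is the missing piece.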

2.6 The structural pattern

The pattern visible in Table 1 (three papers with parametric output, two with image input, none with both) is the survey's central finding. Section 3 argues that the split is structural rather than coincidental.

3. The Field Gap

Place the five papers in a 2-D capability plane (X = image-input capability, Y = parametric-output capability). The procedural-CAD cluster (CAPRI-Net, BrepGen, HoLa) sits at high Y, low X. The neural-reconstruction cluster (SparC3D, TRELLIS) sits at high X, low Y. The high-X, high-Y quadrant (image input and parametric output) is empty.
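The same classification can be written down as data, which makes the empty-quadrant claim mechanically checkable and easy to re-run as new papers are added. The capability flags below are transcribed from Table 1; the dictionary and function names are ours.

```python
from typing import Dict, List, Tuple

# (accepts image input, produces parametric output), transcribed from Table 1.
CAPABILITIES: Dict[str, Tuple[bool, bool]] = {
    "CAPRI-Net": (False, True),
    "BrepGen": (False, True),
    "HoLa": (False, True),
    "SparC3D": (True, False),
    "TRELLIS": (True, False),
}


def quadrants(caps: Dict[str, Tuple[bool, bool]]) -> Dict[Tuple[bool, bool], List[str]]:
    """Group papers by (image input?, parametric output?)."""
    grid: Dict[Tuple[bool, bool], List[str]] = {
        (x, y): [] for x in (False, True) for y in (False, True)}
    for paper, key in caps.items():
        grid[key].append(paper)
    return grid


grid = quadrants(CAPABILITIES)
print(grid[(False, True)])  # ['CAPRI-Net', 'BrepGen', 'HoLa']
print(grid[(True, False)])  # ['SparC3D', 'TRELLIS']
print(grid[(True, True)])   # []: the field gap
```

Binary flags flatten real differences (CAPRI-Net's point-cloud input is closer to an observation than BrepGen's latent code), but they are enough to state the gap precisely.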

The quadrant is not empty by accident. The procedural-CAD methods all take inputs that are already abstract or 3-D (latent codes, point clouds with extracted features), because going from image pixels to a parametric program requires a non-trivial encoder stage that no surveyed paper has shipped. The neural-reconstruction methods all produce raw geometry because no widely deployed output head maps image features to a parametric program. The gap is therefore a missing combination: an image encoder paired with a parametric output head.

This is the thesis-line opportunity. PGN [6] targets the gap with polyline input (a half-step from image input, since polylines can be extracted from images) and DSL output (parametric). SculptNet [7] targets it with image input and primitive-program output (parametric). Both follow directly from the gap identified here.

4. Local Inference Outcomes

Table 2 — Local-inference attempts.
Paper	Hardware tried	Outcome
CAPRI-Net	Intel iMac → GPU Jupyter notebook	Succeeded after dependency-pin fixes (numpy / scipy / torch versions)
BrepGen	Google Colab T4	Succeeded after CUDA-version compatibility fix (repository targets CUDA 11.x; Colab ships 12.x)
HoLa	Intel iMac CPU	Too slow on CPU; queued for RTX 3060 retry
SparC3D	Hugging Face Space (cloud)	Succeeded; used the HF demo rather than a local install
TRELLIS	None (local hardware insufficient)	Read-only: the official Microsoft Research release needs an NVIDIA GPU with at least 16 GB of memory (verified on A100 / A6000)

The pattern: both papers that ran on our own or rented hardware (CAPRI-Net and BrepGen) needed version fixes first, and the third local attempt (HoLa) did not complete on CPU. The defensive habit we recorded as a standing rule: always create a fresh conda environment from the paper's own environment YAML, never reuse an existing one. A second pattern: Hugging Face Spaces are an underused survey tool. For SparC3D, the hosted demo gave sufficient understanding without the weeks of engineering that a local install would likely have cost.
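Before running a reproduction, it helps to check the active environment against the paper's pins, so that version drift surfaces as a readable report rather than a crash deep inside a training script. A minimal sketch using only the standard library follows. The pin values are placeholders, not the versions any surveyed repository actually requires, and the comparison is an exact string match (no PEP 440 normalisation).

```python
from importlib import metadata
from typing import Callable, Dict, List, Optional


def installed_version(package: str) -> Optional[str]:
    """Version string of an installed distribution, or None if absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def pin_mismatches(pins: Dict[str, str],
                   lookup: Callable[[str], Optional[str]] = installed_version) -> List[str]:
    """Compare exact '==' pins (e.g. from a paper's environment file) against
    what is installed; return human-readable mismatch descriptions."""
    problems = []
    for package, wanted in pins.items():
        found = lookup(package)
        if found is None:
            problems.append(f"{package}: missing (want {wanted})")
        elif found != wanted:
            problems.append(f"{package}: have {found}, want {wanted}")
    return problems


def cuda_major_matches(required: str, available: str) -> bool:
    """True when the CUDA major versions agree, e.g. '11.x' vs '11.8'."""
    return required.split(".")[0] == available.split(".")[0]


# Usage with a fake lookup so the output is deterministic (placeholder pins).
fake_env = {"numpy": "1.26.4", "torch": "2.1.0"}.get
print(pin_mismatches({"numpy": "1.26.4", "scipy": "1.11.4", "torch": "1.13.1"}, fake_env))
# ['scipy: missing (want 1.11.4)', 'torch: have 2.1.0, want 1.13.1']
print(cuda_major_matches("11.x", "12.2"))  # False: the BrepGen-on-Colab situation
```

In real use, drop the `lookup` argument so `importlib.metadata` inspects the active environment. A non-empty report is the signal to build a fresh environment rather than patch the existing one.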

5. Limitations

Three limitations. (i) Small sample. Five papers is a representative selection, not an exhaustive one. The closest competitors not surveyed (InstantMesh, GS-LRM, Gamba) are covered in the G-Splats-vs-VDB follow-up survey [9]; all three produce non-parametric output, so they do not change the gap finding. (ii) Snapshot in time. The survey is dated October 2025; a paper bridging the two clusters could appear at any time, so the gap claim is timestamped. (iii) Maps bias. Framing the gap as an opportunity reflects the thesis line's specific use case (Houdini-integrable parametric output). For a use case where raw-geometry output is acceptable, the SparC3D / TRELLIS branch is already sufficient.

6. Conclusion

Five 3-D-generation methods were surveyed. The field bifurcates into an image-input, raw-geometry-output cluster and a latent- or point-cloud-input, parametric-output cluster; the image-input, parametric-output quadrant is empty. This quadrant is the thesis-line opportunity, operationalised by PGN and SculptNet.

References

[1] Yu, F. et al. "CAPRI-Net: Learning Compact CAD Shapes with Adaptive Primitive Assembly." CVPR, 2022.

[2] Xu, X. et al. "BrepGen: A B-rep Generative Diffusion Model with Structured Latent Geometry." ACM Transactions on Graphics (SIGGRAPH), 2024.

[3] Liu, Y. et al. "HoLa: B-Rep Generation using a Holistic Latent Representation." 2024.

[4] SparC3D authors. "SparC3D: Sparse-Cube 3-D Generation from a Single Image." 2024.

[5] Xiang, J. et al. (Microsoft Research). "Structured 3D Latents for Scalable and Versatile 3D Generation" (TRELLIS). 2024.

[6] Jain, A. "PGN: A Transformer-Based Procedural Generator Network." Thesis research, Sep 2025. /whitepaper/pgn

[7] Jain, A. "SculptNet: Coarse-to-Fine 3-D Reconstruction." Thesis research, Feb 2026. /whitepaper/sculptnet

[8] Jain, A. "Research Survey Documentation." /research/research-survey

[9] Jain, A. "Gaussian Splats vs VDB: Architecture Comparison." Thesis research, Nov 2025. /whitepaper/gsplats-vs-vdb