Architecture Survey · cs.GR · cs.LG · Dec 2025
Gaussian Splats vs VDB for Single-Image-to-3-D: An Architecture Survey Across Splatter Image, GS-LRM, Triplane-Meets-Gaussian, and Gamba, with a Procedural-Pipeline-Aware Decision
Aaditya Jain
3-D Representations · Architecture Survey · Thesis-Line Output-Format Decision
Submitted: December 2025
Subject: cs.GR · cs.LG
Keywords: Gaussian splatting, GS-LRM, Splatter Image, Gamba, VDB, FVDB, single-image-to-3-D, architecture survey, procedural-pipeline composability
Abstract
We survey four Gaussian-splat-based single-image-to-3-D methods (Splatter Image, GS-LRM, Triplane-Meets-Gaussian, and Gamba) and compare the Gaussian-splatting (G-Splat) output format against the VDB / FVDB sparse-voxel format that the Maps procedural-modelling thesis line uses elsewhere. We conclude that the triplane is the universal intermediate for the thesis line's generators [1], with G-Splat conversion as an optional preview-rendering path and VDB / mesh conversion as the standard procedural-pipeline path. Among the four G-Splat methods, Gamba, which runs Mamba over Gaussian-sequence tokens, is the closest architectural cousin to the thesis-line MambaFlow3D [2], and its existence in the published literature supports the architectural premise that a Mamba state-space block is a competitive substitute for transformer attention at SparC3D-class token counts. The contributions are the architecture-survey table comparing the four methods on architecture, input, and speed, the three-way comparison of G-Splat, VDB / FVDB, and triplane on render speed, storage, editability, and procedural-pipeline composability, and the documented decision rule that turns the comparison into the thesis-line architectural choice.
1. Introduction

By late 2025, a wave of feed-forward Gaussian-splat reconstruction methods had appeared in the literature, most of them published in 2024 [5, 6, 7, 8], each reporting sub-second inference and high photo-realism on a single GPU. The thesis-line question: should the MambaFlow3D-class generator [2] target Gaussian splats as its native output, or should it stay with the triplane / VDB representations already used by the rest of the thesis line?

This paper documents the survey of the four leading G-Splat methods and the comparison against VDB / FVDB that informs the answer.

2. Four G-Splat Methods Surveyed
Table 1 — Four leading single-image-to-3-D Gaussian-splat methods.
| Method | Architecture | Input | Speed | Thesis-line relevance |
|---|---|---|---|---|
| GS-LRM | Transformer (LRM-style) | 2–4 sparse views | 0.23 s on A100 | Highest-quality reference |
| Triplane-Meets-Gaussian | Dual decoder (point-cloud + triplane) | Single view | ~0.5 s | Bridges triplane and G-Splat ecosystems |
| Splatter Image | U-Net pixel → Gaussian | Single view | ~0.3 s | Simplest architecture; one Gaussian per pixel |
| Gamba | Mamba over Gaussian sequence | Single view | ~0.4 s | Direct architectural cousin of MambaFlow3D |
2.1 GS-LRM (Zhang et al., 2024)

Large Reconstruction Model architecture. Input: 2–4 sparse views of a scene with known camera poses. Architecture: a deep transformer (24+ layers, multi-head self-attention) operates over per-view image-token sequences with cross-view attention. Output: a fixed-cardinality set of 3-D Gaussian primitives (position, scale, rotation, colour, opacity per Gaussian) — typically 4 096 Gaussians per scene. Reported inference: 0.23 s on A100 GPU. Quality is the highest of the four surveyed but requires multiple input views, which is a stricter setup than the single-image use case the thesis line targets.

2.2 Triplane-Meets-Gaussian

Dual-decoder bridge architecture. Input: single image. Architecture: a shared encoder produces a structured latent; two parallel decoders generate (a) a triplane feature representation and (b) Gaussian-splat parameters. The dual output lets downstream consumers pick the format that fits their use case. Inference ~0.5 s on a consumer GPU. Of the four, it is the most directly compatible with the thesis-line representation choice because it produces a triplane natively, which makes it easy to integrate with the universal-intermediate decision [1].
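To make the triplane side of the dual output concrete, the following is a minimal sketch of a generic triplane lookup: a 3-D point is projected onto three axis-aligned feature planes, each plane is sampled bilinearly, and the three samples are summed. This is an illustration of the representation in general, not the Triplane-Meets-Gaussian implementation; the names `bilinear`, `query_triplane`, and `ramp_plane`, the [-1, 1] coordinate convention, and sum aggregation are all assumptions made for the example.

```python
from typing import List, Sequence

Plane = List[List[List[float]]]  # plane[row][col] is a feature vector


def bilinear(plane: Plane, u: float, v: float) -> List[float]:
    """Bilinearly interpolate a feature vector at (u, v), both in [-1, 1].

    u indexes columns and v indexes rows; the plane must be at least 2 x 2.
    """
    rows, cols = len(plane), len(plane[0])
    # Map [-1, 1] onto continuous texel coordinates [0, size - 1].
    x = (u + 1.0) * 0.5 * (cols - 1)
    y = (v + 1.0) * 0.5 * (rows - 1)
    # Clamp the lower corner so the 2 x 2 neighbourhood stays in bounds.
    x0 = min(max(int(x), 0), cols - 2)
    y0 = min(max(int(y), 0), rows - 2)
    tx, ty = x - x0, y - y0
    features = []
    for ch in range(len(plane[0][0])):
        top = plane[y0][x0][ch] * (1 - tx) + plane[y0][x0 + 1][ch] * tx
        bottom = plane[y0 + 1][x0][ch] * (1 - tx) + plane[y0 + 1][x0 + 1][ch] * tx
        features.append(top * (1 - ty) + bottom * ty)
    return features


def query_triplane(xy: Plane, xz: Plane, yz: Plane, point: Sequence[float]) -> List[float]:
    """Project a 3-D point onto the three planes and sum the sampled features."""
    x, y, z = point
    samples = (bilinear(xy, x, y), bilinear(xz, x, z), bilinear(yz, y, z))
    return [sum(channel) for channel in zip(*samples)]


def ramp_plane(size: int) -> Plane:
    """Hypothetical test plane whose single channel equals the column index."""
    return [[[float(col)] for col in range(size)] for _ in range(size)]


if __name__ == "__main__":
    plane = ramp_plane(3)
    print(query_triplane(plane, plane, plane, (0.0, 0.0, 0.0)))
```

In a real generator the summed feature would be passed to a small decoder that outputs density or colour; that decoder is omitted here because its form is not specified in this survey.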

2.3 Splatter Image (Szymanowicz et al., CVPR 2024)

Simplest of the four architecturally. Input: single image. Architecture: a U-Net predicts one Gaussian per input pixel, with a position offset from the pixel's ray, scale, rotation, colour, and opacity. The result for a 256 × 256 input: H × W = 65 536 (about 65 K) Gaussians per scene, parameterised pixel-by-pixel. Inference ~0.3 s. The "one Gaussian per pixel" parameterisation is elegant, but it spends redundant Gaussians on smooth regions and under-represents fine surface detail.
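The per-pixel parameterisation can be sketched as follows. The example computes the Gaussian count for an H × W image and places one Gaussian centre along a pixel's ray. It is a hedged illustration, not the Splatter Image code: the pinhole camera looking down +z, the `focal` parameter, the split of the position into a depth along the ray plus a 3-D offset, and all function names are assumptions introduced here.

```python
import math
from dataclasses import dataclass
from typing import Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class Gaussian:
    """Per-pixel Gaussian parameters as listed in the text."""
    position: Vec3
    scale: Vec3
    rotation: Tuple[float, float, float, float]  # quaternion (w, x, y, z)
    colour: Vec3
    opacity: float


def gaussian_count(height: int, width: int) -> int:
    """One Gaussian per input pixel."""
    return height * width


def pixel_ray(u: int, v: int, width: int, height: int, focal: float) -> Vec3:
    """Unit ray through the centre of pixel (u, v); assumed pinhole camera looking down +z."""
    dx = (u + 0.5 - width / 2.0) / focal
    dy = (v + 0.5 - height / 2.0) / focal
    norm = math.sqrt(dx * dx + dy * dy + 1.0)
    return (dx / norm, dy / norm, 1.0 / norm)


def place_gaussian(ray: Vec3, depth: float, offset: Vec3) -> Vec3:
    """Gaussian centre = depth along the pixel ray plus a predicted 3-D offset (assumed form)."""
    return tuple(depth * r + o for r, o in zip(ray, offset))


if __name__ == "__main__":
    print(gaussian_count(256, 256))
    centre_ray = pixel_ray(1, 1, width=3, height=3, focal=1.0)
    g = Gaussian(
        position=place_gaussian(centre_ray, depth=2.0, offset=(0.1, 0.0, 0.0)),
        scale=(0.01, 0.01, 0.01),
        rotation=(1.0, 0.0, 0.0, 0.0),
        colour=(0.5, 0.5, 0.5),
        opacity=0.9,
    )
    print(g.position)
```

Because every pixel emits a Gaussian regardless of image content, a flat background region produces as many primitives as a detailed one, which is the redundancy noted above.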

2.4 Gamba (Shen et al., 2024) — the thesis-line cousin

Gamba is the most thesis-line-relevant of the four. Architecture: substitute a Mamba state-space block for transformer attention over the Gaussian-token sequence. Specifically, the image is encoded to a sequence of tokens; the tokens are processed by stacked Mamba blocks (linear time in sequence length, constant per-token-update memory) rather than transformer attention (quadratic in sequence length). The output is a Gaussian-primitive sequence in the same shape as GS-LRM's output. Inference ~0.4 s.
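The linear-time, fixed-memory behaviour described above can be illustrated with a minimal state-space recurrence. The sketch below is a time-invariant diagonal SSM with a scalar input per token; it is deliberately simpler than Mamba's selective scan, whose parameters depend on the input, and it is not Gamba's code. The function name `ssm_scan` and the parameter names `a`, `b`, `c` are assumptions for the example.

```python
from typing import List, Sequence


def ssm_scan(xs: Sequence[float], a: Sequence[float], b: Sequence[float],
             c: Sequence[float]) -> List[float]:
    """Run h_t = a * h_{t-1} + b * x_t (elementwise) and y_t = sum(c * h_t).

    One pass over the sequence: time grows linearly with len(xs), while the
    carried state h keeps a fixed size of len(a) however long the sequence is.
    """
    h = [0.0] * len(a)
    ys = []
    for x in xs:
        # Update the fixed-size hidden state from the current token only.
        h = [a_i * h_i + b_i * x for a_i, h_i, b_i in zip(a, h, b)]
        # Read out one scalar per token.
        ys.append(sum(c_i * h_i for c_i, h_i in zip(c, h)))
    return ys


if __name__ == "__main__":
    print(ssm_scan([1.0, 1.0, 1.0], a=[0.5], b=[1.0], c=[1.0]))
```

Each token touches only the carried state, so no token-to-token score matrix is ever formed; that matrix is what gives attention its quadratic cost in sequence length.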

Gamba's architectural choice — Mamba over transformer for 3-D-generation token-sequence processing — is the same architectural choice MambaFlow3D [2] makes for SparC3D-class sparse-cube tokenisation. The two differ in the output target (Gamba outputs Gaussians; MambaFlow3D outputs sparse cubes) but share the backbone. Gamba's existence as a published paper is the strongest external validation of the Mamba-substitution premise that MambaFlow3D builds on; the MNIST validation in [3] provides internal empirical support.

3. G-Splats vs VDB vs Triplane — The Comparison
Table 2 — Three-way comparison on the dimensions that matter for the thesis-line use case.
| Property | Gaussian Splats | VDB / FVDB | Triplane (chosen) |
|---|---|---|---|
| Render speed (256² image) | 5–15 ms | ~100–300 ms | 30–80 ms |
| Storage (typical scene) | 50–500 MB | 10–80 MB | 6–12 MB |
| Editability | Move Gaussians directly | Houdini-native (best for procedural) | Edit 2-D feature planes |
| Procedural-pipeline integration | Splat-to-mesh required (lossy) | Native | Triplane → marching cubes → mesh → Houdini |
| Photo-realism | Highest (matches NeRF) | Mid | Mid–high |
| Single-image-to-3-D leaders | GS-LRM, Splatter Image, Gamba | Uncommon (heavy compute) | EG3D, InstantMesh, TRELLIS |
4. The Decision

The decision rule, set ex ante: pick the representation with the smallest storage, the best editability, and the cleanest convertibility to both alternatives. Storage and editability favour triplane. Convertibility: triplane → mesh extraction → Houdini (the procedural-pipeline path) is one path; triplane → density-field volume rendering → G-Splat-class preview is a second path, and it does not require Gaussian-from-image inference. Conversely, G-Splat → mesh is lossy (Gaussians do not have an explicit surface), and VDB → triplane is approximately a downsample-with-decoder operation but adds a stage.
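One way to make the rule mechanical is a rank-sum over the three criteria, sketched below. The storage ranges are taken from Table 2. The editability and convertibility ranks are hypothetical ordinal values chosen for illustration: only triplane's first place on each follows from the text, and the relative order of the other two is an assumption. The equal weighting of the three criteria is also an assumption, since the rule as stated does not specify weights.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Candidate:
    name: str
    storage_mb: Tuple[float, float]  # (low, high) range from Table 2
    editability_rank: int            # 1 = best; hypothetical ordinal rank
    convertibility_rank: int         # 1 = best; hypothetical ordinal rank


def storage_midpoint(candidate: Candidate) -> float:
    low, high = candidate.storage_mb
    return (low + high) / 2.0


def choose(candidates: List[Candidate]) -> Candidate:
    """Pick the candidate with the lowest equal-weight rank sum (assumed weighting)."""
    by_storage = sorted(candidates, key=storage_midpoint)
    storage_rank = {c.name: i + 1 for i, c in enumerate(by_storage)}
    return min(
        candidates,
        key=lambda c: storage_rank[c.name] + c.editability_rank + c.convertibility_rank,
    )


CANDIDATES = [
    Candidate("G-Splat", (50.0, 500.0), editability_rank=3, convertibility_rank=3),
    Candidate("VDB / FVDB", (10.0, 80.0), editability_rank=2, convertibility_rank=2),
    Candidate("Triplane", (6.0, 12.0), editability_rank=1, convertibility_rank=1),
]

if __name__ == "__main__":
    print(choose(CANDIDATES).name)
```

With these inputs the rank sum selects the triplane. A different weighting, for example one that prioritises photo-realism or Houdini-native editing, could change the outcome, which is why the rule was fixed before the comparison was run.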

Triplane wins as the universal intermediate. G-Splat is retained as an optional preview-rendering target for use cases where photo-realism matters. VDB is retained as the procedural-pipeline interchange format (Houdini-native).

5. Gamba and the MambaFlow3D Premise

The thesis-line MambaFlow3D [2] proposes substituting a Pure-Mamba state-space backbone for transformer attention over SparC3D-class sparse-cube tokens. The MNIST validation in [3] provides empirical support at 196 tokens. Gamba — published before this thesis-line work — provides published support for the same architectural substitution applied to Gaussian-sequence tokens. The substitution is therefore not a thesis-line novelty in isolation; what MambaFlow3D contributes is the application to sparse-cube tokens (rather than Gaussian-sequence tokens) and the consumer-GPU speed-up budget targeted at the Maps procedural use case.
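The gap between the 196-token validation and larger token counts can be put in rough numbers. The sketch below counts token-pair interactions for full self-attention against sequential steps for a linear scan. It ignores hidden dimension, constant factors, and hardware effects, so it indicates scaling only, not wall-clock time. The sequence lengths 4 096 and 65 536 are illustrative values borrowed from the Gaussian counts in Section 2, not measured SparC3D token counts, and the function names are assumptions.

```python
def attention_pair_count(n_tokens: int) -> int:
    """Token pairs scored by full self-attention: quadratic in sequence length."""
    return n_tokens * n_tokens


def scan_step_count(n_tokens: int) -> int:
    """State updates performed by a linear recurrence: one per token."""
    return n_tokens


def pair_to_step_ratio(n_tokens: int) -> int:
    """How many attention pairs correspond to each scan step."""
    return attention_pair_count(n_tokens) // scan_step_count(n_tokens)


if __name__ == "__main__":
    for n in (196, 4096, 65536):
        print(n, attention_pair_count(n), scan_step_count(n), pair_to_step_ratio(n))
```

The ratio equals the sequence length itself, so a result obtained at 196 tokens says little about the regime where the substitution is expected to pay off.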

6. Conclusion

Triplane is the universal intermediate for the thesis-line single-image-to-3-D generators. G-Splat is optional preview rendering; VDB is procedural-pipeline integration. Gamba validates the MambaFlow3D Mamba-substitution premise for the architectural class but not for the specific sparse-cube tokenisation MambaFlow3D targets.

References
[1] Jain, A. "Triplane Mechanics Deep-Dive." Thesis research, Jan 2026. /whitepaper/triplane-deep-dive
[2] Jain, A. "MambaFlow3D: Spec, Speed-up Budget, and ModelNet10 Phase-2." Thesis research, Nov 2025. /whitepaper/mambaflow3d
[3] Jain, A. "MNIST Flow-Matching Backbone Validation." Thesis research, Nov 2025. /whitepaper/mnist-flow-validation
[4] Kerbl, B. et al. "3D Gaussian Splatting for Real-Time Radiance Field Rendering." SIGGRAPH, 2023.
[5] Szymanowicz, S. et al. "Splatter Image." CVPR, 2024.
[6] Zhang, K. et al. "GS-LRM: Large Reconstruction Model for 3-D Gaussian Splatting." 2024.
[7] Triplane-Meets-Gaussian authors. "Triplane Meets Gaussian Splatting." 2024.
[8] Shen, Q. et al. "Gamba: Marry Gaussian Splatting with Mamba for Single-View 3D Reconstruction." 2024.
[9] NVIDIA. "FVDB Documentation." 2025.