Gaussian Splats vs VDB for Single-Image-to-3-D: An Architecture Survey Across Splatter Image, GS-LRM, Triplane-Meets-Gaussian, and Gamba, with a Procedural-Pipeline-Aware Decision

Aaditya Jain

ad_jain@icloud.com · orcid.org/0009-0005-5534-5641

3-D Representations · Architecture Survey · Thesis-Line Output-Format Decision

Submitted: December 2025
Subject: cs.GR · cs.LG
Keywords: Gaussian splatting, GS-LRM, Splatter Image, Gamba, VDB, FVDB, single-image-to-3-D, architecture survey, procedural-pipeline composability

Abstract

We survey four Gaussian-splat-based single-image-to-3-D methods (Splatter Image, GS-LRM, Triplane-Meets-Gaussian, and Gamba) and compare the Gaussian-splatting (G-Splat) output format against the VDB / FVDB sparse-voxel format that the Maps procedural-modelling thesis line uses elsewhere. The comparison concludes that triplane is the universal intermediate for the thesis line's generators [1], with G-Splat conversion as an optional preview-rendering path and VDB / mesh conversion as the standard procedural-pipeline path. Among the four G-Splat methods, Gamba (Mamba over Gaussian-sequence tokens) is the closest architectural cousin to the thesis-line MambaFlow3D [2], and its existence in the published literature validates the architectural premise: a Mamba state-space block as a competitive substitute for transformer attention at SparC3D-class token counts. The contribution is the architecture-survey table comparing the four methods on speed, storage, output format, and procedural-pipeline composability, plus the documented decision rule that turns the comparison into the thesis-line architectural choice.

1. Introduction

2024 saw a wave of feed-forward image-to-3-D Gaussian-splat methods land at top venues, each demonstrating sub-second inference and high photo-realism on a single GPU. The thesis-line question: should the MambaFlow3D-class generator [2] target Gaussian splats as its native output, or should it stick with the triplane / VDB representations already used by the rest of the thesis line?

This paper documents the survey of the four leading G-Splat methods and the comparison against VDB / FVDB that informs the answer.

2. Four G-Splat Methods Surveyed

Table 1: Four leading feed-forward image-to-3-D Gaussian-splat methods. GS-LRM takes 2–4 posed views; the other three take a single view.
Method	Architecture	Input	Reported inference time	Thesis-line relevance
GS-LRM	Transformer (LRM-style)	2–4 sparse views	0.23 s on A100	Highest-quality reference
Triplane-Meets-Gaussian	Dual decoder (point-cloud + triplane)	Single view	~0.5 s	Bridges triplane and G-Splat ecosystems
Splatter Image	U-Net pixel → Gaussian	Single view	~0.3 s	Simplest architecture; one Gaussian per pixel
Gamba	Mamba over Gaussian sequence	Single view	~0.4 s	Direct architectural cousin of MambaFlow3D

2.1 GS-LRM (Zhang et al., 2024)

Large Reconstruction Model architecture. Input: 2–4 sparse views of a scene with known camera poses. Architecture: a deep transformer (24+ layers, multi-head self-attention) operates over per-view image-token sequences with cross-view attention. Output: a fixed-cardinality set of 3-D Gaussian primitives, each with position, scale, rotation, colour, and opacity; typically 4 096 Gaussians per scene. Reported inference: 0.23 s on an A100 GPU. Reconstruction quality is the highest of the four surveyed, but the method requires multiple posed input views, a stricter setup than the single-image use case the thesis line targets.
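As a back-of-the-envelope check on what a fixed-cardinality output costs, the sketch below counts the floats in one Gaussian and multiplies by the set size. The field widths (3-float position, 3-float scale, 4-float quaternion rotation, 3-float RGB colour, 1-float opacity) and float32 storage are assumptions made for illustration, not GS-LRM's documented layout; `GAUSSIAN_FIELDS` and `gaussian_set_bytes` are hypothetical names.

```python
# Assumed per-Gaussian layout (illustrative only; not taken from the GS-LRM paper).
GAUSSIAN_FIELDS = {
    "position": 3,  # x, y, z
    "scale": 3,     # per-axis extent
    "rotation": 4,  # unit quaternion (assumption)
    "colour": 3,    # plain RGB, no spherical harmonics (assumption)
    "opacity": 1,
}


def gaussian_set_bytes(count: int, bytes_per_float: int = 4) -> int:
    """Raw size of `count` Gaussians stored as tightly packed floats."""
    floats_per_gaussian = sum(GAUSSIAN_FIELDS.values())
    return count * floats_per_gaussian * bytes_per_float


if __name__ == "__main__":
    size = gaussian_set_bytes(4096)
    print(f"{size} bytes = {size / 2**20:.2f} MiB")
```

Under these assumptions the size grows linearly with the Gaussian count, so a per-pixel scheme with 65 536 Gaussians would need 16 times the storage of a 4 096-Gaussian set.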

2.2 Triplane-Meets-Gaussian (Zou et al., CVPR 2024)

Dual-decoder bridge architecture. Input: single image. Architecture: a shared encoder produces a structured latent; two parallel decoders generate (a) a triplane feature representation and (b) Gaussian-splat parameters. The dual output lets downstream consumers pick the format that fits their use case. Inference: ~0.5 s on a consumer GPU. Of the four, this is the easiest to integrate with the thesis line, because its triplane output is native and matches the universal-intermediate decision [1].
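For readers unfamiliar with the triplane side of that dual output, the following is a minimal sketch of how a triplane is usually queried: a 3-D point is projected onto three axis-aligned feature planes and the sampled features are combined. It is not the Triplane-Meets-Gaussian implementation. Scalar features, nearest-neighbour sampling, summation as the combine step, and the names `to_index`, `sample`, and `triplane_feature` are all simplifying assumptions.

```python
from typing import List

# A res x res grid of scalar features; real triplanes store feature vectors.
Plane = List[List[float]]


def to_index(coord: float, res: int) -> int:
    """Map a coordinate in [-1, 1] to the nearest grid index in [0, res - 1]."""
    t = (coord + 1.0) / 2.0
    return min(res - 1, max(0, round(t * (res - 1))))


def sample(plane: Plane, a: float, b: float) -> float:
    """Nearest-neighbour lookup (bilinear interpolation is the usual choice)."""
    res = len(plane)
    return plane[to_index(a, res)][to_index(b, res)]


def triplane_feature(xy: Plane, xz: Plane, yz: Plane,
                     x: float, y: float, z: float) -> float:
    """Project (x, y, z) onto the three planes and sum the sampled features."""
    return sample(xy, x, y) + sample(xz, x, z) + sample(yz, y, z)


if __name__ == "__main__":
    xy = [[1.0, 2.0], [3.0, 4.0]]
    xz = [[10.0, 20.0], [30.0, 40.0]]
    yz = [[100.0, 200.0], [300.0, 400.0]]
    print(triplane_feature(xy, xz, yz, -1.0, 1.0, 1.0))
```

In a full pipeline the combined feature would be passed to a small decoder that outputs density or colour; that decoder is omitted here.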

2.3 Splatter Image (Szymanowicz et al., CVPR 2024)

Architecturally the simplest of the four. Input: single image. Architecture: a U-Net predicts one Gaussian per input pixel: a position offset from the pixel's ray, plus scale, rotation, colour, and opacity. The result is H × W = 256 × 256 = 65 536 (about 65 K) Gaussians per scene, parameterised pixel-by-pixel. Inference: ~0.3 s. The "one Gaussian per pixel" parameterisation is elegant but spends redundant Gaussians on smooth regions and under-represents fine surface detail.
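The per-pixel parameterisation can be sketched as follows: each pixel contributes one Gaussian whose centre lies along that pixel's camera ray at a predicted depth, shifted by a predicted 3-D offset. The pinhole camera at the origin looking down +z, the `focal` parameter, and the constant depth and offset maps standing in for U-Net outputs are assumptions for illustration; this is not the Splatter Image code.

```python
from typing import List, Tuple

Vec3 = Tuple[float, float, float]


def pixel_ray(u: int, v: int, width: int, height: int, focal: float) -> Vec3:
    """Unnormalised camera-space ray through the centre of pixel (u, v), z = 1."""
    x = (u + 0.5 - width / 2.0) / focal
    y = (v + 0.5 - height / 2.0) / focal
    return (x, y, 1.0)


def gaussian_centre(ray: Vec3, depth: float, offset: Vec3) -> Vec3:
    """Centre = depth * ray + offset (depth and offset would be network outputs)."""
    return tuple(depth * r + o for r, o in zip(ray, offset))


def splat_centres(width: int, height: int, focal: float,
                  depth_map: List[List[float]],
                  offset_map: List[List[Vec3]]) -> List[Vec3]:
    """One Gaussian centre per pixel, in row-major order."""
    centres = []
    for v in range(height):
        for u in range(width):
            ray = pixel_ray(u, v, width, height, focal)
            centres.append(gaussian_centre(ray, depth_map[v][u], offset_map[v][u]))
    return centres


if __name__ == "__main__":
    W = H = 4
    depth = [[2.0] * W for _ in range(H)]                 # placeholder depths
    offset = [[(0.0, 0.0, 0.0)] * W for _ in range(H)]    # placeholder offsets
    centres = splat_centres(W, H, 4.0, depth, offset)
    print(len(centres), "Gaussians for a", W, "x", H, "image")
```

Because the Gaussian count is tied to the pixel grid, a flat wall and a detailed surface covering the same number of pixels receive the same number of Gaussians, which is the redundancy noted above.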

2.4 Gamba (Shen et al., 2024) — the thesis-line cousin

Gamba is the most thesis-line-relevant of the four. Architecture: Gamba substitutes Mamba state-space blocks for transformer attention over the Gaussian-token sequence. Specifically, the image is encoded to a sequence of tokens, and the tokens are processed by stacked Mamba blocks (linear time in sequence length, constant memory per token update) rather than by transformer attention (quadratic in sequence length). The output is a sequence of Gaussian primitives with the same per-Gaussian fields as GS-LRM's output. Inference: ~0.4 s.
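The scaling argument can be made concrete with a toy recurrence. The sketch below runs a scalar linear state-space scan, which touches each token once and carries a single state value, and compares its step count with the number of query-key scores full self-attention computes. This is a deliberately simplified, non-selective recurrence, far simpler than a Mamba block and unrelated to Gamba's code; the function names and the coefficients `a`, `b`, `c` are illustrative assumptions.

```python
from typing import List


def ssm_scan(xs: List[float], a: float, b: float, c: float) -> List[float]:
    """Linear recurrence h_t = a * h_{t-1} + b * x_t, y_t = c * h_t.

    One pass over the sequence; only the scalar state h is carried forward.
    """
    h = 0.0
    ys = []
    for x in xs:
        h = a * h + b * x
        ys.append(c * h)
    return ys


def scan_step_count(n: int) -> int:
    """State updates needed by the scan for n tokens."""
    return n


def attention_pair_count(n: int) -> int:
    """Query-key scores computed by full self-attention over n tokens."""
    return n * n


if __name__ == "__main__":
    print(ssm_scan([1.0, 0.0, 0.0], a=0.5, b=1.0, c=2.0))
    # Token counts taken from figures quoted elsewhere in this paper.
    for n in (196, 4096, 65536):
        print(n, scan_step_count(n), attention_pair_count(n))
```

At 196 tokens the gap is small in absolute terms (196 steps against 38 416 scores); it widens quadratically as the token count grows.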

Gamba's architectural choice — Mamba over transformer for 3-D-generation token-sequence processing — is the same architectural choice MambaFlow3D [2] makes for SparC3D-class sparse-cube tokenisation. The two differ in the output target (Gamba outputs Gaussians; MambaFlow3D outputs sparse cubes) but share the backbone. Gamba's existence as a published paper is the strongest external validation of the Mamba-substitution premise that MambaFlow3D builds on; the MNIST validation in [3] provides internal empirical support.

3. G-Splats vs VDB vs Triplane — The Comparison

Table 2 — Three-way comparison on the dimensions that matter for the thesis-line use case.
Property	Gaussian Splats	VDB / FVDB	Triplane (chosen)
Render speed (256² image)	5–15 ms	~100–300 ms	30–80 ms
Storage (typical scene)	50–500 MB	10–80 MB	6–12 MB
Editability	Move Gaussians directly	Houdini-native (best for procedural)	Edit 2-D feature planes
Procedural-pipeline integration	Splat-to-mesh required (lossy)	Native	Triplane → marching cubes → mesh → Houdini
Photo-realism	Highest (matches NeRF)	Mid	Mid–high
Single-image-to-3-D leaders	GS-LRM, Splatter Image, Gamba	Uncommon (heavy compute)	EG3D, InstantMesh, TRELLIS

4. The Decision

The decision rule, set ex ante: pick the representation with the smallest storage, the best editability, and the cleanest convertibility to both alternatives. Storage and editability favour triplane. On convertibility, triplane → mesh extraction → Houdini (the procedural-pipeline path) is one route, and triplane → density-field volume rendering → G-Splat-class preview is a second route that does not require Gaussian-from-image inference. Conversely, G-Splat → mesh is lossy because Gaussians do not have an explicit surface, and VDB → triplane is approximately a downsample-with-decoder operation but adds a stage.

Triplane wins as the universal intermediate. G-Splat is retained as an optional preview-rendering target for use cases where photo-realism matters. VDB is retained as the procedural-pipeline interchange format (Houdini-native).
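The storage criterion of the decision rule can be checked mechanically against Table 2. The sketch below transcribes the render-time and storage ranges from that table and picks the representation with the smallest range midpoint for a given metric. Using the midpoint as the summary statistic is an assumption, and `TABLE_2`, `midpoint`, and `best_by` are hypothetical names. Editability and convertibility are qualitative in Table 2 and are not scored here.

```python
# Ranges transcribed from Table 2 as (low, high).
TABLE_2 = {
    "Gaussian Splats": {"render_ms": (5, 15), "storage_mb": (50, 500)},
    "VDB / FVDB": {"render_ms": (100, 300), "storage_mb": (10, 80)},
    "Triplane": {"render_ms": (30, 80), "storage_mb": (6, 12)},
}


def midpoint(bounds):
    """Midpoint of a (low, high) range; an assumed summary statistic."""
    low, high = bounds
    return (low + high) / 2


def best_by(metric: str) -> str:
    """Representation with the smallest range midpoint for `metric`."""
    return min(TABLE_2, key=lambda name: midpoint(TABLE_2[name][metric]))


if __name__ == "__main__":
    print("smallest storage:", best_by("storage_mb"))
    print("fastest render:  ", best_by("render_ms"))
```

Storage selects triplane while render time selects Gaussian splats, which is consistent with keeping G-Splat as a preview target rather than as the intermediate: render speed is not one of the rule's criteria.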

5. Gamba and the MambaFlow3D Premise

The thesis-line MambaFlow3D [2] proposes substituting a Pure-Mamba state-space backbone for transformer attention over SparC3D-class sparse-cube tokens. The MNIST validation in [3] provides empirical support at 196 tokens. Gamba — published before this thesis-line work — provides published support for the same architectural substitution applied to Gaussian-sequence tokens. The substitution is therefore not a thesis-line novelty in isolation; what MambaFlow3D contributes is the application to sparse-cube tokens (rather than Gaussian-sequence tokens) and the consumer-GPU speed-up budget targeted at the Maps procedural use case.

6. Conclusion

Triplane is the universal intermediate for the thesis-line single-image-to-3-D generators. G-Splat is optional preview rendering; VDB is procedural-pipeline integration. Gamba validates the MambaFlow3D Mamba-substitution premise for the architectural class but not for the specific sparse-cube tokenisation MambaFlow3D targets.

References

[1] Jain, A. "Triplane Mechanics Deep-Dive." Thesis research, Jan 2026. /whitepaper/triplane-deep-dive

[2] Jain, A. "MambaFlow3D: Spec, Speed-up Budget, and ModelNet10 Phase-2." Thesis research, Nov 2025. /whitepaper/mambaflow3d

[3] Jain, A. "MNIST Flow-Matching Backbone Validation." Thesis research, Nov 2025. /whitepaper/mnist-flow-validation

[4] Kerbl, B. et al. "3D Gaussian Splatting for Real-Time Radiance Field Rendering." SIGGRAPH, 2023.

[5] Szymanowicz, S. et al. "Splatter Image: Ultra-Fast Single-View 3D Reconstruction." CVPR, 2024.

[6] Zhang, K. et al. "GS-LRM: Large Reconstruction Model for 3D Gaussian Splatting." ECCV, 2024.

[7] Zou, Z.-X. et al. "Triplane Meets Gaussian Splatting: Fast and Generalizable Single-View 3D Reconstruction with Transformers." CVPR, 2024.

[8] Shen, Q. et al. "Gamba: Marry Gaussian Splatting with Mamba for Single-View 3D Reconstruction." 2024.

[9] NVIDIA. "FVDB Documentation." 2025.