Survey of single-image-to-3-D approaches, comparing Gaussian splatting (Splatter Image, GS-LRM, Triplane-Meets-Gaussian, Gamba) against VDB / FVDB sparse-voxel approaches. The thesis-line question: which representation to pick as the output of the MambaFlow3D-class generator. The decision: triplane as the universal intermediate, with VDB used for procedural-pipeline integration and G-Splats as an optional fast-preview rendering target.
In late 2025, a wave of single-image-to-3-D Gaussian-splat methods landed at top venues: GS-LRM (sparse-view feed-forward, 0.23 s on A100), Triplane-Meets-Gaussian (dual decoder, point-cloud + triplane), Splatter Image (U-Net pixel-to-Gaussian), and Gamba (Mamba over Gaussian sequences). The question for the thesis line was whether MambaFlow3D should target G-Splats as its output or stick with the VDB / triplane representations already used elsewhere in the thesis line.
The decision: triplane (and its hexplane / six-plane generalisations) is the universal intermediate. G-Splats and VDB are both downstream consumers that a triplane can be converted to. The reasoning is in §02.
| Property | Gaussian Splats | VDB / FVDB | Triplane (chosen) |
|---|---|---|---|
| Render speed (256² image) | 5–15 ms | ~100–300 ms (volume render) | 30–80 ms (volume render) |
| Storage | 50–500 MB (per scene) | 10–80 MB (per scene) | 6–12 MB (256² × 32 ch) |
| Editability | Move Gaussians directly | Houdini / Open3D native | Edit 2-D feature planes |
| Procedural pipeline (Houdini) | Awkward — splat-to-mesh conversion needed | Native — VDB is Houdini's format | Triplane → marching cubes → mesh → Houdini |
| Differentiability for training | Yes — standard practice | Yes — FVDB | Yes — bilinear lookup is differentiable |
| Single-image-to-3-D leaders | GS-LRM, Splatter Image, Gamba | Less common (heavier compute) | EG3D, InstantMesh, TRELLIS |
| Photo-realism | Highest (matches NeRF) | Mid (depends on shader) | Mid–high (depends on triplane resolution) |
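
Two rows of the table can be made concrete with a small sketch: the triplane storage figure and the bilinear lookup. The code below assumes three square planes (XY, XZ, YZ), coordinates normalised to [0, 1], and summation as the way the three plane features are combined; the function names are illustrative and none of these choices is stated in the survey itself. Under those assumptions, 256² × 32 channels comes to 6 MiB at 8 bits per value and 12 MiB at 16 bits, which is one reading that is consistent with the 6–12 MB entry.

```python
import numpy as np

def triplane_bytes(resolution, channels, bytes_per_value):
    """Storage for three square feature planes (XY, XZ, YZ)."""
    return 3 * resolution * resolution * channels * bytes_per_value

def bilinear(plane, u, v):
    """Bilinearly interpolate plane[H, W, C] at continuous (u, v) in [0, 1]."""
    h, w, _ = plane.shape
    x = u * (w - 1)
    y = v * (h - 1)
    x0 = int(np.floor(x))
    y0 = int(np.floor(y))
    x1 = min(x0 + 1, w - 1)  # clamp at the right / bottom border
    y1 = min(y0 + 1, h - 1)
    fx = x - x0
    fy = y - y0
    top = (1 - fx) * plane[y0, x0] + fx * plane[y0, x1]
    bot = (1 - fx) * plane[y1, x0] + fx * plane[y1, x1]
    return (1 - fy) * top + fy * bot

def query_triplane(planes, point):
    """Feature for a 3-D point in [0, 1]^3: sum of the three plane lookups.

    Summation is an assumption of this sketch; concatenation is another option.
    """
    x, y, z = point
    return (bilinear(planes[0], x, y)
            + bilinear(planes[1], x, z)
            + bilinear(planes[2], y, z))

# Storage under the stated assumptions (8-bit and 16-bit values).
size_8bit = triplane_bytes(256, 32, 1)
size_16bit = triplane_bytes(256, 32, 2)
```

The output of `bilinear` is a weighted sum of four plane entries with weights that vary smoothly in `u` and `v`, so gradients flow both to the plane values and to the query coordinates. That is the property the differentiability row relies on.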
Triplane as universal intermediate.
G-Splats for preview. VDB for Houdini.
Triplane wins as the universal generator output because: (i) it is the smallest of the three at 6–12 MB, against 10–80 MB for VDB and 50–500 MB for G-Splats; (ii) it is the most edit-friendly via 2-D feature-plane editing, which composes with the procedural-modelling thesis line; (iii) it converts cleanly to either G-Splats (for fast preview rendering) or VDB / mesh (for Houdini procedural integration). Taken on their own, G-Splats win at raw photo-realistic render speed and VDB wins at Houdini composability. A triplane reaches both through conversion, and the conversion cost is paid once per generated scene rather than per frame.
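The one-time conversion toward VDB or mesh can be sketched as sampling the triplane on a dense grid. This is a minimal illustration under stated assumptions: the planes are combined by summation, coordinates are normalised to [0, 1], and `toy_decode` is a hypothetical stand-in for whatever learned decoder maps features to density. The survey does not specify any of these.

```python
import numpy as np

def sample_plane(plane, u, v):
    """Vectorised bilinear lookup: plane[H, W, C], u and v arrays in [0, 1]."""
    h, w, _ = plane.shape
    x = u * (w - 1)
    y = v * (h - 1)
    x0 = np.clip(np.floor(x).astype(int), 0, w - 1)
    y0 = np.clip(np.floor(y).astype(int), 0, h - 1)
    x1 = np.minimum(x0 + 1, w - 1)
    y1 = np.minimum(y0 + 1, h - 1)
    fx = (x - x0)[..., None]  # trailing axis broadcasts over channels
    fy = (y - y0)[..., None]
    top = (1 - fx) * plane[y0, x0] + fx * plane[y0, x1]
    bot = (1 - fx) * plane[y1, x0] + fx * plane[y1, x1]
    return (1 - fy) * top + fy * bot

def triplane_to_density_grid(planes, grid_res, decode):
    """Evaluate the triplane once on a grid_res^3 lattice and decode to density."""
    t = np.linspace(0.0, 1.0, grid_res)
    x, y, z = np.meshgrid(t, t, t, indexing="ij")
    feats = (sample_plane(planes[0], x, y)
             + sample_plane(planes[1], x, z)
             + sample_plane(planes[2], y, z))
    return decode(feats)

def toy_decode(feats):
    """Hypothetical decoder: treat channel 0 as density."""
    return feats[..., 0]

# Example: random planes at 64^2 x 8 channels, converted to a 32^3 grid.
rng = np.random.default_rng(0)
planes = rng.standard_normal((3, 64, 64, 8))
grid = triplane_to_density_grid(planes, 32, toy_decode)
```

The resulting dense array is the kind of input a marching-cubes implementation or a VDB writer would consume. It is computed once per generated scene, after which rendering or Houdini work proceeds on the converted asset.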
| Method | Architecture | Input | Speed | Notes |
|---|---|---|---|---|
| GS-LRM | Transformer (LRM-style) | 2–4 sparse views | 0.23 s on A100 | Highest quality at low view count |
| Triplane-Meets-Gaussian | Dual decoder (point-cloud + triplane) | Single view | ~0.5 s | Bridges triplane & G-Splat ecosystems |
| Splatter Image | U-Net pixel → Gaussian | Single view | ~0.3 s | One Gaussian per pixel; simple |
| Gamba | Mamba over Gaussian sequence | Single view | ~0.4 s | The direct inspiration for the MambaFlow3D backbone choice (Topic 26) |
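
The "one Gaussian per pixel" note in the Splatter Image row can be illustrated geometrically. The sketch below assumes a pinhole camera and a per-pixel depth map, and only computes the Gaussian centres; other per-pixel attributes such as opacity, shape and colour are omitted. The function name and intrinsics are hypothetical, and this is not the Splatter Image implementation.

```python
import numpy as np

def unproject_depth_to_means(depth, fx, fy, cx, cy):
    """Lift every pixel of depth[H, W] to a 3-D point in camera space.

    Returns an array of shape (H * W, 3): one Gaussian centre per pixel,
    in row-major pixel order.
    """
    h, w = depth.shape
    v, u = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    x = (u - cx) / fx * depth
    y = (v - cy) / fy * depth
    return np.stack([x, y, depth], axis=-1).reshape(-1, 3)

# Example: a flat depth map at 256 x 256 with illustrative intrinsics.
depth = np.full((256, 256), 2.0)
means = unproject_depth_to_means(depth, fx=300.0, fy=300.0, cx=128.0, cy=128.0)
num_gaussians = means.shape[0]
```

For a 256 × 256 input this gives 65,536 Gaussians, a count fixed by the image resolution rather than by scene complexity.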
Interactive figure: the same shape shown as Gaussian splats and as a triplane / VDB view. Each representation renders differently; the underlying geometry is the same.
White paper · four-method G-Splat survey · three-way comparison vs VDB vs triplane · Gamba validates MambaFlow3D premise