By January 2026 the thesis line had three competing 3-D representations on the table: SDF + marching cubes (the Hexplane AE setup, the Six-Plane Mesh extraction), Gaussian splats (the late-2025 explosion of single-image-to-3-D papers, including GS-LRM, Splatter Image, and Gamba), and triplanes (EG3D [5], InstantMesh, TRELLIS [6]). The pick-a-default exercise was load-bearing: every downstream generator would emit the chosen representation as its native output, so the decision needed a written rationale.
This paper records the rationale. The decision: triplane as the universal intermediate. The reasoning is in §3; the prerequisite is a clean description of triplane mechanics (§2), because the decision turns on properties that the NeRF / G-Splat / VDB alternatives lack.
A triplane representation stores a 3-D scene as three axis-aligned 2-D feature planes F_xy ∈ ℝ^{H × W × C}, F_xz ∈ ℝ^{H × W × C}, F_yz ∈ ℝ^{H × W × C}. Typical values: H = W = 256 (the plane resolution), C = 32 (the per-pixel feature dimension). Total storage: 3 × 256 × 256 × 32 × 4 bytes (fp32) ≈ 25 MB; with fp16 ≈ 12 MB; with fp16 and C = 16 ≈ 6 MB. The 6–12 MB range cited in the abstract reflects the practical trade-off between resolution and feature richness.
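The storage arithmetic above can be checked directly. A minimal sketch (the helper name `triplane_bytes` is illustrative, not from any library):

```python
# Storage arithmetic for a triplane: three H x W x C feature planes.
def triplane_bytes(h=256, w=256, c=32, bytes_per_float=4):
    """Raw storage in bytes for three H x W x C planes."""
    return 3 * h * w * c * bytes_per_float

MB = 1e6
print(triplane_bytes() / MB)                         # fp32, C=32 -> 25.165824
print(triplane_bytes(bytes_per_float=2) / MB)        # fp16, C=32 -> 12.582912
print(triplane_bytes(c=16, bytes_per_float=2) / MB)  # fp16, C=16 -> 6.291456
```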
To query the scene at a 3-D point p = (x, y, z) with each coordinate normalised to [0, 1]:
F = F_xy(x, y) ⊕ F_xz(x, z) ⊕ F_yz(y, z)

where each F_*(·, ·) is a bilinear sample on the respective plane and ⊕ denotes the aggregation operation. Three aggregation choices are common: sum (cheapest, default for EG3D), concat (3× larger output dim, used by TRELLIS), product (Hadamard, occasionally used for sharpness). The aggregated feature vector F ∈ ℝ^C (or ℝ^{3C} for concat) is passed through a small MLP — typically 2–3 layers, ReLU activations, hidden dim 64 — to produce a per-point density σ ∈ ℝ_+ and colour c ∈ ℝ³.
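As a concrete sketch of the aggregation options and decoder head (NumPy, with random placeholder weights; no trained model is implied, and the shapes simply match the C = 32, hidden-dim-64 description above):

```python
import numpy as np

C = 32
rng = np.random.default_rng(0)
# Stand-ins for the three bilinearly sampled per-plane feature vectors.
f_xy, f_xz, f_yz = rng.random((3, C))

f_sum    = f_xy + f_xz + f_yz                   # EG3D default: output dim C
f_concat = np.concatenate([f_xy, f_xz, f_yz])   # TRELLIS-style: output dim 3C
f_prod   = f_xy * f_xz * f_yz                   # Hadamard: output dim C

# Minimal 2-layer decoder head (hidden dim 64) mapping the aggregated
# feature to density sigma and colour rgb; weights are random placeholders.
W1, b1 = rng.normal(size=(64, C)), np.zeros(64)
W2, b2 = rng.normal(size=(4, 64)), np.zeros(4)
h = np.maximum(W1 @ f_sum + b1, 0.0)            # ReLU hidden layer
out = W2 @ h + b2
sigma = np.maximum(out[0], 0.0)                 # density constrained to R_+
rgb = 1.0 / (1.0 + np.exp(-out[1:]))            # colour squashed into (0, 1)
```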
Bilinear sampling at continuous coordinate (u, v) on a discrete plane uses the four-corner-pixel weighted average:
F(u, v) = (1−a)(1−b) · F[⌊u⌋, ⌊v⌋] + a(1−b) · F[⌈u⌉, ⌊v⌋] + (1−a)b · F[⌊u⌋, ⌈v⌉] + ab · F[⌈u⌉, ⌈v⌉]

where a = u − ⌊u⌋ and b = v − ⌊v⌋. The bilinear sample is differentiable in both (u, v) and F, which is what makes triplanes trainable end-to-end via gradient descent.
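The four-corner lookup translates directly to code. A self-contained NumPy sketch (`bilinear_sample` is an illustrative helper; coordinates are in pixel units and border clamping is one reasonable convention, not prescribed by the text above):

```python
import numpy as np

def bilinear_sample(plane, u, v):
    """Bilinearly sample an (H, W, C) feature plane at continuous (u, v).

    u indexes rows (axis 0), v indexes columns (axis 1), in pixel units.
    """
    h, w, _ = plane.shape
    u0, v0 = int(np.floor(u)), int(np.floor(v))
    u1, v1 = min(u0 + 1, h - 1), min(v0 + 1, w - 1)  # clamp at the border
    a, b = u - u0, v - v0
    return ((1 - a) * (1 - b) * plane[u0, v0]
            + a * (1 - b) * plane[u1, v0]
            + (1 - a) * b * plane[u0, v1]
            + a * b * plane[u1, v1])

plane = np.arange(4.0).reshape(2, 2, 1)   # features 0..3 on a 2x2 plane
print(bilinear_sample(plane, 0.5, 0.5))   # [1.5]: mean of the four corners
```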
For each pixel in the camera view, cast a ray r(t) = o + t · d from the camera origin o through the pixel along direction d. Sample N points along the ray (typically N = 64–128 at strided t values t_1, …, t_N). For each sample point query the triplane for (σ_i, c_i). Volume-render via front-to-back alpha compositing:
α_i = 1 − exp(−σ_i · Δt_i)
T_i = ∏_{j < i} (1 − α_j)  (transmittance up to sample i)
C = ∑_i T_i · α_i · c_i  (the final pixel colour)

where Δt_i = t_{i+1} − t_i is the inter-sample interval. The same operation, computed per pixel, gives the final rendered image. Total cost: image_pixels × N × (3 bilinear samples + small MLP + alpha-composite step). At 256² output and N = 64 the cost is approximately 30–80 ms on a consumer GPU.
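The three compositing equations fit in a few NumPy lines. A sketch for a single ray (`composite` is an illustrative name; the densities below are hand-picked, not from a real scene):

```python
import numpy as np

def composite(sigmas, colours, ts):
    """Front-to-back alpha compositing along one ray.

    sigmas: (N,) densities; colours: (N, 3) RGB; ts: (N+1,) sample depths,
    so the last entry supplies the final interval Delta t_N.
    """
    dts = np.diff(ts)                            # Delta t_i = t_{i+1} - t_i
    alphas = 1.0 - np.exp(-sigmas * dts)         # per-sample opacity
    trans = np.cumprod(np.concatenate(([1.0], 1.0 - alphas[:-1])))  # T_i
    return (trans * alphas) @ colours            # C = sum_i T_i alpha_i c_i

# A dense red sample in front of a green one: the front sample occludes.
sig = np.array([50.0, 50.0])
col = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
ts = np.array([0.0, 0.5, 1.0])
print(composite(sig, col, ts))   # ~[1, 0, 0]
```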
When the consumer wants an explicit triangle mesh — for Houdini integration, for export to CAD tools — query the triplane at a dense 3-D grid of points (typically 256³ ≈ 16.8 M queries) and produce a density field. Run marching cubes on the field at the isosurface threshold (a small σ value chosen to mark the mesh surface). The marching-cubes step is not part of the rendering loop — it is a one-shot per-scene operation costing roughly 50–200 ms extra. A common misconception is that triplane rendering rasterises an extracted mesh; in fact rendering volume-renders the triplane directly, and mesh extraction is a separate export path.
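The extraction path can be sketched end to end. Here `query_density` is a hard-coded toy sphere standing in for the real triplane-lookup-plus-MLP decode, and the grid is 64³ rather than 256³ to keep the sketch cheap; the marching-cubes call itself (e.g. `skimage.measure.marching_cubes`) is left as a comment:

```python
import numpy as np

def query_density(pts):
    """Toy density: positive inside a sphere of radius 0.3 centred at 0.5.

    Stand-in for the real per-point triplane lookup + MLP decode.
    """
    return np.maximum(0.3 - np.linalg.norm(pts - 0.5, axis=-1), 0.0)

res = 64                               # 256 in practice: ~16.8 M queries
axis = (np.arange(res) + 0.5) / res    # cell-centred samples in [0, 1]
grid = np.stack(np.meshgrid(axis, axis, axis, indexing="ij"), axis=-1)
field = query_density(grid.reshape(-1, 3)).reshape(res, res, res)

# One-shot extraction, outside the rendering loop; with scikit-image:
#   verts, faces, _, _ = skimage.measure.marching_cubes(field, level=1e-3)
print(field.shape, float(field.max()))  # (64, 64, 64) and a positive peak
```

The occupied fraction of the field matches the analytic sphere volume (4/3)π · 0.3³ ≈ 0.113, which is a cheap sanity check on the grid construction.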
| Property | Triplane (chosen) | G-Splats | VDB / FVDB |
|---|---|---|---|
| Storage (typical scene) | 6–12 MB | 50–500 MB | 10–80 MB |
| Render time (256² image) | 30–80 ms | 5–15 ms | 100–300 ms |
| NeRF speed-up | 10–100 × | ~10 × | ~5 × |
| Editability | Edit 2-D feature planes directly | Move Gaussians (awkward) | Houdini-native (best for procedural) |
| Procedural-pipeline composability | Triplane → marching-cubes → mesh → Houdini (1 step) | Splat-to-mesh required (lossy) | Native |
| Differentiability | Yes (bilinear lookup + MLP) | Yes (standard) | Yes (FVDB) |
| Photo-realism | Mid-high | Highest | Mid (depends on shader) |
Triplane wins the storage column, the editability column, and is competitive on render time. G-Splats win raw render speed and photo-realism; VDB wins procedural composability. The decision turns on a thesis-line-specific consideration: the universal intermediate is converted to G-Splat or to VDB / mesh as a one-time step per scene, not per frame. So the per-frame render-time advantage of G-Splats and the Houdini-native advantage of VDB both get recovered through conversion. The storage and editability advantages of triplane are not recoverable through conversion — those are properties of the storage format itself.
Four downstream thesis-line topics are direct consumers of the triplane decision.
| Topic | Use of triplane |
|---|---|
| Hierarchical Triplane [1] | Native target — per-part triplanes in local frames + global triplane for spatial context |
| Hexplane Autoencoder [2] | Six-plane generalisation (triplane × 2 signs); same lookup mechanics |
| MambaFlow3D [3] | Generates triplane-family tokens (SparseCubes in Phase 3); same retrieval semantics |
| SculptNet [4] | Primitive outputs are rasterised into triplanes for differentiable refinement |
Triplane representations are the universal intermediate for the thesis line. The decision turns on storage size (6–12 MB), per-plane editability, and one-time convertibility to G-Splat or VDB / mesh for downstream consumption. Mesh extraction is a separate downstream step, not part of the rendering loop — a misconception worth pinning. The architectural consequences propagate to four downstream thesis-line topics.