Hierarchical Part-Based Triplane Reconstruction

00 — Motivation

Triplanes have an occlusion problem. Parts solve it.

The thesis-line goal is a real-time, editable 3-D map of the world — every building, every facade element, every interior, encoded in a representation light enough to stream and structured enough to edit. Triplanes are the obvious efficient encoding: three orthogonal 2-D feature textures (XY, XZ, YZ) replace a dense voxel grid at a fraction of the memory cost, and a learned SDF decoder reconstructs the surface continuously. SparC3D and related work show the encoding is competitive on single-object benchmarks.

The trouble starts when objects have overlapping parts. Two structural elements occupying the same column of XY pixels — a chair leg behind another chair leg, a hand resting on a table, a U-shape's inner walls projecting onto the same plane as its outer walls — collapse into a single feature stack. The decoder cannot tell where one part ends and the next begins because the projection erased the layering. The result is "ghosting": phantom surfaces where parts blur together, missing geometry where the rear part is shadowed by the front, and a representation that's only as good as the most occlusion-free shape it was trained on.

The earlier Hybrid Sparse-Triplane Engine work attempted to fix this with a gated approach — a sparse VDB bitmask gating the triplane feature stack to suppress ghost surfaces. The gate helped but the representation was still per-shape, not per-part. The hierarchical part-based formulation here is the next step: give each semantic part its own isolated triplane set, plus a global triplane for spatial relationships between parts. Occlusion is then a per-part problem (where it's much smaller) rather than a per-shape problem. Editability comes for free — parts can be updated independently because they are stored independently.

Where this fits

This is one of several thesis-line attempts at the triplane representation problem. The companion thread, six-plane reconstruction, attacks the same issue with classical geometry rather than neural decoders, and the six-plane depth validation provides the synthetic test inputs that both threads consume. Both threads converge on the same answer: the geometry of a shape with overlapping parts cannot be encoded in a single projection; it has to be decomposed.

01 — The Inter-Part Occlusion Problem

One triplane × overlapping parts = ghost surfaces.

A triplane stores 2-D feature vectors at each (x, y), (x, z), and (y, z) pixel. To query a 3-D point p = (x, y, z), the decoder samples one feature from each plane and concatenates them: f(p) = [F_XY(x,y), F_XZ(x,z), F_YZ(y,z)]. The MLP decoder turns that triple into an SDF value.

The representational ambiguity is immediate. Consider two points p1 = (0, 0, 0.3) and p2 = (0, 0, 0.7). Both have (x, y) = (0, 0), so F_XY(x, y) is identical for both. F_XZ and F_YZ differ because z differs, but each of those features is in turn shared by every point along its own projection axis (all y for F_XZ, all x for F_YZ). Every plane pixel therefore stores a mixture of whatever geometry projects onto it, and the MLP receives only three such mixed features with no record of which part contributed what. When two parts of a shape occupy the same XY column at different depths, their feature contributions blur into the same encoding and the decoder cannot reliably disambiguate them.
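The query and the shared-feature example can be sketched in a few lines of NumPy. This is a minimal illustration, assuming each plane is an (R, R, C) array covering [-1, 1]² and that sampling is bilinear; the names `bilinear_sample` and `triplane_features`, the resolution, and the channel count are hypothetical and not taken from the original implementation.

```python
import numpy as np

def bilinear_sample(plane, u, v):
    """Bilinearly sample an (R, R, C) feature plane at (u, v) in [-1, 1]^2."""
    res = plane.shape[0]
    # Map [-1, 1] to continuous pixel coordinates in [0, res - 1].
    x = (np.clip(u, -1.0, 1.0) + 1.0) * 0.5 * (res - 1)
    y = (np.clip(v, -1.0, 1.0) + 1.0) * 0.5 * (res - 1)
    x0, y0 = int(np.floor(x)), int(np.floor(y))
    x1, y1 = min(x0 + 1, res - 1), min(y0 + 1, res - 1)
    tx, ty = x - x0, y - y0
    # Interpolate along the first axis, then along the second.
    row0 = (1 - tx) * plane[x0, y0] + tx * plane[x1, y0]
    row1 = (1 - tx) * plane[x0, y1] + tx * plane[x1, y1]
    return (1 - ty) * row0 + ty * row1

def triplane_features(planes, p):
    """Return f(p) = [F_XY(x, y), F_XZ(x, z), F_YZ(y, z)] as one vector."""
    x, y, z = p
    return np.concatenate([
        bilinear_sample(planes["xy"], x, y),
        bilinear_sample(planes["xz"], x, z),
        bilinear_sample(planes["yz"], y, z),
    ])

# Random planes stand in for learned features (assumption for the demo).
rng = np.random.default_rng(0)
planes = {k: rng.normal(size=(32, 32, 8)) for k in ("xy", "xz", "yz")}

f1 = triplane_features(planes, (0.0, 0.0, 0.3))
f2 = triplane_features(planes, (0.0, 0.0, 0.7))

# The first 8 entries come from F_XY and are identical for both points.
shared_xy = np.allclose(f1[:8], f2[:8])
```

Only the XZ and YZ slices of the two vectors differ, which is the whole depth signal the decoder has to work with.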

The visible failure: U-shaped silhouettes reconstruct as filled rectangles, interior cavities collapse, and pairs of parallel surfaces (e.g., the two walls of a tube) reduce to a single mean surface.
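A purely geometric analogy makes the cavity collapse concrete. The sketch below uses binary occupancy rather than learned features, so it illustrates only the projection step and not the decoder: a hollow box and a solid box produce identical axis-aligned projections, so nothing stored per pixel in the three planes can tell them apart. The grid size and variable names are illustrative assumptions.

```python
import numpy as np

n = 16
solid = np.zeros((n, n, n), dtype=bool)   # indexed as [x, y, z]
solid[4:12, 4:12, 4:12] = True

hollow = solid.copy()
hollow[6:10, 6:10, 6:10] = False          # carve an interior cavity

def project(occ):
    """Collapse occupancy along z, y, x to get XY, XZ, YZ silhouettes."""
    return occ.any(axis=2), occ.any(axis=1), occ.any(axis=0)

# All three silhouettes match even though the volumes differ.
same = all(np.array_equal(a, b)
           for a, b in zip(project(solid), project(hollow)))
```

Learned features carry more than a binary silhouette, so a trained triplane is not this lossy in general, but the example shows why the projection alone gives the decoder no structural guarantee about layered geometry.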

Core Insight

Decompose first.
Project second.

A triplane projection erases the depth axis. Two parts at different depths in the same XY column become indistinguishable. The fix is not a clever decoder — the decoder still sees only three 2-D features per query — but a decomposition that ensures only one part contributes feature mass to any given pixel. Parts are stored separately, queried separately, composed at the end. The decoder never has to disambiguate what was already structurally separated.
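The "composed at the end" step can be sketched as follows. The source does not state which composition operator is used; this sketch assumes a plain CSG union, where the shape's SDF is the minimum over per-part SDFs. Analytic spheres stand in for decoded parts, and `sphere_sdf` and `compose` are hypothetical names.

```python
import math

def sphere_sdf(center, radius):
    """Return an SDF callable for a sphere; a stand-in for a decoded part."""
    def sdf(p):
        return math.dist(p, center) - radius
    return sdf

def compose(part_sdfs, p):
    """Union of parts: take the minimum over the per-part SDF values."""
    return min(sdf(p) for sdf in part_sdfs)

# Two parts stacked in the same XY column at different depths.
parts = [sphere_sdf((0.0, 0.0, 0.3), 0.1),
         sphere_sdf((0.0, 0.0, 0.7), 0.1)]

inside_front = compose(parts, (0.0, 0.0, 0.3))  # negative: inside part 0
in_gap = compose(parts, (0.0, 0.0, 0.5))        # positive: between the parts
```

Because each part is evaluated on its own before the minimum is taken, the gap between the two stacked parts stays empty. With the minimum, distances are exact outside the union and only approximate inside overlapping regions.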

03 — Method

Local frames, shared decoder, late composition.

A — DECOMPOSITION

Semantic parts from PartNeXt annotations

Input meshes carry hierarchical part labels from the PartNeXt dataset (~26K models, 24 categories, 4–5 levels deep). Each part gets an axis-aligned bounding box and a local coordinate frame. For shapes without annotations, the fallback is a connected-component + small-cluster decomposition that approximates semantic parts.
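The bounding box and local frame can be sketched as below. This is a minimal version assuming an unrotated, axis-aligned frame that maps the padded box onto [-1, 1]³ with independent per-axis scaling; the `pad` parameter and the function names are assumptions for illustration, not part of the original pipeline.

```python
import numpy as np

def part_frame(vertices, pad=0.05):
    """Return (center, half_extent) of a padded axis-aligned bounding box."""
    lo, hi = vertices.min(axis=0), vertices.max(axis=0)
    center = 0.5 * (lo + hi)
    half = 0.5 * (hi - lo) * (1.0 + pad)
    # Guard against flat parts whose extent is zero along some axis.
    half = np.maximum(half, 1e-6)
    return center, half

def world_to_local(p_world, frame):
    """Map a world-space point into the part's [-1, 1]^3 local frame."""
    center, half = frame
    return (np.asarray(p_world, dtype=float) - center) / half

verts = np.array([[1.0, 2.0, 0.0],
                  [3.0, 2.5, 4.0],
                  [2.0, 2.2, 1.0]])
frame = part_frame(verts, pad=0.0)
corner = world_to_local([3.0, 2.5, 4.0], frame)  # the box's max corner
```

With `pad=0.0` the maximum corner of the box lands at (1, 1, 1) and the box centre at the origin. A small positive pad keeps surface samples away from the plane borders.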

B — PER-PART TRIPLANES

Local XY/XZ/YZ in the part's own frame

Each part's geometry is encoded as a triplane set in its local frame. Because the part is alone in its frame, no inter-part occlusion can occur: every (x, y) column in the part's XY plane holds features from that part only. Per-part resolution is matched to the part's bounding-box scale, so small parts (handles, bolts) get feature density comparable to large parts (chair seat, wall).
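One plausible reading of "matched to the part's bounding-box scale" is a fixed texel density per world unit, clamped to a sensible range. The sketch below assumes exactly that; `texels_per_unit`, `min_res`, and `max_res` are hypothetical parameters, and the original may use a different rule.

```python
import math

def part_resolution(bbox_extent, texels_per_unit=64.0, min_res=8, max_res=256):
    """Pick a square plane resolution from the part's longest bbox side."""
    longest = max(bbox_extent)
    res = math.ceil(longest * texels_per_unit)
    # Clamp so tiny parts still get a usable grid and huge parts stay bounded.
    return max(min_res, min(max_res, res))

seat = part_resolution((0.5, 0.5, 0.05))    # large part
bolt = part_resolution((0.02, 0.02, 0.03))  # small part, clamped to min_res
```

Under these assumed defaults the seat gets a 32 × 32 grid and the bolt is clamped up to 8 × 8, so the small part ends up with at least the texel density of the large one.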

C — GLOBAL TRIPLANE

Spatial-context features in the whole-shape frame

A coarser global triplane stores feature vectors in the full-shape coordinate frame. Its job is to encode spatial relationships between parts — which part attaches where, which parts are adjacent, the overall scale. The global triplane does NOT have to encode fine per-part detail (that's the per-part triplanes' job), so it runs at much lower resolution without quality loss.

D — DECODER

Shared SDF MLP, conditioned on part_id

A single MLP receives the part's local triplane features, the global triplane features at the same world point, and a part_id embedding, and outputs an SDF value. Sharing one decoder across parts lets it learn a unified geometry prior across part categories instead of a specialised network per part, which matters for generalising to unseen part-instance combinations.

# Per-point SDF query
def sdf(p_world, part_id):
    # Sample features from the part's local triplane
    p_local = world_to_local(p_world, part_frames[part_id])
    f_xy = sample(part_planes[part_id]['xy'], p_local.xy)
    f_xz = sample(part_planes[part_id]['xz'], p_local.xz)
    f_yz = sample(part_planes[part_id]['yz'], p_local.yz)
    f_local = concat([f_xy, f_xz, f_yz])

    # Sample features from the global triplane at the same world point
    g_xy = sample(global_planes['xy'], p_world.xy)
    g_xz = sample(global_planes['xz'], p_world.xz)
    g_yz = sample(global_planes['yz'], p_world.yz)
    f_global = concat([g_xy, g_xz, g_yz])

    # Part-id embedding, then decode
    e = embed(part_id)
    return decoder_mlp(concat([f_local, f_global, e]))
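The pseudocode above leaves `sample`, `world_to_local`, `embed`, and `decoder_mlp` undefined. The following is a runnable sketch that fills them in under stated assumptions: bilinear sampling on (R, R, C) planes covering [-1, 1]², axis-aligned (center, half_extent) part frames, a lookup-table embedding, and an untrained two-layer ReLU MLP with random weights. All sizes and values are placeholders, so the returned number is not a meaningful distance.

```python
import numpy as np

rng = np.random.default_rng(0)
C, E, H = 8, 4, 32   # feature channels, embedding size, hidden width
AXES = {"xy": (0, 1), "xz": (0, 2), "yz": (1, 2)}

def make_planes(res):
    """Random (res, res, C) planes; trained features would go here."""
    return {k: rng.normal(size=(res, res, C)) for k in AXES}

def sample(plane, uv):
    """Bilinear lookup at uv in [-1, 1]^2."""
    res = plane.shape[0]
    x, y = (np.clip(uv, -1.0, 1.0) + 1.0) * 0.5 * (res - 1)
    x0, y0 = int(x), int(y)
    x1, y1 = min(x0 + 1, res - 1), min(y0 + 1, res - 1)
    tx, ty = x - x0, y - y0
    return ((1 - tx) * (1 - ty) * plane[x0, y0] + tx * (1 - ty) * plane[x1, y0]
            + (1 - tx) * ty * plane[x0, y1] + tx * ty * plane[x1, y1])

def triplane(planes, p):
    """Concatenate the XY, XZ, and YZ features for point p."""
    return np.concatenate([sample(planes[k], p[list(ax)])
                           for k, ax in AXES.items()])

# Two parts with (center, half_extent) frames, plus a coarser global set.
part_frames = [(np.array([0.0, 0.0, -0.5]), np.array([0.5, 0.5, 0.25])),
               (np.array([0.0, 0.0, 0.5]), np.array([0.25, 0.25, 0.25]))]
part_planes = [make_planes(32), make_planes(32)]
global_planes = make_planes(16)
embedding = rng.normal(size=(len(part_frames), E))

# Untrained decoder weights: (3C local + 3C global + E) -> H -> 1.
W1 = rng.normal(size=(6 * C + E, H)) * 0.1
W2 = rng.normal(size=(H, 1)) * 0.1

def sdf(p_world, part_id):
    """Decode one scalar for a world point, conditioned on one part."""
    p_world = np.asarray(p_world, dtype=float)
    center, half = part_frames[part_id]
    p_local = (p_world - center) / half
    f_local = triplane(part_planes[part_id], p_local)
    f_global = triplane(global_planes, p_world)
    feats = np.concatenate([f_local, f_global, embedding[part_id]])
    hidden = np.maximum(feats @ W1, 0.0)   # ReLU
    return float((hidden @ W2)[0])

value = sdf([0.0, 0.0, 0.5], part_id=1)
```

The structure mirrors the pseudocode one to one: a local lookup in the part frame, a global lookup in the world frame, then a single shared decoder call.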

04 — Trade-offs vs Single-Triplane

More storage, structurally better fidelity.

Property	Single Triplane	Hierarchical Part-Based
Memory	3 × N² features	(N_parts + 1) × 3 × n² features (n < N)
Inter-part occlusion	Causes ghost surfaces	Structurally eliminated
Per-part editing	Re-encode whole shape	Update one part triplane
Cross-category generalisation	Limited by training distribution	Shared decoder + part_id embedding generalises
Generation by diffusion	Single fixed-shape tensor (3 planes): easy	Variable-cardinality structured object: harder
Inference cost	3 plane samples + 1 MLP call	6 plane samples (3 local + 3 global) + 1 MLP call; ~2× slower
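The Memory row can be checked with a little arithmetic. The sketch below takes the table's formulas at face value, including its simplification that the global triplane uses the same resolution n as the part triplanes; the specific values of N, n, and the part count are illustrative assumptions.

```python
def single_features(N):
    """Feature vectors stored by one triplane at resolution N."""
    return 3 * N * N

def hierarchical_features(n_parts, n):
    """Feature vectors for n_parts part triplanes plus one global, all at n."""
    return (n_parts + 1) * 3 * n * n

single = single_features(256)            # 196608
coarse = hierarchical_features(8, 64)    # 110592
fine = hierarchical_features(8, 128)     # 442368
```

Setting the two formulas equal gives a break-even at n = N / sqrt(N_parts + 1). With N = 256 and 8 parts that is about 85, so the hierarchical layout stores more than the single triplane only once the per-part resolution exceeds that.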

When part-based is the wrong choice

For shapes with no semantically meaningful parts — organic blobs, smooth sculptural forms, single-piece industrial geometry — the part decomposition is artificial and the overhead of separate triplanes is not paid back by occlusion handling. For these classes, the single-triplane representation with the SparC3D / SDF-extension is the right call. The hierarchical formulation here is for articulated or compositional shapes: furniture, mechanical assemblies, architectural elements, buildings with windows / cornices / doors as semantic parts.

