Aditya Jain / Apple Maps · 3D Reconstruction
Feb 2026
Topic 33 · Neural 3D Representation · Architecture

Hierarchical Part-Based
Triplane Reconstruction.

Per-part isolated triplane sets that eliminate inter-part occlusion structurally, plus a global triplane for spatial context. Parts are composable, independently updatable, and decoded by a shared SDF network. The architectural answer to triplane ghosting on multi-part shapes.

00 — Motivation

Triplanes have an occlusion problem. Parts solve it.

The thesis-line goal is a real-time, editable 3-D map of the world — every building, every facade element, every interior, encoded in a representation light enough to stream and structured enough to edit. Triplanes are the obvious efficient encoding: three orthogonal 2-D feature textures (XY, XZ, YZ) replace a dense voxel grid at a fraction of the memory cost, and a learned SDF decoder reconstructs the surface continuously. SparC3D and related work show the encoding is competitive on single-object benchmarks.

The trouble starts when objects have overlapping parts. Two structural elements occupying the same column of XY pixels — a chair leg behind another chair leg, a hand resting on a table, a U-shape's inner walls projecting onto the same plane as its outer walls — collapse into a single feature stack. The decoder cannot tell where one part ends and the next begins because the projection erased the layering. The result is "ghosting": phantom surfaces where parts blur together, missing geometry where the rear part is shadowed by the front, and a representation that's only as good as the most occlusion-free shape it was trained on.

The earlier Hybrid Sparse-Triplane Engine work attempted to fix this with a gated approach — a sparse VDB bitmask gating the triplane feature stack to suppress ghost surfaces. The gate helped but the representation was still per-shape, not per-part. The hierarchical part-based formulation here is the next step: give each semantic part its own isolated triplane set, plus a global triplane for spatial relationships between parts. Occlusion is then a per-part problem (where it's much smaller) rather than a per-shape problem. Editability comes for free — parts can be updated independently because they are stored independently.

Where this fits
This is one of several thesis-line attempts at the triplane representation problem. The companion thread, six-plane reconstruction, attacks the same issue with classical geometry rather than neural decoders; the six-plane depth validation provides the synthetic test inputs that both threads consume. Both threads converge on the same answer: the geometry of a shape with overlapping parts cannot be encoded in a single projection; it has to be decomposed.
01 — The Inter-Part Occlusion Problem

One triplane × overlapping parts = ghost surfaces.

A triplane stores a feature vector at each pixel of three 2-D grids, indexed by (x, y), (x, z), and (y, z). To query a 3-D point p = (x, y, z), the decoder samples one interpolated feature from each plane and concatenates them: f(p) = [F_XY(x,y), F_XZ(x,z), F_YZ(y,z)]. The MLP decoder turns that concatenated feature into an SDF value.

The representational ambiguity is immediate. Consider two points p1 = (0, 0, 0.3) and p2 = (0, 0, 0.7). Both have (x, y) = (0, 0), so F_XY(x, y) is identical for both: that one feature has to describe everything along the z column. F_XZ and F_YZ differ because z differs, but each of them is in turn shared by every point along its own projection axis, so the MLP receives three interpolated features, none of which belongs to a single 3-D point. When two parts of a shape occupy the same XY column at different depths, their feature contributions blur into the same encoding and the decoder cannot disambiguate.
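The query and the shared-feature effect can be reproduced on toy data. The sketch below is a minimal illustration, not the project's implementation: the nested-list plane layout, the [-1, 1] coordinate convention, and the helper names `sample_bilinear`, `triplane_features`, and `make_plane` are all assumptions introduced here.

```python
from typing import Dict, List, Tuple

Plane = List[List[List[float]]]  # plane[row][col] -> feature vector


def sample_bilinear(plane: Plane, u: float, v: float) -> List[float]:
    """Bilinearly interpolate a feature at (u, v), both in [-1, 1]."""
    rows, cols = len(plane), len(plane[0])
    u = min(max(u, -1.0), 1.0)
    v = min(max(v, -1.0), 1.0)
    # Map [-1, 1] to continuous pixel coordinates; u indexes columns.
    x = (u + 1.0) * 0.5 * (cols - 1)
    y = (v + 1.0) * 0.5 * (rows - 1)
    x0 = min(int(x), cols - 2)
    y0 = min(int(y), rows - 2)
    tx, ty = x - x0, y - y0
    out = []
    for c in range(len(plane[0][0])):
        top = plane[y0][x0][c] * (1 - tx) + plane[y0][x0 + 1][c] * tx
        bot = plane[y0 + 1][x0][c] * (1 - tx) + plane[y0 + 1][x0 + 1][c] * tx
        out.append(top * (1 - ty) + bot * ty)
    return out


def triplane_features(planes: Dict[str, Plane],
                      p: Tuple[float, float, float]) -> List[float]:
    """f(p) = [F_XY(x, y), F_XZ(x, z), F_YZ(y, z)], concatenated."""
    x, y, z = p
    return (sample_bilinear(planes["xy"], x, y)
            + sample_bilinear(planes["xz"], x, z)
            + sample_bilinear(planes["yz"], y, z))


def make_plane(size: int, scale: float) -> Plane:
    """Toy 1-channel plane whose value depends on the pixel position."""
    return [[[scale * (r * size + c)] for c in range(size)] for r in range(size)]


planes = {"xy": make_plane(4, 1.0),
          "xz": make_plane(4, 10.0),
          "yz": make_plane(4, 100.0)}
f1 = triplane_features(planes, (0.0, 0.0, 0.3))
f2 = triplane_features(planes, (0.0, 0.0, 0.7))

xy_shared = f1[0] == f2[0]   # same (x, y), so the XY slot is identical
xz_differs = f1[1] != f2[1]  # z differs, so the XZ slot differs
```

Whatever sits at z = 0.3 and whatever sits at z = 0.7 must both be explained by the same XY feature, which is the layering loss described above.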

The visible failure: U-shaped silhouettes reconstruct as filled rectangles, interior cavities collapse, and pairs of parallel surfaces (e.g., the two walls of a tube) reduce to a single mean surface.
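The U-shape case can be checked on occupancy alone, before any learned features are involved. The following sketch assumes a hypothetical voxelised U lying in the XY plane and illustrates only the projection step; the function names are made up for this example.

```python
from typing import Set, Tuple

Voxel = Tuple[int, int, int]
Pixel = Tuple[int, int]


def u_shape(width: int = 5, height: int = 5, depth: int = 2) -> Set[Voxel]:
    """Voxels of a U in the XY plane: two arms joined by a base at y = 0."""
    voxels: Set[Voxel] = set()
    for x in range(width):
        for y in range(height):
            if x == 0 or x == width - 1 or y == 0:
                for z in range(depth):
                    voxels.add((x, y, z))
    return voxels


def project(voxels: Set[Voxel], drop_axis: int) -> Set[Pixel]:
    """Orthographic projection: drop one coordinate, keep occupied pixels."""
    keep = [a for a in range(3) if a != drop_axis]
    return {(v[keep[0]], v[keep[1]]) for v in voxels}


def is_filled_rectangle(pixels: Set[Pixel]) -> bool:
    """True when every pixel inside the 2-D bounding box is occupied."""
    us = [p[0] for p in pixels]
    vs = [p[1] for p in pixels]
    area = (max(us) - min(us) + 1) * (max(vs) - min(vs) + 1)
    return len(pixels) == area


shape = u_shape()
cavity_visible_xy = not is_filled_rectangle(project(shape, drop_axis=2))
filled_xz = is_filled_rectangle(project(shape, drop_axis=1))
filled_yz = is_filled_rectangle(project(shape, drop_axis=0))
```

In this toy setup only the XY projection still shows the cavity; the XZ and YZ projections are solid rectangles, so two of the three planes carry no evidence that the interior is empty.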

02 — Architecture

N + 1 triplane sets: one per part, plus one global.

[Diagram: INPUT mesh (.obj, multi-part with part labels) → PART DECOMPOSITION (PartNeXt hierarchy; N parts, bounding boxes) → PART TRIPLANES ×N (XY · XZ · YZ per part, local frame) + GLOBAL TRIPLANE (XY · XZ · YZ, whole-shape frame) → PART-AWARE DECODER (shared SDF MLP; part_id + local + global features) → MARCHING CUBES (per-part + composed, no inter-part ghosts) → OUTPUT mesh (editable per-part USD). N + 1 triplane sets · inter-part occlusion eliminated structurally · parts independently updatable.]
Figure 1 — Architecture. The mesh is decomposed into N semantic parts; each part gets its own triplane set in a local frame; a global triplane stores spatial relationships between parts. The decoder takes (part_id, local_feat, global_feat) and predicts SDF. Occlusion is per-part rather than per-shape.
Core Insight

Decompose first.
Project second.

A triplane projection erases the depth axis. Two parts at different depths in the same XY column become indistinguishable. The fix is not a clever decoder — the decoder still sees only three 2-D features per query — but a decomposition that ensures only one part contributes feature mass to any given pixel. Parts are stored separately, queried separately, composed at the end. The decoder never has to disambiguate what was already structurally separated.

03 — Method

Local frames, shared decoder, late composition.

A — DECOMPOSITION
Semantic parts from PartNeXt annotations

Input meshes carry hierarchical part labels from the PartNeXt dataset (~26K models, 24 categories, 4–5 levels deep). Each part gets an axis-aligned bounding box and a local coordinate frame. For shapes without annotations, the fallback is a connected-component + small-cluster decomposition that approximates semantic parts.
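For unannotated meshes, the connected-component half of the fallback might look like the sketch below. It is an assumption-laden illustration: the mesh is taken to be a vertex list plus triangle index triples, components are found with a union-find over shared vertices, and the small-cluster step mentioned above is left out because the source does not describe it.

```python
from typing import Dict, List, Sequence, Tuple

Vec3 = Tuple[float, float, float]
Face = Tuple[int, int, int]
Box = Tuple[Vec3, Vec3]  # (min corner, max corner)


def connected_components(num_vertices: int, faces: Sequence[Face]) -> List[int]:
    """Label each vertex with a component id; faces link their vertices."""
    parent = list(range(num_vertices))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for a, b, c in faces:
        root = find(a)
        parent[find(b)] = root
        parent[find(c)] = root

    ids: Dict[int, int] = {}
    labels = []
    for v in range(num_vertices):
        labels.append(ids.setdefault(find(v), len(ids)))
    return labels


def part_bounding_boxes(vertices: Sequence[Vec3],
                        labels: Sequence[int]) -> Dict[int, Box]:
    """Axis-aligned bounding box of the vertices in each component."""
    boxes: Dict[int, Box] = {}
    for v, part in zip(vertices, labels):
        if part not in boxes:
            boxes[part] = (v, v)
        else:
            lo, hi = boxes[part]
            boxes[part] = (tuple(min(a, b) for a, b in zip(lo, v)),
                           tuple(max(a, b) for a, b in zip(hi, v)))
    return boxes


# Two disjoint triangles stand in for two parts.
vertices = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0),
            (2.0, 0.0, 0.0), (3.0, 0.0, 0.0), (2.0, 1.0, 0.0)]
faces = [(0, 1, 2), (3, 4, 5)]
labels = connected_components(len(vertices), faces)
boxes = part_bounding_boxes(vertices, labels)
```

Each resulting box then supplies the local frame for one part's triplane set. Components found this way only approximate semantic parts: a chair welded into one connected mesh would come back as a single component.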

B — PER-PART TRIPLANES
Local XY/XZ/YZ in the part's own frame

Each part's geometry is encoded as a triplane set in its local frame. Because the part is solo in its frame, no inter-part occlusion can occur — every (x, y) column in the part's XY plane is populated by at most one part. Resolution per-part is matched to the part's bounding-box scale, so small parts (handles, bolts) get comparable feature density to large parts (chair seat, wall).
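A local frame and a per-part resolution rule could be sketched as follows. The source does not state the exact rule, so everything here is an assumption: the local frame is the part's bounding box normalised to [-1, 1]^3, the resolution grows with the longest box side, and a floor (`min_res`) keeps very small parts from being starved of texels. `texels_per_unit`, `min_res`, and `max_res` are hypothetical parameters.

```python
import math
from typing import Tuple

Vec3 = Tuple[float, float, float]
Box = Tuple[Vec3, Vec3]  # (min corner, max corner)


def world_to_local(p_world: Vec3, box: Box) -> Vec3:
    """Map a world point so the part's bounding box spans [-1, 1]^3."""
    lo, hi = box
    out = []
    for p, a, b in zip(p_world, lo, hi):
        extent = max(b - a, 1e-9)  # guard against flat boxes
        out.append(2.0 * (p - a) / extent - 1.0)
    return (out[0], out[1], out[2])


def plane_resolution(box: Box, texels_per_unit: float,
                     min_res: int = 8, max_res: int = 256) -> int:
    """Square plane resolution derived from the longest bounding-box side."""
    lo, hi = box
    longest = max(b - a for a, b in zip(lo, hi))
    res = math.ceil(longest * texels_per_unit)
    return max(min_res, min(max_res, res))


seat: Box = ((0.0, 0.0, 0.0), (0.5, 0.5, 0.05))
bolt: Box = ((0.0, 0.0, 0.0), (0.01, 0.01, 0.02))

seat_res = plane_resolution(seat, texels_per_unit=256.0)
bolt_res = plane_resolution(bolt, texels_per_unit=256.0)
corner_lo = world_to_local(seat[0], seat)
corner_hi = world_to_local(seat[1], seat)
```

With these made-up boxes the seat gets a 128-texel plane and the bolt is lifted to the 8-texel floor. In a single shared triplane at the same texel density, the bolt would cover only five or six texels of a plane it also shares with every other part.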

C — GLOBAL TRIPLANE
Spatial-context features in the whole-shape frame

A coarser global triplane stores feature vectors in the full-shape coordinate frame. Its job is to encode spatial relationships between parts — which part attaches where, which parts are adjacent, the overall scale. The global triplane does NOT have to encode fine per-part detail (that's the per-part triplanes' job), so it runs at much lower resolution without quality loss.

D — DECODER
Shared SDF MLP, conditioned on part_id

A single MLP receives the part's local triplane features, the global triplane features at the same world point, and a part_id embedding, and outputs an SDF value. Sharing one decoder across parts, rather than training specialised per-part networks, lets it learn a unified geometry prior across part categories, which matters for generalising to unseen part-instance combinations.

# Per-point SDF query
def sdf(p_world, part_id):
    # Sample features from the part's local triplane
    p_local = world_to_local(p_world, part_frames[part_id])
    f_xy = sample(part_planes[part_id]['xy'], p_local.xy)
    f_xz = sample(part_planes[part_id]['xz'], p_local.xz)
    f_yz = sample(part_planes[part_id]['yz'], p_local.yz)
    f_local = concat([f_xy, f_xz, f_yz])

    # Sample features from the global triplane at the same world point
    g_xy = sample(global_planes['xy'], p_world.xy)
    g_xz = sample(global_planes['xz'], p_world.xz)
    g_yz = sample(global_planes['yz'], p_world.yz)
    f_global = concat([g_xy, g_xz, g_yz])

    # Part-id embedding, then decode
    e = embed(part_id)
    return decoder_mlp(concat([f_local, f_global, e]))
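The late-composition step is not spelled out in the source. One common choice, assumed here, is the SDF union: evaluate every part at the query point and keep the smallest signed distance. In the sketch below analytic spheres stand in for the learned per-part decoders, and `sphere_sdf` and `composed_sdf` are hypothetical names.

```python
import math
from typing import Callable, Dict, Tuple

Vec3 = Tuple[float, float, float]
PartSDF = Callable[[Vec3], float]


def sphere_sdf(center: Vec3, radius: float) -> PartSDF:
    """Analytic stand-in for one part's learned SDF."""
    def sdf(p: Vec3) -> float:
        return math.dist(p, center) - radius
    return sdf


def composed_sdf(p: Vec3, parts: Dict[int, PartSDF]) -> Tuple[float, int]:
    """Union of parts: the smallest signed distance wins.

    Returns the distance and the id of the part that produced it.
    """
    best_id = min(parts, key=lambda pid: parts[pid](p))
    return parts[best_id](p), best_id


# Two parts stacked in the same XY column at different depths.
parts: Dict[int, PartSDF] = {
    0: sphere_sdf((0.0, 0.0, 0.3), 0.1),
    1: sphere_sdf((0.0, 0.0, 0.7), 0.1),
}

d_front, id_front = composed_sdf((0.0, 0.0, 0.3), parts)
d_rear, id_rear = composed_sdf((0.0, 0.0, 0.7), parts)
d_gap, _ = composed_sdf((0.0, 0.0, 0.5), parts)

# Editing one part leaves the other untouched.
parts[1] = sphere_sdf((0.0, 0.0, 0.7), 0.05)
d_front_after, _ = composed_sdf((0.0, 0.0, 0.3), parts)
```

The gap between the two parts stays outside the surface (positive distance), so both surfaces survive composition, and swapping part 1 does not change the value at part 0. The minimum is an exact distance only outside the parts; where parts overlap it is a bound, which is usually acceptable for marching cubes because only the zero crossing matters.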
04 — Trade-offs vs Single-Triplane

More storage, structurally better fidelity.

Property | Single Triplane | Hierarchical Part-Based
Memory | 3 × R² features (R = plane resolution) | (N + 1) × 3 × r² features (N parts, r < R)
Inter-part occlusion | Causes ghost surfaces | Structurally eliminated
Per-part editing | Re-encode whole shape | Update one part triplane
Cross-category generalisation | Limited by training distribution | Shared decoder + part_id embedding generalises
Generation by diffusion | Single fixed-size three-plane tensor: easy | Variable-cardinality structured object: harder
Inference cost | 3 triplane samples + 1 MLP call | 3 + 3 triplane samples + 1 MLP call · ~2× the plane lookups
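The memory row is easy to turn into arithmetic. The resolutions below are made-up numbers chosen only to show how the comparison behaves; the counts are feature vectors per shape, before multiplying by the channel width.

```python
from typing import Sequence


def single_triplane_features(res: int) -> int:
    """Three square planes at one shared resolution."""
    return 3 * res * res


def hierarchical_features(part_res: Sequence[int], global_res: int) -> int:
    """One three-plane set per part, plus one global three-plane set."""
    per_part = sum(3 * r * r for r in part_res)
    return per_part + 3 * global_res * global_res


# Hypothetical chair: six parts at 128 texels, global plane at 64.
single = single_triplane_features(256)
hier = hierarchical_features([128] * 6, global_res=64)
ratio = hier / single

# Same chair with every part plane halved to 64 texels.
hier_small = hierarchical_features([64] * 6, global_res=64)
```

Under these assumptions the hierarchical layout stores about 1.56× as many features as a single 256-texel triplane, while halving the part resolution brings it below the single-triplane count. Whether the part-based layout costs more storage therefore depends on how small the per-part planes can be made.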
When part-based is the wrong choice
For shapes with no semantically meaningful parts — organic blobs, smooth sculptural forms, single-piece industrial geometry — the part decomposition is artificial and the overhead of separate triplanes is not paid back by occlusion handling. For these classes, the single-triplane representation with the SparC3D / SDF-extension is the right call. The hierarchical formulation here is for articulated or compositional shapes: furniture, mechanical assemblies, architectural elements, buildings with windows / cornices / doors as semantic parts.


Full Technical Paper

arXiv-format write-up · Hierarchical Part-Based Triplane Reconstruction · occlusion analysis, architecture, trade-off table, predecessor work

Related Thesis Chapters
6-Plane Mesh Reconstruction
Same underlying problem (overlapping parts collapse under projection) attacked with classical geometry instead of a neural decoder. The two threads converge on the same conclusion: decompose before projecting.
ProcGen3D — Edge-Based Tokenization
Sister direction on the structured-representation thesis line. ProcGen3D decomposes into edges, this work decomposes into parts. Both produce composable intermediate representations rather than monolithic outputs.
Building Elevation Reconstruction
Natural deployment target. Buildings are compositional by definition — facade, cornices, balconies, windows are all semantic parts. The hierarchical triplane is the architecture that would replace the current six-plane mesh frontend if neural quality becomes competitive.