The key insight: when a neural network learns to represent a 3-D shape as a Signed Distance Field, its hidden-layer activations naturally encode part-level structure — without any supervision. Probe the DeepSDF decoder's hidden activations at each mesh vertex, cluster them into semantic surface types, split by mesh connectivity, and a classical balustrade railing falls apart into 1 top rail + 8 individual balusters + 1 base — correct instance segmentation, no labels. The segmentation works. The follow-on goal — train one small SDF per part and boolean-union them into sharp junctions — does not, and the project status document is candid about exactly why: interior-volume part-ownership is ambiguous.
DeepSDF trains a single MLP to represent a 3-D shape as a continuous
signed distance function. The network is never given part labels —
it only ever sees (point, sdf) pairs. The hypothesis
this project tests: even though the network was trained on a single
global SDF, its hidden activations implicitly encode the part
structure of the shape. A baluster's surface and a top
rail's surface produce different activation patterns because the
network has learned different local geometry to represent them — so
clustering the activations should recover the parts, for free.
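To make "probing" concrete, the sketch below runs a toy ReLU MLP in pure Python and records every hidden layer's output for one query point. The helper names (`make_mlp`, `forward_with_activations`), the layer sizes, and the random weights are illustrative assumptions, not the project's code; a real probe would read the same intermediate values from the trained DeepSDF decoder.

```python
import random

def make_mlp(sizes, seed=0):
    """Build a toy MLP as a list of (weights, biases) pairs with random weights."""
    rng = random.Random(seed)
    layers = []
    for n_in, n_out in zip(sizes, sizes[1:]):
        weights = [[rng.uniform(-1.0, 1.0) for _ in range(n_in)] for _ in range(n_out)]
        biases = [0.0] * n_out
        layers.append((weights, biases))
    return layers

def forward_with_activations(layers, point):
    """Return (output, hidden_activations) for one input point."""
    hidden_activations = []
    x = list(point)
    for index, (weights, biases) in enumerate(layers):
        x = [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, biases)]
        if index < len(layers) - 1:  # hidden layer: apply ReLU and record the result
            x = [max(0.0, v) for v in x]
            hidden_activations.append(x)
    return x[0], hidden_activations

toy_net = make_mlp([3, 8, 8, 1])  # 3-D point in, scalar "SDF" out (hypothetical sizes)
sdf_value, activations = forward_with_activations(toy_net, (0.1, 0.2, 0.3))

# Concatenating all hidden layers gives one feature vector per query point,
# which is the kind of input a clustering step would consume.
feature_vector = [v for layer in activations for v in layer]
```

Evaluating this at every mesh vertex yields one feature vector per vertex; which layer, or combination of layers, to keep is a design choice.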
The downstream motivation is the "3-D modelling inside neural networks" idea that runs through the whole thesis line: if a shape can be decomposed into parts, each part can be its own small neural SDF, and a boolean union of per-part SDFs gives sharp junctions at part boundaries — the kind of crisp edges a single global SDF smooths away. Unsupervised part discovery is the first step toward part-structured 3-D representation.
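The boolean union mentioned above is a pointwise minimum over the per-part SDFs. The sketch below uses two analytic spheres as stand-ins for trained per-part networks; `sphere_sdf` and `union_sdf` are hypothetical helper names chosen for this illustration.

```python
import math

def sphere_sdf(center, radius):
    """Return the signed distance function of a sphere (negative inside)."""
    def sdf(point):
        return math.dist(point, center) - radius
    return sdf

def union_sdf(part_sdfs):
    """Boolean union of parts: the pointwise minimum of their SDFs."""
    def sdf(point):
        return min(part(point) for part in part_sdfs)
    return sdf

part_a = sphere_sdf((0.0, 0.0, 0.0), 1.0)
part_b = sphere_sdf((1.5, 0.0, 0.0), 1.0)
shape = union_sdf([part_a, part_b])

inside_a = shape((0.0, 0.0, 0.0))    # deep inside part A
in_overlap = shape((0.75, 0.0, 0.0)) # equally deep inside both parts
outside = shape((5.0, 0.0, 0.0))     # outside both parts
```

Because `min` switches abruptly from one part's field to the other, the zero-level set of the union has a crease where the two parts meet, which is where the sharp junction comes from.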
DeepSDF V2 — high-fidelity representation. A
scaled-up DeepSDF (256 hidden, 8 layers, skip connection at layer 4,
~480 K params) successfully represents complex real-world meshes
with thin features. Trained on a classical balustrade railing — 8
balusters, ornate profiles, thin gaps between parts — at resolution
256 it captures every detail that the V1 network (128 hidden, 4
layers, 46 K params) at resolution 128 lost entirely. Key config:
latent_dim=128, hidden=256, layers=8, skip_at=4, epochs=3000,
multi-band surface sampling with SDF clamp [−0.1, 0.1].
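For reference, the key config can be collected in one place. This is a minimal sketch: the class name `DeepSDFV2Config`, the field name `sdf_clamp`, and the `clamp_sdf` helper are assumptions made for illustration, and only the numeric values come from the text above.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class DeepSDFV2Config:
    """Hyperparameters quoted in the text; the field names are hypothetical."""
    latent_dim: int = 128
    hidden: int = 256
    layers: int = 8
    skip_at: int = 4
    epochs: int = 3000
    sdf_clamp: float = 0.1  # SDF targets are clamped to [-0.1, 0.1]

def clamp_sdf(value, limit):
    """Clamp a signed distance to the symmetric band [-limit, limit]."""
    return max(-limit, min(limit, value))

config = DeepSDFV2Config()
far_outside = clamp_sdf(0.5, config.sdf_clamp)    # saturates at +0.1
far_inside = clamp_sdf(-0.3, config.sdf_clamp)    # saturates at -0.1
near_surface = clamp_sdf(0.02, config.sdf_clamp)  # unchanged
```

Clamping concentrates the network's capacity on a narrow band around the surface, which matters later: values far from the surface carry no distance information.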
Activation probing — parts emerge without supervision. Hidden-layer activations of the DeepSDF decoder naturally encode part-level structure. Validated on 9 analytical CSG shapes with an average Adjusted Rand Index (ARI) of 0.559 and a peak of 0.996.
| Shape | Parts | Best layer | ARI | Notes |
|---|---|---|---|---|
| Lollipop | 2 | all layers | 0.996 | Near-perfect: sphere vs cylinder |
| Mushroom | 2 | layer 2 | 0.960 | Cap vs stem cleanly separated |
| Snowman | 2 | layer 0 | 0.634 | Positional distinction between spheres |
| L-shape | 2 | layer 0 | 0.606 | Two boxes — all other methods failed this |
| Chair | 6 | layer 0 | 0.371 | Legs hard to individuate |
| Head (real mesh) | 5 | layer 0 | n/a | Geometric regions: forehead, side, face, nose, neck |
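The ARI column compares a predicted clustering with reference part labels while ignoring how the clusters are named. The sketch below implements the standard pair-counting formula in pure Python as an illustration; it is not the project's evaluation code, and the function name is an assumption.

```python
from collections import Counter
from math import comb

def adjusted_rand_index(true_labels, pred_labels):
    """Adjusted Rand Index between two labelings of the same items (needs >= 2 items)."""
    n = len(true_labels)
    if n != len(pred_labels) or n < 2:
        raise ValueError("need two labelings of the same length, with at least 2 items")
    # Count pairs of items that land together in both labelings, and in each separately.
    together_both = sum(comb(c, 2) for c in Counter(zip(true_labels, pred_labels)).values())
    together_true = sum(comb(c, 2) for c in Counter(true_labels).values())
    together_pred = sum(comb(c, 2) for c in Counter(pred_labels).values())
    expected = together_true * together_pred / comb(n, 2)  # agreement expected by chance
    maximum = 0.5 * (together_true + together_pred)
    if maximum == expected:  # degenerate case, e.g. both labelings put everything in one cluster
        return 1.0
    return (together_both - expected) / (maximum - expected)

same_up_to_renaming = adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0])
worse_than_chance = adjusted_rand_index([0, 0, 1, 1], [0, 1, 0, 1])
```

A score of 1 means the clustering matches the reference exactly, values near 0 mean chance-level agreement, and negative values mean worse than chance.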
Instance segmentation — semantic + connected components. The winning approach combines two steps. Step A: cluster activations into 3–5 semantic surface types with k-means on pure activation features (no spatial coordinates) — this identifies "rail surface", "baluster surface", "base surface", and the top rail stays one continuous piece. Step B: within each semantic type, find connected components on the mesh adjacency graph by BFS — this splits individual instances, because each baluster is disconnected from its neighbours by air gaps. Result on the railing: 1 top rail + 8 individual balusters + 1 base = 10 parts, each correctly isolated.
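Step B can be sketched as a breadth-first search over mesh edges whose two endpoints share a semantic label. The function name, the toy mesh, and its labels below are made up for illustration; the semantic labels are assumed to come from the k-means of Step A.

```python
from collections import deque

def split_instances(num_vertices, edges, semantic_labels):
    """Assign an instance id to each vertex via BFS within each semantic type."""
    # Keep only edges whose endpoints share a semantic label.
    adjacency = [[] for _ in range(num_vertices)]
    for a, b in edges:
        if semantic_labels[a] == semantic_labels[b]:
            adjacency[a].append(b)
            adjacency[b].append(a)

    instance = [-1] * num_vertices
    next_id = 0
    for start in range(num_vertices):
        if instance[start] != -1:
            continue  # already reached from an earlier seed
        instance[start] = next_id
        queue = deque([start])
        while queue:
            vertex = queue.popleft()
            for neighbour in adjacency[vertex]:
                if instance[neighbour] == -1:
                    instance[neighbour] = next_id
                    queue.append(neighbour)
        next_id += 1
    return instance

# Toy mesh: vertices 0-1 are "rail" (type 0); 2-3 and 4-5 are two balusters (type 1).
labels = [0, 0, 1, 1, 1, 1]
edges = [(0, 1), (1, 2), (2, 3), (1, 4), (4, 5)]  # both balusters touch the rail, not each other
instances = split_instances(6, edges, labels)
```

The two balusters share a semantic type but no same-type path connects them, so they receive different instance ids, while the rail stays one piece.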
Activation classifier — 93.2 % accuracy on the surface. For each part, the mean activation vector (centroid) is computed from its segmented surface vertices. Any new query point is classified by forward-passing it through the original DeepSDF, reading its activations, and finding the nearest centroid. This hits 93.2 % accuracy on surface vertices — but degrades in the interior volume, which is the crux of §03.
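A nearest-centroid classifier of this kind fits in a few lines. In the sketch below the activation vectors are tiny made-up lists and the function names are hypothetical; in the project they would be the decoder activations read at the segmented surface vertices.

```python
def part_centroids(activations, part_ids):
    """Mean activation vector per part, computed from labelled surface vertices."""
    sums, counts = {}, {}
    for vector, part in zip(activations, part_ids):
        if part not in sums:
            sums[part] = [0.0] * len(vector)
            counts[part] = 0
        sums[part] = [s + v for s, v in zip(sums[part], vector)]
        counts[part] += 1
    return {part: [s / counts[part] for s in sums[part]] for part in sums}

def classify(activation, centroids):
    """Return the part whose centroid is closest in squared Euclidean distance."""
    def squared_distance(part):
        return sum((a - c) ** 2 for a, c in zip(activation, centroids[part]))
    return min(centroids, key=squared_distance)

# Made-up 2-D "activations" for four surface vertices belonging to two parts.
surface_activations = [[0.0, 0.0], [0.0, 2.0], [10.0, 10.0], [10.0, 12.0]]
surface_parts = [0, 0, 1, 1]
centroids = part_centroids(surface_activations, surface_parts)

near_part_0 = classify([1.0, 1.0], centroids)
near_part_1 = classify([9.0, 9.0], centroids)
```

The classifier always returns some part, even for a query whose activations resemble none of the centroids, which is one reason off-surface predictions can be noisy.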
The original DeepSDF has ONE SDF for the whole shape.
Any point inside is "inside" — it does not know which part a point belongs to.
Segmentation is a surface concept — it works because mesh vertices live on the surface and the network's activations there are meaningful. Per-part reconstruction needs an interior-volume concept: to train a watertight SDF for one baluster, you must know which 3-D points are inside that baluster and not its neighbour. The original DeepSDF cannot answer that — it was only trained near the zero-level set. Every masking attempt fails on this same rock.
The goal: train a small MLP per part, union them via min(),
get sharp edges at the boundaries. The status document is candid —
this is "the core unsolved problem". The fundamental obstacle is
that interior-volume part-ownership is ambiguous, and four distinct
masking strategies all break on it.
| Attempt | Method | Why it failed |
|---|---|---|
| 1 — watertight closing | Close each open part mesh via fill-holes / voxelize+fill+MC / convex hull | Voxelize wrapped entire regions; convex hull lost all concavities. Each part claimed nearly the whole volume — union had 16 M / 16 M voxels inside, marching cubes produced garbage |
| 2 — distance proximity mask | Use the original SDF if a query point is within a proximity radius of the part, else force positive | Radius impossible to tune — too large and parts leak into neighbours, too small and parts get holes. Nearest-surface-vertex doesn't determine interior ownership: a point in the air gap between balusters is near a baluster vertex but inside no baluster |
| 3 — segmentation-label mask | Find the nearest full-mesh vertex, check its part label; same part → real SDF, else force positive | Same flaw — nearest-vertex is a surface concept. Interior points get assigned to whichever surface is closest, not the correct enclosing part. 30–50 % inside ratios for small parts (should be 5–10 %), blob outputs |
| 4 — activation volume classifier | Classify each query point by its DeepSDF activation vector — the 93.2 %-on-surface classifier from §01 | Activations are unreliable off-surface — the network was never trained to produce meaningful activations in the interior. Noisy interior classification, 30–50 % inside ratios persist, union has artefacts though the railing structure is dimly visible |
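The nearest-vertex flaw behind attempts 2 and 3 can be reproduced with two analytic spheres. Everything in this sketch is an illustrative assumption (the shapes, the hand-picked surface vertices, the name `masked_part_sdf`, and the forced positive value 0.1): a large "rail" sphere overlaps a small "baluster" sphere, and a query point inside the rail only is handed to the baluster because a baluster vertex happens to be nearest.

```python
import math

RAIL_CENTER, RAIL_RADIUS = (0.0, 0.0, 0.0), 2.0          # part 1
BALUSTER_CENTER, BALUSTER_RADIUS = (2.2, 0.0, 0.0), 0.5  # part 0

def rail_sdf(p):
    return math.dist(p, RAIL_CENTER) - RAIL_RADIUS

def baluster_sdf(p):
    return math.dist(p, BALUSTER_CENTER) - BALUSTER_RADIUS

def global_sdf(p):
    """Single SDF of the whole shape, with no notion of parts."""
    return min(rail_sdf(p), baluster_sdf(p))

# Hand-picked vertices on the outer surface of the union, with their part labels.
vertices = [(-2.0, 0.0, 0.0), (0.0, 2.0, 0.0), (0.0, -2.0, 0.0),
            (0.0, 0.0, 2.0), (0.0, 0.0, -2.0),
            (2.7, 0.0, 0.0), (2.2, 0.5, 0.0), (2.2, -0.5, 0.0),
            (2.2, 0.0, 0.5), (2.2, 0.0, -0.5)]
vertex_parts = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]

def masked_part_sdf(p, part_id, outside_value=0.1):
    """Attempt-3 style mask: real SDF if the nearest vertex has this label, else positive."""
    nearest = min(range(len(vertices)), key=lambda i: math.dist(p, vertices[i]))
    if vertex_parts[nearest] == part_id:
        return global_sdf(p)
    return outside_value

query = (1.6, 0.0, 0.0)                      # inside the rail, just outside the baluster
true_baluster = baluster_sdf(query)          # positive: not inside the baluster
masked_baluster = masked_part_sdf(query, 0)  # negative: baluster wrongly claims the point
masked_rail = masked_part_sdf(query, 1)      # positive: rail wrongly loses the point
```

The baluster gains volume it does not own and the rail gets a hole, which is consistent with the inflated inside ratios reported in the table.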
The pattern across all four: segmentation is a surface
operation, reconstruction needs a volume operation, and the trained
DeepSDF only ever learned the surface. The thickened-shell
workaround — defining each part's SDF as
distance_to_part_surface − thickness/2 to turn open
sheets into thin watertight solids — produces clean individual
parts, but the union still inherits the interior-ownership ambiguity
wherever parts are close.
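The thickened-shell definition can be sketched with a brute-force nearest-sample distance. The point samples below stand in for one part's open surface patch, and the function name and the thickness value are assumptions for illustration.

```python
import math

def thickened_shell_sdf(point, surface_samples, thickness):
    """distance_to_part_surface - thickness/2, using the nearest surface sample."""
    distance_to_surface = min(math.dist(point, sample) for sample in surface_samples)
    return distance_to_surface - thickness / 2.0

# A flat open sheet in the plane z = 0, sampled on a 0.1-spaced grid.
sheet = [(x * 0.1, y * 0.1, 0.0) for x in range(-5, 6) for y in range(-5, 6)]

inside_shell = thickened_shell_sdf((0.0, 0.0, 0.01), sheet, thickness=0.1)  # within 0.05 of the sheet
outside_shell = thickened_shell_sdf((0.0, 0.0, 0.2), sheet, thickness=0.1)  # well away from the sheet
```

The field is negative only within thickness/2 of the sheet, so each part becomes a thin closed solid. Where two parts' sheets come within that distance of each other, both fields are negative at the same point and the ownership ambiguity returns.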
White paper · activation-space part discovery · the WHAT/WHERE insight · the ARI validation · the unsolved interior-ownership problem · the DINO-extension design