(point, sdf) pairs, develops hidden activations that implicitly encode part structure, because the network learns distinct local geometry to represent distinct parts. We validate the observation on 9 analytical CSG shapes (average Adjusted Rand Index 0.559, peak 0.996 on a lollipop) and demonstrate a two-step instance-segmentation pipeline that combines activation k-means (semantic surface types) with mesh-connectivity connected components (instances), correctly decomposing a classical balustrade railing into 1 top rail + 8 individual balusters + 1 base. We also report the failure half honestly: the downstream goal (train a small SDF per part, then boolean-union them into sharp junctions) does not work, and four distinct masking strategies all break on the same rock, the interior-volume-ownership problem. The original DeepSDF has one SDF for the whole shape, every point inside is "inside", and the network has no notion of which part a 3-D point belongs to because it was only ever trained near the zero-level set. The contribution is the validated segmentation method, the WHAT/WHERE decomposition principle (activations encode surface type, mesh topology encodes connectivity), the 93.2 %-on-surface activation classifier, and a precise statement of the open problem plus the design of a DINO-self-distillation extension intended to address it.
Keywords: activation probing, self-supervised 3-D segmentation, DeepSDF, instance segmentation, interior-volume ownership.
DeepSDF trains a single MLP to represent a 3-D shape as a continuous signed distance function. The network is never given part labels; it only sees (point, sdf) pairs. The hypothesis of this work is that the hidden activations of a network trained only on a single global SDF nonetheless encode the part structure of the shape implicitly. A baluster's surface and a top rail's surface produce different activation patterns because the network has learned different local geometry to represent them, so clustering the activations should recover the parts without supervision.
The downstream motivation is the "3-D modelling inside neural networks" idea that runs through the thesis line: if a shape can be decomposed into parts, each part can be its own small neural SDF, and a boolean union of per-part SDFs gives sharp junctions at part boundaries — the crisp edges a single global SDF smooths away. This paper reports both halves honestly: the segmentation works (§3), the per-part reconstruction does not (§4), and the failure has a single precise cause.
The probing only works if the base SDF representation is faithful. A scaled-up DeepSDF (256 hidden units, 8 layers, skip connection at layer 4, ~480 K parameters) is trained on each input mesh. On a classical balustrade railing — 8 balusters, ornate profiles, thin gaps between parts — at marching-cubes resolution 256 it captures every detail that the V1 network (128 hidden, 4 layers, 46 K parameters) at resolution 128 lost entirely. Key configuration: latent_dim = 128, hidden = 256, layers = 8, skip_at = 4, epochs = 3000, with multi-band surface sampling and SDF clamping to [−0.1, 0.1]. The thin-feature fidelity is the prerequisite — without it, activation probing would be reading a network that never resolved the parts in the first place.
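The SDF clamping in the configuration above can be sketched as follows. This is a minimal illustration, assuming a clamped L1 objective in the style of the original DeepSDF; the text confirms only the clamp range [−0.1, 0.1], and the function names `clamp_sdf` and `clamped_l1` are hypothetical.

```python
DELTA = 0.1  # clamp half-width, taken from the configuration above


def clamp_sdf(value, delta=DELTA):
    """Clamp a signed distance to the interval [-delta, delta]."""
    return max(-delta, min(delta, value))


def clamped_l1(predicted, target, delta=DELTA):
    """Mean absolute error between clamped predictions and clamped targets.

    Assumption: an L1 loss on clamped values, as in the original DeepSDF.
    """
    errors = [
        abs(clamp_sdf(p, delta) - clamp_sdf(t, delta))
        for p, t in zip(predicted, target)
    ]
    return sum(errors) / len(errors)


# Toy values: the first pair is far from the surface on both sides of the
# comparison, so clamping removes its error entirely.
predicted = [0.50, -0.02, 0.08]
target = [0.90, -0.03, 0.30]
loss = clamped_l1(predicted, target)
```

The effect of the clamp is that errors far from the surface contribute nothing, so the network's capacity is spent near the zero-level set. That is the same property that later limits what the activations encode in the interior.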
Hidden-layer activations of the trained DeepSDF decoder are extracted at each mesh vertex. On 9 analytical CSG shapes (snowman, lollipop, barbell, mushroom, tower, table, T-pipe, L-shape, chair), clustering these activations recovers part structure with an average Adjusted Rand Index (ARI) of 0.559 and a peak of 0.996. The table lists a selection of these shapes, plus one real head mesh for which no ARI is reported.
| Shape | Parts | Best layer | ARI | Notes |
|---|---|---|---|---|
| Lollipop | 2 | all layers | 0.996 | Near-perfect — sphere vs cylinder |
| Mushroom | 2 | layer 2 | 0.960 | Cap vs stem cleanly separated |
| Snowman | 2 | layer 0 | 0.634 | Positional distinction between the two spheres |
| L-shape | 2 | layer 0 | 0.606 | Two boxes — every other method failed this case |
| Chair | 6 | layer 0 | 0.371 | Legs are hard to individuate |
| Head (real mesh) | 5 | layer 0 | n/a | Geometric regions — forehead, side, face, nose, neck |
Different shapes find their cleanest segmentation at different layers — lollipop at all layers, mushroom at layer 2, most others at layer 0 — which is itself informative: shallow layers carry positional / coarse-geometry distinctions, deeper layers carry surface-type distinctions, and which one is diagnostic depends on what distinguishes the parts of a given shape.
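For reference, the ARI used to score the clusterings above can be computed from the contingency table between predicted clusters and ground-truth part labels. The sketch below follows the standard definition (the same quantity scikit-learn exposes as `adjusted_rand_score`); the function name and the toy labels are illustrative and are not taken from the project's code.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_true, labels_pred):
    """Adjusted Rand Index between two labelings of the same items."""
    n = len(labels_true)
    if n < 2:
        return 1.0  # no pairs to disagree on
    cell_counts = Counter(zip(labels_true, labels_pred))
    true_counts = Counter(labels_true)
    pred_counts = Counter(labels_pred)
    # Number of item pairs that share a cluster, per table cell / row / column.
    sum_cells = sum(comb(c, 2) for c in cell_counts.values())
    sum_true = sum(comb(c, 2) for c in true_counts.values())
    sum_pred = sum(comb(c, 2) for c in pred_counts.values())
    expected = sum_true * sum_pred / comb(n, 2)
    max_index = (sum_true + sum_pred) / 2
    if max_index == expected:
        return 1.0  # degenerate case: both labelings are trivial
    return (sum_cells - expected) / (max_index - expected)


# Cluster ids are arbitrary, so a relabelled perfect clustering scores 1.0.
ari_perfect = adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0])
# A clustering that cuts across both parts scores below zero.
ari_crossed = adjusted_rand_index([0, 0, 1, 1], [0, 1, 0, 1])
```

Because the index is corrected for chance, a score near 0 means the clustering is no better than a random assignment.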
The winning instance-segmentation approach combines two steps. Step A — semantic types. Cluster the activation features (pure neural, no spatial coordinates) into 3–5 types via k-means. This identifies surface types — "rail surface", "baluster surface", "base surface" — and the top rail correctly stays one continuous piece. Step B — instances. Within each semantic type, find connected components on the mesh adjacency graph via BFS. This splits individual instances: each baluster becomes its own part because the balusters are disconnected from one another by air gaps, even though they share a semantic type. Result on the railing: 1 top rail + 8 individual balusters + 1 base = 10 parts, each correctly isolated — the correct segmentation for the downstream goal of training per-part SDFs.
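The two steps can be sketched on a toy graph. This is a minimal illustration under stated assumptions, not the project's implementation: the tiny `kmeans` with farthest-point initialisation stands in for whichever k-means was actually used, the 2-D feature vectors stand in for real hidden activations, and the 9-vertex graph stands in for the railing mesh.

```python
from collections import deque


def dist2(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(features, k, iters=20):
    """Step A: tiny Lloyd's k-means, deterministic farthest-point init."""
    centroids = [features[0]]
    while len(centroids) < k:
        centroids.append(
            max(features, key=lambda f: min(dist2(f, c) for c in centroids))
        )
    labels = [0] * len(features)
    for _ in range(iters):
        labels = [
            min(range(k), key=lambda j: dist2(f, centroids[j])) for f in features
        ]
        for j in range(k):
            members = [f for f, lab in zip(features, labels) if lab == j]
            if members:
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return labels


def split_instances(semantic, edges):
    """Step B: BFS connected components, using only same-type mesh edges."""
    adjacency = [[] for _ in semantic]
    for a, b in edges:
        if semantic[a] == semantic[b]:  # never cross a semantic-type boundary
            adjacency[a].append(b)
            adjacency[b].append(a)
    instance = [-1] * len(semantic)
    next_id = 0
    for start in range(len(semantic)):
        if instance[start] != -1:
            continue
        instance[start] = next_id
        queue = deque([start])
        while queue:
            v = queue.popleft()
            for w in adjacency[v]:
                if instance[w] == -1:
                    instance[w] = next_id
                    queue.append(w)
        next_id += 1
    return instance


# Toy "railing": rail = vertices 0-2, baluster A = 3-4, baluster B = 5-6, base = 7-8.
activations = [
    (1.0, 0.0), (0.9, 0.1), (1.0, 0.1),   # rail-like features
    (0.0, 1.0), (0.1, 0.9),               # baluster A
    (0.0, 0.9), (0.1, 1.0),               # baluster B (same surface type as A)
    (-1.0, 0.0), (-0.9, 0.1),             # base
]
edges = [(0, 1), (1, 2), (0, 3), (3, 4), (2, 5), (5, 6), (4, 7), (6, 8), (7, 8)]

semantic = kmeans(activations, k=3)
instance = split_instances(semantic, edges)
```

In this toy case the two balusters receive the same semantic label but different instance ids, because the only paths between them pass through rail or base vertices of a different type.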
The principle: activations encode WHAT (surface type), mesh topology encodes WHERE (connectivity). Neither works alone: activations alone give surface-type bands, and topology alone gives one undifferentiated blob. Spatial k-means (concatenating XYZ to the activation features; weights 0.3 / 0.5 / 0.7 tested) consistently fails, because k-means can only draw straight Voronoi boundaries in the combined feature-plus-spatial space. It splits large continuous parts (the top rail, the base) in half along an axis while failing to separate individual balusters. The topological split that connected components provide cannot be approximated by a metric cut.
A volume classifier is built from the segmentation: for each part, compute the mean activation vector (centroid) from its segmented surface vertices; classify any new query point by forward-passing it through the original DeepSDF, reading its activations, and finding the nearest centroid. This achieves 93.2 % accuracy on surface vertices. The "on the surface" qualifier is load-bearing — §4 is the story of what happens when the same classifier is asked about interior-volume points.
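The nearest-centroid classifier described above can be sketched as follows. The 2-D vectors are hypothetical stand-ins for real activation vectors; in the actual pipeline a query point would first be forward-passed through the trained DeepSDF to obtain its activations, a step this sketch omits.

```python
def part_centroids(activations, part_labels):
    """Mean activation vector per part, from segmented surface vertices."""
    sums, counts = {}, {}
    for vec, part in zip(activations, part_labels):
        acc = sums.setdefault(part, [0.0] * len(vec))
        for i, x in enumerate(vec):
            acc[i] += x
        counts[part] = counts.get(part, 0) + 1
    return {part: [s / counts[part] for s in acc] for part, acc in sums.items()}


def classify(query_activation, centroids):
    """Label of the centroid nearest to the query's activation vector."""
    def dist2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(centroids, key=lambda part: dist2(query_activation, centroids[part]))


# Hypothetical surface activations with their segmentation labels.
surface_activations = [(1.0, 0.0), (0.8, 0.2), (0.0, 1.0), (0.2, 0.8)]
surface_labels = ["rail", "rail", "baluster", "baluster"]

centroids = part_centroids(surface_activations, surface_labels)
predicted = classify((0.7, 0.3), centroids)
```

The classifier is only as reliable as the activation vectors it is fed. Nothing in this construction guarantees that activations read at interior points resemble the surface centroids.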
The goal: train a small MLP per part, union them via min(), get sharp edges at the boundaries. The fundamental obstacle, stated precisely: the original DeepSDF has one SDF for the entire shape — any point inside the shape is "inside", and the network does not know which part a point belongs to. Segmentation is a surface concept; per-part reconstruction needs an interior-volume concept; the trained DeepSDF only ever learned the surface. Four masking strategies were attempted; all four break on this.
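The intended union step is simple once per-part SDFs exist. The sketch below uses analytic spheres as hypothetical stand-ins for the small per-part MLPs, which is exactly the piece that could not be obtained in practice; `sphere_sdf` and `union` are illustrative names.

```python
import math


def sphere_sdf(center, radius):
    """Exact SDF of a sphere; stands in for a trained per-part network."""
    def sdf(p):
        return math.dist(p, center) - radius
    return sdf


def union(*sdfs):
    """Boolean union of SDFs: pointwise minimum."""
    def sdf(p):
        return min(f(p) for f in sdfs)
    return sdf


head = sphere_sdf((0.0, 0.0, 1.0), 0.5)
body = sphere_sdf((0.0, 0.0, 0.0), 0.8)
snowman = union(head, body)

inside_body = snowman((0.0, 0.0, 0.0))   # the body sphere wins the min
inside_head = snowman((0.0, 0.0, 1.2))   # the head sphere wins the min
outside = snowman((2.0, 0.0, 0.0))
```

The minimum is not smooth where the two arguments swap, and that crease on the zero-level set is the sharp junction the per-part approach is after. The difficulty discussed next is that each argument of `min` must be negative only inside its own part.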
| Attempt | Method | Failure |
|---|---|---|
| 1: watertight closing | Close each open part mesh via fill-holes, voxelize+fill+marching-cubes, or convex hull | Voxelization wrapped entire regions; convex hull lost all concavities. Each part claimed nearly the whole volume: the union had 16 M / 16 M voxels inside, and marching cubes produced garbage |
| 2: distance proximity mask | Use the original SDF if a query point is within a proximity radius of the part, else force the SDF positive | The radius is impossible to tune: too large and parts leak into neighbours, too small and parts develop holes. Nearest-surface-vertex does not determine interior ownership: a point in the air gap between two balusters is near a baluster vertex but inside no baluster |
| 3: segmentation-label mask | Find the nearest full-mesh vertex and read its part label; same part → real SDF, different part → force positive | Same flaw as attempt 2: nearest-vertex is a surface concept. Interior points are assigned to whichever surface is closest, not to the correct enclosing part. Inside-ratios of 30–50 % for small parts (should be 5–10 %); outputs are blobs |
| 4: activation volume classifier | Classify each query point by its DeepSDF activation vector, using the 93.2 %-on-surface classifier from §3.3 | Activations are unreliable off-surface. The network was never trained to produce meaningful activations in the interior volume, only near the zero-level set. Interior classification is noisy, the 30–50 % inside-ratios persist, and the union has artefacts, though the railing structure is dimly visible |
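
The inside-ratio quoted in the table can be read as the fraction of grid samples that a part's SDF marks as inside. The sketch below assumes a regular grid of cell centres over the cube [−1, 1]³; how the ratio was actually computed is not stated in the text, and both example SDFs are hypothetical.

```python
import math


def inside_ratio(sdf, resolution=32, lo=-1.0, hi=1.0):
    """Fraction of cell-centre samples in [lo, hi]^3 where sdf is negative."""
    step = (hi - lo) / resolution
    coords = [lo + (i + 0.5) * step for i in range(resolution)]
    inside = sum(
        1 for x in coords for y in coords for z in coords if sdf((x, y, z)) < 0
    )
    return inside / resolution ** 3


def small_part(p):
    """Hypothetical well-behaved part: a sphere of radius 0.5."""
    return math.dist(p, (0.0, 0.0, 0.0)) - 0.5


def leaky_part(p):
    """Hypothetical failed part that claims the whole volume as inside."""
    return -1.0


ratio_small = inside_ratio(small_part)   # about 0.065 (sphere volume / cube volume)
ratio_leaky = inside_ratio(leaky_part)   # 1.0
```

A small part should occupy a few percent of the bounding volume, as the sphere does here. A ratio of 30–50 %, or the all-inside case of attempt 1, signals that the mask has assigned the part volume it does not own.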
The thickened-shell workaround — defining each part's SDF as distance_to_part_surface − thickness/2, turning each open surface sheet into a thin watertight solid — does produce clean individual parts. But the union still inherits the interior-ownership ambiguity wherever parts are close: the boundary between a baluster and the rail it joins is exactly the region where "which part owns this volume" is undefined.
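The thickened-shell definition can be sketched directly from the formula above. This is a minimal illustration, assuming the distance to the part surface is approximated by brute-force distance to the nearest sampled surface point; the flat sheet and the `shell_sdf` name are hypothetical.

```python
import math


def shell_sdf(surface_points, thickness):
    """SDF of a thin solid: distance to the part surface minus thickness / 2."""
    half = thickness / 2.0

    def sdf(p):
        # Brute-force nearest sample; a spatial index would replace this at scale.
        return min(math.dist(p, s) for s in surface_points) - half

    return sdf


# Hypothetical open surface sheet: an 11 x 11 grid of samples in the z = 0 plane.
sheet = [(i / 10, j / 10, 0.0) for i in range(11) for j in range(11)]
part = shell_sdf(sheet, thickness=0.1)

on_sheet = part((0.5, 0.5, 0.0))      # deepest inside the shell
near_sheet = part((0.5, 0.5, 0.02))   # still inside the shell
far_above = part((0.5, 0.5, 0.2))     # outside
```

Each shell built this way is a closed solid on its own. Where two part surfaces come within one thickness of each other, however, both shells are negative at the same points, which is the ownership ambiguity described above.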
The diagnostic is clean and it generalises. Any method that reads structure from a network trained on a global SDF inherits the surface-vs-volume gap. The trained DeepSDF's activations are meaningful exactly where it was supervised — on the zero-level set — and meaningless in the interior, because the interior was never a region the loss cared about. Surface segmentation works precisely because mesh vertices live on the surface; volume reconstruction fails precisely because it needs the interior the network never modelled.
Two routes out are scoped. The first is a volume-supervised re-training: train the DeepSDF (or a per-part variant) with a loss that cares about interior-point part-ownership, not just surface SDF — this gives the network a reason to develop meaningful interior activations. The second, designed in the project's DINO-extension document, is a DINO self-distillation approach: borrow the self-distillation training signal that gives DINO its part-aware feature structure and apply it to the SDF decoder, so part-awareness is trained in rather than probed out. Neither is built yet; the per-part reconstruction problem is the open core of the project.
Activation probing of a DeepSDF decoder recovers part structure with no supervision — average ARI 0.559 across 9 analytical shapes, and a correct 10-part instance segmentation of a real balustrade railing via the WHAT/WHERE two-step pipeline. The follow-on goal of per-part reconstruction with sharp boolean-union junctions does not work, and the failure has a single precise cause: a DeepSDF trained on a global SDF has no notion of interior-volume part-ownership. The contribution is the validated segmentation method, the WHAT/WHERE principle, and an honest, precise statement of the open problem — the kind of negative result that is worth more than a vague success, because the next person knows exactly which rock to aim at.