Activation-Space SDF Part Discovery — White Paper

Activation-Space SDF Part Discovery: Self-Supervised 3-D Segmentation by Probing a DeepSDF Decoder's Hidden Activations, and the Interior-Volume-Ownership Problem That Blocks Per-Part Reconstruction

Aaditya Jain

ad_jain@icloud.com · orcid.org/0009-0005-5534-5641

Neural Implicit Representations · 3-D Part Segmentation · Thesis Research, Unpublished Preprint

Submitted: April 2026

Subject: cs.CV · cs.GR

Keywords: DeepSDF, activation probing, unsupervised 3-D segmentation, instance segmentation, signed distance fields, part discovery, connected components

Abstract

We report a self-supervised 3-D shape segmentation method that recovers part-level structure by probing the hidden-layer activations of a learned DeepSDF decoder, with no part-label supervision at any point. The observation: a DeepSDF MLP trained on a single global signed distance function, given only (point, sdf) pairs, develops hidden activations that implicitly encode part structure, because the network learns distinct local geometry to represent distinct parts. We validate the observation on 9 analytical CSG shapes (average Adjusted Rand Index 0.559, peak 0.996 on a lollipop) and demonstrate a two-step instance-segmentation pipeline that combines activation k-means (semantic surface types) with connected components on the mesh adjacency graph (instances), correctly decomposing a classical balustrade railing into 1 top rail + 8 individual balusters + 1 base. We also report the failure half honestly: the downstream goal (train a small SDF per part, then boolean-union them to obtain sharp junctions) does not work, and four distinct masking strategies all fail for the same reason, the interior-volume-ownership problem. The original DeepSDF has one SDF for the whole shape, every point inside is simply "inside", and the network has no notion of which part a 3-D point belongs to because it was only ever trained near the zero-level set. The contributions are the validated segmentation method, the WHAT/WHERE decomposition principle (activations encode surface type, mesh topology encodes connectivity), an activation classifier that reaches 93.2 % accuracy on surface vertices, and a precise statement of the open problem together with the design of a DINO-self-distillation extension intended to address it.

1. Introduction

DeepSDF trains a single MLP to represent a 3-D shape as a continuous signed distance function. The network is never given part labels — it only sees (point, sdf) pairs. The hypothesis of this work: the network's hidden activations, even though it was trained on a single global SDF, encode the part structure of the shape implicitly. A baluster's surface and a top rail's surface produce different activation patterns because the network has learned different local geometry to represent them — so clustering the activations should recover the parts, unsupervised.

The downstream motivation is the "3-D modelling inside neural networks" idea that runs through the thesis line: if a shape can be decomposed into parts, each part can be its own small neural SDF, and a boolean union of per-part SDFs gives sharp junctions at part boundaries — the crisp edges a single global SDF smooths away. This paper reports both halves honestly: the segmentation works (§3), the per-part reconstruction does not (§4), and the failure has a single precise cause.

2. DeepSDF V2 — High-Fidelity Representation

The probing only works if the base SDF representation is faithful. A scaled-up DeepSDF (256 hidden units, 8 layers, skip connection at layer 4, ~480 K parameters) is trained on each input mesh. On a classical balustrade railing — 8 balusters, ornate profiles, thin gaps between parts — at marching-cubes resolution 256 it captures every detail that the V1 network (128 hidden, 4 layers, 46 K parameters) at resolution 128 lost entirely. Key configuration: latent_dim = 128, hidden = 256, layers = 8, skip_at = 4, epochs = 3000, with multi-band surface sampling and SDF clamping to [−0.1, 0.1]. The thin-feature fidelity is the prerequisite — without it, activation probing would be reading a network that never resolved the parts in the first place.
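The effect of the SDF clamp can be illustrated with the clamped L1 objective described in the original DeepSDF paper. This is a minimal sketch: the source states only the clamp range [−0.1, 0.1], so the assumption that training uses exactly this L1 form, and the function names `clamp` and `clamped_l1_loss`, are illustrative rather than taken from the project code.

```python
def clamp(value: float, delta: float) -> float:
    """Clip value to the interval [-delta, delta]."""
    return max(-delta, min(delta, value))


def clamped_l1_loss(predicted, target, delta=0.1):
    """Mean |clamp(pred) - clamp(target)| over a batch of SDF samples."""
    if len(predicted) != len(target):
        raise ValueError("predicted and target must have the same length")
    total = sum(abs(clamp(p, delta) - clamp(t, delta))
                for p, t in zip(predicted, target))
    return total / len(predicted)


# Toy batch (made-up values). The far-field sample (0.9 vs 0.5) adds
# nothing to the loss because both values clamp to +0.1, so the network's
# capacity is spent on the band around the surface.
pred = [0.02, -0.05, 0.9]
true = [0.00, -0.08, 0.5]
loss = clamped_l1_loss(pred, true)
```

The clamp is also a first hint of the problem in §4: samples deep inside the shape all collapse to the same clamped target, so the loss carries no information that distinguishes one interior region from another.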

3. Activation Probing and Instance Segmentation — What Works

3.1 Parts emerge without supervision

Hidden-layer activations of the trained DeepSDF decoder are extracted at each mesh vertex. Validated on 9 analytical CSG shapes (snowman, lollipop, barbell, mushroom, tower, table, T-pipe, L-shape, chair), clustering the activations recovers part structure with an average Adjusted Rand Index of 0.559 and a peak of 0.996.
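Probing here means running a forward pass and keeping the intermediate layer outputs rather than only the final SDF value. The sketch below assumes a plain ReLU MLP with random weights standing in for the trained decoder; the real network also takes a latent code and has a skip connection at layer 4, both omitted for brevity. `make_mlp` and `forward_with_activations` are hypothetical names, not the project's API.

```python
import random


def make_mlp(sizes, seed=0):
    """Random weights for a toy MLP; a stand-in for a trained decoder."""
    rng = random.Random(seed)
    layers = []
    for n_in, n_out in zip(sizes[:-1], sizes[1:]):
        weights = [[rng.gauss(0.0, 1.0 / n_in ** 0.5) for _ in range(n_in)]
                   for _ in range(n_out)]
        biases = [0.0] * n_out
        layers.append((weights, biases))
    return layers


def forward_with_activations(layers, point):
    """Return (sdf, hidden) where hidden[k] is the activation vector of layer k."""
    x = list(point)
    hidden = []
    for index, (weights, biases) in enumerate(layers):
        x = [sum(w * v for w, v in zip(row, x)) + b
             for row, b in zip(weights, biases)]
        if index < len(layers) - 1:      # ReLU on hidden layers only
            x = [max(0.0, v) for v in x]
            hidden.append(x)
    return x[0], hidden


# Three made-up mesh vertices; in practice every vertex of the mesh is probed.
mlp = make_mlp([3, 16, 16, 16, 1])
vertices = [(0.0, 0.0, 0.0), (0.5, 0.1, -0.2), (-0.3, 0.4, 0.9)]
features = [forward_with_activations(mlp, v)[1] for v in vertices]
```

`features[i][k]` is then the feature vector of vertex `i` at hidden layer `k`, and one layer's vectors across all vertices form the matrix that is clustered.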

Table 1: Activation-probing segmentation on selected analytical CSG shapes (five of the nine) and one real mesh.
Shape	Parts	Best layer	ARI	Notes
Lollipop	2	all layers	0.996	Near-perfect — sphere vs cylinder
Mushroom	2	layer 2	0.960	Cap vs stem cleanly separated
Snowman	2	layer 0	0.634	Positional distinction between the two spheres
L-shape	2	layer 0	0.606	Two boxes — every other method failed this case
Chair	6	layer 0	0.371	Legs are hard to individuate
Head (real mesh)	5	layer 0	n/a	Geometric regions — forehead, side, face, nose, neck

Different shapes find their cleanest segmentation at different layers — lollipop at all layers, mushroom at layer 2, most others at layer 0 — which is itself informative: shallow layers carry positional / coarse-geometry distinctions, deeper layers carry surface-type distinctions, and which one is diagnostic depends on what distinguishes the parts of a given shape.
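The ARI scores and the per-layer comparison can be reproduced with a few lines of standard-library Python. The sketch below implements the Hubert and Arabie formula directly and picks the layer whose clustering scores highest; the labelings are made-up toy data, and `labels_by_layer` is a hypothetical container, not the project's data structure.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_true, labels_pred):
    """Adjusted Rand Index between two flat labelings of the same items."""
    if len(labels_true) != len(labels_pred):
        raise ValueError("labelings must have the same length")
    n = len(labels_true)
    if n < 2:
        return 1.0
    pair_counts = Counter(zip(labels_true, labels_pred))  # contingency table
    sum_cells = sum(comb(c, 2) for c in pair_counts.values())
    sum_true = sum(comb(c, 2) for c in Counter(labels_true).values())
    sum_pred = sum(comb(c, 2) for c in Counter(labels_pred).values())

    expected = sum_true * sum_pred / comb(n, 2)
    max_index = 0.5 * (sum_true + sum_pred)
    if max_index == expected:  # degenerate case: one cluster on both sides
        return 1.0
    return (sum_cells - expected) / (max_index - expected)


truth = [0, 0, 0, 1, 1, 1]
labels_by_layer = {
    0: [1, 1, 1, 0, 0, 0],  # same partition, cluster ids swapped
    2: [0, 0, 1, 1, 1, 1],  # one vertex assigned to the wrong part
}
scores = {layer: adjusted_rand_index(truth, pred)
          for layer, pred in labels_by_layer.items()}
best_layer = max(scores, key=scores.get)
```

ARI is invariant to how cluster ids are numbered, which matters here because k-means assigns ids arbitrarily; the swapped labeling at layer 0 still scores as a perfect match.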

3.2 The two-step pipeline — WHAT + WHERE

The winning instance-segmentation approach combines two steps.

Step A: semantic types. Cluster the activation features (pure neural, no spatial coordinates) into 3–5 types via k-means. This identifies surface types ("rail surface", "baluster surface", "base surface"), and the top rail correctly stays one continuous piece.

Step B: instances. Within each semantic type, find connected components on the mesh adjacency graph via BFS. This splits individual instances: each baluster becomes its own part because the balusters are disconnected from one another by air gaps, even though they share a semantic type.

Result on the railing: 1 top rail + 8 individual balusters + 1 base = 10 parts, each correctly isolated, which is the segmentation the downstream goal of training per-part SDFs requires.
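Step B can be sketched as a breadth-first search that only crosses mesh edges whose two endpoints share a semantic type. The code assumes Step A has already produced one type label per vertex (for example from any k-means implementation); the tiny mesh and the names `build_adjacency` and `split_instances` are illustrative, not the project's code.

```python
from collections import defaultdict, deque


def build_adjacency(faces):
    """Vertex adjacency from triangle faces (tuples of three vertex ids)."""
    adjacency = defaultdict(set)
    for a, b, c in faces:
        adjacency[a].update((b, c))
        adjacency[b].update((a, c))
        adjacency[c].update((a, b))
    return adjacency


def split_instances(type_labels, faces):
    """Instance id per vertex: BFS over edges that join same-type vertices."""
    adjacency = build_adjacency(faces)
    instance = [-1] * len(type_labels)
    next_id = 0
    for start in range(len(type_labels)):
        if instance[start] != -1:
            continue
        instance[start] = next_id
        queue = deque([start])
        while queue:
            v = queue.popleft()
            for w in adjacency[v]:
                if instance[w] == -1 and type_labels[w] == type_labels[v]:
                    instance[w] = next_id
                    queue.append(w)
        next_id += 1
    return instance


# Toy "railing": vertices 0-3 form the rail, 4-5 and 6-7 are two balusters
# that each hang off the rail but never touch each other.
faces = [(0, 1, 2), (1, 2, 3), (0, 4, 5), (3, 6, 7)]
type_labels = [0, 0, 0, 0, 1, 1, 1, 1]      # 0 = rail, 1 = baluster (Step A)
instances = split_instances(type_labels, faces)
blob = split_instances([0] * 8, faces)       # topology alone, no types
```

In the toy mesh the two balusters share a type but receive different instance ids, while the run without type labels collapses everything into a single component.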

The principle: activations encode WHAT (surface type), mesh topology encodes WHERE (connectivity). Neither alone works: activations alone give surface-type bands in which all eight balusters share one cluster, and topology alone gives one undifferentiated blob, because the railing mesh is a single connected component. Spatial k-means (concatenating XYZ to the activation features, weights 0.3 / 0.5 / 0.7 tested) consistently fails, because k-means partitions the combined feature-plus-spatial space into Voronoi cells with flat boundaries: it splits large continuous parts (the top rail, the base) in half along an axis while failing to separate individual balusters. The topological split that connected-component analysis provides cannot be approximated by a metric cut.

3.3 The activation classifier — 93.2 % on the surface

A volume classifier is built from the segmentation: for each part, compute the mean activation vector (centroid) from its segmented surface vertices; classify any new query point by forward-passing it through the original DeepSDF, reading its activations, and finding the nearest centroid. This achieves 93.2 % accuracy on surface vertices. The "on the surface" qualifier is load-bearing — §4 is the story of what happens when the same classifier is asked about interior-volume points.
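The classifier amounts to a nearest-centroid rule in activation space. A minimal sketch under two assumptions the source does not confirm: Euclidean distance as the nearness measure, and accuracy measured against the segmentation labels of the same surface vertices. The 2-D activation vectors and part names are made-up toy data.

```python
from collections import defaultdict
from math import dist


def part_centroids(activations, part_labels):
    """Mean activation vector per part, from labelled surface vertices."""
    grouped = defaultdict(list)
    for vector, label in zip(activations, part_labels):
        grouped[label].append(vector)
    return {label: [sum(column) / len(vectors) for column in zip(*vectors)]
            for label, vectors in grouped.items()}


def classify(activation, centroids):
    """Label of the centroid nearest (Euclidean) to one activation vector."""
    return min(centroids, key=lambda label: dist(activation, centroids[label]))


def accuracy(activations, part_labels, centroids):
    """Fraction of vectors whose nearest centroid matches their label."""
    hits = sum(classify(a, centroids) == y
               for a, y in zip(activations, part_labels))
    return hits / len(part_labels)


acts = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9], [0.3, 0.7]]
labels = ["rail", "rail", "baluster", "baluster", "rail"]
centroids = part_centroids(acts, labels)
surface_accuracy = accuracy(acts, labels, centroids)
```

The last toy vertex is labelled "rail" but sits closer to the baluster centroid, so even on the surface the rule is not perfect; any query point, on or off the surface, is classified through the same `classify` call.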

4. Per-Part Reconstruction — What Does Not Work

The goal: train a small MLP per part, union them via min(), get sharp edges at the boundaries. The fundamental obstacle, stated precisely: the original DeepSDF has one SDF for the entire shape — any point inside the shape is "inside", and the network does not know which part a point belongs to. Segmentation is a surface concept; per-part reconstruction needs an interior-volume concept; the trained DeepSDF only ever learned the surface. Four masking strategies were attempted; all four break on this.
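The union step itself is a one-liner once each part has a proper signed distance function. The sketch below uses two analytic spheres as stand-ins for per-part networks (a snowman-like toy, with made-up centres and radii) to show the min() composition the goal relies on.

```python
from math import dist


def sphere_sdf(center, radius):
    """Signed distance to a sphere: negative inside, positive outside."""
    return lambda p: dist(p, center) - radius


def union(*part_sdfs):
    """Boolean union of SDFs: pointwise minimum over the parts."""
    return lambda p: min(sdf(p) for sdf in part_sdfs)


head = sphere_sdf((0.0, 0.0, 0.6), 0.3)
body = sphere_sdf((0.0, 0.0, 0.0), 0.5)
snowman = union(head, body)

inside_body = snowman((0.0, 0.0, 0.0))   # owned by the body sphere
inside_head = snowman((0.0, 0.0, 0.6))   # owned by the head sphere
outside = snowman((0.0, 0.0, 2.0))       # outside both parts
```

The minimum has a crease wherever the two fields cross, which is the sharp junction the section is after. It only works because each analytic sphere knows its own interior; the attempts below are all ways of trying to obtain that per-part interior from a network that never learned it.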

Table 2: Four per-part reconstruction attempts, all failing on interior-volume ownership.
Attempt	Method	Failure
1: watertight closing	Close each open part mesh via hole filling, voxelisation + fill + marching cubes, or a convex hull	Voxelisation wrapped entire regions; the convex hull lost all concavities. Each part claimed nearly the whole volume: the union had 16 M of 16 M voxels inside, and marching cubes produced garbage
2: distance proximity mask	Use the original SDF if a query point is within a proximity radius of the part, else force the SDF positive	The radius is impossible to tune: too large and parts leak into neighbours, too small and parts develop holes. Nearest-surface-vertex does not determine interior ownership: a point in the air gap between two balusters is near a baluster vertex but inside no baluster
3: segmentation-label mask	Find the nearest full-mesh vertex and read its part label; same part → real SDF, different part → force positive	Same flaw: nearest-vertex is a surface concept. Interior points are assigned to whichever surface is closest, not to the correct enclosing part. Inside-ratios of 30–50 % for small parts (should be 5–10 %); blob outputs
4: activation volume classifier	Classify each query point by its DeepSDF activation vector, using the 93.2 %-on-surface classifier from §3.3	Activations are unreliable off-surface. The network was never trained to produce meaningful activations in the interior volume, only near the zero-level set. Noisy interior classification, 30–50 % inside-ratios persist, and the union has artefacts, though the railing structure is dimly visible
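The proximity mask of attempt 2 and the inside-ratio diagnostic quoted in Table 2 can be sketched together. The sketch assumes the inside-ratio is the fraction of sampled grid points where a part's masked SDF is negative; the source does not define it explicitly. A single analytic sphere stands in for the whole-shape DeepSDF, and the "part" is a hypothetical patch of three surface vertices near its top.

```python
from math import dist


def proximity_masked_sdf(global_sdf, part_vertices, radius, outside=1.0):
    """Keep the global SDF near the part's vertices, else force a positive value."""
    def masked(p):
        near = min(dist(p, v) for v in part_vertices) <= radius
        return global_sdf(p) if near else outside
    return masked


def inside_ratio(sdf, n=16, lo=-1.0, hi=1.0):
    """Fraction of an n^3 grid of cell centres in [lo, hi]^3 where sdf < 0."""
    step = (hi - lo) / n
    axis = [lo + (i + 0.5) * step for i in range(n)]
    inside = sum(sdf((x, y, z)) < 0 for x in axis for y in axis for z in axis)
    return inside / n ** 3


def whole(p):
    """Stand-in for the whole-shape SDF: a sphere of radius 0.5."""
    return dist(p, (0.0, 0.0, 0.0)) - 0.5


patch = [(0.0, 0.0, 0.5), (0.1, 0.0, 0.49), (0.0, 0.1, 0.49)]

tight = inside_ratio(proximity_masked_sdf(whole, patch, radius=0.2))
loose = inside_ratio(proximity_masked_sdf(whole, patch, radius=1.5))
full = inside_ratio(whole)
```

With the large radius the small patch claims the entire interior of the shape (`loose` equals `full`), and with the small radius it claims only a thin cap; nothing in the mask says which of those volumes the part actually owns, which is the tuning problem described in row 2.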

The thickened-shell workaround — defining each part's SDF as distance_to_part_surface − thickness/2, turning each open surface sheet into a thin watertight solid — does produce clean individual parts. But the union still inherits the interior-ownership ambiguity wherever parts are close: the boundary between a baluster and the rail it joins is exactly the region where "which part owns this volume" is undefined.
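The thickened-shell formula translates directly into code. This sketch approximates distance_to_part_surface by the distance to the nearest sampled surface point, which is an assumption made for brevity; an exact version would use point-to-triangle distance. The flat patch of samples is hypothetical toy data.

```python
from math import dist


def shell_sdf(surface_points, thickness):
    """Thin solid around an open sheet: unsigned distance minus half the thickness."""
    def sdf(p):
        return min(dist(p, s) for s in surface_points) - thickness / 2
    return sdf


# Hypothetical open sheet: a flat 5 x 5 patch of samples in the z = 0 plane.
sheet = [(0.1 * i, 0.1 * j, 0.0) for i in range(5) for j in range(5)]
part = shell_sdf(sheet, thickness=0.04)

on_sheet = part((0.2, 0.2, 0.0))      # on the sheet: inside the shell
just_above = part((0.2, 0.2, 0.01))   # within half the thickness: inside
just_below = part((0.2, 0.2, -0.01))  # the shell is symmetric about the sheet
far_above = part((0.2, 0.2, 0.1))     # beyond the shell: outside
```

Because the shell is built from an unsigned distance, it needs no notion of which side of the sheet is the interior, which is why individual parts come out clean. The same property means two shells that meet at a junction both claim the volume there, so the ambiguity returns in the union.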

5. Discussion and the DINO Extension

The diagnostic is clean and it generalises. Any method that reads structure from a network trained on a global SDF inherits the surface-vs-volume gap. The trained DeepSDF's activations are meaningful exactly where it was supervised, on and near the zero-level set, and unreliable in the interior, because the interior was never a region the loss cared about. Surface segmentation works precisely because mesh vertices live on the surface; volume reconstruction fails precisely because it needs the interior the network never modelled.

Two routes out are scoped. The first is a volume-supervised re-training: train the DeepSDF (or a per-part variant) with a loss that cares about interior-point part-ownership, not just surface SDF — this gives the network a reason to develop meaningful interior activations. The second, designed in the project's DINO-extension document, is a DINO self-distillation approach: borrow the self-distillation training signal that gives DINO its part-aware feature structure and apply it to the SDF decoder, so part-awareness is trained in rather than probed out. Neither is built yet; the per-part reconstruction problem is the open core of the project.

6. Conclusion

Activation probing of a DeepSDF decoder recovers part structure with no supervision — average ARI 0.559 across 9 analytical shapes, and a correct 10-part instance segmentation of a real balustrade railing via the WHAT/WHERE two-step pipeline. The follow-on goal of per-part reconstruction with sharp boolean-union junctions does not work, and the failure has a single precise cause: a DeepSDF trained on a global SDF has no notion of interior-volume part-ownership. The contribution is the validated segmentation method, the WHAT/WHERE principle, and an honest, precise statement of the open problem — the kind of negative result that is worth more than a vague success, because the next person knows exactly which rock to aim at.

References

[1] Park, J. J. et al. "DeepSDF: Learning Continuous Signed Distance Functions for Shape Representation." CVPR, 2019.

[2] Caron, M. et al. "Emerging Properties in Self-Supervised Vision Transformers (DINO)." ICCV, 2021.

[3] Oquab, M. et al. "DINOv2: Learning Robust Visual Features without Supervision." 2023.

[4] Hubert, L., Arabie, P. "Comparing Partitions (Adjusted Rand Index)." Journal of Classification, 1985.

[5] Lorensen, W., Cline, H. "Marching Cubes: A High Resolution 3D Surface Construction Algorithm." SIGGRAPH, 1987.

[6] Jain, A. "Hypernet → DeepSDF: Image-to-3-D Research Archive." Thesis research, May 2026. /whitepaper/hypernet-deepsdf

[7] Jain, A. "SDF Research and Experiments." Thesis research, Feb 2025. /whitepaper/sdf-research

[8] Code: github.com/BOB-THE-BUILDER-in/activation-sdf-segmentation · Status & DINO-extension design docs in the repository.