This is where the "a 3-D shape is the weights of a neural network" hypothesis first becomes a running pipeline. Each object is rendered from 24 viewpoints; each view is overfitted into its own image-SIREN; a hypernetwork is trained per object across those 24 image-networks; and the hypernet's weights are routed through a mapper and an autoencoder into the weights of a shape-SIREN whose SDF, extracted with marching cubes, yields the reconstructed mesh. The bet is that weight-space relationships between image-networks and geometry-networks can be learned.
The honest framing: this is the prior work, referenced in the Hypernet → DeepSDF thesis as "our prior work / Topic 03". It is on the timeline because the twelve-phase archive only makes sense if you can see the original that it dissects. This paper documents the pipeline, its two load-bearing findings, and the precise way each finding turns out to be double-edged.
The repository is a numbered pipeline — 01_watertight.py through 80_train_shape_sirens.py. The stages: download Objaverse objects; watertight-convert; render 24 views per object; train an image-SIREN per view (24 per object); train a per-object hypernetwork (~17.9 M parameters) across the 24 image-SIRENs; train the shape-SIRENs (~264 K weights each). Configuration lives in configs/ (CFG.data, CFG.shape_siren, …); core modules — siren, hypernet, render, watertight — in src/. Setting CFG.data.num_objects = 100 and running the numbered stages in order reproduces the N = 100 run.
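As a sanity check on the "~264 K weights" figure, the sketch below counts the parameters of one fully connected layout that happens to land on that number. The layer widths are an assumption chosen for illustration (3-D input, five hidden layers of width 256, scalar SDF output); they are not read from the repository's configs, and the real shape-SIREN may be laid out differently.

```python
# Assumed layer widths: 3-D input, five hidden layers of width 256, scalar SDF output.
# These widths are illustrative and are NOT taken from the repository's configs.
ASSUMED_WIDTHS = [3, 256, 256, 256, 256, 256, 1]


def count_mlp_parameters(widths):
    """Count weights and biases of a fully connected network with these layer widths."""
    return sum(fan_in * fan_out + fan_out for fan_in, fan_out in zip(widths, widths[1:]))


shape_siren_params = count_mlp_parameters(ASSUMED_WIDTHS)
print(shape_siren_params)  # 264449
```

Under this assumed layout, the direct mapper would have to regress roughly 264 K outputs per object, which is the scale the two routes below are designed around.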
The design decisions live in the step from hypernet weights to shape-SIREN weights, where the repository implements two competing routes. The table lists both routes, followed by the two scripts that scale and test the pipeline.
| Component | Script | Behaviour |
|---|---|---|
| Direct mapper (baseline) | hypernet_to_shape_mapper.py | A mapper predicts all 264 K shape-SIREN weights directly from the hypernet weights. Topologically fragile |
| Latent autoencoder (current) | autoencoder_pipeline_n100_mlp.py | A weight-space autoencoder compresses to a 128-dim latent first; a tiny mapper targets the latent; the AE decoder reconstructs the weights |
| Scaling orchestrator | scale_to_n100.py | Orchestrates the scaling experiment to N = 100 objects |
| OOD test | ood_test_full.py | Out-of-distribution generalisation test on a held-out shape |
The autoencoder route is the current pipeline. Predicting 264 K shape-SIREN weights directly is topologically fragile: the mapper minimises aggregate weight MSE, but even a small aggregate error can fall disproportionately on the handful of dimensions that control the SDF zero-crossing, and when it does the mesh breaks. The weight-space autoencoder compresses the weights to a 128-dim latent that the mapper can hit reliably; the AE decoder reconstructs the full weight vector from that latent. This route rescues mesh quality.
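The data flow of the two routes can be sketched as follows. This is a shape-level illustration only: the dimensions are scaled down so it runs instantly, every "network" is a random affine map standing in for a trained model, and all names (`direct_mapper`, `latent_mapper`, `ae_encoder`, `ae_decoder`) are hypothetical rather than identifiers from the repository.

```python
import numpy as np

rng = np.random.default_rng(0)

# Scaled-down, hypothetical sizes. The text quotes ~17.9 M hypernet parameters,
# ~264 K shape-SIREN weights and a 128-dim latent; only the latent size is kept.
HYPERNET_DIM, SHAPE_DIM, LATENT_DIM = 2048, 512, 128


def linear(in_dim, out_dim):
    """Random affine map standing in for a trained network."""
    W = rng.normal(0.0, 1.0 / np.sqrt(in_dim), size=(out_dim, in_dim))
    b = np.zeros(out_dim)
    return lambda x: W @ x + b


# Route 1 (baseline): the mapper must hit every shape-SIREN weight directly.
direct_mapper = linear(HYPERNET_DIM, SHAPE_DIM)

# Route 2 (current): an autoencoder over shape-SIREN weights defines the latent;
# the mapper only has to hit that latent, and the decoder expands it again.
ae_encoder = linear(SHAPE_DIM, LATENT_DIM)  # provides the mapper's training targets
ae_decoder = linear(LATENT_DIM, SHAPE_DIM)
latent_mapper = linear(HYPERNET_DIM, LATENT_DIM)

hypernet_weights = rng.normal(size=HYPERNET_DIM)
shape_weights_direct = direct_mapper(hypernet_weights)
shape_weights_latent = ae_decoder(latent_mapper(hypernet_weights))

# Size of the regression target the mapper faces in each route.
target_shape_weights = rng.normal(size=SHAPE_DIM)
print("direct route target dims:", target_shape_weights.size)
print("latent route target dims:", ae_encoder(target_shape_weights).size)
```

Both routes end in a full shape-SIREN weight vector; what changes is the size of the target the mapper has to regress.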
A per-layer hypernetwork generates the weights of each layer of the shape-SIREN with a dedicated head. A monolithic hypernetwork emits the whole weight vector at once. The finding: the per-layer hypernetwork preserves genus-1 topology in the reconstructed mesh; the monolithic hypernetwork destroys topology even when the weight-space MSE is driven down to ~10⁻⁷.
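The structural difference between the two hypernetwork designs can be sketched in a few lines. Everything here is an assumption made for illustration: the tiny target network, the 32-dim conditioning vector, and the single affine map used as each head. It shows only how the output is organised, not how the repository's hypernetworks are built or trained.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical, scaled-down target network: (fan_in, fan_out) per layer.
TARGET_LAYERS = [(3, 16), (16, 16), (16, 1)]
LAYER_SIZES = [fan_in * fan_out + fan_out for fan_in, fan_out in TARGET_LAYERS]
EMBED_DIM = 32  # assumed size of the conditioning vector fed to the hypernetwork


def make_head(out_dim):
    """One affine output head; a real head would be a small trained MLP."""
    return rng.normal(0.0, 0.01, size=(out_dim, EMBED_DIM))


# Monolithic: a single head emits the whole flattened weight vector at once.
monolithic_head = make_head(sum(LAYER_SIZES))

# Per-layer: one dedicated head per target layer.
per_layer_heads = [make_head(size) for size in LAYER_SIZES]


def monolithic_forward(z):
    return monolithic_head @ z


def per_layer_forward(z):
    """Return one (weight matrix, bias) pair per target layer."""
    params = []
    for head, (fan_in, fan_out) in zip(per_layer_heads, TARGET_LAYERS):
        flat = head @ z
        W = flat[: fan_in * fan_out].reshape(fan_out, fan_in)
        b = flat[fan_in * fan_out:]
        params.append((W, b))
    return params


z = rng.normal(size=EMBED_DIM)
flat_weights = monolithic_forward(z)
layer_params = per_layer_forward(z)
```

In the per-layer form each head owns a coherent slice of the target network, so its output can be reshaped straight into that layer's weight matrix and bias.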
This is the first appearance of a refrain that runs through the entire thesis line: aggregate weight MSE is a poor proxy for mesh quality. A monolithic hypernetwork can be numerically excellent and geometrically broken, because the weight dimensions are not equally important — some control the SDF zero-crossing and some do not, and an objective that treats them uniformly will spend its error budget destroying topology. The per-layer structure helps because it gives the hypernetwork a more aligned objective: each head is responsible for a coherent slice of the weight vector. The Hypernet → DeepSDF archive later states this lesson as a hard rule and re-derives it at scale in its phase-10 autoencoder result.
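A deliberately tiny example makes the "not equally important" point concrete. It uses a two-parameter 1-D SDF, f(x) = a·x + b, which is a toy stand-in and not a SIREN: the same-sized error is placed first on the slope and then on the offset, so the weight MSE is identical while the surface moves by very different amounts.

```python
def zero_crossing(a, b):
    """Root of the 1-D toy SDF f(x) = a * x + b, i.e. the 'surface' location."""
    return -b / a


def weight_mse(w, w_ref):
    """Mean squared error between two weight tuples."""
    return sum((u - v) ** 2 for u, v in zip(w, w_ref)) / len(w)


reference = (1.0, -0.1)  # surface at x = 0.1
delta = 0.05
perturb_slope = (1.0 + delta, -0.1)   # error placed on a
perturb_offset = (1.0, -0.1 + delta)  # same-sized error placed on b

mse_slope = weight_mse(perturb_slope, reference)
mse_offset = weight_mse(perturb_offset, reference)

shift_slope = abs(zero_crossing(*perturb_slope) - zero_crossing(*reference))
shift_offset = abs(zero_crossing(*perturb_offset) - zero_crossing(*reference))

print(f"weight MSE:    slope {mse_slope:.6f}, offset {mse_offset:.6f}")
print(f"surface shift: slope {shift_slope:.4f}, offset {shift_offset:.4f}")
```

An objective that only sees the weight MSE cannot tell these two perturbations apart, even though one moves the zero-crossing roughly ten times further than the other.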
If every shape-SIREN is trained independently, the resulting weight vectors live in arbitrary permutation neighbourhoods — two networks computing nearly the same function can have arbitrarily different weight vectors, because of the permutation and sign symmetries of an MLP. The line segment between two such weight vectors decodes to garbage. Warm-starting all shape-SIRENs from a single shared anchor keeps every per-shape network in the same permutation neighbourhood, so weight-space interpolation stays on the shape manifold. This pipeline depends on the property — it is what makes the latent space the autoencoder learns coherent.
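The permutation symmetry can be checked numerically on a small sine MLP. This is a minimal sketch with a hypothetical one-hidden-layer network and random weights, not one of the pipeline's trained shape-SIRENs: reordering the hidden units leaves the function unchanged, moves the weight vector a long way, and makes the weight-space midpoint compute something else.

```python
import numpy as np

rng = np.random.default_rng(0)
HIDDEN = 16


def siren_forward(params, x):
    """One-hidden-layer sine MLP: f(x) = W2 sin(W1 x + b1) + b2."""
    W1, b1, W2, b2 = params
    return np.sin(x @ W1.T + b1) @ W2.T + b2


def flatten(params):
    return np.concatenate([p.ravel() for p in params])


W1 = rng.normal(size=(HIDDEN, 3))
b1 = rng.normal(size=HIDDEN)
W2 = rng.normal(size=(1, HIDDEN))
b2 = rng.normal(size=1)
net_a = (W1, b1, W2, b2)

# Reorder the hidden units: same function, different weight vector.
perm = np.arange(HIDDEN)[::-1]
net_b = (W1[perm], b1[perm], W2[:, perm], b2)

# Midpoint of the straight line between the two weight vectors.
net_mid = tuple(0.5 * (pa + pb) for pa, pb in zip(net_a, net_b))

x = rng.uniform(-1.0, 1.0, size=(64, 3))
gap_permuted = np.max(np.abs(siren_forward(net_a, x) - siren_forward(net_b, x)))
gap_midpoint = np.max(np.abs(siren_forward(net_a, x) - siren_forward(net_mid, x)))
weight_distance = np.linalg.norm(flatten(net_a) - flatten(net_b))

print("function gap, permuted copy:  ", gap_permuted)
print("function gap, weight midpoint:", gap_midpoint)
print("weight-space distance a to b: ", weight_distance)
```

Independently trained networks can end up related by exactly this kind of reordering, which is why a shared anchor is needed before interpolating between them is meaningful.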
The double edge: the Hypernet → DeepSDF archive's central diagnostic, the warm-start dominance problem, is the discovery that the same warm-start property — necessary here for interpolation — is fatal at the scale of image-conditioned diffusion. Warm-starting concentrates the entire weight distribution into a thin shell (mean pairwise cosine ≥ 0.96), and per-shape signal becomes a thin residual that a diffusion model cannot extract conditionally. The property this pipeline relies on is the property the successor identifies as the thing to escape.
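The thin-shell diagnostic is a mean pairwise cosine over per-shape weight vectors, which can be sketched on synthetic data. The numbers below are assumptions for illustration (50 vectors of dimension 1000, residuals at one tenth of the anchor's scale) and do not reproduce the archive's measurement; they only show how a shared anchor pushes the statistic towards 1.

```python
import numpy as np

rng = np.random.default_rng(0)
NUM_SHAPES, DIM = 50, 1000  # hypothetical sizes, far smaller than real weight vectors


def mean_pairwise_cosine(vectors):
    """Average cosine similarity over all distinct pairs of rows."""
    unit = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
    sims = unit @ unit.T
    upper = np.triu_indices(len(vectors), k=1)
    return sims[upper].mean()


anchor = rng.normal(size=DIM)
# Warm-started: every shape is the shared anchor plus a small per-shape residual.
warm_started = anchor + 0.1 * rng.normal(size=(NUM_SHAPES, DIM))
# Independent: every shape gets its own random initialisation.
independent = rng.normal(size=(NUM_SHAPES, DIM))

cos_warm = mean_pairwise_cosine(warm_started)
cos_independent = mean_pairwise_cosine(independent)
print(f"warm-started: {cos_warm:.3f}, independent: {cos_independent:.3f}")
```

In the warm-started case almost all of each vector is the anchor itself, so the per-shape information lives in the small residual that the cosine statistic barely registers.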
This pipeline is not superseded so much as diagnosed. The Hypernet → DeepSDF archive is the systematic post-mortem of exactly this approach — its phase 1 is this pipeline's per-shape SIRENs, its phases 6–9 are this pipeline's hypernet-mapper line extended to image conditioning, its phase 10 is this pipeline's weight-space autoencoder. The archive's contribution is the precise statement of why this weight-space approach does not extend to image-conditioned generation at the 976-shape scale, and the DeepSDF pivot — a shared decoder with a constructed 64-dim latent rather than an extracted one — that finally works.
Reading this paper before the archive makes the archive read as what it is: not a fresh project, but the rigorous unwinding of the pipeline documented here. The two findings of §3–4 are exactly the two threads the archive pulls on.
The hypernet → shape pipeline is the first concrete form of the weight-space image-to-3-D hypothesis: 24 image-SIRENs → per-object hypernetwork → mapper → 128-dim latent → weight-space autoencoder → shape-SIREN → SDF → mesh. It established that per-layer hypernetworks preserve topology where monolithic ones do not, that aggregate weight MSE is a poor proxy for mesh quality, and that warm-starting from a shared anchor is necessary for coherent weight-space interpolation. Each of these is a real finding — and each is the seed of a problem that the twelve-phase Hypernet → DeepSDF archive then diagnoses in full and pivots away from. This paper is the original; the archive is the post-mortem.