Technical Report · cs.CV · cs.GR · cs.LG · Apr 2026
The Hypernet → Shape Pipeline: Per-Shape SIRENs, Per-Layer Hypernetworks, and a Weight-Space Autoencoder for Image-to-3-D — the Predecessor the Twelve-Phase Archive Diagnosed
Aaditya Jain
Hypernetworks · Weight-Space Learning · Thesis Research, Unpublished Preprint
Submitted: April 2026
Subject: cs.CV · cs.GR · cs.LG
Keywords: hypernetworks, SIREN, weight-space learning, weight-space autoencoder, per-layer hypernetwork, warm-start, image-to-3-D
Abstract
We document the hypernet → shape pipeline: the standalone weight-space image-to-3-D system that the twelve-phase Hypernet → DeepSDF archive grew out of, built on, and ultimately diagnosed and abandoned. The pipeline encodes a 3-D object in two steps. First, 24 rendered views → 24 per-view image-SIRENs → one per-object hypernetwork. Second, hypernetwork weights → a tiny mapper → a 128-dimensional latent → a weight-space autoencoder decoder → shape-SIREN weights → SDF → mesh. The central design decision documented here is that routing through a weight-space autoencoder rescues mesh quality relative to predicting all 264 K shape-SIREN weights directly with a mapper alone. Direct prediction is topologically fragile because a small aggregate weight error lands adversarially on the dimensions controlling the SDF zero-crossing. The pipeline established two findings that the larger archive then both built on and unwound: (1) per-layer hypernetworks preserve genus-1 topology when reconstructing per-shape MLP weights, while monolithic hypernetworks destroy topology even at weight-space MSE ~10⁻⁷; and (2) warm-starting all shape-SIRENs from a single anchor is necessary for coherent weight-space interpolation. Both findings are double-edged: the topology lesson becomes the thesis-line refrain that aggregate weight MSE is a poor proxy for mesh quality, and the warm-start requirement becomes the "warm-start dominance problem" that, at the 976-shape scale of image-conditioned diffusion, is fatal. This paper documents the pipeline as the subject of that post-mortem: it is readable on its own, but most useful as the original of which the Hypernet → DeepSDF archive is the systematic study.
1. Introduction

This is where the "a 3-D shape is the weights of a neural network" hypothesis first becomes a running pipeline. Each object is rendered from 24 viewpoints; each view is overfitted into its own image-SIREN; a hypernetwork is trained per object to map across those 24 image-networks; and the hypernet's weights are routed through a mapper and an autoencoder into the weights of a shape-SIREN whose SDF, marching-cubed, is the reconstructed mesh. The bet is that weight-space relationships between image-networks and geometry-networks can be learned.

The honest framing: this is the prior work — referenced in the Hypernet → DeepSDF thesis as "our prior work / Topic 03". It is on the timeline because the twelve-phase archive only makes sense if you can see the original it post-mortems. This paper documents the pipeline, its two load-bearing findings, and the precise way each finding turns out to be double-edged.

2. The Pipeline
2.1 Per-object stages

The repository is a numbered pipeline — 01_watertight.py through 80_train_shape_sirens.py. The stages: download Objaverse objects; watertight-convert; render 24 views per object; train an image-SIREN per view (24 per object); train a per-object hypernetwork (~17.9 M parameters) across the 24 image-SIRENs; train the shape-SIRENs (~264 K weights each). Configuration lives in configs/ (CFG.data, CFG.shape_siren, …); core modules — siren, hypernet, render, watertight — in src/. Setting CFG.data.num_objects = 100 and running the numbered stages in order reproduces the N = 100 run.
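
To give a sense of the workload these stages imply, the following sketch multiplies out the figures quoted above for a given number of objects. It is arithmetic only, not code from the repository: the function name `run_footprint` is hypothetical, and float32 storage (4 bytes per parameter) is an assumption the text does not state.

```python
# Workload arithmetic for one run, using only the figures quoted in the text.
VIEWS_PER_OBJECT = 24            # rendered views (and image-SIRENs) per object
HYPERNET_PARAMS = 17_900_000     # "~17.9 M" parameters per per-object hypernetwork
SHAPE_SIREN_WEIGHTS = 264_000    # "~264 K" weights per shape-SIREN
BYTES_PER_PARAM = 4              # assumption: float32 storage


def run_footprint(num_objects: int) -> dict:
    """Count the networks trained and parameters stored for `num_objects` objects."""
    hypernet_params = num_objects * HYPERNET_PARAMS
    shape_params = num_objects * SHAPE_SIREN_WEIGHTS
    return {
        "image_sirens": num_objects * VIEWS_PER_OBJECT,
        "hypernetworks": num_objects,
        "shape_sirens": num_objects,
        "hypernet_params_total": hypernet_params,
        "shape_params_total": shape_params,
        "hypernet_gib": hypernet_params * BYTES_PER_PARAM / 2**30,
    }


footprint = run_footprint(100)   # mirrors CFG.data.num_objects = 100
for key, value in footprint.items():
    print(f"{key}: {value}")
```

Under these figures the N = 100 run trains 2,400 image-SIRENs, and the per-object hypernetworks account for far more stored parameters than the shape-SIRENs do.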

2.2 Two routes from hypernet to shape

The design decisions live in the step from hypernet weights to shape-SIREN weights, where the repository implements two competing routes.

Table 1 — Two routes from hypernet weights to shape-SIREN weights.
Route / role | Script | Behaviour
Direct mapper (baseline) | hypernet_to_shape_mapper.py | A mapper predicts all 264 K shape-SIREN weights directly from the hypernet weights. Topologically fragile.
Latent autoencoder (current) | autoencoder_pipeline_n100_mlp.py | A weight-space autoencoder compresses to a 128-dim latent first; a tiny mapper targets the latent; the AE decoder reconstructs the weights.
Scaling orchestrator | scale_to_n100.py | Orchestrates the scaling experiment to N = 100 objects.
OOD test | ood_test_full.py | Out-of-distribution generalisation test on a held-out shape.

The autoencoder route is the current pipeline. Predicting 264 K shape-SIREN weights directly is topologically fragile: the mapper minimises aggregate weight MSE, but a small aggregate error lands adversarially on the handful of dimensions that control the SDF zero-crossing, and the mesh breaks. The weight-space autoencoder instead compresses the weights to a 128-dim latent, a target the mapper can hit reliably, and the AE decoder reconstructs the full weight vector from that latent. This indirection is what rescues mesh quality.
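
The difference between the two routes is mostly a difference in what the mapper is asked to output. The sketch below expresses each route as a function composition and checks the vector sizes at each hand-off. It is a minimal illustration, not the repository's code: `direct_route`, `latent_route`, and the `toy_*` stand-ins are hypothetical names, the stand-ins are untrained placeholders, and 264,000 is used for the "~264 K" figure because the exact weight count is not stated.

```python
from typing import Callable, List

Vector = List[float]

LATENT_DIM = 128          # latent size quoted in the text
SHAPE_WEIGHTS = 264_000   # "~264 K" in the text; the exact count is an assumption


def direct_route(hyper_w: Vector, mapper: Callable[[Vector], Vector]) -> Vector:
    """Baseline route: the mapper itself emits every shape-SIREN weight."""
    shape_w = mapper(hyper_w)
    assert len(shape_w) == SHAPE_WEIGHTS
    return shape_w


def latent_route(
    hyper_w: Vector,
    tiny_mapper: Callable[[Vector], Vector],
    ae_decoder: Callable[[Vector], Vector],
) -> Vector:
    """Current route: the mapper only has to hit a 128-dim latent."""
    latent = tiny_mapper(hyper_w)
    assert len(latent) == LATENT_DIM
    shape_w = ae_decoder(latent)
    assert len(shape_w) == SHAPE_WEIGHTS
    return shape_w


# Hypothetical stand-ins so the sketch runs; the real components are trained networks.
def toy_tiny_mapper(hyper_w: Vector) -> Vector:
    mean = sum(hyper_w) / len(hyper_w)
    return [mean] * LATENT_DIM


def toy_ae_decoder(latent: Vector) -> Vector:
    return [latent[i % LATENT_DIM] for i in range(SHAPE_WEIGHTS)]


def toy_direct_mapper(hyper_w: Vector) -> Vector:
    mean = sum(hyper_w) / len(hyper_w)
    return [mean] * SHAPE_WEIGHTS


hyper_w = [0.01 * i for i in range(10)]
via_latent = latent_route(hyper_w, toy_tiny_mapper, toy_ae_decoder)
via_direct = direct_route(hyper_w, toy_direct_mapper)
print(len(via_latent), len(via_direct), SHAPE_WEIGHTS // LATENT_DIM)
```

With these sizes the latent route shrinks the mapper's regression target by a factor of roughly two thousand, and the job of producing a full, valid weight vector moves to the decoder.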

3. Finding 1 — Per-Layer vs Monolithic Hypernetworks

A per-layer hypernetwork generates the weights of each layer of the shape-SIREN with a dedicated head. A monolithic hypernetwork emits the whole weight vector at once. The finding: the per-layer hypernetwork preserves genus-1 topology in the reconstructed mesh; the monolithic hypernetwork destroys topology even when the weight-space MSE is driven down to ~10⁻⁷.
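
The structural difference can be made concrete with the bookkeeping that a per-layer design needs: a table of which contiguous slice of the flat weight vector belongs to which layer, so that each head emits exactly one slice. The sketch below shows only that bookkeeping, not a hypernetwork. The layer shapes are an assumption (a 3 → 256 → ... → 1 MLP with five hidden layers of width 256); the source does not state the shape-SIREN architecture, and the helper names are hypothetical.

```python
from typing import List, Sequence, Tuple

# Assumption: 3-D input, five hidden layers of width 256, scalar SDF output.
# The source does not state the shape-SIREN architecture.
LAYER_SHAPES: List[Tuple[int, int]] = [(3, 256)] + [(256, 256)] * 4 + [(256, 1)]


def layer_slices(shapes: Sequence[Tuple[int, int]]) -> List[Tuple[int, int]]:
    """Return (start, end) offsets of each layer's weights and biases in the flat vector."""
    slices = []
    start = 0
    for fan_in, fan_out in shapes:
        size = fan_in * fan_out + fan_out   # weight matrix plus bias vector
        slices.append((start, start + size))
        start += size
    return slices


def split_per_layer(flat: Sequence[float], shapes: Sequence[Tuple[int, int]]) -> List[Sequence[float]]:
    """Cut a flat weight vector into the per-layer targets, one per head."""
    return [flat[a:b] for a, b in layer_slices(shapes)]


slices = layer_slices(LAYER_SHAPES)
total = slices[-1][1]
chunks = split_per_layer([0.0] * total, LAYER_SHAPES)
print(total, [len(c) for c in chunks])
```

Under the assumed shapes the total comes to 264,449, which is close to the ~264 K quoted earlier, though that agreement does not confirm the architecture. A monolithic hypernetwork would regress the whole vector as one undifferentiated target; a per-layer design would regress each chunk separately.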

This is the first appearance of a refrain that runs through the entire thesis line: aggregate weight MSE is a poor proxy for mesh quality. A monolithic hypernetwork can be numerically excellent and geometrically broken, because the weight dimensions are not equally important — some control the SDF zero-crossing and some do not, and an objective that treats them uniformly will spend its error budget destroying topology. The per-layer structure helps because it gives the hypernetwork a more aligned objective: each head is responsible for a coherent slice of the weight vector. The Hypernet → DeepSDF archive later states this lesson as a hard rule and re-derives it at scale in its phase-10 autoencoder result.
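
A short calculation shows how much room an MSE of ~10⁻⁷ leaves when it is averaged over ~264 K weights. The sketch below computes the per-weight deviation that results if the whole squared-error budget falls on a small number of weights rather than being spread evenly. It is an upper bound from arithmetic, not a measurement from the pipeline, and the function name `concentrated_error` is hypothetical.

```python
import math


def concentrated_error(mse: float, num_weights: int, num_hit: int) -> float:
    """Per-weight deviation if the total squared error falls evenly on `num_hit` weights."""
    total_squared_error = mse * num_weights
    return math.sqrt(total_squared_error / num_hit)


MSE = 1e-7            # weight-space MSE quoted for the monolithic hypernetwork
NUM_WEIGHTS = 264_000  # "~264 K" shape-SIREN weights

spread = concentrated_error(MSE, NUM_WEIGHTS, num_hit=NUM_WEIGHTS)  # error spread evenly
ten = concentrated_error(MSE, NUM_WEIGHTS, num_hit=10)              # error on ten weights
single = concentrated_error(MSE, NUM_WEIGHTS, num_hit=1)            # error on one weight

print(f"spread evenly:  {spread:.2e}")
print(f"on ten weights: {ten:.2e}")
print(f"on one weight:  {single:.2e}")
```

Spread evenly, the deviation is about 3 × 10⁻⁴ per weight. Concentrated on a single weight, the same MSE permits a deviation of about 0.16, roughly five hundred times larger, which is why a uniform objective gives no protection to the few dimensions that set the zero-crossing.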

4. Finding 2 — The Warm-Start Requirement

If every shape-SIREN is trained independently, the resulting weight vectors live in arbitrary permutation neighbourhoods — two networks computing nearly the same function can have arbitrarily different weight vectors, because of the permutation and sign symmetries of an MLP. The line segment between two such weight vectors decodes to garbage. Warm-starting all shape-SIRENs from a single shared anchor keeps every per-shape network in the same permutation neighbourhood, so weight-space interpolation stays on the shape manifold. This pipeline depends on the property — it is what makes the latent space the autoencoder learns coherent.
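
The permutation argument can be checked on a toy network. The sketch below builds a one-hidden-layer sine MLP with two hidden units and no biases, swaps the two hidden units to get a second weight vector that computes the same function, and then evaluates the midpoint of the line segment between them. The network and its weights are hypothetical values chosen for illustration, not taken from the pipeline.

```python
import math
from typing import List, Tuple

Net = Tuple[List[float], List[float]]  # (input weights, output weights)


def tiny_siren(x: float, w_in: List[float], w_out: List[float]) -> float:
    """One-hidden-layer sine MLP without biases: f(x) = sum_j w_out[j] * sin(w_in[j] * x)."""
    return sum(wo * math.sin(wi * x) for wi, wo in zip(w_in, w_out))


def interpolate(a: Net, b: Net, t: float) -> Net:
    """Linear interpolation between two networks in weight space."""
    w_in = [(1 - t) * p + t * q for p, q in zip(a[0], b[0])]
    w_out = [(1 - t) * p + t * q for p, q in zip(a[1], b[1])]
    return (w_in, w_out)


net_a: Net = ([1.0, 3.0], [1.0, -1.0])   # f(x) = sin(x) - sin(3x)
net_b: Net = ([3.0, 1.0], [-1.0, 1.0])   # same function, hidden units swapped
midpoint = interpolate(net_a, net_b, 0.5)

xs = [-1.0, -0.5, 0.25, 0.8]
for x in xs:
    fa = tiny_siren(x, *net_a)
    fb = tiny_siren(x, *net_b)
    fm = tiny_siren(x, *midpoint)
    print(f"x={x:+.2f}  f_a={fa:+.4f}  f_b={fb:+.4f}  f_mid={fm:+.4f}")
```

The two endpoints agree at every input, yet the midpoint has output weights of zero and therefore computes the zero function. Two networks that start from the same anchor and drift only slightly cannot end up in different unit orderings like this, which is the role the warm start plays.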

The double edge: the Hypernet → DeepSDF archive's central diagnostic, the warm-start dominance problem, is the discovery that the same warm-start property that is necessary here for interpolation is fatal at the scale of image-conditioned diffusion. Warm-starting concentrates the entire weight distribution into a thin shell (mean pairwise cosine similarity ≥ 0.96), so the per-shape signal becomes a thin residual that a diffusion model cannot extract conditionally. The property this pipeline relies on is the property the successor identifies as the thing to escape.
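
The thin-shell statistic is straightforward to compute for any set of flattened weight vectors. The sketch below defines mean pairwise cosine similarity and applies it to two synthetic populations: vectors formed as a shared anchor plus a small residual, and vectors drawn independently. The data are random toy vectors with hypothetical dimensions and noise scales, not weights from the archive, so the printed values illustrate the effect rather than reproduce the reported figure.

```python
import math
import random
from itertools import combinations
from typing import List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def mean_pairwise_cosine(vectors: List[Sequence[float]]) -> float:
    """Average cosine similarity over all unordered pairs of vectors."""
    pairs = list(combinations(vectors, 2))
    return sum(cosine(u, v) for u, v in pairs) / len(pairs)


rng = random.Random(0)
DIM, NUM_SHAPES = 1000, 8        # toy sizes, far smaller than real weight vectors
RESIDUAL_SCALE = 0.1             # assumption: per-shape residual is small next to the anchor

anchor = [rng.gauss(0.0, 1.0) for _ in range(DIM)]
warm_started = [[a + rng.gauss(0.0, RESIDUAL_SCALE) for a in anchor] for _ in range(NUM_SHAPES)]
independent = [[rng.gauss(0.0, 1.0) for _ in range(DIM)] for _ in range(NUM_SHAPES)]

warm_cos = mean_pairwise_cosine(warm_started)
indep_cos = mean_pairwise_cosine(independent)
print(f"warm-started: {warm_cos:.3f}")
print(f"independent:  {indep_cos:.3f}")
```

In the anchor-plus-residual population nearly all of each vector is the shared anchor, so the similarity sits close to 1 and everything that distinguishes one shape from another lives in the small remainder.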

5. Relationship to the Twelve-Phase Archive

This pipeline is not superseded so much as diagnosed. The Hypernet → DeepSDF archive is the systematic post-mortem of exactly this approach — its phase 1 is this pipeline's per-shape SIRENs, its phases 6–9 are this pipeline's hypernet-mapper line extended to image conditioning, its phase 10 is this pipeline's weight-space autoencoder. The archive's contribution is the precise statement of why this weight-space approach does not extend to image-conditioned generation at the 976-shape scale, and the DeepSDF pivot — a shared decoder with a constructed 64-dim latent rather than an extracted one — that finally works.

Reading this paper before the archive makes the archive read as what it is: not a fresh project, but the rigorous unwinding of the pipeline documented here. The two findings of §3–4 are exactly the two threads the archive pulls on.

6. Conclusion

The hypernet → shape pipeline is the first concrete form of the weight-space image-to-3-D hypothesis: 24 image-SIRENs → per-object hypernetwork → mapper → 128-dim latent → weight-space autoencoder → shape-SIREN → SDF → mesh. It established that per-layer hypernetworks preserve topology where monolithic ones do not, that aggregate weight MSE is a poor proxy for mesh quality, and that warm-starting from a shared anchor is necessary for coherent weight-space interpolation. Each of these is a real finding — and each is the seed of a problem that the twelve-phase Hypernet → DeepSDF archive then diagnoses in full and pivots away from. This paper is the original; the archive is the post-mortem.

References
[1] Ha, D., Dai, A., Le, Q. V. "HyperNetworks." ICLR, 2017.
[2] Sitzmann, V. et al. "Implicit Neural Representations with Periodic Activation Functions" (SIREN). NeurIPS, 2020.
[3] Park, J. J. et al. "DeepSDF: Learning Continuous Signed Distance Functions for Shape Representation." CVPR, 2019.
[4] Erkoç, Z. et al. "HyperDiffusion: Generating Implicit Neural Fields with Weight-Space Diffusion." ICCV, 2023.
[5] Deitke, M. et al. "Objaverse: A Universe of Annotated 3D Objects." CVPR, 2023.
[6] Jain, A. "Hypernet → DeepSDF: Image-to-3-D Research Archive." Thesis research, May 2026. /whitepaper/hypernet-deepsdf
[7] Jain, A. "Activation-Space SDF Part Discovery." Thesis research, Apr 2026. /whitepaper/activation-sdf
[8] Code: github.com/BOB-THE-BUILDER-in/Hypernetwork · Checkpoints: huggingface.co/datasets/bobthebuilderinternational/hypernet-checkpoints