The standalone hypernet-to-shape pipeline — the predecessor that the twelve-phase Hypernet → DeepSDF archive (Topic 41) grew out of and ultimately abandoned. The chain: 24 images of an object → 24 image-SIRENs → one hypernetwork per object; then the hypernet weights → a tiny mapper → a 128-dim latent → an autoencoder decoder → shape-SIREN weights → SDF → mesh. The load-bearing design choice documented here: a weight-space autoencoder rescues mesh quality versus predicting all 264 K shape-SIREN weights directly — the move that the larger Topic-41 archive later diagnoses, and pivots away from, in detail.
This is where the "a 3-D shape is the weights of a neural network" hypothesis first becomes a running pipeline. Each object is rendered from 24 viewpoints; each view is overfitted into its own image-SIREN; a hypernetwork is trained per object to map across those 24 image-networks; and the hypernet's weights are then routed through a mapper and an autoencoder into the weights of a shape-SIREN whose SDF, marching-cubed, is the reconstructed mesh. The whole thing is a bet that weight-space relationships between image-networks and geometry-networks can be learned.
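For readers unfamiliar with the building block, the following is a minimal sketch of a SIREN forward pass in plain Python. It is an illustration under stated assumptions, not the archive's `siren` module: the hidden width of 16, the two hidden layers, the frequency scale `OMEGA_0 = 30`, and the bias initialisation are all placeholders chosen for readability. It shows only that an image-SIREN maps a 2-D pixel coordinate to RGB while a shape-SIREN maps a 3-D point to a signed distance.

```python
import math
import random

OMEGA_0 = 30.0  # frequency scale from the original SIREN paper; assumed here


def init_layer(n_in, n_out, first, rng):
    """Return (weights, biases) using the SIREN uniform init scheme."""
    bound = 1.0 / n_in if first else math.sqrt(6.0 / n_in) / OMEGA_0
    weights = [[rng.uniform(-bound, bound) for _ in range(n_in)] for _ in range(n_out)]
    # Biases reuse the weight range for simplicity (an assumption of this sketch).
    biases = [rng.uniform(-bound, bound) for _ in range(n_out)]
    return weights, biases


def init_siren(dims, seed=0):
    """Build a list of layers for the given layer widths."""
    rng = random.Random(seed)
    return [init_layer(dims[i], dims[i + 1], i == 0, rng) for i in range(len(dims) - 1)]


def siren_forward(layers, x):
    """Sine activations on every layer except the last, which stays linear."""
    h = list(x)
    for idx, (weights, biases) in enumerate(layers):
        pre = [sum(w * v for w, v in zip(row, h)) + b for row, b in zip(weights, biases)]
        last = idx == len(layers) - 1
        h = pre if last else [math.sin(OMEGA_0 * p) for p in pre]
    return h


# Hidden width 16 is a placeholder, far smaller than a real run.
image_siren = init_siren([2, 16, 16, 3])   # (x, y) -> RGB
shape_siren = init_siren([3, 16, 16, 1])   # (x, y, z) -> signed distance

rgb = siren_forward(image_siren, [0.25, -0.5])
sdf = siren_forward(shape_siren, [0.1, 0.2, 0.3])
print(len(rgb), len(sdf))
```

In the pipeline described above, the flattened weights of networks like these are the objects that the hypernetwork, mapper, and autoencoder operate on.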
The honest framing: this pipeline is the prior work — referenced in the Topic-41 thesis as "our prior work / Topic 03". It established two things that the bigger archive then built on. One, that per-layer hypernetworks can preserve genus-1 topology when reconstructing per-shape MLP weights, while monolithic hypernets destroy topology even at MSE ~10⁻⁷. Two, that warm-starting all shape-SIRENs from a single anchor is necessary for coherent weight-space interpolation. Both findings turn out to be double-edged, as Topic 41 documents.
The repository is a numbered pipeline, 00_download_objaverse.py and
01_watertight.py through 80_train_shape_sirens.py, plus experiment
scripts that branch off it. The stages, in order: download Objaverse
objects; watertight-convert; render 24 views per object; train an
image-SIREN per view; train a per-object hypernetwork across the 24
image-SIRENs; train the shape-SIRENs. The design decisions live in the
two competing routes from hypernet weights to shape-SIREN weights.
| Route | Script | What it does |
|---|---|---|
| Direct mapper (baseline) | hypernet_to_shape_mapper.py | A mapper predicts all 264 K shape-SIREN weights directly from the hypernet weights. Topologically fragile — small weight errors destroy mesh structure |
| Latent autoencoder (current) | autoencoder_pipeline_n100_mlp.py | A weight-space autoencoder compresses to a 128-dim latent first; a tiny mapper targets the latent; the AE decoder reconstructs the weights. Rescues mesh quality |
| Scaling orchestrator | scale_to_n100.py | Orchestrates the scaling experiment to N = 100 objects |
| OOD test | ood_test_full.py | Out-of-distribution generalisation test on a held-out shape |
Configuration lives in configs/ (CFG.data,
CFG.shape_siren, …); the core modules (siren, hypernet,
render, watertight) live in src/. To reproduce the N = 100 run, set
CFG.data.num_objects = 100 and run the numbered scripts in sequence,
00_download_objaverse.py … 80_train_shape_sirens.py.
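A small helper can make the "run in sequence" step less error-prone. The sketch below is hypothetical and not part of the repository: it assumes the numbered scripts sit in one directory, follow an `NN_name.py` pattern, and read their settings from the config rather than from command-line flags. It only prints a plan and executes nothing.

```python
import re
from pathlib import Path

# Two-digit prefix, underscore, name, .py (assumed naming convention).
STAGE_PATTERN = re.compile(r"^(\d{2})_.+\.py$")


def ordered_stages(filenames):
    """Keep only NN_*.py names and sort them by numeric prefix."""
    numbered = []
    for name in filenames:
        match = STAGE_PATTERN.match(name)
        if match:
            numbered.append((int(match.group(1)), name))
    return [name for _, name in sorted(numbered)]


def print_plan(repo_root="."):
    """List the commands for a full run without executing anything."""
    names = [p.name for p in Path(repo_root).glob("*.py")]
    for name in ordered_stages(names):
        print(f"python {name}")


# Only script names that appear in this write-up are used here.
example = ordered_stages([
    "80_train_shape_sirens.py",
    "scale_to_n100.py",
    "01_watertight.py",
    "00_download_objaverse.py",
])
print(example)
```

Experiment scripts without a numeric prefix, such as scale_to_n100.py, are filtered out, which matches their description as branches off the main pipeline.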
Don't predict 264 K weights directly. Compress to 128 dims first.
The weight-space autoencoder is the rescue — and, later, the trap.
Predicting all 264 K shape-SIREN weights directly with a mapper is topologically fragile: an aggregate weight error that looks small can land on exactly the dimensions that control the SDF zero-crossing, and the mesh breaks. The weight-space autoencoder compresses the weights to a 128-dim latent that the mapper can actually hit. It works here. Topic 41 then shows, at larger scale, where and why the autoencoder rescue itself fails: a high cosine similarity between reconstructed and target weights is not the same thing as mesh quality.
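The fragility argument can be illustrated with a deliberately simple toy. In the sketch below an analytic torus SDF stands in for a shape-SIREN, and its "weight vector" is two radii followed by inert padding. This is an assumption made for illustration only: real SIREN weights are not interpretable like this, and the weight count of 1000 is arbitrary. Two perturbations with the same weight-space MSE are compared, one concentrated on a geometry-critical weight and one spread over weights that do not affect the surface.

```python
import math


def torus_sdf(point, major_radius, minor_radius):
    """Signed distance to a torus centred at the origin, axis along z."""
    x, y, z = point
    ring = math.hypot(x, y) - major_radius
    return math.hypot(ring, z) - minor_radius


def mse(a, b):
    """Mean squared error between two equal-length weight vectors."""
    return sum((u - v) ** 2 for u, v in zip(a, b)) / len(a)


def hole_is_open(weights):
    """Genus-1 check for this toy: the origin must lie outside the surface."""
    return torus_sdf((0.0, 0.0, 0.0), weights[0], weights[1]) > 0.0


N = 1000                                  # toy weight count, not 264 K
target = [1.0, 0.4] + [0.0] * (N - 2)     # [major, minor, inert padding...]

# Both perturbations spend the same total squared error of 0.49.
concentrated = list(target)
concentrated[1] += 0.7                    # all error on the minor radius

spread = list(target)
per_weight = math.sqrt(0.49 / (N - 2))
for i in range(2, N):
    spread[i] += per_weight               # same budget on inert weights

print(mse(concentrated, target), hole_is_open(concentrated))
print(mse(spread, target), hole_is_open(spread))
```

The concentrated error pushes the minor radius past the major radius and closes the hole, while the spread error leaves the surface untouched, even though the two MSE values agree. A mapper trained on aggregate weight error has no way to tell these cases apart.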
Per-layer hypernetworks preserve topology; monolithic ones do not. A per-layer hypernetwork — one that generates the weights of each layer of the shape-SIREN with a dedicated head — preserves genus-1 topology in the reconstructed mesh. A monolithic hypernetwork that emits the whole weight vector at once destroys topology even when the weight-space MSE is driven down to ~10⁻⁷. This is the first appearance of the thesis-line refrain that aggregate weight MSE is a poor proxy for mesh quality — a lesson Topic 41 then states as a hard rule.
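One piece of arithmetic behind the "aggregate MSE is a poor proxy" point can be sketched by scoring a flat weight vector layer by layer. The layer widths below are an assumption: the write-up does not state the shape-SIREN architecture, and `3 -> 256 -> 256 -> 256 -> 256 -> 256 -> 1` is simply one layout whose parameter count lands near 264 K. The example also places the error in the output layer purely for illustration; the source does not say where the monolithic hypernet's error concentrates.

```python
def layer_sizes(dims):
    """Parameter count (weights + biases) of each dense layer."""
    return [dims[i] * dims[i + 1] + dims[i + 1] for i in range(len(dims) - 1)]


def per_layer_mse(pred, target, sizes):
    """Split two flat weight vectors at layer boundaries and score each chunk."""
    scores, start = [], 0
    for size in sizes:
        chunk = range(start, start + size)
        scores.append(sum((pred[i] - target[i]) ** 2 for i in chunk) / size)
        start += size
    return scores


# Assumed layout, chosen only because it totals roughly 264 K parameters.
DIMS = [3, 256, 256, 256, 256, 256, 1]
SIZES = layer_sizes(DIMS)
TOTAL = sum(SIZES)                      # 264449

target = [0.0] * TOTAL
pred = list(target)
final_start = TOTAL - SIZES[-1]
for i in range(final_start, TOTAL):
    pred[i] += 0.01                     # error only in the 257-weight output layer

aggregate = sum((p - t) ** 2 for p, t in zip(pred, target)) / TOTAL
by_layer = per_layer_mse(pred, target, SIZES)
print(TOTAL, aggregate, by_layer[-1])
```

Under these assumptions the aggregate MSE is diluted by roughly a factor of a thousand relative to the MSE of the one layer that carries all the error. A per-layer view, whether in the loss or in the architecture via dedicated heads, keeps small but critical layers from disappearing into the average.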
Warm-starting from a shared anchor is necessary for coherent weight-space interpolation. If every shape-SIREN is trained independently, the resulting weight vectors live in arbitrary permutation neighbourhoods, and the line segment between two of them in weight space decodes to garbage. Warm-starting all shape-SIRENs from a single anchor keeps them in the same neighbourhood, so interpolation stays on the shape manifold. This pipeline depends on that property — and Topic 41's "warm-start dominance problem" is the discovery that the same property, at the scale of image-conditioned diffusion, is fatal: it concentrates the weight distribution into a thin shell where per-shape signal is unrecoverable.
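The permutation problem can be reproduced with a two-unit toy network. The sketch below is not the pipeline's code; the one-hidden-layer form and the specific weights are chosen so the effect is visible by hand. Swapping the two hidden units leaves the function unchanged but moves the weight vector, and the straight-line midpoint between the two copies is a different function entirely.

```python
import math


def tiny_siren(params, x):
    """One hidden layer of sine units: f(x) = sum_j v_j * sin(w_j * x + b_j)."""
    w, b, v = params
    return sum(vj * math.sin(wj * x + bj) for wj, bj, vj in zip(w, b, v))


def permute_hidden(params, order):
    """Reorder hidden units; the function is unchanged, the weight vector is not."""
    w, b, v = params
    return ([w[i] for i in order], [b[i] for i in order], [v[i] for i in order])


def lerp(params_a, params_b, t):
    """Straight-line interpolation in weight space."""
    return tuple(
        [(1 - t) * a + t * c for a, c in zip(group_a, group_b)]
        for group_a, group_b in zip(params_a, params_b)
    )


net_a = ([1.0, 3.0], [0.0, 0.0], [1.0, -1.0])
net_b = permute_hidden(net_a, [1, 0])   # same function, different neighbourhood
midpoint = lerp(net_a, net_b, 0.5)

x = 0.5
print(tiny_siren(net_a, x), tiny_siren(net_b, x), tiny_siren(midpoint, x))
```

Here the midpoint has both output weights equal to zero, so it evaluates to zero everywhere even though both endpoints compute the same non-trivial function. Independently trained shape-SIRENs can differ by such permutations, which is the failure that a shared warm-start anchor is meant to avoid.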
Figure (interactive): the two routes from hypernet to shape, the direct-mapper baseline (predict 264 K weights) versus the autoencoder route (compress to 128 dims first). Left pane: weight-error landing pattern. Right pane: resulting mesh quality. Direct prediction lands error on the topology-critical dimensions; the AE route does not.
White paper · the hypernet-to-shape weight-space pipeline · per-layer vs monolithic hypernets · the autoencoder rescue · the warm-start requirement · how it sets up the Topic-41 post-mortem