ad_jain@icloud.com · orcid.org/0009-0005-5534-5641 · +91 90326 87080
Index
Ref · Title · Abstract · Subject · Filed
WP — 023
From Weight-Space Generation to DeepSDF: A Twelve-Phase Image-to-3-D Research Archive, the Weight-Space Dominance Diagnostic, and an Image-Conditioned Latent-Diffusion Pipeline that Works at the 976-Shape Scale
Twelve-phase image-to-3-D project. The weight-space hypothesis (shapes are decoder weights) fails for image-conditioned generation: warm-started per-shape weights are ≥96% shared anchor, and the ≤4% per-shape signal is too thin for diffusion to extract, so mode collapse survives every ablation. The autoencoder rescue fails too (cos = 0.997, yet the meshes are broken). The DeepSDF pivot constructs a 64-dim latent rather than extracting one, and it works: perfect recall and category-appropriate OOD outputs at 976 shapes. The code, 26 GB dataset, live HF Space, and 30-page thesis are public.
Technical Report cs.CV · cs.GR · cs.LG Aaditya Jain
May 2026
WP — 024
Activation-Space Part Discovery in DeepSDF: Self-Supervised Segmentation by Probing a Trained Implicit Decoder, and an Honest Account of the Per-Part Reconstruction Attempts that Failed
Self-supervised part segmentation with no part labels, obtained by clustering the activations of a trained DeepSDF decoder as it is queried across a shape. On 9 CSG shapes, activation clustering recovers the constructive components (ARI reported per shape). The paper is equally an account of what did not work: four separate attempts to reconstruct individual parts as standalone SDFs, and why each failed.
Technical Report cs.CV · cs.GR · cs.LG Aaditya Jain
Apr 2026
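The per-shape ARI compares activation-cluster ids against ground-truth CSG component labels. A minimal stdlib sketch of the standard pair-counting Adjusted Rand Index follows; the function name and the toy labels are hypothetical illustrations, not the report's code.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_true, labels_pred):
    """Pair-counting Adjusted Rand Index between two flat labelings."""
    if len(labels_true) != len(labels_pred):
        raise ValueError("labelings must have the same length")
    n = len(labels_true)
    # Contingency cell counts n_ij, plus row sums a_i and column sums b_j.
    contingency = Counter(zip(labels_true, labels_pred))
    a = Counter(labels_true)
    b = Counter(labels_pred)
    sum_ij = sum(comb(c, 2) for c in contingency.values())
    sum_a = sum(comb(c, 2) for c in a.values())
    sum_b = sum(comb(c, 2) for c in b.values())
    total = comb(n, 2)
    expected = sum_a * sum_b / total if total else 0.0
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:  # degenerate case, e.g. both labelings trivial
        return 1.0
    return (sum_ij - expected) / (max_index - expected)


# Cluster ids are arbitrary: an identical partition under new ids scores 1.0.
parts = [0, 0, 0, 1, 1, 2, 2, 2]      # hypothetical CSG component per query point
clusters = [5, 5, 5, 9, 9, 7, 7, 7]   # hypothetical activation-cluster ids
print(adjusted_rand_index(parts, clusters))  # 1.0
```

Because ARI is corrected for chance and invariant to relabelling, the cluster ids never need to be matched to part ids by hand.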
WP — 025
Mini SDF-SRN: A Minimal From-Scratch Reimplementation of Single-View Neural SDF Recovery via a Differentiable Ray-Marching Renderer
A compact reimplementation of SDF-SRN — recovering a neural signed-distance field from single-view 2-D silhouette supervision through a differentiable ray-marching renderer, with no 3-D ground truth. Trained on synthetic primitives (sphere, box, ellipsoid); the report covers the renderer, the training schedule, novel-view quality, and the configuration that made it converge.
Technical Report cs.CV · cs.GR Aaditya Jain
Apr 2026
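The forward pass of the ray-marching renderer, plain sphere tracing against an SDF, can be sketched as below. The analytic sphere stands in for the learned SDF, and the step limit, tolerance and far bound are assumptions; the gradient path that the report's differentiable renderer needs is not shown.

```python
import math


def sphere_sdf(p, center=(0.0, 0.0, 0.0), radius=1.0):
    """Signed distance from point p to a sphere (negative inside)."""
    return math.dist(p, center) - radius


def sphere_trace(origin, direction, sdf, max_steps=64, eps=1e-5, t_max=10.0):
    """March along origin + t * direction, stepping by the SDF value.

    Returns the hit distance t, or None if the ray escapes or runs out of steps.
    `direction` is assumed to be unit length, so each step is safe for a true SDF.
    """
    t = 0.0
    for _ in range(max_steps):
        p = tuple(o + t * d for o, d in zip(origin, direction))
        dist = sdf(p)
        if dist < eps:
            return t
        t += dist
        if t > t_max:
            break
    return None


# A camera 3 units from a unit sphere hits its surface at t = 2.
print(sphere_trace((0.0, 0.0, -3.0), (0.0, 0.0, 1.0), sphere_sdf))  # 2.0
# A ray passing 2 units from the centre misses.
print(sphere_trace((0.0, 2.0, -3.0), (0.0, 0.0, 1.0), sphere_sdf))  # None
```

Stepping by the SDF value is safe because no surface can be closer than that distance, which is why silhouette supervision through such a renderer constrains the field without any 3-D ground truth.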
WP — 026
Flow-SDF: Rectified Flow for Neural SDF Reconstruction from 2-D Supervision, with the Noise-Scaling Diagnostic that Took Cosine Similarity from 0.08 to 0.95
A self-contained image-to-3-D pipeline (image conditioner, rectified-flow velocity network, SDF decoder, differentiable renderer) trained end-to-end on 2-D silhouette and RGB supervision with no 3-D data. The central diagnostic: 128-dim Gaussian noise has norm ~11.3 (≈ √128) against SDF latent codes of norm ~0.42, so the flow minimises MSE by shrinking magnitude rather than finding direction. Scaling the source noise to the target's std raised cosine similarity from 0.08 to 0.95. The component stack deliberately mirrors Hunyuan3D at small scale, as a scaling recipe.
Technical Report cs.CV · cs.GR · cs.LG Aaditya Jain
Apr 2026
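The norm mismatch follows from a d-dimensional standard Gaussian having norm close to √d. A minimal stdlib sketch of the check and of the fix (rescaling source noise to the targets' per-dimension std) is below; the synthetic "latent codes" are a hypothetical stand-in for trained SDF codes, with their std of 0.037 chosen only so the norm lands near the reported ~0.42.

```python
import math
import random
import statistics

DIM = 128
rng = random.Random(0)


def norm(v):
    """Euclidean norm of a vector given as a list of floats."""
    return math.sqrt(sum(x * x for x in v))


# Hypothetical stand-in for trained SDF latent codes (std chosen so |z| is ~0.42).
latent_std = 0.037
targets = [[rng.gauss(0.0, latent_std) for _ in range(DIM)] for _ in range(256)]

# Unit-variance source noise, as in an unscaled rectified-flow setup.
noise = [[rng.gauss(0.0, 1.0) for _ in range(DIM)] for _ in range(256)]

print(f"mean |noise|  = {statistics.mean(map(norm, noise)):.2f}")    # near sqrt(128) = 11.31
print(f"mean |target| = {statistics.mean(map(norm, targets)):.2f}")  # near 0.42

# Fix: rescale the source noise to the empirical per-dimension std of the targets.
target_std = statistics.pstdev([x for v in targets for x in v])
scaled = [[x * target_std for x in v] for v in noise]
print(f"mean |scaled| = {statistics.mean(map(norm, scaled)):.2f}")  # now matches the targets
```

With unscaled noise, almost all of the velocity target x₁ − x₀ is the noise term itself, so an MSE-trained flow can lower its loss by predicting magnitude; matching scales leaves direction as the quantity to learn.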
WP — 027
The Hypernet → Shape Pipeline: Per-Shape SIRENs, Per-Layer Hypernetworks, and a Weight-Space Autoencoder for Image-to-3-D — the Predecessor the Twelve-Phase Archive Diagnosed
The standalone weight-space pipeline that the Topic-41 archive grew out of. 24 images per object become 24 image-SIRENs, then one per-object hypernetwork; its weights route — via a tiny mapper through a 128-dim weight-space autoencoder — into shape-SIREN weights and a mesh. Two findings carried forward: per-layer hypernetworks preserve genus-1 topology where monolithic ones destroy it even at MSE ~10⁻⁷; and warm-starting all shape-SIRENs from a shared anchor is necessary for coherent weight-space interpolation — the same property the archive later finds fatal at scale.
Technical Report cs.CV · cs.GR · cs.LG Aaditya Jain
Apr 2026
WP — 012
Triplane Mechanics: Rendering, Storage, and the Mesh-Extraction Decoupling — A Decision Note Promoting Triplane to Universal Intermediate
Mechanics of triplane representations (3 axis-aligned planes, bilinear sample, MLP, volume render) and the three-way comparison against Gaussian splats and VDB / FVDB. Triplane wins as the universal intermediate on storage (6–12 MB), editability, and lossless convertibility to either alternative. Mesh extraction is a separate downstream step, not part of rendering.
Technical Note cs.GR · cs.LG Aaditya Jain
Jan 2026
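The first two mechanics steps, projecting a 3-D point onto the three axis-aligned planes and bilinearly sampling each, can be sketched in plain Python. The corner-aligned [-1, 1] coordinate convention, the summation used to aggregate the three features (concatenation is also common), and the toy 2×2 planes are assumptions; the MLP and volume-rendering stages are omitted.

```python
def bilinear(plane, u, v):
    """Bilinearly sample a grid of feature vectors at continuous (u, v) in [-1, 1].

    plane[row][col] is a list of C floats; u indexes columns, v indexes rows.
    -1 maps to the first texel and +1 to the last (corner-aligned convention).
    """
    h, w = len(plane), len(plane[0])
    x = min(max((u + 1.0) * 0.5 * (w - 1), 0.0), w - 1)
    y = min(max((v + 1.0) * 0.5 * (h - 1), 0.0), h - 1)
    x0, y0 = int(x), int(y)
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    fx, fy = x - x0, y - y0
    out = []
    for c in range(len(plane[0][0])):
        top = plane[y0][x0][c] * (1 - fx) + plane[y0][x1][c] * fx
        bot = plane[y1][x0][c] * (1 - fx) + plane[y1][x1][c] * fx
        out.append(top * (1 - fy) + bot * fy)
    return out


def triplane_feature(planes, p):
    """Sum of features sampled from the XY, XZ and YZ planes at p = (x, y, z)."""
    x, y, z = p
    f_xy = bilinear(planes["xy"], x, y)
    f_xz = bilinear(planes["xz"], x, z)
    f_yz = bilinear(planes["yz"], y, z)
    return [a + b + c for a, b, c in zip(f_xy, f_xz, f_yz)]


# Hypothetical 2x2 single-channel planes: XY ramps along x, XZ and YZ are zero.
ramp = [[[0.0], [1.0]], [[0.0], [1.0]]]
zero = [[[0.0], [0.0]], [[0.0], [0.0]]]
planes = {"xy": ramp, "xz": zero, "yz": zero}
print(triplane_feature(planes, (0.0, 0.0, 0.5)))  # [0.5]
```

Storage scales with three 2-D grids rather than one 3-D grid, which is the source of the compact footprint the note compares against splats and VDB.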
WP — 013
Axis-Aligned Distance Fields: An Analysis of UODF (Lu et al., CVPR 2024) and Its Relationship to the Thesis-Line Six-Plane Mesh and Hexplane Representations
UODF stores three axis-aligned unsigned distance fields instead of a single SDF, which makes surface extraction interpolation-free. It reports a 20–100× quality gain on open surfaces, and its triplane variant runs at 30–80 M points/s. The thesis-line Six-Plane Mesh and Hexplane AE use the same axis-aligned principle implicitly; UODF provides the theoretical justification.
Technical Analysis cs.GR · cs.LG Aaditya Jain
Jan 2026
WP — 014
Gaussian Splats vs VDB for Single-Image-to-3-D: An Architecture Survey Across Splatter Image, GS-LRM, Triplane-Meets-Gaussian, and Gamba
Four-method survey (Splatter Image, GS-LRM, Triplane-Meets-Gaussian, Gamba) plus a three-way comparison (Gaussian splats vs VDB vs triplane). Triplane wins as the universal intermediate. Gamba (Mamba over Gaussian-sequence tokens) is published validation of the MambaFlow3D premise that Mamba can substitute for attention over sparse-cube tokens.
Architecture Survey cs.GR · cs.LG Aaditya Jain
Dec 2025
WP — 015
Manifold-Aware Diffusion Targets: An Analysis of Li & He's "Back to Basics" x-Prediction Result and Its Extension to 3-D Geometric Representations
Summary of Li & He's x-prediction manifold-hypothesis argument, and an extension showing that the case is structurally stronger for 3-D geometry than for natural images. SDFs (eikonal constraint), sparse voxels (surface-band coherence), and triplanes all carry algebraic constraints that ε-prediction destroys. Settles on x- or v-prediction for all thesis-line diffusion work.
Technical Analysis cs.LG · cs.CV · cs.GR Aaditya Jain
Nov 2025
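Under the standard forward process x_t = √ᾱ·x₀ + √(1−ᾱ)·ε, the ε-, x- and v-targets are related by fixed linear maps, so the choice changes what the network must output rather than what is recoverable in principle. The sketch below uses the common convention v = √ᾱ·ε − √(1−ᾱ)·x₀ (an assumption, not notation from the note) and shows one concrete consequence: at a high-noise step, an error in predicted ε is amplified by √((1−ᾱ)/ᾱ) when converted back to x₀, pushing samples off the data manifold.

```python
import math
import random


def coeffs(alpha_bar):
    """Signal and noise scales sqrt(alpha_bar), sqrt(1 - alpha_bar)."""
    return math.sqrt(alpha_bar), math.sqrt(1.0 - alpha_bar)


def forward_noise(x0, eps, alpha_bar):
    a, s = coeffs(alpha_bar)
    return [a * x + s * e for x, e in zip(x0, eps)]


def x0_from_eps(xt, eps_hat, alpha_bar):
    a, s = coeffs(alpha_bar)
    return [(x - s * e) / a for x, e in zip(xt, eps_hat)]


def v_target(x0, eps, alpha_bar):
    a, s = coeffs(alpha_bar)
    return [a * e - s * x for x, e in zip(x0, eps)]


def x0_from_v(xt, v_hat, alpha_bar):
    a, s = coeffs(alpha_bar)
    return [a * x - s * v for x, v in zip(xt, v_hat)]


rng = random.Random(0)
x0 = [rng.uniform(-1.0, 1.0) for _ in range(8)]  # stand-in for clean SDF values
eps = [rng.gauss(0.0, 1.0) for _ in range(8)]
alpha_bar = 0.05                                 # a high-noise timestep
xt = forward_noise(x0, eps, alpha_bar)

# Exact predictions: every parameterisation recovers the same x0.
print(max(abs(a - b) for a, b in zip(x0, x0_from_eps(xt, eps, alpha_bar))))
print(max(abs(a - b) for a, b in zip(x0, x0_from_v(xt, v_target(x0, eps, alpha_bar), alpha_bar))))

# A uniform 0.1 error in eps becomes a 0.1 * sqrt(19) ~ 0.44 shift in every x0 value.
x0_bad = x0_from_eps(xt, [e + 0.1 for e in eps], alpha_bar)
print(max(abs(a - b) for a, b in zip(x0, x0_bad)))
```

An x-prediction network makes the same 0.1 error directly in x₀, without the amplification.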
WP — 016
Diffusion Generator over Houdini Bridge Polylines: An Architecture Design Specification Lifting PGN's Seq2seq Head into Flow-Matching over a Padded-with-Mask Polyline Tensor
Design specification for a flow-matching polyline generator that replaces PGN's autoregressive seq2seq DSL head. Padded fixed-length tensor with explicit length mask; Pure-Mamba backbone; flow-matching velocity prediction; added-embedding attribute conditioning. Three open architectural questions enumerated for the first training run.
Design Specification cs.LG · cs.GR Aaditya Jain
Nov 2025
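The padded fixed-length tensor with an explicit length mask implies a masked flow-matching loss in which padded slots contribute nothing to the velocity MSE. A minimal sketch follows; the maximum length, 2-D vertex layout, zero padding, and the linear path x_t = (1 − t)·x₀ + t·x₁ (velocity x₁ − x₀) are assumptions for illustration, not details fixed by the specification.

```python
MAX_LEN = 6  # hypothetical fixed polyline length


def pad_polyline(points, max_len=MAX_LEN):
    """Pad a list of (x, y) vertices to max_len; return (tensor, mask)."""
    if len(points) > max_len:
        raise ValueError("polyline longer than max_len")
    pad = max_len - len(points)
    tensor = [list(p) for p in points] + [[0.0, 0.0] for _ in range(pad)]
    mask = [1.0] * len(points) + [0.0] * pad
    return tensor, mask


def masked_fm_loss(pred_velocity, x0, x1, mask):
    """MSE between predicted and target velocity (x1 - x0), real vertices only."""
    total, count = 0.0, 0.0
    for v_hat, a, b, m in zip(pred_velocity, x0, x1, mask):
        for vh, ai, bi in zip(v_hat, a, b):
            total += m * (vh - (bi - ai)) ** 2
        count += m * len(v_hat)
    return total / count


bridge, mask = pad_polyline([(0.0, 0.0), (1.0, 0.5), (2.0, 0.0)])
noise = [[0.1, -0.2] for _ in range(MAX_LEN)]  # source sample x0
target_v = [[b - a for a, b in zip(n, p)] for n, p in zip(noise, bridge)]
# Garbage predictions in the padded slots do not affect the loss.
pred = target_v[:3] + [[99.0, 99.0] for _ in range(MAX_LEN - 3)]
print(masked_fm_loss(pred, noise, bridge, mask))  # 0.0
```

Normalising by the number of real coordinates, rather than by MAX_LEN, keeps short and long polylines on the same loss scale.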
WP — 017
Image-to-3-D With Parametric Output: A Field-Gap Survey of Five 3-D-Generation Methods (CAPRI-Net, BrepGen, HoLa, SparC3D, TRELLIS) and the Thesis-Line Opportunity
Five-method survey identifies a structural field gap: no published method combines image input with parametric (procedural-CAD) output. The field bifurcates into image-input + raw-geometry-output (SparC3D, TRELLIS) and latent-input + parametric-output (CAPRI-Net, BrepGen, HoLa). The intersection is the thesis-line opportunity, operationalised by PGN and SculptNet.
Survey cs.GR · cs.CV · cs.LG Aaditya Jain
Oct 2025
WP — 018
A Character-Level Transformer From First Principles: Implementation, Attention-Pattern Inspection, and the Context-Dependent-Representation Pedagogical Result
A from-first-principles, pure-NumPy implementation of a character-level transformer block (multi-head causal self-attention, positional encoding, FFN, LayerNorm, residuals). On "hello world", the three occurrences of the letter 'l' produce three distinct output vectors, the operational signature of context-dependent representation. Foundation study for the thesis-line use of transformers.
Technical Note cs.LG Aaditya Jain
Sep 2025
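The context-dependence result can be reproduced in miniature. In the stdlib sketch below, the three 'l' tokens share one token embedding but sit at different positions and attend over different prefixes, so their outputs differ. The width, random embeddings, single head with identity Q/K/V projections, and the absence of FFN and LayerNorm are all simplifying assumptions, not the note's NumPy block.

```python
import math
import random

D = 8  # hypothetical model width


def positional_encoding(pos, d=D):
    """Sinusoidal encoding: sin on even dimensions, cos on odd ones."""
    return [math.sin(pos / 10000 ** (i / d)) if i % 2 == 0
            else math.cos(pos / 10000 ** ((i - 1) / d))
            for i in range(d)]


def causal_self_attention(xs):
    """One head with identity Q/K/V projections; position t sees positions 0..t."""
    out = []
    for t, q in enumerate(xs):
        visible = xs[: t + 1]  # causal mask: no access to later tokens
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(D) for k in visible]
        m = max(scores)  # subtract the max for a numerically stable softmax
        weights = [math.exp(s - m) for s in scores]
        z = sum(weights)
        out.append([sum(w * v[i] for w, v in zip(weights, visible)) / z for i in range(D)])
    return out


rng = random.Random(0)
text = "hello world"
embed = {ch: [rng.gauss(0.0, 1.0) for _ in range(D)] for ch in sorted(set(text))}
inputs = [[e + p for e, p in zip(embed[ch], positional_encoding(pos))]
          for pos, ch in enumerate(text)]
outputs = causal_self_attention(inputs)

l_positions = [i for i, ch in enumerate(text) if ch == "l"]  # [2, 3, 9]
for i in l_positions:
    # Same token embedding, different position and prefix: different output.
    print(i, [round(x, 3) for x in outputs[i][:4]])
```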
WP — 019
Latent Diffusion Models, Conditional Diffusion, and Latent Consistency Models: A Study Note on the Stable-Diffusion-Class Architecture and Its Implications for 3-D-Diffusion
Study of the LDM (Stable Diffusion-class) decomposition: frozen VAE compressor + trained U-Net denoiser + frozen text encoder + scheduler. LCM distillation gives a 25× sampling speed-up. The same VAE/U-Net decoupling structures the thesis-line 3-D-diffusion work: the triplane encoder is the VAE analogue, and the Mamba block is the U-Net analogue.
Technical Note cs.LG · cs.CV Aaditya Jain
May 2025
WP — 020
Real-Time Single-Phone 3-D Reconstruction via Monocular Depth + Neural SLAM + Browser-Rendered TSDF: A System Design Pitch (WTFund, Not Funded) and Its Influence on the Thesis-Line Consumer-Hardware Constraint
System design pitch for real-time 3-D reconstruction (DPT depth → NICER-SLAM pose → TSDF fusion → Rerun.io browser viewer; 10 fps end-to-end target). WTFund ₹20 L grant application declined ("PoC needed"). The pitch's "real-time, on a laptop, no server" framing is the consumer-hardware throughline that shapes every later thesis-line architecture decision.
System Design Pitch cs.CV · cs.RO Aaditya Jain
Mar 2025
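The TSDF fusion stage is commonly a per-voxel weighted running average of truncated signed distances (the Curless-Levoy-style formulation). A minimal sketch of that update for a single voxel is below; the truncation distance, unit per-frame weight, weight cap, and the sign convention (positive in front of the observed surface) are assumptions rather than values from the pitch.

```python
TRUNC = 0.05     # hypothetical truncation distance (metres)
MAX_WEIGHT = 64  # hypothetical cap so the map can still adapt to change


def tsdf_update(tsdf, weight, sdf_obs, obs_weight=1.0):
    """Fuse one signed-distance observation into a voxel's running average.

    sdf_obs is (observed depth - voxel depth) along the camera ray, so it is
    positive in front of the surface; voxels far behind it are skipped as occluded.
    """
    if sdf_obs < -TRUNC:
        return tsdf, weight
    d = min(1.0, sdf_obs / TRUNC)  # truncate and normalise to [-1, 1]
    new_weight = weight + obs_weight
    new_tsdf = (tsdf * weight + d * obs_weight) / new_weight
    return new_tsdf, min(new_weight, MAX_WEIGHT)


# Three noisy depth frames seeing a voxel about 1 cm in front of the surface.
tsdf, w = 0.0, 0.0
for obs in (0.012, 0.008, 0.010):
    tsdf, w = tsdf_update(tsdf, w, obs)
print(round(tsdf, 3), w)  # 0.2 3.0
```

Each update is O(1) per voxel and independent across voxels, which is what makes fusion cheap enough for the real-time, laptop-only target.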
WP — 021
A Minimal DDPM From First Principles: Single-Image Training on a 16×16 Red Square as Architecture-Literacy Investment for the Thesis-Line Diffusion Work
From-scratch DDPM: 3 M-param MLP denoiser with sinusoidal timestep embedding, 100-step linear β schedule, ε-prediction MSE loss, and single-image training on a 16×16 red square. Trains in ~5 min on an M2 CPU; reverse-pass reconstruction MSE < 0.01. The 0.8 s sampling cost on this toy motivated the switch to flow matching in later thesis-line work.
Technical Note cs.LG Aaditya Jain
Feb 2025
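The 100-step linear β schedule, the closed-form forward process q(x_t | x₀) that produces ε-prediction targets, and a sinusoidal timestep embedding can be sketched as follows. The endpoints β₁ = 1e-4 and β_T = 0.02 (common DDPM defaults), the 16-dim embedding, and the fully red 16×16 RGB image scaled to [-1, 1] are assumptions; the MLP denoiser itself is left out.

```python
import math
import random

T = 100
BETA_START, BETA_END = 1e-4, 0.02  # assumed DDPM-style endpoints

betas = [BETA_START + (BETA_END - BETA_START) * i / (T - 1) for i in range(T)]
alpha_bars = []  # cumulative products of (1 - beta)
prod = 1.0
for b in betas:
    prod *= 1.0 - b
    alpha_bars.append(prod)


def timestep_embedding(t, dim=16):
    """Sinusoidal embedding of an integer timestep (half sin, half cos)."""
    half = dim // 2
    freqs = [math.exp(-math.log(10000.0) * i / half) for i in range(half)]
    return [math.sin(t * f) for f in freqs] + [math.cos(t * f) for f in freqs]


def q_sample(x0, t, rng):
    """Draw x_t ~ q(x_t | x_0) in closed form; return (x_t, eps) for the MSE target."""
    a = math.sqrt(alpha_bars[t])
    s = math.sqrt(1.0 - alpha_bars[t])
    eps = [rng.gauss(0.0, 1.0) for _ in x0]
    return [a * x + s * e for x, e in zip(x0, eps)], eps


# Fully red 16x16 RGB image, flattened and scaled to [-1, 1].
red_square = [1.0, -1.0, -1.0] * (16 * 16)
rng = random.Random(0)
t = rng.randrange(T)
x_t, eps = q_sample(red_square, t, rng)
t_emb = timestep_embedding(t)
# Training step (not shown): loss = mean((model(x_t, t_emb) - eps) ** 2).
print(len(x_t), t, round(alpha_bars[-1], 4))
```

Sampling must walk all T reverse steps through the denoiser, which is where the per-sample cost on even this toy comes from.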
WP — 022
Signed Distance Fields as a Foundational 3-D Representation: Analytic SDFs, Comparison Against Point Clouds and Meshes, and a Brief Exploration of GAN-Based SDF Generation
Foundational SDF study: a structured comparison (point cloud vs mesh vs SDF), analytic sphere / cuboid SDFs with closed-form CSG operators (union = min, intersection = max, difference = max(a, −b)), and a brief, later-abandoned GAN-SDF exploration. Decisions made here propagate forward: the Hexplane AE's continuous-feature pivot, the preference for diffusion over GANs, and the UODF cross-reference.
Foundation Study cs.GR · cs.LG Aaditya Jain
Feb 2025
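The analytic SDFs and closed-form CSG operators above are short enough to write out directly. The sketch uses the exact cuboid SDF (correct inside and outside); the carved-cube shape and query points are illustrative.

```python
import math


def sphere(p, center, r):
    """Exact SDF of a sphere."""
    return math.dist(p, center) - r


def box(p, center, half):
    """Exact SDF of an axis-aligned cuboid with half-extents `half`."""
    q = [abs(pi - ci) - hi for pi, ci, hi in zip(p, center, half)]
    outside = math.sqrt(sum(max(qi, 0.0) ** 2 for qi in q))
    inside = min(max(q), 0.0)
    return outside + inside


def union(a, b):
    return min(a, b)


def intersection(a, b):
    return max(a, b)


def difference(a, b):
    return max(a, -b)


def carved(p):
    """A unit cube with a sphere of radius 0.6 removed from its centre."""
    return difference(box(p, (0, 0, 0), (0.5, 0.5, 0.5)), sphere(p, (0, 0, 0), 0.6))


print(carved((0.0, 0.0, 0.0)))     # 0.6: the centre is now empty space
print(carved((0.45, 0.45, 0.45)))  # negative: the cube's corners survive
```

The min/max compositions preserve the sign and the zero level set exactly, but away from the surface they give only a bound on the true distance, not an exact SDF.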
WP — 011
Flow Matching Backbone Validation on MNIST: A Three-Way Comparison of Pure Mamba, Pure Transformer, and Hybrid Mamba+Attention Under Matched Parameter Count
Matched-parameter MNIST validation across three sequence backbones with an identical FM head and training schedule. Hybrid wins on loss (0.080), but Pure Mamba ties on visual sample quality and is 1.44× faster per step. Pure Transformer still produces noise at the 50-epoch budget. The decision rule, set ex ante, picks Pure Mamba, the backbone choice that carries into MambaFlow3D.
Technical Report cs.LG Aaditya Jain
Nov 2025
WP — 010
MambaFlow3D: A Pure-Mamba + Latent-Flow-Matching Architecture for Single-Image 3-D Generation — Spec, Speed-up Budget, and ModelNet10 Phase-2 Validation
Architecture spec for single-image-to-3-D substituting Pure-Mamba state-space blocks for transformer attention (vs SparC3D / TRELLIS) and flow matching for DDPM sampling. PointNet++ → 10 Mamba blocks (d_model=256, d_state=128) → FM head → FoldingNet, 7.25 M parameters. Speed-up budget: 2–3× training, 5–12× end-to-end inference. ModelNet10 Phase-2 bring-up with the PointNet++ +3-channel trap diagnostic.
Technical Report cs.LG · cs.GR Aaditya Jain
Nov 2025
WP — 009
Training JiT Diffusion on Two Consumer GPUs: Hardware Adaptation, Debugging Cascade, and Phase-1 Reproduction of ViT-Backbone x-Prediction Diffusion at ImageNet-256
Reproduction of LTH14/JiT (ViT-B/16 + x-prediction, ImageNet-256) on 2 × RTX 3060 12 GB. Documents the hardware-adaptation table (1024 → 32 effective batch, 80 % → 55 % memory, ~1 h → ~6 h per epoch) and a five-failure debugging cascade (MKL ITT, ImageNet flat dirs, "hang" diagnostic, DataLoader OOM, pos_embed shape mismatch). Epoch-0 FID is 281.24; the run crashed in the sampling pass at the start of epoch 1.
Technical Report cs.LG · cs.CV Aaditya Jain
Nov 2025
WP — 008
When Variational Autoencoders Meet Binary Geometry: Posterior Collapse on 6-Plane Hexplane Representations and the Continuous-Feature Fix
Diagnoses a structural failure mode of VAEs trained on binary hexplane occupancy: the continuous-Gaussian distributional assumption is violated, the reconstruction term converges to a degenerate mean solution, and the posterior collapses regardless of KL schedule. The fix lies in the input representation, not the loss schedule: pivoting to a deterministic AE with continuous depth + normals features eliminates the failure.
arXiv Preprint cs.LG · cs.GR Aaditya Jain
Dec 2025
WP — 007
SculptNet: Learning Coarse-to-Fine 3D Reconstruction from Single Images via Five-Primitive Vocabularies and Progressive Stage-Wise Commitment
Four-stage coarse-to-fine pipeline (block → shape → detail → compose) over five named primitives (box, cylinder, cone, sphere, wedge) with independent face/cap deformation. Trained on PartNeXt via a Houdini Python SOP geometric classifier. ~1.3 cm Hausdorff on the chair benchmark (2.6% of the bounding-box diagonal). No executor gap: the primitives are continuous parametric geometry.
arXiv Preprint cs.GR · cs.CV Aaditya Jain
Feb 2026
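The Hausdorff figure is the worst-case nearest-neighbour distance between predicted and reference surfaces, reported in centimetres and as a fraction of the bounding-box diagonal (1.3 cm at 2.6% implies a diagonal of about 50 cm). A brute-force sketch over sampled point sets follows; how the benchmark samples its surfaces is not stated, and the toy points are hypothetical.

```python
import math


def directed_hausdorff(a, b):
    """Max over points in a of the distance to the nearest point in b."""
    return max(min(math.dist(p, q) for q in b) for p in a)


def hausdorff(a, b):
    """Symmetric Hausdorff distance between two finite point sets."""
    return max(directed_hausdorff(a, b), directed_hausdorff(b, a))


def bbox_diagonal(points):
    """Length of the diagonal of the axis-aligned bounding box of `points`."""
    lo = [min(c) for c in zip(*points)]
    hi = [max(c) for c in zip(*points)]
    return math.dist(lo, hi)


reference = [(0, 0, 0), (30, 0, 0), (30, 40, 0), (0, 40, 0)]   # hypothetical samples, cm
predicted = [(0, 0, 1.3), (30, 0, 0), (30, 40, 0), (0, 40, 0)]
h = hausdorff(predicted, reference)
print(h, h / bbox_diagonal(reference))
```

Because it takes a maximum, a single misplaced region dominates the score, which makes Hausdorff a stricter fidelity measure than mean Chamfer distance.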
WP — 006
Hierarchical Part-Based Triplane Reconstruction: Eliminating Inter-Part Occlusion via Per-Part Local Frames and a Shared Decoder
N + 1 triplane sets: one per semantic part in a local frame, plus one coarser global set for spatial context. A shared SDF decoder is conditioned on a part-id embedding. This is a structural fix to the EG3D / SparC3D inter-part occlusion failure mode. Costs ~6× storage and ~2× inference; Hausdorff error on the chair benchmark falls from 4.8% to 1.9%.
arXiv Preprint cs.GR · cs.LG Aaditya Jain
Feb 2026
WP — 005
Six-Plane Orthographic Mesh Reconstruction: From Dense Depth Pixels to Watertight Triangulated Geometry via Per-Cluster Polygon-with-Holes Triangulation
Inverts six axis-aligned orthographic depth maps into a single watertight 3D triangle mesh. A six-stage pipeline (foreground → cluster → contour → simplify → triangulate → stitch) compresses 352K input pixels to ~454 vertices on the sphere benchmark; the minimal-polygon vs cloth-grid trade-off is characterised.
Technical Report cs.GR · cs.CG Aaditya Jain
Feb 2026
WP — 004
Six-Plane Elevation Reconstruction: Watertight 3D Building Geometry from a Single Street-View Photograph
Routes a single street-view photo through six orthographic elevations → marching-squares contours → depth clustering → earcut triangulation → watertight stitched mesh, compressing 352K source pixels to ~454 vertices / ~332 triangles.
arXiv Preprint cs.GR · cs.CV Aaditya Jain
Mar 2026
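The earcut stage triangulates each depth cluster's contour polygon. The core of ear clipping, for a simple counter-clockwise polygon without holes, can be sketched as below; the production earcut library additionally bridges holes and handles degeneracies, which this sketch does not, and the L-shaped contour is a hypothetical example.

```python
def cross(o, a, b):
    """z-component of (a - o) x (b - o); positive for a left (CCW) turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def point_in_triangle(p, a, b, c):
    """True if p lies inside or on the boundary of CCW triangle abc."""
    return cross(a, b, p) >= 0 and cross(b, c, p) >= 0 and cross(c, a, p) >= 0


def ear_clip(polygon):
    """Triangulate a simple polygon given as CCW (x, y) vertices.

    Returns triangles as index triples into `polygon`.
    """
    idx = list(range(len(polygon)))
    triangles = []
    while len(idx) > 3:
        n = len(idx)
        for k in range(n):
            i_prev, i, i_next = idx[k - 1], idx[k], idx[(k + 1) % n]
            a, b, c = polygon[i_prev], polygon[i], polygon[i_next]
            if cross(a, b, c) <= 0:
                continue  # reflex or collinear vertex: not an ear
            if any(point_in_triangle(polygon[j], a, b, c)
                   for j in idx if j not in (i_prev, i, i_next)):
                continue  # another vertex inside: clipping would cross an edge
            triangles.append((i_prev, i, i_next))
            del idx[k]  # clip the ear and rescan the shrunken polygon
            break
        else:
            raise ValueError("no ear found; polygon may not be simple or CCW")
    triangles.append(tuple(idx))
    return triangles


# Hypothetical L-shaped contour from one depth cluster (CCW, area 3).
l_shape = [(0, 0), (2, 0), (2, 1), (1, 1), (1, 2), (0, 2)]
print(ear_clip(l_shape))  # n - 2 = 4 triangles for 6 vertices
```

A simple polygon with n vertices always yields n − 2 triangles, so the triangle count is set by how aggressively the contour stage simplifies, not by the source pixel count.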
WP — 001
PGN: A Transformer-Based Procedural Generator Network for 3D Bridge Synthesis from Polyline Semantic Attributes
Seq2seq transformer mapping polylines with semantic boundary attributes to executable DSL programs that construct USD bridge scenes. 15-pair training corpus, dual-loss curriculum, analysis of the non-differentiable executor gap.
arXiv Preprint cs.GR · cs.LG Aaditya Jain
Sep 2025
WP — 002
SketchProc3D: CNN-Based Grammar Snippet Recognition for Inverse Procedural Modeling of Building Facades from Freehand Sketches
CNN system mapping freehand building sketches to CityEngine CGA grammar programs. 95–99% accuracy on synthetic data; domain-gap analysis between synthetic edges and human sketches; differentiable rendering gradient analysis.
arXiv Preprint cs.GR · cs.CV Aaditya Jain
Oct 2025
WP — 003
Graph Grammars for Automatic 3D Procedural Modeling: Implementing Merrell's Boundary String Method with Neural Rule Prediction
Re-implementation of Merrell's graph grammar from scratch — half-edge boundary strings, 3D extension, Python prototype generating chairs / tables with detected symmetry rules. Path toward neural rule prediction outlined.
arXiv Preprint cs.GR · cs.LG Aaditya Jain
Oct 2025