Research Timeline — Aditya Jain / Apple Maps
Topic 36 · Feb 2026 · 3D Reconstruction · Computational Geometry

Sphere Depth Maps from Cube Faces.

A scratch-built pipeline that renders six orthographic depth maps from a 3D mesh — one per cube face — and reconstructs the silhouette as 3D polylines exported to OBJ. The geometry insight matters more than the code: a sphere's silhouette doesn't sit on the face plane; it sits at the equator.

00 — Motivation

A validation harness with a known-analytic ground truth.

The downstream Six-Plane Mesh Reconstruction pipeline (Topic 35) consumes six orthographic depth maps and emits a watertight mesh. The deployment target is Building Elevation Reconstruction (Topic 40) where those depth maps come from a generative view-synthesis frontend running on street-view photographs. Before any of that ML machinery is connected, the depth-map-to-mesh pipeline needs an independent stress test against inputs where the correct answer is mathematically known.

A sphere centred in the unit cube gives that ground truth. Every depth map is a radially-symmetric disc. The silhouette is the equatorial great circle. The pipeline should recover three perpendicular great circles passing through the origin — anything else means a bug in the reconstruction logic, the rendering, or the lift-to-3D stage. The sphere lets every stage be cross-checked against an analytic answer before any neural component or real-world image is in the loop.

The same scratch-built pipeline generalises to cube, cylinder, torus, and arbitrary OBJ meshes — each adds its own failure modes (cap rims on the cylinder, hole topology on the torus, concave silhouettes on L-shapes) that drive specific refinements to the rendering and lift stages. The sphere just goes first because its ground truth has no degrees of freedom.

What it feeds
The output is consumed directly by Topic 35 as the canonical synthetic test input. The contour-extraction algorithms developed here (subpixel marching squares, corner-preserving smoothing) are reused unchanged in Topic 40's architectural pipeline.
01 — Premise

Six orthographic views, six depth maps, one 3D wireframe.

The downstream Six-Plane Mesh Reconstruction pipeline (Topic 35) and Building Elevation Reconstruction (Topic 40) both consume six axis-aligned orthographic depth maps as input. The question this topic answers: how do you generate those depth maps from a given mesh, and how do you invert them back into a 3D wireframe to verify the pipeline is correct?

The validation target is a sphere centred in the unit cube. For a sphere we know the analytic answer: every depth map should be a radially-symmetric disc, and the reconstructed wireframe should be three perpendicular great circles passing through the sphere's centre. If the pipeline can't recover those three great circles, it has a bug. If it can, the pipeline transfers to harder shapes — cube, cylinder, torus, and ultimately arbitrary OBJ meshes.

02 — Pipeline

Five stages: render · extract · lift · stitch · export.

[Figure: pipeline diagram. INPUT: OBJ mesh, trimesh.load, centred + scaled → ORTHOGRAPHIC RAYCAST: 6 views · ±X ±Y ±Z · analytic intersection · 512² grid → MARCHING SQUARES: find_contours(level=0.01) · subpixel · 1025 pts/face → LIFT TO 3D: silhouette = equator · depth-extrapolated lift → CORNER-PRESERVING SMOOTH: turning-angle peaks · split · smooth · re-join → OUTPUT: .obj 3D polylines, original scale restored. Shape-agnostic pipeline: works on sphere, cube, cylinder, torus, arbitrary OBJ.]
Figure 1 — Five processing stages. Stage 3 (Lift to 3D) is the conceptually hard one: the silhouette extracted in 2D image space does not live on the cube face — it lives at the equatorial position of the shape inside the cube. Stages 1 + 2 are straightforward; stage 3 is where the wrong implementation reproduces six disconnected circles on the cube walls instead of one coherent 3D wireframe.
Core Insight

The silhouette lives at the equator.
Not on the face.

A sphere's silhouette under orthographic projection is the equatorial circle. Lifting the 2D contour onto the cube's face plane puts six disconnected circles at x = ±1, y = ±1, z = ±1 — wrong. The contour must be lifted to the surface at the limit of zero inward distance: a single great circle at the sphere's centre for each pair of opposite faces. Six views, three coincident great circles, one watertight wireframe.

03 — Stage 1 · Orthographic Raycast Rendering

Analytic ray intersection beats numerical for known shapes.

For each of the six cube faces, parallel rays are cast inward from a 512 × 512 grid covering that face. Ray-shape intersection is solved analytically where possible — for a sphere of radius r centred at the origin:

|o + t d|² = r²  →  t² |d|² + 2t (o · d) + |o|² − r² = 0  →  a t² + b t + c = 0  (with a = |d|² = 1 for unit directions, b = 2 (o · d), c = |o|² − r²)

The smaller positive root is the near-surface intersection; the larger is the far-surface intersection. For a cube, the intersection is a per-axis slab test. For a cylinder, it's a quadratic in the lateral coordinates plus a plane test for the caps. For a torus, no closed-form is fast enough, so we fall back to sphere-marching on the SDF — march along the ray in steps equal to the current SDF value until convergence.
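The quadratic above can be sketched in a few lines of numpy. This is an illustrative implementation, not the production code; the function name `ray_sphere_t` is mine:

```python
import numpy as np

def ray_sphere_t(origin, direction, radius=0.5):
    """Near/far ray parameters for a unit-direction ray against a sphere
    centred at the origin; returns None on a miss."""
    o = np.asarray(origin, dtype=float)
    d = np.asarray(direction, dtype=float)
    b = 2.0 * np.dot(o, d)              # a = |d|^2 = 1 for unit directions
    c = np.dot(o, o) - radius ** 2
    disc = b * b - 4.0 * c
    if disc < 0:
        return None                     # ray misses the sphere
    s = np.sqrt(disc)
    return (-b - s) / 2.0, (-b + s) / 2.0   # (t_near, t_far)

# Ray from the centre of the +Z face, pointing inward:
t_near, t_far = ray_sphere_t([0.0, 0.0, 1.0], [0.0, 0.0, -1.0])
# For r = 0.5 this hits the front surface at z = 0.5 (t = 0.5)
# and the back surface at z = -0.5 (t = 1.5).
```

The smaller root is the near-surface depth that goes into the depth map; the larger root is the far surface used later for thickness reasoning.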

Shape              | Intersection method                     | Why
Sphere             | Analytic (quadratic, 2 roots)           | Single quadratic in t — exact, fast
Cube               | Slab test (per-axis)                    | Six plane intersections, take innermost / outermost
Cylinder           | Analytic (quadratic in XZ + cap planes) | Curved wall is quadratic, caps are planes — combine both
Torus              | Sphere-marching on SDF                  | Quartic polynomial otherwise — SDF march is faster and robust
Arbitrary OBJ mesh | BVH ray-cast via trimesh.ray            | No analytic form — bounding-volume hierarchy is the production choice
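The cube row deserves a sketch too, since the slab test is the one non-quadratic analytic case. A minimal version, assuming an axis-aligned cube centred at the origin (the name `ray_box_t` and the half-extent default are illustrative):

```python
import numpy as np

def ray_box_t(origin, direction, half=0.5):
    """Per-axis slab test against the cube [-half, half]^3.
    Returns (t_near, t_far) for a unit-direction ray, or None on a miss."""
    o = np.asarray(origin, dtype=float)
    d = np.asarray(direction, dtype=float)
    with np.errstate(divide='ignore'):
        inv = 1.0 / d                       # inf on axis-parallel components
    t0 = (-half - o) * inv
    t1 = ( half - o) * inv
    t_near = np.max(np.minimum(t0, t1))     # latest entry across the three slabs
    t_far  = np.min(np.maximum(t0, t1))     # earliest exit
    if t_near > t_far or t_far < 0:
        return None
    return t_near, t_far

# Inward ray from the centre of the +Z face of a unit cube:
hit = ray_box_t([0.0, 0.0, 1.0], [0.0, 0.0, -1.0])
```

Take the innermost entry and outermost exit across the three axis slabs; if they cross, the ray misses.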

Each depth map is stored as a 512 × 512 grayscale PNG plus the metadata pair (t_min, t_range) — two floats per face, twelve floats total per shape — that lets a downstream consumer denormalise pixel values [0, 1] back to world t coordinates. Without those two floats, each face's depth would be auto-contrast-stretched per image, and the gradients across opposite faces would be incomparable.
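The round trip is two lines of arithmetic; a toy sketch with hypothetical values (the array contents here are illustrative, not taken from the pipeline):

```python
import numpy as np

# World-space t values for one face (illustrative 2x2 "depth map"):
t = np.array([[0.5, 0.7],
              [0.9, 1.5]])

t_min, t_range = t.min(), t.max() - t.min()   # the two stored metadata floats
png = (t - t_min) / t_range                   # normalised [0, 1] pixel values

# Downstream denormalisation back to world t coordinates:
t_world = png * t_range + t_min
```

Any consumer that skips the `(t_min, t_range)` pair can only compare depths within one face, never across faces.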

04 — Stage 2 · Marching Squares with Subpixel Accuracy

Treat depth as a scalar field, not as a binary mask.

Early attempts thresholded the depth map into a binary hit/no-hit mask and traced the boundary pixel-by-pixel. The result was severe zigzag artefacts: the pixel grid forces axis-aligned steps, and a diagonal silhouette boundary produces a staircase of horizontal and vertical jumps. Three observations fixed it.

First, the depth map is itself a continuous scalar field sampled on a pixel grid. Treat it as such and run marching squares on the float values rather than a thresholded mask. Second, skimage.measure.find_contours(depth, level=0.01) extracts subpixel-accurate contours by linearly interpolating each grid edge between the two adjacent pixel values, yielding fractional (row, col) coordinates. Third, the level value matters: level=0 sits exactly at the discontinuity between hit (positive depth) and no-hit (zero), which is numerically unstable. level=0.01 sits just inside the silhouette where the field is well-defined.

contours = skimage.measure.find_contours(depth_map, level=0.01)
# Result: list of (N, 2) float arrays — each contour is a sequence
# of subpixel (row, col) coordinates tracing the iso-line.
# For the sphere case: a single contour with ~1025 points.
Why subpixel matters
The contour for a sphere's silhouette is a perfect circle of radius 0.5 in UV space. Pixel-grid marching produces ~32 axis-aligned stair steps. Subpixel marching produces 1025 points lying within 0.001 of the true circle — three orders of magnitude better, no smoothing required at this stage.
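The subpixel accuracy comes from one rule applied per grid edge: if the two adjacent pixel values straddle the level, the crossing sits at the linearly interpolated fraction between them. A self-contained sketch of just that rule (the function `edge_crossing` is mine, not a skimage API):

```python
def edge_crossing(f_i, f_j, level):
    """Fractional position of the iso-level crossing along a grid edge
    whose endpoint samples are f_i and f_j (assumes they straddle level)."""
    return (level - f_i) / (f_j - f_i)

# Two adjacent depth samples: 0.0 (no hit) and 0.08 (just inside the disc).
# The level=0.01 crossing sits 1/8 of the way along the edge, not at a
# pixel centre — this is where the staircase artefact disappears.
frac = edge_crossing(0.0, 0.08, level=0.01)
```

Pixel-grid tracing would snap this crossing to 0 or 1; the interpolated fraction is what buys the three orders of magnitude.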
05 — Stage 3 · The Equator Insight

Silhouette = limit at zero inward distance.

The first working implementation placed each face's contour on the face plane: P = normal · 1.0 + right · u + up · v. For a sphere this puts the +Z contour at z = 1 and the −Z contour at z = −1 — two parallel circles on opposite walls of the bounding cube. They are geometrically correct as renderings, but they are not the silhouette of the sphere. The sphere's silhouette is the equator. From any viewpoint, the silhouette is the set of surface points where the surface normal is perpendicular to the viewing direction. For a sphere centred at the origin viewed along ±Z, that locus is the circle of radius r in the plane z = 0.

The correct lift therefore reads the depth scalar at the silhouette contour and projects to that depth. But the depth field is discontinuous exactly at the silhouette — sampling at the contour pixel gives an unstable value. Sampling just inside the contour gives a value, but that value is epsilon away from the true surface and produces gaps between opposite faces: the +Z and −Z reconstructions sit at z = +ε and z = −ε rather than coinciding at z = 0.

The fix is extrapolation to the limit. Sample depth at several inward distances — typically 1 px, 2 px, and 4 px from the contour — fit a polynomial, and extrapolate back to zero inward distance. This recovers the surface depth at the silhouette without sampling the discontinuity. Opposite-face contours then coincide at the equator, producing one coherent 3D curve rather than two parallel circles.

d1 = bilinear_sample(depth, contour_pt − 1px · inward)
d2 = bilinear_sample(depth, contour_pt − 2px · inward)
d4 = bilinear_sample(depth, contour_pt − 4px · inward)
# Fit a polynomial through (1, d1), (2, d2), (4, d4) in (distance, depth) space,
# then evaluate it at distance = 0
d_surface = polynomial_extrapolate([1, 2, 4], [d1, d2, d4], target=0)
# Lift to 3D using the face's normal/up/right basis
P_3D = origin + (1.0 − d_surface) · normal + u · right + v · up
Where it breaks
For shapes with concave silhouettes — torus profiles, L-shapes — the inward direction is locally ambiguous. The current heuristic (inward = gradient direction in the binary mask) works on convex silhouettes and simple concave ones; arbitrary concavities require a per-point inward vector estimated from the contour's local frame.
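The extrapolation step can be checked against the analytic sphere profile, where the true silhouette depth is known exactly. A sketch with numpy's polynomial fit, assuming the r = 0.5 sphere, the +Z face, and the 512-pixel grid (so one pixel spans h = 2/512 in UV units):

```python
import numpy as np

h = 2.0 / 512   # UV extent of one pixel on a 512-px face spanning [-1, 1]

def depth_at(px_inward):
    """Analytic +Z depth of the r = 0.5 sphere, sampled px_inward pixels
    inside the silhouette rim (rho = 0.5 at the rim itself)."""
    rho = 0.5 - px_inward * h
    return 1.0 - np.sqrt(0.25 - rho ** 2)

# Sample at 1, 2, 4 px inward, fit a quadratic, extrapolate to distance 0:
dists = np.array([1.0, 2.0, 4.0])
coeffs = np.polyfit(dists, depth_at(dists), deg=2)
d_surface = np.polyval(coeffs, 0.0)

# True silhouette depth is t = 1.0. The square-root profile has infinite
# slope at the rim, so the extrapolation is not exact — but it lands much
# closer to 1.0 than the raw 1-px-inward sample does.
```

This also shows why the fix is an approximation: near a silhouette the depth field behaves like a square root, which no finite polynomial matches exactly, so residual sub-pixel gaps between opposite faces remain.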
06 — Stage 3b · Near / Far from Opposite Faces

No re-rendering required.

For solid shapes the depth map gives the near surface depth — the distance from the cube face to the first intersection. The far depth — distance to the back-side intersection — is also useful for thickness reasoning. Naively this would require a second pass of ray-casting. But there's an identity: the near depth of the opposite face is the far depth of the current face, reprojected.

For a ray from the +X face at pixel (u, v): origin (1, u, v), direction (−1, 0, 0). It hits the front surface at x_front = 1 − t_near_posX and the back surface at x_back = 1 − t_far_posX. The same physical back-surface point seen from the −X face has origin (−1, u, −v) (note: −X has up = −Z, so v is flipped) and direction (+1, 0, 0); the back surface is that view's near surface, so t_near_negX = x_back − (−1) = x_back + 1 = 2 − t_far_posX. Rearranging: t_far_posX = 2.0 − t_near_negX after the appropriate coordinate flip — no re-rendering, just arithmetic on existing maps.

# Reconstruct far depth from the opposite face's near depth
t_far_posX = 2.0 − reproject(t_near_negX, axis_flip='Y')
# 6 near maps + 12 metadata floats give all 6 near + 6 far depths.
# Plus a gap map: thickness = t_far − t_near
gap = t_far − t_near   # bright = thick geometry, dark = thin
The visual sanity check that failed
When the gap reconstruction was first tested, all six gap maps for the sphere looked identical to all six near maps. That's because the sphere is symmetric — front-hemisphere depth and back-hemisphere thickness have the same radial gradient, and auto-contrast normalisation removes the offset. The maps were different in world units; only the visualisation collapsed them. The fix was to denormalise to world t values via the stored (t_min, t_range) per face before any visual comparison.
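A worked instance of the identity on the central ray of the sphere makes the arithmetic concrete (values follow directly from r = 0.5 and the face planes at x = ±1):

```python
# Central ray, pixel (u, v) = (0, 0).
# +X view: origin (1, 0, 0), direction (-1, 0, 0).
x_front, x_back = 0.5, -0.5         # sphere surface crossings on the x-axis
t_near_posX = 1.0 - x_front         # 0.5  (stored in the +X near map)
t_far_posX  = 1.0 - x_back          # 1.5  (what we want, without re-rendering)

# -X view: origin (-1, 0, 0), direction (+1, 0, 0).
# Its near surface IS the +X view's back surface:
t_near_negX = x_back - (-1.0)       # 0.5  (stored in the -X near map)

recovered = 2.0 - t_near_negX       # the identity: far depth from the opposite near map
```

The general per-pixel version only adds the up-axis flip; the depth arithmetic is exactly this.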
07 — Stage 4 · Corner-Preserving Smoothing

Detect corners by turning-angle peak, smooth between them.

The subpixel marching squares output is geometrically correct but visually noisy: at high zoom the contour has small wobbles from the pixel-level interpolation. A naive Gaussian smooth removes the wobble but also rounds sharp architectural corners — exactly the features that distinguish a rectangle from a circle and a window jamb from a wall.

The selective-smoothing algorithm: first apply a light Gaussian to remove pixel-grid noise while preserving structural features. Then compute the turning angle at each vertex — the angle between incoming and outgoing edges. Run scipy peak detection on the turning-angle series with a configurable threshold (default 30°). Vertices with turning-angle peaks above threshold are structural corners and get pinned. Apply heavy smoothing to each segment between corners independently.

corner_threshold_deg = 30

# Stage 1: light Gaussian to suppress pixel-grid noise
contour = gaussian_filter1d(contour, sigma=1.0)

# Stage 2: detect corners
turning_angles = compute_turning_angles(contour)   # in degrees
corner_idx = scipy.signal.find_peaks(turning_angles, height=corner_threshold_deg)[0]

# Stage 3: split, smooth each segment, rejoin
segments = split_at_indices(contour, corner_idx)
smoothed = []
for seg in segments:
    smoothed.append(gaussian_filter1d(seg, sigma=4.0))   # heavy smoothing
contour_out = concat_segments(smoothed, corner_points=contour[corner_idx])
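The turning-angle computation itself is small enough to show in full. A self-contained sketch on a synthetic polyline tracing three sides of a square (the function `turning_angles_deg` is my illustrative stand-in for the pipeline's `compute_turning_angles`; simple thresholding stands in for `find_peaks`):

```python
import numpy as np

def turning_angles_deg(pts):
    """Unsigned angle between the incoming and outgoing edge at each
    interior vertex of an (N, 2) polyline. Returns N - 2 angles."""
    a, b, c = pts[:-2], pts[1:-1], pts[2:]
    u, v = b - a, c - b
    dot = np.sum(u * v, axis=1)
    cross = u[:, 0] * v[:, 1] - u[:, 1] * v[:, 0]
    return np.degrees(np.abs(np.arctan2(cross, dot)))

# Densely sampled open polyline along three sides of the unit square:
side = np.linspace(0.0, 1.0, 11)
pts = np.concatenate([
    np.stack([side, np.zeros_like(side)], axis=1),           # bottom edge
    np.stack([np.ones_like(side), side], axis=1)[1:],        # right edge
    np.stack([side[::-1], np.ones_like(side)], axis=1)[1:],  # top edge
])

angles = turning_angles_deg(pts)
corners = np.flatnonzero(angles > 30)   # indices of the two 90-degree turns
```

Collinear runs give turning angles of exactly zero, so the two square corners are the only vertices above the 30° threshold — those get pinned while everything between them is smoothed.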
08 — Results · Per-Shape Verification

Validated against analytic ground truth on four primitives.

Shape                     | Contour points / face     | Reconstruction                                             | Notes
Sphere (r = 0.5)          | 1025 subpixel pts         | 3 perpendicular great circles at sphere centre             | All 6 faces produce identical discs by symmetry
Cube (unit)               | 4 corner pts + edges      | Wireframe edges of the cube                                | Near depth is flat — every pixel inside silhouette has identical t
Cylinder (h = 1, r = 0.5) | ~1000 pts / face          | Top + bottom rim circles + 4 quarter-arcs for side curvature | Side-view rectangle's top/bottom edges lift to arcs, not lines — required multi-distance extrapolation
Torus (R = 0.5, r = 0.2)  | 2 contours (top & bottom) | Outer + inner rim circles in the XZ plane                  | Hole correctly recovered; intermediate cross-section curvature still under refinement

The pipeline is shape-agnostic at the export stage: the final OBJ contains the union of all six lifted polylines, scaled back to the input mesh's original bounding box (the centring + unit-cube normalisation applied during rendering is inverted before export). The output is consumed downstream by Topic 35 (Six-Plane Mesh Reconstruction) where the polylines become the contour input for triangulation.
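The inversion mentioned above is a two-constant affine round trip. A minimal sketch, assuming the normalisation centres the mesh and scales its longest bounding-box edge into the cube with face planes at ±1 (variable names are illustrative):

```python
import numpy as np

# Illustrative input mesh vertices (here just the two bbox corners):
verts = np.array([[2.0, 4.0, 1.0],
                  [6.0, 8.0, 3.0]])

center = (verts.min(axis=0) + verts.max(axis=0)) / 2.0
scale = (verts.max(axis=0) - verts.min(axis=0)).max()   # longest bbox edge

# Applied before rendering: centre at origin, longest edge spans [-1, 1]
unit = 2.0 * (verts - center) / scale

# Inverted before OBJ export: polylines return to the original frame
restored = unit * scale / 2.0 + center
```

Storing `center` and `scale` alongside the depth maps is what lets the exported polylines land back on the source mesh rather than inside the canonical cube.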

Interactive Demo · Live

Pick a shape from the preset row, or click the input canvas to cycle. The six depth maps and the 3-D wireframe update in real time as you switch shapes. Each shape exposes a different stress test for the pipeline — sphere validates symmetry, cube validates flat faces, cylinder validates cap rims, torus validates hole topology. Drag the wireframe pane to rotate.

Related Thesis Chapters
Building Elevation Reconstruction
Direct downstream consumer. The six-orthographic-depth-map representation produced here is the input to the contour-to-mesh pipeline for architectural reconstruction.
PGN — Procedural Generator Network
Sister project on the structured-intermediate-representation thesis line. Here the intermediate is six depth maps; in PGN it is a DSL program.
ProcGen3D — Edge-Based Tokenization
Conceptually adjacent. ProcGen3D uses an autoregressive transformer to produce edge tokens; this pipeline uses analytic geometry to recover the same surface curves from depth views.
Appendix — Raw Materials
Transcripts & Source References
Restricted Access