Camera Raymaps: 3D Geometry & Applications

Updated 14 April 2026

Camera raymaps are dense, per-pixel encodings that map image samples to 3D rays by integrating intrinsic and extrinsic parameters, thereby providing a precise geometric prior.
They underpin reconstruction, detection, and simulation techniques by enabling robust spatial consistency, dynamic scene suppression, and calibration-free imaging across varied setups.
Empirical benefits include improved 3D reconstruction accuracy, enhanced pose estimation in multi-camera rigs, and successful optical simulation with sub-pixel precision.

Camera raymaps are dense, per-pixel or per-ray encodings of the relationship between image samples and 3D light paths across the sensor’s field of view. A raymap may store the geometry (origin and direction) of each viewing ray or, more generally, the mapping from image samples to their associated 3D rays, which can account for arbitrary optics, camera parameters, and even surface reflections. Raymaps make explicit the geometric prior needed for spatial reasoning in standard pinhole models as well as in more general or learned camera models, and they support methods in reconstruction, detection, simulation, and neural rendering that depend on spatial consistency across frames, view synthesis, and robust 3D scene perception.

1. Mathematical Foundations and Representations

Camera raymaps are typically structured as dense fields mapping image coordinates to ray origins and directions. For a camera with intrinsic matrix $K \in \mathbb{R}^{3\times3}$ and world-to-camera extrinsics $[R \mid \tau] \in SE(3)$, so that a world point $X$ maps to camera coordinates $RX + \tau$, the per-pixel raymap at pixel $(x, y)$ can be given by

$\mathrm{RayMap}(x, y) = \left[ c, \; \widehat{d}(x, y) \right] \in \mathbb{R}^6$

where $c = -R^\top \tau$ is the camera center in world coordinates and

$\widehat{d}(x, y) = \frac{R^\top K^{-1} \tilde{u}}{\| R^\top K^{-1} \tilde{u} \|_2}$

is the normalized direction of the pixel’s viewing ray, with $\tilde{u} = (x, y, 1)^\top$ the homogeneous pixel coordinate (Wang et al., 21 Mar 2026, Li et al., 2 Jun 2025). In more complex or learning-based scenarios (e.g., CAM3R (Guruprasad et al., 23 Mar 2026)), raymaps are parameterized independently of explicit intrinsics, using spherical harmonics expansions to model the continuous pixel-to-ray direction field even under fisheye or panoramic distortions. For multi-camera or multi-modal applications, such as Rig3R (Li et al., 2 Jun 2025), raymaps can be defined relative to global frames or rig-centric coordinate systems, with each pixel storing both its direction and a shared camera center.
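The pinhole formulas above can be sketched in a few lines of NumPy. This is a minimal illustration, not the implementation of any cited paper: the function name `pinhole_raymap`, the example intrinsics, and the choice of integer pixel coordinates (no half-pixel offset) are assumptions made here.

```python
import numpy as np

def pinhole_raymap(K, R, tau, width, height):
    """Return an (H, W, 6) array of [camera center, unit direction] per pixel.

    Assumes world-to-camera extrinsics: x_cam = R @ x_world + tau.
    """
    # Camera center in world coordinates: c = -R^T tau.
    c = -R.T @ tau
    # Homogeneous pixel coordinates u~ = (x, y, 1) for every pixel.
    xs, ys = np.meshgrid(np.arange(width, dtype=np.float64),
                         np.arange(height, dtype=np.float64))
    u = np.stack([xs, ys, np.ones_like(xs)], axis=-1)      # (H, W, 3)
    # Row-vector form of R^T K^{-1} u~ applied to all pixels at once.
    d = u @ np.linalg.inv(K).T @ R                          # (H, W, 3)
    d /= np.linalg.norm(d, axis=-1, keepdims=True)          # unit directions
    origins = np.broadcast_to(c, d.shape)                   # shared center
    return np.concatenate([origins, d], axis=-1)

# Hypothetical intrinsics for a 640x480 image; identity pose.
K = np.array([[500.0, 0.0, 320.0],
              [0.0, 500.0, 240.0],
              [0.0, 0.0, 1.0]])
R = np.eye(3)
tau = np.zeros(3)
raymap = pinhole_raymap(K, R, tau, width=640, height=480)
print(raymap.shape)            # (480, 640, 6)
print(raymap[240, 320, 3:])    # ray through the principal point: [0. 0. 1.]
```

Storing the shared center at every pixel is redundant for a single pinhole camera, but it keeps the six-channel layout uniform when raymaps from several cameras are stacked.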

In the case of forward models and rendering, ray-transfer functions approximate the mapping between lens entry and exit rays using multivariate polynomials

$R: [x_1, y_1, \theta_{1x}, \theta_{1y}] \longmapsto [x_2, y_2, \theta_{2x}, \theta_{2y}]$

enabling dense simulation of imaging rays through arbitrary (“black-box”) optics without access to proprietary lens designs (Goossens et al., 2022).
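As a rough illustration of fitting such a polynomial ray-transfer function, the sketch below uses ordinary least squares over monomial features. Everything specific here is an assumption for demonstration: the helper names (`monomials`, `fit_rtf`, `apply_rtf`), the polynomial degree, and the synthetic training data, which come from an ideal paraxial thin lens rather than from real lens-design software.

```python
import itertools
import numpy as np

def monomials(X, degree):
    """All monomials of the columns of X up to the given total degree."""
    n, k = X.shape
    cols = [np.ones(n)]
    for deg in range(1, degree + 1):
        for idx in itertools.combinations_with_replacement(range(k), deg):
            cols.append(np.prod(X[:, list(idx)], axis=1))
    return np.stack(cols, axis=1)

def fit_rtf(rays_in, rays_out, degree):
    """Least-squares polynomial coefficients mapping entry rays to exit rays."""
    coeffs, *_ = np.linalg.lstsq(monomials(rays_in, degree), rays_out, rcond=None)
    return coeffs

def apply_rtf(coeffs, rays_in, degree):
    """Evaluate the fitted polynomial on new entry rays."""
    return monomials(rays_in, degree) @ coeffs

# Synthetic correspondences [x, y, theta_x, theta_y] (mm, mm, rad, rad).
rng = np.random.default_rng(0)
f = 50.0  # hypothetical focal length in mm
rays_in = rng.uniform([-1.0, -1.0, -0.2, -0.2], [1.0, 1.0, 0.2, 0.2], size=(2000, 4))
rays_out = rays_in.copy()
rays_out[:, 2] -= rays_in[:, 0] / f   # paraxial thin lens: theta' = theta - x / f
rays_out[:, 3] -= rays_in[:, 1] / f

coeffs = fit_rtf(rays_in, rays_out, degree=2)
max_err = np.abs(apply_rtf(coeffs, rays_in, degree=2) - rays_out).max()
print(coeffs.shape, max_err)  # 15 monomials x 4 outputs; error near machine precision
```

A real lens would need a higher degree and ray data traced through the actual optics; the toy lens is linear, so the fit is exact and only shows the mechanics.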

2. Camera Raymaps in Learned 3D Perception and Reconstruction

Raymaps are central to modern streaming and feed-forward 3D scene reconstruction. Notably, RayMap3R (Wang et al., 21 Mar 2026) leverages the static-scene bias inherent in RayMap-only predictions: when the network produces geometry conditioned solely on the raymap (not appearance), transient objects are suppressed, because the latent memory is dominated by static content. A dual-branch inference scheme contrasts the main-branch prediction (appearance + raymap) with the RayMap-only prediction and uses the pixelwise depth discrepancy between them to gate dynamic updates, which limits memory contamination by moving objects. The result is a drift-resistant mechanism that improves reconstruction and camera pose estimation, achieves metric consistency via reset alignments, and maintains trajectory stability through state-aware smoothing. Rig3R (Li et al., 2 Jun 2025) extends raymap conditioning to multi-camera rigs with both pose and rig-relative raymaps, supporting global pose estimation and rig structure discovery, even from unordered images.
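The depth-discrepancy gating idea can be illustrated with a toy example. This is only a sketch of the general principle, not RayMap3R's actual procedure: the relative-error measure, the threshold value, and the function name `static_update_gate` are all assumptions introduced here.

```python
import numpy as np

def static_update_gate(depth_main, depth_ray_only, rel_threshold=0.2, eps=1e-6):
    """Return (gate, discrepancy); gate is True where a pixel looks static.

    depth_main:     depth from the appearance + raymap branch, shape (H, W)
    depth_ray_only: depth from the raymap-only branch, shape (H, W)
    rel_threshold:  hypothetical cutoff on the relative depth difference
    """
    discrepancy = np.abs(depth_main - depth_ray_only) / (np.abs(depth_ray_only) + eps)
    dynamic = discrepancy > rel_threshold   # branches disagree: likely a moving object
    return ~dynamic, discrepancy

# Toy scene: a static wall at 10 m with a transient object at 4 m in the centre.
depth_ray_only = np.full((4, 4), 10.0)      # static-biased prediction
depth_main = depth_ray_only.copy()
depth_main[1:3, 1:3] = 4.0                  # object visible only to the main branch
gate, discrepancy = static_update_gate(depth_main, depth_ray_only)
print(gate.astype(int))                     # zeros mark pixels excluded from memory updates
```

In this toy case the four central pixels are gated out and the remaining twelve would be allowed to update the scene memory.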

The CAM3R framework (Guruprasad et al., 23 Mar 2026) generalizes pixel-to-ray mapping by learning a spherical harmonic regression for ray directions, enabling robust geometry predictions across arbitrary projection models with neither calibration nor explicit parameterization of distortion.

3. Raymaps in Multi-View Detection and Semantic Mapping

Raymaps provide a crucial geometric scaffold for query-based 3D object detection and semantic BEV segmentation. In RayFormer (Chu et al., 2024), the camera raymap is the set of all rays discretized into angular (polar) sectors and depth bins, forming a structured 3D sampling grid. Queries are initialized uniformly along each ray, avoiding clumping and enabling each query to extract unique image and BEV features. Adaptive sampling along ray segments further refines localization, while sparse and foreground-biased allocation improves object-level distinctiveness. Empirically, ray-centric approaches yield significant gains in mAP and NDS over rectangular or “lift–splat” grids due to alignment with optical projection geometry.
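A ray-aligned query layout of this kind can be sketched as follows. The sketch assumes a bird's-eye-view plane centred on the ego vehicle, evenly spaced angular sectors, and uniform radial spacing; the function name `radial_query_grid` and all parameter values are hypothetical, and RayFormer's actual initialization and adaptive sampling may differ.

```python
import numpy as np

def radial_query_grid(n_rays, n_per_ray, r_min, r_max):
    """Place queries uniformly along rays through the centres of angular sectors.

    Returns BEV positions of shape (n_rays, n_per_ray, 2).
    """
    angles = (np.arange(n_rays) + 0.5) * (2.0 * np.pi / n_rays)   # sector centres
    radii = np.linspace(r_min, r_max, n_per_ray)                   # depth samples
    theta, r = np.meshgrid(angles, radii, indexing="ij")
    return np.stack([r * np.cos(theta), r * np.sin(theta)], axis=-1)

queries = radial_query_grid(n_rays=8, n_per_ray=5, r_min=1.0, r_max=50.0)
print(queries.shape)   # (8, 5, 2)
```

Because each query sits on a distinct ray and range, no two queries project to the same image location at the same depth, which is the property the ray-centric layout is meant to provide.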

Similarly, in LaRa (Bartoccioni et al., 2022), each image pixel’s direction and camera center are encoded via an MLP and fused across cameras using transformer-based attention to produce a compact latent space, from which the BEV semantic map is decoded. This sidesteps explicit depth estimation: the views are fused in a latent space indexed by the precise geometric priors that the raymap supplies.

4. Raymaps in Optical Simulation and Rendering

Camera raymaps are foundational in simulation and physically based rendering of imaging systems with complex or proprietary optics. With the polynomial ray-transfer function method (Goossens et al., 2022), raymaps are constructed by fitting polynomials to large datasets of ray correspondences (often generated by optical design software such as Zemax). For each image pixel (and optionally subpixel), the corresponding 3D ray is predicted by evaluating the polynomial RTF and embedding it into a rendering system such as PBRT. This approach bypasses the need for explicit lens prescriptions while maintaining sub-pixel photometric and geometric fidelity, as validated by rendered edge-spread functions (ESF) and relative illumination curves that match the ground truth.

5. Generalizations: Camera-Agnostic, Rig-Centric, and Non-Perspective Raymaps

Recent works have extended the raymap concept beyond classical intrinsics-based models. CAM3R (Guruprasad et al., 23 Mar 2026) dispenses with known intrinsics, using a transformer to predict spherical harmonic coefficients that map normalized image coordinates to directions on the unit sphere, enabling agnostic handling of pinhole, fisheye, and panoramic images, each with distinct nonlinear distortion profiles. Rig3R (Li et al., 2 Jun 2025) further introduces rig-relative raymaps, enabling inference of both extrinsics and rig structures, and supports multi-task learning of raymaps, camera centers, and pointmaps. By representing both pose and rig-relative frames, the network can reason about multiple possible coordinate systems and recover global or rig-specific geometry as required.

6. Raymaps for Non-Imaging Sensors and Virtual Radiance-Field Cameras

The raymap principle generalizes to non-conventional sensors, such as glossy objects that act as reflective “radiance-field cameras” (Tiwary et al., 2022). In ORCa, the surface of the object is treated as a 2D array mapping outgoing directions (determined by surface normals, incident rays, and curvature) to a 5D environment radiance field. Each surface patch and outgoing direction defines a raymap pixel, fully analytically parameterized, which samples from the learned radiance field for both color and depth. This construction enables synthesis of novel views, beyond-line-of-sight imaging, and occlusion-aware reconstructions entirely from object reflections.
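The simplest ingredient of such a reflective raymap is the mirror-reflection law, $r = d - 2 (d \cdot n)\, n$, applied per surface point. The sketch below covers only that step under an ideal-mirror assumption; the function name `reflect_rays` is hypothetical, and ORCa's handling of curvature and its learned radiance field are not modelled.

```python
import numpy as np

def reflect_rays(points, normals, view_dirs):
    """Build (N, 6) virtual rays [surface point, reflected unit direction].

    points:    (N, 3) surface points hit by the camera rays
    normals:   (N, 3) surface normals at those points
    view_dirs: (N, 3) incoming directions, pointing from camera to surface
    """
    n = normals / np.linalg.norm(normals, axis=-1, keepdims=True)
    d = view_dirs / np.linalg.norm(view_dirs, axis=-1, keepdims=True)
    # Mirror reflection: flip the component of d along the normal.
    r = d - 2.0 * np.sum(d * n, axis=-1, keepdims=True) * n
    return np.concatenate([points, r], axis=-1)

points = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
normals = np.array([[0.0, 0.0, 1.0], [0.0, 0.0, 1.0]])
view_dirs = np.array([[0.0, 0.0, -1.0], [1.0, 0.0, -1.0]])
virtual_rays = reflect_rays(points, normals, view_dirs)
print(virtual_rays[:, 3:])   # head-on ray bounces straight back; oblique ray mirrors about the normal
```

Each row plays the role of one raymap pixel of the virtual camera: an origin on the object surface and the direction in which it looks into the environment.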

7. Empirical Benefits, Key Applications, and Impact

Camera raymaps are empirically validated across a wide spectrum of tasks:

In streaming 3D reconstruction, RayMap3R (Wang et al., 21 Mar 2026) achieves state-of-the-art performance on dynamic scenes (e.g., reducing ATE from 0.210 m to 0.166 m on MPI-Sintel dynamic scenes, a relative reduction of about 21%) and improves robustness to motion-induced artifacts.
In rig-aware 3D reconstruction, Rig3R (Li et al., 2 Jun 2025) outperforms unstructured approaches by 17–45% in mean angular accuracy and delivers more accurate multi-camera pose estimation.
In camera-agnostic reconstruction, CAM3R (Guruprasad et al., 23 Mar 2026) raises relative pose accuracy on panoramic imagery from 17.8% to 97.7% RRA@15 via learned raymap regression.
In multi-camera detection, RayFormer (Chu et al., 2024) achieves higher mAP (55.5%) and NDS (63.3%) than prior BEV query frameworks.
In simulation and rendering, polynomial RTF-based raymaps (Goossens et al., 2022) achieve sub-pixel ESF accuracy and preserve lens-design confidentiality while supporting arbitrary optics.

Camera raymaps unify the geometric encoding of view-dependent information in perception models, renderers, and simulation pipelines, supporting robust operation under unconstrained imaging scenarios and enabling learning and inference across the full space of camera geometries and rig configurations.