Pseudo Metric Depth Prior

Updated 8 February 2026

A pseudo metric depth prior is a learned representation that preserves depth orderings and geometric structure up to an unknown global affine scale and shift.
Specialized loss functions such as the scale-and-shift-invariant loss and the virtual normal loss enforce shape fidelity without enforcing absolute depth scale.
Hybrid methods fuse dense relative predictions with sparse metric anchors to correct global scale, supporting robust 3D perception and cross-domain depth estimation.

A pseudo metric depth prior is a learned or constructed representation of scene depth that is accurate up to a global affine transformation of scale and shift. These priors, also called affine-invariant depth priors or calibrated relative depth fields, preserve depth orderings and geometric structure but not absolute distances. They have become central in modern depth estimation, fusion, and completion pipelines because they combine strong shape generalization with practical calibration flexibility, enabling robust 3D perception across domains, sensors, and environments.

1. Mathematical Characterization of Pseudo Metric Depth

Let $D^*(p)$ denote the true metric (absolute) depth at pixel $p$. A pseudo metric depth map $D(p)$ satisfies

$D(p) = a\,D^*(p) + b$

for unknown global scale $a>0$ and shift $b \in \mathbb{R}$, neither of which can be recovered from monocular images or uncalibrated depth modalities alone (Yin et al., 2020). Thus, any two depth maps related by a global affine transformation are observationally equivalent unless external metric cues are provided.

Key invariances:

Preserves ordinal relations: for any two maps $D_1$, $D_2$ in the same affine class, $D_1(p_1) < D_1(p_2) \iff D_2(p_1) < D_2(p_2)$
Preserves ratios of depth differences: $(D(p_1)-D(p_2))/(D(p_3)-D(p_4))$ is invariant under scaling and shifting
Encodes scene “shape” up to arbitrary scale and translation along the depth axis

Such priors are called pseudo-metric because the induced geometry does not determine a unique Euclidean distance: there is no absolute unit.
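
The invariances listed above can be checked numerically. The following is a minimal sketch, assuming NumPy; the depth values and the constants `a` and `b` are arbitrary illustrative choices, not values from any cited work.

```python
import numpy as np

# Hypothetical metric depths D*(p) at six pixels (illustrative values).
d_true = np.array([2.0, 5.0, 3.0, 8.0, 1.5, 4.0])

# Arbitrary "unknown" scale (a > 0) and shift, as in D(p) = a D*(p) + b.
a, b = 0.37, 2.5
d_pseudo = a * d_true + b

# Ordinal relations are preserved because a > 0.
same_order = np.array_equal(np.argsort(d_true), np.argsort(d_pseudo))

def diff_ratio(d):
    """Ratio of two depth differences; the shift cancels in each
    difference and the scale cancels in the quotient."""
    return (d[0] - d[1]) / (d[2] - d[3])

ratio_preserved = np.isclose(diff_ratio(d_true), diff_ratio(d_pseudo))

# Plain depth ratios are NOT preserved once the shift is non-zero.
plain_ratio_preserved = np.isclose(d_true[0] / d_true[1],
                                   d_pseudo[0] / d_pseudo[1])

print(same_order, ratio_preserved, plain_ratio_preserved)
```

The last check illustrates why a pseudo metric map cannot be used directly for quantities that depend on absolute depth, such as metric distances between points.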

2. Algorithms and Losses for Learning Affine-Invariant Depth

Affine-invariant depth estimation requires specialized loss functions to enforce shape fidelity without enforcing absolute scale. The principal approaches include:

Scale-and-shift-invariant loss (SSIL):

$L_{\mathrm{ssi}}(D, D^*) = \frac{1}{2N} \min_{a,b} \sum_{i=1}^N \left( a D_i + b - D^*_i \right)^2$

This fits an optimal affine mapping between prediction and ground truth per image, decoupling learning from global scale/shifts (Yin et al., 2020).
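
Because the inner minimization over $a$ and $b$ is an ordinary linear least-squares problem, it has a closed-form solution. The sketch below, assuming NumPy, evaluates the loss as written above for one image; the function name `ssi_loss` and the optional `mask` argument are illustrative and not taken from the cited work.

```python
import numpy as np

def ssi_loss(pred, gt, mask=None):
    """Scale-and-shift-invariant loss for one image.

    Solves min_{a,b} sum_i (a * pred_i + b - gt_i)^2 by least squares,
    then returns (loss, a, b) with loss = residual sum / (2N).
    """
    pred = np.asarray(pred, dtype=np.float64).ravel()
    gt = np.asarray(gt, dtype=np.float64).ravel()
    if mask is not None:
        keep = np.asarray(mask, dtype=bool).ravel()
        pred, gt = pred[keep], gt[keep]

    # Design matrix [pred, 1] so that A @ [a, b] approximates gt.
    A = np.stack([pred, np.ones_like(pred)], axis=1)
    (a, b), *_ = np.linalg.lstsq(A, gt, rcond=None)

    residual = a * pred + b - gt
    return 0.5 * np.mean(residual ** 2), a, b

# Usage: a prediction that differs from the target only by an affine
# map should incur (numerically) zero loss.
rng = np.random.default_rng(0)
target = rng.uniform(1.0, 10.0, size=(4, 5))
prediction = (target - 3.0) / 2.0
loss, a_hat, b_hat = ssi_loss(prediction, target)
```

In a training setting the same closed form would be written in a differentiable framework; the NumPy version here only illustrates the computation.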

Virtual Normal Loss (VNL):

$L_{\mathrm{vn}}(D, D^*) = \frac{1}{|\mathcal{T}|} \sum_{(i,j,k)\in\mathcal{T}} \left\| (\mathbf{X}_j-\mathbf{X}_i)\times(\mathbf{X}_k-\mathbf{X}_i) - (\mathbf{X}_j^*-\mathbf{X}_i^*)\times(\mathbf{X}_k^*-\mathbf{X}_i^*) \right\|_1$

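Here $\mathbf{X}_i$ and $\mathbf{X}_i^*$ denote the 3D points obtained by back-projecting the predicted and ground-truth depths at pixel $i$, and $\mathcal{T}$ is a set of sampled point triplets. The loss compares the normals of the virtual planes spanned by each triplet, which enforces planarity and shape and exploits the fact that normal directions are unchanged by a global rescaling of depth.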
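
A minimal sketch of this loss, assuming NumPy and a pinhole camera, is given below. The intrinsics `fx`, `fy`, `cx`, `cy`, the uniform random triplet sampling, and the function names are illustrative assumptions; the sketch follows the unnormalized formula above and omits refinements such as rejecting near-collinear triplets, which a practical implementation would likely need.

```python
import numpy as np

def backproject(depth, fx, fy, cx, cy):
    """Back-project a depth map to an (H*W, 3) array of camera-frame points."""
    h, w = depth.shape
    v, u = np.mgrid[0:h, 0:w]
    x = (u - cx) / fx * depth
    y = (v - cy) / fy * depth
    return np.stack([x, y, depth], axis=-1).reshape(-1, 3)

def virtual_normal_loss(pred, gt, fx, fy, cx, cy, n_triplets=1000, seed=0):
    """Mean L1 difference between the cross products of sampled triplets."""
    rng = np.random.default_rng(seed)
    X = backproject(pred, fx, fy, cx, cy)
    X_star = backproject(gt, fx, fy, cx, cy)

    # Sample the triplet set T uniformly at random (an assumption).
    idx = rng.integers(0, X.shape[0], size=(n_triplets, 3))
    i, j, k = idx[:, 0], idx[:, 1], idx[:, 2]

    n_pred = np.cross(X[j] - X[i], X[k] - X[i])
    n_gt = np.cross(X_star[j] - X_star[i], X_star[k] - X_star[i])

    # L1 norm per triplet, averaged over |T|.
    return np.abs(n_pred - n_gt).sum(axis=1).mean()
```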

Integration into Curriculum Learning: For diverse, large-scale datasets, staged curricula based on difficulty, scene type, or error magnitude stabilize the training of affine-invariant depth predictors (Yin et al., 2020).
Test-Time Alignment and Fusion: When metric cues are available at runtime (e.g., sparse LiDAR), a pseudo metric prior can be globally aligned or locally fitted (e.g., via least squares) to provide true metric output (Hyoseok et al., 10 Feb 2025).
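
The global least-squares alignment mentioned above can be sketched as follows, assuming NumPy. The convention that missing sensor readings are stored as zeros, and the name `align_to_anchors`, are assumptions made for illustration rather than details of the cited method.

```python
import numpy as np

def align_to_anchors(rel_depth, sparse_metric):
    """Fit metric ~ a * rel + b on the valid anchors and apply it densely.

    rel_depth:     (H, W) affine-invariant prediction.
    sparse_metric: (H, W) metric measurements, 0 where no reading exists
                   (an assumed convention).
    """
    valid = sparse_metric > 0
    n_valid = int(valid.sum())
    if n_valid < 2:
        raise ValueError("need at least two anchors to fit scale and shift")

    A = np.stack([rel_depth[valid], np.ones(n_valid)], axis=1)
    (a, b), *_ = np.linalg.lstsq(A, sparse_metric[valid], rcond=None)
    return a * rel_depth + b, a, b
```

With noisy or outlier-prone anchors, a robust estimator would typically be preferable to plain least squares; that choice is outside the scope of this sketch.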

3. Construction and Fusion of Pseudo Metric Depth Priors

Recent systems leverage both learned and sensor-based depth sources to construct dense, robust pseudo metric priors:

Dense Relative Prediction + Sparse Metric Anchors: Relative monocular predictions from foundation models (such as ZoeDepth or DepthAnything) are combined with a small set of metric depth measurements (from LiDAR, stereo, or visual odometry) by fitting per-image affine transformations, producing dense priors expressed in metric units (Cho et al., 1 Feb 2026, Wang et al., 15 May 2025).
Pixel-Level Alignment: Local, distance-weighted affine fits can propagate metric information smoothly into regions with missing sensor data, enabling coherent fusion for depth completion, inpainting, and super-resolution (Wang et al., 15 May 2025).
Gradient-Domain Densification: Poisson fusion using monocular depth gradients and sparse sensor constraints yields full-image pseudo metric maps that preserve both detailed structure and absolute anchor points (Cho et al., 1 Feb 2026).
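
The pixel-level alignment idea can be illustrated with a distance-weighted least-squares fit per pixel. The sketch below assumes NumPy, Gaussian weights with bandwidth `sigma`, and a small uniform weight `floor`; these choices and the function name are illustrative assumptions, not the procedure of the cited papers. It builds a pixels-by-anchors weight matrix, so it is only practical for small images or few anchors.

```python
import numpy as np

def local_affine_align(rel_depth, anchors_uv, anchors_z, sigma=20.0, floor=1e-3):
    """Per-pixel weighted affine fit of relative depth to sparse metric anchors.

    anchors_uv: (K, 2) integer pixel coordinates as (u, v) = (column, row).
    anchors_z:  (K,) metric depths at those pixels.
    """
    h, w = rel_depth.shape
    v, u = np.mgrid[0:h, 0:w]
    pix = np.stack([u.ravel(), v.ravel()], axis=1).astype(np.float64)  # (P, 2)

    anchors_uv = np.asarray(anchors_uv)
    x = rel_depth[anchors_uv[:, 1], anchors_uv[:, 0]].astype(np.float64)
    y = np.asarray(anchors_z, dtype=np.float64)

    # Gaussian weight of every anchor for every pixel. The uniform floor
    # keeps each fit well posed and falls back toward the global fit
    # far away from all anchors.
    d2 = ((pix[:, None, :] - anchors_uv[None, :, :].astype(np.float64)) ** 2).sum(-1)
    wgt = np.exp(-d2 / (2.0 * sigma ** 2)) + floor                      # (P, K)

    # Closed-form weighted least squares for y ~ a * x + b at each pixel.
    sw, sx, sy = wgt.sum(axis=1), wgt @ x, wgt @ y
    sxx, sxy = wgt @ (x * x), wgt @ (x * y)
    det = sw * sxx - sx ** 2
    a = (sw * sxy - sx * sy) / det
    b = (sxx * sy - sx * sxy) / det

    return a.reshape(h, w) * rel_depth + b.reshape(h, w)
```

The fit requires at least two anchors with distinct relative depths; otherwise the determinant vanishes and scale and shift cannot be separated.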

4. Applications and Empirical Benefits

Pseudo metric depth priors serve as the core mechanism in several key tasks:

Task	Mechanism	Reference
Domain-general monocular depth	Affine-invariant priors and losses for strong zero-shot transfer	(Yin et al., 2020)
Depth completion	Align pseudo-metric prior to sparse metric supervision	(Hyoseok et al., 10 Feb 2025, Cho et al., 1 Feb 2026)
Depth fusion/inpainting	Pixel-level metric alignment and conditioned refinement	(Wang et al., 15 May 2025)
NeRF/3D reconstruction	Pseudo-metric depth as soft prior for supervision (Earth Mover's Distance losses)	(Rau et al., 2024)
Panoramic depth (foundation)	Pseudo-label curation and sharpness/geometry losses to encode prior	(Lin et al., 18 Dec 2025)

Empirically, pseudo metric priors deliver:

Consistently reduced metric error versus both pure relative and in-domain metric-only models.
Large improvements in cross-domain and zero-shot scenarios, often with a reduction of more than 20% in RMSE or AbsRel on challenging benchmarks (Yin et al., 2020, Hyoseok et al., 10 Feb 2025, Cho et al., 1 Feb 2026).
Robust few-shot and few-anchor generalization, obtained by decoupling low-level structure from absolute calibration.
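
For reference, the two error measures named above are conventionally computed as in the following sketch, assuming NumPy. Treating pixels with non-positive ground truth as invalid is an assumed convention, and the prediction is assumed to have already been aligned to metric units.

```python
import numpy as np

def abs_rel(pred, gt):
    """Mean absolute relative error over valid (gt > 0) pixels."""
    valid = gt > 0
    return float(np.mean(np.abs(pred[valid] - gt[valid]) / gt[valid]))

def rmse(pred, gt):
    """Root mean squared error over valid (gt > 0) pixels."""
    valid = gt > 0
    return float(np.sqrt(np.mean((pred[valid] - gt[valid]) ** 2)))
```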

5. Hybrid and Generalized Priors

The pseudo metric depth paradigm supports a spectrum of hybrid priors:

Range-Masked and Sharpness-Centric Priors: Multiple synthetic and real sources, range gating, and domain-gap minimization are combined to build explicit, highly transferable priors for panoramic and complex environments (Lin et al., 18 Dec 2025).
Local Topological and Linguistic Guidance: Multi-modal frameworks (e.g., vision-language) estimate dense, spatially varying affine transformations from semantic context and sparse cues, producing pixel-wise pseudo metric priors that can be further refined with contrastive supervision (Cui et al., 16 Jun 2025).
Ground-Constraint and Planar Priors: Explicit geometric assumptions (e.g., a known ground plane) can be injected to stabilize metric scale in self-supervised or unsupervised settings (Cecille et al., 2024).
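
As an illustration of how a ground-plane assumption can fix scale, the sketch below fits a plane to back-projected ground pixels and compares its distance from the camera with a known camera height. It assumes NumPy, a pinhole camera, a given ground mask, and a depth map that is ambiguous in scale only (zero shift); all names are hypothetical, and this is not the method of the cited work.

```python
import numpy as np

def scale_from_ground(depth, ground_mask, fx, fy, cx, cy, camera_height):
    """Estimate the factor that maps scale-ambiguous depth to metric depth.

    Assumes the shift is zero, so only the scale is recovered.
    """
    v, u = np.nonzero(ground_mask)
    z = depth[v, u]
    pts = np.stack([(u - cx) / fx * z, (v - cy) / fy * z, z], axis=1)

    # Least-squares plane fit: the normal is the direction of smallest
    # variance of the centered ground points.
    centroid = pts.mean(axis=0)
    _, _, vt = np.linalg.svd(pts - centroid, full_matrices=False)
    normal = vt[-1]

    # Distance from the camera origin to the fitted plane.
    est_height = abs(normal @ centroid)
    return camera_height / est_height
```

Multiplying the depth map by the returned factor places the ground at the known camera height; any residual shift would have to be handled separately.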

6. Theoretical Aspects and Extensions

From a formal perspective, pseudo metric depth priors are not true metrics, but their mathematical properties are well characterized:

Affine and Isometry Invariance: They respect major transformation groups, which is both a strength for generalization and a limitation if absolute positioning is needed (Yin et al., 2020, Staerman et al., 2021).
Robustness and Consistency: Pseudo-metrics constructed via depth-trimmed regions or gradient-Poisson fusion exhibit high robustness to outliers and domain shifts, with theoretical and empirical guarantees (Staerman et al., 2021, Cho et al., 1 Feb 2026).
Probability Distribution Comparison: Related constructions in distributional geometry define pseudo-metrics based on the Hausdorff distance of depth-trimmed regions, generalizing quantile and Wasserstein metrics (Staerman et al., 2021).

A plausible implication is that future research may extend the pseudo-metric prior concept to nonlinear, locally adaptive models (e.g., polynomial or neural warps), integrate predictive uncertainty propagation, or learn both the affine alignment and the fine-structure refinement end to end within large, mixed-prior settings (Wang et al., 15 May 2025).

7. Limitations and Open Directions

Key recognized limitations include:

Requirement for external metric cues at test time to recover true scale, unless geometric priors (e.g., ground plane) are available (Cecille et al., 2024, Hyoseok et al., 10 Feb 2025).
Potential for residual bias in domain transitions, especially under severe appearance or sensor distribution shift (Lin et al., 18 Dec 2025).
Scaling issues for full-resolution inference and memory in foundation models (Wang et al., 15 May 2025).

Ongoing directions include simultaneous learning of alignment and refinement, uncertainty-aware prior fusion, and closed-loop integration of pseudo metric priors into robotics, AR, and SLAM systems.

For foundational coverage of the pseudo metric depth prior concept and its application in diverse monocular, fusion, and completion pipelines, see "DiverseDepth: Affine-invariant Depth Prediction Using Diverse Data" (Yin et al., 2020), "Depth Anything with Any Prior" (Wang et al., 15 May 2025), "Zero-shot Depth Completion via Test-time Alignment with Affine-invariant Depth Prior" (Hyoseok et al., 10 Feb 2025), "OASIS-DC: Generalizable Depth Completion via Output-level Alignment of Sparse-Integrated Monocular Pseudo Depth" (Cho et al., 1 Feb 2026), and related works.