Fast Depth-Based (FDB) Estimator

Updated 9 April 2026

The paper introduces a novel FDB estimator that replaces iterative subset-search processes with a direct depth-trimming approach to compute robust location and scatter, achieving up to 10× speed improvements over traditional MCD methods.
The methodology leverages efficient algorithms and shallow CNN architectures with multiscale dilated convolutions and Dense Blocks to enable real-time depth recovery and high-quality view synthesis.
FDB estimators extend to robust regression and task-specific architectures, delivering high-breakdown performance and computational efficiency for high-dimensional data analysis and resource-constrained applications.

A Fast Depth-Based (FDB) Estimator refers to a class of estimators or algorithms that leverage statistical, geometric, or learned notions of "depth"—in various senses—combined with algorithmic stratagems emphasizing computational efficiency. FDB estimators arise in both statistical robust estimation (notably multivariate location/scatter and robust regression) and 3D visual perception (notably disparity/depth recovery for view synthesis, monocular and stereo depth, and robotics). In all these domains, the core concept involves the use of some notion of data depth to produce accurate, robust, and fast solutions, with modern developments spanning training-efficient deep networks, direct depth-trimming, and accelerated randomized optimization.

1. Statistical Depth-Based Estimation: Theory and Algorithms

Depth-based estimators in statistics measure the centrality of an observation within a data cloud relative to some center, yielding depth functions such as Tukey (halfspace), projection, and $L_2$-depths. In robust location/scatter estimation, the Minimum Covariance Determinant (MCD) remains the gold standard for high-breakdown, affine-equivariant inference, but is computationally demanding in high dimensions. The Fast Depth-Based (FDB) estimator proposed by Zhang, Song, and Dai replaces the iterative subset-search via the MCD's concentration step (C-step) with a direct depth-trimming approach:

For a data matrix $X\in\mathbb{R}^{n\times p}$ and depth function $D(\cdot; X)$, the $\alpha$-trimmed region $\mathrm{TR}_\alpha(X)$ consists of the $h$ points with the highest empirical depth, where $h=\lfloor\alpha n\rfloor$.
The FDB estimator computes the robust location $\hat{\mu}_\mathrm{FDB}$ and scatter $\hat{\Sigma}_\mathrm{FDB}$ from $\mathrm{TR}_\alpha(X)$, followed by a reweighting step analogous to that of DetMCD to improve finite-sample efficiency.

Asymptotically (under elliptical symmetry), the trimmed depth region coincides with the optimal MCD subset; thus, the FDB approach is both robust (breakdown point up to the maximal 50%) and statistically consistent. In practice, projection depth (based on the supremum of the standardized univariate outlyingness $O(x; X)=\sup_{\|u\|=1} |u^\top x-\mathrm{med}(Xu)|/\mathrm{MAD}(Xu)$) and $L_2$-depth are tractable. The FDB estimator is empirically 2–10× faster than DetMCD and scales linearly (projection depth) or quadratically ($L_2$-depth) in $n$, versus the cubic scaling of MCD approximations. The method is effective for PCA, LDA, outlier detection, and denoising in high-dimensional settings (Zhang et al., 2023).
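The depth-trimming recipe described above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: projection depth is approximated with random unit directions, the function names and the `n_dirs` parameter are hypothetical, and the DetMCD-style reweighting step and any consistency factor for the scatter matrix are omitted.

```python
import numpy as np

def projection_outlyingness(X, n_dirs=500, rng=None):
    """Approximate projection outlyingness using random unit directions (assumption)."""
    rng = np.random.default_rng(rng)
    n, p = X.shape
    U = rng.standard_normal((n_dirs, p))
    U /= np.linalg.norm(U, axis=1, keepdims=True)
    P = X @ U.T                                   # (n, n_dirs) univariate projections
    med = np.median(P, axis=0)
    mad = np.median(np.abs(P - med), axis=0)
    mad = np.where(mad > 0, mad, np.finfo(float).eps)
    # Worst-case standardized deviation over the sampled directions.
    return np.max(np.abs(P - med) / mad, axis=1)

def depth_trimmed_estimate(X, alpha=0.5, n_dirs=500, rng=None):
    """Mean and covariance of the h = floor(alpha * n) deepest points."""
    n = X.shape[0]
    h = int(np.floor(alpha * n))
    depth = 1.0 / (1.0 + projection_outlyingness(X, n_dirs, rng))
    keep = np.argsort(depth)[-h:]                 # indices of the h deepest points
    mu = X[keep].mean(axis=0)
    sigma = np.cov(X[keep], rowvar=False)         # raw scatter, no consistency factor
    return mu, sigma, keep
```

Calling `depth_trimmed_estimate(X, alpha=0.5)` returns the trimmed location, the raw trimmed scatter, and the indices of the retained points. Because no subset search is run, the cost is dominated by one matrix product of size `n × n_dirs`.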

2. Fast Depth-Based Robust Regression

Robust regression via statistical depth uses the projection regression depth (PRD) functional. The induced median, defined as the minimizer of the unfitness functional (UF) over regression coefficients, is a high-breakdown, affine-equivariant estimator: $\hat{\beta}_{\mathrm{PRD}}=\arg\min_{\beta}\mathrm{UF}(\beta; Z)$, where $Z=\{z_1,\dots,z_n\}$ and $z_i=(x_i^\top, y_i)^\top$ is the $i$th data point.

Zuo (2020) presents exact and highly optimized approximate algorithms for PRD regression medians that rectify affine equivariance and accelerate computation via finite partitioning of the unit sphere $\mathbb{S}^{p-1}$, direct candidate generation, and efficient C++ inner loops called through Rcpp. Additional one-step depth estimators (deepest-candidate, multi-point average, and UF-weighted variants) offer substantial speedups over previous approximations while retaining statistical robustness and near-minimal mean squared error. These advances enable feasible use of depth-based regression in settings previously dominated by classical (and less robust) methods.
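The candidate-generation idea can be illustrated with a small NumPy sketch. It assumes the unfitness takes the form $\sup_{\|v\|=1}|\mathrm{med}_i\{r_i(\beta)/(w_i^\top v)\}|/\mathrm{MAD}(y)$ with $w_i=(1, x_i^\top)^\top$ and $r_i(\beta)=y_i-w_i^\top\beta$; that form is an assumption about the PRD definition and is not stated in this text. Candidates are exact fits through random elemental subsets and directions are sampled at random, so this is a rough approximation rather than the exact or optimized algorithms of Zuo (2020). All function and parameter names are hypothetical.

```python
import numpy as np

def unfitness(beta, X, y, V):
    """Approximate UF(beta) over the sampled directions V (assumed definition)."""
    W = np.column_stack([np.ones(len(y)), X])     # w_i = (1, x_i)
    r = y - W @ beta                              # residuals
    proj = W @ V.T                                # (n, n_dirs) values of w_i'v
    with np.errstate(divide="ignore", invalid="ignore"):
        ratio = r[:, None] / proj
    ratio[np.abs(proj) < 1e-12] = np.nan          # ignore points with w_i'v = 0
    med = np.nanmedian(ratio, axis=0)
    scale = np.median(np.abs(y - np.median(y)))   # MAD of the response
    return np.max(np.abs(med)) / scale

def prd_median_sketch(X, y, n_dirs=200, n_candidates=300, rng=None):
    """Return the candidate with the smallest approximate unfitness."""
    rng = np.random.default_rng(rng)
    n, p = X.shape
    q = p + 1                                     # coefficients including intercept
    W = np.column_stack([np.ones(n), X])
    V = rng.standard_normal((n_dirs, q))
    V /= np.linalg.norm(V, axis=1, keepdims=True)
    best_beta, best_uf = None, np.inf
    for _ in range(n_candidates):
        idx = rng.choice(n, size=q, replace=False)
        # Candidate: least-squares fit through q randomly chosen points.
        beta, *_ = np.linalg.lstsq(W[idx], y[idx], rcond=None)
        uf = unfitness(beta, X, y, V)
        if uf < best_uf:
            best_beta, best_uf = beta, uf
    return best_beta, best_uf
```

The returned coefficient vector is ordered as intercept first, then slopes. Increasing `n_dirs` and `n_candidates` tightens the approximation at a proportional cost.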

3. Fast Depth-Based Estimation in Learning-Based Depth Recovery

Fast Depth-Based Estimator also refers to learning-based approaches in stereo and monocular depth estimation targeting real-time or accelerated inference, such as the framework detailed in "Fast Depth Estimation for View Synthesis" (Anantrasirichai et al., 2020):

The FDB estimator is a single, shallow-but-dense CNN, incorporating multiscale dilated convolutions (enlarged receptive fields without loss of spatial resolution) and Dense Blocks (each layer's output is concatenated into the inputs of all subsequent layers for feature reuse).
A compact two-stage decoder with skip connections ensures effective upsampling while preserving edge detail vital for photorealistic view synthesis.
Non-linear depth remapping (a power transform of the normalized ground truth) increases foreground precision; the inverse transform restores the linear scale during inference.
"Projection loss"—enforcing photometric consistency via image warping using estimated depth—further improves accuracy, especially for downstream novel view rendering.
This architecture is more accurate than prior fast networks (DenseMapNet, DispNet), with lower mean disparity/depth error, and runs faster than heavy baselines such as PSMNet, making it practical for real-time view synthesis scenarios.
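The non-linear depth remapping in the list above can be sketched as a power transform and its inverse. The exponent `gamma = 0.5` and the function names are hypothetical and chosen only for illustration. Whether an exponent below or above one favors the foreground depends on whether near objects map to small or large normalized values; this sketch assumes near objects have small depth.

```python
import numpy as np

def remap_depth(depth, d_min, d_max, gamma=0.5):
    """Normalize depth to [0, 1], then apply a power transform (training target)."""
    z = (np.asarray(depth, dtype=float) - d_min) / (d_max - d_min)
    return np.clip(z, 0.0, 1.0) ** gamma

def unmap_depth(remapped, d_min, d_max, gamma=0.5):
    """Invert the power transform to recover linear depth (inference)."""
    z = np.clip(np.asarray(remapped, dtype=float), 0.0, 1.0) ** (1.0 / gamma)
    return z * (d_max - d_min) + d_min
```

With `gamma < 1`, equal steps in near depth occupy a larger share of the output range than equal steps in far depth, so a regression loss on the remapped target penalizes foreground errors more heavily.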

4. Specialized FDB Architectures: Monocular and Task-Constrained Applications

Recent methods propose specialized FDB architectures addressing constraints of particular domains:

FA-Depth (Wang et al., 2024) uses a sparse, shallow backbone (SmallDepth, on the order of 2M parameters and 500 FPS), multi-branch training-time filtering (Equivalent Transformation Module), a pyramid loss enforcing cross-augmentation and multi-scale supervision, and a targeted distillation loss (APX) from a heavyweight teacher. The inference network is thereby kept as light as possible while obtaining accuracy rivaling far larger models. All added complexity is confined to training.
UDepth (Yu et al., 2022) adapts inputs by encoding underwater attenuation priors into a new RGB-like space, employs a physics-motivated least-squares coarse predictor, and fuses this with a lightweight MobileNetV2+Transformer architecture. A domain projection loss ensures the learned output respects the physical image formation model, yielding state-of-the-art underwater depth estimation on low-power embedded hardware.
CenterDepth (Tu et al., 2025) limits depth regression in driving scenes to detected object centers using keypoint-based detection and local CRF (Center FC-CRFs) refinement, reducing global computation relative to pixel-wise models. The reported threshold accuracy is achieved at 60–300 FPS.
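The idea of keeping extra branches only at training time can be illustrated with a generic branch-folding sketch. It assumes the training-time block is a sum of parallel linear branches (a 3-tap filter, a 1-tap filter, and an identity shortcut) that can be merged into one kernel for inference. Whether FA-Depth's Equivalent Transformation Module has exactly this structure is an assumption; the 1D setting and all names are hypothetical.

```python
import numpy as np

def multi_branch(x, k3, k1):
    """Training-time form: 3-tap filter + 1-tap filter + identity shortcut."""
    return np.correlate(x, k3, mode="same") + k1 * x + x

def merge_branches(k3, k1):
    """Fold the 1-tap and identity branches into the center tap of the 3-tap kernel."""
    merged = np.array(k3, dtype=float)
    merged[1] += k1 + 1.0
    return merged

def single_branch(x, k):
    """Inference-time form: one filter with the merged kernel."""
    return np.correlate(x, k, mode="same")
```

Because every branch is linear in the input, `single_branch(x, merge_branches(k3, k1))` reproduces `multi_branch(x, k3, k1)` up to floating-point error, so the merged network pays for only one filter per layer at inference.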

5. Diffusion-Prior-Inspired FDB Estimators

Recent advances exploit diffusion-prior backbone architectures repurposed for feed-forward depth recovery. In FiffDepth (Bai et al., 2024):

A Stable Diffusion U-Net, originally trained for RGB synthesis via denoising, is transformed to predict depth in a one-shot, deterministic forward pass by evaluating the denoiser at a single fixed timestep and decoding the output via the VAE.
Training employs a "blended latent" approach, where synthetic and depth-targeted latents are combined, preserving the original diffusion trajectory while specializing for depth.
Integration of DINOv2 pseudo-labels with a dedicated loss term enables both fine-structure preservation and domain-robust generalization, outperforming other specialized MDE approaches in fine-structure detail and accuracy with minimal labeled data.
The model achieves practical frame rates (on the order of 11 FPS) and real-time feasibility despite using a large pre-trained diffusion architecture.

6. Empirical Performance, Speed/Accuracy Tradeoffs, and Practical Implications

FDB estimators consistently demonstrate superior speed/accuracy profiles. Representative results include:

Method	Disparity EPE / View MAE (Sintel)	Model Params	Inference Speed	Relative Gains
DenseMapNet	4.41 px / 8.34 px	1.8M	0.20 s (GPU)	Baseline fast network
FDB (CNN)	3.95 px / 7.99 px	2.3M	0.21 s (GPU)	~45% better accuracy and faster inference than heavy models (Anantrasirichai et al., 2020)

Similarly, FA-Depth achieves competitive AbsRel and threshold accuracy at roughly 500 FPS with two orders of magnitude fewer parameters than prior architectures (Wang et al., 2024).

Statistical FDB estimators via projection depth are 2–10× faster than DetMCD with equivalent breakdown and accuracy; regression FDB algorithms substantially accelerate projection-regression median computation with negligible loss in mean squared error, enabling robust estimation in previously infeasible regimes (Zhang et al., 2023; Zuo, 2020).

FDB strategies are therefore integral in: real-time 3D vision for robotics, embedded navigation, low-power devices, robust multivariate statistics, and large-scale outlier detection and denoising tasks.

7. Summary and Key Insights

Fast Depth-Based Estimators encapsulate a cross-disciplinary methodology leveraging depth-based trimming, dense and sparse learning architectures, statistical depth functionals, and acceleration via randomized or analytic algorithm design.
The concept is unified in aiming for maximal robustness (statistical or physically grounded) and efficiency, minimal inference latency, and the capacity to scale to high dimension or real-time deployment.
State-of-the-art FDB designs illustrate that judicious use of depth notions (in both the geometric/vision and statistical sense), efficient sampling, and modular training-time augmentation allow practitioners to relax traditional accuracy-vs-speed tradeoffs, yielding models viable for both high-stakes scientific analysis and resource-constrained operational environments (Anantrasirichai et al., 2020; Zhang et al., 2023; Wang et al., 2024; Bai et al., 2024; Tu et al., 2025; Yu et al., 2022; Zuo, 2020).