Relative Mahalanobis Distance (RMD)
- Relative Mahalanobis Distance (RMD) is a metric that subtracts global background distance from class-specific Mahalanobis distances to effectively detect out-of-distribution inputs.
- RMD is derived as a log-likelihood ratio under Gaussian assumptions and is framed within a Bayesian nonparametric context to stabilize detection in high dimensions.
- The method is computationally efficient, hyperparameter-free, and applicable across various domains such as vision, language, and genomics through post-hoc integration with deep feature embeddings.
Relative Mahalanobis Distance (RMD) is a metric designed to improve the detection of out-of-distribution (OOD) inputs in neural networks by addressing key limitations in the standard Mahalanobis Distance (MD) methodology. RMD is defined as the difference between the squared Mahalanobis distance of a feature vector to the most likely class-conditional Gaussian and to a global "background" (label-marginal) Gaussian fitted to all in-distribution data. It is motivated both as a practical fix to high-dimensional failure modes of MD and as a likelihood-ratio test, and has been analyzed within a Bayesian nonparametric framework as a log-odds score under a Dirichlet Process Mixture Model (DPMM) with Gaussian components. This metric is widely used in post-hoc OOD detection for embeddings produced by deep models in diverse domains such as vision, language, and genomics.
1. Formal Definitions and Computational Framework
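Let $x$ denote a data point and $z = f(x) \in \mathbb{R}^d$ its feature embedding. Given in-distribution training data $\{(x_i, y_i)\}_{i=1}^{N}$ with $K$ classes, compute for each class $k$:
- Class mean: $\mu_k = \frac{1}{N_k} \sum_{i:\, y_i = k} z_i$, where $N_k$ is the number of training points in class $k$
- Shared class covariance: $\Sigma = \frac{1}{N} \sum_{k=1}^{K} \sum_{i:\, y_i = k} (z_i - \mu_k)(z_i - \mu_k)^\top$
For a test feature $z'$, the standard Mahalanobis distance to class $k$ is: $\mathrm{MD}_k(z') = (z' - \mu_k)^\top \Sigma^{-1} (z' - \mu_k)$ with the OOD score $\min_k \mathrm{MD}_k(z')$.
RMD introduces a global "background" Gaussian:
- Marginal mean: $\mu_0 = \frac{1}{N} \sum_{i=1}^{N} z_i$
- Marginal covariance: $\Sigma_0 = \frac{1}{N} \sum_{i=1}^{N} (z_i - \mu_0)(z_i - \mu_0)^\top$
Define
$$\mathrm{MD}_0(z') = (z' - \mu_0)^\top \Sigma_0^{-1} (z' - \mu_0)$$
and the class-wise relative Mahalanobis distance: $$\mathrm{RMD}_k(z') = \mathrm{MD}_k(z') - \mathrm{MD}_0(z')$$ The RMD-based OOD score is: $$\min_k \mathrm{RMD}_k(z')$$ where larger values indicate a more likely OOD input. An alternative convention, particularly in the Bayesian literature, is to use the negated quantity $-\min_k \mathrm{RMD}_k(z') = \max_k \left[\mathrm{MD}_0(z') - \mathrm{MD}_k(z')\right]$ for inlier scoring (Linderman et al., 12 Feb 2025, Ren et al., 2021).
RMD thus measures, for each class $k$, the improvement in Mahalanobis fit to class $k$ over the global background, assigning inlier status to test points that are better explained by some class than by the background distribution.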
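As a small worked example of these definitions, the sketch below evaluates $\mathrm{RMD}_k = \mathrm{MD}_k - \mathrm{MD}_0$ for hand-picked 2-D parameters. The function names and all numeric values are illustrative assumptions, not taken from the cited papers; the parameters are given directly rather than estimated from data.

```python
import numpy as np

def mahalanobis_sq(z, mean, cov_inv):
    """Squared Mahalanobis distance (z - mean)^T cov_inv (z - mean)."""
    diff = z - mean
    return float(diff @ cov_inv @ diff)

def rmd_per_class(z, class_means, sigma, mu0, sigma0):
    """Return RMD_k(z) = MD_k(z) - MD_0(z) for every class k."""
    sigma_inv = np.linalg.inv(sigma)
    sigma0_inv = np.linalg.inv(sigma0)
    md0 = mahalanobis_sq(z, mu0, sigma0_inv)
    return np.array([mahalanobis_sq(z, mu, sigma_inv) - md0 for mu in class_means])

# Toy 2-D parameters (illustrative values, not from any dataset).
class_means = np.array([[-2.0, 0.0], [2.0, 0.0]])
sigma = np.eye(2)                       # tied within-class covariance
mu0 = class_means.mean(axis=0)          # background mean for balanced classes
sigma0 = sigma + np.diag([4.0, 0.0])    # within-class plus between-class scatter

z_in = np.array([2.0, 0.0])    # sits exactly on the second class mean
z_mid = np.array([0.0, 0.0])   # halfway between the two classes

ood_in = rmd_per_class(z_in, class_means, sigma, mu0, sigma0).min()
ood_mid = rmd_per_class(z_mid, class_means, sigma, mu0, sigma0).min()
print(ood_in, ood_mid)  # the midpoint gets the larger (more OOD) score
```

Here the point on a class mean has $\mathrm{MD}_k = 0$ and $\mathrm{MD}_0 = 4/5$, giving a score of $-0.8$, while the midpoint has $\mathrm{MD}_k = 4$ for both classes and $\mathrm{MD}_0 = 0$, giving a score of $4$.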
2. Statistical Motivation and Likelihood-Ratio Derivation
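RMD is motivated as the log-likelihood ratio between class-conditional and background Gaussian models in feature space. Under the assumption that $z \mid y = k \sim \mathcal{N}(\mu_k, \Sigma)$ for class $k$ and $z \sim \mathcal{N}(\mu_0, \Sigma_0)$ for the background, the log-density ratio for a test point $z'$ is: $$\log \frac{\mathcal{N}(z'; \mu_k, \Sigma)}{\mathcal{N}(z'; \mu_0, \Sigma_0)} = -\tfrac{1}{2}\left[\mathrm{MD}_k(z') - \mathrm{MD}_0(z')\right] + \tfrac{1}{2} \log \frac{|\Sigma_0|}{|\Sigma|} = -\tfrac{1}{2}\,\mathrm{RMD}_k(z') + \text{const}$$ Maximizing this score across $k$ is equivalent to minimizing $\mathrm{RMD}_k(z')$.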
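The ratio can be checked by writing out the two Gaussian log-densities. In the sketch below, $\mathrm{MD}_k$ and $\mathrm{MD}_0$ denote the squared Mahalanobis distances of $z'$ to $\mathcal{N}(\mu_k, \Sigma)$ and $\mathcal{N}(\mu_0, \Sigma_0)$, $\mathrm{RMD}_k = \mathrm{MD}_k - \mathrm{MD}_0$, and $d$ is the feature dimension:

```latex
\begin{aligned}
\log \mathcal{N}(z'; \mu_k, \Sigma)
  &= -\tfrac{1}{2}\,\mathrm{MD}_k(z') - \tfrac{1}{2}\log|\Sigma| - \tfrac{d}{2}\log 2\pi, \\
\log \mathcal{N}(z'; \mu_0, \Sigma_0)
  &= -\tfrac{1}{2}\,\mathrm{MD}_0(z') - \tfrac{1}{2}\log|\Sigma_0| - \tfrac{d}{2}\log 2\pi, \\
\log \frac{\mathcal{N}(z'; \mu_k, \Sigma)}{\mathcal{N}(z'; \mu_0, \Sigma_0)}
  &= -\tfrac{1}{2}\,\mathrm{RMD}_k(z') + \tfrac{1}{2}\log\frac{|\Sigma_0|}{|\Sigma|}.
\end{aligned}
```

The $\tfrac{d}{2}\log 2\pi$ terms cancel, and the remaining log-determinant term depends on neither $k$ nor $z'$, so it shifts every score by the same amount and does not change the ranking of test points.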
Within the Bayesian nonparametric framework, RMD corresponds (up to scaling and constants) to the log-odds for inlier assignment under a Gaussian DPMM with tied covariance. The inlier probability is: $$p(\text{inlier} \mid z') = \sigma\big(b - a\, s(z')\big)$$ where $\sigma(u) = 1/(1 + e^{-u})$ is the logistic function, $a > 0$ and $b$ stand for the scaling and constants just mentioned, and $s(z') = \min_k \mathrm{RMD}_k(z')$ is the standard RMD score. This interpretation provides a probabilistic grounding and quantifies the OOD likelihood as a function of RMD under generative mixture modeling assumptions (Linderman et al., 12 Feb 2025).
3. Algorithmic Implementation and Practical Aspects
The canonical RMD computation workflow consists of:
Offline (single computation):
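- Extract features and compute the class means $\mu_k$ and the tied covariance $\Sigma$ from all in-distribution data.
- Compute marginal $\mu_0$, $\Sigma_0$ over all data points.
- Precompute $\Sigma^{-1}$ and $\Sigma_0^{-1}$.
Online (per test point $x'$):
- Compute $z' = f(x')$.
- For each $k$, compute $\mathrm{MD}_k(z') = (z' - \mu_k)^\top \Sigma^{-1} (z' - \mu_k)$.
- Compute $\mathrm{MD}_0(z') = (z' - \mu_0)^\top \Sigma_0^{-1} (z' - \mu_0)$.
- For each $k$, compute $\mathrm{RMD}_k(z') = \mathrm{MD}_k(z') - \mathrm{MD}_0(z')$.
- Return $\min_k \mathrm{RMD}_k(z')$ as the OOD score.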
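The offline and online steps above can be sketched in NumPy as follows. This is a minimal illustration rather than a reference implementation: the class name `RMDDetector`, its method names, and the small ridge term `eps` are assumptions made here, and the features are synthetic stand-ins for real embeddings.

```python
import numpy as np

class RMDDetector:
    """Hypothetical helper: fit once offline, then score test features online."""

    def fit(self, feats, labels, eps=1e-6):
        feats = np.asarray(feats, dtype=float)
        labels = np.asarray(labels)
        n, d = feats.shape
        self.classes_ = np.unique(labels)
        # Offline: class means and tied (shared) covariance.
        self.means_ = np.stack([feats[labels == c].mean(axis=0) for c in self.classes_])
        centered = feats - self.means_[np.searchsorted(self.classes_, labels)]
        sigma = centered.T @ centered / n
        # Offline: marginal (background) mean and covariance.
        self.mu0_ = feats.mean(axis=0)
        centered0 = feats - self.mu0_
        sigma0 = centered0.T @ centered0 / n
        # Offline: precompute both inverses (eps * I is an assumed ridge term).
        self.sigma_inv_ = np.linalg.inv(sigma + eps * np.eye(d))
        self.sigma0_inv_ = np.linalg.inv(sigma0 + eps * np.eye(d))
        return self

    def score(self, feats):
        """Return min_k RMD_k for each row; larger means more likely OOD."""
        feats = np.atleast_2d(np.asarray(feats, dtype=float))
        # MD_k for every (sample, class) pair.
        diff = feats[:, None, :] - self.means_[None, :, :]            # (n, K, d)
        md_k = np.einsum("nkd,de,nke->nk", diff, self.sigma_inv_, diff)
        # MD_0 against the background Gaussian.
        diff0 = feats - self.mu0_
        md_0 = np.einsum("nd,de,ne->n", diff0, self.sigma0_inv_, diff0)
        return (md_k - md_0[:, None]).min(axis=1)

# Synthetic stand-in for embeddings: two classes centred at -3 and +3 on every axis.
rng = np.random.default_rng(0)
train = np.concatenate([rng.normal(-3, 1, (500, 8)), rng.normal(3, 1, (500, 8))])
labels = np.repeat([0, 1], 500)
detector = RMDDetector().fit(train, labels)

scores_in = detector.score(rng.normal(3, 1, (50, 8)))    # drawn from class 1
scores_ood = detector.score(rng.normal(0, 1, (50, 8)))   # drawn between the classes
print(scores_in.mean(), scores_ood.mean())
```

Precomputing the two inverses once keeps the online cost to a handful of matrix-vector products per test point. For larger feature dimensions, a Cholesky-based solve may be preferable to an explicit inverse.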
RMD is hyperparameter-free: no tuning or outlier validation is required. In practice, regularization (e.g., replacing $\Sigma$ with $\Sigma + \epsilon I$ for a small $\epsilon > 0$) ensures numerical stability in high-dimensional, low-sample-size regimes. RMD can be applied to features from a single layer or from multiple layers of a neural network. For ill-conditioned problems, combining RMD scores from several layers via averaging can yield increased robustness (Ren et al., 2021).
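A brief sketch of the two practical points above, under assumed names and values: `regularize_cov` adds a ridge term to the diagonal (the default `eps=1e-3` is an arbitrary illustrative choice), and `average_layer_scores` takes a plain mean of per-layer scores. Neither function comes from the cited papers.

```python
import numpy as np

def regularize_cov(cov, eps=1e-3):
    """Add eps * I to the diagonal (a simple ridge; eps is an assumed value)."""
    return cov + eps * np.eye(cov.shape[0])

def average_layer_scores(per_layer_scores):
    """Average per-layer RMD scores, each an array of shape (n_samples,)."""
    return np.mean(np.stack(per_layer_scores, axis=0), axis=0)

# Fewer samples than dimensions: the sample covariance is rank-deficient.
rng = np.random.default_rng(0)
feats = rng.normal(size=(20, 50))             # 20 samples, 50 dimensions
centered = feats - feats.mean(axis=0)
cov = centered.T @ centered / len(feats)

rank_before = np.linalg.matrix_rank(cov)      # below 50, so cov is not invertible
cov_reg = regularize_cov(cov)
min_eig_after = np.linalg.eigvalsh(cov_reg).min()   # bounded below by about eps

# Combine scores from two hypothetical layers for the same two test points.
combined = average_layer_scores([np.array([1.0, 3.0]), np.array([3.0, 5.0])])
print(rank_before, min_eig_after, combined)
```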
4. Theoretical Properties, Insights, and Limitations
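MD fails for near-OOD detection in high dimension because most eigen-directions are class-nonspecific and drown out signal from discriminative axes. If $\Sigma = \sum_{j=1}^{d} \lambda_j v_j v_j^\top$ is the covariance eigendecomposition, then generically only a small subset of directions is informative (e.g., top 100 of 1024 for CIFAR-100 vs CIFAR-10). RMD cancels these non-discriminative components by subtracting the marginal $\mathrm{MD}_0$, leaving only the class-specific contributions. In the idealized case where the background covariance $\Sigma_0$ shares the eigenvectors of $\Sigma$, with eigenvalues $\lambda_{0,j}$: $$\mathrm{RMD}_k(z') = \sum_{j=1}^{d} \left[ \frac{\left(v_j^\top (z' - \mu_k)\right)^2}{\lambda_j} - \frac{\left(v_j^\top (z' - \mu_0)\right)^2}{\lambda_{0,j}} \right]$$ where $v_j$ are eigenvectors. Along a non-discriminative direction the class and background projections and variances coincide, so the bracketed term vanishes and only truly discriminative dimensions contribute to RMD.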
Ablations confirm RMD exactly cancels non-discriminative noise and achieves perfect separation in diagnostic setups (e.g., a high-dimensional Gaussian in which only the first axis is relevant). RMD is more stable than MD during continued model training, avoiding the AUROC "peak then degrade" phenomenon observed with MD (Ren et al., 2021).
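A toy simulation in the spirit of this diagnostic setup is sketched below. The dimension, sample sizes, class means, and the placement of the near-OOD points are all assumptions chosen here for illustration; this is not the configuration used in the cited ablations. Two classes differ only along the first axis, and near-OOD points sit between them on that axis while looking typical on every other axis.

```python
import numpy as np

def auroc(ood_scores, ind_scores):
    """Rank-based AUROC: probability that an OOD score exceeds an in-distribution score."""
    scores = np.concatenate([ood_scores, ind_scores])
    ranks = scores.argsort().argsort()              # 0-based ranks, no ties expected
    n_ood, n_ind = len(ood_scores), len(ind_scores)
    u = ranks[:n_ood].sum() - n_ood * (n_ood - 1) / 2
    return u / (n_ood * n_ind)

def sample(rng, n, first_axis_mean, first_axis_std, d):
    """Unit-variance noise on every axis; only axis 0 is shifted or rescaled."""
    x = rng.normal(size=(n, d))
    x[:, 0] = first_axis_mean + first_axis_std * x[:, 0]
    return x

rng = np.random.default_rng(0)
d, n = 128, 2000
train = np.concatenate([sample(rng, n, -3.0, 1.0, d), sample(rng, n, 3.0, 1.0, d)])
labels = np.repeat([0, 1], n)
test_ind = np.concatenate([sample(rng, 500, -3.0, 1.0, d), sample(rng, 500, 3.0, 1.0, d)])
test_ood = sample(rng, 1000, 0.0, 0.5, d)           # between the classes on axis 0

# Class means, tied covariance, and background statistics (all normalized by N).
means = np.stack([train[labels == k].mean(axis=0) for k in (0, 1)])
centered = train - means[labels]
sigma_inv = np.linalg.inv(centered.T @ centered / len(train))
mu0 = train.mean(axis=0)
sigma0_inv = np.linalg.inv(np.cov(train, rowvar=False, bias=True))

def md(x, mean, cov_inv):
    """Squared Mahalanobis distance for each row of x."""
    diff = x - mean
    return np.einsum("nd,de,ne->n", diff, cov_inv, diff)

def scores(x):
    """Return (min_k MD_k, min_k RMD_k) for each row of x."""
    md_k = np.stack([md(x, m, sigma_inv) for m in means], axis=1)
    md_0 = md(x, mu0, sigma0_inv)
    return md_k.min(axis=1), (md_k - md_0[:, None]).min(axis=1)

md_ind, rmd_ind = scores(test_ind)
md_ood, rmd_ood = scores(test_ood)
auroc_md = auroc(md_ood, md_ind)
auroc_rmd = auroc(rmd_ood, rmd_ind)
print(f"MD AUROC:  {auroc_md:.3f}")
print(f"RMD AUROC: {auroc_rmd:.3f}")
```

In this setup the 127 uninformative axes add the same chi-square-like noise to both $\mathrm{MD}_k$ and $\mathrm{MD}_0$, so the subtraction removes it and the RMD AUROC should come out far higher than the MD AUROC.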
The principal theoretical limitation is the Gaussianity assumption. For highly non-Gaussian feature distributions, replacing class-conditional and background Gaussians with more flexible densities (such as normalizing flows) can generalize the RMD framework, although gains reported are modest. In "far-OOD" settings (e.g., CIFAR-10 vs SVHN), where classes are well separated, MD already performs optimally, so RMD adds little.
5. Extensions: Bayesian Nonparametrics and Hierarchical Models
The standard RMD assumes tied (shared) covariance across classes. This is optimal only when class covariances are similar and abundant data is available. In cases of cluster-wise covariance heterogeneity or limited samples per class, hierarchical Bayesian models provide substantial improvements.
Three principal variants, as formalized in (Linderman et al., 12 Feb 2025), are:
- Full-covariance model: Each class $k$ has its own covariance $\Sigma_k$, allowing arbitrary covariance differences; predictive densities become multivariate Student-$t$.
- Diagonal-covariance model: Each feature $j$ in each class $k$ has its own independent variance, regularized toward the global mean.
- Coupled diagonal-covariance model: Per-class variances are constrained by a global scale variable, inducing collective scaling while permitting direction-specific flexibility.
These models implement outlier detection by computing the log-density ratio $\log \left[ p_k(z') / p_0(z') \right]$, where $p_k$ and $p_0$ denote the predictive densities of class $k$ and of the background, and thresholding the inlier probability. Empirically, hierarchical models outperform standard RMD when class covariance heterogeneity is pronounced or per-class sample sizes are small. For near-OOD image benchmarks (e.g., SSB Hard, NINCO on ImageNet-1K), the coupled-diagonal DPMM achieves higher AUROC than RMD. In very high dimension, diagonal models provide robust performance gains, as overfitting with full covariance is a concern (Linderman et al., 12 Feb 2025).
6. Benchmark Performance and Empirical Results
RMD has been evaluated across a diverse set of OOD detection settings. On near-OOD vision tasks (CIFAR-100 vs CIFAR-10, Wide-ResNet-28-10), RMD consistently improves on MD by several percentage points in AUROC (e.g., 74.91% → 81.01%). Substantial gains are demonstrated in genomics OOD (53.10% → 68.98% AUROC). With pretrained or fine-tuned models (ViT-B/16, BiT-R50x1, CLIP, BERT), RMD often exceeds the performance of both MD and max-softmax probability (MSP). Across all benchmarks, RMD rarely underperforms MD, and in cases with pronounced within-feature shared variability or tight class overlap, it offers improvements of up to 15 AUROC points (Ren et al., 2021).
In the Bayesian nonparametric framework, hierarchical DPMM models further improve upon RMD for datasets with highly variable covariances across classes or limited samples per class, particularly in near-OOD regimes (Linderman et al., 12 Feb 2025).
7. Comparisons, Practical Guidelines, and Open Directions
RMD is a member of a family of embedding-space OOD detection methods:
| Method | Model Assumption | OOD score |
|---|---|---|
| Mahalanobis Distance (MD) | Parametric, tied $\Sigma$ | $\min_k \mathrm{MD}_k(z')$ |
| Relative Mahalanobis Distance | Parametric, tied $\Sigma$ | $\min_k \mathrm{RMD}_k(z')$ |
| Hierarchical DPMMs | Hierarchical/mixed | Predictive log-density ratio (class vs. background) |
| $k$-Nearest Neighbor | Nonparametric | Min. distance to IND neighbors |
| Energy/softmax-based | Network logit based | Max(logit), ODIN, Energy |
RMD is more robust to hyperparameters and the choice of feature layer than MD or partial Mahalanobis Distance (PMD), which requires eigenbasis tuning. RMD does not require retraining or access to OOD samples, can be applied post-hoc to any trained model, and is straightforward to implement. For high-dimensional or poorly conditioned covariance matrices, regularization or dimension reduction is advised.
The framework naturally extends to non-Gaussian generative models, and the Bayesian interpretation invites further advances in estimating mixed manifolds or learning the discriminative subspace directly, which is currently an open problem (Ren et al., 2021, Linderman et al., 12 Feb 2025). A plausible implication is that further improvements in near-OOD detection may arise from generative or discriminative methods that incorporate both flexible class-conditional density modeling and explicit background cancellation.