Relative Mahalanobis Distance (RMD)

Updated 24 June 2026

Relative Mahalanobis Distance (RMD) is a metric that subtracts a global background Mahalanobis distance from class-specific Mahalanobis distances to detect out-of-distribution inputs more reliably.
RMD can be derived as a log-likelihood ratio under Gaussian assumptions and has been reinterpreted within a Bayesian nonparametric framework. Its background subtraction stabilizes detection in high dimensions.
The method is computationally efficient, hyperparameter-free, and applicable across domains such as vision, language, and genomics through post-hoc integration with deep feature embeddings.

Relative Mahalanobis Distance (RMD) is a metric designed to improve the detection of out-of-distribution (OOD) inputs in neural networks by addressing key limitations in the standard Mahalanobis Distance (MD) methodology. RMD is defined as the difference between the squared Mahalanobis distance of a feature vector to the most likely class-conditional Gaussian and to a global "background" (label-marginal) Gaussian fitted to all in-distribution data. It is motivated both as a practical fix to high-dimensional failure modes of MD and as a likelihood-ratio test, and has been analyzed within a Bayesian nonparametric framework as a log-odds score under a Dirichlet Process Mixture Model (DPMM) with Gaussian components. The metric is applied post hoc to embeddings produced by deep models in domains such as vision, language, and genomics.

1. Formal Definitions and Computational Framework

Let $x$ denote a data point and $f(x) = z \in \mathbb{R}^D$ its feature embedding. Given in-distribution training features $z_i = f(x_i)$ with labels $y_i \in \{1, \dots, K\}$, where class $k$ contains $N_k$ points and $N = \sum_k N_k$, compute:

Class mean: $\mu_k = \frac{1}{N_k}\sum_{i: y_i = k} z_i$
Shared class covariance: $\Sigma = \frac{1}{N}\sum_{k=1}^K \sum_{i: y_i=k} (z_i - \mu_k)(z_i - \mu_k)^\top$

For a test feature $z' = f(x')$, the standard (squared) Mahalanobis distance to class $k$ is $\mathrm{MD}_k(z') = (z' - \mu_k)^\top \Sigma^{-1} (z' - \mu_k)$, and the MD confidence score is $C_\mathrm{MD}(x') = -\min_{k} \mathrm{MD}_k(z')$. Larger values indicate in-distribution inputs.
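The following is a minimal NumPy sketch of the MD baseline. It assumes features have already been extracted into an `(N, D)` array `z` with integer labels `y`. The function names `fit_class_gaussians` and `md_confidence` are hypothetical and do not come from any particular library. The sketch follows the definitions above: per-class means, a covariance pooled over classes with $1/N$ normalization, and the negated minimum squared distance.

```python
import numpy as np

def fit_class_gaussians(z, y):
    """Class means and shared (tied) covariance, as defined above.

    z: (N, D) array of in-distribution features; y: (N,) integer class labels.
    """
    classes = np.unique(y)
    means = np.stack([z[y == k].mean(axis=0) for k in classes])  # (K, D)
    # Subtract each point's own class mean, then pool with 1/N normalization.
    centered = z - means[np.searchsorted(classes, y)]
    cov = centered.T @ centered / len(z)
    return means, cov

def md_confidence(z_test, means, cov):
    """C_MD(x') = -min_k MD_k(z'); larger values mean more in-distribution."""
    prec = np.linalg.inv(cov)
    diff = z_test[:, None, :] - means[None, :, :]  # (M, K, D)
    md = np.einsum("mkd,de,mke->mk", diff, prec, diff)  # squared distances, (M, K)
    return -md.min(axis=1)

# Usage on hypothetical 2-D toy data with two well-separated classes.
rng = np.random.default_rng(0)
z = np.concatenate([rng.normal(-2.0, 1.0, (100, 2)), rng.normal(2.0, 1.0, (100, 2))])
y = np.repeat([0, 1], 100)
means, cov = fit_class_gaussians(z, y)
scores = md_confidence(np.array([[2.0, 2.0], [10.0, -10.0]]), means, cov)
```

The first test point sits at a class mean and the second far from both classes, so the first receives the higher confidence.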

RMD introduces a global "background" Gaussian:

Marginal mean: $\mu_0 = \frac{1}{N}\sum_{i=1}^N z_i$
Marginal covariance: $\Sigma_0 = \frac{1}{N}\sum_{i=1}^N (z_i - \mu_0)(z_i - \mu_0)^\top$

Define

$\mathrm{MD}_0(z') = (z' - \mu_0)^\top \Sigma_0^{-1} (z' - \mu_0)$

and the class-wise relative Mahalanobis distance $\mathrm{RMD}_k(z') = \mathrm{MD}_k(z') - \mathrm{MD}_0(z')$. The RMD-based OOD confidence score is $C_\mathrm{RMD}(x') = -\min_{k} \mathrm{RMD}_k(z')$. An alternative convention, particularly in the Bayesian literature, is to use the equivalent form $\max_{k} \left[\mathrm{MD}_0(z') - \mathrm{MD}_k(z')\right]$ for inlier scoring (Linderman et al., 12 Feb 2025, Ren et al., 2021).

RMD thus measures, for each class $k$, the improvement in Mahalanobis fit to class $k$ over the global background. It assigns inlier status to test points that are better explained by some class than by the background distribution.
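A hand-checkable toy example with hypothetical values makes the definitions concrete. Take $\Sigma = I$, class means $(-2, 0)$ and $(2, 0)$, and equal class sizes. The background then has $\mu_0 = (0, 0)$ and $\Sigma_0 = \mathrm{diag}(5, 1)$. For $z' = (2, 3)$, working by hand gives $\mathrm{MD}_1 = 25$, $\mathrm{MD}_2 = 9$, and $\mathrm{MD}_0 = 4/5 + 9 = 9.8$. So $\mathrm{RMD} = (15.2, -0.8)$ and $C_\mathrm{RMD} = 0.8$: the contribution of 9 from the second, non-discriminative axis cancels. The sketch below reproduces this and checks that the two sign conventions agree.

```python
import numpy as np

# Hypothetical toy setting: shared covariance Sigma = I, two classes with
# means (-2, 0) and (2, 0), and equal class sizes, so the background Gaussian
# has mu_0 = (0, 0) and Sigma_0 = I + diag(4, 0) = diag(5, 1).
mu = np.array([[-2.0, 0.0], [2.0, 0.0]])
sigma = np.eye(2)
mu0 = np.zeros(2)
sigma0 = np.diag([5.0, 1.0])

def sq_mahalanobis(z, m, cov):
    """Squared Mahalanobis distance (z - m)^T cov^{-1} (z - m)."""
    d = z - m
    return float(d @ np.linalg.solve(cov, d))

z_test = np.array([2.0, 3.0])
md_k = np.array([sq_mahalanobis(z_test, m, sigma) for m in mu])  # class-wise MD_k
md_0 = sq_mahalanobis(z_test, mu0, sigma0)  # background MD_0
rmd_k = md_k - md_0  # RMD_k = MD_k - MD_0
c_rmd = -rmd_k.min()  # C_RMD = -min_k RMD_k
c_alt = (md_0 - md_k).max()  # sign-flipped convention max_k [MD_0 - MD_k]
```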

2. Statistical Motivation and Likelihood-Ratio Derivation

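RMD is motivated as the log-likelihood ratio between class-conditional and background Gaussian models in feature space. Assume $z \sim \mathcal{N}(\mu_k, \Sigma)$ for class $k$ and $z \sim \mathcal{N}(\mu_0, \Sigma_0)$ for the background. The log-density ratio for a test point $z'$ is then $\log \frac{\mathcal{N}(z'; \mu_k, \Sigma)}{\mathcal{N}(z'; \mu_0, \Sigma_0)} = -\frac{1}{2}\mathrm{RMD}_k(z') + \frac{1}{2}\log\frac{|\Sigma_0|}{|\Sigma|}$. The log-determinant term does not depend on $k$, so maximizing this score across $k$ is equivalent to minimizing $\mathrm{RMD}_k(z')$.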
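Here is a quick numerical check of this identity, sketched with hypothetical random parameters. It computes explicit Gaussian log-densities and compares their difference with $-\tfrac{1}{2}\mathrm{RMD}_k(z') + \tfrac{1}{2}\log(|\Sigma_0|/|\Sigma|)$. The covariance is tied across classes, so this log-determinant constant is the same for every $k$.

```python
import numpy as np

def gaussian_logpdf(z, mean, cov):
    """Log-density of N(mean, cov) at z, written out explicitly."""
    d = z - mean
    _, logdet = np.linalg.slogdet(cov)
    md = d @ np.linalg.solve(cov, d)
    return -0.5 * (len(z) * np.log(2 * np.pi) + logdet + md)

def sq_md(z, m, cov):
    """Squared Mahalanobis distance."""
    d = z - m
    return d @ np.linalg.solve(cov, d)

rng = np.random.default_rng(1)
D = 5
A = rng.normal(size=(D, D))
sigma = A @ A.T + D * np.eye(D)  # hypothetical tied class covariance
B = rng.normal(size=(D, D))
sigma0 = B @ B.T + D * np.eye(D)  # hypothetical background covariance
mu_k, mu0, z = rng.normal(size=D), rng.normal(size=D), rng.normal(size=D)

log_ratio = gaussian_logpdf(z, mu_k, sigma) - gaussian_logpdf(z, mu0, sigma0)
rmd_k = sq_md(z, mu_k, sigma) - sq_md(z, mu0, sigma0)
const = 0.5 * (np.linalg.slogdet(sigma0)[1] - np.linalg.slogdet(sigma)[1])
# Identity: log N(z; mu_k, Sigma) - log N(z; mu_0, Sigma_0) = -RMD_k / 2 + const
```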

Within the Bayesian nonparametric framework, RMD corresponds (up to scaling and constants) to the log-odds for inlier assignment under a Gaussian DPMM with tied covariance. The inlier probability therefore takes a logistic form, $p(\text{inlier} \mid z') = \sigma\big(a\, C_\mathrm{RMD}(x') + b\big)$, where $\sigma(u) = 1/(1 + e^{-u})$, $a > 0$ and $b$ are model-dependent constants, and $C_\mathrm{RMD}(x')$ is the standard RMD score. This interpretation provides a probabilistic grounding and quantifies the OOD likelihood as a function of RMD under generative mixture modeling assumptions (Linderman et al., 12 Feb 2025).
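The sketch below illustrates how a logistic link in $C_\mathrm{RMD}$ can arise. It uses a deliberately simplified stand-in for the DPMM rather than the posterior predictives of Linderman et al. The stand-in is a finite mixture of $K$ tied-covariance class components with hypothetical weights $\pi_k$, plus one background component $\mathcal{N}(\mu_0, \Sigma_0)$ with hypothetical weight $\pi_\text{out}$. Its inlier log-odds equal $\log\sum_k \pi_k e^{-\mathrm{RMD}_k(z')/2} + \tfrac{1}{2}\log(|\Sigma_0|/|\Sigma|) - \log \pi_\text{out}$. With equal class weights, when one class dominates the sum, this is approximately $\tfrac{1}{2} C_\mathrm{RMD}(x')$ plus a constant. In this toy model, that gives $a = 1/2$.

```python
import numpy as np

def logsumexp(a):
    """Numerically stable log(sum(exp(a)))."""
    m = np.max(a)
    return m + np.log(np.sum(np.exp(a - m)))

def sq_md(z, m, cov):
    d = z - m
    return d @ np.linalg.solve(cov, d)

def inlier_log_odds(z, means, sigma, mu0, sigma0, weights, w_out):
    """Log-odds that z came from one of the tied-covariance class components
    rather than the background component (simplified, hypothetical model)."""
    rmd = np.array([sq_md(z, m, sigma) for m in means]) - sq_md(z, mu0, sigma0)
    const = 0.5 * (np.linalg.slogdet(sigma0)[1] - np.linalg.slogdet(sigma)[1])
    return logsumexp(np.log(weights) - 0.5 * rmd) + const - np.log(w_out)

def inlier_probability(z, *args):
    """Logistic transform of the log-odds."""
    return 1.0 / (1.0 + np.exp(-inlier_log_odds(z, *args)))

# Usage with the hypothetical 2-D toy setting (Sigma = I, Sigma_0 = diag(5, 1)).
means = np.array([[-2.0, 0.0], [2.0, 0.0]])
args = (means, np.eye(2), np.zeros(2), np.diag([5.0, 1.0]), np.array([0.45, 0.45]), 0.1)
p_near = inlier_probability(np.array([2.0, 0.0]), *args)  # at a class mean
p_mid = inlier_probability(np.array([0.0, 0.0]), *args)   # between the classes
```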

3. Algorithmic Implementation and Practical Aspects

The canonical RMD computation workflow consists of:

Offline (single computation):

Extract features $z_i = f(x_i)$ and compute the class means $\mu_k$ and the shared covariance $\Sigma$ from the labeled data.
Compute the marginal $\mu_0$, $\Sigma_0$ over all data points.
Precompute $\Sigma^{-1}$ and $\Sigma_0^{-1}$.

Online (per test point $x'$):

Compute $z' = f(x')$.
For each $k$, compute $\mathrm{MD}_k(z')$.
Compute $\mathrm{MD}_0(z')$.
For each $k$, compute $\mathrm{RMD}_k(z') = \mathrm{MD}_k(z') - \mathrm{MD}_0(z')$.
Return $C_\mathrm{RMD}(x') = -\min_k \mathrm{RMD}_k(z')$ as the OOD score. Lower values flag likely OOD inputs.
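The offline and online steps above can be sketched as a small NumPy class. This is an illustrative implementation under stated assumptions, not a reference one. `RelativeMahalanobis`, `fit`, `rmd`, and `score` are hypothetical names, and the optional `eps` ridge term anticipates the regularization mentioned below. Inverses are computed once in `fit`, and scoring is vectorized over a batch of test features.

```python
import numpy as np

class RelativeMahalanobis:
    """Post-hoc RMD scorer following the offline/online steps above.

    eps is a hypothetical ridge term (Sigma + eps * I) for numerical
    stability; eps = 0 recovers the unregularized estimates.
    """

    def __init__(self, eps=0.0):
        self.eps = eps

    def fit(self, z, y):
        # Offline: class means, tied covariance, background mean and covariance.
        n, d = z.shape
        self.classes_ = np.unique(y)
        self.means_ = np.stack([z[y == k].mean(axis=0) for k in self.classes_])
        centered = z - self.means_[np.searchsorted(self.classes_, y)]
        sigma = centered.T @ centered / n + self.eps * np.eye(d)
        self.mu0_ = z.mean(axis=0)
        c0 = z - self.mu0_
        sigma0 = c0.T @ c0 / n + self.eps * np.eye(d)
        # Precompute both inverses once.
        self.prec_ = np.linalg.inv(sigma)
        self.prec0_ = np.linalg.inv(sigma0)
        return self

    def rmd(self, z_test):
        # Online: MD_k for every class, MD_0 for the background, RMD_k = MD_k - MD_0.
        diff = z_test[:, None, :] - self.means_[None, :, :]
        md_k = np.einsum("mkd,de,mke->mk", diff, self.prec_, diff)
        d0 = z_test - self.mu0_
        md_0 = np.einsum("md,de,me->m", d0, self.prec0_, d0)
        return md_k - md_0[:, None]

    def score(self, z_test):
        # C_RMD = -min_k RMD_k; higher means more in-distribution.
        return -self.rmd(z_test).min(axis=1)

# Usage on hypothetical data: two classes centered at -3 and +3 in every coordinate.
rng = np.random.default_rng(0)
y_train = np.repeat([0, 1], 200)
z_train = rng.normal(size=(400, 8)) + np.where(y_train[:, None] == 1, 3.0, -3.0)
scorer = RelativeMahalanobis(eps=1e-6).fit(z_train, y_train)
z_test = np.array([[3.0] * 8, [0.0] * 8])  # near class 1, and midway between classes
conf = scorer.score(z_test)
```

`score` returns one confidence value per row. Thresholding it, for example at a quantile of scores on held-out in-distribution data, is one way to turn it into an OOD decision.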

RMD is hyperparameter-free: no tuning or outlier validation is required. In practice, regularization (e.g., $\Sigma \leftarrow \Sigma + \epsilon I$ with a small $\epsilon > 0$) ensures numerical stability in high-dimensional, low-sample-size regimes. RMD can be applied to features from any or multiple neural network layers. For ill-conditioned problems, combining RMD scores from several layers via averaging can yield increased robustness (Ren et al., 2021).
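This sketch illustrates the two practical devices in this paragraph on hypothetical data. When $N < D$, the pooled covariance is singular (its rank is at most $N - K$), so a ridge term $\epsilon I$ is added before inversion. Scores from several layers are then combined by plain averaging. `rmd_scores` and `multi_layer_rmd` are hypothetical helpers, and `eps=1e-3` is an arbitrary illustrative value.

```python
import numpy as np

def rmd_scores(z_train, y, z_test, eps):
    """C_RMD for z_test using ridge-regularized Sigma + eps*I and Sigma_0 + eps*I."""
    n, d = z_train.shape
    classes = np.unique(y)
    means = np.stack([z_train[y == k].mean(axis=0) for k in classes])
    centered = z_train - means[np.searchsorted(classes, y)]
    prec = np.linalg.inv(centered.T @ centered / n + eps * np.eye(d))
    mu0 = z_train.mean(axis=0)
    c0 = z_train - mu0
    prec0 = np.linalg.inv(c0.T @ c0 / n + eps * np.eye(d))
    diff = z_test[:, None, :] - means[None, :, :]
    md_k = np.einsum("mkd,de,mke->mk", diff, prec, diff)
    d0 = z_test - mu0
    md_0 = np.einsum("md,de,me->m", d0, prec0, d0)
    return -(md_k - md_0[:, None]).min(axis=1)

def multi_layer_rmd(layers, y, eps=1e-3):
    """Unweighted mean of per-layer C_RMD; layers is a list of (z_train, z_test) pairs."""
    return np.mean([rmd_scores(z_tr, y, z_te, eps) for z_tr, z_te in layers], axis=0)

# Hypothetical low-sample regime: 20 training points, 50- and 30-dimensional layers.
rng = np.random.default_rng(0)
y = np.repeat([0, 1], 10)
layer_a = (rng.normal(size=(20, 50)) + 2.0 * y[:, None], rng.normal(size=(5, 50)))
layer_b = (rng.normal(size=(20, 30)) - 2.0 * y[:, None], rng.normal(size=(5, 30)))
# Without eps the pooled covariance is singular: rank <= N - K = 18 < D.
scores = multi_layer_rmd([layer_a, layer_b], y)
```

Plain averaging weights each layer by the raw scale of its scores. Standardizing per-layer scores first is a possible variant.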

4. Theoretical Properties, Insights, and Limitations

MD fails for near-OOD detection in high dimension because most eigen-directions are class-nonspecific and drown out the signal from discriminative axes. Let $\Sigma = U \Lambda U^\top$ be the covariance eigendecomposition, with eigenvectors $u_d$ and eigenvalues $\lambda_d$. Then $\mathrm{MD}_k(z') = \sum_{d=1}^{D} \lambda_d^{-1}\big(u_d^\top (z' - \mu_k)\big)^2$, and generically only a small subset of directions is informative (e.g., the top 100 of 1024 for CIFAR-100 vs CIFAR-10). Along a non-discriminative direction, $u_d^\top \mu_k = u_d^\top \mu_0$ for every $k$. Because $\Sigma_0 = \Sigma + \sum_k \frac{N_k}{N}(\mu_k - \mu_0)(\mu_k - \mu_0)^\top$, such a $u_d$ is also an eigenvector of $\Sigma_0$ with the same eigenvalue. Its term therefore appears identically in $\mathrm{MD}_k(z')$ and in the marginal distance $\mathrm{MD}_0(z')$. Subtracting $\mathrm{MD}_0(z')$ cancels these non-discriminative components and leaves only the class-specific contributions, so only truly discriminative directions contribute to $\mathrm{RMD}_k(z')$.
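The cancellation argument can be checked numerically. The sketch below uses hypothetical synthetic data in which class means differ only in the first two coordinates. It first verifies that the maximum-likelihood estimates satisfy $\Sigma_0 = \Sigma + \sum_k \frac{N_k}{N}(\mu_k - \mu_0)(\mu_k - \mu_0)^\top$ exactly. It then shifts a test point along $v = \Sigma w$, with $w$ orthogonal to every $\mu_k - \mu_0$. This is the sample analogue of a non-discriminative eigen-direction. $\mathrm{MD}_k$ changes under the shift, while every $\mathrm{RMD}_k$ stays the same.

```python
import numpy as np

rng = np.random.default_rng(0)
K, n_per, D = 3, 200, 10
# Hypothetical data: class means differ only in the first two coordinates.
centers = np.zeros((K, D))
centers[:, :2] = rng.normal(0.0, 3.0, (K, 2))
z = np.concatenate([c + rng.normal(size=(n_per, D)) for c in centers])
y = np.repeat(np.arange(K), n_per)
N = len(z)

means = np.stack([z[y == k].mean(axis=0) for k in range(K)])
centered = z - means[y]
sigma = centered.T @ centered / N  # tied within-class covariance
mu0 = z.mean(axis=0)
sigma0 = (z - mu0).T @ (z - mu0) / N  # marginal (background) covariance
counts = np.bincount(y)
between = sum(counts[k] / N * np.outer(means[k] - mu0, means[k] - mu0) for k in range(K))

def md(x, m, cov):
    d = x - m
    return d @ np.linalg.solve(cov, d)

def rmd_all(x):
    md0 = md(x, mu0, sigma0)
    return np.array([md(x, means[k], sigma) - md0 for k in range(K)])

# w orthogonal to span{mu_k - mu_0}; v = Sigma w carries no class information.
q, _ = np.linalg.qr((means - mu0).T)  # orthonormal basis containing that span
w = rng.normal(size=D)
w -= q @ (q.T @ w)
v = sigma @ w

x = rng.normal(size=D)
x_shifted = x + 5.0 * v
```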

Ablations confirm that RMD exactly cancels non-discriminative noise and achieves perfect separation in diagnostic setups (e.g., a synthetic Gaussian setup in which only the first axis is relevant). RMD is more stable than MD during continued model training, avoiding the AUROC "peak then degrade" phenomenon observed with MD (Ren et al., 2021).
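A diagnostic in the same spirit can be sketched as follows. The configuration is hypothetical and chosen for illustration; it is not the exact ablation of Ren et al. (2021). Two classes live in $D = 200$ dimensions with identity covariance and are separated only along the first axis (means $\pm 4$). Near-OOD points are centered between them on that axis. AUROC is computed as the Mann-Whitney probability that an in-distribution score exceeds an OOD score. MD must contend with the noise of roughly 200 non-discriminative dimensions, whereas with two classes RMD depends only on the single discriminative projection.

```python
import numpy as np

def fit_params(z, y):
    """Class means, tied precision, background mean and background precision."""
    classes = np.unique(y)
    means = np.stack([z[y == k].mean(axis=0) for k in classes])
    c = z - means[np.searchsorted(classes, y)]
    prec = np.linalg.inv(c.T @ c / len(z))
    mu0 = z.mean(axis=0)
    prec0 = np.linalg.inv((z - mu0).T @ (z - mu0) / len(z))
    return means, prec, mu0, prec0

def md_and_rmd(z, means, prec, mu0, prec0):
    """Return (C_MD, C_RMD) for each row of z."""
    diff = z[:, None, :] - means[None, :, :]
    md_k = np.einsum("mkd,de,mke->mk", diff, prec, diff)
    d0 = z - mu0
    md_0 = np.einsum("md,de,me->m", d0, prec0, d0)
    return -md_k.min(axis=1), -(md_k - md_0[:, None]).min(axis=1)

def auroc(s_in, s_out):
    """P(score_in > score_out) + 0.5 * P(tie), the Mann-Whitney estimate."""
    gt = (s_in[:, None] > s_out[None, :]).mean()
    eq = (s_in[:, None] == s_out[None, :]).mean()
    return gt + 0.5 * eq

rng = np.random.default_rng(0)
D, n, m = 200, 1000, 4.0

def sample(center0, size):
    """Isotropic Gaussian whose mean is center0 on axis 0 and zero elsewhere."""
    z = rng.normal(size=(size, D))
    z[:, 0] += center0
    return z

z_train = np.concatenate([sample(-m, n), sample(m, n)])
y_train = np.repeat([0, 1], n)
params = fit_params(z_train, y_train)
z_in = np.concatenate([sample(-m, 250), sample(m, 250)])
z_out = sample(0.0, 500)  # hypothetical near-OOD: between the classes on axis 0
md_in, rmd_in = md_and_rmd(z_in, *params)
md_out, rmd_out = md_and_rmd(z_out, *params)
auc_md, auc_rmd = auroc(md_in, md_out), auroc(rmd_in, rmd_out)
```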

The principal theoretical limitation is the Gaussianity assumption. For highly non-Gaussian feature distributions, replacing the class-conditional and background Gaussians with more flexible densities (such as normalizing flows) can generalize the RMD framework, although the reported gains are modest. In "far-OOD" settings (e.g., CIFAR-10 vs SVHN), where OOD inputs lie far from all in-distribution classes, MD already performs well, so RMD adds little.

5. Extensions: Bayesian Nonparametrics and Hierarchical Models

The standard RMD assumes tied (shared) covariance across classes. This is optimal only when class covariances are similar and abundant data is available. In cases of cluster-wise covariance heterogeneity or limited samples per class, hierarchical Bayesian models provide substantial improvements.

Three principal variants, as formalized in (Linderman et al., 12 Feb 2025), are:

Full-covariance model: Each class $k$ has its own covariance $\Sigma_k$, allowing arbitrary covariance differences; predictive densities become multivariate Student-$t$.
Diagonal-covariance model: Each feature $d$ in each class $k$ has an independent variance, regularized toward the global mean.
Coupled diagonal-covariance model: Per-class variances are constrained by a shared global scale variable, inducing collective scaling while permitting direction-specific flexibility.

These models implement outlier detection by computing the log-density ratio $\log p(z' \mid \text{in-distribution clusters}) - \log p(z' \mid \text{new cluster})$ and thresholding the inlier probability. Empirically, hierarchical models outperform standard RMD when class covariance heterogeneity is pronounced or per-class sample sizes are small. For near-OOD image benchmarks (e.g., SSB Hard, NINCO on ImageNet-1K), the coupled-diagonal DPMM achieves higher AUROC than RMD. In very high dimension, diagonal models provide robust performance gains, because full-covariance models risk overfitting (Linderman et al., 12 Feb 2025).
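The following is a simplified, point-estimate analogue of the diagonal-covariance variant. It is not the hierarchical Bayesian model of Linderman et al.: there is no posterior predictive, so there are no Student-$t$ tails. Shrinkage here is a plain pseudo-count blend of each class's per-feature variances toward the pooled within-class variance, with a hypothetical strength `kappa`. The score is the log-density ratio between the best class-conditional diagonal Gaussian and a diagonal background Gaussian. As an illustrative choice, the threshold is the 5th percentile of training scores.

```python
import numpy as np

def diag_logpdf(z, mean, var):
    """Log-density of independent per-feature Gaussians, summed over features. z: (M, D)."""
    return -0.5 * np.sum(np.log(2 * np.pi * var) + (z - mean) ** 2 / var, axis=-1)

def fit_diag(z, y, kappa=10.0):
    """Per-class diagonal Gaussians with variances shrunk toward the pooled variance.

    kappa is a hypothetical pseudo-count controlling shrinkage strength.
    """
    classes = np.unique(y)
    means = np.stack([z[y == k].mean(axis=0) for k in classes])
    counts = np.array([(y == k).sum() for k in classes])[:, None]  # (K, 1)
    class_var = np.stack([z[y == k].var(axis=0) for k in classes])  # (K, D)
    pooled = (counts * class_var).sum(axis=0) / counts.sum()  # pooled within-class variance
    shrunk = (counts * class_var + kappa * pooled) / (counts + kappa)
    return means, shrunk, z.mean(axis=0), z.var(axis=0)  # last two: diagonal background

def log_density_ratio(z, means, shrunk, mu0, var0):
    """max_k log N(z; mu_k, diag(shrunk_k)) - log N(z; mu_0, diag(var0))."""
    per_class = np.stack([diag_logpdf(z, m, v) for m, v in zip(means, shrunk)], axis=1)
    return per_class.max(axis=1) - diag_logpdf(z, mu0, var0)

rng = np.random.default_rng(0)
# Hypothetical data with heterogeneous class covariances (variance 1 vs 4).
z0 = rng.normal(0.0, 1.0, (300, 20))
z0[:, 0] -= 3.0
z1 = rng.normal(0.0, 2.0, (300, 20))
z1[:, 0] += 3.0
z = np.concatenate([z0, z1])
y = np.repeat([0, 1], 300)
params = fit_diag(z, y)
train_scores = log_density_ratio(z, *params)
tau = np.quantile(train_scores, 0.05)  # illustrative threshold: 95% of training points pass
z_test = np.array([[-3.0] + [0.0] * 19, [12.0] + [0.0] * 19])  # class-0 mean; far on axis 0
is_inlier = log_density_ratio(z_test, *params) >= tau
```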

6. Benchmark Performance and Empirical Results

RMD has been evaluated across a diverse set of OOD detection settings. On near-OOD vision tasks (CIFAR-100 vs CIFAR-10, Wide-ResNet-28-10), RMD consistently improves on MD by several percentage points in AUROC (e.g., 74.91 → 81.01%). Substantial gains are demonstrated in genomics OOD (53.10 → 68.98% AUROC). With pretrained or fine-tuned models (ViT-B/16, BiT-R50x1, CLIP, BERT), RMD often exceeds the performance of both MD and max-softmax probability (MSP). Across all benchmarks, RMD rarely underperforms MD, and in cases with pronounced within-feature shared variability or tight class overlap, it improves AUROC by up to 15 percentage points (Ren et al., 2021).

In the Bayesian nonparametric framework, hierarchical DPMM models further improve upon RMD for datasets with highly variable covariances across classes or limited samples per class, particularly in near-OOD regimes (Linderman et al., 12 Feb 2025).

7. Comparisons, Practical Guidelines, and Open Directions

RMD is a member of a family of embedding-space OOD detection methods:

Method	Model assumption	Score
Mahalanobis Distance (MD)	Parametric, tied $\Sigma$	$-\min_k \mathrm{MD}_k(z')$
Relative Mahalanobis Distance (RMD)	Parametric, tied $\Sigma$ plus background $(\mu_0, \Sigma_0)$	$-\min_k \mathrm{RMD}_k(z')$
Hierarchical DPMMs	Hierarchical/mixed	Posterior inlier probability (log-odds)
$k$-Nearest Neighbor	Nonparametric	Min. distance to in-distribution neighbors
Energy/softmax-based	Network logit based	Max logit, ODIN, energy score

RMD is more robust than MD to the choice of feature layer. Unlike partial Mahalanobis Distance (PMD), which requires tuning of the retained eigenbasis, it has no hyperparameters. RMD does not require retraining or access to OOD samples, can be applied post-hoc to any trained model, and is straightforward to implement. For high-dimensional or poorly conditioned covariance matrices, regularization or dimension reduction is advised.

The framework naturally extends to non-Gaussian generative models, and the Bayesian interpretation invites further advances in estimating mixed manifolds or learning the discriminative subspace directly—currently an open problem (Ren et al., 2021, Linderman et al., 12 Feb 2025). A plausible implication is that further improvements in near-OOD detection may arise from generative or discriminative methods that incorporate both flexible class-conditional density modeling and explicit background cancellation.


References

Linderman et al. (2025). A Bayesian Nonparametric Perspective on Mahalanobis Distance for Out of Distribution Detection.

Ren et al. (2021). A Simple Fix to Mahalanobis Distance for Improving Near-OOD Detection.
