Relative Mahalanobis Distance (RMD)

Updated 24 June 2026
  • Relative Mahalanobis Distance (RMD) is a metric that subtracts global background distance from class-specific Mahalanobis distances to effectively detect out-of-distribution inputs.
  • RMD is derived as a log-likelihood ratio under Gaussian assumptions and is framed within a Bayesian nonparametric context to stabilize detection in high dimensions.
  • The method is computationally efficient, hyperparameter-free, and applicable across various domains such as vision, language, and genomics through post-hoc integration with deep feature embeddings.

Relative Mahalanobis Distance (RMD) is a metric designed to improve the detection of out-of-distribution (OOD) inputs in neural networks by addressing key limitations in the standard Mahalanobis Distance (MD) methodology. RMD is defined as the difference between the squared Mahalanobis distance of a feature vector to the most likely class-conditional Gaussian and to a global "background" (label-marginal) Gaussian fitted to all in-distribution data. It is motivated both as a practical fix to high-dimensional failure modes of MD and as a likelihood-ratio test, and has been analyzed within a Bayesian nonparametric framework as a log-odds score under a Dirichlet Process Mixture Model (DPMM) with Gaussian components. This metric is widely used in post-hoc OOD detection for embeddings produced by deep models in diverse domains such as vision, language, and genomics.

1. Formal Definitions and Computational Framework

Let $x$ denote a data point and $f(x) = z \in \mathbb{R}^D$ its feature embedding. Given in-distribution training data with $K$ classes, $N_k$ points in class $k$, and $N$ points in total, compute:

  • Class mean (for each class $k$): $\mu_k = \frac{1}{N_k}\sum_{i: y_i = k} z_i$
  • Shared class covariance: $\Sigma = \frac{1}{N}\sum_{k=1}^K \sum_{i: y_i=k} (z_i - \mu_k)(z_i - \mu_k)^\top$

For a test feature $z' = f(x')$, the standard (squared) Mahalanobis distance to class $k$ is

$$\mathrm{MD}_k(z') = (z' - \mu_k)^\top \Sigma^{-1} (z' - \mu_k),$$

with the OOD score $C_\mathrm{MD}(x') = -\min_{k=1}^K \mathrm{MD}_k(z')$.
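The class means, tied covariance, and MD score defined above can be sketched in NumPy as follows. This is a minimal illustration, not a reference implementation: the function names `fit_class_gaussians` and `md_confidence` and the synthetic data are assumptions introduced here.

```python
import numpy as np

def fit_class_gaussians(Z, y):
    """Class means and the shared (tied) covariance of embeddings Z (N x D)."""
    classes = np.unique(y)
    means = np.stack([Z[y == k].mean(axis=0) for k in classes])  # (K, D)
    # Center each embedding on its own class mean; `classes` is sorted, so
    # searchsorted maps each label to its row in `means`.
    centered = Z - means[np.searchsorted(classes, y)]
    cov = centered.T @ centered / len(Z)  # (D, D), normalized by N
    return classes, means, cov

def md_confidence(z, means, cov):
    """C_MD = -min_k MD_k(z) for a single embedding z of shape (D,)."""
    diffs = z - means  # (K, D)
    # Solve cov @ X = diffs^T instead of forming the explicit inverse.
    md = np.einsum("kd,dk->k", diffs, np.linalg.solve(cov, diffs.T))
    return -md.min()

# Synthetic data (illustrative only): 3 classes in 4 dimensions.
rng = np.random.default_rng(0)
y = np.repeat([0, 1, 2], 50)
Z = rng.normal(size=(150, 4)) + 3.0 * np.eye(3, 4)[y]
classes, means, cov = fit_class_gaussians(Z, y)

near = md_confidence(means[1], means, cov)         # exactly at a class mean
far = md_confidence(np.full(4, 10.0), means, cov)  # far from every class
```

A point sitting on a class mean has distance zero to that class, so its confidence is the maximum possible value of 0; points far from all classes receive strongly negative scores.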

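RMD introduces a global "background" Gaussian fitted to all in-distribution data, ignoring class labels:

  • Marginal mean: $\mu_0 = \frac{1}{N}\sum_{i=1}^N z_i$
  • Marginal covariance: $\Sigma_0 = \frac{1}{N}\sum_{i=1}^N (z_i - \mu_0)(z_i - \mu_0)^\top$

Define

$$\mathrm{MD}_0(z') = (z' - \mu_0)^\top \Sigma_0^{-1} (z' - \mu_0)$$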

and the class-wise relative Mahalanobis distance:

$$\mathrm{RMD}_k(z') = \mathrm{MD}_k(z') - \mathrm{MD}_0(z').$$

The RMD-based OOD confidence score is:

$$C_\mathrm{RMD}(x') = -\min_k \mathrm{RMD}_k(z').$$

An alternative convention, particularly in the Bayesian literature, is to use the sign-flipped relative distance $\mathrm{MD}_0(z') - \mathrm{MD}_k(z')$ for inlier scoring (Linderman et al., 12 Feb 2025, Ren et al., 2021).

RMD thus measures, for each class $k$, the improvement in Mahalanobis fit to class $k$ over the global background, assigning inlier status to test points that are better explained by some class than by the background distribution.
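A small worked example may help make the background subtraction concrete. The parameters below are hand-picked assumptions, not fitted values: two classes at $(\pm 2, 0)$ with identity covariance, and a background Gaussian at the origin with covariance $\mathrm{diag}(5, 1)$, which is the marginal covariance implied by equal class weights. The helper names are hypothetical.

```python
import numpy as np

def sq_mahalanobis(z, mean, cov):
    """Squared Mahalanobis distance of z to a Gaussian (mean, cov)."""
    diff = z - mean
    return float(diff @ np.linalg.solve(cov, diff))

# Hand-picked parameters (assumptions for illustration).
means = np.array([[-2.0, 0.0], [2.0, 0.0]])  # class means
cov = np.eye(2)                              # tied class covariance
mean0 = np.zeros(2)                          # background mean
cov0 = np.diag([5.0, 1.0])                   # background covariance

def scores(z):
    """Return (C_MD, C_RMD) for a single point z."""
    md_k = np.array([sq_mahalanobis(z, m, cov) for m in means])
    md_0 = sq_mahalanobis(z, mean0, cov0)
    rmd_k = md_k - md_0
    return -md_k.min(), -rmd_k.min()

# Point a: on class 2's mean along axis 0, but 3 units off along axis 1.
c_md_a, c_rmd_a = scores(np.array([2.0, 3.0]))  # C_MD = -9,  C_RMD ~ 0.8
# Point b: midway between the two classes.
c_md_b, c_rmd_b = scores(np.array([0.0, 0.0]))  # C_MD = -4,  C_RMD = -4
```

MD ranks point a below point b because of its offset along the second axis, which carries no class information. RMD subtracts that shared contribution and ranks point a above point b.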

2. Statistical Motivation and Likelihood-Ratio Derivation

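RMD is motivated as the log-likelihood ratio between class-conditional and background Gaussian models in feature space. Under the assumption that $z \mid y = k \sim \mathcal{N}(\mu_k, \Sigma)$ for class $k$ and $z \sim \mathcal{N}(\mu_0, \Sigma_0)$ for the background, the log-density ratio for a test point $z'$ is:

$$\log \frac{\mathcal{N}(z'; \mu_k, \Sigma)}{\mathcal{N}(z'; \mu_0, \Sigma_0)} = -\tfrac{1}{2}\,\mathrm{RMD}_k(z') + \tfrac{1}{2}\log\frac{|\Sigma_0|}{|\Sigma|},$$

where the second term depends on neither $k$ nor $z'$. Maximizing this score across $k$ is equivalent to minimizing $\mathrm{RMD}_k(z')$.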
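The relationship between the Gaussian log-density ratio and RMD can be checked numerically. The sketch below verifies, for random parameters, that $\log \mathcal{N}(z; \mu_k, \Sigma) - \log \mathcal{N}(z; \mu_0, \Sigma_0) = -\tfrac{1}{2}\mathrm{RMD}_k(z) + \tfrac{1}{2}\log(|\Sigma_0| / |\Sigma|)$. All values are synthetic, and `gaussian_logpdf` is a helper written here for illustration.

```python
import numpy as np

def gaussian_logpdf(z, mean, cov):
    """Log-density of a multivariate Gaussian, written out explicitly."""
    d = len(mean)
    diff = z - mean
    _, logdet = np.linalg.slogdet(cov)
    maha = diff @ np.linalg.solve(cov, diff)
    return -0.5 * (d * np.log(2 * np.pi) + logdet + maha)

rng = np.random.default_rng(1)
D = 5
# Random symmetric positive-definite covariances (synthetic).
A = rng.normal(size=(D, D))
B = rng.normal(size=(D, D))
cov = A @ A.T + np.eye(D)    # stands in for the tied class covariance
cov0 = B @ B.T + np.eye(D)   # stands in for the background covariance
mu_k, mu_0, z = rng.normal(size=(3, D))

log_ratio = gaussian_logpdf(z, mu_k, cov) - gaussian_logpdf(z, mu_0, cov0)

md_k = (z - mu_k) @ np.linalg.solve(cov, z - mu_k)
md_0 = (z - mu_0) @ np.linalg.solve(cov0, z - mu_0)
rmd_k = md_k - md_0
# The log-determinant term is constant in both k and z.
const = 0.5 * (np.linalg.slogdet(cov0)[1] - np.linalg.slogdet(cov)[1])

identity_holds = bool(np.isclose(log_ratio, -0.5 * rmd_k + const))
```

Because the constant is shared by every class and every test point, ranking inputs by $-\min_k \mathrm{RMD}_k$ gives the same ordering as ranking them by the maximum log-density ratio.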

Within the Bayesian nonparametric framework, RMD corresponds (up to scaling and constants) to the log-odds for inlier assignment under a Gaussian DPMM with tied covariance. The inlier probability then takes the logistic form $p(\text{inlier} \mid z') = \sigma\big(a\, s(z') + b\big)$, where $\sigma(u) = 1/(1 + e^{-u})$, the constants $a$ and $b$ absorb the scaling and offset, and $s(z')$ is the standard RMD score. This interpretation provides a probabilistic grounding and quantifies the OOD likelihood as a function of RMD under generative mixture modeling assumptions (Linderman et al., 12 Feb 2025).

3. Algorithmic Implementation and Practical Aspects

The canonical RMD computation workflow consists of:

Offline (single computation):

  1. Extract features and compute the class means $\mu_k$ and the shared covariance $\Sigma$ from the labeled data.
  2. Compute the marginal $\mu_0$, $\Sigma_0$ over all data points.
  3. Precompute $\Sigma^{-1}$ and $\Sigma_0^{-1}$.

Online (per test point $x'$):

  1. Compute $z' = f(x')$.
  2. For each $k$, compute $\mathrm{MD}_k(z')$.
  3. Compute $\mathrm{MD}_0(z')$.
  4. For each $k$, compute $\mathrm{RMD}_k(z') = \mathrm{MD}_k(z') - \mathrm{MD}_0(z')$.
  5. Return $C_\mathrm{RMD}(x') = -\min_k \mathrm{RMD}_k(z')$ as the OOD score.
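The offline/online split above could be organized as a small class, sketched here under stated assumptions. The name `RMDDetector`, its `fit`/`score` interface, and the synthetic data are illustrative choices made for this example; feature extraction is assumed to have already happened, so the inputs are embeddings.

```python
import numpy as np

class RMDDetector:
    """Post-hoc RMD scorer operating on precomputed embeddings (a sketch)."""

    def fit(self, Z, y):
        # Offline steps 1-3: class means, tied covariance, background, inverses.
        Z = np.asarray(Z, dtype=float)
        classes, idx = np.unique(np.asarray(y).ravel(), return_inverse=True)
        self.means_ = np.stack(
            [Z[idx == k].mean(axis=0) for k in range(len(classes))]
        )
        centered = Z - self.means_[idx]
        cov = centered.T @ centered / len(Z)
        self.mean0_ = Z.mean(axis=0)
        centered0 = Z - self.mean0_
        cov0 = centered0.T @ centered0 / len(Z)
        self.prec_ = np.linalg.inv(cov)
        self.prec0_ = np.linalg.inv(cov0)
        return self

    def score(self, Z):
        # Online steps 2-5 for a batch of embeddings Z of shape (M, D).
        Z = np.atleast_2d(np.asarray(Z, dtype=float))
        diffs = Z[:, None, :] - self.means_[None, :, :]            # (M, K, D)
        md_k = np.einsum("mkd,de,mke->mk", diffs, self.prec_, diffs)
        d0 = Z - self.mean0_
        md_0 = np.einsum("md,de,me->m", d0, self.prec0_, d0)
        rmd = md_k - md_0[:, None]
        return -rmd.min(axis=1)  # higher means more in-distribution

# Synthetic embeddings: 3 classes in 8 dimensions, means at 4 * e_k.
rng = np.random.default_rng(0)
K, D, n = 3, 8, 200
y_train = np.repeat(np.arange(K), n)
Z_train = rng.normal(size=(K * n, D)) + 4.0 * np.eye(K, D)[y_train]
detector = RMDDetector().fit(Z_train, y_train)

Z_in = rng.normal(size=(100, D)) + 4.0 * np.eye(K, D)[rng.integers(K, size=100)]
Z_out = rng.normal(size=(100, D)) + 12.0 * np.eye(K, D)[0]  # overshoots class 0
scores_in, scores_out = detector.score(Z_in), detector.score(Z_out)
```

Thresholding the returned score then flags inputs as OOD. The threshold itself is application-specific and is not part of the method.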

RMD is hyperparameter-free: no tuning or outlier validation is required. In practice, regularization (e.g., $\Sigma \leftarrow \Sigma + \epsilon I$) ensures numerical stability in high-dimensional, low-sample-size regimes. RMD can be applied to features from any single network layer or from several layers. For ill-conditioned problems, combining RMD scores from several layers via averaging can yield increased robustness (Ren et al., 2021).
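The two practical measures mentioned here, diagonal regularization and averaging scores across layers, might look as follows. The value `eps=1e-3` and both function names are assumptions made for this sketch; the source does not specify them.

```python
import numpy as np

def regularized_precision(cov, eps=1e-3):
    """Inverse of cov + eps * I. The eps value is illustrative, not prescribed."""
    return np.linalg.inv(cov + eps * np.eye(cov.shape[0]))

def average_layer_scores(layer_scores):
    """Average per-layer RMD scores (each of shape (M,)) into one score per input."""
    return np.mean(np.stack(layer_scores, axis=0), axis=0)

# Low-sample, high-dimension regime: 20 samples in 64 dimensions.
rng = np.random.default_rng(0)
Z = rng.normal(size=(20, 64))
Zc = Z - Z.mean(axis=0)
cov = Zc.T @ Zc / len(Z)
rank = np.linalg.matrix_rank(cov)   # at most 19, so cov itself is singular
prec = regularized_precision(cov)   # well defined after adding eps * I

# Two hypothetical layers scoring the same two inputs.
combined = average_layer_scores([np.array([1.0, -2.0]), np.array([3.0, 0.0])])
# combined is [2.0, -1.0]
```

With fewer samples than dimensions the empirical covariance is rank-deficient, so some form of regularization is needed before the Mahalanobis distance can be computed at all.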

4. Theoretical Properties, Insights, and Limitations

MD fails for near-OOD detection in high dimensions because most eigen-directions are class-nonspecific and drown out the signal from discriminative axes. If $\Sigma = V \Lambda V^\top$ is the covariance eigendecomposition, then generically only a small subset of directions is informative (e.g., top 100 of 1024 for CIFAR-100 vs CIFAR-10). RMD cancels these non-discriminative components by subtracting the marginal $\mathrm{MD}_0$, leaving only the class-specific contributions. Under the idealization that $\Sigma$ and $\Sigma_0$ share eigenvectors and that class and background terms coincide outside a discriminative set $\mathcal{S}$ of directions:

$$\mathrm{RMD}_k(z') \approx \sum_{d \in \mathcal{S}} \left[\frac{\big(v_d^\top (z' - \mu_k)\big)^2}{\lambda_d} - \frac{\big(v_d^\top (z' - \mu_0)\big)^2}{\lambda_{0,d}}\right],$$

where $v_d$ are eigenvectors and $\lambda_d$, $\lambda_{0,d}$ the corresponding variances under $\Sigma$ and $\Sigma_0$; only truly discriminative dimensions contribute to RMD.

Ablations confirm that RMD exactly cancels non-discriminative noise and achieves perfect separation in diagnostic setups (e.g., a multivariate Gaussian in which only the first axis is relevant). RMD is also more stable than MD during continued model training, avoiding the AUROC "peak then degrade" phenomenon observed with MD (Ren et al., 2021).
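A diagnostic of this kind can be reproduced in spirit with a short simulation. Every setting below (100 dimensions, class means at $\pm 5$ on the first axis, OOD centered between them, sample sizes, and the pairwise AUROC helper) is an assumption chosen for illustration, not the configuration of the cited ablation.

```python
import numpy as np

rng = np.random.default_rng(0)
D, n_train, n_test, sep = 100, 1000, 1000, 5.0

def sample(center, n):
    """Unit-variance Gaussian in D dims whose mean is `center` on axis 0 only."""
    Z = rng.normal(size=(n, D))
    Z[:, 0] += center
    return Z

# Two in-distribution classes at -sep and +sep on axis 0; OOD sits between them.
Z_train = np.vstack([sample(-sep, n_train), sample(+sep, n_train)])
y_train = np.repeat([0, 1], n_train)
Z_in = np.vstack([sample(-sep, n_test // 2), sample(+sep, n_test // 2)])
Z_out = sample(0.0, n_test)

# Fit class-conditional (tied covariance) and background Gaussians.
means = np.stack([Z_train[y_train == k].mean(axis=0) for k in (0, 1)])
centered = Z_train - means[y_train]
prec = np.linalg.inv(centered.T @ centered / len(Z_train))
mean0 = Z_train.mean(axis=0)
centered0 = Z_train - mean0
prec0 = np.linalg.inv(centered0.T @ centered0 / len(Z_train))

def md_and_rmd(Z):
    """Return (C_MD, C_RMD) confidence scores for a batch Z of shape (M, D)."""
    diffs = Z[:, None, :] - means[None, :, :]
    md_k = np.einsum("mkd,de,mke->mk", diffs, prec, diffs)
    d0 = Z - mean0
    md_0 = np.einsum("md,de,me->m", d0, prec0, d0)
    return -md_k.min(axis=1), -(md_k - md_0[:, None]).min(axis=1)

def auroc(in_scores, out_scores):
    """Fraction of (in, out) pairs where the in-distribution score is higher."""
    return float((in_scores[:, None] > out_scores[None, :]).mean())

md_in, rmd_in = md_and_rmd(Z_in)
md_out, rmd_out = md_and_rmd(Z_out)
auroc_md, auroc_rmd = auroc(md_in, md_out), auroc(rmd_in, rmd_out)
```

In this setup the 99 uninformative axes add a large, class-independent term to every MD value, which blurs the gap between in-distribution and OOD points. RMD removes that term, so its AUROC should come out clearly higher than MD's.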

The principal theoretical limitation is the Gaussianity assumption. For highly non-Gaussian feature distributions, replacing class-conditional and background Gaussians with more flexible densities (such as normalizing flows) can generalize the RMD framework, although gains reported are modest. In "far-OOD" settings (e.g., CIFAR-10 vs SVHN), where classes are well separated, MD already performs optimally, so RMD adds little.

5. Extensions: Bayesian Nonparametrics and Hierarchical Models

The standard RMD assumes tied (shared) covariance across classes. This is optimal only when class covariances are similar and abundant data is available. In cases of cluster-wise covariance heterogeneity or limited samples per class, hierarchical Bayesian models provide substantial improvements.

Three principal variants, as formalized in (Linderman et al., 12 Feb 2025), are:

  • Full-covariance model: Each class $k$ has its own $\Sigma_k$, allowing arbitrary covariance differences; predictive densities become multivariate Student-$t$.
  • Diagonal-covariance model: Each feature $d$ in each class $k$ has an independent variance, regularized toward the global mean.
  • Coupled diagonal-covariance model: Per-class variances are constrained by a global scale variable, inducing collective scaling while permitting direction-specific flexibility.

These models implement outlier detection by computing the log-density ratio of the class-conditional predictive density to the background density and thresholding the inlier probability. Empirically, hierarchical models outperform standard RMD when class covariance heterogeneity is pronounced or per-class sample sizes are small. For near-OOD image benchmarks (e.g., SSB Hard, NINCO on ImageNet-1K), the coupled-diagonal DPMM achieves higher AUROC than RMD. In very high dimensions, diagonal models provide robust performance gains, as overfitting with full covariance is a concern (Linderman et al., 12 Feb 2025).

6. Benchmark Performance and Empirical Results

RMD has been evaluated across a diverse set of OOD detection settings. On near-OOD vision tasks (CIFAR-100 vs CIFAR-10, Wide-ResNet-28-10), RMD consistently improves on MD by several percentage points in AUROC (e.g., 74.91 → 81.01%). Substantial gains are demonstrated in genomics OOD (53.10 → 68.98% AUROC). With pretrained or fine-tuned models (ViT-B/16, BiT-R50x1, CLIP, BERT), RMD often exceeds the performance of both MD and max-softmax probability (MSP). Across all benchmarks, RMD rarely underperforms MD, and in cases with pronounced within-feature shared variability or tight class overlap, it offers improvements of up to 15 AUROC points (Ren et al., 2021).

In the Bayesian nonparametric framework, hierarchical DPMM models further improve upon RMD for datasets with highly variable covariances across classes or limited samples per class, particularly in near-OOD regimes (Linderman et al., 12 Feb 2025).

7. Comparisons, Practical Guidelines, and Open Directions

RMD is a member of a family of embedding-space OOD detection methods:

| Method | Model assumption | OOD score |
|---|---|---|
| Mahalanobis Distance (MDS) | Parametric, tied $\Sigma$ | $-\min_k \mathrm{MD}_k(z')$ |
| Relative Mahalanobis Distance | Parametric, tied $\Sigma$ | $-\min_k [\mathrm{MD}_k(z') - \mathrm{MD}_0(z')]$ |
| Hierarchical DPMMs | Hierarchical/mixed $\Sigma_k$ | Log-density ratio / inlier probability |
| $k$-Nearest Neighbor | Nonparametric | Min. distance to IND neighbors |
| Energy/softmax-based | Network logit based | Max(logit), ODIN, Energy |

RMD is more robust to hyperparameters and the choice of feature layer than MD or partial Mahalanobis Distance (PMD), which requires eigenbasis tuning. RMD does not require retraining or access to OOD samples, can be applied post-hoc to any trained model, and is straightforward to implement. For high-dimensional or poorly conditioned covariance matrices, regularization or dimension reduction is advised.

The framework naturally extends to non-Gaussian generative models, and the Bayesian interpretation invites further advances in estimating mixed manifolds or learning the discriminative subspace directly, which is currently an open problem (Ren et al., 2021, Linderman et al., 12 Feb 2025). A plausible implication is that further improvements in near-OOD detection may arise from generative or discriminative methods that incorporate both flexible class-conditional density modeling and explicit background cancellation.
