Similarity-Distance-Magnitude (SDM) Activation

Updated 3 July 2026
  • The SDM activation function is a neural network calibration method that fuses similarity, distance, and magnitude to improve uncertainty awareness and robust classification.
  • It computes similarity via exemplar depth, evaluates distance using quantile-based metrics, and adjusts logit magnitude for sharper decision boundaries.
  • Empirical studies show that SDM enhances OOD detection and selective classification in applications like language modeling, image recognition, and verification.

The Similarity-Distance-Magnitude (SDM) activation function is an advancement in neural network output calibration and robustness that replaces or augments the classical softmax layer with a formulation explicitly incorporating three geometric signals: similarity to known training exemplars, distance to the training manifold, and the standard decision-boundary magnitude. By combining these notions within the activation and calibration pipeline, SDM activations provide enhanced epistemic uncertainty awareness, strong robustness to distribution shifts, and interpretable likelihoods directly grounded in exemplar neighborhoods. SDM activations have found utility across deep learning applications—including language modeling, classification, and verification—with demonstrated improvements in selective classification, out-of-distribution (OOD) rejection, and interpretability-by-exemplar (Schmaltz, 27 Feb 2025, Schmaltz, 16 Sep 2025, Schmaltz, 30 Oct 2025).

1. Mathematical Formulation and Components

Let $x$ denote an input, $f_\theta$ a frozen or fixed underlying model (e.g., an LLM or CNN), and $g$ a small, trainable 1D-CNN-based adaptor producing the feature representation $h' = g(f_\theta(x)) \in \mathbb{R}^M$. The SDM activation explicitly encodes:

  • Similarity ($q$): The depth, or run length, of consecutive nearest neighbors in feature space whose predicted class and ground-truth label both match the predicted class of the input (a “depth-match”). To compute $q$, all training-set representations $\{{h'}^{\rm tr}_n\}$ are sorted by $L_2$ distance to $h'$, and $q$ is the largest $k$ such that the first $k$ neighbors are both predicted and labeled as the current predicted class.
  • Distance ($d$): The quantile location of the input’s nearest-neighbor distance relative to the class-conditional empirical cumulative distribution function (eCDF) on a calibration split. For class $c$, $\mathrm{eCDF}_c$ is the empirical CDF of nearest-neighbor distances for calibration points in class $c$. For a test input with predicted class $\hat{y}$ and nearest-neighbor distance $d_{\rm nearest}$:

$$d = 1 - \mathrm{eCDF}_{\hat{y}}(d_{\rm nearest})$$

This approaches zero for far-out-of-distribution points.

  • Magnitude ($z$): The raw logit or decision-boundary margin produced by the final linear layer atop $h'$, analogous to the standard class logit.
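
The similarity and distance signals listed above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the reference implementation: the function names `depth_match_q` and `distance_quantile` are hypothetical, the search is brute-force $L_2$, and the distance signal is assumed to be one minus the class-conditional eCDF evaluated at the distance to the nearest training point (chosen so that it tends to zero for far-away inputs, as described above).

```python
import numpy as np

def depth_match_q(h_test, pred_test, h_train, y_train, pred_train):
    """Similarity q: length of the run of nearest training neighbors whose
    label AND model prediction both equal the test prediction.
    Also returns the L2 distance to the single nearest training point."""
    dists = np.linalg.norm(h_train - h_test, axis=1)  # L2 distance to every training point
    order = np.argsort(dists, kind="stable")          # nearest first
    match = (y_train[order] == pred_test) & (pred_train[order] == pred_test)
    # The index of the first non-matching neighbor equals the run length.
    q = len(match) if match.all() else int(np.argmin(match))
    return q, float(dists[order[0]])

def distance_quantile(d_nearest, calib_nearest_dists):
    """Distance signal (assumed form): 1 - eCDF(d_nearest), where the eCDF is
    built from calibration nearest-neighbor distances of the predicted class."""
    calib = np.sort(np.asarray(calib_nearest_dists, dtype=float))
    ecdf = np.searchsorted(calib, d_nearest, side="right") / len(calib)
    return 1.0 - ecdf
```

A test point that sits closer to the training data than every calibration point gets a distance signal of 1, while a point farther away than all of them gets 0.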

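The SDM activation for each class $i$ is given by:

$$\mathrm{sdm}(z)_i = \frac{(2+q)^{d \cdot z_i}}{\sum_{c=1}^{C} (2+q)^{d \cdot z_c}}$$

where $C$ is the number of classes. Softmax is recovered at $q = e - 2$, $d = 1$, yielding temperature $1$. When $d = 0$, every exponent vanishes and the distribution is uniform, signaling high epistemic uncertainty (Schmaltz, 27 Feb 2025, Schmaltz, 16 Sep 2025).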
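
A small numerical sketch can make the limiting cases concrete. It assumes the activation has the form $(2+q)^{d \cdot z_i}$ normalized over classes, which matches the base-and-scale description in this article; the function name `sdm_activation` is hypothetical, and the log-space computation is an implementation choice made here for numerical stability.

```python
import numpy as np

def sdm_activation(z, q, d):
    """Assumed SDM form: (2 + q) ** (d * z_i), normalized over classes.
    q is the similarity depth (q >= 0), d the distance quantile in [0, 1]."""
    z = np.asarray(z, dtype=float)
    # (2 + q) ** (d * z) == exp(d * z * log(2 + q)); subtract the max before exp.
    log_terms = d * z * np.log(2.0 + q)
    log_terms = log_terms - log_terms.max()
    w = np.exp(log_terms)
    return w / w.sum()

def softmax(z):
    """Standard softmax, for comparison."""
    z = np.asarray(z, dtype=float)
    w = np.exp(z - z.max())
    return w / w.sum()

logits = [2.0, 1.0, -1.0]
same_as_softmax = sdm_activation(logits, q=np.e - 2.0, d=1.0)  # base e, scale 1
uniform = sdm_activation(logits, q=5, d=0.0)                   # zero scale flattens everything
well_supported = sdm_activation(logits, q=10, d=1.0)           # larger base sharpens the peak
```

With these inputs, a deep depth-match ($q = 10$) concentrates more mass on the top class than softmax does, while $d = 0$ spreads it evenly regardless of the logits.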

2. Theoretical Motivation and Distinction from Softmax

Traditional softmax activations only encode the relative magnitude of class logits (i.e., distance to the decision boundary, “aleatoric” uncertainty) and do not distinguish between well-populated regions of the feature space and OOD or adversarial regions. SDM addresses this by:

  • Similarity-awareness ($q$): Quantifies the local density of correctly classified and predicted training points, serving as a local effective sample-size indicator. Low $q$ signals a lack of reliable exemplar support.
  • Distance-awareness ($d$): Measures the quantile position of the input’s distance to the training manifold in the calibration set, identifying departures from the known distribution.
  • Magnitude ($z$): Maintains the standard margin confidence.

Multiplicatively combining these (via exponentiating the logits with base $2+q$ and scale $d$) yields a sharply peaked distribution for well-supported in-distribution points, and a flatter or uniform distribution for OOD or low-support points. Epistemic uncertainties are directly encoded, providing robust confidence estimates that do not collapse in distribution-shifted or OOD settings (Schmaltz, 27 Feb 2025, Schmaltz, 16 Sep 2025, Schmaltz, 30 Oct 2025).

3. Training and Calibration Methodologies

SDM networks employ a two-stage pipeline:

  • Adaptor Training: The model trains a 1D-CNN-based adaptor, a linear head, and the calibration structures by minimizing the negative log-likelihood in the SDM base:

$$\mathcal{L} = -\frac{1}{N} \sum_{n=1}^{N} \log_{2+q_n} \mathrm{sdm}(z_n)_{y_n}$$

Here $\mathrm{sdm}(z_n)_{y_n}$ is the SDM probability assigned to the true label $y_n$ of training example $n$. After each epoch, $q$ and $d$ are recomputed and frozen for the next epoch. Early stopping is triggered when the balanced calibration loss plateaus.

  • Calibration/Selective Classification: A held-out calibration set is used to build class-conditional eCDFs for distance statistics and SDM outputs. Thresholds for each class and for the HR region are chosen so that the class-conditional accuracy among accepted points exceeds a chosen target. Dvoretzky-Kiefer-Wolfowitz (DKW) bounds are applied to quantify finite-sample uncertainty, and conservative region-specific thresholds are computed to reach the target accuracy while maximizing the acceptance rate (Schmaltz, 27 Feb 2025, Schmaltz, 16 Sep 2025).
  • Inference: For a test input, the model computes $h'$, obtains $q$, $d$, and $z$, and calculates the SDM probabilities. Using calibration-derived thresholds, a selective classifier admits a prediction only if the SDM output and the accompanying calibration statistic both exceed their respective class-conditional thresholds; otherwise, the input is rejected (the model abstains) (Schmaltz, 16 Sep 2025, Schmaltz, 27 Feb 2025).
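
The calibration and inference steps above can be illustrated with a simplified sketch. It is not the procedure from the cited papers. The DKW half-width is the standard bound $\sqrt{\ln(2/\delta)/(2n)}$, but how it is applied here (shrinking the distance quantile), the plain accuracy search for the threshold, the use of $d$ as the second acceptance statistic, and all function names are assumptions made for illustration.

```python
import math
import numpy as np

def dkw_epsilon(n, delta=0.05):
    """DKW band half-width: sup |F_n - F| <= eps with probability >= 1 - delta."""
    return math.sqrt(math.log(2.0 / delta) / (2.0 * n))

def conservative_quantile(d, n_calib, delta=0.05):
    """Shrink an eCDF-based quantile by the DKW half-width (assumed usage)."""
    return max(0.0, d - dkw_epsilon(n_calib, delta))

def find_threshold(scores, correct, alpha=0.95):
    """Smallest score threshold whose accepted calibration points reach
    empirical accuracy >= alpha; None if no threshold qualifies."""
    scores = np.asarray(scores, dtype=float)
    correct = np.asarray(correct, dtype=bool)
    for t in np.unique(scores):          # np.unique returns sorted values
        accepted = scores >= t
        if correct[accepted].mean() >= alpha:
            return float(t)
    return None

def selective_predict(pred, sdm_prob, d, prob_thresholds, d_thresholds):
    """Return the prediction if both statistics clear the thresholds of the
    predicted class; return None to abstain."""
    if pred not in prob_thresholds or pred not in d_thresholds:
        return None                      # no calibrated region for this class
    if sdm_prob < prob_thresholds[pred] or d < d_thresholds[pred]:
        return None
    return pred
```

Keeping per-class thresholds in dictionaries makes the abstention path explicit: a class with no qualifying threshold is simply absent, and every prediction of that class is rejected.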

4. Empirical Performance and Applications

SDM activations have been empirically validated on language modeling, closed-box binary classification, fact-checking, and LLM sequence verification (Schmaltz, 27 Feb 2025, Schmaltz, 16 Sep 2025, Schmaltz, 30 Oct 2025). Salient findings include:

  • OOD Robustness: On out-of-distribution datasets (e.g., IMDb sentiment OOD, fact-check benchmarks), SDM is the only method that achieves the target accuracy on admitted points while rejecting the remainder, outperforming softmax, temperature scaling, and conformal calibrators.
  • Coverage vs. Accuracy Trade-off: SDM achieves high accuracy on selective predictions while maintaining reasonable acceptance rates. For example, SDM achieves [1.00, 1.00] class-conditional accuracy on ≈1% of OOD-admitted IMDb examples, where other methods either admit too many OOD points or achieve low coverage.
  • Fine-tuning SDM LMs: Fine-tuning decoder-only Transformers with the SDM next-token loss and contrastive sampling yields increased statistical efficiency (lower abstention rates) and maintains ≥95% accuracy in high-probability acceptance regions.
  • Data Quality Discovery: SDM’s strict acceptance criteria highlight label errors in calibration/test splits, surfacing genuine annotation mistakes (Schmaltz, 27 Feb 2025).
  • Black-box LLM Calibration: Applying SDM activations atop LLM output logits enables reliable selective generation and region-specific coverage guarantees, without requiring retraining the core backbone (Schmaltz, 16 Sep 2025, Schmaltz, 27 Feb 2025).

5. Interpretability and Exemplar-based Explanations

Accepted points under SDM admit direct interpretability via their depth-matching exemplars: for each test input, the first $q$ training points forming the depth-match can be retrieved and visualized, all sharing the predicted class label and prediction. This enables “interpretability-by-exemplar” and partitioning of the acceptance region by similarity, distance, and calibration statistics. The mechanism offers richer, instance-grounded explanations than standard softmax-based confidence, which lacks any link to concrete training exemplars (Schmaltz, 16 Sep 2025, Schmaltz, 30 Oct 2025).

6. Practical Integration and Implementation

SDM activation is integrated as a lightweight, final-layer module atop a frozen or fine-tuned backbone. Key practical guidelines include:

  • Index and cache all training embeddings for efficient nearest-neighbor search (with FAISS or Annoy for large datasets).
  • Attach a 1D-CNN adaptor with up to 1000 filters, followed by a linear layer; batch sizes of about 50 with the Adam optimizer are typical.
  • For high-confidence selective classification, use class-conditional thresholds found via calibration-set eCDFs and set the acceptance risk parameter as desired.
  • In LLMs and Transformers, the SDM module operates on penultimate representations; in CNNs, it operates on global pooled embeddings.
  • The dominant computational cost is nearest-neighbor indexing; all other operations are negligible relative to ML inference time (Schmaltz, 16 Sep 2025, Schmaltz, 30 Oct 2025).

SDM extends and generalizes the notion of RBF or prototype-based similarity in neural classification (Amirian et al., 2022). In contrast to the quadratic RBF kernel, which encodes similarity via Mahalanobis distance to class centers, SDM combines similarity depth, distance quantile, and margin in a single multiplicative activation. Unlike contrastive, triplet, or margin-based metric learning, which require explicit mining and margin balancing, SDM “bakes in” all three epistemic signals at the activation/output level, avoids complex sampling schemes, and supports end-to-end calibration and selective classification. Unlike Gaussian or multiquadric RBFs, SDM’s formulation directly supports instance-level calibrations and achieves robust OOD detection without vanishing gradients or exponential attenuation (Amirian et al., 2022, Schmaltz, 27 Feb 2025, Schmaltz, 16 Sep 2025).

