
Anomaly Score Learning Techniques

Updated 14 January 2026
  • Anomaly score learning is a method that assigns numerical values to data instances based on statistical rarity and semantic deviations.
  • It integrates density-based, margin-based, and deep representation techniques to robustly detect anomalies in diverse domains.
  • The approach employs principled objective functions, regularization, and multimodal fusion to improve detection accuracy and severity assessment.

Anomaly score learning refers to the systematic design, training, and evaluation of data-driven functions that assign numerical scores quantifying the degree of abnormality or “anomaly” of a data instance. These scores serve to rank or threshold samples for anomaly detection, localization, and severity assessment in domains ranging from high-dimensional tabular data and time series to images, point clouds, and scientific experiments. Effective anomaly score learning requires principled objective functions, architectures, and regularization strategies that align the score with statistical rarity, domain-relevant deviation, or semantic severity. The field encompasses classic density-based rankings, modern neural estimators, supervised and self-supervised routines, and multimodal fusion.

1. Formalization and Statistical Principles

Let $x \in \mathbb{R}^d$ be an input, drawn from either a nominal (normal) or anomalous distribution. An anomaly scoring function $s$ is any mapping $s:\mathbb{R}^d\to\mathbb{R}$ that induces an order reflecting increasing "anomalousness." The optimal scoring function in the density-based paradigm is typically some monotonic function of the density, i.e., $s^*(x) = -\log p_{\text{normal}}(x)$, or its empirical rank among normals, with higher $s(x)$ indicating lower likelihood under the nominal distribution (0910.5461, Goix et al., 2015).
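As a minimal sketch of the density-based paradigm, the snippet below fits a one-dimensional Gaussian to nominal data and scores points by their negative log-density; the Gaussian model and the sample values are illustrative assumptions, not part of any cited method.

```python
import numpy as np

def gaussian_score(x, mu, sigma):
    """Anomaly score s(x) = -log p(x) under a fitted 1-D Gaussian."""
    return 0.5 * np.log(2 * np.pi * sigma**2) + (x - mu) ** 2 / (2 * sigma**2)

rng = np.random.default_rng(0)
normal = rng.normal(0.0, 1.0, size=1000)   # nominal (normal) training sample
mu, sigma = normal.mean(), normal.std()    # fit the nominal density

s_inlier = gaussian_score(0.1, mu, sigma)  # point near the nominal mode
s_outlier = gaussian_score(5.0, mu, sigma) # point far in the tail
assert s_outlier > s_inlier                # rarer point receives a higher score
```

The assertion reflects the defining property of the score: it is monotone in $-\log p_{\text{normal}}(x)$, so less likely points rank higher.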

Many statistical frameworks recast anomaly detection as (i) minimum-volume set estimation at level $1-\alpha$ (for false-alarm control at level $\alpha$), (ii) ranking by $p$-values or excess-mass curves, or (iii) maximizing asymptotic ROC/AUC by dominating the score on inliers relative to outliers (0910.5461, Goix et al., 2015). The construction of the score function may use unsupervised data, true/weak/inexact labels, or multimodal signals.
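The false-alarm control in (i) can be illustrated by thresholding at the empirical $(1-\alpha)$ quantile of scores computed on nominal data; the standard-normal scores here are a stand-in assumption, not a claim about any particular estimator.

```python
import numpy as np

rng = np.random.default_rng(1)
train_scores = rng.standard_normal(10_000)  # scores of nominal training data
alpha = 0.05                                # target false-alarm rate

# Threshold at the (1 - alpha) empirical quantile of nominal scores.
tau = np.quantile(train_scores, 1 - alpha)

test_scores = rng.standard_normal(10_000)   # fresh nominal scores
false_alarm = np.mean(test_scores > tau)    # should concentrate near alpha
assert abs(false_alarm - alpha) < 0.02
```

Points scoring above `tau` fall outside the estimated minimum-volume set, so the rate at which fresh nominal data is flagged approximates $\alpha$.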

2. Core Methodological Classes

2.1 Density and Level-Set Based Scoring

Density-based anomaly scoring relies on learning $p(x)$ via maximum likelihood estimation, normalizing flows, autoregressive estimators, or kernel methods. The negative log-likelihood $s(x) = -\log p(x)$ is widely adopted in neural density models. For instance, autoregressive models (such as MADE) factorize $p(x)$ as a product of one-dimensional conditionals, $p(x) = \prod_{d=1}^{D} p(x_d \mid x_{<d})$, estimated via masked feed-forward neural networks (Iwata et al., 2019). In supervised variants, a pairwise regularizer enforces anomalous points to have lower likelihood than normals. For highly imbalanced or inexactly-labeled data, loss surrogates such as smooth AUC maximization over max-score instances in "bags" of possibly anomalous samples are utilized (Iwata et al., 2019).
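The pairwise regularizer can be sketched as follows: given log-likelihoods of labeled normal and anomalous instances, a sigmoid of their difference is averaged over all pairs so that minimizing the loss pushes anomalies below normals in likelihood. The specific pairing scheme and values are illustrative assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def pairwise_likelihood_loss(logp_normal, logp_anomaly):
    """Penalize normal/anomaly pairs where the anomaly is not less likely.

    Minimizing -log sigmoid(log p(x_n) - log p(x_a)) over all pairs drives
    anomalous instances below normal ones in model likelihood.
    """
    diff = logp_normal[:, None] - logp_anomaly[None, :]  # all normal/anomaly pairs
    return -np.mean(np.log(sigmoid(diff)))

# Well-ordered likelihoods incur lower loss than inverted ones.
lo = pairwise_likelihood_loss(np.array([-1.0, -1.2]), np.array([-5.0, -6.0]))
hi = pairwise_likelihood_loss(np.array([-5.0, -6.0]), np.array([-1.0, -1.2]))
assert lo < hi
```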

Graph-based scores such as the $K$-NN graph anomaly score $S(x) = \frac{1}{n}\sum_{i=1}^{n} \mathbb{1}\{R_S(x)\leq R_S(x_i)\}$ avoid explicit density estimation and provide asymptotic optimality guarantees for various composite hypotheses (0910.5461, Qian et al., 2015).
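A direct, brute-force reading of this score can be sketched as below, taking $R_S(\cdot)$ to be the distance to the $K$-th nearest neighbor in the reference sample (a common choice, assumed here): the score is then an empirical $p$-value, and small values flag anomalies.

```python
import numpy as np

def knn_radius(point, ref, k):
    """Distance from `point` to its k-th nearest neighbor in `ref`."""
    d = np.sort(np.linalg.norm(ref - point, axis=1))
    return d[k - 1]

def knn_graph_score(x, ref, k=3):
    """S(x) = (1/n) * sum_i 1{R(x) <= R(x_i)}: an empirical p-value.

    Small values indicate anomalies: x sits in a sparser region than
    most reference points, so its k-NN radius exceeds most of theirs.
    """
    rx = knn_radius(x, ref, k)
    # leave-one-out k-NN radii of the reference points themselves
    radii = np.array([knn_radius(ref[i], np.delete(ref, i, axis=0), k)
                      for i in range(len(ref))])
    return np.mean(rx <= radii)

rng = np.random.default_rng(2)
ref = rng.standard_normal((200, 2))
s_in = knn_graph_score(np.array([0.0, 0.0]), ref)   # dense region: near 1
s_out = knn_graph_score(np.array([6.0, 6.0]), ref)  # sparse region: near 0
assert s_in > s_out
```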

2.2 Score Distribution and Margin-Based Learning

Explicitly optimizing the separation between the normal and anomalous score distributions, without relying on fixed targets or margins, is proposed via a bounded overlap loss (Jiang et al., 2023). Here, kernel density estimators (KDEs) are used to nonparametrically estimate score densities $f_n, f_a$ for normals and anomalies, and the loss is defined as the area of overlap, $\mathcal{L}_{\text{Overlap}} = 1 - F_n(c) + F_a(c)$ at the score intersection $c$.
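The overlap area can be estimated numerically as a sketch, assuming one-dimensional scores, Gaussian KDEs with a fixed illustrative bandwidth, and simple grid integration (the cited work's exact estimator may differ):

```python
import numpy as np

def kde(samples, grid, bw=0.3):
    """Gaussian kernel density estimate evaluated on `grid`."""
    z = (grid[:, None] - samples[None, :]) / bw
    return np.exp(-0.5 * z**2).sum(axis=1) / (len(samples) * bw * np.sqrt(2 * np.pi))

def overlap_loss(scores_normal, scores_anomaly, bw=0.3):
    """Area of overlap between the two estimated score densities."""
    grid = np.linspace(min(scores_normal.min(), scores_anomaly.min()) - 1,
                       max(scores_normal.max(), scores_anomaly.max()) + 1, 512)
    fn, fa = kde(scores_normal, grid, bw), kde(scores_anomaly, grid, bw)
    # integrate min(fn, fa) on the grid: 0 = disjoint, ~1 = identical
    return (np.minimum(fn, fa) * (grid[1] - grid[0])).sum()

rng = np.random.default_rng(3)
separated = overlap_loss(rng.normal(0, 1, 500), rng.normal(6, 1, 500))
mixed = overlap_loss(rng.normal(0, 1, 500), rng.normal(1, 1, 500))
assert separated < mixed
```

Because the loss is the overlap area itself, no margin hyperparameter is needed: any reshaping of either score distribution that reduces overlap reduces the loss.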

Deviation learning frameworks design the score function so that normal instances approximate a reference distribution (e.g., a Gaussian), while anomalies are enforced to attain statistically significant deviations, using soft-label weighting and mini-batch instance-reweighting to handle contamination (Das et al., 2024).
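A deviation-style loss in this spirit can be sketched as follows, with a standard-normal reference distribution and an illustrative margin of 5 reference standard deviations (both are assumptions here): normals are pulled toward the reference, labeled anomalies are pushed past the margin.

```python
import numpy as np

def deviation_loss(scores, labels, margin=5.0, n_ref=5000, seed=0):
    """Deviation-style loss: normal scores should track a N(0,1) reference
    distribution; labeled anomalies must deviate by at least `margin`
    reference standard deviations."""
    ref = np.random.default_rng(seed).standard_normal(n_ref)  # reference sample
    dev = (scores - ref.mean()) / ref.std()                   # standardized deviation
    per_instance = np.where(labels == 0,
                            np.abs(dev),                    # pull normals to the reference
                            np.maximum(0.0, margin - dev))  # push anomalies past the margin
    return per_instance.mean()

scores = np.array([0.1, -0.2, 6.0])   # two normals near 0, one anomaly far out
labels = np.array([0, 0, 1])
good = deviation_loss(scores, labels)
bad = deviation_loss(np.array([3.0, -3.0, 0.0]), labels)  # poorly separated scores
assert good < bad
```

Soft-label weighting and instance reweighting, as in the cited framework, would enter as per-instance weights on `per_instance` before averaging.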

2.3 Deep and Representation-Based Methods

Several deep anomaly scoring strategies leverage autoencoders, variational autoencoders (VAEs), generative adversarial networks (GANs), and hybrids thereof (Šmídl et al., 2019, Lüer et al., 2023). Variational and adversarial approaches define scores combining reconstruction error, latent-space deviations from a reference prior, and discriminator feature-space discrepancies. Nonlinear fusion of multiple error criteria is achieved using kernel SVMs in low-dimensional error space, offering improved robustness over manual linear weighting (Lüer et al., 2023).

Contrastive and self-supervised techniques such as AnomalyCLR train representation encoders (e.g., permutation-invariant transformers) with physically and anomaly-informed augmentations; score assignment downstream is performed by autoencoder reconstruction error in the learned representation space (Dillon et al., 2023).

Center-based discriminative learning (CEDL) integrates geometric clustering of normals (to a deep feature center) into a unified radial logit, yielding an anomaly score as Euclidean distance to the center (Darban et al., 15 Nov 2025).
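At inference time, a center-based score of this kind reduces to a Euclidean distance in feature space; the sketch below uses hypothetical feature embeddings and takes the center as the mean of normal features, a simplification of the learned center in the cited method.

```python
import numpy as np

def center_score(features, center):
    """Anomaly score = Euclidean distance of a deep feature vector to the
    learned center of the normal class (radial-logit style)."""
    return np.linalg.norm(features - center, axis=-1)

# Hypothetical 8-D feature embeddings; center = mean of normal features.
rng = np.random.default_rng(4)
normal_feats = rng.normal(0.0, 0.5, size=(100, 8))
center = normal_feats.mean(axis=0)

s_norm = center_score(rng.normal(0.0, 0.5, size=8), center)  # near the cluster
s_anom = center_score(np.full(8, 3.0), center)               # far from the cluster
assert s_anom > s_norm
```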

2.4 Score Regularization and Curriculum

Score-guided autoencoders incorporate an auxiliary scorer, trained with regularization terms that use 'easy' normal and abnormal cases to anchor score values, thus maximally widening the separation in ambiguous or "transition" regions (Huang et al., 2021).
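The anchoring idea can be sketched with a regularizer that holds confidently-normal scores near a low reference value and confidently-abnormal scores at or above a high one; the anchor values `a0` and `a1` below are hypothetical, chosen only to illustrate the widened gap.

```python
import numpy as np

def score_regularizer(easy_normal_scores, easy_abnormal_scores, a0=0.0, a1=6.0):
    """Anchor 'easy' normal scores near a0 and 'easy' abnormal scores at or
    above a1 (a0, a1 are illustrative anchor values)."""
    normal_term = np.mean((easy_normal_scores - a0) ** 2)
    abnormal_term = np.mean(np.maximum(0.0, a1 - easy_abnormal_scores) ** 2)
    return normal_term + abnormal_term

well_separated = score_regularizer(np.array([0.1, -0.1]), np.array([6.5, 7.0]))
collapsed = score_regularizer(np.array([2.0, 3.0]), np.array([2.5, 3.0]))
assert well_separated < collapsed
```

With the two extremes pinned down, ambiguous instances in the transition region are free to settle anywhere between the anchors, which is where the separation gain is realized.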

3. Supervised, Semi/Weakly Supervised, and Unsupervised Settings

Learning anomaly scores encompasses purely unsupervised methods, supervised (with precise or weak labels), and semi-supervised formulations with limited or inexact anomaly information. In weak supervision with inexact labels (where sets of samples are known only to contain at least one anomaly), score functions are trained to maximize smooth surrogates of inexact AUC by pushing the maximum score in each ambiguous set above those of labeled normals (Iwata et al., 2019).
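The bag-level objective can be sketched with a log-sum-exp soft maximum standing in for the hard max (a common smoothing choice, assumed here), and a sigmoid pairwise term comparing each bag's soft max against every labeled-normal score; temperature and sample values are illustrative.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def inexact_auc_surrogate(bag_scores, normal_scores, temp=10.0):
    """Smooth surrogate of the inexact AUC: the (soft) maximum score of each
    bag, known to contain at least one anomaly, should exceed every
    labeled-normal score."""
    losses = []
    for bag in bag_scores:
        soft_max = np.log(np.sum(np.exp(temp * bag))) / temp  # log-sum-exp soft max
        losses.append(-np.log(sigmoid(soft_max - normal_scores)).mean())
    return np.mean(losses)

bags = [np.array([0.2, 4.0, 0.1]), np.array([3.5, 0.3])]  # each contains one high score
normals = np.array([0.0, 0.4, -0.2])
good = inexact_auc_surrogate(bags, normals)
bad = inexact_auc_surrogate([np.array([0.1, 0.2]), np.array([0.0])], normals)
assert good < bad
```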

Overlap loss-based methods, as well as variants with ranking or ordinal regression terms, operate effectively given a handful of labels and are robust to contamination in the unlabeled pool, as the entire score distribution is adaptively reshaped without pre-specified targets (Jiang et al., 2023).

4. Multimodal, Structure-Aware, and Severity-Aligned Anomaly Scores

Contemporary applications require anomaly scores to align with structural or semantic aspects beyond binary abnormality. Multimodal methods, such as MDSS, generate image-level anomaly maps by fusing discrepancies between student-teacher RGB image embeddings and signed-distance scores from 3D point clouds, with feature-space scale alignment and pixel-wise aggregation (Sun et al., 2024).

In multilevel anomaly detection (MAD), the score must capture not just presence but severity of anomaly; metrics such as C-index and Kendall's Tau-b quantify the correspondence between score ranking and true severity levels. Results demonstrate that while conventional anomaly scores capture binary deviation, further work is required to ensure alignment with practical severity, especially in medical and industrial domains (Cao et al., 2024).
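The C-index used for severity alignment admits a direct pairwise implementation: among pairs with different true severity, count the fraction where the more severe instance receives the higher score (score ties counting one half). The severity labels below are illustrative.

```python
import numpy as np

def concordance_index(scores, severities):
    """C-index: among pairs with different true severity, the fraction where
    the more severe instance receives the higher score (score ties count 0.5)."""
    concordant, comparable = 0.0, 0
    n = len(scores)
    for i in range(n):
        for j in range(i + 1, n):
            if severities[i] == severities[j]:
                continue                      # tied severities are not comparable
            comparable += 1
            hi = i if severities[i] > severities[j] else j
            lo = j if hi == i else i
            if scores[hi] > scores[lo]:
                concordant += 1.0
            elif scores[hi] == scores[lo]:
                concordant += 0.5
    return concordant / comparable

severity = np.array([0, 0, 1, 2, 2])          # 0 = normal, higher = more severe
aligned = concordance_index(np.array([0.1, 0.2, 0.5, 0.9, 1.1]), severity)
assert aligned == 1.0                         # scores perfectly track severity
```

A value of 1.0 means the score ordering fully respects severity; 0.5 corresponds to a score uninformative about severity even if it separates normal from abnormal.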

5. Objective Function Design and Optimization Schemes

The choice of loss functions and training objectives is central in anomaly score learning. Typical losses include:

  • Maximum Likelihood: $L(\theta) = \sum_n \log p(x_n \mid \theta)$ for density estimators.
  • Contrastive: Pulls representation pairs for augmented normal neighbors together, pushes pairs for anomaly-augmented instances apart (Dillon et al., 2023).
  • Pairwise Sigmoid or AUC Surrogates: a $\sigma(\log p(x_{n'})/p(x_n))$ regularizer to order normal–abnormal pairs (Iwata et al., 2019).
  • Overlap Area: Integral loss over KDE-estimated score distributions, not requiring a fixed margin (Jiang et al., 2023).
  • Deviation: Penalizes scores for normals deviating from reference; enforces margin for labeled/suspected anomalies (Das et al., 2024).
  • Center-based Losses: Binary cross-entropy with logits parameterized by radial distance from a learned normal center (Darban et al., 15 Nov 2025).

Optimization employs standard stochastic gradient descent variants (Adam, SGD), with instance-level weighting schemes, early stopping, and careful batch construction (e.g., including ambiguous sets for inexact labels (Iwata et al., 2019)).

6. Empirical Performance, Applications, and Limitations

Extensive benchmarks across tabular, image, time-series, and scientific domains confirm key properties:

  • Minimal anomaly label budgets (1–5 instances) yield significant AUC improvements for supervised density regularization (Iwata et al., 2019).
  • Overlap loss and deviation-based strategies offer strong gains over margin-based and fixed-target approaches on highly contaminated data (Jiang et al., 2023, Das et al., 2024).
  • Score fusion frameworks (e.g., combining SVDD- and flow-based "rareness" and "differentness" scores) outperform standalone models for event-level detection in collider physics (Caron et al., 2021).
  • Multilevel scoring alignment remains a challenge; knowledge-distillation and memory-bank models better capture severity orderings than pure reconstruction/density, while few-shot prompted MLLMs can further improve ordinal alignment (Cao et al., 2024).

Main limitations involve sensitivity to density mis-specification, the challenge of defining score distributions that correspond to semantic risk (not just statistical rarity), and robustness under data contamination or covariate shift.

