Percentile-Based Anomaly Scoring
- Percentile-based anomaly scoring is a method that ranks observations by their empirical percentiles, providing clear and interpretable metrics for detecting outliers.
- Rigorous fractional scoring algorithms address tied scores by proportionally dividing quantile intervals, ensuring that declared anomalies match target false alarm rates.
- The approach extends to ensemble methods and context-adaptive systems, offering scalable and domain-generalized solutions for robust anomaly detection.
A percentile-based anomaly scoring system quantifies the degree to which an observation is anomalous by evaluating its position relative to a reference distribution, typically via empirical percentiles or quantile ranks. This approach is foundational for anomaly detection in diverse domains, including bibliometric evaluation, multivariate statistical analysis, unsupervised learning, large-scale sensor networks, and robust industrial filtering. The core principle is to assign anomaly scores or rank classes that directly correspond to distributional percentiles, ensuring interpretability, explicit threshold control, and cross-domain comparability. Rigorous handling of ties, discrete score distributions, and context-dependent density regimes is essential for robust implementation.
1. Mathematical Foundations of Percentile-Based Anomaly Scoring
Percentile-based scoring assigns each item a value corresponding to its rank in the cumulative distribution of scores within a reference population. Let $\{x_1, \dots, x_N\}$ denote a set of observations, each with a scalar anomaly score $s_i = s(x_i)$. The empirical percentile (rank) of a test item $x$ is
$\hat{P}(x) = \frac{1}{N} \sum_{i=1}^{N} \mathbb{1}\big[s_i \le s(x)\big],$
where $\mathbb{1}[\cdot]$ is the indicator function. An item is considered anomalous if its percentile falls below a specified threshold $\alpha$, allowing fine control over the false alarm rate:
$x \text{ is declared anomalous} \iff \hat{P}(x) \le \alpha.$
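As a minimal sketch of this definition (assuming the score is oriented so that lower values are more anomalous, e.g., a log-density; all function names are illustrative), the following snippet computes the empirical percentile of a test score against a reference sample and flags it when the percentile falls at or below $\alpha$:

```python
import numpy as np

def empirical_percentile(reference_scores, s):
    """Fraction of reference scores less than or equal to s (empirical CDF value)."""
    reference_scores = np.asarray(reference_scores)
    return float(np.mean(reference_scores <= s))

def is_anomalous(reference_scores, s, alpha=0.05):
    """Flag an item whose percentile falls at or below alpha, so roughly an
    alpha fraction of nominal items is declared anomalous (false alarm rate)."""
    return empirical_percentile(reference_scores, s) <= alpha

# Example: the reference scores play the role of 'normality' scores
# (e.g., log-density), so low percentiles correspond to anomalies.
rng = np.random.default_rng(0)
reference = rng.normal(size=1000)
print(is_anomalous(reference, -3.1))  # True: deep in the lower tail
print(is_anomalous(reference, 0.2))   # False: a typical value
```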
In the presence of discrete or tied scores, fractional assignment methods rigorously address ambiguity. Suppose several items share the same score at a threshold; fractional scoring divides the quantile interval they occupy at the threshold between percentiles above and below, so the total number in each percentile class matches the theoretical proportion (Schreiber, 2013, Waltman et al., 2012). For a group of items tied at a threshold, the quantile interval for each is considered, and overlaps with the class boundaries are computed precisely.
The expected score for percentile rank classes $k = 1, \dots, K$ with weights $w_k$ and theoretical proportions $p_k$ is
$E[S] = \sum_{k=1}^{K} w_k \, p_k.$
This strict formalization eliminates biases induced by arbitrary tie-breaking and ensures that, for any false alarm (percentile) threshold, the declared anomalies align exactly with the target rate.
2. Robust Tie-Handling and Fractional Scoring Algorithms
Conventional percentile assignment encounters artifacts in the presence of ties at threshold boundaries, particularly with discrete distributions common in bibliometrics and many anomaly detection tasks. Integer-based rule variants—such as assigning all threshold-tied items to the lower/upper class, or ad hoc rounding—distort the total score and introduce sensitivity to minor changes.
Fractional scoring algorithms (Schreiber, 2013, Waltman et al., 2012) resolve these issues by assigning to each tied item the proportion of its quantile interval that falls within each class:
| Algorithm Name | Tie Handling Strategy | Percentile-Rank (PR) Score Deviation |
|---|---|---|
| Lower-bound rule | All ties to lower class | Underestimate |
| Upper-bound rule | All ties to upper class | Overestimate |
| Average percentile | Assign mean quantile to block | Approximate |
| Fractional scoring | Rigorous quantile proportion | Zero (exact match) |
Mathematically, for each item at rank $r$ (among $N$ total) with score value $v$, its quantile interval, namely the block $\big[F(v^{-}),\, F(v)\big]$ shared by all items tied at $v$ (reducing to $[(r-1)/N,\, r/N]$ when the item is untied), is intersected with each percentile class $[b_{k-1}, b_k]$. The fraction assigned to class $k$ is $\big|[F(v^{-}), F(v)] \cap [b_{k-1}, b_k]\big| \,/\, \big(F(v) - F(v^{-})\big)$, where $F(v)$ is the cumulative fraction of items with score value at most $v$.
Such fractional allocation yields empirical scores that reproduce theoretical expectations exactly, as validated on large-scale bibliometric and anomaly detection datasets. Small changes (e.g., an extra citation or a single outlier) no longer shift large blocks between classes, ensuring smooth, interpretable scoring.
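The following sketch implements this fractional allocation for a discrete score vector, assuming class boundaries are given as cumulative-fraction cut points; the names and block-splitting convention are illustrative rather than taken verbatim from the cited papers. The sanity check at the end confirms that per-class totals match the theoretical proportions even with ties at a boundary:

```python
import numpy as np

def fractional_class_shares(scores, class_boundaries):
    """For each item, the fraction of its quantile interval falling in each
    percentile class. `class_boundaries` are cumulative-fraction cut points,
    e.g. [0.5, 0.9, 1.0] for bottom-50% / 50-90% / top-10% classes.
    Tied items share one block of the CDF, so per-class totals always match
    the theoretical proportions."""
    scores = np.asarray(scores, dtype=float)
    n = len(scores)
    order = np.argsort(scores, kind="stable")
    shares = np.zeros((n, len(class_boundaries)))
    edges = np.concatenate(([0.0], class_boundaries))
    i = 0
    while i < n:  # walk through blocks of tied scores
        j = i
        while j < n and scores[order[j]] == scores[order[i]]:
            j += 1
        lo, hi = i / n, j / n              # quantile interval of the tied block
        for k in range(len(class_boundaries)):
            a, b = edges[k], edges[k + 1]
            overlap = max(0.0, min(hi, b) - max(lo, a))
            shares[order[i:j], k] = overlap / (hi - lo)
        i = j
    return shares

# Per-class totals equal the theoretical proportions times n, despite ties.
scores = np.array([0, 0, 1, 1, 1, 2, 2, 3, 3, 3])
shares = fractional_class_shares(scores, [0.5, 0.9, 1.0])
print(shares.sum(axis=0))   # -> [5. 4. 1.]
```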
3. Contextual Extensions: Distribution-Aware and Adaptive Scoring
Real-world anomaly detection often requires percentile-based scoring adapted to heterogeneous data distributions, sensor modalities, crowd density regimes, and uncertainty profiles. Recent systems extend fundamental percentile assignment with data-driven class boundaries, distribution discrimination, and contextual normalization.
- In dense crowd analysis (VelocityNet (AlGhamdi et al., 21 Oct 2025)), person-specific motion magnitudes are clustered to define context-dependent "normal" velocity ranges. Percentile-based scoring then quantifies deviation as the percent distance from these boundaries (a minimal sketch of this percent-deviation scoring appears after this list).
Such systems dynamically adapt thresholds to empirical density clusters and avoid spurious alerts for marginal deviations.
- Distribution-level discrimination (Overlap loss (Jiang et al., 2023)) minimizes overlap between anomaly and normal score distributions, without reliance on fixed percentile targets or margins. The overlap area is computed using kernel-density-based empirical CDFs, with a loss of the form
$\mathcal{L}_{\text{overlap}} = \hat{F}_{a}(s^{*}) + \big(1 - \hat{F}_{n}(s^{*})\big),$
where $s^{*}$ is the intersection point of the normal and anomaly score distributions (assuming anomalies receive higher scores) and $\hat{F}_{n}$, $\hat{F}_{a}$ are the corresponding empirical CDFs. This method is adaptive and robust to contamination in the unlabeled set.
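The sketch below illustrates the percent-deviation idea from the first bullet under simplifying assumptions (k-means clustering of scalar velocities, fixed percentile boundaries per cluster); it is not VelocityNet's implementation, and all names and parameters are hypothetical:

```python
import numpy as np
from sklearn.cluster import KMeans

def velocity_anomaly_percent(velocities, test_v, n_clusters=3, lo_pct=5, hi_pct=95):
    """Cluster reference motion magnitudes into velocity regimes, then score a
    test velocity by its percent distance beyond the percentile boundaries of
    its nearest regime. 0 means 'inside the normal range'."""
    v = np.asarray(velocities, dtype=float).reshape(-1, 1)
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit(v)
    # normal range per cluster, defined by empirical percentiles
    ranges = {}
    for c in range(n_clusters):
        members = v[km.labels_ == c, 0]
        ranges[c] = (np.percentile(members, lo_pct), np.percentile(members, hi_pct))
    c = int(km.predict(np.array([[float(test_v)]]))[0])
    lo, hi = ranges[c]
    if lo <= test_v <= hi:
        return 0.0
    bound = hi if test_v > hi else lo
    return 100.0 * abs(test_v - bound) / max(abs(bound), 1e-8)

rng = np.random.default_rng(1)
walk = rng.normal(1.2, 0.2, 500)      # walking-speed regime
run = rng.normal(3.0, 0.4, 200)       # running-speed regime
print(velocity_anomaly_percent(np.concatenate([walk, run]), 5.5))  # large % deviation
```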
4. Statistical Performance, Calibration, and Thresholding
Percentile-based scoring systems facilitate direct control of threshold rates (false alarm rate, recall rate) and uniformity across detectors, domains, and time windows. For example, in multi-sensor anomaly scoring (MAD (Alnegheimish et al., 21 Apr 2025)), calibrated thresholds are set at a chosen percentile of a Gamma-distributed global anomaly score $S_t$, derived from aggregated, GMM-calibrated sensor residuals; with $\gamma$ denoting the score value at that percentile, time step $t$ is flagged via
$z_t = \mathbb{1}[S_t > \gamma].$
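A minimal sketch of this calibration step, assuming the aggregated score $S_t$ is fitted with a Gamma distribution and the threshold $\gamma$ is placed at a chosen percentile of the fit (the 99th here, purely for illustration); this is not the MAD implementation:

```python
import numpy as np
from scipy import stats

def gamma_percentile_threshold(global_scores, percentile=99.0):
    """Fit a Gamma distribution to aggregated anomaly scores and return the
    threshold gamma at the requested percentile of the fitted distribution."""
    shape, loc, scale = stats.gamma.fit(global_scores, floc=0.0)
    return stats.gamma.ppf(percentile / 100.0, shape, loc=loc, scale=scale)

rng = np.random.default_rng(2)
S = rng.gamma(shape=2.0, scale=1.5, size=5000)   # aggregated global scores S_t
gamma = gamma_percentile_threshold(S, percentile=99.0)
z = (S > gamma).astype(int)                       # z_t = 1[S_t > gamma]
print(gamma, z.mean())                            # flag rate close to 1%
```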
Percentile-based aggregation methods enable consistent thresholds regardless of heterogeneity or dependencies, and facilitate root-cause attribution via sensor-level score contributions.
Percentile scoring is also central to mitigating false alarms in intrusion detection (Zohrevand et al., 2019): mapping raw anomaly scores to uniform percentiles or "bits of meta-rarity" allows explicit control of the rate of alerts regardless of the underlying score distributions.
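A small sketch of this mapping under one plausible reading (rarity as the upper-tail empirical percentile, and "bits" as its negative log base 2); the exact definitions in the cited work may differ, and the names are illustrative:

```python
import numpy as np

def upper_tail_rarity(raw_scores, s):
    """Fraction of reference scores at least as extreme as s
    (higher raw score = more anomalous in this convention)."""
    raw_scores = np.asarray(raw_scores)
    return float(np.mean(raw_scores >= s))

def bits_of_meta_rarity(raw_scores, s):
    """-log2 of the rarity: ~10 bits corresponds to a roughly 1-in-1000 event,
    independent of the detector's raw score scale."""
    rarity = max(upper_tail_rarity(raw_scores, s), 1.0 / (len(raw_scores) + 1))
    return -np.log2(rarity)

rng = np.random.default_rng(3)
ref = rng.lognormal(size=10_000)       # arbitrarily scaled raw detector scores
print(bits_of_meta_rarity(ref, np.quantile(ref, 0.999)))   # roughly 10 bits
```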
5. Extensions to Ranking, Ensembles, and Score Similarity
Percentile-based scores are fundamental for ranking observations by abnormality and for constructing ensembles of diverse detection algorithms. Rank-SVM anomaly approaches (Qian et al., 2014, Qian et al., 2015, Root et al., 2016) use percentile ranks to declare anomalies at prescribed false alarm rates, and can approximate minimum volume sets adaptively for any target rate without retraining.
Score similarity measures for ensemble construction (Copula Quadrant Similarity (Davidow et al., 2021)) operate directly on percentile-transformed anomaly scores, quantifying joint tail dependence in the upper quadrant via conditional copula densities. This supports unsupervised clustering and maximally informative ensemble weighting.
| Ensemble Step | Percentile-based Mechanism |
|---|---|
| Detector clustering | Tail similarity in anomaly score percentiles |
| Aggregation | Top percentile rank, average, max, or min |
| Thresholding | Top-% anomaly scores, percentile-of-combined scores |
Uniform percentile normalization is crucial for comparability and interpretability in multi-detector and multi-domain settings.
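The sketch below shows uniform percentile normalization of detector scores and a crude upper-quadrant co-occurrence measure, as a stand-in for the copula-based tail similarity; it illustrates the mechanisms in the table above and is not the method of Davidow et al.:

```python
import numpy as np
from scipy.stats import rankdata

def percentile_normalize(scores):
    """Rank-transform raw scores to (0, 1] so detectors become comparable."""
    s = np.asarray(scores, dtype=float)
    return rankdata(s) / len(s)

def upper_quadrant_similarity(u, v, q=0.95):
    """Probability that detector v places a point in its top (1-q) tail given
    that detector u does: a crude proxy for joint upper-tail dependence
    between two percentile-transformed score vectors."""
    both = np.mean((u > q) & (v > q))
    return both / (1.0 - q)

rng = np.random.default_rng(4)
x = rng.normal(size=2000)
det_a = percentile_normalize(x + 0.1 * rng.normal(size=2000))   # correlated detectors
det_b = percentile_normalize(x + 0.1 * rng.normal(size=2000))
det_c = percentile_normalize(rng.normal(size=2000))              # unrelated detector
print(upper_quadrant_similarity(det_a, det_b))   # well above the independence baseline
print(upper_quadrant_similarity(det_a, det_c))   # close to 1 - q = 0.05 (independence)
```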
6. Practical Considerations, Scalability, and Domain Generalization
Large-scale and heterogeneous datasets demand scalable, domain-adaptive percentile scoring mechanisms. Advanced systems implement local density-based anomaly score normalization (Wilkinghoff et al., 13 Sep 2025), scaling scores by reference sample density to mitigate domain mismatch, thus maintaining coherent percentile distributions across shifting source and target domains.
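A minimal sketch of density-based score normalization, here approximating local reference density by the mean k-nearest-neighbor distance in an embedding space; the specific normalizer used by Wilkinghoff et al. may differ, and all names are illustrative:

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def density_normalized_scores(scores, embeddings, ref_embeddings, k=5):
    """Scale each raw anomaly score by the local sparsity of its neighborhood
    in a reference set (mean k-NN distance), so that samples in sparse
    target-domain regions are not flagged purely for being far from the
    densest source-domain region."""
    nn = NearestNeighbors(n_neighbors=k).fit(ref_embeddings)
    dists, _ = nn.kneighbors(embeddings)
    local_scale = dists.mean(axis=1) + 1e-12   # larger = lower reference density
    return np.asarray(scores) / local_scale

rng = np.random.default_rng(5)
ref_emb = rng.normal(size=(500, 16))           # source-domain reference embeddings
test_emb = rng.normal(size=(10, 16)) * 2.0     # shifted target-domain embeddings
raw = rng.random(10)
print(density_normalized_scores(raw, test_emb, ref_emb))
```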
Patchwise, batch-level percentile thresholding and uncertainty calibration (SSFilter (Liu et al., 19 Feb 2025)) robustly filter noisy data by dynamically adapting per-batch cutoffs on top-percentile patch scores and integrating MC-dropout uncertainty measures with consensus voting, supporting sample-level selection at industrial scale.
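A simplified sketch of per-batch percentile cutoffs on patch-level scores combined with an uncertainty gate; SSFilter's actual selection logic (including consensus voting) is more involved, and the parameters and names here are placeholders:

```python
import numpy as np

def batch_percentile_filter(patch_scores, uncertainties, keep_pct=90, max_unc=None):
    """Per-batch filtering: keep samples whose top-percentile patch score falls
    below the batch's keep_pct cutoff and whose MC-dropout uncertainty is low.
    patch_scores: (batch, n_patches); uncertainties: (batch,)."""
    sample_scores = np.percentile(patch_scores, 99, axis=1)   # top-percentile patch score
    cutoff = np.percentile(sample_scores, keep_pct)           # batch-adaptive cutoff
    keep = sample_scores <= cutoff
    if max_unc is not None:
        keep &= np.asarray(uncertainties) <= max_unc
    return keep

rng = np.random.default_rng(6)
scores = rng.random((32, 196))               # 32 samples x 196 patches
unc = rng.random(32) * 0.1                   # per-sample MC-dropout uncertainty
print(batch_percentile_filter(scores, unc, keep_pct=90, max_unc=0.08).sum())
```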
In all these cases, fractional and percentile-based methods offer scalability—precomputable normalization constants, sample-wise adaptability, and straightforward extensions to real-time, streaming, and cross-domain settings.
7. Impact and Recommendations for System Designers
Percentile-based anomaly scoring is mathematically rigorous and empirically validated for robust anomaly detection, precise threshold control, and fairness across heterogeneous data sources. Its adoption is strongly recommended in:
- Bibliometric evaluation for publication ranking, avoiding genre- and field-specific score biases (Schreiber, 2013, Waltman et al., 2012).
- Industrial sensor fusion and predictive maintenance, for interpretable, data-driven calibration (Alnegheimish et al., 21 Apr 2025).
- Crowd, security, and fault detection systems, for automated context adaptation and real-time thresholding (AlGhamdi et al., 21 Oct 2025, Zohrevand et al., 2019).
- Ensemble construction and method comparison, for unsupervised, scale-invariant strategy selection (Davidow et al., 2021).
Implementations should rigorously address ties, use fractional assignment at rank thresholds, and normalize or calibrate scores to target percentiles, leveraging kernel density or empirical CDF methods as appropriate. Adaptations for context-sensitive class boundaries and dynamic density regimes further enhance robustness and specificity. Uniform percentile normalization and rank aggregation remain best practices for ensemble anomaly systems and domain-generalized deployments.