Adaptive and Aggressive Rejection (AAR)
- AAR is a dynamic rejection framework that filters anomalous or adversarial contributions using robust statistical thresholds and a Gaussian mixture model for soft rejection.
- It integrates a warm-up phase with hard rejection and a main phase with ternary weighting to optimize data retention and improve performance metrics like AUROC.
- In adaptive control, AAR employs disturbance observers and finite-time controllers to aggressively cancel perturbations, ensuring rapid convergence under uncertainty.
Adaptive and Aggressive Rejection (AAR) encompasses a family of algorithmic mechanisms for robustly filtering out undesirable or adversarial contributions during inference or learning. In anomaly detection, AAR refers to a dynamic, data-driven rejection framework that adaptively identifies and excludes contaminated samples by jointly leveraging robust statistical thresholds and probabilistic modeling. In nonlinear control, AAR describes the coordinated use of adaptive, experience-accelerated disturbance estimators and finite-time controllers to aggressively cancel exogenous perturbations. Across both contexts, AAR is characterized by its principled, multi-phase rejection logic and its capacity to dynamically optimize the trade-off between retention and exclusion under uncertainty.
1. Mathematical Foundations of AAR for Anomaly Detection
AAR for anomaly detection operates on a contaminated dataset $\mathcal{D} = \{x_i\}_{i=1}^{N}$ containing an unknown fraction of anomalies, using anomaly scores $s_i = s(x_i)$. For reconstruction-based models, $s(x) = \lVert x - f_\theta(x) \rVert^2$, where $f_\theta$ is typically an autoencoder. The framework dynamically rejects anomalies in each mini-batch via a tiered thresholding procedure:
- Modified z–score (hard rejection): For batch size $B$, compute
$$z_i = \frac{0.6745\,\bigl(s_i - \operatorname{med}(s)\bigr)}{\operatorname{MAD}(s)}, \qquad \operatorname{MAD}(s) = \operatorname{med}_j\,\lvert s_j - \operatorname{med}(s) \rvert.$$
Samples with $z_i > z_{\mathrm{cut}}$ (i.e., $s_i > \tau_{\mathrm{MZ}}$) are hard rejected with threshold
$$\tau_{\mathrm{MZ}} = \operatorname{med}(s) + \frac{z_{\mathrm{cut}}}{0.6745}\,\operatorname{MAD}(s).$$
- Gaussian Mixture Model (GMM) intersection (soft rejection): Fit a two-component GMM,
$$p(s) = \pi_0\,\mathcal{N}(s;\mu_0,\sigma_0^2) + \pi_1\,\mathcal{N}(s;\mu_1,\sigma_1^2), \qquad \mu_0 < \mu_1.$$
The intersection threshold $\tau_{\mathrm{GMM}}$ solves
$$\pi_0\,\mathcal{N}(\tau;\mu_0,\sigma_0^2) = \pi_1\,\mathcal{N}(\tau;\mu_1,\sigma_1^2).$$
Explicitly, $\tau_{\mathrm{GMM}}$ is the root of
$$a\tau^2 + b\tau + c = 0$$
lying between $\mu_0$ and $\mu_1$, with
$$a = \sigma_0^2 - \sigma_1^2, \qquad b = 2\bigl(\sigma_1^2\mu_0 - \sigma_0^2\mu_1\bigr), \qquad c = \sigma_0^2\mu_1^2 - \sigma_1^2\mu_0^2 - 2\sigma_0^2\sigma_1^2 \ln\frac{\pi_1\sigma_0}{\pi_0\sigma_1}.$$
- $k$–$\sigma$ (stability guard): Compute
$$\tau_{k\sigma} = \mu_0 + k\,\sigma_0,$$
with $\mu_0, \sigma_0$ the mean/std of the "normal" GMM component, $\mathcal{N}(\mu_0, \sigma_0^2)$.
The final soft rejection threshold is $\tau_{\mathrm{soft}} = \max(\tau_{\mathrm{GMM}}, \tau_{k\sigma})$, so the guard keeps the soft cutoff from collapsing onto the normal mode when the GMM fit is unstable. A NumPy sketch of all three thresholds follows this list.
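The following sketch computes the three thresholds on a batch of scores with NumPy and scikit-learn. It is a minimal illustration of the logic above, not the reference implementation; the defaults `z_cut=3.5` and `k=3.0` and the max-combination of the two soft-threshold candidates are assumptions.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def aar_thresholds(scores, z_cut=3.5, k=3.0):
    """Hard and soft AAR rejection thresholds for one mini-batch of scores."""
    s = np.asarray(scores, dtype=float)

    # Hard threshold: modified z-score cutoff built from median and MAD.
    med = np.median(s)
    mad = np.median(np.abs(s - med)) + 1e-12
    tau_mz = med + z_cut * mad / 0.6745

    # Soft candidate 1: intersection of a two-component 1-D GMM.
    gmm = GaussianMixture(n_components=2).fit(s.reshape(-1, 1))
    mu, var, pi = gmm.means_.ravel(), gmm.covariances_.ravel(), gmm.weights_
    lo, hi = np.argmin(mu), np.argmax(mu)  # "normal" vs. "anomalous" component
    a = var[lo] - var[hi]
    b = 2.0 * (var[hi] * mu[lo] - var[lo] * mu[hi])
    c = (var[lo] * mu[hi] ** 2 - var[hi] * mu[lo] ** 2
         - 2.0 * var[lo] * var[hi]
         * np.log(pi[hi] * np.sqrt(var[lo]) / (pi[lo] * np.sqrt(var[hi]))))
    roots = np.roots([a, b, c]) if abs(a) > 1e-12 else np.array([-c / b])
    roots = roots[np.isreal(roots)].real
    between = roots[(roots > mu[lo]) & (roots < mu[hi])]
    tau_gmm = between[0] if between.size else 0.5 * (mu[lo] + mu[hi])

    # Soft candidate 2: k-sigma guard on the "normal" component.
    tau_ks = mu[lo] + k * np.sqrt(var[lo])

    # Final soft threshold (assumed max-combination as stability guard).
    return tau_mz, max(tau_gmm, tau_ks)
```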
2. Integrated Hard and Soft Rejection Strategies
AAR integrates these thresholds into a phased rejection weighting scheme:
- Warm-up (first $E_w$ epochs): Only the hard cutoff is active; $w_i = 0$ for $s_i > \tau_{\mathrm{MZ}}$, $w_i = 1$ otherwise.
- Main phase ($e > E_w$): Use weights
$$w_i = \begin{cases} 1, & s_i \le \tau_{\mathrm{soft}}, \\ \lambda, & \tau_{\mathrm{soft}} < s_i \le \tau_{\mathrm{MZ}}, \\ 0, & s_i > \tau_{\mathrm{MZ}}, \end{cases}$$
with $\lambda \in (0,1)$ (typically $0.1$).
This approach transforms sample selection from binary (keep/discard) into a ternary regime $w_i \in \{0, \lambda, 1\}$, allowing ambiguous samples to influence training with attenuated impact. This aggressive rejection (removing more than the nominal contamination) empirically yields heightened robustness and improved AUROC, particularly when the normal and anomaly score distributions overlap (Lee et al., 26 Nov 2025); a sketch of the phased weighting follows below.
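A compact sketch of the phased weight assignment, assuming the thresholds from `aar_thresholds` above; the argument names and the `lam=0.1` default are illustrative.

```python
import numpy as np

def aar_weights(scores, tau_mz, tau_soft, epoch, warmup_epochs, lam=0.1):
    """Phased AAR weights: hard-only during warm-up, ternary afterwards."""
    s = np.asarray(scores, dtype=float)
    if epoch < warmup_epochs:
        # Warm-up: binary hard rejection at the modified z-score cutoff.
        return (s <= tau_mz).astype(float)
    # Main phase: ternary weights in {1, lam, 0}.
    w = np.full_like(s, lam)   # ambiguous band keeps attenuated weight
    w[s <= tau_soft] = 1.0     # confidently normal samples: full weight
    w[s > tau_mz] = 0.0        # confidently anomalous samples: rejected
    return w
```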
3. AAR Algorithm and Computational Complexity
The AAR training protocol proceeds as follows for each mini-batch at epoch $e$, up to $E$ total epochs:
- Compute anomaly scores $s_i = s(x_i)$.
- Determine $\tau_{\mathrm{MZ}}$ (active in all epochs).
- If $e > E_w$:
- Fit the GMM and derive $\tau_{\mathrm{GMM}}$, $\tau_{k\sigma}$, and $\tau_{\mathrm{soft}}$.
- Assign weights $w_i$ according to the current phase.
- Compute the weighted loss,
$$\mathcal{L}(\theta) = \frac{1}{B} \sum_{i=1}^{B} w_i\,\ell(x_i;\theta),$$
and update the model; for reconstruction-based backbones, $\ell(x_i;\theta) = \lVert x_i - f_\theta(x_i) \rVert^2$. A consolidated training-loop sketch follows this list.
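The loop below assembles the protocol for an autoencoder backbone in PyTorch, reusing the `aar_thresholds` and `aar_weights` helpers sketched above; the optimizer choice and learning rate are illustrative assumptions.

```python
import torch

def train_aar(model, loader, epochs, warmup_epochs, lam=0.1, lr=1e-3):
    """AAR training loop sketch for a reconstruction-based backbone."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    for epoch in range(epochs):
        for x in loader:
            x = x.view(x.size(0), -1)              # flatten features
            s = ((x - model(x)) ** 2).sum(dim=1)   # per-sample recon. error

            # Thresholds and weights are computed on detached scores.
            s_np = s.detach().cpu().numpy()
            tau_mz, tau_soft = aar_thresholds(s_np)
            w = torch.as_tensor(
                aar_weights(s_np, tau_mz, tau_soft, epoch, warmup_epochs, lam),
                dtype=s.dtype, device=s.device,
            )

            # Weighted reconstruction loss over the mini-batch.
            loss = (w * s).mean()
            opt.zero_grad()
            loss.backward()
            opt.step()
```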
Computationally, for mini-batch size $B$ and feature dimensionality $d$, the per-step cost is dominated by the forward/backward pass of the backbone; thresholding (median/MAD, $O(B \log B)$ via sorting) and EM fitting of the one-dimensional GMM ($O(B)$ per iteration) add negligible overhead, rendering AAR scalable for large $B$ and $d$.
4. AAR in Adaptive Control: Disturbance Rejection
In adaptive nonlinear control, AAR is exemplified by architectures that combine online disturbance identification with aggressive, finite-time error suppression (Li et al., 2020). Consider a nonlinear plant,
$$\dot{x} = f(x) + g(x)\,u + d(t),$$
subject to a disturbance $d(t)$ generated by an exosystem, e.g., $\dot{\xi} = S\,\xi$, $d = h^{\top}\xi$. The core components are:
- Adaptive disturbance observer: State-derivative-free estimation using a filtered regressor $\phi_f$, with the disturbance parameterized as $d = W^{\top}\phi$ and the estimate $\hat{W}$ updated via Lyapunov-stable adaptation of the generic form
$$\dot{\hat{W}} = \Gamma\Bigl(\phi_f\,\epsilon + k_{\mathrm{ER}} \sum_{j=1}^{M} \phi_{f,j}\,\epsilon_j\Bigr),$$
where $\epsilon$ is the filtered prediction error and the stored pairs $(\phi_{f,j}, \epsilon_j)$ implement experience replay (ER), accelerating convergence.
- Aggressive (finite-time) controller: Integral-type terminal sliding mode with adaptive gain, of the generic form
$$u = g(x)^{-1}\bigl(-f(x) - \hat{d}(t) - K(t)\,\operatorname{sig}^{\gamma}(\sigma)\bigr), \qquad \operatorname{sig}^{\gamma}(\sigma) = \lvert\sigma\rvert^{\gamma}\operatorname{sign}(\sigma),\ 0 < \gamma < 1,$$
with $\sigma$ an integral terminal sliding variable, enforcing convergence of the tracking error $e \to 0$ in finite time provided certain rank/richness conditions are met.
The “adaptive” aspect derives from online parameter learning, while “aggressive rejection” is realized through high-bandwidth feed-forward cancellation and non-asymptotic convergence guarantees.
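To make the estimator-plus-controller interplay concrete, here is a toy simulation on a scalar plant. It is a simplified stand-in for the design in (Li et al., 2020): the disturbance proxy uses finite differences instead of the paper's state-derivative-free filtering, and the gains, replay schedule, and $\operatorname{sig}^{0.5}$ term are illustrative assumptions.

```python
import numpy as np

# Toy scalar plant: x_dot = -x + u + d(t), with d(t) = W^T phi(t) and
# phi(t) = [sin t, cos t] playing the role of the exosystem regressor.
dt, T = 1e-3, 10.0
W_true = np.array([1.0, -0.5])                  # unknown disturbance params
phi = lambda t: np.array([np.sin(t), np.cos(t)])
sig = lambda v, a: np.abs(v) ** a * np.sign(v)  # sig^a(v)

x, W_hat = 1.0, np.zeros(2)
gamma, k_er = 50.0, 5.0        # adaptation gain, experience-replay weight
memory = []                    # replay stack of (regressor, disturbance proxy)
x_prev, u_prev = x, 0.0

for step in range(int(T / dt)):
    t = step * dt
    d = W_true @ phi(t)

    # Controller: cancel the disturbance estimate, finite-time-style term.
    u = -(W_hat @ phi(t)) - 2.0 * sig(x, 0.5)

    # Disturbance proxy via finite differences (the actual observer is
    # state-derivative-free and uses filtered regressors instead).
    d_proxy = (x - x_prev) / dt + x_prev - u_prev if step > 0 else 0.0

    # Adaptation: instantaneous gradient term plus experience-replay term.
    err = d_proxy - W_hat @ phi(t - dt)
    dW = gamma * phi(t - dt) * err
    if memory:
        for phi_j, d_j in memory:
            dW += gamma * k_er * phi_j * (d_j - W_hat @ phi_j) / len(memory)
    if step > 0 and step % 500 == 0:
        memory.append((phi(t - dt), d_proxy))   # store an informative sample

    W_hat = W_hat + dW * dt
    x_prev, u_prev = x, u
    x = x + (-x + u + d) * dt                   # forward-Euler plant step

print("W_hat =", W_hat, "| final |x| =", abs(x))
```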
5. Empirical Evaluation and Performance
In anomaly detection benchmarks (Lee et al., 26 Nov 2025), AAR demonstrates:
- On MNIST/Fashion-MNIST with synthetic contamination, AAR achieves average AUROC increases of $0.006$ (MNIST) and $0.016$ (F-MNIST) over the prior best method, latent outlier exposure (LOE).
- On $30$ UCI-type tabular datasets with injected contamination, AAR lifts average AUROC across AE/MemAE/DSVDD backbones relative to the robust-statistics (MZ) baseline.
- Overall, AAR posts a positive average AUROC gain over all prior methods evaluated.
Ablations confirm that slightly over-estimating the contamination ratio enhances robustness; increasing $k$ in the $k$–$\sigma$ cutoff improves stability with negligible loss; and soft rejection with an intermediate $\lambda$ optimizes the bias–variance trade-off.
In adaptive control (Li et al., 2020), experience replay substantially shortens the disturbance estimation transient and secures finite-time tracking in nonlinear benchmarks, in contrast with the much slower convergence of experience-free observers.
6. Practical Tuning, Limitations, and Outlook
Tuning recommendations for anomaly detection include the number of warm-up epochs $E_w$, the multiplier $k$ of the $k$–$\sigma$ stability guard, and the soft rejection weight $\lambda$ (typically $0.1$). For adaptive control, filter and adaptation gains are selected to balance estimation speed against noise sensitivity, while the replay window is sized against memory and numerical-stability constraints. A hypothetical configuration sketch follows below.
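For reference, these knobs can be collected in a small configuration object; the defaults shown are hypothetical placeholders, not the published values.

```python
from dataclasses import dataclass

@dataclass
class AARConfig:
    """Hypothetical AAR hyperparameter bundle; defaults are illustrative."""
    warmup_epochs: int = 5     # E_w: hard rejection only during warm-up
    z_cut: float = 3.5         # modified z-score cutoff defining tau_MZ
    k_sigma: float = 3.0       # k in the k-sigma stability guard
    soft_weight: float = 0.1   # lambda applied in the ambiguous band
```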
Notable limitations:
- The univariate GMM used in AAR assumes a bimodal, near-Gaussian score distribution, which can be violated in highly skewed or multi-modal cases.
- Hyperparameters still require domain-specific tuning.
Open research directions involve meta-learning for automatic parameter adaptation, integrating limited anomaly labels (semi-supervised AAR), extending to high-dimensional or non-Gaussian score spaces, and adapting AAR for real-time data streams in cyber-physical systems and IoT scenarios (Lee et al., 26 Nov 2025, Li et al., 2020).