SGD-Based OCSVM Solver

Updated 15 December 2025
  • The paper introduces SONAR, an SGD-based OCSVM solver that leverages strongly convex regularization and Random Fourier Features to achieve efficient streaming anomaly detection.
  • The method reduces computational complexity while providing last-iterate guarantees on Type I/II error rates via a single-pass SGD algorithm.
  • Empirical evaluations demonstrate robust adaptivity and performance on synthetic and real-world data under both benign and adversarial non-stationary conditions.

Stochastic Gradient Descent (SGD)-based One-Class Support Vector Machine (OCSVM) solvers provide an efficient approach to streaming outlier detection, overcoming key limitations of traditional kernel methods in both computational tractability and statistical guarantees. SONAR, introduced by Suk et al. (11 Dec 2025), leverages strongly convex regularization and Random Fourier Features (RFFs) to deliver last-iterate statistical guarantees on Type I/II error rates for single-pass, non-stationary data streams, with extensions enabling adaptive tracking under adversarial non-stationarity.

1. Classical OCSVM Formulation and Limitations

Standard kernel-based OCSVM methods (Schölkopf et al. 1999, 2001) solve an RKHS maximum-margin problem. The primal soft-margin OCSVM is formulated as

$\min_{w\in\mathcal{H},\;\rho\in\mathbb{R},\;\xi\ge0}\;\frac12\|w\|_{\mathcal{H}}^2\;-\;\rho\;+\;\frac1{\lambda T}\sum_{t=1}^T\xi_t \quad\text{s.t.}\;\langle w,\varphi(X_t)\rangle\ge\rho-\xi_t,\;\xi_t\ge0,$

where $\lambda\in(0,1]$ bounds the permitted fraction of outliers (Type I error). Alternatively, this can be written as the unconstrained penalty problem

$\min_{w,\rho}\; \frac12\|w\|_{\mathcal{H}}^2 - \rho + \frac1{\lambda T}\sum_{t=1}^T (\rho - \langle w, \varphi(X_t)\rangle)_+,$

which relies on full access to the $T\times T$ Gram matrix $K(X_i, X_j)$. Such requirements are computationally prohibitive for streaming or one-pass settings. Furthermore, the objective lacks strong convexity, resulting in slow $O(T^{-1/2})$ SGD convergence and no reliable last-iterate guarantees on outlier (decision) error.
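For contrast with the streaming approach developed below, a minimal batch baseline can be run with scikit-learn's OneClassSVM, whose `nu` parameter plays the role of $\lambda$ here; the synthetic data and parameter values in this sketch are illustrative assumptions, not settings from the paper.

```python
# Batch kernel OCSVM baseline (illustrative): nu plays the role of lambda.
# Training scales poorly with T because the full Gram matrix is needed.
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)
X_train = rng.normal(size=(2000, 10))                      # nominal samples
X_test = np.vstack([rng.normal(size=(100, 10)),            # nominal
                    rng.normal(loc=4.0, size=(100, 10))])  # shifted outliers

ocsvm = OneClassSVM(kernel="rbf", nu=0.1, gamma="scale").fit(X_train)
pred = ocsvm.predict(X_test)                               # +1 inlier, -1 outlier
print("flagged fraction:", np.mean(pred == -1))
```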

2. Strongly Convex Reformulation with Random Fourier Features

SONAR addresses these limitations by approximating the RKHS kernel $K$ using RFFs, mapping $\varphi(x)\approx z(x)\in\mathbb{R}^d$. The infinite-dimensional norm is replaced with the Euclidean norm, and the objective is made strongly convex by augmenting with a quadratic term in $\rho$:

$F(w,\rho) = \frac12 \|w\|_2^2 + \frac12 \rho^2 - \lambda \rho + \mathbb{E}_{X\sim P} (\rho - w^\top X)_+.$

Here, $X \in \mathbb{R}^d$ is normalized so that it is supported on the unit sphere or is RFF-normalized, with $\lambda \in [0,1]$. The strong convexity (the objective is 1-strongly convex in $(w, \rho)$) underpins rapid last-iterate convergence and uniform high-probability error controls within streaming regimes (Suk et al., 11 Dec 2025).
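A minimal sketch of the RFF map and the empirical counterpart of $F(w,\rho)$ follows; the Gaussian-kernel bandwidth `sigma`, feature dimension `d`, and the explicit renormalization of $z(x)$ are illustrative assumptions, not values from the paper.

```python
# Sketch: Random Fourier Features for a Gaussian kernel, plus the empirical
# version of the strongly convex objective F(w, rho). Dimensions are illustrative.
import numpy as np

def make_rff_map(input_dim, d, sigma, seed=0):
    rng = np.random.default_rng(seed)
    W = rng.normal(scale=1.0 / sigma, size=(d, input_dim))  # kernel frequencies
    b = rng.uniform(0.0, 2.0 * np.pi, size=d)                # random phases

    def z(x):
        feats = np.sqrt(2.0 / d) * np.cos(W @ x + b)
        return feats / np.linalg.norm(feats)                  # RFF-normalized: ||z(x)|| = 1
    return z

def objective(w, rho, Z, lam):
    """Empirical F(w, rho): hinge term averaged over the rows of Z (mapped samples)."""
    hinge = np.maximum(rho - Z @ w, 0.0).mean()
    return 0.5 * w @ w + 0.5 * rho ** 2 - lam * rho + hinge

# Example usage (illustrative dimensions):
# z = make_rff_map(input_dim=10, d=256, sigma=1.0)
# Z = np.stack([z(x) for x in raw_samples])
```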

3. Single-Pass SGD Algorithm and Update Rules

For the streaming context, SONAR employs the following update rules for each new sample $X_t$:

  • Compute the instantaneous loss:

$f_t(w,\rho) = \frac12\|w\|^2 + \frac12\rho^2 - \lambda\rho + (\rho - w^\top X_t)_+$

  • Unbiased subgradient estimates:

$\nabla_w f_t = w - X_t \cdot \mathbf{1}\{\rho \ge w^\top X_t\}, \qquad \nabla_\rho f_t = (\rho - \lambda) + \mathbf{1}\{\rho \ge w^\top X_t\}$

  • Diminishing step size: $\eta_t = 1/t$
  • Closed-form one-pass SGD updates:

$\begin{aligned} Z_t &= \mathbf{1}\{\langle w_{t-1}, X_t \rangle \le \rho_{t-1}\} \\ w_t &= w_{t-1} - \eta_t\,[w_{t-1} - Z_t X_t] \\ \rho_t &= \rho_{t-1} - \eta_t\,[\rho_{t-1} - \lambda + Z_t] \end{aligned}$

By induction, $\|w_t\| \le 1$ and $|\rho_t| \le 1$ at every step, obviating the need for explicit projection. This facilitates streaming operation with $O(d)$ time per update, in contrast to the $O(T^3)$ training time and $O(T^2)$ memory cost of standard kernel OCSVM solvers.
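A compact sketch of these updates is shown below; it assumes the stream already yields RFF-normalized vectors $X_t \in \mathbb{R}^d$ with $\|X_t\| \le 1$ (e.g., produced by a map like the one sketched in Section 2).

```python
# One-pass SGD sketch of the SONAR updates; assumes each x satisfies ||x|| <= 1.
import numpy as np

def sonar_stream(stream, d, lam):
    """stream yields feature vectors x in R^d; lam is the target outlier fraction."""
    w, rho = np.zeros(d), 0.0
    for t, x in enumerate(stream, start=1):
        eta = 1.0 / t                              # diminishing step size eta_t = 1/t
        Z = 1.0 if np.dot(w, x) <= rho else 0.0    # indicator that the hinge is active
        w = w - eta * (w - Z * x)                  # w_t update
        rho = rho - eta * (rho - lam + Z)          # rho_t update
        # By induction ||w|| <= 1 and |rho| <= 1, so no projection step is needed.
    return w, rho

# A new point x is then flagged as an outlier when np.dot(w, x) < rho.
```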

4. Theoretical Performance and Lifelong Guarantees

SONAR provides explicit, high-probability finite-sample guarantees for both Type I (false positive) and Type II (false negative) errors:

  • Convergence of last iterate: With probability $1-\delta$, after $T$ samples, the last iterate $(w_T, \rho_T)$ satisfies

$\|(w_T, \rho_T) - (w_\lambda, \rho_\lambda)\|^2 = O\bigg(\frac{\log T \,\log(1/\delta)}{T}\bigg)$

  • Type I error (false positive) control: For the minimizer $(w_\lambda, \rho_\lambda)$,

$\mathrm{err}_1(w_\lambda, \rho_\lambda) < \lambda$

and for the SGD iterate, with sufficient sample size and an $O(\epsilon)$ shrinkage of the threshold, $\mathrm{err}_1(w_T, (1-\epsilon)\rho_T) < \lambda$ (see the sketch after this list).

  • Large-margin (Type II) guarantees: The learned margin $r_T = \rho_T/\|w_T\|$ satisfies

$r_T \ge r_\lambda - O\bigg(\frac{\sqrt{\log T \,\log(1/\delta)}}{\lambda\sqrt{T}}\bigg)$

where $r^*$, the support function margin, satisfies $r_\lambda \ge r^*$.

  • Lifelong (transfer) learning: If the data distribution switches at $t_0$, then for all $t \ge t_0$,

$\Pr_{X\sim P'}\{w_t^\top X < \rho_t\} \le \Pr\Big\{w_{t_0}^\top X < \rho_{t_0} + \frac{\lambda (t - t_0)}{t_0}\Big\} \vee O\big((t - t_0)^{-1/2}\big)$

plus a telescopic lower bound on the margin. This enables the algorithm to inherit the statistical performance from previous phases in the stream, rapidly adapting to benign distribution shifts (Suk et al., 11 Dec 2025).
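A small sketch of how these last-iterate guarantees can be checked empirically is given below; the held-out nominal sample, the shrinkage factor `eps`, and the function name are illustrative assumptions rather than procedures specified in the paper.

```python
# Sketch: empirical check of the Type I bound and the learned margin for the
# last iterate (w_T, rho_T); eps is an illustrative threshold shrinkage.
import numpy as np

def evaluate_iterate(w, rho, X_nominal, eps=0.05):
    """X_nominal: held-out nominal samples already mapped to the RFF space."""
    scores = X_nominal @ w
    type1 = np.mean(scores < (1.0 - eps) * rho)   # fraction of nominal points flagged
    margin = rho / np.linalg.norm(w)              # learned margin r_T = rho_T / ||w_T||
    return type1, margin
```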

5. Adaptation to Adversarial Non-Stationarity: SONARC

To handle fully adversarial, possibly abrupt non-stationarity, SONAR is embedded within a classical ensemble architecture, SONARC ("SONAR with Changepoint detection"; editor's term), as follows:

  • Maintains $\lfloor\log T\rfloor$ base learners, each resetting at dyadic epochs of length $2^m$
  • At each $t$, compares $(w_t, \rho_t)$ with each base learner's last reset point $(w_{t,m}, \rho_{t,m})$; if

$\|(w_t, \rho_t) - (w_{t,m}, \rho_{t,m})\|^2 \ge C\, \frac{\log T \,\log(1/\delta)}{2^m}$

for any $m$, a changepoint is detected and all learners are restarted on the remaining stream (see the sketch after this list)

  • Safety and adaptivity theorems guarantee that changepoint detection is triggered only when the true underlying minimizer shifts, and that within stationary phases of sufficient length, SONARC achieves Type I/II guarantees matching those of an oracle with phase knowledge.
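A minimal sketch of the changepoint test used by the ensemble is given below; the constant `C`, the snapshot bookkeeping, and the function name are simplified assumptions rather than the paper's exact implementation.

```python
# Sketch of SONARC's changepoint test: compare the current iterate against each
# base learner's snapshot from its last dyadic reset. C is an illustrative constant.
import numpy as np

def changepoint_detected(w, rho, snapshots, T, delta, C=1.0):
    """snapshots maps epoch index m -> (w_m, rho_m) recorded at that base's last reset."""
    theta = np.append(w, rho)
    for m, (w_m, rho_m) in snapshots.items():
        theta_m = np.append(w_m, rho_m)
        radius = C * np.log(T) * np.log(1.0 / delta) / 2 ** m
        if np.sum((theta - theta_m) ** 2) >= radius:
            return True   # drift exceeds the stationary confidence radius: restart all learners
    return False
```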

6. Empirical Validation and Computational Efficiency

Empirical evaluation on synthetic and real-world data (including the SKAB water-loop and Aposemat Malware-Capture datasets) demonstrates:

  • Computational complexity: After mapping to $d = O(D \log(D/\delta))$ RFFs, each update and evaluation is $O(d)$, contrasting with the cubic-time, memory-intensive complexity of standard OCSVM QP solvers.
  • Type I error tracking: Cumulative online Type I error closely matches the user-specified $\lambda$. For example, SONAR maintains $\approx 0.5\%$ Type I error on SKAB.
  • Type II error rates: SONAR achieves a final online Type II error of approximately $55\%$ on challenging cases, competitive with deep-learning baselines.
  • Robust adaptivity: SONARC matches oracle-level false positive and margin guarantees in synthetic multi-phase streams, outperforming multivariate changepoint detection techniques that can falsely trigger on non-critical distributional shifts.

These empirical findings confirm that SONAR and its ensemble extension provide streaming, computation-efficient, and adaptively reliable anomaly detection, establishing the first guarantees for last-iterate errors, margin growth, and lifelong transfer under both benign and adversarial regime changes (Suk et al., 11 Dec 2025).

References

  • Suk et al., 11 Dec 2025.
