SGD-Based OCSVM Solver

Updated 15 December 2025
  • The paper introduces SONAR, an SGD-based OCSVM solver that leverages strongly convex regularization and Random Fourier Features to achieve efficient streaming anomaly detection.
  • The method reduces computational complexity while providing last-iterate guarantees on Type I/II error rates via a single-pass SGD algorithm.
  • Empirical evaluations demonstrate robust adaptivity and performance on synthetic and real-world data under both benign and adversarial non-stationary conditions.

Stochastic Gradient Descent (SGD)-based One-Class Support Vector Machine (OCSVM) solvers provide an efficient approach to streaming outlier detection, overcoming key limitations of traditional kernel methods in both computational tractability and statistical guarantees. SONAR, introduced by Suk et al. (11 Dec 2025), leverages strongly convex regularization and Random Fourier Features (RFFs) to deliver last-iterate statistical guarantees on Type I/II error rates for single-pass, non-stationary data streams, with extensions enabling adaptive tracking under adversarial non-stationarity.

1. Classical OCSVM Formulation and Limitations

Standard kernel-based OCSVM methods (Schölkopf et al. 1999, 2001) solve an RKHS maximum-margin problem. The primal soft-margin OCSVM is formulated as

$\min_{w\in\mathcal{H},\;\rho\in\mathbb{R},\;\xi\ge0}\;\frac12\|w\|_{\mathcal{H}}^2\;-\;\rho\;+\;\frac1{\lambda T}\sum_{t=1}^T\xi_t \quad\text{s.t.}\;\langle w,\varphi(X_t)\rangle\ge\rho-\xi_t,\;\xi_t\ge0,$

where $\lambda\in(0,1]$ bounds the permitted fraction of outliers (Type I error). Alternatively, this can be written as the unconstrained penalty problem

$\min_{w,\rho}\; \frac12\|w\|_{\mathcal{H}}^2 - \rho + \frac1{\lambda T}\sum_{t=1}^T (\rho - \langle w, \varphi(X_t)\rangle)_+,$

which relies on full access to the $T\times T$ Gram matrix $K(X_i, X_j)$. Such requirements are computationally prohibitive for streaming or one-pass settings. Furthermore, the objective lacks strong convexity, resulting in slow $O(T^{-1/2})$ SGD convergence and no reliable last-iterate guarantees on outlier (decision) error.
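For contrast with the streaming approach developed below, a minimal batch baseline can be run with scikit-learn's OneClassSVM, whose `nu` parameter plays the role of $\lambda$ here; the synthetic data and parameter values in this sketch are illustrative assumptions, not settings from the paper.

```python
# Batch kernel OCSVM baseline (illustrative): nu plays the role of lambda.
# Training scales poorly with T because the full Gram matrix is needed.
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)
X_train = rng.normal(size=(2000, 10))                      # nominal samples
X_test = np.vstack([rng.normal(size=(100, 10)),            # nominal
                    rng.normal(loc=4.0, size=(100, 10))])  # shifted outliers

ocsvm = OneClassSVM(kernel="rbf", nu=0.1, gamma="scale").fit(X_train)
pred = ocsvm.predict(X_test)                               # +1 inlier, -1 outlier
print("flagged fraction:", np.mean(pred == -1))
```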

2. Strongly Convex Reformulation with Random Fourier Features

SONAR addresses these limitations by approximating the RKHS kernel $K$ using RFFs, mapping $\varphi(x)\approx z(x)\in\mathbb{R}^d$. The infinite-dimensional norm is replaced with the Euclidean norm, and the objective is made strongly convex by augmenting with a quadratic term in $\rho$:

$F(w,\rho) = \frac12 \|w\|_2^2 + \frac12 \rho^2 - \lambda \rho + \mathbb{E}_{X\sim P} (\rho - w^\top X)_+.$

Here, $X \in \mathbb{R}^d$ is normalized so that it is supported on the unit sphere or is RFF-normalized, with $\lambda \in [0,1]$. The strong convexity (the objective is 1-strongly convex in $(w, \rho)$) underpins rapid last-iterate convergence and uniform high-probability error controls within streaming regimes (Suk et al., 11 Dec 2025).
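A minimal sketch of the RFF map and the empirical counterpart of $F(w,\rho)$ follows; the Gaussian-kernel bandwidth `sigma`, feature dimension `d`, and the explicit renormalization of $z(x)$ are illustrative assumptions, not values from the paper.

```python
# Sketch: Random Fourier Features for a Gaussian kernel, plus the empirical
# version of the strongly convex objective F(w, rho). Dimensions are illustrative.
import numpy as np

def make_rff_map(input_dim, d, sigma, seed=0):
    rng = np.random.default_rng(seed)
    W = rng.normal(scale=1.0 / sigma, size=(d, input_dim))  # kernel frequencies
    b = rng.uniform(0.0, 2.0 * np.pi, size=d)                # random phases

    def z(x):
        feats = np.sqrt(2.0 / d) * np.cos(W @ x + b)
        return feats / np.linalg.norm(feats)                  # RFF-normalized: ||z(x)|| = 1
    return z

def objective(w, rho, Z, lam):
    """Empirical F(w, rho): hinge term averaged over the rows of Z (mapped samples)."""
    hinge = np.maximum(rho - Z @ w, 0.0).mean()
    return 0.5 * w @ w + 0.5 * rho ** 2 - lam * rho + hinge

# Example usage (illustrative dimensions):
# z = make_rff_map(input_dim=10, d=256, sigma=1.0)
# Z = np.stack([z(x) for x in raw_samples])
```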

3. Single-Pass SGD Algorithm and Update Rules

For the streaming context, SONAR employs the following update rules for each new sample $X_t$:

  • Compute the instantaneous loss:

$f_t(w,\rho) = \frac12\|w\|^2 + \frac12\rho^2 - \lambda\rho + (\rho - w^\top X_t)_+$

  • Unbiased subgradient estimates:

$\nabla_w f_t = w - X_t \cdot \mathbf{1}\{\rho \ge w^\top X_t\}, \qquad \nabla_\rho f_t = (\rho - \lambda) + \mathbf{1}\{\rho \ge w^\top X_t\}$

  • Diminishing step size: $\eta_t = 1/t$
  • Closed-form one-pass SGD updates:

$\begin{aligned} Z_t &= \mathbf{1}\{\langle w_{t-1}, X_t \rangle \le \rho_{t-1}\} \\ w_t &= w_{t-1} - \eta_t\,[w_{t-1} - Z_t X_t] \\ \rho_t &= \rho_{t-1} - \eta_t\,[\rho_{t-1} - \lambda + Z_t] \end{aligned}$

By induction, $\|w_t\| \le 1$ and $|\rho_t| \le 1$ at every step, obviating the need for explicit projection. This facilitates streaming operation with $O(d)$ time per update, in contrast to the $O(T^3)$ training time and $O(T^2)$ memory cost of standard kernel OCSVM solvers.
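A compact sketch of these updates is shown below; it assumes the stream already yields RFF-normalized vectors $X_t \in \mathbb{R}^d$ with $\|X_t\| \le 1$ (e.g., produced by a map like the one sketched in Section 2).

```python
# One-pass SGD sketch of the SONAR updates; assumes each x satisfies ||x|| <= 1.
import numpy as np

def sonar_stream(stream, d, lam):
    """stream yields feature vectors x in R^d; lam is the target outlier fraction."""
    w, rho = np.zeros(d), 0.0
    for t, x in enumerate(stream, start=1):
        eta = 1.0 / t                              # diminishing step size eta_t = 1/t
        Z = 1.0 if np.dot(w, x) <= rho else 0.0    # indicator that the hinge is active
        w = w - eta * (w - Z * x)                  # w_t update
        rho = rho - eta * (rho - lam + Z)          # rho_t update
        # By induction ||w|| <= 1 and |rho| <= 1, so no projection step is needed.
    return w, rho

# A new point x is then flagged as an outlier when np.dot(w, x) < rho.
```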

4. Theoretical Performance and Lifelong Guarantees

SONAR provides explicit, high-probability finite-sample guarantees for both Type I (false positive) and Type II (false negative) errors:

  • Convergence of last iterate: With probability $1-\delta$, after $T$ samples, the last iterate $(w_T, \rho_T)$ satisfies

$\|(w_T, \rho_T) - (w_\lambda, \rho_\lambda)\|^2 = O\bigg(\frac{\log T \,\log(1/\delta)}{T}\bigg)$

  • Type I error (false positive) control: For the minimizer $(w_\lambda, \rho_\lambda)$,

$\mathrm{err}_1(w_\lambda, \rho_\lambda) < \lambda$

and for the SGD iterate, with sufficient sample size and an $O(\epsilon)$ shrinkage of the threshold, $\mathrm{err}_1(w_T, (1-\epsilon)\rho_T) < \lambda$ (see the sketch after this list).

  • Large-margin (Type II) guarantees: The learned margin $r_T = \rho_T/\|w_T\|$ satisfies

$r_T \ge r_\lambda - O\bigg(\frac{\sqrt{\log T \,\log(1/\delta)}}{\lambda\sqrt{T}}\bigg)$

where $r^*$, the support function margin, satisfies $r_\lambda \ge r^*$.

  • Lifelong (transfer) learning: If the data distribution switches at $t_0$, then for all $t \ge t_0$,

$\Pr_{X\sim P'}\{w_t^\top X < \rho_t\} \le \Pr\Big\{w_{t_0}^\top X < \rho_{t_0} + \frac{\lambda (t - t_0)}{t_0}\Big\} \vee O\big((t - t_0)^{-1/2}\big)$

plus a telescopic lower bound on the margin. This enables the algorithm to inherit the statistical performance from previous phases in the stream, rapidly adapting to benign distribution shifts (Suk et al., 11 Dec 2025).
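A small sketch of how these last-iterate guarantees can be checked empirically is given below; the held-out nominal sample, the shrinkage factor `eps`, and the function name are illustrative assumptions rather than procedures specified in the paper.

```python
# Sketch: empirical check of the Type I bound and the learned margin for the
# last iterate (w_T, rho_T); eps is an illustrative threshold shrinkage.
import numpy as np

def evaluate_iterate(w, rho, X_nominal, eps=0.05):
    """X_nominal: held-out nominal samples already mapped to the RFF space."""
    scores = X_nominal @ w
    type1 = np.mean(scores < (1.0 - eps) * rho)   # fraction of nominal points flagged
    margin = rho / np.linalg.norm(w)              # learned margin r_T = rho_T / ||w_T||
    return type1, margin
```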

5. Adaptation to Adversarial Non-Stationarity: SONARC

To handle fully adversarial, possibly abrupt non-stationarity, SONAR is embedded within a classical ensemble architecture, SONARC ("SONAR with Changepoint detection"; editor's term), as follows:

  • Maintains $\lfloor\log T\rfloor$ base learners, each resetting at dyadic epochs of length $2^m$
  • At each $t$, compares $(w_t, \rho_t)$ with each base learner's last reset point $(w_{t,m}, \rho_{t,m})$; if

$\|(w_t, \rho_t) - (w_{t,m}, \rho_{t,m})\|^2 \ge C\, \frac{\log T \,\log(1/\delta)}{2^m}$

for any $m$, a changepoint is detected and all learners are restarted on the remaining stream (see the sketch after this list)

  • Safety and adaptivity theorems guarantee that changepoint detection is triggered only when the true underlying minimizer shifts, and that within stationary phases of sufficient length, SONARC achieves Type I/II guarantees matching those of an oracle with phase knowledge.
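A minimal sketch of the changepoint test used by the ensemble is given below; the constant `C`, the snapshot bookkeeping, and the function name are simplified assumptions rather than the paper's exact implementation.

```python
# Sketch of SONARC's changepoint test: compare the current iterate against each
# base learner's snapshot from its last dyadic reset. C is an illustrative constant.
import numpy as np

def changepoint_detected(w, rho, snapshots, T, delta, C=1.0):
    """snapshots maps epoch index m -> (w_m, rho_m) recorded at that base's last reset."""
    theta = np.append(w, rho)
    for m, (w_m, rho_m) in snapshots.items():
        theta_m = np.append(w_m, rho_m)
        radius = C * np.log(T) * np.log(1.0 / delta) / 2 ** m
        if np.sum((theta - theta_m) ** 2) >= radius:
            return True   # drift exceeds the stationary confidence radius: restart all learners
    return False
```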

6. Empirical Validation and Computational Efficiency

Empirical evaluation on synthetic and real-world data (including the SKAB water-loop and Aposemat Malware-Capture datasets) demonstrates:

  • Computational complexity: After mapping to $d = O(D \log(D/\delta))$ RFFs, each update and evaluation is $O(d)$, contrasting with the cubic-time, memory-intensive complexity of standard OCSVM QP solvers.
  • Type I error tracking: Cumulative online Type I error closely matches the user-specified $\lambda$. For example, SONAR maintains $\approx 0.5\%$ Type I error on SKAB.
  • Type II error rates: SONAR achieves a final online Type II error of approximately $55\%$ on challenging cases, competitive with deep-learning baselines.
  • Robust adaptivity: SONARC matches oracle-level false positive and margin guarantees in synthetic multi-phase streams, outperforming multivariate changepoint detection techniques that can falsely trigger on non-critical distributional shifts.

These empirical findings confirm that SONAR and its ensemble extension provide streaming, computation-efficient, and adaptively reliable anomaly detection, establishing the first guarantees for last-iterate errors, margin growth, and lifelong transfer under both benign and adversarial regime changes (Suk et al., 11 Dec 2025).

References

  • Suk et al., 11 Dec 2025.
