
SPLBoost: Robust Self-Paced Boosting

Updated 29 November 2025
  • SPLBoost is a robust boosting algorithm integrating self-paced learning into AdaBoost to automatically down-weight or discard noisy and outlying samples.
  • The method alternates between closed-form latent weight updates and weak learner optimization using various SP-regularizers, ensuring effective sample selection.
  • Empirical evaluations on synthetic and UCI datasets demonstrate that SPLBoost achieves lower test-error rates and enhanced robustness under noisy conditions.

SPLBoost is a robust boosting algorithm designed by integrating the self-paced learning (SPL) paradigm into the AdaBoost framework. The principal innovation lies in introducing a latent sample-weighting mechanism governed by a self-paced regularizer, which adaptively emphasizes easy samples and down-weights or discards potential outliers within each boosting round. This yields a saturated loss function, rendering SPLBoost highly insensitive to noise and extreme outlier contamination, and it is implemented via minimal modifications to standard boosting routines (Wang et al., 2017).

1. Formal Objective and SPL Regularization Schemes

Given binary-labeled data $\{(x_i, y_i)\}_{i=1}^n$, $y_i \in \{\pm 1\}$, with a current strong classifier $F(x)$, SPLBoost iteratively seeks to add a new weak learner $f(x)\in\{\pm 1\}$ with coefficient $\alpha$, while jointly optimizing latent sample weights $\mathbf{v} \in [0, 1]^n$ via a self-paced regularizer. The per-iteration objective is formulated as:

$$\min_{\alpha,\; f,\; \mathbf v \in [0,1]^n} \sum_{i=1}^n \left[ v_i \exp\!\left(-y_i\left(F(x_i)+\alpha f(x_i)\right)\right) + \hat{f}(v_i;\lambda)\right]$$

Here, $v_i$ encodes the participation level of sample $i$ in the current round, and $\hat{f}(v;\lambda)$ is the SP-regularizer (also referred to as the "age" regularizer) parametrized by $\lambda > 0$. Three prevalent regularization choices are provided, all yielding closed-form solutions $v^*(\ell;\lambda)$ for each sample's loss $\ell$:

| Regularizer type | Expression $\hat f(v;\lambda)$ | $v^*(\ell;\lambda)$ |
| --- | --- | --- |
| Hard weighting | $-\lambda v$ | $1$ if $\ell < \lambda$; $0$ otherwise |
| Linear soft weighting | $\lambda\left(\tfrac{1}{2}v^2 - v\right)$ | $\max\{0,\ 1 - \ell/\lambda\}$ |
| Polynomial soft ($t>1$) | $\lambda\left(\tfrac{1}{t}v^t - v\right)$ | $(1-\ell/\lambda)^{1/(t-1)}$ if $\ell<\lambda$; $0$ otherwise |

By integrating $\hat f$ with the exponential loss, SPLBoost automates sample selection, systematically suppressing the influence of large-loss (likely noisy or outlying) samples.
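The closed-form weightings above translate directly into code. The following is a minimal NumPy sketch of the three $v^*(\ell;\lambda)$ mappings (function names are illustrative, not from the paper); the same functions realize the $\mathbf v$-update of the alternating procedure described in Section 2.

```python
import numpy as np

def v_hard(loss, lam):
    """Hard weighting: v* = 1 if loss < lambda, else 0."""
    return (loss < lam).astype(float)

def v_linear(loss, lam):
    """Linear soft weighting: v* = max(0, 1 - loss/lambda)."""
    return np.maximum(0.0, 1.0 - loss / lam)

def v_polynomial(loss, lam, t=2.0):
    """Polynomial soft weighting (t > 1):
    v* = (1 - loss/lambda)^(1/(t-1)) if loss < lambda, else 0."""
    v = np.zeros_like(loss, dtype=float)
    keep = loss < lam
    v[keep] = (1.0 - loss[keep] / lam) ** (1.0 / (t - 1.0))
    return v

# Example: exponential losses of five samples; large-loss samples receive v* = 0.
losses = np.array([0.2, 0.8, 1.5, 3.0, 10.0])
print(v_hard(losses, lam=2.0))         # 1, 1, 1, 0, 0
print(v_linear(losses, lam=2.0))       # 0.9, 0.6, 0.25, 0, 0
print(v_polynomial(losses, lam=2.0))   # identical to the linear case when t = 2
```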

2. Alternating Optimization Procedure

SPLBoost employs a block-coordinate descent within each boosting round, alternating between two update phases:

a) Majorization (Update $\mathbf v$):

With $(\alpha, f)$ fixed, update $v_i$ as:

$$v_i^* = \arg\min_{v \in [0,1]} \; v\,\ell_i(\alpha, f) + \hat{f}(v; \lambda), \qquad \ell_i(\alpha, f) = \exp\!\left(-y_i \left(F(x_i) + \alpha f(x_i)\right)\right)$$

Solutions for $v_i^*$ depend directly on the choice of $\hat f$.

b) Minimization (Update $(f, \alpha)$):

With $\mathbf v$ fixed, minimize:

$$\min_{\alpha, f} \sum_{i=1}^n v_i \exp\!\left(-y_i\left(F(x_i)+\alpha f(x_i)\right)\right)$$

Analogous to AdaBoost, $f_t$ is fit by minimizing the weighted squared-error proxy $\sum_i v_i w_i (y_i - f(x_i))^2$ with $w_i = \exp(-y_i F(x_i))$. The optimal coefficient is:

$$\alpha_t = \frac{1}{2} \ln \frac{\sum_{i:\, y_i = f_t(x_i)} v_i w_i}{\sum_{i:\, y_i \neq f_t(x_i)} v_i w_i}$$

Sample weights are then updated for the next round:

$$w_i \leftarrow w_i \exp\!\left(-\alpha_t y_i f_t(x_i)\right), \qquad F(x) \leftarrow F(x) + \alpha_t f_t(x)$$

Empirical observations suggest that a single inner alternation per round often suffices for effective optimization.
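For completeness, the coefficient formula can be recovered in one line from the weighted exponential objective with $\mathbf v$ and $f_t$ fixed. Writing $A = \sum_{i:\, y_i = f_t(x_i)} v_i w_i$ and $B = \sum_{i:\, y_i \neq f_t(x_i)} v_i w_i$, and using $y_i f_t(x_i) \in \{\pm 1\}$:

$$\sum_{i=1}^n v_i w_i\, e^{-\alpha y_i f_t(x_i)} = A e^{-\alpha} + B e^{\alpha}, \qquad \frac{d}{d\alpha}\left(A e^{-\alpha} + B e^{\alpha}\right) = 0 \;\Longrightarrow\; e^{2\alpha} = \frac{A}{B} \;\Longrightarrow\; \alpha_t = \frac{1}{2}\ln\frac{A}{B},$$

which is exactly the expression for $\alpha_t$ given above.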

3. Algorithmic Description: Pseudocode

A concise pseudocode representation is as follows:

Input:  {(x_i, y_i)}_{i=1}^n, number of rounds T, SP-parameter λ
Initialize:  w_i ← 1/n,  v_i ← 1 for all i
F(x) ← 0

for t = 1 … T do
    1) Compute v_i ← v*(ℓ_i; λ),  where ℓ_i = exp(-y_i·F(x_i))
    2) Fit weak learner f_t: train on {(x_i, y_i)} with weights u_i = v_i·w_i,
       normalized so that Σ_i u_i = 1
    3) Compute error err = Σ_{i: y_i ≠ f_t(x_i)} u_i
       α_t = ½·ln[(1 − err)/err]
    4) Update strong model: F(x) ← F(x) + α_t·f_t(x)
    5) Update AdaBoost weights: w_i ← w_i·exp(−α_t·y_i·f_t(x_i))
end for

Output: final classifier sign(F(x))

The mapping $v^*(\ell; \lambda)$ is determined analytically, based on the chosen SP-regularizer.
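As an illustration, the pseudocode can be turned into the following self-contained Python sketch, here using depth-one scikit-learn decision trees as weak learners and the hard SP-regularizer with a fixed $\lambda$; the function names, the early-stopping check, and the error clipping are implementation choices of this sketch, not prescribed by the paper.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def splboost_train(X, y, T=100, lam=3.0, max_depth=1):
    """SPLBoost with the hard SP-regularizer. X: (n, d) array; y: (n,) labels in {-1, +1}."""
    n = X.shape[0]
    F = np.zeros(n)                      # strong-classifier scores on the training set
    w = np.full(n, 1.0 / n)              # AdaBoost-style sample weights
    learners, alphas = [], []

    for t in range(T):
        # 1) Closed-form v-update (hard regularizer): keep samples with loss < lambda.
        loss = np.exp(-y * F)
        v = (loss < lam).astype(float)
        u = v * w
        if u.sum() <= 0:                 # every sample was pruned; stop early
            break
        u = u / u.sum()                  # normalize so err below is a weighted fraction

        # 2) Fit the weak learner on the SPL-weighted sample.
        stump = DecisionTreeClassifier(max_depth=max_depth)
        stump.fit(X, y, sample_weight=u)
        pred = stump.predict(X)

        # 3) Weighted error and coefficient alpha_t = 0.5 * ln((1 - err) / err).
        err = np.clip(u[pred != y].sum(), 1e-10, 1 - 1e-10)
        alpha = 0.5 * np.log((1 - err) / err)

        # 4)-5) Update the strong model and the AdaBoost weights.
        F += alpha * pred
        w *= np.exp(-alpha * y * pred)

        learners.append(stump)
        alphas.append(alpha)

    return learners, alphas

def splboost_predict(X, learners, alphas):
    """Final classifier: sign of the weighted vote of the weak learners."""
    scores = sum(a * clf.predict(X) for a, clf in zip(alphas, learners))
    return np.sign(scores)
```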

4. Theoretical Analysis and Guarantees

SPLBoost's procedure is rigorously characterized as a majorization–minimization (MM) algorithm on a latent nonconvex objective of the form:

$$\sum_{i=1}^n \widetilde{F}_\lambda\!\left(\exp(-y_i F(x_i))\right)$$

where

$$\widetilde{F}_\lambda(\ell) = \int_0^\ell v^*(l; \lambda)\, dl$$

is a saturated, nonconvex loss function. Each inner step involves constructing and minimizing a tight surrogate $Q(\alpha, f \mid \alpha^*, f^*)$, ensuring monotonic decrease of the objective. The objective is bounded below; thus, the sequence $\{F_t\}$ converges to a local stationary point.

A key theoretical property is robustness: once the loss $\ell$ for a sample exceeds the threshold $\lambda$, $\widetilde{F}_\lambda(\ell)$ saturates and the sample's gradient vanishes, resulting in automatic exclusion (via $v_i = 0$) of outliers and heavy-noise points.
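For example, with the hard regularizer $v^*(l;\lambda) = \mathbb{1}[l < \lambda]$, the integral evaluates to

$$\widetilde{F}_\lambda(\ell) = \int_0^{\ell} \mathbb{1}[l < \lambda]\, dl = \min(\ell, \lambda),$$

so the effective loss tracks the exponential loss $e^{-yF(x)}$ while it remains below $\lambda$ and becomes constant (zero gradient) once the margin $yF(x)$ falls below $-\ln\lambda$.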

5. Empirical Evaluation and Benchmarking

Experimental results are reported across synthetic and real-world benchmarks:

a) Synthetic 2D Gaussian Toy:

  • Constructed from two 2D Gaussians (100 samples each) with 15% random label flips.
  • Weak learners: C4.5 classification tree (AdaBoost/SPLBoost/RobustBoost), CART regression tree (LogitBoost/RBoost).
  • Alternatives compared: AdaBoost, LogitBoost, SavageBoost, RBoost, RobustBoost, and SPLBoost (with $\lambda$ tuned).
  • SPLBoost is observed to assign zero weight to persistently misclassified (outlier) points and achieves a near-Bayes-optimal decision boundary. In contrast, AdaBoost and LogitBoost overweight noisy points, and the other nonconvex boosters still allocate weight to some outliers.

b) Seventeen UCI Datasets:

  • Feature dimensionality varies from 4 to 72, and sample sizes range from 200 to 130,000 (e.g., adult, spambase, magic, miniboone).
  • Label noise injected at 0%, 5%, 10%, 20%, and 30%.
  • Standard splits: 70/30 train/test, 5-fold cross-validation for $\lambda$ and the number of boosting rounds (maximum 200), 50 repetitions.
  • SPLBoost exhibits uniformly lower test-error rates at all noise levels. Rank statistics over the 85 dataset/noise combinations position SPLBoost in the top tier for roughly 80% of cases, clearly outperforming convex and nonconvex boosting baselines.

c) Regularizer Ablation:

  • Four SPLBoost variants (hard, linear soft, and polynomial soft with $t=1.3$ and $t=4$) yield similar robustness, indicating insensitivity to the choice of $\hat f$, provided it enforces drop-out for large-loss samples.

6. Mechanisms Underlying SPLBoost Robustness

Rather than designing a new robust loss function, SPLBoost leverages SPL sample selection to truncate the exponential loss beyond a fixed threshold $\lambda$, yielding a saturated loss. The gradient vanishes for samples with large negative margins, so outliers have no influence on weak learner fitting.

Alternating closed-form $v_i$ updates with standard AdaBoost steps circumvents nonconvex optimization difficulties (e.g., no need for Newton updates or the differential-equation solves required in RobustBoost). The "age" parameter $\lambda$ modulates the algorithm's stringency: smaller $\lambda$ excludes more outliers, larger $\lambda$ admits more samples, and AdaBoost is recovered in the limit $\lambda \rightarrow \infty$. In practice, cross-validation typically chooses $\lambda$ within $[1, 6]$, and a brief warm-start is applied to $\lambda$ to avoid excessive pruning in the initial iterations.
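One simple way to realize such a warm-start, shown here only as an illustrative sketch (the paper's exact schedule is not reproduced), is to run the first few rounds with an effectively infinite $\lambda$, i.e., plain AdaBoost, before switching to the cross-validated value:

```python
def lambda_schedule(t, lam_target, warmup_rounds=5):
    """Illustrative warm-start: prune nothing (lambda = infinity, i.e. plain
    AdaBoost behaviour) for the first few rounds, then use the tuned lambda."""
    return float("inf") if t < warmup_rounds else lam_target
```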

SPLBoost thus combines automatic sample pruning with additive-model expertise from boosting to provide a scalable and robust classification approach (Wang et al., 2017).
