SPLBoost: Robust Self-Paced Boosting
- SPLBoost is a robust boosting algorithm integrating self-paced learning into AdaBoost to automatically down-weight or discard noisy and outlying samples.
- The method alternates between closed-form latent weight updates and weak learner optimization using various SP-regularizers, ensuring effective sample selection.
- Empirical evaluations on synthetic and UCI datasets demonstrate that SPLBoost achieves lower test-error rates and enhanced robustness under noisy conditions.
SPLBoost is a robust boosting algorithm designed by integrating the self-paced learning (SPL) paradigm into the AdaBoost framework. The principal innovation lies in introducing a latent sample-weighting mechanism governed by a self-paced regularizer, which adaptively emphasizes easy samples and down-weights or discards potential outliers within each boosting round. This yields a saturated loss function, rendering SPLBoost highly insensitive to noise and extreme outlier contamination, and it is implemented via minimal modifications to standard boosting routines (Wang et al., 2017).
1. Formal Objective and SPL Regularization Schemes
Given binary-labeled data $\{(x_i, y_i)\}_{i=1}^n$ with $y_i \in \{-1, +1\}$, and a current strong classifier $F(x)$, SPLBoost iteratively seeks to add a new weak learner $f(x)$ with coefficient $\alpha$, while jointly optimizing latent sample weights $\mathbf{v} = (v_1, \dots, v_n)$ via a self-paced regularizer. The per-iteration objective is formulated as:

$$\min_{\alpha,\, f,\, \mathbf{v}} \; \sum_{i=1}^{n} v_i \exp\!\big(-y_i\,[F(x_i) + \alpha f(x_i)]\big) \;+\; \sum_{i=1}^{n} g(v_i; \lambda)$$

Here, $v_i \in [0, 1]$ encodes the participation level of sample $i$ in the current round, and $g(v; \lambda)$ is the SP-regularizer (also referred to as the "age" regularizer) parametrized by $\lambda$. Three prevalent regularization choices are provided, all yielding closed-form solutions $v^*(\ell; \lambda)$ in terms of each sample's loss $\ell_i = \exp(-y_i F(x_i))$:
| Regularizer Type | Closed-form weight $v^*(\ell; \lambda)$ | SP-regularizer $g(v; \lambda)$ |
|---|---|---|
| Hard weighting | $1$ if $\ell < \lambda$; $0$ otherwise | $-\lambda v$ |
| Linear soft weighting | $\max(1 - \ell/\lambda,\, 0)$ | $\lambda\left(\tfrac{1}{2} v^2 - v\right)$ |
| Polynomial soft ($t > 1$) | $(1 - \ell/\lambda)^{1/(t-1)}$ if $\ell < \lambda$; $0$ otherwise | $\lambda\left(\tfrac{1}{t} v^t - v\right)$ |
By integrating $g(v; \lambda)$ with the exponential loss, SPLBoost automates sample selection, systematically suppressing the influence of large-loss (likely noisy or outlying) samples.
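The closed-form weight maps above translate directly into a few lines of code. The following is a minimal illustrative sketch (not the authors' implementation) of $v^*(\ell; \lambda)$ for the three regularizers:

```python
import numpy as np

def v_hard(loss, lam):
    """Hard weighting: keep a sample fully (v = 1) iff its loss is below lambda."""
    return (loss < lam).astype(float)

def v_linear(loss, lam):
    """Linear soft weighting: weight decays linearly from 1 to 0 as the loss approaches lambda."""
    return np.clip(1.0 - loss / lam, 0.0, 1.0)

def v_polynomial(loss, lam, t=3):
    """Polynomial soft weighting (t > 1): v = (1 - loss/lambda)^(1/(t-1)) below lambda, else 0."""
    return np.clip(1.0 - loss / lam, 0.0, 1.0) ** (1.0 / (t - 1))

# Example: losses straddling the threshold lambda = 1.0
losses = np.array([0.1, 0.5, 0.9, 1.5, 5.0])
print(v_hard(losses, 1.0))        # [1. 1. 1. 0. 0.]
print(v_linear(losses, 1.0))      # [0.9 0.5 0.1 0.  0. ]
print(v_polynomial(losses, 1.0))  # approx. [0.95 0.71 0.32 0.   0.  ]
```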
2. Alternating Optimization Procedure
SPLBoost employs a block-coordinate descent within each boosting round, alternating between two update phases:
a) Majorization (Update $\mathbf{v}$):
With $(\alpha, f)$ fixed, update each $v_i$ as:

$$v_i^* = \arg\min_{v \in [0,1]} \; v\,\ell_i + g(v; \lambda), \qquad \ell_i = \exp\!\big(-y_i F(x_i)\big)$$

The solutions $v_i^*$ depend directly on the choice of $g$ (see the table above).
b) Minimization (Update $\alpha$, $f$):
With $\mathbf{v}$ fixed, minimize:

$$\min_{\alpha,\, f} \; \sum_{i=1}^{n} v_i \exp\!\big(-y_i\,[F(x_i) + \alpha f(x_i)]\big)$$

Analogous to AdaBoost, $f$ is fit by minimizing the weighted squared-error proxy (equivalent, for $\{-1,+1\}$-valued weak learners, to the weighted misclassification error) with effective weights $u_i = v_i w_i$, where $w_i \propto \exp(-y_i F(x_i))$ are the usual AdaBoost weights. The optimal coefficient is:

$$\alpha = \frac{1}{2}\ln\frac{1 - \mathrm{err}}{\mathrm{err}}, \qquad \mathrm{err} = \frac{\sum_{i:\, y_i \neq f(x_i)} u_i}{\sum_{i} u_i}$$

Sample weights are then updated for the next round:

$$w_i \leftarrow w_i \exp\!\big(-\alpha\, y_i f(x_i)\big)$$
Empirical observations suggest that often only one inner alternation suffices per round for effective optimization.
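For completeness, the coefficient formula is the standard AdaBoost calculation applied to the $u_i$-weighted exponential loss (a short derivation restated here, not quoted from the paper):

$$\sum_i u_i\, e^{-\alpha y_i f(x_i)} \;=\; e^{-\alpha}\!\!\sum_{i:\, y_i = f(x_i)}\!\! u_i \;+\; e^{\alpha}\!\!\sum_{i:\, y_i \neq f(x_i)}\!\! u_i$$

Setting the derivative with respect to $\alpha$ to zero yields $\alpha = \tfrac{1}{2}\ln\big[(1-\mathrm{err})/\mathrm{err}\big]$, with $\mathrm{err}$ the $u_i$-weighted misclassification rate.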
3. Algorithmic Description: Pseudocode
A concise pseudocode representation is as follows:
Input: {(x_i, y_i)}_{i=1}^n, number of rounds T, SP-parameter λ
Initialize: w_i ← 1/n, v_i ← 1 for all i
F(x) ← 0
for t = 1…T do
    1) Compute v_i ← v*(ℓ_i; λ), where ℓ_i = exp(−y_i·F(x_i))
    2) Fit weak learner f_t: train on {(x_i, y_i)} with weights u_i = v_i·w_i
    3) Compute weighted error err = [∑_{i: y_i ≠ f_t(x_i)} u_i] / [∑_i u_i]
       α_t = ½·ln[(1−err)/err]
    4) Update strong model: F(x) ← F(x) + α_t·f_t(x)
    5) Update AdaBoost weights: w_i ← w_i·exp(−α_t·y_i·f_t(x_i))
end for
Output: final classifier sign(F(x))
The mapping $\ell \mapsto v^*(\ell; \lambda)$ in step 1 is determined analytically, based on the chosen SP-regularizer (see the table in Section 1).
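The pseudocode maps almost one-to-one onto standard library components. Below is an illustrative re-implementation sketch (not the authors' code) using the hard-weighting regularizer and scikit-learn decision stumps as weak learners:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def splboost_fit(X, y, T=100, lam=2.0, max_depth=1):
    """Train SPLBoost with the hard SP-regularizer; labels y must be in {-1, +1}."""
    n = len(y)
    w = np.full(n, 1.0 / n)              # AdaBoost-style sample weights
    F = np.zeros(n)                      # strong-model scores on the training set
    learners, alphas = [], []
    for t in range(T):
        # 1) Closed-form latent weights: drop samples whose loss exceeds lambda.
        loss = np.exp(-y * F)
        v = (loss < lam).astype(float)
        u = v * w
        if u.sum() == 0:                 # every sample pruned; stop early
            break
        # 2) Fit the weak learner under the effective weights u_i = v_i * w_i.
        stump = DecisionTreeClassifier(max_depth=max_depth)
        stump.fit(X, y, sample_weight=u)
        pred = stump.predict(X)
        # 3) Weighted error and coefficient, exactly as in AdaBoost.
        err = np.clip(u[pred != y].sum() / u.sum(), 1e-12, 1 - 1e-12)
        alpha = 0.5 * np.log((1 - err) / err)
        # 4) Update the strong model and 5) the AdaBoost weights.
        F += alpha * pred
        w *= np.exp(-alpha * y * pred)
        learners.append(stump)
        alphas.append(alpha)
    return learners, alphas

def splboost_predict(X, learners, alphas):
    """Sign of the weighted vote over the trained weak learners."""
    scores = sum(a * clf.predict(X) for a, clf in zip(alphas, learners))
    return np.sign(scores)
```

Swapping in the linear or polynomial soft weighting from the earlier sketch changes only the line that computes `v`.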
4. Theoretical Analysis and Guarantees
SPLBoost's procedure is rigorously characterized as a majorization–minimization (MM) algorithm on a latent nonconvex objective of the form:

$$\min_{F} \; \sum_{i=1}^{n} F_\lambda(\ell_i), \qquad \ell_i = \exp\!\big(-y_i F(x_i)\big),$$

where

$$F_\lambda(\ell) = \int_0^{\ell} v^*(t; \lambda)\, dt$$

is a saturated, nonconvex loss function. Each inner step involves constructing and minimizing a tight surrogate of this objective, ensuring its monotonic decrease. The objective is bounded below; thus, the sequence converges to a stationary point.

A key theoretical property is robustness: once the loss $\ell_i$ of a sample exceeds the threshold $\lambda$, $F_\lambda(\ell_i)$ saturates and the sample's gradient vanishes, resulting in automatic exclusion (via $v_i = 0$) of outliers and heavy-noise points.
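As a concrete instance, for the hard-weighting regularizer the integral above evaluates to a truncated exponential loss (a short derivation added here for illustration):

$$F_\lambda(\ell) = \int_0^{\ell} \mathbf{1}\{t < \lambda\}\, dt = \min(\ell, \lambda),$$

i.e., the per-sample loss grows like the exponential loss up to the threshold and is flat, with zero gradient, beyond it.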
5. Empirical Evaluation and Benchmarking
Experimental results are reported across synthetic and real-world benchmarks:
a) Synthetic 2D Gaussian Toy:
- Constructed from two 2D Gaussians (100 samples each) with 15% random label flips.
- Weak learners: C4.5 classification tree (AdaBoost/SPLBoost/RobustBoost), CART regression tree (LogitBoost/RBoost).
- Alternatives compared: AdaBoost, LogitBoost, SavageBoost, RBoost, RobustBoost, SPLBoost (with $\lambda$ tuned).
- SPLBoost is observed to assign zero weight to persistently misclassified (outlier) points and achieves a decision boundary near the Bayes-optimal one. In contrast, AdaBoost/LogitBoost overweight noisy points, and other nonconvex boosters still allocate weight to some outliers. (A sketch of the toy-data setup follows below.)
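For reference, the toy setup can be reproduced along the following lines; the class means and covariances are illustrative assumptions, since only the sample counts and flip rate are specified above:

```python
import numpy as np

rng = np.random.default_rng(0)

# Two 2D Gaussian classes, 100 samples each (means/covariances assumed for illustration).
X_pos = rng.multivariate_normal(mean=[+1.5, 0.0], cov=np.eye(2), size=100)
X_neg = rng.multivariate_normal(mean=[-1.5, 0.0], cov=np.eye(2), size=100)
X = np.vstack([X_pos, X_neg])
y = np.concatenate([np.ones(100), -np.ones(100)])

# Inject 15% random label flips, as in the toy experiment.
flip = rng.choice(len(y), size=int(0.15 * len(y)), replace=False)
y[flip] *= -1

# The splboost_fit sketch from Section 3 can then be trained on (X, y).
```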
b) Seventeen UCI Datasets:
- Dataset dimensionality varies from 4 to 72 features; sample sizes range from 200 to 130,000 (e.g., adult, spambase, magic, miniboone).
- Label noise injected at 0%, 5%, 10%, 20%, and 30%.
- Standard splits: 70/30 train/test, with 5-fold cross-validation for $\lambda$ and the number of boosting rounds (maximum 200), repeated 50 times. (A protocol sketch follows this list.)
- SPLBoost exhibits uniformly lower test-error rates at all noise levels. Rank statistics over the 85 dataset/noise combinations position SPLBoost in the top tier for roughly 80% of cases, clearly outperforming convex and nonconvex boosting baselines.
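The noise-injection and splitting protocol is straightforward to mirror in code. The sketch below is illustrative only; in particular, it assumes noise is injected into the training labels, which the summary above does not state explicitly:

```python
import numpy as np
from sklearn.model_selection import train_test_split

def inject_label_noise(y, rate, rng):
    """Flip a random `rate` fraction of binary {-1, +1} labels."""
    y_noisy = y.copy()
    flip = rng.choice(len(y), size=int(rate * len(y)), replace=False)
    y_noisy[flip] *= -1
    return y_noisy

rng = np.random.default_rng(0)
# Stand-in data; in the benchmark this would be one of the UCI datasets with labels in {-1, +1}.
X = rng.normal(size=(1000, 10))
y = np.sign(X[:, 0] + 0.5 * rng.normal(size=1000))

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
for rate in (0.0, 0.05, 0.10, 0.20, 0.30):
    y_tr_noisy = inject_label_noise(y_tr, rate, rng)
    # ... cross-validate lambda, train SPLBoost on (X_tr, y_tr_noisy), evaluate on (X_te, y_te) ...
```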
c) Regularizer Ablation:
- Four SPLBoost variants (hard, linear soft, and polynomial soft with two settings of the exponent $t$) yield similar robustness, indicating insensitivity to the choice of SP-regularizer, provided it enforces drop-out of large-loss samples.
6. Mechanisms Underlying SPLBoost Robustness
Instead of designing new robust loss functions, SPLBoost leverages SPL sample selection to truncate the exponential loss beyond a fixed threshold $\lambda$, yielding a saturated loss. The gradient vanishes for samples with large negative margins, so outliers have no influence on weak-learner fitting.
Alternating closed-form updates with standard AdaBoost steps circumvents nonconvex optimization difficulties (e.g., no need for the Newton updates or differential-equation solves required by RobustBoost). The "age" parameter $\lambda$ modulates the algorithm's stringency: a smaller $\lambda$ excludes more samples, while SPLBoost recovers AdaBoost as $\lambda \to \infty$. In practice, $\lambda$ is chosen by cross-validation, with a brief warm-start on $\lambda$ applied to avoid excessive pruning in the initial iterations.
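A small numerical illustration (not from the paper) of the two limiting behaviors: the truncated loss is flat beyond $\lambda$, and for a very large $\lambda$ every sample keeps weight 1, so the update reduces to plain AdaBoost.

```python
import numpy as np

margins = np.array([2.0, 0.5, -0.5, -3.0])   # y_i * F(x_i); very negative = badly misclassified
loss = np.exp(-margins)                      # exponential losses: approx. [0.14, 0.61, 1.65, 20.09]

lam_small, lam_large = 2.0, 1e6
v_small = (loss < lam_small).astype(float)   # strict age: the heavy outlier is dropped -> [1. 1. 1. 0.]
v_large = (loss < lam_large).astype(float)   # lambda -> infinity: all ones, i.e. plain AdaBoost
truncated = np.minimum(loss, lam_small)      # saturated loss min(loss, lambda): capped at 2.0

print(v_small, v_large, truncated)
```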
SPLBoost thus combines automatic sample pruning with additive-model expertise from boosting to provide a scalable and robust classification approach (Wang et al., 2017).