Granular Ball SVR
- Granular Ball SVR is a novel regression algorithm that uses compact granular balls to replace individual data points, significantly reducing computational costs.
- It employs a two-stage approach where high-quality granular balls are generated through iterative K-means splits before modifying the standard SVR formulation.
- Empirical results show that GBSVR delivers faster training times and improved accuracy metrics (R², MAE, RMSE) compared to traditional SVR, especially in noisy and large-scale datasets.
Granular Ball Support Vector Regression (GBSVR) is a regression algorithm developed to address both the computational inefficiency and sensitivity to outliers inherent in traditional Support Vector Regression (SVR) frameworks. GBSVR introduces the concept of "granular regression balls"—compact, representative subsets derived from the data—which serve as the basic units for model training, replacing individual data points. This methodology produces substantial reductions in computational complexity and enhances robustness to noise and outliers, particularly for large-scale or heteroscedastic datasets (Rastogi et al., 13 Mar 2025).
1. Granular Regression Ball Framework
A granular regression ball, denoted $GB_j$, aggregates data points based on proximity in the feature space. It is defined by its center $c_j$ and radius $r_j$, containing all points $x_i$ such that $d(x_i, c_j) \le r_j$, with $d$ the Euclidean distance. Given $m_j$ points within a ball, the center and (maximum-distance) radius are

$$c_j = \frac{1}{m_j} \sum_{x_i \in GB_j} x_i, \qquad r_j = \max_{x_i \in GB_j} \lVert x_i - c_j \rVert.$$
For greater robustness to outliers, the radius can alternatively be taken as the mean distance:

$$r_j = \frac{1}{m_j} \sum_{x_i \in GB_j} \lVert x_i - c_j \rVert.$$
Each ball is constructed to be “pure” with respect to discretized target labels, produced via quantile-based binning of the $y$-values. The quality of a granular regression ball (GRB) is quantified as the majority-label fraction:

$$q_j = \frac{\max_k \,\lvert \{ x_i \in GB_j : \ell(x_i) = k \} \rvert}{m_j},$$

where $\ell(x_i)$ denotes the quantile-bin label of the target of $x_i$.
The ball generation proceeds by recursively splitting the lowest-quality or largest balls via K-means ($k = 2$), until every ball achieves quality $q_j \ge \tau$ and has at least $n_{\min}$ points, with $\tau$ and $n_{\min}$ user-set thresholds.
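The ball statistics and the recursive splitting loop can be sketched in NumPy. This is a simplified illustration, not the paper's implementation: all function names are hypothetical, the 2-means routine is a bare-bones version, and for brevity the loop splits any ball that fails the purity/size thresholds rather than specifically targeting the lowest-quality or largest one.

```python
import numpy as np

def ball_stats(X):
    """Center, max-distance radius, and mean-distance radius of one ball."""
    c = X.mean(axis=0)
    d = np.linalg.norm(X - c, axis=1)
    return c, d.max(), d.mean()

def quality(labels):
    """Majority-label fraction: 1.0 means the ball is pure."""
    _, counts = np.unique(labels, return_counts=True)
    return counts.max() / len(labels)

def two_means_split(X, n_iter=20, seed=0):
    """Plain 2-means; returns a boolean mask selecting one child ball."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=2, replace=False)]
    for _ in range(n_iter):
        assign = np.linalg.norm(X[:, None] - centers[None], axis=2).argmin(axis=1)
        for k in (0, 1):
            if (assign == k).any():
                centers[k] = X[assign == k].mean(axis=0)
    return assign == 0

def generate_balls(X, labels, tau=0.9, n_min=4):
    """Recursively split impure balls until quality >= tau or the ball is small."""
    queue, done = [(X, labels)], []
    while queue:
        Xb, lb = queue.pop()
        if quality(lb) >= tau or len(Xb) <= 2 * n_min:
            done.append((Xb, lb))
            continue
        mask = two_means_split(Xb)
        if mask.all() or not mask.any():   # degenerate split: stop here
            done.append((Xb, lb))
            continue
        queue.append((Xb[mask], lb[mask]))
        queue.append((Xb[~mask], lb[~mask]))
    return done
```

Because both children of every split are nonempty and strictly smaller, the recursion always terminates, and every returned ball satisfies the stopping criterion by construction.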
2. Modified Support Vector Regression Formulation
Standard SVR solves an $\epsilon$-insensitive regression with a cubic computational cost in the number of samples ($O(N^3)$). GBSVR instead replaces the $N$ data points with $n \ll N$ granular balls $(c_j, r_j, \hat{y}_j)$, where $\hat{y}_j$ is the mean target within $GB_j$:

$$\hat{y}_j = \frac{1}{m_j} \sum_{x_i \in GB_j} y_i.$$
The model enforces the SVR margin constraint such that the farthest point in each ball remains within the SVR $\epsilon$-tube; for a linear model $f(x) = w^\top x + b$, this gives

$$\lvert \hat{y}_j - (w^\top c_j + b) \rvert \le \epsilon - r_j \lVert w \rVert.$$
Introducing slack variables $\xi_j, \xi_j^*$ results in the soft-margin GBSVR optimization:

$$\min_{w, b, \xi, \xi^*} \; \frac{1}{2} \lVert w \rVert^2 + C \sum_{j=1}^{n} (\xi_j + \xi_j^*)$$

subject to:

$$\hat{y}_j - (w^\top c_j + b) \le \epsilon - r_j \lVert w \rVert + \xi_j,$$
$$(w^\top c_j + b) - \hat{y}_j \le \epsilon - r_j \lVert w \rVert + \xi_j^*,$$
$$\xi_j, \xi_j^* \ge 0, \quad j = 1, \dots, n.$$
The dual problem involves $2n$ multipliers $\alpha_j, \alpha_j^*$ attached to the ball centers and radii. Analogous to standard SVR, the solution yields a compact form for $w$,

$$w = \sum_{j=1}^{n} (\alpha_j - \alpha_j^*) \, c_j,$$

with $b$ recovered from the KKT complementarity conditions on any ball whose multipliers lie strictly between $0$ and $C$.
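The geometric content of the ball-level constraint admits a quick numerical sanity check: for a linear model, if the ball's center satisfies the radius-shrunken tube $\lvert \hat{y}_j - (w^\top c_j + b) \rvert \le \epsilon - r_j \lVert w \rVert$, then every point inside the ball satisfies the ordinary $\epsilon$-tube. A minimal NumPy check (all numeric values are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(0)
w, b = np.array([1.5, -0.5]), 0.2            # hypothetical linear model
c, r = np.array([0.3, 0.8]), 0.25            # one ball: center and radius
y_hat = float(w @ c + b) + 0.05              # ball's mean target, near the model at the center

# Smallest eps for which the ball-level constraint |y_hat - f(c)| <= eps - r*||w|| holds.
eps = abs(y_hat - (w @ c + b)) + r * np.linalg.norm(w) + 1e-9

# Sample points uniformly inside the ball and verify each lies in the eps-tube.
u = rng.normal(size=(1000, 2))
pts = c + r * (u / np.linalg.norm(u, axis=1, keepdims=True)) * rng.uniform(0, 1, size=(1000, 1))
dev = np.abs(y_hat - (pts @ w + b))
assert dev.max() <= eps                      # worst in-ball deviation never exceeds eps
```

The bound follows from the triangle inequality: $\lvert \hat{y}_j - f(x) \rvert \le \lvert \hat{y}_j - f(c_j) \rvert + \lVert w \rVert \, \lVert x - c_j \rVert$ for any $x$ in the ball.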
3. Ball Construction and Training Algorithms
The GBSVR methodology involves two algorithmic stages:
- Granular Regression Ball Generation:
- Discretize targets into $k$ quantile bins for label assignment.
- Initialize all points in one ball.
- Iteratively split the lowest-quality or largest ball using K-means ($k = 2$), until thresholds on purity ($q_j \ge \tau$) and minimum size ($n_{\min}$) are met.
- For each ball, compute the center $c_j$, radius $r_j$, and mean target $\hat{y}_j$.
- GBSVR Training:
- Input the granular balls $\{(c_j, r_j, \hat{y}_j)\}_{j=1}^{n}$, regularization parameter $C$, and tube width $\epsilon$.
- Solve the dual quadratic program for the multipliers $\alpha_j, \alpha_j^*$.
- Recover $w$ and $b$ via closed-form expressions.
The replacement of $N$ samples with $n \ll N$ balls reduces the problem size and computational cost.
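The two training steps above can be sketched end-to-end. The paper solves a dual quadratic program; as a lighter-weight stand-in, the sketch below runs subgradient descent on a simplified linear-kernel primal with the radius-shrunken $\epsilon$-tube. All function names and parameter values are hypothetical, and this is an illustration of the objective, not the paper's solver.

```python
import numpy as np

def gbsvr_objective(w, b, Cb, R, Y, Creg=10.0, eps=0.1):
    """Simplified GBSVR primal: 0.5||w||^2 + Creg * sum of ball-level tube violations."""
    wn = np.linalg.norm(w)
    slack = np.abs(Y - (Cb @ w + b)) - (eps - R * wn)
    return 0.5 * wn ** 2 + Creg * np.maximum(slack, 0.0).sum()

def gbsvr_fit(Cb, R, Y, Creg=10.0, eps=0.1, lr=0.02, n_epochs=2000):
    """Subgradient descent with a decaying step size (a sketch, not the dual QP)."""
    n, d = Cb.shape
    w, b = np.zeros(d), 0.0
    for t in range(n_epochs):
        wn = np.linalg.norm(w) + 1e-12
        resid = Y - (Cb @ w + b)
        viol = (np.abs(resid) - (eps - R * wn)) > 0   # balls outside their shrunken tube
        s = -np.sign(resid)                           # subgradient direction of |resid|
        gw = w + Creg * ((s[viol][:, None] * Cb[viol]).sum(axis=0)
                         + R[viol].sum() * w / wn)
        gb = Creg * s[viol].sum()
        step = lr / np.sqrt(t + 1.0)
        w, b = w - step * gw, b - step * gb
    return w, b

# Hypothetical usage: 10 balls sampled from y = 2x + 1 with small radii.
centers = np.linspace(0.0, 1.0, 10)[:, None]
radii = np.full(10, 0.02)
y_ball = 2.0 * centers[:, 0] + 1.0
w, b = gbsvr_fit(centers, radii, y_ball)
```

Because the optimization runs over $n$ ball summaries rather than $N$ raw samples, each iteration touches only $n$ residuals, mirroring the problem-size reduction described above.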
4. Discretization and Purity Measurement
To enable the construction of pure balls, the continuous target variable is sorted and partitioned into $k$ non-overlapping quantiles, assigning a discrete label to each target. This approach turns the regression target into a pseudo-classification problem, clarifying the definition of ball “purity” and guiding splits. Balls are further divided until their quality reaches a user-determined threshold $\tau$ while respecting the minimum size $n_{\min}$.
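The quantile discretization step can be written in a few lines of NumPy; the helper name `quantile_labels` is hypothetical:

```python
import numpy as np

def quantile_labels(y, k=4):
    """Assign each target a quantile-bin label in {0, ..., k-1}."""
    # The k - 1 interior quantile cut points define k non-overlapping bins.
    edges = np.quantile(y, np.linspace(0, 1, k + 1)[1:-1])
    return np.digitize(y, edges)

y = np.arange(10.0)                 # targets 0, 1, ..., 9
labels = quantile_labels(y, k=5)    # -> [0, 0, 1, 1, 2, 2, 3, 3, 4, 4]
```

With equal-frequency (quantile) edges, each bin receives roughly the same number of targets, so ball purity is measured against balanced pseudo-classes.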
5. Computational Complexity and Runtime Characteristics
The substitution of data points with granular regression balls directly impacts computational efficiency. The standard SVR quadratic program has $2N$ variables, leading to an $O(N^3)$ cost. GBSVR works with only $2n$ variables (for $n$ balls), resulting in a reduced asymptotic cost $O(n^3)$, where typically $n \ll N$, yielding a 5–10× speed-up during training. Empirically, GBSVR training was 8–12× faster than SVR and NuSVR on UCI datasets (159–414 samples). For example, on the Servo dataset GBSVR completed training in approximately 1.0 s, compared to 9.3 s for SVR and 4.7 s for NuSVR (Rastogi et al., 13 Mar 2025).
| Method | Dataset size (samples) | Training time on Servo (s) |
|---|---|---|
| GBSVR | 159–414 | ≈1.0 |
| SVR | 159–414 | 9.3 |
| NuSVR | 159–414 | 4.7 |
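The cubic cost model behind these savings is easy to make concrete. The snippet below is purely illustrative (the sample counts are hypothetical, not from the paper): under an $O(\cdot^3)$ QP cost, halving the problem size alone yields an ~8× cheaper solve.

```python
def qp_cost_ratio(N, n):
    """Relative QP cost of solving over n balls instead of N samples, under a cubic cost model."""
    return (n / N) ** 3

# Hypothetical illustration: compressing 400 samples into 200 balls.
ratio = qp_cost_ratio(400, 200)   # 0.125, i.e. an 8x cheaper QP asymptotically
```

Real speed-ups also depend on the ball-generation overhead and solver constants, which is why the observed 8–12× factor differs from a pure cube-law prediction.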
6. Empirical Evaluation Across Domains
GBSVR’s empirical evaluation covers synthetic, benchmark, and real-world datasets:
- Synthetic Data: On regression functions of Type A and Type B with six heteroscedastic noise models, GBSVR achieved higher R² and lower MAE, MSE, and RMSE than SVR and NuSVR, especially under high-noise conditions.
- UCI Benchmarks: Across datasets such as Real Estate, AutoMPG, Autos, Servo, Yacht, and Machine, GBSVR yielded higher R² and lower error metrics at all noise-corruption levels (0–20%), typically with one-tenth the training time.
- Stock Forecasting: For Apple, Google, NVIDIA, and Tesla using sliding-window (5→1) prediction, GBSVR improved R² by 2–5% and reduced MAE/RMSE against baselines.
- Wind Speed Prediction: On 20-min and 30-min horizon tasks with 36,000 samples, GBSVR improved R² by 2–4% and reduced RMSE by 5–10% compared to SVR/NuSVR.
Across all experimental conditions, GBSVR was more accurate and an order of magnitude faster to train than standard SVR algorithms. While no formal p-value tests were reported, the consistency and magnitude of improvements across datasets and noise regimes indicate that the gains are practically significant (Rastogi et al., 13 Mar 2025).