
Granular Ball SVR

Updated 8 February 2026
  • Granular Ball SVR is a novel regression algorithm that uses compact granular balls to replace individual data points, significantly reducing computational costs.
  • It employs a two-stage approach where high-quality granular balls are generated through iterative K-means splits before modifying the standard SVR formulation.
  • Empirical results show that GBSVR delivers faster training times and improved accuracy metrics (R², MAE, RMSE) compared to traditional SVR, especially in noisy and large-scale datasets.

Granular Ball Support Vector Regression (GBSVR) is a regression algorithm developed to address both the computational inefficiency and sensitivity to outliers inherent in traditional Support Vector Regression (SVR) frameworks. GBSVR introduces the concept of "granular regression balls"—compact, representative subsets derived from the data—which serve as the basic units for model training, replacing individual data points. This methodology produces substantial reductions in computational complexity and enhances robustness to noise and outliers, particularly for large-scale or heteroscedastic datasets (Rastogi et al., 13 Mar 2025).

1. Granular Regression Ball Framework

A granular regression ball, denoted $B(c, r)$, aggregates data points based on proximity in the feature space. It is defined by its center $c \in \mathbb{R}^l$ and radius $r \geq 0$, and contains all points $x$ such that $d(x, c) \leq r$, where $d(x, c)$ is the Euclidean distance. Given $u$ points $\{x_i\}_{i=1}^u$ within a ball:

$$c = \frac{1}{u} \sum_{i=1}^u x_i, \qquad r = \max_{1 \leq i \leq u} d(x_i, c)$$

For greater robustness to outliers, the radius can alternatively employ the mean distance:

$$r = \frac{1}{u} \sum_{i=1}^u d(x_i, c)$$
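Both radius choices are straightforward to compute; a minimal NumPy sketch (the function name is illustrative, not from the paper):

```python
import numpy as np

def ball_center_radius(X, robust=False):
    """Center and radius of a granular ball over points X (shape u x l).

    robust=False uses the max-distance radius; robust=True uses the
    mean-distance radius suggested for outlier resistance.
    """
    c = X.mean(axis=0)                     # c = (1/u) * sum_i x_i
    d = np.linalg.norm(X - c, axis=1)      # Euclidean distances d(x_i, c)
    r = d.mean() if robust else d.max()
    return c, r
```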

Each ball is constructed to be “pure” with respect to discretized target labels, produced via quantile-based binning of the $y$-values. The quality of a granular regression ball (GRB) $GRB_j$ is quantified as:

$$\mathrm{quality}(GRB_j) = \frac{|\{(x_i, y_i) \in GRB_j : \text{label}(y_i) = \text{majority label in } GRB_j\}|}{|GRB_j|}$$

Ball generation proceeds by recursively splitting the lowest-quality or largest balls via K-means ($K = 2$) until every ball achieves quality $\geq T$ and contains at least $p$ points.
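The generation loop can be sketched as follows. This is an illustrative simplification, not the authors' implementation: it uses a small deterministic 2-means routine as a stand-in for K-means, processes balls in stack order rather than strictly lowest-quality-first, and all function names and defaults are assumptions.

```python
import numpy as np

def two_means_split(X, n_iter=20):
    """Deterministic 2-means split (a stand-in for K-means with K = 2)."""
    mu = X.mean(axis=0)
    i0 = int(np.argmax(np.linalg.norm(X - mu, axis=1)))       # far from mean
    i1 = int(np.argmax(np.linalg.norm(X - X[i0], axis=1)))    # far from i0
    centers = np.stack([X[i0], X[i1]]).astype(float)
    for _ in range(n_iter):
        assign = np.argmin(((X[:, None, :] - centers[None]) ** 2).sum(-1), axis=1)
        for k in (0, 1):
            if np.any(assign == k):
                centers[k] = X[assign == k].mean(axis=0)
    return assign

def ball_quality(labels):
    """Fraction of points carrying the ball's majority label."""
    _, counts = np.unique(labels, return_counts=True)
    return counts.max() / len(labels)

def generate_balls(X, labels, T=0.8, p=4):
    """Split impure balls until quality >= T or the ball is too small to split."""
    stack, balls = [np.arange(len(X))], []
    while stack:
        idx = stack.pop()
        if ball_quality(labels[idx]) >= T or len(idx) < 2 * p:
            balls.append(idx)
            continue
        assign = two_means_split(X[idx])
        parts = [idx[assign == k] for k in (0, 1)]
        if min(len(pt) for pt in parts) == 0:    # degenerate split: stop here
            balls.append(idx)
        else:
            stack.extend(parts)
    return balls
```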

2. Modified Support Vector Regression Formulation

Standard SVR solves an $\epsilon$-insensitive regression problem whose quadratic program scales cubically in the number of samples ($O(m^3)$). GBSVR instead replaces the $m$ data points with $n \ll m$ granular balls $\{(c_i, r_i, \hat{y}_i)\}_{i=1}^n$, where $\hat{y}_i$ is the mean target within $GRB_i$:

$$\hat{y}_i = \frac{1}{|GRB_i|} \sum_{(x, y) \in GRB_i} y$$

The model enforces the SVR margin constraint so that the farthest point in each ball remains within the $\epsilon$-tube:

$$\|w\| r_i + \hat{y}_i - w \cdot c_i - b \leq \epsilon, \qquad w \cdot c_i + b - \hat{y}_i - \|w\| r_i \leq \epsilon$$

Introducing slack variables $\xi_i, \xi_i^*$ yields the soft-margin GBSVR optimization:

$$\min \frac{1}{2} \|w\|^2 + C \sum_{i=1}^n (\xi_i + \xi_i^*)$$

subject to: $\|w\| r_i + \hat{y}_i - w \cdot c_i - b \leq \epsilon + \xi_i$

$w \cdot c_i + b - \hat{y}_i - \|w\| r_i \leq \epsilon + \xi_i^*$

$\xi_i, \xi_i^* \geq 0$

The dual problem involves variables for both centers and radii. With $A = \sum_{i=1}^n (\alpha_i - \alpha_i^*) c_i$ and $B = \sum_{i=1}^n (\alpha_i - \alpha_i^*) r_i$, it takes the compact form:

$$\max\; -\frac{1}{2}\|A\|^2 - \frac{1}{2} B^2 + \|A\| B + \sum_{i=1}^n (\alpha_i - \alpha_i^*) \hat{y}_i - \epsilon \sum_{i=1}^n (\alpha_i + \alpha_i^*)$$

subject to $\sum_{i=1}^n (\alpha_i - \alpha_i^*) = 0$ and $0 \leq \alpha_i, \alpha_i^* \leq C$.
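The paper solves this dual QP; as a rough numerical sketch one can instead minimize the primal soft-margin objective directly, eliminating the slacks as hinge terms. The solver choice (Nelder-Mead via SciPy) and function name below are assumptions, not the authors' method:

```python
import numpy as np
from scipy.optimize import minimize

def fit_gbsvr(centers, radii, y_hat, C=10.0, eps=0.1):
    """Fit (w, b) by direct minimization of the GBSVR primal objective.

    Slacks are eliminated as hinge terms:
      xi_i  = max(0, ||w|| r_i + yhat_i - w.c_i - b - eps)
      xi_i* = max(0, w.c_i + b - yhat_i - ||w|| r_i - eps)
    """
    n, l = centers.shape

    def objective(theta):
        w, b = theta[:l], theta[l]
        wn = np.linalg.norm(w)
        f = centers @ w + b                  # predictions at ball centers
        xi = np.maximum(0.0, wn * radii + y_hat - f - eps)
        xs = np.maximum(0.0, f - y_hat - wn * radii - eps)
        return 0.5 * wn ** 2 + C * np.sum(xi + xs)

    res = minimize(objective, np.zeros(l + 1), method="Nelder-Mead",
                   options={"maxiter": 5000, "xatol": 1e-9, "fatol": 1e-12})
    return res.x[:l], res.x[l]
```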

3. Ball Construction and Training Algorithms

The GBSVR methodology involves two algorithmic stages:

  • Granular Regression Ball Generation:
  1. Discretize the targets $\{y_i\}$ into $k$ quantile bins for label assignment.
  2. Initialize all points in one ball.
  3. Iteratively split the lowest-quality or largest ball using K-means ($K = 2$) until the purity threshold ($T$) and minimum-size threshold ($p$) are met.
  4. For each ball, compute the center $c_i$, radius $r_i$, and mean target $\hat{y}_i$.
  • GBSVR Training:
  1. Input the granular balls $\{(c_i, r_i, \hat{y}_i)\}$, regularization parameter $C$, and tube width $\epsilon$.
  2. Solve the dual quadratic program for $\{\alpha_i, \alpha_i^*\}$.
  3. Recover $w, b$ via closed-form expressions.

Replacing the $m$ samples with $n \ll m$ balls reduces the problem size and computational cost.

4. Discretization and Purity Measurement

To enable the construction of pure balls, the continuous target variable $Y$ is sorted and partitioned into $k$ non-overlapping quantile bins, assigning a label $\ell_i \in \{1, \ldots, k\}$ to each target. This turns the regression target into a pseudo-classification problem, giving a precise definition of ball “purity” to guide the splits. Balls are divided until their quality reaches a user-specified threshold $T$ and their size is $\geq p$.
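The quantile discretization can be sketched in a few lines (zero-based labels here, versus $1, \ldots, k$ in the text; the function name is illustrative):

```python
import numpy as np

def quantile_labels(y, k=4):
    """Discretize continuous targets into k quantile bins.

    Returns zero-based bin labels {0, ..., k-1}.
    """
    cuts = np.quantile(y, np.linspace(0.0, 1.0, k + 1)[1:-1])  # k-1 interior cuts
    return np.digitize(y, cuts)
```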

5. Computational Complexity and Runtime Characteristics

Substituting granular regression balls for individual data points directly improves computational efficiency. The standard SVR quadratic program has $m$ variables, leading to an $O(m^3)$ cost. GBSVR works with only $2n$ variables (the $\alpha_i, \alpha_i^*$), reducing the asymptotic cost to $O(n^3)$; typically $n/m = 20\%$–$40\%$, yielding a 5–10× training speed-up. Empirically, GBSVR trained 8–12× faster than SVR and NuSVR on UCI datasets (159–414 samples). For example, on the Servo dataset GBSVR completed training in approximately 1.0 s, versus 9.3 s for SVR and 4.7 s for NuSVR (Rastogi et al., 13 Mar 2025).

Method   Dataset Size (samples)   Training Time on Servo (s)
GBSVR    159–414                  1.0
SVR      159–414                  9.3
NuSVR    159–414                  4.7

6. Empirical Evaluation Across Domains

GBSVR’s empirical evaluation covers synthetic, benchmark, and real-world datasets:

  • Synthetic Data: On regression functions of Type A ($y = \sin(\pi x)/(\pi x) + \eta$) and Type B ($y = \cos(\pi x) + \eta$) with six heteroscedastic noise models, GBSVR achieved superior $R^2$, MAE, MSE, and RMSE to SVR and NuSVR, especially under high-noise conditions.
  • UCI Benchmarks: Across datasets such as Real Estate, AutoMPG, Autos, Servo, Yacht, and Machine, GBSVR yielded higher $R^2$ and lower error metrics at all noise-corruption levels (0–20%), typically with one-tenth the training time.
  • Stock Forecasting: For Apple, Google, NVIDIA, and Tesla using sliding-window (5 → 1) prediction, GBSVR improved $R^2$ by 2–5% and reduced MAE/RMSE against baselines.
  • Wind Speed Prediction: On 20-min and 30-min horizon tasks with 36,000 samples, GBSVR improved $R^2$ by 2–4% and reduced RMSE by 5–10% compared to SVR/NuSVR.
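The sliding-window (5 → 1) setup used in the stock-forecasting experiments can be illustrated as follows; the helper name is an assumption:

```python
import numpy as np

def sliding_window(series, w=5):
    """Turn a 1-D series into (X, y) pairs: w past values predict the next one."""
    s = np.asarray(series, dtype=float)
    X = np.stack([s[i:i + w] for i in range(len(s) - w)])
    y = s[w:]
    return X, y
```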

Across all experimental conditions, GBSVR was more accurate and an order of magnitude faster to train than standard SVR algorithms. Although no formal statistical significance tests were reported, the consistency and magnitude of the improvements across datasets and noise regimes indicate that the gains are practically significant (Rastogi et al., 13 Mar 2025).
