Granular Ball SVR
- Granular Ball SVR is a novel regression algorithm that uses compact granular balls to replace individual data points, significantly reducing computational costs.
- It employs a two-stage approach where high-quality granular balls are generated through iterative K-means splits before modifying the standard SVR formulation.
- Empirical results show that GBSVR delivers faster training times and improved accuracy metrics (R², MAE, RMSE) compared to traditional SVR, especially in noisy and large-scale datasets.
Granular Ball Support Vector Regression (GBSVR) is a regression algorithm developed to address both the computational inefficiency and sensitivity to outliers inherent in traditional Support Vector Regression (SVR) frameworks. GBSVR introduces the concept of "granular regression balls"—compact, representative subsets derived from the data—which serve as the basic units for model training, replacing individual data points. This methodology produces substantial reductions in computational complexity and enhances robustness to noise and outliers, particularly for large-scale or heteroscedastic datasets (Rastogi et al., 13 Mar 2025).
1. Granular Regression Ball Framework
A granular regression ball, denoted $GB_j$, aggregates data points based on proximity in the feature space. It is defined by its center $c_j$ and radius $r_j$, containing all points $x_i$ such that $d(x_i, c_j) \le r_j$, with $d$ the Euclidean distance. Given $m_j$ points within a ball, the center and (maximum-distance) radius are

$$c_j = \frac{1}{m_j} \sum_{x_i \in GB_j} x_i, \qquad r_j = \max_{x_i \in GB_j} \lVert x_i - c_j \rVert.$$
For greater robustness to outliers, the radius can alternatively be taken as the mean distance:

$$r_j = \frac{1}{m_j} \sum_{x_i \in GB_j} \lVert x_i - c_j \rVert.$$
Each ball is constructed to be “pure” with respect to discretized target labels, produced via quantile-based binning of the $y$-values. The quality of a granular regression ball (GRB) is quantified as the majority-label fraction:

$$q_j = \frac{\max_k \,\lvert \{ x_i \in GB_j : \ell(x_i) = k \} \rvert}{m_j},$$

where $\ell(x_i)$ denotes the quantile-bin label of the target of $x_i$.
The ball generation proceeds by recursively splitting the lowest-quality or largest balls via K-means ($k = 2$), until every ball achieves quality $q_j \ge \tau$ and has at least $n_{\min}$ points, with $\tau$ and $n_{\min}$ user-set thresholds.
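The ball statistics and the recursive splitting loop can be sketched in NumPy. This is a simplified illustration, not the paper's implementation: all function names are hypothetical, the 2-means routine is a bare-bones version, and for brevity the loop splits any ball that fails the purity/size thresholds rather than specifically targeting the lowest-quality or largest one.

```python
import numpy as np

def ball_stats(X):
    """Center, max-distance radius, and mean-distance radius of one ball."""
    c = X.mean(axis=0)
    d = np.linalg.norm(X - c, axis=1)
    return c, d.max(), d.mean()

def quality(labels):
    """Majority-label fraction: 1.0 means the ball is pure."""
    _, counts = np.unique(labels, return_counts=True)
    return counts.max() / len(labels)

def two_means_split(X, n_iter=20, seed=0):
    """Plain 2-means; returns a boolean mask selecting one child ball."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=2, replace=False)]
    for _ in range(n_iter):
        assign = np.linalg.norm(X[:, None] - centers[None], axis=2).argmin(axis=1)
        for k in (0, 1):
            if (assign == k).any():
                centers[k] = X[assign == k].mean(axis=0)
    return assign == 0

def generate_balls(X, labels, tau=0.9, n_min=4):
    """Recursively split impure balls until quality >= tau or the ball is small."""
    queue, done = [(X, labels)], []
    while queue:
        Xb, lb = queue.pop()
        if quality(lb) >= tau or len(Xb) <= 2 * n_min:
            done.append((Xb, lb))
            continue
        mask = two_means_split(Xb)
        if mask.all() or not mask.any():   # degenerate split: stop here
            done.append((Xb, lb))
            continue
        queue.append((Xb[mask], lb[mask]))
        queue.append((Xb[~mask], lb[~mask]))
    return done
```

Because both children of every split are nonempty and strictly smaller, the recursion always terminates, and every returned ball satisfies the stopping criterion by construction.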
2. Modified Support Vector Regression Formulation
Standard SVR solves an $\epsilon$-insensitive regression with a cubic computational cost in the number of samples ($O(N^3)$). GBSVR instead replaces the $N$ data points with $n \ll N$ granular balls $(c_j, r_j, \hat{y}_j)$, where $\hat{y}_j$ is the mean target within $GB_j$:

$$\hat{y}_j = \frac{1}{m_j} \sum_{x_i \in GB_j} y_i.$$
The model enforces the SVR margin constraint such that the farthest point in each ball remains within the SVR $\epsilon$-tube; for a linear model $f(x) = w^\top x + b$, this gives

$$\lvert \hat{y}_j - (w^\top c_j + b) \rvert \le \epsilon - r_j \lVert w \rVert.$$
Introducing slack variables $\xi_j, \xi_j^*$ results in the soft-margin GBSVR optimization:

$$\min_{w, b, \xi, \xi^*} \; \frac{1}{2} \lVert w \rVert^2 + C \sum_{j=1}^{n} (\xi_j + \xi_j^*)$$

subject to:

$$\hat{y}_j - (w^\top c_j + b) \le \epsilon - r_j \lVert w \rVert + \xi_j,$$
$$(w^\top c_j + b) - \hat{y}_j \le \epsilon - r_j \lVert w \rVert + \xi_j^*,$$
$$\xi_j, \xi_j^* \ge 0, \quad j = 1, \dots, n.$$
The dual problem involves $2n$ multipliers $\alpha_j, \alpha_j^*$ attached to the ball centers and radii. Analogous to standard SVR, the solution yields a compact form for $w$,

$$w = \sum_{j=1}^{n} (\alpha_j - \alpha_j^*) \, c_j,$$

with $b$ recovered from the KKT complementarity conditions on any ball whose multipliers lie strictly between $0$ and $C$.
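The geometric content of the ball-level constraint admits a quick numerical sanity check: for a linear model, if the ball's center satisfies the radius-shrunken tube $\lvert \hat{y}_j - (w^\top c_j + b) \rvert \le \epsilon - r_j \lVert w \rVert$, then every point inside the ball satisfies the ordinary $\epsilon$-tube. A minimal NumPy check (all numeric values are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(0)
w, b = np.array([1.5, -0.5]), 0.2            # hypothetical linear model
c, r = np.array([0.3, 0.8]), 0.25            # one ball: center and radius
y_hat = float(w @ c + b) + 0.05              # ball's mean target, near the model at the center

# Smallest eps for which the ball-level constraint |y_hat - f(c)| <= eps - r*||w|| holds.
eps = abs(y_hat - (w @ c + b)) + r * np.linalg.norm(w) + 1e-9

# Sample points uniformly inside the ball and verify each lies in the eps-tube.
u = rng.normal(size=(1000, 2))
pts = c + r * (u / np.linalg.norm(u, axis=1, keepdims=True)) * rng.uniform(0, 1, size=(1000, 1))
dev = np.abs(y_hat - (pts @ w + b))
assert dev.max() <= eps                      # worst in-ball deviation never exceeds eps
```

The bound follows from the triangle inequality: $\lvert \hat{y}_j - f(x) \rvert \le \lvert \hat{y}_j - f(c_j) \rvert + \lVert w \rVert \, \lVert x - c_j \rVert$ for any $x$ in the ball.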
3. Ball Construction and Training Algorithms
The GBSVR methodology involves two algorithmic stages:
- Granular Regression Ball Generation:
- Discretize targets into $k$ quantile bins for label assignment.
- Initialize all points in one ball.
- Iteratively split the lowest-quality or largest ball using K-means ($k = 2$), until thresholds on purity ($q_j \ge \tau$) and minimum size ($n_{\min}$) are met.
- For each ball, compute the center $c_j$, radius $r_j$, and mean target $\hat{y}_j$.
- GBSVR Training:
- Input the granular balls $\{(c_j, r_j, \hat{y}_j)\}_{j=1}^{n}$, regularization parameter $C$, and tube width $\epsilon$.
- Solve the dual quadratic program for the multipliers $\alpha_j, \alpha_j^*$.
- Recover $w$ and $b$ via closed-form expressions.
The replacement of $N$ samples with $n \ll N$ balls reduces the problem size and computational cost.
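The two training steps above can be sketched end-to-end. The paper solves a dual quadratic program; as a lighter-weight stand-in, the sketch below runs subgradient descent on a simplified linear-kernel primal with the radius-shrunken $\epsilon$-tube. All function names and parameter values are hypothetical, and this is an illustration of the objective, not the paper's solver.

```python
import numpy as np

def gbsvr_objective(w, b, Cb, R, Y, Creg=10.0, eps=0.1):
    """Simplified GBSVR primal: 0.5||w||^2 + Creg * sum of ball-level tube violations."""
    wn = np.linalg.norm(w)
    slack = np.abs(Y - (Cb @ w + b)) - (eps - R * wn)
    return 0.5 * wn ** 2 + Creg * np.maximum(slack, 0.0).sum()

def gbsvr_fit(Cb, R, Y, Creg=10.0, eps=0.1, lr=0.02, n_epochs=2000):
    """Subgradient descent with a decaying step size (a sketch, not the dual QP)."""
    n, d = Cb.shape
    w, b = np.zeros(d), 0.0
    for t in range(n_epochs):
        wn = np.linalg.norm(w) + 1e-12
        resid = Y - (Cb @ w + b)
        viol = (np.abs(resid) - (eps - R * wn)) > 0   # balls outside their shrunken tube
        s = -np.sign(resid)                           # subgradient direction of |resid|
        gw = w + Creg * ((s[viol][:, None] * Cb[viol]).sum(axis=0)
                         + R[viol].sum() * w / wn)
        gb = Creg * s[viol].sum()
        step = lr / np.sqrt(t + 1.0)
        w, b = w - step * gw, b - step * gb
    return w, b

# Hypothetical usage: 10 balls sampled from y = 2x + 1 with small radii.
centers = np.linspace(0.0, 1.0, 10)[:, None]
radii = np.full(10, 0.02)
y_ball = 2.0 * centers[:, 0] + 1.0
w, b = gbsvr_fit(centers, radii, y_ball)
```

Because the optimization runs over $n$ ball summaries rather than $N$ raw samples, each iteration touches only $n$ residuals, mirroring the problem-size reduction described above.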
4. Discretization and Purity Measurement
To enable the construction of pure balls, the continuous target variable is sorted and partitioned into $k$ non-overlapping quantiles, assigning a discrete label to each target. This approach turns the regression target into a pseudo-classification problem, clarifying the definition of ball “purity” and guiding splits. Balls are further divided until their quality reaches a user-determined threshold $\tau$ while respecting the minimum size $n_{\min}$.
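The quantile discretization step can be written in a few lines of NumPy; the helper name `quantile_labels` is hypothetical:

```python
import numpy as np

def quantile_labels(y, k=4):
    """Assign each target a quantile-bin label in {0, ..., k-1}."""
    # The k - 1 interior quantile cut points define k non-overlapping bins.
    edges = np.quantile(y, np.linspace(0, 1, k + 1)[1:-1])
    return np.digitize(y, edges)

y = np.arange(10.0)                 # targets 0, 1, ..., 9
labels = quantile_labels(y, k=5)    # -> [0, 0, 1, 1, 2, 2, 3, 3, 4, 4]
```

With equal-frequency (quantile) edges, each bin receives roughly the same number of targets, so ball purity is measured against balanced pseudo-classes.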
5. Computational Complexity and Runtime Characteristics
The substitution of data points with granular regression balls directly impacts computational efficiency. The standard SVR quadratic program has $2N$ variables, leading to an $O(N^3)$ cost. GBSVR works with only $2n$ variables (for $n$ balls), resulting in a reduced asymptotic cost $O(n^3)$, where typically $n \ll N$, yielding a 5–10× speed-up during training. Empirically, GBSVR training was 8–12× faster than SVR and NuSVR on UCI datasets (159–414 samples). For example, on the Servo dataset GBSVR completed training in approximately 1.0 s, compared to 9.3 s for SVR and 4.7 s for NuSVR (Rastogi et al., 13 Mar 2025).
| Method | Dataset size (samples) | Training time on Servo (s) |
|---|---|---|
| GBSVR | 159–414 | ≈1.0 |
| SVR | 159–414 | 9.3 |
| NuSVR | 159–414 | 4.7 |
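The cubic cost model behind these savings is easy to make concrete. The snippet below is purely illustrative (the sample counts are hypothetical, not from the paper): under an $O(\cdot^3)$ QP cost, halving the problem size alone yields an ~8× cheaper solve.

```python
def qp_cost_ratio(N, n):
    """Relative QP cost of solving over n balls instead of N samples, under a cubic cost model."""
    return (n / N) ** 3

# Hypothetical illustration: compressing 400 samples into 200 balls.
ratio = qp_cost_ratio(400, 200)   # 0.125, i.e. an 8x cheaper QP asymptotically
```

Real speed-ups also depend on the ball-generation overhead and solver constants, which is why the observed 8–12× factor differs from a pure cube-law prediction.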
6. Empirical Evaluation Across Domains
GBSVR’s empirical evaluation covers synthetic, benchmark, and real-world datasets:
- Synthetic Data: On regression functions of Type A and Type B with six heteroscedastic noise models, GBSVR achieved higher R² and lower MAE, MSE, and RMSE than SVR and NuSVR, especially under high-noise conditions.
- UCI Benchmarks: Across datasets such as Real Estate, AutoMPG, Autos, Servo, Yacht, and Machine, GBSVR yielded higher R² and lower error metrics at all noise-corruption levels (0–20%), typically with one-tenth the training time.
- Stock Forecasting: For Apple, Google, NVIDIA, and Tesla using sliding-window (5→1) prediction, GBSVR improved R² by 2–5% and reduced MAE/RMSE against baselines.
- Wind Speed Prediction: On 20-min and 30-min horizon tasks with 36,000 samples, GBSVR improved R² by 2–4% and reduced RMSE by 5–10% compared to SVR/NuSVR.
Across all experimental conditions, GBSVR was more accurate and an order of magnitude faster to train than standard SVR algorithms. While no formal p-value tests were reported, the consistency and magnitude of improvements across datasets and noise regimes indicate that the gains are practically significant (Rastogi et al., 13 Mar 2025).