Radius-Based Classification Methods
- Radius-based classification is a set of methods that leverage geometric distances (using spheres, balls, or RBF kernels) to assign class labels and assess model reliability.
- Techniques include local manifold approximation, normalized RBF networks, and SVM-based feature elimination, each balancing theoretical risk bounds and empirical performance.
- Applications span exoplanet composition inference, adversarial robustness in NLP, and genomic feature selection, demonstrating practical gains especially in high-dimensional or data-scarce settings.
Radius-based classification encompasses a family of statistical and machine learning methods that use geometric notions of distance or radius—frequently through spheres, balls, or radial basis kernels—as the basis for assigning class labels, optimizing feature selection, assessing model robustness, and interpreting decisions. These approaches connect geometric intuition with formal risk bounds in theory and often yield practical gains in nonlinear, high-dimensional, or data-scarce settings. Radius-based techniques appear in local manifold approximation, RBF kernel networks, hyperspectral data analysis, support vector machines, adversarial robustness analyses, and planetary composition inference.
1. Geometric Foundations and Theoretical Justification
At the core of radius-based classification is the idea of quantifying similarity, support, or safety in terms of the Euclidean distance (or more generally, a norm-induced metric) to certain geometric objects—balls, spheres, or decision boundaries. In several core frameworks, this geometric radius connects directly to generalization theory via margin–radius products, VC dimension, and risk bounds, as formalized in the SVM context by the data radius $R$ and the margin $\gamma = 2/\|\mathbf{w}\|$, with both misclassification risk and VC dimension scaling with $R^2\|\mathbf{w}\|^2$ (Aksu, 2012).
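To make the radius–margin quantity concrete, here is a minimal sketch (not taken from the cited work) that trains a linear SVM with scikit-learn on toy data and evaluates $R$, the geometric margin, and the product $R^2\|\mathbf{w}\|^2$; the centroid-based radius below is a crude stand-in for the true minimal enclosing sphere.

```python
# Sketch: radius-margin quantity R^2 * ||w||^2 for a linear SVM on toy data.
# The data radius R is approximated by the largest distance from the centroid,
# a cheap surrogate for the true minimal-enclosing-sphere radius.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-1.0, 0.5, (50, 2)), rng.normal(1.0, 0.5, (50, 2))])
y = np.array([0] * 50 + [1] * 50)

clf = SVC(kernel="linear", C=100.0).fit(X, y)
w = clf.coef_.ravel()

# Approximate data radius: max distance of any point from the centroid.
R = np.linalg.norm(X - X.mean(axis=0), axis=1).max()

margin = 2.0 / np.linalg.norm(w)                    # geometric margin width
bound_quantity = (R ** 2) * (np.linalg.norm(w) ** 2)
print(f"R = {R:.3f}, margin = {margin:.3f}, R^2 ||w||^2 = {bound_quantity:.1f}")
```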
Local manifold approximation approaches, such as LOcal Manifold Approximation (LOMA) and its SPherical Approximation (SPA) instantiation, formalize classification as the computation of a test point’s Euclidean distance to supports fitted locally—commonly spheres—to each class’s neighborhood. This distance plays the role of a radius that encapsulates the notion of nearness or support for each class, and the test point is assigned to the class whose locally approximated support is closest (Li et al., 2019).
2. Radius-Based Kernels and Networks
Radial basis function (RBF) kernels and networks are canonical forms of radius-based classification. In both traditional RBF networks and normalized variants (nRBFN), decision functions are built as weighted combinations of spherical (Gaussian) basis functions centered at prototypical points or “centers”. The exponential decay in the RBF kernel,

$$k(\mathbf{x}, \mathbf{c}) = \exp\!\left(-\frac{\|\mathbf{x} - \mathbf{c}\|^2}{2\sigma^2}\right),$$

directly encodes classification as a function of the “radius” from each center in feature space (Hu et al., 2017).
Normalized RBFN extends this with column-wise normalization, establishing connections with spectral graph theory and providing guarantees on the tradeoff between fitting error and overfitting risk. Parameters such as the number of centers, the kernel width $\sigma$, and the regularization strength directly modulate the effective “sphere of influence” and the risk profile.
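The following sketch illustrates the general RBF-network recipe (Gaussian activations around a set of centers, a simple normalization of the activation matrix, and regularized least-squares output weights); the center choice, the width $\sigma$, and the per-sample normalization used here are illustrative assumptions, not the exact nRBFN construction of Hu et al.

```python
# Minimal RBF-network-style classifier sketch (not the exact nRBFN):
# Gaussian activations around centers, simple normalization, and
# ridge-regularized least squares fit to one-hot class indicators.
import numpy as np

def rbf_design(X, centers, sigma):
    # Pairwise squared distances between samples and centers.
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
    Phi = np.exp(-d2 / (2.0 * sigma ** 2))
    # Normalize each sample's activations to sum to one (illustrative choice;
    # the cited nRBFN uses its own normalization scheme).
    return Phi / Phi.sum(axis=1, keepdims=True)

def fit_rbfn(X, y, centers, sigma, reg=1e-3):
    Phi = rbf_design(X, centers, sigma)
    Y = np.eye(y.max() + 1)[y]                      # one-hot targets
    # Ridge-regularized least squares for the output weights.
    return np.linalg.solve(Phi.T @ Phi + reg * np.eye(Phi.shape[1]), Phi.T @ Y)

def predict_rbfn(X, centers, sigma, W):
    return rbf_design(X, centers, sigma) @ W

# Usage: a random subset of training points serves as the centers.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2)); y = (X[:, 0] * X[:, 1] > 0).astype(int)
centers = X[rng.choice(len(X), 20, replace=False)]
W = fit_rbfn(X, y, centers, sigma=0.5)
pred = predict_rbfn(X, centers, 0.5, W).argmax(axis=1)
print("train accuracy:", (pred == y).mean())
```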
Cluster-based random RBF kernels (CRRBF) build on the radius-based kernel principle by aggregating per-cluster Gaussian subkernels, with each cluster defined over a subset of hyperspectral bands and assigned a random width. The sum of these subkernels forms a flexible, easily-tunable kernel that often matches or exceeds the empirical performance of cross-validated single-width RBFs (Niazmardi, 2024).
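A sketch in the spirit of CRRBF is shown below: the spectral bands are split into clusters, each cluster receives a Gaussian subkernel with a randomly drawn width, and the subkernels are summed into one kernel matrix. The band-clustering rule and the width distribution here are placeholder assumptions rather than the paper's procedure.

```python
# Illustrative cluster-based random RBF kernel: sum of per-cluster Gaussian
# subkernels over band subsets, each with a random width.
import numpy as np

def crrbf_kernel(X, Z, band_clusters, widths):
    """Sum of per-cluster Gaussian subkernels over band subsets."""
    K = np.zeros((X.shape[0], Z.shape[0]))
    for bands, gamma in zip(band_clusters, widths):
        d2 = ((X[:, None, bands] - Z[None, :, bands]) ** 2).sum(-1)
        K += np.exp(-gamma * d2)
    return K

# Usage with an SVM that accepts precomputed kernels:
rng = np.random.default_rng(0)
n_bands, n_clusters = 64, 4
# Placeholder: contiguous band clusters; the paper groups bands by similarity.
band_clusters = np.array_split(np.arange(n_bands), n_clusters)
widths = rng.uniform(0.01, 1.0, size=n_clusters)     # random subkernel widths

X = rng.normal(size=(100, n_bands))
K_train = crrbf_kernel(X, X, band_clusters, widths)
# K_train can now be passed to, e.g., SVC(kernel="precomputed").fit(K_train, y)
```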
3. Applications of Radius in Robustness and Interpretability
Radius-based concepts extend naturally into robustness guarantees and interpretability, particularly in neural NLP models. The Maximal Safe Radius (MSR) quantifies the minimal distance in embedding space to a decision boundary—the largest $\epsilon$ such that every point in the $\epsilon$-ball around an input receives the same classification. Provable lower and upper bounds on the MSR can be established by linear relaxation (providing certification) and adversarial search (e.g., via Monte Carlo Tree Search), yielding quantifiable robustness margins for each sample (Malfa et al., 2020). Empirical evaluation reveals that embedding dimension fundamentally controls the certifiable safe radius—low-dimensional, prediction-trained embeddings yield larger radii and thus more robust models.
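As a simple illustration of the upper-bound side, the sketch below (not the certification or MCTS procedure of Malfa et al.) randomly searches embedding-space perturbations at increasing radii and records the smallest radius at which the prediction flips; any such radius is an upper bound on the MSR. The `model` callable and the toy linear classifier are assumptions for illustration.

```python
# Naive upper bound on the Maximal Safe Radius via random perturbation search:
# the smallest radius at which any sampled perturbation flips the prediction.
import numpy as np

def msr_upper_bound(model, x, radii, n_trials=200, seed=0):
    rng = np.random.default_rng(seed)
    base = model(x)
    for r in sorted(radii):                      # scan radii from small to large
        for _ in range(n_trials):
            delta = rng.normal(size=x.shape)
            delta *= r / np.linalg.norm(delta)   # project onto the sphere of radius r
            if model(x + delta) != base:
                return r                         # an adversarial point exists at this radius
    return None                                  # no flip found up to the largest radius

# Usage with a toy linear "model" on a 2-D embedding:
w, b = np.array([1.0, -2.0]), 0.1
model = lambda v: int(v @ w + b > 0)
print(msr_upper_bound(model, np.array([0.5, 0.2]), radii=np.linspace(0.05, 1.0, 20)))
```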
MSR-based saliency maps provide a theoretically-grounded interpretability tool: words or features with small certified radii are those whose perturbation most readily flips the decision, highlighting vulnerable or “decisive” components in the input.
4. Data Radius in Feature Selection and SVM Generalization
The notion of data radius, formalized as the radius of the smallest sphere enclosing the data in feature space, is a key component for characterizing the generalization properties of linear and kernel SVMs. Theoretical generalization bounds—the VC dimension, leave-one-out upper bounds, and expected misclassification error—are all proportional to $R^2\|\mathbf{w}\|^2$, where $R$ is the data radius and $\mathbf{w}$ the weight vector of the classifier (Aksu, 2012).
Radius-based feature elimination methods leverage these bounds. For instance, hard-margin and soft-margin feature elimination incorporate both the margin and the data radius into their criteria, selecting feature subsets that minimize the post-elimination bound, often via 1D “Little Optimization” or QP1 retraining in scalar-projected space. These radius-weighted elimination criteria consistently produce lower generalization error curves and improve stability compared to margin-only methods.
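A simplified sketch of this idea follows: greedy backward elimination in which, at each step, the feature whose removal yields the smallest $R^2\|\mathbf{w}\|^2$ after retraining is dropped. This is a stand-in for the hard-/soft-margin criteria and the “Little Optimization”/QP1 retraining described in Aksu (2012), and the centroid-based radius is again an approximation.

```python
# Hedged sketch of radius-weighted backward feature elimination: drop the
# feature whose removal gives the smallest R^2 * ||w||^2 after retraining.
import numpy as np
from sklearn.svm import SVC

def radius_sq(X):
    # Cheap surrogate for the minimal enclosing sphere: max distance to centroid.
    return np.linalg.norm(X - X.mean(axis=0), axis=1).max() ** 2

def bound(X, y, C=10.0):
    clf = SVC(kernel="linear", C=C).fit(X, y)
    return radius_sq(X) * (np.linalg.norm(clf.coef_) ** 2)

def eliminate(X, y, n_keep):
    features = list(range(X.shape[1]))
    while len(features) > n_keep:
        scores = [bound(X[:, [f for f in features if f != j]], y) for j in features]
        features.pop(int(np.argmin(scores)))   # drop the feature with the lowest bound
    return features

# Usage on toy data where only the first two features are informative:
rng = np.random.default_rng(0)
X = rng.normal(size=(120, 6)); y = (X[:, 0] + X[:, 1] > 0).astype(int)
print("selected features:", eliminate(X, y, n_keep=2))
```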
5. Local Support Estimation via Spherical Approximation
LOMA and its SPA specialization formalize class assignment by explicitly constructing local spherical models of class support. For each class, a sphere is fitted to a local neighborhood (via PCA for tangent subspace estimation and least-squares for center/radius). The classification decision reduces to assigning the test point to the class whose fitted sphere is closest in Euclidean distance. The approach is strongly justified theoretically: in the large-sample, low-noise limit, the SPA classifier is asymptotically optimal—even when class supports are nonlinear, overlapping, or intersecting (Li et al., 2019). Empirically, SPA demonstrates superior performance on data-scarce, high-dimensional, and highly nonlinear/intersecting benchmarks.
The fundamental step in SPA and similar approaches is the computation of the distance from a test point to the nearest point on a locally constructed sphere; this distance is the “radius” metric on which the decision is made.
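The sketch below illustrates this decision rule: for each class, a sphere is fitted to the test point's nearest same-class neighbors (via an algebraic least-squares fit here, whereas the paper combines tangent-space PCA with least squares), and the class with the nearest fitted sphere wins. The neighborhood size `k` and the fitting details are illustrative assumptions.

```python
# SPA-style decision sketch: fit a local sphere per class, assign the test
# point to the class whose sphere surface is nearest.
import numpy as np

def fit_sphere(P):
    """Least-squares sphere through points P: returns (center, radius)."""
    # Sphere equation ||p - c||^2 = r^2 rearranged to a linear system in (c, d),
    # where d = r^2 - ||c||^2.
    A = np.hstack([2.0 * P, np.ones((len(P), 1))])
    b = (P ** 2).sum(axis=1)
    sol, *_ = np.linalg.lstsq(A, b, rcond=None)
    center, d = sol[:-1], sol[-1]
    radius = np.sqrt(max(d + center @ center, 0.0))
    return center, radius

def dist_to_sphere(x, center, radius):
    # Distance from a point to the nearest point on the sphere surface.
    return abs(np.linalg.norm(x - center) - radius)

def spa_predict(x, X, y, k=10):
    scores = {}
    for c in np.unique(y):
        Xc = X[y == c]
        idx = np.argsort(np.linalg.norm(Xc - x, axis=1))[:k]   # local neighborhood
        center, radius = fit_sphere(Xc[idx])
        scores[c] = dist_to_sphere(x, center, radius)
    return min(scores, key=scores.get)

# Usage on toy data with a disk-shaped class inside a ring-shaped class:
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2)); y = (np.linalg.norm(X, axis=1) > 1.2).astype(int)
print(spa_predict(np.array([0.1, 0.2]), X, y))
```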
6. Radius-Based Classification in Exoplanet Science
Radius-based classification principles extend significantly beyond conventional pattern recognition. In planetary science, the mass–radius relation underpins classification of solid exoplanets. The observed planetary radius (or mass for a given radius) is compared against theoretical mass–radius relations for iron, silicate, water, and carbon-rich compositions, derived from universal equations of state (modified polytropes). Radius thresholds at fixed mass separate iron-dominated, rocky (silicate), and water/ice planets, and define a “super-Earth” class as those planets with radii below the pure-water curve at a given mass (0707.2895).
Measurement uncertainty sets the achievable classification granularity: with radius errors of roughly 5%, planets can still be assigned to broad compositional categories from their measured radii. This radius-based approach bypasses the need for detailed interior modeling in most cases.
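The following sketch shows the threshold logic: an observed radius is compared against theoretical mass–radius curves for iron, silicate, and water-ice compositions at the measured mass. The curves here are toy power laws $R \approx c\,M^{0.27}$ with placeholder prefactors, not the fitted equations of state from the cited work; only the classification-by-threshold structure is the point.

```python
# Radius-threshold classification of a solid exoplanet from (mass, radius),
# using placeholder mass-radius curves (toy power laws, not the paper's fits).
import numpy as np

# Placeholder prefactors (approximate radius in Earth radii at 1 Earth mass).
COEFFS = {"iron": 0.78, "silicate": 1.00, "water": 1.26}

def theoretical_radius(mass_earth, composition):
    # Toy curve R = c * M^0.27; the exponent is broadly typical for solid
    # planets in this mass range, the prefactors are illustrative only.
    return COEFFS[composition] * mass_earth ** 0.27

def classify_by_radius(mass_earth, radius_earth):
    """Assign a broad compositional class from the observed (mass, radius)."""
    if radius_earth <= theoretical_radius(mass_earth, "iron"):
        return "iron-dominated (or denser)"
    if radius_earth <= theoretical_radius(mass_earth, "silicate"):
        return "rocky (iron-silicate mixture)"
    if radius_earth <= theoretical_radius(mass_earth, "water"):
        return "water/ice-rich; below the pure-water curve (super-Earth regime)"
    return "above the pure-water curve: likely significant volatile envelope"

print(classify_by_radius(mass_earth=5.0, radius_earth=1.6))
```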
| Application Area | Characteristic Radius Use | Reference |
|---|---|---|
| Local manifold approximation | Distance to locally fitted spheres for support estimation | (Li et al., 2019) |
| RBF networks/kernels | Gaussian basis centered at prototypes, width as radius | (Hu et al., 2017, Niazmardi, 2024) |
| SVM generalization/feature elimination | Minimal enclosing sphere in feature space | (Aksu, 2012) |
| NLP robustness | Minimal embedding perturbation to flip class (Maximal Safe Radius) | (Malfa et al., 2020) |
| Exoplanet composition | Observed planetary radius vs. theoretical thresholds | (0707.2895) |
7. Empirical Performance and Parameter Sensitivity
Empirical analyses across domains establish that radius-based methods often outperform alternatives in regimes characterized by limited data, high dimensions, or complex nonlinear class supports. SPA outperforms NN, SVM, and DNN approaches especially with limited samples and class overlap (Li et al., 2019). Cluster-based RBF kernels (CRRBF) maintain robust accuracy with essentially only one open parameter (the number of clusters), showing less variance and much-reduced tuning burden compared to cross-validated RBFs in high-dimensional imaging (Niazmardi, 2024). Radius-regularized feature elimination yields systematically lower test errors and less variance in genomic feature selection tasks (Aksu, 2012). In NLP, radius-based robustness quantification via MSR reveals that most standard networks are highly sensitive (i.e., have small safe radii) to substitution perturbations, though low-dimensional, prediction-specialized embeddings yield increased certifiable robustness (Malfa et al., 2020).
A plausible implication is that the geometric perspective of “radius to decision boundary”—whether implemented through local support fits, kernel architectures, or robustness calculations—unifies the analysis and improvement of classification systems across a broad array of fields.
References:
- (Aksu, 2012) Fast SVM-based Feature Elimination Utilizing Data Radius, Hard-Margin, Soft-Margin
- (Hu et al., 2017) Spectral-graph Based Classifications: Linear Regression for Classification and Normalized Radial Basis Function Network
- (Li et al., 2019) Classification via local manifold approximation
- (Niazmardi, 2024) Cluster-based Random Radial Basis Kernel Function for Hyperspectral Data Classification
- (Malfa et al., 2020) Assessing Robustness of Text Classification through Maximal Safe Radius Computation
- (0707.2895) Mass-Radius Relationships for Solid Exoplanets