
Overlap Number of Balls (ONB) Analysis

Updated 6 April 2026
  • ONB is a geometric and combinatorial metric that measures class overlap by counting the minimum number of non-overlapping, class-pure hyperspheres needed to cover each class.
  • It employs a safe-radius approach and a greedy ball-covering algorithm to capture local point density, boundary complexity, and data separation nuances.
  • ONB values correlate with classification difficulty: high overlap leads to many small balls and signals greater challenges for instance-based learning.

The Overlap Number of Balls (ONB) quantifies the geometric and combinatorial overlap between distinct classes or groups within a set, whether in statistical learning, combinatorial allocation, or urn models. It does so by measuring the minimal number of non-overlapping, class-pure balls (hyperspheres) required to cover each class such that no ball contains points from more than one class. The ONB framework is used in data complexity analysis, overlap quantification, and applied probability, with variations tailored to classification geometry, collision probabilities in random allocations, and partition thresholds in combinatorial settings (Pascual-Triana et al., 2024, Pascual-Triana et al., 2020, Gouet et al., 2019, Czabarka et al., 2012).

1. Mathematical Definitions and Algorithmic Construction

The core ONB construction begins with a labeled dataset $D = \{(x_1, y_1), \ldots, (x_n, y_n)\} \subset \mathbb{R}^d \times \{1, \ldots, k\}$ with $k$ classes. For a point $x_i$ of class $y_i$, the "safe-radius" $r_i$ is defined as the minimal distance to any point of a different class: $r_i = \min_{j: y_j \ne y_i} \|x_i - x_j\|$. The corresponding ball is $B_i = \{x \in \mathbb{R}^d : \|x - x_i\| < r_i\}$; the inequality is strict so that the nearest point of another class lies on the boundary and the ball stays class-pure. Let $X_c$ denote the set of all points of class $c$. To cover $X_c$, the algorithm greedily selects the ball that covers the most yet-uncovered points of $X_c$, breaking ties by radius if needed, and removes the newly covered points from consideration. The process iterates until all points in $X_c$ are covered. The number of selected balls, $b_c$, is the ONB for class $c$ (Pascual-Triana et al., 2024, Pascual-Triana et al., 2020).
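The construction above can be sketched in a few lines of standard-library Python. This is a minimal illustration rather than the authors' implementation: the names `safe_radius` and `greedy_cover` are hypothetical, the ball is taken with a strict inequality so that it stays class-pure, each center is always counted as covered by its own ball, and ties are broken in favor of the larger radius (the text only says ties are broken "by radius"). It also assumes the dataset contains at least two classes.

```python
import math
from typing import Hashable, List, Sequence, Tuple

Point = Sequence[float]


def safe_radius(i: int, points: List[Point], labels: List[Hashable]) -> float:
    """Distance from points[i] to its nearest point of a different class."""
    return min(
        math.dist(points[i], points[j])
        for j in range(len(points))
        if labels[j] != labels[i]
    )


def greedy_cover(
    points: List[Point], labels: List[Hashable], target: Hashable
) -> List[Tuple[int, float, int]]:
    """Greedily cover the points of class `target` with class-pure balls.

    Returns one (center_index, radius, newly_covered_count) tuple per ball.
    """
    members = [i for i, y in enumerate(labels) if y == target]
    radius = {i: safe_radius(i, points, labels) for i in members}
    # reach[i]: class members strictly inside the ball centred at i.
    # The centre itself is always included (assumption for zero radii).
    reach = {
        i: {i} | {j for j in members
                  if math.dist(points[i], points[j]) < radius[i]}
        for i in members
    }
    uncovered = set(members)
    balls = []
    while uncovered:
        # Most newly covered points first; ties go to the larger radius
        # (tie direction is an assumption, not stated in the text).
        best = max(members,
                   key=lambda i: (len(reach[i] & uncovered), radius[i]))
        gained = reach[best] & uncovered
        balls.append((best, radius[best], len(gained)))
        uncovered -= gained
    return balls
```

The number of balls for a class is then `len(greedy_cover(points, labels, target))`. Each returned tuple also carries the radius and the number of newly covered instances, two of the per-ball attributes discussed next.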

Associated to each covering ball are three attributes:

  • Radius: indicates local class-separation.
  • Covered instances: quantifies local point density.
  • Density: signals tightness of local packing, relevant for outlier and boundary detection.

2. ONB as a Data Complexity Metric

ONB metrics offer a tunable, geometry-aware quantification of class overlap and boundary complexity. Heavy class overlap leads to small ball radii and a large number of balls, as many small balls are required to maintain class-purity. Well-separated classes yield large radii and a minimal number of balls. The main variants summarized in (Pascual-Triana et al., 2020) include:

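| Metric | Formula | Typical Use |
| $ONB_{tot}$ | $\frac{1}{n} \sum_{c=1}^{k} b_c$ | Global overlap |
| $ONB_{avg}$ | $\frac{1}{k} \sum_{c=1}^{k} \frac{b_c}{n_c}$ | Class-level overlap |
| Distance choices | Euclidean ($\ell_2$) or Manhattan ($\ell_1$) | Data-dependent |

Here $b_c$ is the number of balls selected for class $c$, $n_c$ is the number of points in class $c$, $n = \sum_c n_c$ is the dataset size, and $k$ is the number of classes.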

Empirically, the Manhattan-distance, class-averaged ONB variant demonstrates the strongest negative correlation with the geometric-mean performance of a 1NN classifier across both synthetic and real-world datasets, including balanced artificial data (Pascual-Triana et al., 2020).
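To make the difference between the global and class-averaged variants concrete, the sketch below computes both from per-class ball counts. It assumes the global variant divides the total number of balls by the dataset size and the class-averaged variant averages the per-class ratios of balls to points; the function names `onb_total` and `onb_average` and the example counts are hypothetical.

```python
from typing import Dict, Hashable


def onb_total(balls: Dict[Hashable, int], sizes: Dict[Hashable, int]) -> float:
    """Total number of balls divided by total number of points (assumed form)."""
    return sum(balls.values()) / sum(sizes.values())


def onb_average(balls: Dict[Hashable, int], sizes: Dict[Hashable, int]) -> float:
    """Mean over classes of (balls for the class / points in the class)."""
    ratios = [balls[c] / sizes[c] for c in sizes]
    return sum(ratios) / len(ratios)


# Hypothetical imbalanced problem: the minority class needs many balls
# relative to its size, which the class-averaged variant weights more heavily.
example_balls = {"majority": 2, "minority": 10}
example_sizes = {"majority": 100, "minority": 20}
global_value = onb_total(example_balls, example_sizes)
class_averaged_value = onb_average(example_balls, example_sizes)
```

In this example the class-averaged value is larger than the global one, because the heavily fragmented minority class counts as much as the majority class instead of being diluted by it.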

3. Theoretical Properties and Interpretations

Several monotonicity and tradeoff properties hold:

  • Monotonicity: As class overlap increases, ONB increases; as classes become more separable, ONB decreases.
  • Radius–Overlap Trade-off: The average radius of the balls covering a class is inversely related to overlap; more overlap means smaller radii.
  • Bounds: The number of balls needed for a class lies between 1 and the number of points in that class. It reaches the upper bound under maximal overlap (each point requires its own ball) and the lower bound of a single ball for fully disjoint classes (Pascual-Triana et al., 2024, Pascual-Triana et al., 2020).
  • Boundary Complexity: ONB simultaneously captures local (microscopic) and global (macroscopic) structural complexity at class boundaries.
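The two extremes of the bounds can be checked by hand on one-dimensional toy data. The worked example below is illustrative only: the point sets are hypothetical, $b_A$ and $b_B$ denote the number of balls needed for classes $A$ and $B$, and balls are assumed to exclude points lying at exactly the safe-radius distance, so that each ball stays class-pure.

```latex
\begin{align*}
\text{Interleaved: } & x_i = i,\ i = 1, \dots, 2m,\ \text{classes alternating } A, B, A, B, \dots \\
& r_i = 1 \ \text{for all } i, \qquad B_i \cap \{x_1, \dots, x_{2m}\} = \{x_i\} \\
& \Rightarrow\ b_A = b_B = m = |X_A| = |X_B| \\[4pt]
\text{Separated: } & X_A = \{0, 1, 2\}, \quad X_B = \{10, 11, 12\} \\
& \text{center } 0:\ r = 10,\ \text{ball } (-10, 10) \supset X_A; \qquad
  \text{center } 12:\ r = 10,\ \text{ball } (2, 22) \supset X_B \\
& \Rightarrow\ b_A = b_B = 1
\end{align*}
```

In the interleaved case every point needs its own ball, while in the separated case a single ball per class suffices.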

ONB values correlate strongly with classification difficulty. Instance-based methods (e.g., kNN) suffer most in high-ONB regimes, where boundaries are intricate or classes interpenetrate. This behavior is validated empirically, as ONB provides better prediction of classifier performance than alternatives such as MST- or nearest-neighbor-based complexity measures (Pascual-Triana et al., 2020).

4. Computational Complexity and Practical Implementations

The dominant complexity arises from distance computations and the covering procedure:

  • Pairwise distances: $O(n^2 d)$ for $n$ points in $d$ dimensions.
  • Cover construction: per class, each greedy step rescans the candidate balls against the still-uncovered points, and in the worst case the number of steps equals the number of points in the class, so the total cost grows polynomially with class size.
  • Accelerations: For moderate dimensionality, practical implementations leverage spatial indices (e.g., kd-trees) or approximate nearest-neighbor techniques to expedite range queries and nearest-opposite computation (Pascual-Triana et al., 2024, Pascual-Triana et al., 2020).
  • Parameterization: The metric is robust to distance choice and agnostic to scale, but boundary region identification may require percentile-based thresholding of radius, coverage, or density.
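As a rough sense of scale for the pairwise-distance step, the arithmetic below counts distance evaluations for a hypothetical dataset of $n = 10^4$ points in $d = 20$ dimensions. Both values are chosen purely for illustration, and each distance is assumed to cost on the order of $d$ coordinate operations.

```latex
\binom{n}{2} = \frac{n(n-1)}{2} = \frac{10^{4} \cdot 9999}{2} = 49{,}995{,}000 \ \text{distances},
\qquad
d \cdot \binom{n}{2} = 20 \cdot 49{,}995{,}000 = 999{,}900{,}000 \approx 10^{9} \ \text{coordinate operations}.
```

At this size the brute-force step is still feasible, but the quadratic growth in $n$ is what motivates the spatial-index accelerations listed above.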

5. Extensions: Singular Models and Generalizations

The ONB paradigm generalizes naturally to:

  • Multi-label: Restricting ball covers to label-overlap constraints (Pascual-Triana et al., 2020).
  • Multi-instance: Treating each bag as a composite entity, with covering applied in bag space.
  • Multi-view: Requiring that balls capture proximity in all feature spaces jointly.
  • Singular problems: Ball coverage schemes can be adapted to account for more intricate or non-Euclidean relational structures.

In applied probability, analogous "ONB" statistics arise in urn models, where the overflow quantifies the number of assignments of balls to urns of finite capacity that result in overfilling. Exact asymptotic formulas and limit laws (Poisson or Gaussian) for these collision/overflow statistics are derived under varying scaling regimes for the numbers of balls and urns (Gouet et al., 2019).
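A small simulation can make the overflow statistic concrete. The sketch below assumes balls are thrown one at a time into uniformly chosen urns and counts a ball as overflow when its urn is already at capacity; this reading of "overfilling", the function name `overflow`, and its parameters are assumptions for illustration, not the exact model of Gouet et al. (2019).

```python
import random
from typing import Optional


def overflow(n_balls: int, n_urns: int, capacity: int,
             rng: Optional[random.Random] = None) -> int:
    """Count balls that land in an urn that is already full."""
    rng = rng or random.Random()
    load = [0] * n_urns
    rejected = 0
    for _ in range(n_balls):
        urn = rng.randrange(n_urns)  # uniform choice of urn
        if load[urn] < capacity:
            load[urn] += 1           # ball fits
        else:
            rejected += 1            # urn full: this ball overflows
    return rejected
```

Repeating the call over many seeds for growing numbers of balls and urns gives an empirical overflow distribution that can be compared with the limit laws mentioned above.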

6. Combinatorial Thresholds and ONB in Allocation Problems

In combinatorics, the ONB concept maps to sharp thresholds for the emergence of overlapping box occupancies:

  • Model-dependent thresholds: For a given number of balls and boxes (distinguishable/indistinguishable, surjective, etc.), the ONB represents the maximal box count at which the probability of any two boxes coinciding in occupancy remains bounded away from zero.
  • Models with threshold results in (Czabarka et al., 2012): compositions, integer partitions, and surjections / set partitions.

Each model exhibits a sharp phase transition: as the number of boxes crosses the threshold, the probability of occupancy overlap changes abruptly between its limiting values.
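The occupancy-overlap probability can be explored empirically for the composition model. The sketch below draws uniform random compositions of $n$ into $k$ positive parts (balls in $k$ non-empty ordered boxes) and estimates how often two boxes hold the same number of balls. The function names, trial count, and seed are hypothetical choices, and the simulation is only an illustration of the phenomenon, not a reproduction of the thresholds in Czabarka et al. (2012).

```python
import random
from typing import List, Sequence


def random_composition(n: int, k: int, rng: random.Random) -> List[int]:
    """Uniform random composition of n into k positive parts.

    Choosing k - 1 distinct cut points among the n - 1 gaps between balls
    (stars and bars) gives every composition the same probability.
    """
    cuts = sorted(rng.sample(range(1, n), k - 1))
    bounds = [0] + cuts + [n]
    return [bounds[i + 1] - bounds[i] for i in range(k)]


def has_repeated_part(parts: Sequence[int]) -> bool:
    """True if at least two boxes hold the same number of balls."""
    return len(set(parts)) < len(parts)


def repeat_probability(n: int, k: int, trials: int = 2000, seed: int = 0) -> float:
    """Monte Carlo estimate of the probability of an occupancy overlap."""
    rng = random.Random(seed)
    hits = sum(
        has_repeated_part(random_composition(n, k, rng)) for _ in range(trials)
    )
    return hits / trials
```

Calling `repeat_probability(n, k)` for fixed `n` and increasing `k` shows how the estimate moves from values near zero toward values near one as the number of boxes grows.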

7. Applications: Fairness, Bias Reduction, and Data Preprocessing

In the context of fair machine learning, the ONB has been adapted into the Fair-ONB method, which targets bias reduction by undersampling regions of greatest overlap—those closest to decision boundaries or with minimal class-purity—according to ball attributes (radius, coverage, and density) (Pascual-Triana et al., 2024). The procedure identifies high-overlap ("worst") regions by percentile filtering and removes or relabels associated instances, thus enhancing model fairness with minimal predictive performance degradation.
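The percentile-filtering step can be illustrated with a short sketch. It assumes the "worst" balls are those whose radius falls at or below a chosen percentile of all radii; the function name `worst_balls`, the default percentile, and the use of radius alone (ignoring coverage and density) are simplifying assumptions, not the Fair-ONB procedure itself.

```python
import statistics
from typing import List


def worst_balls(radii: List[float], percentile: int = 10) -> List[int]:
    """Indices of balls whose radius is at or below the given percentile.

    `percentile` must be an integer between 1 and 99, and `radii` needs
    at least two values for statistics.quantiles to work.
    """
    # 99 cut points: cut_points[p - 1] is the p-th percentile.
    cut_points = statistics.quantiles(radii, n=100, method="inclusive")
    threshold = cut_points[percentile - 1]
    return [i for i, r in enumerate(radii) if r <= threshold]
```

The instances covered by the returned balls would then be the candidates for removal or relabeling; the same filter could be applied to coverage or density values by passing those in place of the radii.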

ONB and its filtered variants have been empirically validated to:

  • Improve class-balanced representation across protected groups.
  • Reduce bias in algorithmic decisions rooted in training set geometry.
  • Offer instance selection strategies superior to random or naive undersampling in maximizing fairness while preserving classification utility (Pascual-Triana et al., 2024).

References:

  • (Pascual-Triana et al., 2024) Fair Overlap Number of Balls (Fair-ONB): A Data-Morphology-based Undersampling Method for Bias Reduction
  • (Pascual-Triana et al., 2020) Revisiting Data Complexity Metrics Based on Morphology for Overlap and Imbalance: Snapshot, New Overlap Number of Balls Metrics and Singular Problems Prospect
  • (Gouet et al., 2019) Asymptotics of the overflow in urn models
  • (Czabarka et al., 2012) Threshold functions for distinct parts: revisiting Erdős-Lehner
