Fraction of Borderline Points (N₁)
- Fraction of Borderline Points (N₁) is a metric that quantifies the proportion of training points critical for defining class boundaries in nearest-neighbor classification.
- It measures boundary complexity by identifying the points that lie on class-separating (bichromatic) Voronoi facets, reflecting the geometric and combinatorial structure of the dataset.
- Output-sensitive algorithms find these border points efficiently when they are few, which aids the assessment of classifier robustness and supports dataset reduction.
The Fraction of Borderline Points (N₁) is a metric in nearest-neighbor classification that gives the proportion of training samples essential for defining the classifier's decision boundaries. These key points, variously termed "border points" or "relevant points", are those whose removal would alter the classifier's output for at least one query in $\mathbb{R}^d$. The N₁ statistic thus provides a quantitative index of the geometric and combinatorial complexity of the class boundaries in a given dataset (Flores-Velazco, 2022).
1. Formal Definition of Borderline (Relevant) Points
Given a labeled dataset $P = \{p_1, \dots, p_n\} \subset \mathbb{R}^d$ of size $n$, with a class label $\ell(p_i)$ for each $p_i \in P$, a point $p \in P$ is deemed a border (or relevant) point if there exist another sample $p' \in P$ with $\ell(p') \neq \ell(p)$ and a query $q \in \mathbb{R}^d$ for which
$$\lVert q - p \rVert = \lVert q - p' \rVert < \lVert q - r \rVert \quad \text{for all } r \in P \setminus \{p, p'\}.$$
This condition holds exactly when $p$ and $p'$ share a non-empty $(d-1)$-dimensional Voronoi face (a "wall") that separates regions associated with different class labels. Equivalently, $p$ is relevant if deleting it from $P$ would cause some query $q \in \mathbb{R}^d$ to be misclassified by the nearest-neighbor classifier. Border points are therefore exactly the points of $P$ lying on class-separating facets of the Voronoi diagram of $P$ (Flores-Velazco, 2022).
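For small planar examples the definition can be checked directly. The sketch below is a brute-force illustration, assuming $d = 2$ and distinct input points; it is not the output-sensitive algorithm discussed next. For a bichromatic pair $(p, p')$, the centers of all circles through both points lie on their perpendicular bisector, parameterized here as $m + t\,u$; each remaining point $r$ lies strictly outside such a circle exactly when a linear inequality in $t$ holds, so the pair shares a Voronoi wall if and only if the resulting open interval of $t$ is non-empty. The names `share_voronoi_wall` and `border_points_bruteforce` are hypothetical.

```python
from typing import Sequence, Set, Tuple

Point = Tuple[float, float]


def share_voronoi_wall(points: Sequence[Point], i: int, j: int) -> bool:
    """True if some circle through points[i] and points[j] has every other
    point strictly outside it, i.e. the two Voronoi cells share an edge."""
    (px, py), (qx, qy) = points[i], points[j]
    mx, my = (px + qx) / 2, (py + qy) / 2    # midpoint of segment p-q
    ux, uy = -(qy - py), qx - px             # direction of the perpendicular bisector
    r2 = (mx - px) ** 2 + (my - py) ** 2     # squared distance from midpoint to p
    lo, hi = float("-inf"), float("inf")     # feasible t for centre m + t*u
    for idx, (rx, ry) in enumerate(points):
        if idx in (i, j):
            continue
        # r is strictly outside the circle centred at m + t*u  <=>  a + b*t > 0
        a = (mx - rx) ** 2 + (my - ry) ** 2 - r2
        b = 2 * (ux * (mx - rx) + uy * (my - ry))
        if b > 0:
            lo = max(lo, -a / b)
        elif b < 0:
            hi = min(hi, -a / b)
        elif a <= 0:
            # r lies on the segment p-q: every circle through p and q contains it
            return False
    return lo < hi


def border_points_bruteforce(points: Sequence[Point], labels: Sequence[int]) -> Set[int]:
    """Indices of all border points, found by testing every bichromatic pair (O(n^3))."""
    n = len(points)
    border: Set[int] = set()
    for i in range(n):
        for j in range(i + 1, n):
            if labels[i] != labels[j] and share_voronoi_wall(points, i, j):
                border.update((i, j))
    return border
```

For example, for four points at $x = 0, 1, 2, 3$ on a line with labels `[0, 0, 1, 1]`, only the two middle points (indices 1 and 2) are border points.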
2. Algorithmic Identification of the Border Set
Let $\mathcal{B} \subseteq P$ denote the set of all border points, with $k = |\mathcal{B}|$. The border set can be found by an output-sensitive search procedure that improves upon the prior $O(n^2 + nk^2)$ algorithm by avoiding its initial $O(n^2)$ minimum spanning tree computation. The high-level steps are:
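- Choose an arbitrary seed point $s \in P$ (it need not be a border point), set $\mathcal{B} \leftarrow \emptyset$, and make $s$ the first point to process.
- Repeat until every point of $\mathcal{B}$ has been processed:
  - Take an unprocessed point $p$ (initially $s$, afterwards a point of $\mathcal{B}$) and mark it as processed.
  - Let $C_p = \{ r \in P : \ell(r) = \ell(p) \}$ be the same-class subset of $p$.
  - Invert the points of $P \setminus C_p$ through a unit sphere centered at $p$, mapping each $r$ to $p + (r - p)/\lVert r - p \rVert^2$ and yielding the set $S_p^{*}$.
  - Find the extreme points of $S_p^{*} \cup \{p\}$ using, e.g., Chan's output-sensitive convex hull algorithm.
  - Map these extreme points (other than $p$ itself) back to $P$ and add them to $\mathcal{B}$.
- Output $\mathcal{B}$ as the set of all border points.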
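The procedure is correct because of three properties: inversion reports only actual border points (endpoints of bichromatic Voronoi walls); repeated inversion discovers every border point of a connected boundary component; and, because a sphere through $p$ may contain arbitrarily many points of $p$'s own class, a single inversion can move across a same-class region and reach walls disconnected from the current one. Together these allow all of $\mathcal{B}$ to be discovered in a single run of the search (Flores-Velazco, 2022).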
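A minimal planar sketch of the inversion-based search follows. It assumes $d = 2$, distinct points, and the reading of the steps in which the set inverted around $p$ consists of the points whose class differs from $p$'s. Andrew's monotone-chain convex hull ($O(n \log n)$ per call) stands in for Chan's output-sensitive algorithm, so the sketch illustrates the logic, not the stated running time. The names `invert`, `hull_vertices`, and `find_border_points` are hypothetical.

```python
from typing import List, Sequence, Set, Tuple

Point = Tuple[float, float]


def invert(points: Sequence[Point], center: Point) -> List[Point]:
    """Inversion in the unit circle centred at `center`: x -> c + (x - c) / |x - c|^2."""
    cx, cy = center
    out = []
    for x, y in points:
        dx, dy = x - cx, y - cy
        s = dx * dx + dy * dy  # non-zero because input points are assumed distinct
        out.append((cx + dx / s, cy + dy / s))
    return out


def hull_vertices(pts: Sequence[Point]) -> Set[int]:
    """Indices of the strict convex-hull vertices of pts (Andrew's monotone chain)."""
    order = sorted(range(len(pts)), key=lambda i: pts[i])
    if len(order) <= 2:
        return set(order)

    def cross(o: Point, a: Point, b: Point) -> float:
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    vertices: List[int] = []
    for seq in (order, order[::-1]):  # lower hull, then upper hull
        chain: List[int] = []
        for i in seq:
            # pop while the last turn is not strictly counter-clockwise
            while len(chain) >= 2 and cross(pts[chain[-2]], pts[chain[-1]], pts[i]) <= 0:
                chain.pop()
            chain.append(i)
        vertices.extend(chain[:-1])  # the last point starts the other chain
    return set(vertices)


def find_border_points(points: Sequence[Point], labels: Sequence[int], seed: int = 0) -> Set[int]:
    """Border-point search by repeated inversion, starting from an arbitrary seed."""
    border: Set[int] = set()
    processed: Set[int] = set()
    stack = [seed]
    while stack:
        p = stack.pop()
        if p in processed:
            continue
        processed.add(p)
        others = [i for i in range(len(points)) if labels[i] != labels[p]]
        if not others:
            continue
        inverted = invert([points[i] for i in others], points[p])
        # The centre p is appended last; hull vertices other than p are images of
        # other-class points q admitting a circle through p and q with no other-class
        # point inside, and each such q is reported as a border point.
        for h in hull_vertices(inverted + [points[p]]):
            if h < len(others) and others[h] not in border:
                border.add(others[h])
                stack.append(others[h])
    return border
```

On the four collinear points with labels `[0, 0, 1, 1]`, this search returns `{1, 2}` whether it starts from the first or the last point, matching the brute-force definition.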
3. Computational Complexity and Implementation Details
The algorithmic bottleneck is finding the extreme points of the inverted sets in $d$ dimensions. Each inversion step costs $O(nk)$, since it handles $O(n)$ points and reports at most $k$ extreme points, and at most $k + 1$ points (the seed plus each border point) are ever processed. Thus, the total runtime is $O(nk^2)$. For $d = 3$, using Chan's randomized hull algorithm gives an expected runtime of $O(nk \log k)$. In general, for fixed $d$, the time complexity is therefore
$$O(nk^2),$$
which drops the $O(n^2)$ term of the earlier $O(n^2 + nk^2)$ bound.
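Comparing the two bounds directly shows when dropping the $O(n^2)$ term matters. The ratio below ignores the constant factors hidden by the $O$-notation, so it is only a rough guide:
$$\frac{n^2 + nk^2}{nk^2} = 1 + \frac{n}{k^2}.$$
The gain is therefore substantial when $k \ll \sqrt{n}$: for example, with $n = 10^6$ and $k = 10^2$, $n/k^2 = 100$, so the earlier bound is about $101$ times the new one. Once $k \gtrsim \sqrt{n}$, the two bounds agree up to a constant factor.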
Key implementation concerns include numerically stable sphere inversion and efficient convex hull or extreme-point routines in $d$ dimensions. Randomized routines, such as Chan's, offer practical speedups, but their running-time guarantees hold only in expectation over their random choices. For large datasets, subsampling or approximate extreme-point queries can yield an approximate border set, as can fast approximate nearest-neighbor structures used to speed up the emptiness checks related to inversion (Flores-Velazco, 2022).
4. The N₁ Statistic: Definition and Interpretation
Once the border set $\mathcal{B}$ is determined, the fraction of borderline points
$$N_1 = \frac{|\mathcal{B}|}{|P|} = \frac{k}{n}$$
is calculated, where $n = |P|$. This metric quantifies the fraction of training data lying on class-separating Voronoi facets, i.e., the points that actually influence the classifier's decisions in $\mathbb{R}^d$.
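As a small illustration of the formula, the hypothetical helper below computes $N_1$ from a border set given as point indices; it is a sketch that only validates its input and divides $k$ by $n$.

```python
from typing import Iterable


def n1_statistic(border: Iterable[int], n: int) -> float:
    """N1 = |B| / n for a border set B given as indices into a training set of size n."""
    if n <= 0:
        raise ValueError("training set must be non-empty")
    b = set(border)  # ignore duplicate indices
    if not all(0 <= i < n for i in b):
        raise ValueError("border indices must lie in range(n)")
    return len(b) / n
```

For instance, a four-point training set whose border set is `{1, 2}` gives $N_1 = 2/4 = 0.5$.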
Interpretively, $N_1$ is an index of boundary complexity:
- $N_1$ close to $0$ indicates simple, well-separated classes with few true border points.
- $N_1$ close to $1$ implies highly interwoven or noisy class structure, where most samples are critical to correct classification and nearest-neighbor classifiers may be fragile.
Common practice assumes general position (no $d + 2$ points co-spherical) so that the Voronoi faces are well defined (Flores-Velazco, 2022).
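For planar data ($d = 2$), general position means that no four points are co-circular, which the classic incircle determinant detects: for points $a, b, c, e$ it is zero exactly when the four points lie on a common circle or a common line. The brute-force check below is a sketch with hypothetical names; it uses exact rational arithmetic via Python's `fractions.Fraction` to avoid rounding errors and takes $O(n^4)$ time, so it suits only small examples.

```python
from fractions import Fraction
from itertools import combinations
from typing import Sequence, Tuple

Point = Tuple[float, float]


def incircle_det(a, b, c, e):
    """3x3 determinant that is zero iff a, b, c, e lie on a common circle or line."""
    rows = []
    for p in (a, b, c):
        dx, dy = p[0] - e[0], p[1] - e[1]
        rows.append((dx, dy, dx * dx + dy * dy))  # lift onto the paraboloid
    (a1, a2, a3), (b1, b2, b3), (c1, c2, c3) = rows
    return a1 * (b2 * c3 - b3 * c2) - a2 * (b1 * c3 - b3 * c1) + a3 * (b1 * c2 - b2 * c1)


def in_general_position_2d(points: Sequence[Point]) -> bool:
    """True if no four of the (distinct) points are co-circular or collinear."""
    exact = [(Fraction(x), Fraction(y)) for x, y in points]  # exact conversion of floats
    return all(incircle_det(*quad) != 0 for quad in combinations(exact, 4))
```

The four corners of a unit square, for example, are co-circular, so the check rejects them.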
5. Empirical Behavior and Practical Significance
The behavior of $k$ (and thus $N_1$) is dataset-dependent:
- In pathological worst-case scenarios, such as highly interleaved or noisy class distributions yielding $k = \Theta(n)$, the algorithm's $O(nk^2)$ running time becomes $O(n^3)$ and is prohibitive for large $n$.
- For many real-world datasets, where classes form compact clusters separated by comparatively simple boundaries, $k \ll n$, making the procedure feasible and $N_1$ a meaningful regularity indicator.
- In large-scale regimes, approximate algorithms or subsampling provide practical estimates of $N_1$.
As a boundary complexity index, $N_1$ is diagnostically useful for evaluating whether nearest-neighbor classifiers suit a dataset. The border set itself informs dataset reduction: the points of $\mathcal{B}$ alone induce the same nearest-neighbor decision boundary as $P$, so only this often small minority of points must be retained (Flores-Velazco, 2022).
6. Summary Table
| Quantity | Definition | Interpretation |
|---|---|---|
| $\mathcal{B}$ | Set of all border (relevant) points | Points defining class-separating Voronoi facets |
| $k = \lvert \mathcal{B} \rvert$ | Number of border points | Critical size parameter for complexity and runtime |
| $N_1 = k/n$ | Fraction of border points | Index of class boundary complexity |
The fraction of borderline points, defined as $N_1 = k/n$, is thus a precisely characterized, geometrically motivated metric for assessing and exploiting the structure of training sets in nearest-neighbor classification, and it is computationally tractable whenever $k$ is small relative to $n$ (Flores-Velazco, 2022).