
PointNN Selector: Nearest-Neighbor Condensation

Updated 26 November 2025
  • PointNN Selector is an algorithm that selects a sparse set of labeled points ensuring perfect nearest-neighbor classification using a separation constraint.
  • It modifies the classic FCNN heuristic by incorporating a user-defined separation parameter δ to control density and prevent over-representation in training sets.
  • The method provides a constant-factor approximation to the minimum consistent subset, with theoretical guarantees based on doubling dimension and nearest-enemy complexity.

A PointNN Selector is a selection algorithm for identifying representative or relevant points from a dataset embedded in a metric space, with formal guarantees on both accuracy and subset size. It is most prominently defined in the context of nearest-neighbor condensation, where the goal is to minimize the subset of labeled points required for perfect classification under the nearest neighbor rule. The term encompasses a specific algorithmic modification of the classic Fast Condensed Nearest Neighbor (FCNN) heuristic, introducing separation constraints to prevent pathological clusterings and to obtain size and approximation guarantees (Flores-Velazco, 2020). The PointNN Selector framework is distinguished by its theoretical foundation, making it applicable as a robust solution for subset selection tasks in geometric and learning contexts.

1. Problem Definition and Formalism

The PointNN Selector algorithm addresses the minimum consistent subset (Min-CS) problem for nearest-neighbor classification in a metric space $(X, d)$. Given a labeled dataset $P \subset X$ with class labels $\ell : P \to \{1, 2, \dots, c\}$, and the nearest-neighbor function $nn_P(q) = \arg\min_{p \in P} d(q, p)$, the aim is to find a subset $S \subseteq P$ such that every $p \in P$ is classified correctly, i.e., $\ell(nn_S(p)) = \ell(p)$. The size of $S$ should be as small as possible, ideally approaching the minimum required for perfect consistency.
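
The consistency condition is easy to state operationally. Below is a minimal brute-force sketch in Python/NumPy (the function name and vectorized distance computation are illustrative assumptions, not code from the paper) that checks whether a candidate subset is consistent:

```python
import numpy as np

def is_consistent(P: np.ndarray, labels: np.ndarray, S_idx) -> bool:
    """Return True iff nn_S(p) carries p's own label for every p in P."""
    S = P[S_idx]
    # Pairwise Euclidean distances from every point of P to every point of S.
    dists = np.linalg.norm(P[:, None, :] - S[None, :, :], axis=2)
    nn = np.argmin(dists, axis=1)            # nearest selected point, per p
    return bool(np.all(labels[np.asarray(S_idx)][nn] == labels))
```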

Key notations and structural parameters central to the problem include:

  • Margin $\gamma$: The smallest distance between a point and its nearest enemy,

$$\gamma = \min_{p \in P} d\big(p, \mathrm{ne}_P(p)\big),$$

where $\mathrm{ne}_P(p) = \arg\min_{q \in P,\ \ell(q) \neq \ell(p)} d(p, q)$.

  • Nearest-enemy complexity $\kappa$: The number of distinct nearest-enemy points in $P$.
  • Doubling dimension $\mathrm{ddim}$: The smallest integer such that every ball of radius $r$ can be covered by $2^{\mathrm{ddim}}$ balls of radius $r/2$.
  • Diameter $\Delta = \max_{p,q \in P} d(p, q)$, assumed normalized to 1.
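
Two of these parameters, the margin $\gamma$ and the nearest-enemy complexity $\kappa$, can be computed directly from a labeled dataset. The following brute-force sketch (illustrative Python, quadratic in $n$, not from the paper) follows the definitions above:

```python
import numpy as np

def margin_and_ne_complexity(P: np.ndarray, labels: np.ndarray):
    """Return (gamma, kappa) for a labeled dataset with >= 2 classes."""
    n = len(P)
    dists = np.linalg.norm(P[:, None, :] - P[None, :, :], axis=2)
    enemy = labels[:, None] != labels[None, :]   # enemy[i, j]: labels differ
    dists[~enemy] = np.inf                       # restrict to enemy pairs
    ne = np.argmin(dists, axis=1)                # nearest enemy of each point
    gamma = dists[np.arange(n), ne].min()        # smallest point-to-enemy distance
    kappa = len(np.unique(ne))                   # number of distinct nearest enemies
    return gamma, kappa
```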

2. Classic FCNN and Its Limitations

The original FCNN heuristic (Angiulli 2007) builds the subset $R$ iteratively:

  1. Initialize $R$ with the centroid of each class.
  2. For each $p \in R$, find misclassified points in $P \setminus R$ for which $p$ is the nearest representative.
  3. Add the closest such misclassified point to $R$.
  4. Repeat until no misclassifications remain.

While this approach preserves nearest-neighbor accuracy, its output size can become pathological (arbitrarily large relative to $\kappa$), especially when points are densely packed near class boundaries (Flores-Velazco, 2020).
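
For concreteness, here is a compact sketch of the FCNN loop just described: batch rounds in which every current prototype adds the closest misclassified point in its cell. The seeding-by-nearest-point-to-centroid detail and all names are illustrative assumptions, not the original implementation:

```python
import numpy as np

def fcnn(P: np.ndarray, labels: np.ndarray) -> list[int]:
    """Batch FCNN sketch: each round, every prototype adds its closest
    misclassified point. Brute-force distances, illustrative only."""
    # Seed with the point of each class closest to that class's centroid.
    R = []
    for c in np.unique(labels):
        idx = np.where(labels == c)[0]
        centroid = P[idx].mean(axis=0)
        R.append(int(idx[np.argmin(np.linalg.norm(P[idx] - centroid, axis=1))]))
    while True:
        d = np.linalg.norm(P[:, None, :] - P[R][None, :, :], axis=2)
        owner = np.argmin(d, axis=1)                   # nearest prototype per point
        bad = labels[np.asarray(R)][owner] != labels   # currently misclassified points
        if not bad.any():
            return R
        for j in range(len(R)):                        # snapshot of this round's prototypes
            cell = np.where(bad & (owner == j))[0]     # misclassified points owned by j
            if len(cell):
                R.append(int(cell[np.argmin(d[cell, j])]))
```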

3. PointNN Selector: Algorithmic Description

The PointNN Selector is a modification of FCNN, introducing a user-specified separation parameter $\delta > 0$, usually set to the empirical margin $\gamma$. The algorithm is as follows:

  1. $R \leftarrow$ centroids of all classes.
  2. For each $p \in R$, enqueue any $q \in P \setminus R$ with $nn_R(q) = p$ and $\ell(q) \neq \ell(p)$.
  3. While the queue $Q$ is not empty:
    • Dequeue $q$.
    • If $d(q, r) \geq \delta$ for all $r \in R$, add $q$ to $R$.
    • For the newly added $q$, enqueue any additional misclassified points for which $q$ is now the closest representative.

The algorithm ensures $R$ is always $\delta$-separated; no two selected points are closer than $\delta$. This enforces a packing constraint, preventing arbitrarily high local density in $R$ (Flores-Velazco, 2020).
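
A minimal sketch of this $\delta$-separated loop, with the queue mechanics of steps 2–3 made explicit. Distances are brute force, and the function and helper names are illustrative assumptions, not the paper's code:

```python
from collections import deque
import numpy as np

def pointnn_selector(P: np.ndarray, labels: np.ndarray, delta: float) -> list[int]:
    """Sketch of the delta-separated FCNN variant (assumed name)."""
    def nearest_in(R, q):
        # Index (into P) of q's nearest neighbor among the prototypes R.
        d = np.linalg.norm(P[R] - P[q], axis=1)
        return R[int(np.argmin(d))]

    # Step 1: seed with the point closest to each class centroid.
    R = []
    for c in np.unique(labels):
        idx = np.where(labels == c)[0]
        centroid = P[idx].mean(axis=0)
        R.append(int(idx[np.argmin(np.linalg.norm(P[idx] - centroid, axis=1))]))

    # Step 2: enqueue every point currently misclassified by R.
    Q = deque(q for q in range(len(P)) if labels[nearest_in(R, q)] != labels[q])

    # Step 3: admit a dequeued point only if it keeps R delta-separated.
    while Q:
        q = Q.popleft()
        if labels[nearest_in(R, q)] == labels[q]:
            continue                              # stale entry: now classified correctly
        if all(np.linalg.norm(P[q] - P[s]) >= delta for s in R):
            R.append(q)
            # Enqueue points that the new prototype q represents but misclassifies.
            Q.extend(p for p in range(len(P))
                     if p not in R and nearest_in(R, p) == q
                     and labels[q] != labels[p])
    return R
```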

4. Theoretical Guarantees

The PointNN Selector is the first variant in this family to provide provable worst-case size bounds and approximation guarantees for the Min-CS problem:

  • Packing Bound: In a metric space of doubling dimension $\mathrm{ddim}$ and diameter 1, the size of $R$ satisfies

$$|R| \leq \kappa \cdot \lceil \log_2(1/\delta) \rceil \cdot 4^{\mathrm{ddim}+1}.$$

  • Approximation Guarantee: Compared to the minimum-size consistent subset $\mathrm{OPT}$, the PointNN Selector produces a $2^{\mathrm{ddim}+1}$-approximation:

$$|R| \leq 2^{\mathrm{ddim}+1} \cdot |\mathrm{OPT}|.$$

  • These results are obtained by partitioning $R$ by nearest-enemy point and by distance scale, and showing that within each part, packing bounds in doubling spaces limit the cardinality.

If $\delta \leq \gamma$, the algorithm always achieves exact consistency (zero error) for the training set. If $\delta > \gamma$, a small number of boundary misclassifications may occur.
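
As an illustrative instantiation of the packing bound (numbers chosen for exposition, not from the paper): with $\mathrm{ddim} = 2$, $\kappa = 10$, and $\delta = \gamma = 0.1$, the bound gives $|R| \leq 10 \cdot \lceil \log_2 10 \rceil \cdot 4^{3} = 10 \cdot 4 \cdot 64 = 2560$, a ceiling that is independent of the total number of input points $n$.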

5. Parameter Selection and Practical Considerations

  • Separation Parameter $\delta$: In practice, set to the empirical margin $\gamma$ to guarantee consistency and optimal separation.
  • Algorithmic Complexity: Each insertion spends $O(|R|)$ time checking the separation constraint; the total runtime is $O(n|R|)$, matching FCNN asymptotically.
  • Queue Mechanics: The FIFO structure ensures that additions are well-ordered and that density control is maintained throughout the run.

A typical application involves running PointNN Selector on a dataset to produce a sparse, robust, and representative set of exemplars, with size and approximation guarantees, to accelerate nearest-neighbor queries or to serve as condensed training sets for resource-constrained deployments.
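
An end-to-end usage sketch under the same assumptions, reusing the hypothetical margin_and_ne_complexity and pointnn_selector helpers from the earlier sketches together with scikit-learn's KNeighborsClassifier:

```python
# Usage sketch: condense a toy training set, then classify with the condensed
# prototypes. The dataset and classifier choice are illustrative.
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
P = rng.normal(size=(500, 2))
labels = (P[:, 0] + P[:, 1] > 0).astype(int)   # two linearly separated classes

gamma, kappa = margin_and_ne_complexity(P, labels)
R = pointnn_selector(P, labels, delta=gamma)   # delta = empirical margin gamma

clf = KNeighborsClassifier(n_neighbors=1).fit(P[R], labels[R])
print(f"gamma={gamma:.4f}, kappa={kappa}, kept {len(R)}/{len(P)} points, "
      f"training accuracy {clf.score(P, labels):.3f}")
```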

6. Comparison with FCNN

|                          | FCNN                          | PointNN Selector                                   |
|--------------------------|-------------------------------|----------------------------------------------------|
| Size bound               | None (unbounded)              | $O(\kappa \log(1/\delta)\, 4^{\mathrm{ddim}+1})$   |
| Approximation to Min-CS  | Heuristic only                | $2^{\mathrm{ddim}+1}$ factor                       |
| Runtime                  | $O(n \lvert R \rvert)$        | $O(n \lvert R \rvert)$                             |
| Consistency on $P$       | Always, if run to completion  | Always, if $\delta \leq \gamma$                    |

FCNN admits no non-trivial worst-case size bound, while PointNN Selector achieves a provable packing bound and a constant-factor approximation for the NP-hard Min-CS problem. Both share the same asymptotic runtime.

7. Interpretive Notes and Implications

The introduction of a separation constraint makes PointNN Selector robust to pathological input configurations and prevents over-representation of localized high-density regions. This suggests the method is suited to high-dimensional, potentially low-margin datasets where classic condensation algorithms fail through redundancy or over-selection.

A plausible implication is that PointNN Selector is broadly applicable as a core method for prototype selection in metric learning, geometric data condensation, and for accelerating the inference of nearest-neighbor-based classifiers, while maintaining formal error and size guarantees. Its parameters expose an explicit trade-off between sparsity and fidelity, controlled via the separation $\delta$. The algorithm's performance is determined by the underlying geometry (through the doubling dimension) and the labeling complexity (through $\kappa$).

References: The main definition, results, and algorithm are presented in "Social Distancing is Good for Points too!" (Flores-Velazco, 2020).
