PointNN Selector: Nearest-Neighbor Condensation
- PointNN Selector is an algorithm that selects a sparse set of labeled points ensuring perfect nearest-neighbor classification using a separation constraint.
- It modifies the classic FCNN heuristic by incorporating a user-defined separation parameter δ to control density and prevent over-representation in training sets.
- The method provides provable worst-case size and approximation guarantees for the minimum consistent subset, expressed in terms of the doubling dimension, the margin, and the nearest-enemy complexity of the input.
A PointNN Selector is a selection algorithm for identifying representative or relevant points from a dataset embedded in a metric space, with formal guarantees on both accuracy and subset size. It is most prominently defined in the context of nearest-neighbor condensation, where the goal is to minimize the subset of labeled points required for perfect classification under the nearest-neighbor rule. The term refers to a specific modification of the classic Fast Condensed Nearest Neighbor (FCNN) heuristic that introduces a separation constraint to prevent pathological clustering and to obtain size and approximation guarantees (Flores-Velazco, 2020). The framework is distinguished by this theoretical foundation, which makes it a robust choice for subset-selection tasks in geometric and learning contexts.
1. Problem Definition and Formalism
The PointNN Selector algorithm addresses the minimum consistent subset (Min-CS) problem for nearest-neighbor classification in metric spaces. Given a labeled dataset $P \subset \mathcal{X}$ with class labels $l : P \to C$, and the nearest-neighbor map $\mathrm{nn}(p, R)$ returning the closest point of $R$ to $p$, the aim is to find a subset $R \subseteq P$ such that every $p \in P$ is classified correctly, i.e., $l(\mathrm{nn}(p, R)) = l(p)$. The size of $R$ should be as small as possible, ideally approaching the minimum required for perfect consistency.
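Stated compactly as an optimization problem, Min-CS reads:

$$
\min_{R \subseteq P} \; |R| \qquad \text{subject to} \qquad l(\mathrm{nn}(p, R)) = l(p) \quad \text{for all } p \in P .
$$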
Key notations and structural parameters central to the problem include (see the sketch after this list for how they are computed):
- Margin $\gamma$: the smallest distance between a point and its nearest enemy, $\gamma = \min_{p \in P} d(p, \mathrm{ne}(p))$, where $\mathrm{ne}(p) = \arg\min_{q \in P,\, l(q) \neq l(p)} d(p, q)$ is the closest point of a different class.
- Nearest-enemy complexity $k$: the number of distinct nearest-enemy points in $P$, i.e., $k = |\{\mathrm{ne}(p) : p \in P\}|$.
- Doubling dimension $\lambda$: the smallest value such that every ball of radius $r$ can be covered by at most $2^{\lambda}$ balls of radius $r/2$.
- Diameter $\mathrm{diam}(P) = \max_{p, q \in P} d(p, q)$, assumed normalized to 1.
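These quantities are straightforward to compute by brute force. The sketch below is a minimal illustration, assuming a NumPy array `X` of points and an array `y` of labels; the function name is ours, not from the source:

```python
import numpy as np

def nearest_enemy_stats(X, y):
    """Brute-force nearest-enemy distances for a labeled point set.

    X: (n, d) array of points; y: (n,) array of class labels.
    Returns the margin gamma, the nearest-enemy complexity k,
    and the index of each point's nearest enemy.
    """
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    enemies = y[:, None] != y[None, :]            # True where labels differ
    enemy_dist = np.where(enemies, dist, np.inf)  # keep only enemy distances
    ne_idx = enemy_dist.argmin(axis=1)            # nearest enemy of each point
    gamma = enemy_dist.min()                      # margin (diam(P) taken as 1)
    k = len(np.unique(ne_idx))                    # distinct nearest-enemy points
    return gamma, k, ne_idx
```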
2. Classic FCNN and Its Limitations
The original FCNN heuristic (Angiulli, 2007) builds the subset $R$ iteratively (a simplified sketch follows this list):
- Initialize $R$ with, for each class, the point closest to that class's centroid.
- For each $p \in R$, find the misclassified points in $P$ for which $p$ is the nearest representative.
- For each such $p$, add the closest of these misclassified points to $R$.
- Repeat until no misclassifications remain.
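A compact rendering of this loop, with `X` and `y` as above; this is our simplified sketch, not Angiulli's reference implementation:

```python
import numpy as np

def fcnn(X, y):
    """Classic FCNN condensation (Angiulli, 2007) -- simplified sketch."""
    R = []
    for c in np.unique(y):
        idx = np.flatnonzero(y == c)
        centroid = X[idx].mean(axis=0)  # seed: point nearest each class centroid
        R.append(idx[np.linalg.norm(X[idx] - centroid, axis=1).argmin()])
    while True:
        reps = np.asarray(R)
        D = np.linalg.norm(X[:, None, :] - X[reps][None, :, :], axis=-1)
        nearest = D.argmin(axis=1)            # each point's representative (index into reps)
        wrong = np.flatnonzero(y != y[reps[nearest]])
        if wrong.size == 0:                   # consistent: every point classified correctly
            return reps
        best = {}                             # per cell, the closest misclassified point
        for p in wrong:
            r = nearest[p]
            if r not in best or D[p, r] < D[best[r], r]:
                best[r] = p
        R.extend(best.values())               # add one point per offending cell
```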
While this approach preserves nearest-neighbor accuracy, its output size can become pathological (arbitrarily large relative to the minimum consistent subset), especially when points are densely packed near class boundaries (Flores-Velazco, 2020).
3. PointNN Selector: Algorithmic Description
The PointNN Selector is a modification of FCNN, introducing a user-specified separation parameter $\delta > 0$, usually set to the empirical margin $\gamma$. The algorithm is as follows (a simplified sketch is given after this description):
- $R \leftarrow$ the point of each class closest to that class's centroid.
- For each $r \in R$, enqueue any $p$ with $\mathrm{nn}(p, R) = r$ and $l(r) \neq l(p)$.
- While the queue is not empty:
  - Dequeue $p$.
  - If $p$ is still misclassified and $d(p, r) \geq \delta$ for all $r \in R$, add $p$ to $R$.
  - For the newly added $p$, enqueue any additional misclassified points for which $p$ is now the closest representative.
The algorithm ensures $R$ is always $\delta$-separated; no two selected points are closer than $\delta$. This enforces a packing constraint, preventing arbitrarily high local density in $R$ (Flores-Velazco, 2020).
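The sketch below is our reading of this modification, not the paper's reference implementation: for simplicity it recomputes the set of misclassified points after each addition rather than maintaining the queue incrementally, and all function names are ours. `X`, `y`, and `delta` are as above:

```python
from collections import deque
import numpy as np

def misclassified(X, y, R):
    """Indices of points whose nearest neighbor in X[R] has a different label."""
    reps = np.asarray(R)
    D = np.linalg.norm(X[:, None, :] - X[reps][None, :, :], axis=-1)
    return list(np.flatnonzero(y != y[reps[D.argmin(axis=1)]]))

def pointnn_selector(X, y, delta):
    """FCNN with a delta-separation constraint: no two selected points
    are ever closer than delta (sketch, not the reference code)."""
    R = []
    for c in np.unique(y):                     # seed exactly as in FCNN
        idx = np.flatnonzero(y == c)
        centroid = X[idx].mean(axis=0)
        R.append(idx[np.linalg.norm(X[idx] - centroid, axis=1).argmin()])
    queue = deque(misclassified(X, y, R))
    while queue:
        p = queue.popleft()
        d_to_R = np.linalg.norm(X[np.asarray(R)] - X[p], axis=1)
        if y[R[int(d_to_R.argmin())]] == y[p]:
            continue                           # an earlier addition already fixed p
        if d_to_R.min() < delta:
            continue                           # adding p would break delta-separation
        R.append(p)
        for q in misclassified(X, y, R):       # enqueue points still misclassified
            if q not in queue:
                queue.append(q)
    return np.asarray(R)
```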
4. Theoretical Guarantees
The PointNN Selector is the first variant in this family to provide provable worst-case size bounds and approximation guarantees for the Min-CS problem:
- Packing Bound: in a metric space of doubling dimension $\lambda$ and diameter 1, running with $\delta = \gamma$ yields a subset of size
  $$|R| \;\leq\; k \cdot \lceil \log_2 (1/\gamma) \rceil \cdot 2^{O(\lambda)} .$$
- Approximation Guarantee: compared to the minimum-size consistent subset OPT, the PointNN Selector produces a $2^{O(\lambda)} \log(1/\gamma)$-approximation:
  $$|R| \;\leq\; 2^{O(\lambda)} \log(1/\gamma) \cdot |\mathrm{OPT}| .$$
- These results are obtained by partitioning $R$ by nearest-enemy point and distance scale: within each part, the $\delta$-separation makes the selected points a packing, and packing numbers in doubling spaces bound each part's cardinality by $2^{O(\lambda)}$.
If $\delta \leq \gamma$, the algorithm always achieves exact consistency (zero error) for the training set: any point misclassified by $R$ has an enemy as its nearest representative, so it lies at distance at least $\gamma \geq \delta$ from all of $R$, and the separation check never blocks its addition. If $\delta > \gamma$, a small number of boundary misclassifications may occur.
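Consistency is cheap to verify after the fact. A minimal check (the function name is ours; `X`, `y`, and the subset `R` are as in the sketches above):

```python
import numpy as np

def is_consistent(X, y, R):
    """True iff 1-NN over the subset X[R] classifies every point of X correctly."""
    reps = np.asarray(R)
    D = np.linalg.norm(X[:, None, :] - X[reps][None, :, :], axis=-1)
    return bool(np.all(y == y[reps[D.argmin(axis=1)]]))
```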
5. Parameter Selection and Practical Considerations
- Separation Parameter $\delta$: in practice, set $\delta = \gamma$, the empirical margin, to guarantee consistency while enforcing the strongest separation that preserves it.
- Algorithmic Complexity: each candidate insertion spends $O(|R|)$ time checking the separation constraint; the total runtime is $O(n \cdot |R|)$, i.e., $O(n^2)$ in the worst case, matching FCNN asymptotically.
- Queue Mechanics: the FIFO queue processes candidates in a well-defined order, and the separation check at dequeue time maintains density control throughout the run.
A typical application involves running PointNN Selector on a dataset to produce a sparse, robust, and representative set of exemplars, with size and approximation guarantees, to accelerate nearest-neighbor queries or to serve as condensed training sets for resource-constrained deployments.
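Putting the pieces together, an end-to-end run might look like the following (synthetic data for illustration; all function names refer to the sketches above):

```python
import numpy as np

rng = np.random.default_rng(0)
# Toy two-class dataset: two well-separated Gaussian blobs.
X = np.vstack([rng.normal(0.0, 0.4, size=(200, 2)),
               rng.normal(3.0, 0.4, size=(200, 2))])
y = np.repeat([0, 1], 200)

gamma, k, _ = nearest_enemy_stats(X, y)   # empirical margin and complexity
R = pointnn_selector(X, y, delta=gamma)   # gamma-separated condensed subset
assert is_consistent(X, y, R)             # zero training error, as guaranteed
print(f"kept {len(R)} of {len(X)} points (k={k}, gamma={gamma:.3f})")
```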
6. Comparison with FCNN and Related Approaches
| | FCNN | PointNN Selector |
|---|---|---|
| Size bound | None (unbounded) | $k \cdot 2^{O(\lambda)} \log(1/\gamma)$ |
| Approximation to Min-CS | Heuristic only | $2^{O(\lambda)} \log(1/\gamma)$ factor |
| Runtime | $O(n^2)$ worst case | $O(n^2)$ worst case |
| Consistency on $P$ | Always (run to completion) | Always if $\delta \leq \gamma$ |
FCNN admits no non-trivial worst-case size bound, while PointNN Selector achieves a provable packing bound and an approximation factor for the NP-hard Min-CS problem that depends only on the doubling dimension and the margin. Both share similar asymptotic runtimes.
7. Interpretive Notes and Implications
The introduction of a separation constraint makes the PointNN Selector robust to pathological input configurations and prevents over-representation of localized high-density regions. This suggests the method is suited to low-margin datasets of bounded doubling dimension, where classic condensation algorithms fail through redundancy or over-selection.
A plausible implication is that the PointNN Selector is broadly applicable as a core method for prototype selection in metric learning, for geometric data condensation, and for accelerating the inference of nearest-neighbor-based classifiers, while maintaining formal error and size guarantees. Its parameter exposes an explicit trade-off between sparsity and fidelity, controlled via the separation $\delta$. The algorithm's performance is determined by the underlying geometry (the doubling dimension $\lambda$) and the labeling complexity (the nearest-enemy complexity $k$ and the margin $\gamma$).
References: The main definition, results, and algorithm are presented in "Social Distancing is Good for Points too!" (Flores-Velazco, 2020).