Condensed Nearest Neighbour (CNN)
- Condensed Nearest Neighbour (CNN) is a prototype selection method that reduces training set size while preserving 1-NN classification accuracy.
- The algorithm iteratively adds misclassified points to form a condensed set, ensuring perfect classification on training data.
- Theoretical bounds link the number of prototypes to data geometry, and variants like WNN and SFCNN offer enhanced efficiency and compression.
Condensed Nearest Neighbour (CNN) is a class of prototype selection techniques designed to reduce the size of training sets for nearest-neighbour (NN) classification. The canonical CNN algorithm operates by iteratively collecting a subset of the original data such that the reduced set (the condensed set) yields perfect classification of all training points under the 1-NN rule. This process is foundational in prototype reduction and memory-constrained learning, and it provides a well-defined testbed for theory at the intersection of learning geometry, combinatorial optimization, and computational efficiency.
1. Formal Description and Algorithmic Workflow
Let $S = \{(x_i, y_i)\}_{i=1}^n \subset \mathcal{X} \times \mathcal{Y}$, where $\mathcal{Y}$ is a finite label set. The CNN algorithm constructs a prototype set $P \subseteq S$ such that the 1-NN classifier induced by $P$ is consistent with the original training set: for every $(x_i, y_i) \in S$, the label assigned by the prototype set’s nearest-neighbour operator equals $y_i$. The standard algorithm (Christiansen, 2013, Bhatia et al., 2010) proceeds as follows:
CNN iterates through $S$ in fixed order: each training point misclassified under the current $P$ is added to $P$. Since every pass either adds at least one point or triggers termination, the process terminates after at most $n$ additions, and at convergence $P$ is consistent with respect to $S$.
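The canonical loop can be sketched as follows; a minimal illustration assuming a Euclidean metric and a linear-scan nearest-neighbour search:

```python
import numpy as np

def cnn_condense(X, y):
    """Hart-style condensed nearest neighbour (sketch).

    Repeatedly scans the training set in fixed order; any point that the
    current prototype set misclassifies under 1-NN is added. Stops when a
    full pass adds nothing, so the result is consistent with (X, y).
    """
    proto_idx = [0]                      # seed with the first point
    changed = True
    while changed:
        changed = False
        for i in range(len(X)):
            P = X[proto_idx]
            nearest = int(np.argmin(np.linalg.norm(P - X[i], axis=1)))
            if y[proto_idx][nearest] != y[i]:
                proto_idx.append(i)      # misclassified: keep as prototype
                changed = True
    return np.array(proto_idx)

# Two well-separated 1-D clusters: one prototype per cluster suffices.
X = np.array([[0.0], [0.1], [1.0], [1.1]])
y = np.array([0, 0, 1, 1])
idx = cnn_condense(X, y)
```

On this toy set the scan keeps the first point of each cluster, so the condensed set has two prototypes and classifies all four training points correctly.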
2. Theoretical Properties and Upper Bounds
CNN guarantees termination, but its storage complexity is nontrivial to analyze. Christiansen (Christiansen, 2013) establishes a margin-based upper bound via a connection to the multiclass perceptron mistake bound. For each “neighborly” feature map $\phi$ (satisfying that the 1-NN rule is reproduced via a restricted class of linear multiclass perceptron decision functions), define:
- Separability margin $\gamma(\phi)$: the largest $\gamma > 0$ for which there exist unit-norm class weight vectors $\{w_c\}_{c \in \mathcal{Y}}$ such that for all $(x, y) \in S$ and all $y' \in \mathcal{Y} \setminus \{y\}$,
$\langle w_y - w_{y'}, \phi(x) \rangle \geq \gamma.$
- Radius $R(\phi)$: the maximal feature norm,
$R(\phi) = \max_{(x, y) \in S} \|\phi(x)\|.$
The main result is that CNN accumulates at most
$\inf_{\phi} \, O\!\left( \left( R(\phi) / \gamma(\phi) \right)^2 \right)$
prototypes, where the infimum is over all neighborly feature maps $\phi$. This bound is independent of the training set size and depends only on geometric characteristics of the data embedding (Christiansen, 2013).
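To make these quantities concrete, the following sketch evaluates the radius and margin for the identity feature map on a toy binary problem with a hand-chosen unit-norm separator; the exact constant in front of $(R/\gamma)^2$ depends on the perceptron variant and is elided here, so `bound` is indicative only:

```python
import numpy as np

# Toy binary problem under the identity feature map phi(x) = x.
X = np.array([[2.0, 0.0], [3.0, 1.0], [-2.0, 0.0], [-3.0, -1.0]])
y = np.array([1, 1, -1, -1])

w = np.array([1.0, 0.0])                  # unit-norm linear separator
R = np.max(np.linalg.norm(X, axis=1))     # radius: max feature norm
gamma = np.min(y * (X @ w))               # smallest realized margin under w
bound = (R / gamma) ** 2                  # perceptron-style (R/gamma)^2 quantity
```

Here $R = \sqrt{10}$ and $\gamma = 2$, so the ratio $(R/\gamma)^2 = 2.5$: a well-separated embedding keeps the prototype bound small regardless of how many points are drawn from the two clusters.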
3. Algorithmic Complexity and Practical Implementation
Let $m$ denote the final size of the condensed set $P$. Each pass through $S$ examines $n$ points, each requiring a nearest-neighbour search in $P$ (cost $O(m)$ per query for a linear scan, typically $O(\log m)$ for spatial index structures). Typically $m \ll n$, so the total complexity is subquadratic in $n$. Major implementation specifics include:
- Prototype storage in spatial indices (e.g., kd-trees) for efficient NN queries.
- Initialization heuristics: seeding $P$ with one point per class improves boundary coverage.
- Order randomization: different scan orders can yield different condensed sets $P$, reflecting the algorithm’s recognized order-sensitivity.
- Early stopping: practical variants terminate if a pass yields too few additions.
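The listed tweaks can be folded into one routine; a sketch with per-class seeding, a caller-supplied scan order, and early stopping when a pass adds fewer than a threshold of new prototypes (the threshold name `min_additions` is illustrative):

```python
import numpy as np

def cnn_condense(X, y, order, seed_per_class=True, min_additions=1):
    """CNN with practical tweaks: per-class seeding, a caller-supplied
    scan order, and early stopping when a pass adds < min_additions."""
    if seed_per_class:
        proto = [int(np.flatnonzero(y == c)[0]) for c in np.unique(y)]
    else:
        proto = [int(order[0])]
    while True:
        added = 0
        for i in order:
            d = np.linalg.norm(X[proto] - X[i], axis=1)
            if y[proto][int(np.argmin(d))] != y[i]:
                proto.append(int(i))
                added += 1
        if added < min_additions:        # convergence / early stop
            break
    return np.array(proto)

# Different random scan orders can produce different condensed sets.
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(int)
results = [cnn_condense(X, y, rng.permutation(60)) for _ in range(3)]
sizes = {len(p) for p in results}
```

With the default `min_additions=1` the loop stops only after a pass adds nothing, so every returned set is consistent with the training data; larger thresholds trade consistency for speed.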
Empirically, CNN can reduce storage by an order-of-magnitude while maintaining high fidelity to the original classifier (Bhatia et al., 2010, Gottlieb et al., 2023).
4. Extensions, Variants, and Related Condensation Schemes
The nearest-neighbor condensation problem is NP-hard; seeking a minimum consistent subset is computationally intractable (Flores-Velazco et al., 2020). Notable variants and connections include:
- Fast CNN (FCNN) (Flores-Velazco, 2020): Adds batches of misclassified representatives per iteration. It can fail to guarantee polynomial-size upper bounds due to geometrically adversarial configurations.
- Social-distanced FCNN (SFCNN) (Flores-Velazco, 2020): Adds only a single new point per iteration, yielding a tight upper bound of $(1/\gamma)^{O(\mathrm{ddim})}$ on the condensed set size, where $\gamma$ is the margin and $\mathrm{ddim}$ is the doubling dimension of the underlying metric space.
- Weighted Distance Nearest-Neighbor Condensing (WNN) (Gottlieb et al., 2023): Each prototype is assigned an individual positive weight, and classification is by weighted distance. WNN can, in some cases, reduce the prototype set to a size exponentially smaller than what standard (unweighted) condensation permits, while retaining generalization guarantees equivalent to the unweighted case.
- Reduced Nearest Neighbour (RNN): Post-processes CNN’s output by greedily removing redundant prototypes, further shrinking the set at extra computational expense (Bhatia et al., 2010).
- Coreset approaches: $\varepsilon$-selective/consistent subsets can yield theoretical coresets for the NN rule, offering refined guarantees on approximation and subquadratic construction in fixed dimension (Flores-Velazco et al., 2020).
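The RNN post-pass described above is simple enough to state directly; a sketch that greedily attempts to drop each prototype and keeps the removal only if the survivors still classify the full training set correctly under 1-NN:

```python
import numpy as np

def rnn_reduce(X, y, proto_idx):
    """Reduced Nearest Neighbour post-pass (sketch): greedily drop each
    prototype if the remaining set stays consistent with all of (X, y)."""
    keep = list(proto_idx)
    for p in list(keep):                 # iterate over a snapshot
        trial = [q for q in keep if q != p]
        if not trial:
            continue
        consistent = all(
            y[trial][int(np.argmin(np.linalg.norm(X[trial] - X[i], axis=1)))] == y[i]
            for i in range(len(X))
        )
        if consistent:
            keep = trial                 # removal was safe
    return np.array(keep)

# A CNN-style consistent set with one redundant prototype (index 0).
X = np.array([[0.0], [0.1], [0.2], [1.0], [1.1]])
y = np.array([0, 0, 0, 1, 1])
reduced = rnn_reduce(X, y, [0, 1, 3])
```

Each removal attempt re-checks consistency against the entire training set, which is the extra computational expense the text refers to.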
5. Statistical and Geometric Insights
CNN’s upper bound and empirical efficacy are tightly linked to geometric data properties. A large separation margin leads directly to small prototype sets. The number of classes and feature dimension impact the bound only indirectly, through their influence on available feature maps and achievable margins. Conversely, densely packed, low-margin, or high-noise datasets force CNN to retain larger subsets. Order-sensitivity can cause relevant boundary points to be missed in a single pass, motivating multiple passes or ensemble practices (e.g., combining outputs from several random seeds) (Bhatia et al., 2010).
The extension to weighted distances (WNN) introduces an extra axis of flexibility that, in pathological constructions, can yield exponentially better compression (Gottlieb et al., 2023). Moreover, for general metric-space training data where the Bayes classifier has a positive margin and the feature space is separable, WNN and its greedy heuristics can achieve Bayes consistency.
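A weighted-distance classification rule can be sketched under one plausible convention, in which each prototype's distance is divided by its weight so that heavier prototypes claim larger regions; the precise convention used by Gottlieb et al. may differ in detail:

```python
import numpy as np

def wnn_predict(x, P, labels, weights):
    """Weighted-distance 1-NN (sketch). Assumes the scaled-distance
    convention d(x, p) / w_p; larger weight means a larger region of
    influence for that prototype."""
    scores = np.linalg.norm(P - x, axis=1) / weights
    return labels[int(np.argmin(scores))]

# Two 1-D prototypes: with enough weight, the farther prototype wins.
P = np.array([[0.0], [3.0]])
labels = np.array([0, 1])
pred = wnn_predict(np.array([1.0]), P, labels, np.array([1.0, 4.0]))
```

With equal weights the query at 1.0 would go to the prototype at 0.0; upweighting the prototype at 3.0 flips the decision, which is exactly the extra degree of freedom that lets weighted condensation cover a region with fewer prototypes.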
6. Practical Performance, Empirical Evidence, and Limitations
Experimental results demonstrate dramatic data reduction: on standard UCI datasets, CNN typically retains only 5–20% of the original set while preserving >95% of the 1-NN test accuracy; WNN and SFCNN can approach the theoretical optimum much more closely (Bhatia et al., 2010, Gottlieb et al., 2023, Flores-Velazco, 2020). For high-dimensional or complex data, performance remains dependent on the choice of distance metric and implementation of the base NN search.
The NP-hardness of optimal condensation means all practical methods are heuristic. For many real-world settings, boundary coverage is the principal challenge: CNN may miss necessary prototypes in poorly separated regions or under adversarial scan orders. This issue can be partly remedied by ensemble condensation schemes or by switching to weighted and relaxed variants.
7. Connections to Broader Literature and Applications
CNN is a central structureless approach to instance selection, distinct from structure-based NN acceleration methods such as kd-trees or cover-trees, which target query-time complexity rather than storage reduction (Bhatia et al., 2010). Prototype selection via CNN serves as a precursor to model compression, memory-efficient inference, and as an ingredient in coreset construction for scalable machine learning (Flores-Velazco et al., 2020). Its theoretical underpinnings, particularly the perceptron connection, clarify conditions for effective sample compression and inform methods in active learning and statistical geometry. The adaptability of condensation to weighted distances, as in WNN, opens further potential for tailored memory-reduction in complex metric spaces (Gottlieb et al., 2023).