Reduced Nearest Neighbour with Weighted Condensing
- Reduced Nearest Neighbour (RNN) is a method that generalizes classical NN condensing by assigning positive weights to samples, enhancing data compression.
- It utilizes a weighted distance metric to improve compression ratios and maintain classification accuracy, with guarantees such as Bayes consistency.
- A greedy heuristic algorithm efficiently selects representative points, balancing computational complexity with near-optimal performance.
Weighted Distance Nearest-Neighbor Condensing (WNN) is a generalization of classical nearest-neighbor condensing that enables efficient sample reduction in metric-space classification by introducing a positive weighting function over the condensed subset. Each element of the condensed set is assigned an individual weight, and weighted distance governs both assignment and prediction. This approach leads to greatly improved sample compression, maintains generalization guarantees comparable to standard nearest-neighbor (NN) condensing, and is provably Bayes-consistent under broad conditions (Gottlieb et al., 2023).
1. Formal Definition and Problem Formulation
Let $(\mathcal{X}, \rho)$ be a separable metric space and $S = \{(x_1, y_1), \dots, (x_n, y_n)\} \subset \mathcal{X} \times \{0, 1\}$ a labeled sample. The condensed set is a subset $\tilde{S} \subseteq S$ with a positive weighting function $w : \tilde{S} \to (0, \infty)$, extended to all of $\mathcal{X}$ by $w(x) = 1$ for $x \notin \tilde{S}$. The weighted distance between two points is

$$\rho_w(x, x') = \frac{\rho(x, x')}{w(x)\, w(x')}.$$

When classifying a query $q \in \mathcal{X}$, the weighted distance to a condensed point $p \in \tilde{S}$ is $\rho(q, p)/w(p)$, as $w(q) = 1$. The associated classifier is

$$h_{\tilde{S}, w}(q) = y_{p^{*}}, \qquad p^{*} = \arg\min_{p \in \tilde{S}} \rho_w(q, p).$$

A pair $(\tilde{S}, w)$ is a consistent WNN condensing if for every sample $(x_i, y_i) \in S$,

$$h_{\tilde{S}, w}(x_i) = y_i.$$

The principal optimization is to find, out of all consistent pairs, one minimizing $|\tilde{S}|$.
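As a concrete illustration of this rule, the following Python sketch implements weighted nearest-neighbour prediction under a Euclidean base metric; the function name, the toy data, and the metric choice are illustrative assumptions rather than part of the original formulation.

```python
import numpy as np

def weighted_nn_predict(query, prototypes, labels, weights):
    """Weighted NN rule: return the label of the prototype p minimizing rho(q, p) / w(p)."""
    # Euclidean base metric rho; any metric could be substituted here.
    dists = np.linalg.norm(prototypes - query, axis=1)
    return labels[np.argmin(dists / weights)]

# Toy usage: the heavily weighted prototype "attracts" queries that are
# geometrically closer to the other one.
prototypes = np.array([[0.0, 0.0], [4.0, 0.0]])
labels = np.array([0, 1])
weights = np.array([3.0, 1.0])            # prototype 0 carries a large weight
print(weighted_nn_predict(np.array([2.5, 0.0]), prototypes, labels, weights))  # -> 0
```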
2. Theoretical Properties and Generalization Bounds
Separation of Power
A strict power separation exists between unweighted and weighted condensing. For arbitrarily large $n$, there are $n$-point datasets for which any consistent unweighted NN condensing requires $\Omega(n)$ points, whereas weighted condensing achieves consistency with only two points. The construction involves two interleaved geometric "bananas" of opposite labels, where large weights at the two extremes enable circular decision regions under WNN, compressing the data to two points.
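To see why heavily weighted points carve out circular regions (a standard property of multiplicatively weighted nearest-neighbor rules, worked out here as an illustration rather than quoted from the source), consider two prototypes $p, q \in \mathbb{R}^{d}$ with weights $w_p > w_q$ under the Euclidean metric. The region won by $p$ is

$$\Bigl\{x : \tfrac{\|x - p\|}{w_p} \le \tfrac{\|x - q\|}{w_q}\Bigr\} \;=\; \bigl\{x : t\,\|x - p\|^{2} \le \|x - q\|^{2}\bigr\}, \qquad t = (w_q / w_p)^{2} < 1,$$

which rearranges to $\|x - c\|^{2} \ge r^{2}$ with $c = (q - t\,p)/(1 - t)$. The lighter prototype's cell is therefore a disk (bounded by an Apollonius circle) and the heavier prototype claims its exterior, which is how two high-weight extreme points can correctly label the interleaved bananas.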
Generalization Bounds
A sample-compression argument yields the following generalization bound. For any empirically consistent WNN classifier $h_{\tilde{S}, w}$ with $|\tilde{S}| = k$ on a size-$n$ i.i.d. sample, with probability at least $1 - \delta$:

$$\mathrm{err}(h_{\tilde{S}, w}) \;\le\; \frac{k \ln n + \ln(1/\delta)}{n - k}.$$

If reconstruction is permutation-invariant, the bound sharpens to

$$\mathrm{err}(h_{\tilde{S}, w}) \;\le\; \frac{\ln \binom{n}{k} + \ln(1/\delta)}{n - k}.$$
These bounds are quantitatively on par with those for unweighted NN condensing.
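For a sense of scale, the bound can be evaluated numerically; the snippet below assumes the compression-bound form stated above and uses hypothetical values of $n$, $k$, and $\delta$.

```python
import math

def compression_bound(n, k, delta, permutation_invariant=False):
    """Generalization bound for a consistent compression scheme of size k (form as sketched above)."""
    if permutation_invariant:
        # ln C(n, k) via log-gamma: ln n! - ln k! - ln (n-k)!
        complexity = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    else:
        complexity = k * math.log(n)
    return (complexity + math.log(1.0 / delta)) / (n - k)

# Hypothetical figures: n = 10000 training points, k = 200 condensed points, delta = 0.05.
print(round(compression_bound(10000, 200, 0.05), 3))                               # order-dependent form
print(round(compression_bound(10000, 200, 0.05, permutation_invariant=True), 3))   # sharper, permutation-invariant form
```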
3. Greedy Heuristic Algorithm for Weighted Condensing
The “Greedy Weighted Condensing” heuristic selects at each iteration the sample point whose “ball of radius = distance to nearest enemy” covers the largest number of uncovered points of the same label, and assigns its weight accordingly.
Algorithmic Structure
The algorithm maintains the set of still-uncovered training points and an initially empty condensed set $\tilde{S}$. At each iteration, the algorithm solves

$$x^{*} = \arg\max_{(x, y) \in S}\; \bigl|\{(x', y') \text{ uncovered} : y' = y,\ \rho(x, x') < d_{\mathrm{ne}}(x)\}\bigr|,$$

subject to the covering radius of $x^{*}$ remaining strictly below its nearest-enemy distance $d_{\mathrm{ne}}(x^{*})$; the selected point is added to $\tilde{S}$ with a weight reflecting this radius, and the points it covers are marked as covered. This matches a greedy set-cover approximation (Chvátal's algorithm) in which each center covers same-label points within its enemy-exclusion radius.
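A minimal Python sketch of the greedy loop follows. The weight assignment (setting each chosen center's weight to its nearest-enemy distance, so that the unit weighted ball coincides with the enemy-exclusion ball) is one natural reading of "assigns its weight accordingly" and is an assumption of this sketch, not necessarily the paper's exact rule.

```python
import numpy as np

def greedy_weighted_condense(X, y):
    """Greedy weighted condensing sketch (Euclidean metric assumed).

    Repeatedly pick the point whose enemy-exclusion ball covers the most
    still-uncovered same-label points; its weight is set to its
    nearest-enemy distance (a heuristic choice, see the text above).
    Returns indices of the condensed set and their weights.
    """
    n = len(X)
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)     # pairwise distances
    enemy = y[:, None] != y[None, :]
    ne = np.where(enemy, D, np.inf).min(axis=1)                    # nearest-enemy distance per point
    covers = [np.flatnonzero((D[i] < ne[i]) & (y == y[i])) for i in range(n)]

    uncovered = np.ones(n, dtype=bool)
    chosen, weights = [], []
    while uncovered.any():
        gains = [uncovered[c].sum() for c in covers]
        i = int(np.argmax(gains))                                  # best uncovered-coverage gain
        chosen.append(i)
        weights.append(ne[i])
        uncovered[covers[i]] = False
    return np.array(chosen), np.array(weights)

# Toy usage: two clusters of opposite labels condense to one weighted point each.
X = np.array([[0.0, 0.0], [0.2, 0.0], [1.0, 0.0], [1.2, 0.0]])
y = np.array([0, 0, 1, 1])
print(greedy_weighted_condense(X, y))   # e.g. indices [0, 2] with weights [1.0, 0.8]
```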
Computational Complexity
A naive implementation requires $O(n^2)$ distance computations per iteration and up to $n$ iterations, for an overall $O(n^3)$ complexity. Use of spatial data structures and careful maintenance of nearest-enemy distances can reduce the empirical computational burden to roughly $O(n^2)$ or better.
4. Bayes Consistency and Statistical Guarantees
Let $d_{\mathrm{ne}}(x)$ denote the minimal distance from $x$ to any sample of the opposite label (taken to be $\infty$ when no such sample exists). Consider the (intractable) condensing rule:

$$(\tilde{S}^{*}, w^{*}) \in \arg\min \bigl\{\, |\tilde{S}| : (\tilde{S}, w) \text{ is a consistent WNN condensing of } S \,\bigr\}.$$
Theorem (Bayes consistency):
Suppose $(\mathcal{X}, \rho)$ is separable, the marginal distribution of $X$ is atomless, and the conditional label probability $\eta(x) = \mathbb{P}(Y = 1 \mid X = x)$ is piecewise-continuous. Then as $n \to \infty$, the excess risk of the resulting classifier over the Bayes risk converges to zero, almost surely.
In other words, WNN condensing asymptotically attains the minimal possible classification error.
Corollary (Greedy heuristic):
Assuming an additional mild tail condition on the distribution over the metric space (e.g., bounded support or Gaussian tails), the greedy heuristic is also Bayes-consistent, since its solution size exceeds the optimum by at most the logarithmic factor inherited from the set-cover approximation.
5. Empirical Results and Comparative Analysis
Small-scale Evaluation
The following table summarizes condensed-set sizes obtained by four methods across three two-class datasets:
| Dataset | Points | MSS | RSS | IP (opt. NN) | WNN |
|---|---|---|---|---|---|
| Circle | 200 | 52 | 45 | 7 | 12 |
| Banana | 200 | 74 | 66 | 32 | 35 |
| Iris | 100 | 11 | 9 | 2 | 4 |
- MSS: modified selective subset
- RSS: relaxed selective subset
- IP: integer-programming optimum for unweighted NN
- WNN: greedy weighted
WNN condensing outperforms MSS and RSS in sample compression, closely approximating the optimal unweighted solution.
Large-scale Evaluation
For notMNIST (≈19,000 samples, 10 classes, dimensionality reduced via UMAP), with a 70/30 train/test split (10-fold), the following outcomes were observed:
- Test error: WNN matches 1-NN (no compression); both MSS and RSS yield increased error.
- Compression ratio: WNN retains ~20% (80% compression), with MSS slightly better compression but higher error, while RSS is inferior in both metrics.
This demonstrates that WNN yields significant reduction in stored samples without compromising classification accuracy.
6. Limitations, Open Problems, and Future Directions
Current Limitations
- The greedy heuristic lacks a constant-factor approximation guarantee for minimal weighted condensing.
- Computational complexity remains substantial for very large datasets in the absence of specialized search structures.
Open Questions
- The computational complexity of weighted condensing is not fully characterized; in particular, its approximation hardness is unresolved.
- Existence of improved approximation algorithms, e.g., constant-factor approximations for minimal weighted condensing.
- Extension to multiclass classification and to generalized distance metrics dependent on the weights.
Potential Extensions
- Developing fast search structures tailored for weighted-distance nearest-neighbor queries.
- Designing alternate greedy or local-search heuristic algorithms with provable approximation guarantees.
- Integrating metric learning with weighted condensing to adapt the base metric $\rho$ for further performance improvement.
Weighted Distance Nearest-Neighbor Condensing thus provides a strict generalization of standard NN condensing, preserves generalization properties, and consistently yields smaller condensed representations without loss of accuracy (Gottlieb et al., 2023).