Nearest-Mean-of-Exemplars Classification
- Nearest-Mean-of-Exemplars classification represents each class by a prototype computed as the mean of exemplar embeddings, encouraging intra-class compactness and inter-class separability in the embedding space.
- It enables efficient incremental and continual learning by reducing storage and computational costs while mitigating catastrophic forgetting.
- The approach is widely applied in deep embedding, zero-shot, and open-set recognition, achieving competitive accuracy on benchmarks like CIFAR and MNIST.
The nearest-mean-of-exemplars (NME) classification paradigm represents each class by a single prototype, typically the empirical mean of a set of exemplar embeddings or features. Classification proceeds by assigning queries to the class whose prototype is nearest under a chosen metric, most commonly the Euclidean or cosine distance. Originally rooted in classic pattern recognition as the nearest-class-mean (NCM) classifier, NME has become widely adopted in deep learning for incremental learning, continual learning, open-set recognition, zero-shot learning, and efficient data embedding. By summarizing each class with a single mean vector, NME methods achieve significant computational, storage, and practical advantages while imposing strong inductive biases such as intra-class compactness and inter-class separability.
1. Formal Definition and Core Principles
Given a collection of labeled feature vectors $\{(x_i, y_i)\}_{i=1}^{N}$, where $x_i \in \mathbb{R}^d$ and $y_i \in \{1, \dots, C\}$, the prototype (mean) of class $c$ is computed as:

$$\mu_c = \frac{1}{|\{i : y_i = c\}|} \sum_{i : y_i = c} x_i$$

At test time, for any query $x$, the decision rule is:

$$\hat{y} = \underset{c \in \{1, \dots, C\}}{\arg\min}\; d(x, \mu_c)$$

where $d(\cdot, \cdot)$ is typically the Euclidean distance or its squared form for computational efficiency. This "prototype classifier" reduces both data storage (no need to keep all exemplars) and classification time (no need to compute distances to every training example).
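The prototype computation and decision rule above can be sketched in a few lines of NumPy (function names and the toy data are illustrative):

```python
import numpy as np

def class_prototypes(X, y):
    """Compute one mean vector (prototype) per class from labeled features."""
    classes = np.unique(y)
    return classes, np.stack([X[y == c].mean(axis=0) for c in classes])

def nme_predict(X_query, classes, prototypes):
    """Assign each query to the class with the nearest prototype
    (squared Euclidean distance)."""
    # Pairwise squared distances: shape (n_queries, n_classes)
    d2 = ((X_query[:, None, :] - prototypes[None, :, :]) ** 2).sum(axis=-1)
    return classes[d2.argmin(axis=1)]

# Two well-separated toy classes in 2-D
X = np.array([[0.0, 0.0], [0.2, 0.1], [5.0, 5.0], [5.1, 4.9]])
y = np.array([0, 0, 1, 1])
classes, protos = class_prototypes(X, y)
pred = nme_predict(np.array([[0.1, 0.0], [4.8, 5.2]]), classes, protos)
# pred is [0, 1]: each query lands on the nearer class mean
```

Note that the squared distance gives the same argmin as the Euclidean distance while avoiding the square root.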
To generalize to neural network representations, a feature extractor $\phi$ is trained or fixed, and all inputs $x$ are mapped to embeddings $\phi(x)$ before forming class means and performing inference (Huo et al., 20 Oct 2025, Cheng et al., 2018).
Probabilistic variants often use a softmax over negative distances, introducing a temperature $\tau$:

$$p(y = c \mid x) = \frac{\exp\!\big(-d(\phi(x), \mu_c)/\tau\big)}{\sum_{c'=1}^{C} \exp\!\big(-d(\phi(x), \mu_{c'})/\tau\big)}$$
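As a sketch, the probabilistic variant (a softmax over negative squared distances, with the temperature exposed as a `tau` argument) can be written as:

```python
import numpy as np

def nme_probs(x, prototypes, tau=1.0):
    """Softmax over negative squared Euclidean distances with temperature tau."""
    d2 = ((prototypes - x) ** 2).sum(axis=1)
    logits = -d2 / tau
    logits -= logits.max()          # subtract max for numerical stability
    p = np.exp(logits)
    return p / p.sum()

protos = np.array([[0.0, 0.0], [4.0, 0.0]])
p = nme_probs(np.array([0.5, 0.0]), protos, tau=2.0)
# p sums to 1 and puts most mass on the nearer prototype (class 0)
```

Larger `tau` flattens the distribution toward uniform; smaller `tau` approaches the hard argmin rule.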
2. Incremental and Continual Learning: Exemplar-based and Exemplar-free
NME classifiers are foundational in class-incremental and online continual learning because they simplify the integration of new classes and mitigate catastrophic forgetting. Two primary strategies exist:
- Exemplar-based NME: For each class $c$, a memory buffer $\mathcal{P}_c$ of up to $K$ exemplars is maintained. The prototype $\mu_c$ is the mean of the feature embeddings of the exemplars in $\mathcal{P}_c$. When new classes arrive, exemplars closest to the new class mean are selected (e.g., herding), and, if the budget is exceeded, those farthest from the mean are discarded. During continual learning, exemplars for previously seen classes are replayed, and the NME rule is applied at inference (Ren et al., 2020).
- Exemplar-free NME: Incremental learning is performed without storing any training exemplars. Instead, per-class running means and counts $(\mu_c, n_c)$ are maintained using an online mean update:

$$\mu_c \leftarrow \frac{n_c\,\mu_c + \phi(x)}{n_c + 1}, \qquad n_c \leftarrow n_c + 1$$
This approach is robust and outperforms exemplar-based methods under small memory budgets, while remaining competitive against much larger budgets. Both memory cost and update time scale as $O(Cd)$ in the number of classes $C$ and feature dimension $d$. No raw input data are stored, addressing privacy and storage concerns (He et al., 2022).
Hybrid approaches sometimes use a combination of feature means and reduced exemplar sets, with candidate selection to address class-imbalance or bias (He et al., 2022).
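The online mean update for the exemplar-free setting can be sketched as follows (the class name and dictionary layout are illustrative, not any paper's implementation):

```python
import numpy as np

class RunningPrototypes:
    """Exemplar-free per-class running means:
    mu_c <- (n_c * mu_c + z) / (n_c + 1), then n_c <- n_c + 1."""

    def __init__(self, dim):
        self.dim = dim
        self.mu = {}   # class label -> mean embedding
        self.n = {}    # class label -> sample count

    def update(self, z, c):
        """Fold one embedding z of class c into the running mean."""
        if c not in self.mu:
            self.mu[c] = np.zeros(self.dim)
            self.n[c] = 0
        n = self.n[c]
        self.mu[c] = (n * self.mu[c] + z) / (n + 1)
        self.n[c] = n + 1

rp = RunningPrototypes(2)
for z in [np.array([1.0, 0.0]), np.array([3.0, 2.0])]:
    rp.update(z, 0)
# rp.mu[0] == [2.0, 1.0]: the running mean of the two embeddings
```

Only the mean and a counter are kept per class, so memory grows with the number of classes rather than with the number of samples.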
3. NME in Deep Embedding and Metric Learning
NME is often used as the classifier on top of deep embedding architectures, facilitating both analytic understanding and training stability.
- Anchor-based NME [Editor's term]: Fixed class prototypes ("anchors") $\{a_c\}_{c=1}^{C}$ are predefined on the unit hypersphere and not updated during training (Hao et al., 2018). The network is optimized to map each sample to its class anchor while maximizing angular separation between anchors. The resulting loss is a softmax over negative distances to the anchors, with variants using Euclidean or cosine metrics:
- Euclidean: $\mathcal{L}(x, y) = -\log \frac{\exp(-\|\phi(x) - a_y\|^2)}{\sum_{c} \exp(-\|\phi(x) - a_c\|^2)}$
- Cosine: $\mathcal{L}(x, y) = -\log \frac{\exp(s \cos\angle(\phi(x), a_y))}{\sum_{c} \exp(s \cos\angle(\phi(x), a_c))}$, with a scale (inverse temperature) $s$
- Strong intra-class compactness and inter-class separability are enforced by the structure of anchor vectors.
- Exemplar-centered MCML/Metric Collapse: Instead of prototypes being the class mean, a small set of exemplars per class are designated, and the embedding space is trained to maximize class-wise collapse onto exemplars. The loss function only requires $O(nm)$ computations (with $n$ samples and $m$ exemplars, $m \ll n$), offering a speed-up over $k$-NN and pairwise methods (Min et al., 2017).
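A minimal sketch of the anchor-based variant, using random unit-norm anchors as a hypothetical stand-in for the predefined well-separated anchors, with the Euclidean softmax loss:

```python
import numpy as np

def fixed_anchors(num_classes, dim, seed=0):
    """Random unit-norm anchors, kept fixed during training (an illustrative
    stand-in for predefined, well-separated anchors on the unit hypersphere)."""
    rng = np.random.default_rng(seed)
    a = rng.standard_normal((num_classes, dim))
    return a / np.linalg.norm(a, axis=1, keepdims=True)

def anchor_loss(z, y, anchors):
    """Cross-entropy of a softmax over negative squared distances to anchors."""
    d2 = ((anchors - z) ** 2).sum(axis=1)
    logits = -d2
    logits -= logits.max()                          # numerical stability
    log_p = logits - np.log(np.exp(logits).sum())   # log-softmax
    return -log_p[y]

anchors = fixed_anchors(num_classes=3, dim=8)
z = anchors[1] + 0.01   # an embedding already close to its class anchor
loss_near = anchor_loss(z, 1, anchors)   # small: z sits near anchor 1
loss_far = anchor_loss(z, 2, anchors)    # larger: wrong-class target
```

In practice only `z = phi(x)` would be learned; the anchors stay frozen, which is what yields the fixed geometric structure of the embedding space.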
4. Application Domains
NME underpins multiple learning paradigms:
- Incremental and continual learning: Enables rapid class extension and efficient inference without retraining the entire model; critical for edge or privacy-constrained systems (He et al., 2022, Ren et al., 2020, Cheng et al., 2018).
- Open-set recognition: Facilitates detecting samples not belonging to any known class by low maximum class probability or divergence between feature- and logit-based distributions (Huo et al., 20 Oct 2025).
- Zero-shot learning: Predicts prototypes for unseen classes in embedding space by regressing from semantic to visual features, then classifies novel instances by nearest prototype (Changpinyo et al., 2016).
- Dimensionality reduction and visualization: Results in compact and interpretable representations for classification and exploration (Min et al., 2017).
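As an illustrative sketch of the zero-shot use case, the regression from semantic to visual space can be approximated by a simple ridge regressor mapping semantic vectors to visual prototypes (a deliberately simplified linear stand-in; the cited work uses a more elaborate synthesis scheme, and all sizes below are made up):

```python
import numpy as np

def learn_prototype_regressor(S_seen, M_seen, lam=1e-2):
    """Ridge regression W mapping semantic vectors s to visual prototypes mu,
    solved via the normal equations (S^T S + lam I) W = S^T M."""
    d = S_seen.shape[1]
    W = np.linalg.solve(S_seen.T @ S_seen + lam * np.eye(d), S_seen.T @ M_seen)
    return W

# Toy setup: 3 seen classes, 2-D semantic space, 4-D visual space
rng = np.random.default_rng(1)
S_seen = rng.standard_normal((3, 2))   # semantic embeddings of seen classes
M_seen = rng.standard_normal((3, 4))   # visual prototypes of seen classes
W = learn_prototype_regressor(S_seen, M_seen)

s_unseen = rng.standard_normal(2)
mu_unseen = s_unseen @ W   # predicted prototype for a novel class
```

Novel classes are then handled by the ordinary NME rule, with `mu_unseen` treated like any other class mean.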
5. Empirical Results and Performance Considerations
The efficacy of NME classifiers is substantiated across domains:
- Benchmarks: On MNIST, CIFAR-10, and CIFAR-100, anchor-based NME achieves error rates rivaling softmax- and margin-based losses, often improving by 1–2% on challenging datasets (Hao et al., 2018).
- Incremental Learning: Exemplar-free NME outperforms all exemplar-based replay baselines for small replay budgets and narrows the gap to the offline upper-bound for large class sets, e.g., Split CIFAR-100 and Food-1k (He et al., 2022).
- Open-set Recognition: Achieves AUROC of 93.41 and 95.35 on wildlife datasets with a post-hoc NME probability that does not require retraining (Huo et al., 20 Oct 2025).
- Zero-shot Learning: Nearest-mean-of-exemplars approach scales to thousands of unseen classes (e.g., on ImageNet) with practical accuracy and efficiency, using regressed class prototypes (Changpinyo et al., 2016).
- Computational Complexity: At test time, cost per query is $O(Cd)$, where $C$ is the number of classes and $d$ is the feature dimension. This offers substantial speedups over kNN and pairwise metric methods, especially when the number of prototypes per class is reduced (Min et al., 2017).
6. Theoretical and Practical Limitations
- Unimodal Assumption: All NME approaches model classes as single clusters; performance degrades on highly multi-modal class distributions or classes with complex structure (Cheng et al., 2018).
- Feature Drift: Incrementally learned prototypes may become suboptimal as feature extractors are updated, potentially requiring periodic recomputation or advanced rehearsal strategies (Ren et al., 2020).
- Scalability: For large numbers of classes, the linear growth in prototype storage and distance computations may become a bottleneck, partly alleviated by candidate selection or hierarchical structures (Huo et al., 20 Oct 2025).
- Adaptation and Generalization: Fixed-feature or anchor-based models may underperform on genuinely novel or outlier classes if the embedding does not provide sufficient class separation (He et al., 2022, Hao et al., 2018).
7. Extensions and Research Directions
Recent and emerging directions include:
- Margin-based and prototype-separation losses: Adding explicit inter-class margin terms to further enforce separation between class means in embedding space (Hao et al., 2018, Ren et al., 2020).
- Cosine-based distances: Empirically, cosine similarity sometimes outperforms Euclidean metrics in high-dimensional spaces, especially for normalized features (Hao et al., 2018, Ren et al., 2020).
- Generative Replay: To compress prototype memory, generative models may be used to synthesize exemplars for old classes (Ren et al., 2020).
- Agreement-driven open-set detection: Combining NME-based probabilities with logits-based softmax for robust open-set recognition (Huo et al., 20 Oct 2025).
- Zero-shot extrapolation: Employing kernel regression to predict unseen class means from semantic embeddings, leading to rapid extension to unseen categories (Changpinyo et al., 2016).
The nearest-mean-of-exemplars methodology thus constitutes a unifying principle underpinning efficient, scalable, and incrementally extensible classification systems across deep learning, continual learning, metric learning, and beyond.