- The paper presents a novel classifier that leverages Extreme Value Theory to model margin distributions for both known and unknown classes.
- The paper employs a greedy set cover approach to reduce model size by retaining key extreme vectors, enhancing scalability and efficiency.
- The paper demonstrates incremental learning by updating models with new classes without costly retraining, achieving competitive accuracy on benchmarks.
The Extreme Value Machine: A Novel Classifier for Open Set and Incremental Learning
The paper "The Extreme Value Machine" by Rudd et al. introduces a theoretically grounded approach to open set recognition and incremental learning based on Extreme Value Theory (EVT). While conventional classifiers such as neural networks and kernel machines are effective for fixed-class problems, they fall short when unknown classes appear at query time. The Extreme Value Machine (EVM) addresses these limitations by providing a robust way to handle novel classes and by enabling efficient model updates.
Overview of the Extreme Value Machine
The EVM leverages EVT to form a nonlinear, kernel-free classifier that supports incremental learning. Its fundamental proposition is to use EVT to model the distribution of extreme values, i.e., the smallest margins between training points of a class and points of other classes. This lets the classifier determine whether a new input belongs to a known class or should be treated as part of an unknown class set. The EVM is distinguished by its ability to incorporate new classes incrementally without resorting to computationally expensive retraining.
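Concretely, the margins being modeled can be sketched as half-distances from a training point to its nearest points of other classes; the low tail of these distances is what the EVT fit characterizes. A minimal illustration (the helper name `margin_tail` and the tail size `tau` are our choices, not the paper's API):

```python
import numpy as np

def margin_tail(x_i, negatives, tau=5):
    """Return the tau smallest half-distances from x_i to other-class
    points; these smallest margins form the tail modeled by EVT."""
    dists = np.linalg.norm(negatives - x_i, axis=1) / 2.0
    return np.sort(dists)[:tau]

# Toy data: one point of a class vs. a cluster of other-class points.
rng = np.random.default_rng(0)
x_i = np.zeros(2)
negatives = rng.normal(loc=3.0, scale=0.5, size=(50, 2))
tail = margin_tail(x_i, negatives, tau=5)
```

The tail size `tau` controls how local the resulting boundary model is: only the closest other-class points shape the fit.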
Methodological Contributions
The key methodological contribution of this work lies in its novel use of EVT to model the Probability of Sample Inclusion (PSI) with respect to class boundaries. By deriving the PSI from EVT, the EVM constructs decision boundaries with a theoretically justified distributional form, namely the Weibull distribution. These boundaries give the likelihood that a data sample is associated with a known class, so inputs that fall outside the bounded regions can be recognized as novel.
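The Weibull-shaped PSI has the form Ψ(x_i, x') = exp(−(‖x_i − x'‖ / λ_i)^κ_i), where κ_i and λ_i are shape and scale parameters fitted to the tail of margin distances around point x_i. The sketch below is a hedged approximation: it uses scipy's generic `weibull_min` fit as a stand-in for the paper's tail-fitting procedure, and the rejection threshold `delta` and helper names are ours:

```python
import numpy as np
from scipy.stats import weibull_min

def fit_psi_params(tail_margins):
    """Fit Weibull shape (kappa) and scale (lam) to the smallest margin
    distances; floc=0 anchors the distribution at zero distance."""
    kappa, _, lam = weibull_min.fit(tail_margins, floc=0)
    return kappa, lam

def psi(x_i, x_prime, kappa, lam):
    """Probability of Sample Inclusion: Weibull-shaped decay in the
    distance of x_prime from the point x_i."""
    d = np.linalg.norm(np.asarray(x_prime) - np.asarray(x_i))
    return np.exp(-((d / lam) ** kappa))

def predict_open_set(class_psis, delta=0.5):
    """Return the best-scoring class index, or -1 ('unknown') if no
    class exceeds the inclusion threshold delta."""
    best = int(np.argmax(class_psis))
    return best if class_psis[best] >= delta else -1

kappa, lam = fit_psi_params(np.array([0.8, 0.9, 1.0, 1.1, 1.2]))
p_near = psi(np.zeros(2), np.array([0.1, 0.0]), kappa, lam)
p_far = psi(np.zeros(2), np.array([5.0, 0.0]), kappa, lam)
```

Points near x_i get PSI close to 1, distant points decay toward 0, and rejection of low-scoring inputs is what bounds open space risk.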
- Margin Distribution Modeling: The EVM calculates margin distributions for each class using EVT, allowing it to efficiently recognize new classes by determining the tails of these distributions, a process inherently suitable for open set tasks.
- Model Reduction via Set Cover: To ensure computational efficiency and scalability, the EVM uses a greedy approximation approach to minimize the model's size by retaining only significant 'extreme vectors' that summarize the decision boundaries.
- Incremental Learning Capability: The design of EVM inherently supports adding new data batches to update the model without discarding prior knowledge, thus making it well-suited for real-world tasks where data continuously evolves.
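The set-cover reduction in the second bullet can be sketched as follows, assuming a precomputed matrix where entry (i, j) is the PSI of point j under point i's fitted Weibull model; the coverage threshold `zeta` and all names are our illustration, not the paper's implementation:

```python
import numpy as np

def greedy_set_cover(psi_matrix, zeta=0.5):
    """Greedy approximation of set-cover model reduction: repeatedly
    keep the point whose model covers (PSI >= zeta) the most
    still-uncovered points. Assumes psi_matrix[i, i] = 1, so every
    point covers itself and the loop terminates."""
    covers = psi_matrix >= zeta
    uncovered = np.ones(psi_matrix.shape[0], dtype=bool)
    kept = []
    while uncovered.any():
        gains = (covers & uncovered).sum(axis=1)
        best = int(np.argmax(gains))
        kept.append(best)
        uncovered &= ~covers[best]
    return kept

# Toy PSI matrix: points 0 and 1 mutually cover each other; point 2 is isolated.
psi_mat = np.array([[1.0, 0.9, 0.1],
                    [0.9, 1.0, 0.1],
                    [0.1, 0.1, 1.0]])
kept = greedy_set_cover(psi_mat, zeta=0.5)
```

The retained indices are the "extreme vectors": a compact subset whose Weibull models still cover every training point, which is what keeps the model small as classes accumulate.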
Numerical Results
The paper provides empirical evidence from benchmark tests, demonstrating the EVM's advantage over previously established algorithms. On the open set recognition task using the OLETTER dataset, the EVM achieves comparable F1-scores to the current state-of-the-art while maintaining a significantly reduced model size. Similarly, in open world recognition experiments using ImageNet partitions, EVM outperforms the Nearest Non-Outlier (NNO) algorithm across all tested conditions with improved accuracy and scalability as more classes are incrementally added.
Theoretical and Practical Implications
From a theoretical standpoint, the application of EVT in the EVM offers a mathematically rigorous foundation for managing open space risk. This aligns with the probabilistic nature of real-world data distributions and provides a systematic means of incorporating uncertainty into class-boundary definitions. Practically, the reduced computational overhead and improved model compactness make the EVM suitable for deployment in resource-constrained environments or in applications that require real-time updates, such as robotics and autonomous systems.
Future Developments
Potential avenues for further research include refining the model reduction techniques to better balance precision against resource constraints, and extending the EVM with more sophisticated soft-margin strategies. Additionally, hybrid models that integrate EVT-based boundaries with other machine learning paradigms could be promising for enhancing flexibility and accuracy.
In conclusion, the Extreme Value Machine represents a significant stride in the development of classifiers capable of open set recognition and incremental learning, with solid grounding in statistical theory offering both practical utility and theoretical insights in AI and machine learning.