One-Class Classification: Taxonomy of Study and Review of Techniques (1312.0049v1)

Published 30 Nov 2013 in cs.LG and cs.AI

Abstract: One-class classification (OCC) algorithms aim to build classification models when the negative class is either absent, poorly sampled or not well defined. This unique situation constrains the learning of efficient classifiers by defining class boundary just with the knowledge of positive class. The OCC problem has been considered and applied under many research themes, such as outlier/novelty detection and concept learning. In this paper we present a unified view of the general problem of OCC by presenting a taxonomy of study for OCC problems, which is based on the availability of training data, algorithms used and the application domains applied. We further delve into each of the categories of the proposed taxonomy and present a comprehensive literature review of the OCC algorithms, techniques and methodologies with a focus on their significance, limitations and applications. We conclude our paper by discussing some open research problems in the field of OCC and present our vision for future research.

Citations (543)


Summary

The paper presents a comprehensive taxonomy for OCC, categorizing approaches based on training data availability, algorithms, and application domains.
It evaluates methods including OSVMs, neural networks, and ensemble techniques while addressing unique performance measurement challenges with limited negative data.
The paper highlights practical applications like intrusion detection, fault diagnosis, and credit scoring, and suggests future research into cost-sensitive and ensemble learning.

Summary of "One-Class Classification: Taxonomy of Study and Review of Techniques"

This paper presents a comprehensive survey of One-Class Classification (OCC), an area of machine learning concerned with building classifiers when samples of the negative class are scarce or entirely absent. OCC is frequently applied to tasks such as outlier detection, novelty detection, and concept learning, where the classification boundary must be defined using only positive class data.

Introduction to One-Class Classification

OCC differs from the conventional multi-class classification paradigm by focusing on well-characterized positive class instances while the negative class is either absent or not well-represented. This paper discusses the complexity of OCC, highlighting its practical implications in scenarios like fault detection in machines and disease diagnostics, where negative samples might be rare or expensive to obtain.
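To make the idea concrete, here is a minimal sketch of a one-class classifier that never sees a negative example. It is an illustrative assumption, not a method taken from the paper: the hypothetical class `CentroidOneClass` describes the positive class by a hypersphere around its mean, and the `reject_fraction` parameter controls how many training positives are deliberately left outside the boundary.

```python
import math
from typing import List, Sequence

Point = Sequence[float]


def _euclidean(a: Point, b: Point) -> float:
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


class CentroidOneClass:
    """Hypothetical minimal one-class classifier: accept points that lie
    within a radius of the centroid of the positive training data."""

    def __init__(self, reject_fraction: float = 0.05):
        # Fraction of positive training points allowed to fall outside the boundary.
        self.reject_fraction = reject_fraction
        self.centroid: List[float] = []
        self.radius = 0.0

    def fit(self, positives: Sequence[Point]) -> "CentroidOneClass":
        n, dim = len(positives), len(positives[0])
        # Only positive (target) examples are used; no negatives are required.
        self.centroid = [sum(p[j] for p in positives) / n for j in range(dim)]
        dists = sorted(_euclidean(p, self.centroid) for p in positives)
        # Radius = empirical (1 - reject_fraction) quantile of training distances.
        k = max(0, min(n - 1, math.ceil((1 - self.reject_fraction) * n) - 1))
        self.radius = dists[k]
        return self

    def predict(self, x: Point) -> int:
        # +1 = accepted as positive (target), -1 = rejected as outlier.
        return 1 if _euclidean(x, self.centroid) <= self.radius else -1


train = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (-1.0, 0.0), (0.0, -1.0)]
clf = CentroidOneClass(reject_fraction=0.0).fit(train)
print(clf.predict((0.5, 0.5)), clf.predict((3.0, 3.0)))
```

The boundary is learned entirely from the positive class, which is exactly the constraint OCC imposes; a hypersphere is the crudest possible boundary, and the methods surveyed in the paper replace it with more flexible ones.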

Comparison with Multi-Class Classification

OCC is harder than conventional multi-class classification because the decision boundary must be estimated from positive examples alone, with little or no data describing what lies on the other side of it. The paper revisits foundational work by earlier researchers, who variously labeled this problem "single-class classification" or "novelty detection." The authors also note complications specific to OCC, such as estimating classification error without negative examples and coping with high-dimensional data, which do not arise in the same form in multi-class classification.
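The dimensionality problem mentioned above can be illustrated with a standard volume argument (our illustration, not taken from the paper). Suppose the positive class is described by a unit hypersphere and outliers are assumed to be spread uniformly over the enclosing hypercube [-1, 1]^d. The fraction of the cube covered by the sphere is V_ball(d) / V_cube(d) = π^(d/2) / (Γ(d/2 + 1) · 2^d), which collapses as d grows; the function name below is hypothetical.

```python
import math


def ball_to_cube_volume_ratio(d: int) -> float:
    """Fraction of the hypercube [-1, 1]^d occupied by the inscribed unit ball."""
    # V_ball(d) = pi^(d/2) / Gamma(d/2 + 1); V_cube(d) = 2^d.
    return math.pi ** (d / 2) / (math.gamma(d / 2 + 1) * 2 ** d)


for d in (2, 5, 10, 20):
    print(d, ball_to_cube_volume_ratio(d))
```

In two dimensions the sphere covers about 78.5% of the cube (π/4), but by ten dimensions it covers roughly 0.25%, so uniformly spread outliers almost never land near the target region, and error estimates built on them need rapidly growing sample sizes to say anything about the boundary.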

Measuring Performance

The paper surveys techniques for measuring the performance of OCC algorithms, explaining that standard metrics often cannot be applied directly because they require labeled negative examples. It highlights two workarounds: devising metrics that can be estimated without labeled negatives, or using the few available outlier examples as a proxy for the negative class.
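A minimal sketch of one such workaround, assumed here rather than described in this summary: the false negative rate is measured directly on held-out positives, while the false positive rate is estimated on artificial outliers drawn uniformly from a user-chosen box. The function `estimate_occ_error_rates` and its `bounds` parameter are hypothetical; any classifier exposing a `predict(x) -> +1/-1` callable can be plugged in.

```python
import random
from typing import Callable, Sequence, Tuple

Point = Sequence[float]


def estimate_occ_error_rates(
    predict: Callable[[Point], int],
    held_out_positives: Sequence[Point],
    bounds: Sequence[Tuple[float, float]],
    n_artificial: int = 10_000,
    seed: int = 0,
) -> Tuple[float, float]:
    """Return (false_negative_rate, false_positive_rate) for a one-class classifier.

    predict(x) must return +1 (target) or -1 (outlier). The false positive rate
    is estimated on synthetic outliers sampled uniformly from the box `bounds`,
    an assumption that stands in for the unavailable negative class.
    """
    # Positives are available, so the rejection rate on held-out targets is measured directly.
    fnr = sum(predict(x) == -1 for x in held_out_positives) / len(held_out_positives)

    # Negatives are not available: draw synthetic outliers uniformly from the box.
    rng = random.Random(seed)
    accepted = 0
    for _ in range(n_artificial):
        z = [rng.uniform(lo, hi) for lo, hi in bounds]
        accepted += predict(z) == 1
    fpr = accepted / n_artificial
    return fnr, fpr


def in_unit_disk(x: Point) -> int:
    # Toy classifier: accept points inside the unit disk.
    return 1 if x[0] ** 2 + x[1] ** 2 <= 1.0 else -1


print(estimate_occ_error_rates(in_unit_disk, [(0.0, 0.0), (0.5, 0.0), (3.0, 0.0)],
                               bounds=[(-2.0, 2.0), (-2.0, 2.0)]))
```

The false positive rate obtained this way is only meaningful relative to the chosen box and the uniformity assumption; when a few genuine outliers are available, they can replace or complement the synthetic ones.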

Several existing reviews are outlined to show the evolution and breadth of research within OCC. The surveyed work also covers methods tailored to specific applications, such as mobile masquerader detection and credit scoring.

Proposed Taxonomy

The authors propose a novel taxonomy for OCC that categorizes research by the availability of training data, the algorithms used, and the application domains. These categories reveal the diversity of approaches and applications, from OSVM-based models to non-SVM classifiers built on neural networks and decision trees.

Availability of Training Data: Methods range from those using only positive examples to those employing a mix of positive, unlabeled, and sparsely sampled negative examples.
Algorithms Used: The paper examines One-Class Support Vector Machines (OSVMs) and non-SVM alternatives such as ensemble techniques, neural networks, and decision trees.
Application Domains: OCC application areas are shown to be diverse, including text classification, intrusion detection, and spectrum analysis.
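For readers unfamiliar with OSVMs, the widely used ν-formulation of Schölkopf et al. is reproduced below as background; it is the standard textbook form, not a formula quoted from this summary. Here Φ maps inputs into a kernel feature space, n is the number of positive training points x_i, the ξ_i are slack variables, and ν ∈ (0, 1] is a user-chosen parameter.

```latex
\begin{aligned}
\min_{\mathbf{w},\,\boldsymbol{\xi},\,\rho}\quad
  & \frac{1}{2}\lVert \mathbf{w} \rVert^{2}
    + \frac{1}{\nu n}\sum_{i=1}^{n} \xi_i - \rho \\
\text{s.t.}\quad
  & \langle \mathbf{w}, \Phi(\mathbf{x}_i) \rangle \ge \rho - \xi_i,
    \qquad \xi_i \ge 0, \qquad i = 1, \dots, n, \\[4pt]
f(\mathbf{x}) &= \operatorname{sgn}\!\big(\langle \mathbf{w}, \Phi(\mathbf{x}) \rangle - \rho\big).
\end{aligned}
```

Only positive examples enter the optimization, which is what makes the formulation one-class. The parameter ν is an upper bound on the fraction of training points treated as outliers and a lower bound on the fraction of support vectors.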

Algorithms and Techniques

The paper provides an extensive analysis of OCC algorithms, tracing the development of OSVMs and other classifier families. It identifies gaps in current knowledge and points to less-explored areas, such as new kernel functions and improved ensemble methods.
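As an illustration of what an ensemble of one-class classifiers can look like, here is a minimal sketch. It is an assumption for illustration, not a method from the paper: each hypothetical member measures distance to the positive-class centroid in a random subset of features (a random-subspace scheme), normalizes it by its largest training distance, and the members are combined with the mean rule. The class name `SubspaceCentroidEnsemble` and its parameters are invented for this example.

```python
import math
import random
from typing import List, Sequence, Tuple

Point = Sequence[float]


class SubspaceCentroidEnsemble:
    """Hypothetical one-class ensemble: members score distance to the target
    centroid in random feature subsets; scores are averaged (mean rule)."""

    def __init__(self, n_members: int = 5, subspace_size: int = 2, seed: int = 0):
        self.n_members = n_members
        self.subspace_size = subspace_size
        self.seed = seed
        # Each member is (feature indices, centroid in that subspace, scale).
        self.members: List[Tuple[List[int], List[float], float]] = []

    @staticmethod
    def _dist(x: Point, dims: List[int], centre: List[float]) -> float:
        return math.sqrt(sum((x[d] - c) ** 2 for d, c in zip(dims, centre)))

    def fit(self, positives: Sequence[Point]) -> "SubspaceCentroidEnsemble":
        n, dim = len(positives), len(positives[0])
        rng = random.Random(self.seed)
        self.members = []
        for _ in range(self.n_members):
            dims = sorted(rng.sample(range(dim), self.subspace_size))
            centre = [sum(p[d] for p in positives) / n for d in dims]
            # Normalize by the largest training distance so each member's boundary sits at 1.
            scale = max(self._dist(p, dims, centre) for p in positives) or 1.0
            self.members.append((dims, centre, scale))
        return self

    def score(self, x: Point) -> float:
        # Mean combination rule over normalized member distances (larger = more outlying).
        return sum(self._dist(x, d, c) / s for d, c, s in self.members) / len(self.members)

    def predict(self, x: Point, threshold: float = 1.0) -> int:
        # +1 = target, -1 = outlier.
        return 1 if self.score(x) <= threshold else -1


train = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (0, 0, 1), (-1, 0, 0), (0, -1, 0), (0, 0, -1)]
ens = SubspaceCentroidEnsemble(n_members=4, subspace_size=2, seed=1).fit(train)
print(ens.predict((0.2, 0.1, 0.0)), ens.predict((5.0, 5.0, 5.0)))
```

Averaging is only one combination rule; product or maximum rules are common alternatives, and the members could equally be OSVMs or density estimators.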

Applications and Future Directions

The authors review OCC applications in fields such as text classification and information retrieval, presenting OCC as a robust tool for sparse-data scenarios and novelty detection. They suggest that future research focus on ensemble methods, cost-sensitive learning, and new uses of kernels, and they advocate open-source OCC resources to support benchmark comparisons.

Conclusion

The paper acknowledges that while OCC has matured significantly, there remain multiple open research challenges, including scaling, handling outliers, and enhancing model robustness. The authors suggest that future research endeavors can leverage the detailed taxonomy and survey provided to foster novel advancements in OCC.

This work underscores both the complexity and the potential of OCC methods across applications, offering researchers a detailed roadmap of unresolved questions and a basis for new solutions in the field.
