EMNIST: an extension of MNIST to handwritten letters

Published 17 Feb 2017 in cs.CV | (1702.05373v2)

Abstract: The MNIST dataset has become a standard benchmark for learning, classification and computer vision systems. Contributing to its widespread adoption are the understandable and intuitive nature of the task, its relatively small size and storage requirements and the accessibility and ease-of-use of the database itself. The MNIST database was derived from a larger dataset known as the NIST Special Database 19 which contains digits, uppercase and lowercase handwritten letters. This paper introduces a variant of the full NIST dataset, which we have called Extended MNIST (EMNIST), which follows the same conversion paradigm used to create the MNIST dataset. The result is a set of datasets that constitute a more challenging classification tasks involving letters and digits, and that shares the same image structure and parameters as the original MNIST task, allowing for direct compatibility with all existing classifiers and systems. Benchmark results are presented along with a validation of the conversion process through the comparison of the classification results on converted NIST digits and the MNIST digits.

Abstract PDF Upgrade to Chat

Authors (4)

Citations (697)

View on Semantic Scholar

Summary

The paper extends MNIST with comprehensive handwritten letters and digits, providing a renewed benchmark for classifier evaluation.
It details a robust conversion process from NIST Special Database 19 to create dataset splits that maintain compatibility with MNIST-based systems.
Benchmark tests demonstrate competitive accuracies—up to 97.22% for digits and promising results for letters—highlighting practical improvements for modern ML models.

Overview of "EMNIST: an extension of MNIST to handwritten letters"

The paper "EMNIST: an extension of MNIST to handwritten letters" authored by Gregory Cohen, Saeed Afshar, Jonathan Tapson, and Andre van Schaik, addresses the need for a more rigorous and comprehensive benchmark dataset for machine learning and computer vision tasks. Building on the foundational MNIST dataset, the Extended MNIST (EMNIST) dataset introduces a set of benchmarks that encompasses both digits and handwritten letters. This dataset follows the same format and structure as MNIST, thereby ensuring compatibility with existing classifiers and systems designed for MNIST.

Context and Motivation

The MNIST dataset, a subset of the NIST Special Database 19, has been a staple in the machine learning community for handwritten digit classification since its introduction in 1998. However, due to its longstanding widespread use and the high accuracies reported by contemporary models (often exceeding 99.7%), MNIST has transitioned from a challenging benchmark to a routine validation task. The rapid advancements in neural networks and deep learning necessitate more complex datasets to robustly evaluate new algorithms.

Datasets and Conversion Process

The EMNIST dataset was derived from the NIST Special Database 19, which includes not only handwritten digits but also uppercase and lowercase characters. The conversion process closely mirrored the steps used for creating the original MNIST dataset, ensuring a seamless drop-in replacement for MNIST-compatible systems. The primary steps included Gaussian filtering, extraction and centering of characters, and down-sampling to a 28x28 pixel resolution. This conversion preserved the structural characteristics of the images, facilitating straightforward integration into existing frameworks.

Dataset Splits and Composition

EMNIST comprises six subdivisions:

EMNIST By_Class: Contains all 814,255 samples across 62 classes (digits 0-9, uppercase A-Z, and lowercase a-z).
EMNIST By_Merge: A 47-class dataset that merges similar uppercase and lowercase letters.
EMNIST Balanced: A 47-class subset ensuring an even number of samples per class.
EMNIST Digits: A purely digit-based subset with balanced class samples.
EMNIST Letters: Merges uppercase and lowercase letters into a single 26-class dataset.
EMNIST MNIST: Specifically matches the composition and format of the original MNIST dataset for direct comparison.

These splits allow for a broad spectrum of classification tasks, from balanced letter recognition to mixed-case letter differentiation, providing a comprehensive evaluation toolkit for various learning models.

Benchmark Results

The performance of classifiers, particularly an OPIUM-based classifier, was assessed on the EMNIST datasets. The classifiers used linear and pseudo-inverse update techniques to establish baseline performances.

EMNIST Balanced: Achieved a maximum mean accuracy of 78.02% with 10,000 hidden layer neurons.
EMNIST By_Merge: Reached 80.87% accuracy, outperforming the By_Class dataset due to reduced ambiguity between merged classes.
EMNIST Letters: Showcased an impressive 85.15% accuracy, with confusion primarily between visually similar characters.
EMNIST Digits: Attained a commendable 97.22% accuracy, illustrating the efficacy of the dataset conversion and further validation against the original MNIST, which achieved up to 97.50%.

Implications and Future Work

The introduction of EMNIST offers a more demanding benchmark for modern machine learning algorithms, including deep learning architectures. Its diverse classification scenarios and larger sample sizes present opportunities for more nuanced model evaluation and can drive the development of more sophisticated and generalized learning systems.

Future work could explore even deeper neural architectures and other advanced learning methodologies on EMNIST, further pushing the boundaries of what is achievable in handwritten character and digit recognition. Moreover, the consideration of new data hierarchies, such as the By_Author split, could open up additional avenues for research in writer identification and other complex classification tasks.

Conclusion

The EMNIST dataset stands as a significant extension of the MNIST dataset, addressing the need for more challenging and varied benchmarks in the evaluation of contemporary machine learning systems. It retains the accessible and standardized structure of MNIST while introducing a broader scope of classes and more rigorous classification challenges. This ensures that EMNIST not only builds on the legacy of MNIST but also provides a forward-looking benchmark that aligns with the current trajectory of advancements in machine learning and neural networks.

Markdown Report Issue