Large-Scale Long-Tailed Recognition in an Open World: Summary and Implications
The paper "Large-Scale Long-Tailed Recognition in an Open World" addresses the challenge of recognizing and classifying objects in datasets that are both long-tailed and open-ended. This setting is representative of real-world visual data, where a few head classes have abundant samples, many tail classes are underrepresented, and numerous unseen classes exist. The work introduces Open Long-Tailed Recognition (OLTR), which unifies imbalanced classification, few-shot learning, and open-set recognition in a single framework.
Key Contributions
- Formal Definition of OLTR: The authors define OLTR as the problem of learning from naturally distributed data that is both long-tailed and open-ended. They propose evaluating classification accuracy on a balanced test set that includes head, tail, and open classes.
- Dynamic Meta-Embedding: The cornerstone of the proposed OLTR algorithm is dynamic meta-embedding, which combines a direct image feature with an associated memory feature, letting the model adaptively draw on prior knowledge and improving recognition robustness for underrepresented classes. Memory features are built from discriminative class centroids derived from the training data, which facilitates knowledge transfer from head to tail classes, while a reachability-based confidence calibration handles open-set instances.
- Modulated Attention: Modulated attention applies spatial attention on top of self-attention maps, encouraging the model to attend to different regions for head and tail classes. This improves spatial feature selection and boosts tail-class performance without sacrificing head-class accuracy.
- Comprehensive Benchmarking: The paper introduces three extensive OLTR datasets: ImageNet-LT, Places-LT, and MS1M-LT, created to reflect real-world long-tail distributions in object-centric, scene-centric, and face-centric domains, respectively. Benchmarks are established for proper evaluation, showcasing the superior performance of the proposed method compared to state-of-the-art techniques across these diverse datasets.
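To make the dynamic meta-embedding idea concrete, the following is a minimal NumPy sketch of the mechanism described above, not the authors' implementation: a memory feature is attended from class centroids, blended into the direct feature through a gating selector, and the result is scaled by the inverse distance to the nearest centroid (reachability). The attention weighting, the `selector_weights` parameter, and the function name are illustrative assumptions.

```python
import numpy as np

def dynamic_meta_embedding(v_direct, centroids, selector_weights):
    """Sketch of dynamic meta-embedding (all names illustrative).

    v_direct:         (d,)  direct feature from the backbone.
    centroids:        (K, d) discriminative class centroids (the visual memory).
    selector_weights: (d,)  parameters of a hypothetical concept selector.
    """
    # Attend over the memory: softmax over negative distances to centroids,
    # so nearby centroids contribute more to the memory feature.
    dists = np.linalg.norm(centroids - v_direct, axis=1)   # (K,)
    attn = np.exp(-dists) / np.exp(-dists).sum()           # (K,)
    v_memory = attn @ centroids                            # (d,)

    # Concept selector: a tanh gate decides, per dimension, how much
    # memory to blend into the direct feature.
    e = np.tanh(selector_weights * v_direct)               # (d,)
    v_meta = v_direct + e * v_memory

    # Reachability-based calibration: scale by the inverse distance to the
    # nearest centroid, so samples far from all centroids (likely open-set)
    # get small-magnitude embeddings and hence low classifier confidence.
    gamma = dists.min()
    return v_meta / max(gamma, 1e-8), gamma
```

Under this sketch, open-set detection reduces to thresholding: a sample with large `gamma` produces a shrunken embedding, and its class logits can be rejected as "unseen" below a chosen confidence cutoff.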
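The modulated-attention bullet above can likewise be sketched in a few lines: spatial self-attention computes an affinity between positions of a feature map, and a learned spatial gate re-weights the attention output before it is added back to the identity branch. The gate parameterization (`w_spatial`) and function names are illustrative assumptions, not the paper's exact architecture.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def modulated_attention(fmap, w_spatial):
    """Sketch: spatial self-attention modulated by a learned spatial gate.

    fmap:      (C, H, W) convolutional feature map.
    w_spatial: (H*W,)    logits of a hypothetical spatial-attention head.
    """
    C, H, W = fmap.shape
    x = fmap.reshape(C, H * W)                    # (C, N)

    # Self-attention: affinity between all pairs of spatial positions.
    affinity = softmax(x.T @ x, axis=-1)          # (N, N)
    context = x @ affinity.T                      # (C, N)

    # Modulation: the spatial gate selects which locations' context to
    # keep before the residual connection back to the input.
    gate = softmax(w_spatial)                     # (N,)
    out = x + context * gate                      # broadcasts over channels
    return out.reshape(C, H, W)
```

The residual form (`x + ...`) is one way to realize "enhancing classification without sacrificing head-class accuracy": when the gate is near-uniform the block degrades gracefully toward plain self-attention.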
Numerical Results
- ImageNet-LT: The proposed method reaches an overall closed-set classification accuracy of 35.6% and performs well across many-shot, medium-shot, and few-shot classes. In open-set recognition, it achieves an F-measure of 0.474, significantly outperforming prior methods.
- Places-LT: With an overall closed-set accuracy of 35.9% and an F-measure of 0.464, the dynamic meta-embedding approach demonstrates its efficacy in scene-centric contexts as well.
- MS1M-LT: The method exhibits robust performance across different face recognition tasks, including many-shot, few-shot, one-shot, and zero-shot identifications, with tangible improvements in identification rates over comparable approaches.
Implications of the Research
Practical Implications: The method's ability to balance the treatment of many-shot, medium-shot, and few-shot classes, while also recognizing unseen classes, positions it as a significant step forward in visual recognition. Its robustness to real-world complexities makes it directly transferable to applications such as object detection in dynamic environments, face recognition in social networks, and scene analysis in autonomous driving.
Theoretical Implications: The dynamic meta-embedding framework integrates principles from metric learning and meta-learning, enriching the theoretical landscape of machine learning. This hybrid approach offers a new perspective on how to efficiently use memory mechanisms and spatial attention to improve both closed-world and open-world classification tasks.
Future Directions
Future developments could explore enhancing the feature disentanglement capabilities of the dynamic meta-embedding approach. Furthermore, expanding the application of this framework to other domains such as speech recognition or natural language processing could validate its versatility and effectiveness further. Extending the research to address fairness issues, particularly in datasets with sensitive attributes, represents another promising direction.
Conclusion
This paper provides a thorough exploration of, and an innovative solution to, the challenges posed by long-tailed and open-ended visual recognition. By introducing the OLTR framework and demonstrating its effectiveness through dynamic meta-embedding and modulated attention, it lays a robust foundation for further research and practical advances in machine learning and computer vision.