- The paper introduces a massive multi-label image database with 18M images and 11K classes, offering richer annotations than traditional single-label datasets.
- It details an efficient distributed training framework using ResNet-101 and a novel loss function to mitigate class imbalance.
- Comprehensive evaluations on classification, detection, and segmentation tasks demonstrate significant improvements in visual representation learning.
Tencent ML-Images: A Large-Scale Multi-Label Image Database for Visual Representation Learning
The paper presents an extensive exploration of large-scale multi-label visual representation learning, centered on the construction and use of the Tencent ML-Images database, which comprises approximately 18 million images spanning over 11,000 categories.
Motivation and Contributions
Visual representation learning with CNNs has predominantly relied on single-label datasets such as ImageNet. However, real-world images often contain multiple objects, so single-label annotations discard valuable information. To address this, the Tencent ML-Images database emphasizes multi-label annotations to improve the quality of learned visual representations.
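Concretely, a single-label target is one class index per image, while a multi-label target is a multi-hot vector over the full class vocabulary. A minimal sketch of the encoding (the function name is illustrative, not from the paper):

```python
import numpy as np

def to_multi_hot(class_indices, num_classes):
    """Encode the set of classes present in an image as a multi-hot vector.

    Single-label datasets store one index per image; a multi-label dataset
    such as Tencent ML-Images instead needs a 0/1 target over all classes.
    """
    target = np.zeros(num_classes, dtype=np.float32)
    target[list(class_indices)] = 1.0  # mark every class present in the image
    return target
```

For example, `to_multi_hot({3, 7}, 10)` produces a length-10 vector with ones at positions 3 and 7, allowing one image to supervise several classes at once.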
Key contributions include:
- Database Construction: Tencent ML-Images combines images from existing sources, primarily ImageNet and Open Images. Construction involved merging the class vocabularies, removing redundant classes, and augmenting missing annotations using the semantic hierarchy among classes and class co-occurrence statistics.
- Efficient Training Framework: The ResNet-101 model was trained using this multi-label dataset through an optimized distributed deep learning framework, TFplus, which incorporates MPI and NCCL for accelerated computation.
- Imbalance Mitigation: The paper introduces a novel loss function to address the class imbalance prevalent in large-scale multi-label datasets, assigning different weights to the loss terms of different classes and labels.
- Comprehensive Evaluation: The model's visual representation quality was rigorously tested across several transfer learning tasks, including image classification, object detection, and semantic segmentation, using benchmark datasets like ImageNet and PASCAL VOC.
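The imbalance-mitigation idea above can be sketched as a weighted sigmoid cross-entropy. This is a simplified illustration of the general technique, not the paper's exact formulation; the per-class `pos_weight`/`neg_weight` arguments and how they would be derived from label frequencies are assumptions.

```python
import numpy as np

def weighted_multilabel_loss(logits, labels, pos_weight, neg_weight):
    """Weighted sigmoid cross-entropy for multi-label classification.

    logits:     (batch, num_classes) raw scores
    labels:     (batch, num_classes) multi-hot {0, 1} targets
    pos_weight: (num_classes,) weights on positive-label terms
    neg_weight: (num_classes,) weights on negative-label terms
                (illustrative; e.g. derived from label frequencies so that
                rare positive labels are not drowned out by negatives)
    """
    p = 1.0 / (1.0 + np.exp(-logits))  # sigmoid probability per class
    eps = 1e-12                        # avoid log(0)
    pos_term = pos_weight * labels * np.log(p + eps)
    neg_term = neg_weight * (1.0 - labels) * np.log(1.0 - p + eps)
    return -np.mean(pos_term + neg_term)
```

Setting both weight vectors to ones recovers the standard (unweighted) multi-label cross-entropy; down-weighting the negative terms counteracts the fact that, with 11,000+ classes, almost every label is negative for any given image.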
Numerical Results and Analysis
The ResNet-101 model pre-trained on Tencent ML-Images transferred well to downstream tasks. Notably, after fine-tuning on ImageNet classification, it achieved top-1 and top-5 accuracies surpassing those of models pre-trained on datasets such as JFT-300M.
Implications and Future Directions
The research highlights the benefits and feasibility of multi-label databases for visual learning tasks. It underscores the potential for more nuanced and comprehensive representations which can improve performance across diverse computer vision challenges. From a practical standpoint, the public release of the dataset and codebase is a significant step towards fostering future advancements in AI, enabling both academic and industrial entities to build upon this foundation.
In conclusion, the Tencent ML-Images dataset, together with the training methodology described, is a valuable resource for visual representation learning. Future research might expand the class vocabulary or incorporate additional semantic layers to further improve model performance, and the dataset's impact on models' ability to handle complex, multi-object visual scenes is likely to be substantial.