ImageNet-21K Pretraining for the Masses (2104.10972v4)

Published 22 Apr 2021 in cs.CV and cs.LG

Abstract: ImageNet-1K serves as the primary dataset for pretraining deep learning models for computer vision tasks. ImageNet-21K dataset, which is bigger and more diverse, is used less frequently for pretraining, mainly due to its complexity, low accessibility, and underestimation of its added value. This paper aims to close this gap, and make high-quality efficient pretraining on ImageNet-21K available for everyone. Via a dedicated preprocessing stage, utilization of WordNet hierarchical structure, and a novel training scheme called semantic softmax, we show that various models significantly benefit from ImageNet-21K pretraining on numerous datasets and tasks, including small mobile-oriented models. We also show that we outperform previous ImageNet-21K pretraining schemes for prominent new models like ViT and Mixer. Our proposed pretraining pipeline is efficient, accessible, and leads to SoTA reproducible results, from a publicly available dataset. The training code and pretrained models are available at: https://github.com/Alibaba-MIIL/ImageNet21K

Citations (607)

Summary

  • The paper introduces a novel semantic softmax scheme that leverages hierarchical label structures to improve pretraining efficiency.
  • The authors design a comprehensive pipeline that cleans, standardizes, and optimizes the large ImageNet-21K dataset for broad accessibility.
  • Experimental results show that the proposed approach outperforms traditional ImageNet-1K pretraining, benefiting both large-scale and mobile-oriented models.

Analyzing ImageNet-21K Pretraining for Broad Accessibility

The paper "ImageNet-21K Pretraining for the Masses" addresses a significant gap in the application and accessibility of the ImageNet-21K dataset for pretraining in computer vision tasks. Traditionally, ImageNet-1K has been the default dataset for pretraining deep learning models due to its size, simplicity, and standardized structure. However, ImageNet-21K offers a much larger and more diverse set of classes, which can potentially enhance model performance across various tasks.

Key Contributions

The authors introduce a comprehensive and efficient pipeline for pretraining on the ImageNet-21K dataset, aiming to make this resource more accessible to researchers and practitioners. The pipeline involves:

  1. Dataset Preparation: A preprocessing stage cleans invalid classes, forms a standardized train-validation split, and resizes images to reduce the dataset's storage footprint.
  2. Utilizing Semantic Structures: By leveraging the WordNet semantic hierarchy, the authors transform ImageNet-21K into a multi-label dataset. They observe, however, that straightforward multi-label training does not outperform single-label training, owing to optimization issues such as extreme class imbalance.
  3. Semantic Softmax Training: The proposed "semantic softmax" scheme exploits the hierarchical label structure directly: a separate softmax is applied to each level of the hierarchy, and each image contributes a cross-entropy term at every level where its label (or an ancestor of it) exists. This sidesteps the optimization difficulties of a single flat multi-label objective; a minimal sketch follows this list.
  4. Semantic Knowledge Distillation: To further improve pretraining quality, the paper combines semantic softmax with knowledge distillation, using a teacher's per-level probability distributions as soft targets so that the distilled predictions remain consistent with the label hierarchy.
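
To make the semantic softmax idea concrete, below is a minimal PyTorch sketch of a per-hierarchy-level loss, together with one possible per-level distillation term. This is not the authors' implementation: the function names, the tensor layout (classifier columns grouped by hierarchy level, labels pre-mapped to in-level indices with -1 marking "no ancestor at this level"), and the masking of levels in the distillation term are all illustrative assumptions.

    import torch
    import torch.nn.functional as F


    def semantic_softmax_loss(logits, level_targets, level_slices):
        """Per-hierarchy-level softmax loss (illustrative sketch).

        logits        -- (B, C) scores over all classes, with columns grouped
                         so each hierarchy level occupies a contiguous slice.
        level_targets -- (B, L) long tensor; entry [b, l] is the in-level index
                         of the label (or its ancestor) at level l, or -1 if
                         the label has no ancestor at that level.
        level_slices  -- list of (start, end) column ranges, one per level.
        """
        total_loss = logits.new_zeros(())
        total_terms = 0
        for l, (start, end) in enumerate(level_slices):
            targets = level_targets[:, l]
            valid = targets >= 0
            if valid.any():
                # Cross-entropy restricted to this level's classes, computed
                # only for samples that actually have a label at this level.
                total_loss = total_loss + F.cross_entropy(
                    logits[valid, start:end], targets[valid], reduction="sum"
                )
                total_terms += int(valid.sum())
        return total_loss / max(total_terms, 1)


    def semantic_kd_loss(student_logits, teacher_logits, level_targets, level_slices, T=1.0):
        """Per-level KL distillation on top of the same level grouping (sketch).

        Masking out levels without a valid label is an assumption of this
        sketch, not necessarily the paper's exact recipe.
        """
        total_loss = student_logits.new_zeros(())
        total_terms = 0
        for l, (start, end) in enumerate(level_slices):
            valid = level_targets[:, l] >= 0
            if valid.any():
                s = F.log_softmax(student_logits[valid, start:end] / T, dim=1)
                t = F.softmax(teacher_logits[valid, start:end] / T, dim=1)
                total_loss = total_loss + F.kl_div(s, t, reduction="sum") * (T * T)
                total_terms += int(valid.sum())
        return total_loss / max(total_terms, 1)


    # Toy usage: 4 samples, two hierarchy levels with 3 and 5 classes.
    logits = torch.randn(4, 8)
    level_slices = [(0, 3), (3, 8)]
    level_targets = torch.tensor([[0, 2], [1, -1], [2, 4], [-1, 0]])
    loss = semantic_softmax_loss(logits, level_targets, level_slices)

In this sketch, each hierarchy level gets its own softmax over a contiguous slice of the logits, and a sample contributes a loss term only at levels where its label (or an ancestor) exists; that selective, per-level treatment is what distinguishes the scheme from a single flat softmax or a flat multi-label objective.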

Experimental Study

The authors provide extensive empirical validation, showing that semantic softmax pretraining on ImageNet-21K consistently outperforms standard ImageNet-1K pretraining across a wide range of downstream tasks, including image classification, multi-label classification, and video recognition. They also demonstrate the scalability and efficiency of the pipeline by successfully pretraining both large models such as TResNet-L and mobile-oriented models like MobileNetV3, suggesting broad applicability.
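
As a hedged illustration of the downstream transfer step, the sketch below fine-tunes a 21K-pretrained mobile backbone through the timm library. The checkpoint identifier, which is assumed to correspond to the released ImageNet-21K weights, and the hyperparameters are illustrative and should be verified against timm's model registry.

    import timm
    import torch

    # The checkpoint name below is an assumption -- check the available
    # 21K-pretrained ("miil") weights with timm.list_models("*miil*").
    model = timm.create_model(
        "mobilenetv3_large_100_miil_in21k",
        pretrained=True,
        num_classes=100,  # replace the 21K head with a downstream head
    )

    optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
    criterion = torch.nn.CrossEntropyLoss()

    # One illustrative fine-tuning step on dummy data.
    images = torch.randn(8, 3, 224, 224)
    labels = torch.randint(0, 100, (8,))
    optimizer.zero_grad()
    loss = criterion(model(images), labels)
    loss.backward()
    optimizer.step()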

Implications and Future Directions

The research has several practical implications:

  • Enhanced Model Performance: The use of ImageNet-21K with the proposed pipeline significantly boosts performance across various computer vision models and tasks, even benefiting smaller, mobile-optimized models.
  • Accessible Pretraining: By offering a streamlined and efficient method for using the ImageNet-21K dataset, the paper democratizes access to rich pretraining resources that previously demanded substantial computational power and engineering effort.
  • Framework Generalizability: While this work focuses on ImageNet-21K, the principles and methodologies could be extrapolated to other large-scale datasets, fostering enhanced model pretraining strategies across different domains.

For future work, the integration of semantic approaches and hierarchical structures into model training remains a rich area for exploration. Further research could examine how best to combine these strategies with other advanced training techniques to maximize efficiency and accuracy.

In conclusion, this paper provides a substantial contribution to the understanding and application of large-scale datasets in neural network pretraining. By effectively harnessing the complex structures within ImageNet-21K, the work opens up new possibilities for efficient, high-quality model development in the field of computer vision.
