- The paper presents USI, a unified training scheme that integrates vanilla knowledge distillation to achieve state-of-the-art ImageNet performance across various backbones.
- It details a robust methodology that employs AdamW optimization with techniques like Mixup and Cutmix, simplifying the tuning process for diverse neural architectures.
- Results show significant accuracy improvements on models such as ResNet50 and LeViT-384, demonstrating the scalability and efficiency of the unified approach.
Overview of "Solving ImageNet: a Unified Scheme for Training any Backbone to Top Results"
Introduction
The paper presents a novel approach for training image classification models on the ImageNet dataset. Traditionally, training distinct neural architectures on ImageNet necessitates custom-tailored strategies that demand extensive expertise and parameter tuning. This paper introduces a unified training scheme, termed USI (Unified Scheme for ImageNet), that leverages knowledge distillation alongside contemporary optimization techniques to address these challenges. The principal innovation lies in its ability to train diverse architectural backbones, from CNNs to Transformers, under a single configuration, outperforming existing bespoke solutions.
Methodology
USI's efficacy emerges from the use of vanilla knowledge distillation (KD) in its training regimen. The KD process leverages a teacher model to enrich the training signal with nuanced predictions that ground-truth labels lack. These predictions encapsulate inter-class correlations and provide a richer supervisory signal that improves classification robustness. Remarkably, USI obviates the need for extensive tuning, delivering consistent, state-of-the-art results across a wide range of model architectures.
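To make the distillation objective concrete, below is a minimal PyTorch sketch of a vanilla KD loss: hard-label cross-entropy plus a temperature-scaled KL term that matches the student's distribution to the teacher's. The `alpha` and `tau` values are illustrative assumptions, not the paper's published hyperparameters.

```python
import torch
import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, labels, alpha=0.5, tau=1.0):
    """Vanilla KD: hard-label CE plus a soft teacher-matching term.
    alpha and tau are illustrative placeholders."""
    # Standard cross-entropy against the ground-truth labels
    # (PyTorch >= 1.10 also accepts soft/probability targets here)
    ce = F.cross_entropy(student_logits, labels)
    # KL divergence between temperature-scaled student and teacher
    # distributions; the tau**2 factor keeps gradient magnitudes
    # comparable across temperatures (Hinton et al., 2015)
    kl = F.kl_div(
        F.log_softmax(student_logits / tau, dim=-1),
        F.softmax(teacher_logits / tau, dim=-1),
        reduction="batchmean",
    ) * (tau ** 2)
    return ce + alpha * kl
```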
The paper outlines the detailed algorithmic framework, specifying the hyperparameters, optimization strategies, and augmentation techniques that collectively constitute the USI methodology. It replaces traditional model-specific recipes with a single, consistent one that integrates KD with the AdamW optimizer and advanced augmentations such as Mixup and Cutmix.
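As a rough sketch of how these pieces fit together in one training step, the loop below combines AdamW, Mixup/Cutmix (here via timm's `Mixup` helper), and the `kd_loss` defined above; since recent PyTorch's `cross_entropy` accepts probability targets, the same loss handles the mixed soft labels. The `student`, `teacher`, and `loader` objects and all numeric values are assumptions for illustration, not the paper's exact recipe.

```python
import torch
from timm.data import Mixup  # Mixup/Cutmix target mixing from timm

# Placeholder hyperparameters; not the paper's published values.
mixup_fn = Mixup(mixup_alpha=0.2, cutmix_alpha=1.0,
                 label_smoothing=0.1, num_classes=1000)
optimizer = torch.optim.AdamW(student.parameters(),
                              lr=3e-4, weight_decay=0.02)

teacher.eval()
for images, labels in loader:  # student/teacher/loader assumed defined
    images, soft_labels = mixup_fn(images, labels)  # mix inputs and targets
    with torch.no_grad():
        teacher_logits = teacher(images)  # teacher sees the same mixed batch
    student_logits = student(images)
    loss = kd_loss(student_logits, teacher_logits, soft_labels)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

One design point worth noting: feeding the teacher the same Mixup/Cutmix-augmented batch keeps its soft predictions aligned with the inputs the student actually sees.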
Results
USI is benchmarked against an array of models spanning CNNs, Transformers, Mobile-oriented architectures, and MLP-only networks. For all configurations tested, USI not only simplifies the training pipeline but also achieves accuracy that surpasses or matches the best-reported results for those models. Key findings include:
- USI achieves 81.0% top-1 accuracy on ResNet50 (prior best was 80.4%).
- For LeViT-384, USI attains 82.7%, exceeding the accuracy of its previously published, model-specific configuration.
- Across the broader spectrum of models, USI maintains its advantage through KD alone, without pretraining on external data or transferring knowledge from larger datasets.
Insights and Implications
The USI methodology embodies a significant stride towards democratizing model training on large-scale datasets like ImageNet. By eliminating model-specific tuning, it makes high-performance training accessible and practical, even in resource-constrained settings. Moreover, USI facilitates a fair and methodical comparison of backbones, leveraging the speed-accuracy Pareto frontier to determine the optimal architecture for a given compute budget.
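As a simple illustration of such a comparison, the sketch below filters a list of (model, throughput, top-1) entries down to those not dominated on both axes; the entries are invented placeholders, not results from the paper.

```python
def pareto_frontier(models):
    """Keep models not dominated on (throughput, accuracy).
    `models` holds (name, imgs_per_sec, top1) tuples."""
    frontier, best_acc = [], float("-inf")
    # Scan from fastest to slowest; keep a model only if it is more
    # accurate than every faster model already seen
    for name, speed, acc in sorted(models, key=lambda m: -m[1]):
        if acc > best_acc:
            frontier.append((name, speed, acc))
            best_acc = acc
    return frontier

# Placeholder numbers for illustration only
print(pareto_frontier([("A", 1200, 78.0), ("B", 900, 81.0), ("C", 800, 80.5)]))
# -> [('A', 1200, 78.0), ('B', 900, 81.0)]; C is slower and less accurate than B
```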
This research suggests that knowledge distillation, when correctly harnessed, is a robust mechanism for optimizing learning dynamics in deep networks. It also points to scalability and applicability beyond ImageNet, where KD may underpin training schemes in other AI domains.
Future Directions
While USI is validated primarily on ImageNet, extrapolating its principles to other datasets, particularly those requiring transfer learning, could further demonstrate its versatility. Future investigations might explore extending USI's KD framework to unsupervised or semi-supervised settings, reinforcing its value as a broader training paradigm.
The introduction of USI advocates for a paradigm shift in model training: from painstaking, bespoke strategy formulation to streamlined, uniform procedures that serve heterogeneous model landscapes efficiently and effectively.