MEAL V2: Boosting Vanilla ResNet-50 to 80%+ Top-1 Accuracy on ImageNet without Tricks (2009.08453v2)

Published 17 Sep 2020 in cs.CV, cs.AI, and cs.LG

Abstract: We introduce a simple yet effective distillation framework that is able to boost the vanilla ResNet-50 to 80%+ Top-1 accuracy on ImageNet without tricks. We construct such a framework through analyzing the problems in the existing classification system and simplify the base method ensemble knowledge distillation via discriminators by: (1) adopting the similarity loss and discriminator only on the final outputs and (2) using the average of softmax probabilities from all teacher ensembles as the stronger supervision. Intriguingly, three novel perspectives are presented for distillation: (1) weight decay can be weakened or even completely removed since the soft label also has a regularization effect; (2) using a good initialization for students is critical; and (3) one-hot/hard label is not necessary in the distillation process if the weights are well initialized. We show that such a straight-forward framework can achieve state-of-the-art results without involving any commonly-used techniques, such as architecture modification; outside training data beyond ImageNet; autoaug/randaug; cosine learning rate; mixup/cutmix training; label smoothing; etc. Our method obtains 80.67% top-1 accuracy on ImageNet using a single crop-size of 224x224 with vanilla ResNet-50, outperforming the previous state-of-the-arts by a significant margin under the same network structure. Our result can be regarded as a strong baseline using knowledge distillation, and to our best knowledge, this is also the first method that is able to boost vanilla ResNet-50 to surpass 80% on ImageNet without architecture modification or additional training data. On smaller ResNet-18, our distillation framework consistently improves from 69.76% to 73.19%, which shows tremendous practical values in real-world applications. Our code and models are available at: https://github.com/szq0214/MEAL-V2.

Citations (61)

Summary

  • The paper introduces a streamlined knowledge distillation method that boosts ResNet-50 to over 80% Top-1 accuracy on ImageNet without extra training tricks.
  • It leverages similarity loss and averaged soft labels from teacher ensembles to simplify the distillation process while significantly improving performance.
  • The approach emphasizes the importance of proper initialization and weight decay adjustments, providing practical insights for optimizing standard architectures.

Overview of MEAL V2: Enhancing ResNet-50's Accuracy on ImageNet

The paper, titled "MEAL V2: Boosting Vanilla ResNet-50 to 80%+ Top-1 Accuracy on ImageNet without Tricks," presents a streamlined approach to knowledge distillation that significantly enhances the performance of standard architectures on large-scale datasets such as ImageNet. By leveraging a teacher-student paradigm, the authors raise the accuracy of a vanilla ResNet-50 without resorting to popular but intricate training techniques.

Key Contributions

The authors introduce a simplified distillation framework that pushes ResNet-50's Top-1 accuracy beyond 80% on ImageNet. This advancement is achieved through several strategic choices:

  1. Simplicity and Effectiveness: The framework simplifies ensemble knowledge distillation by applying the similarity loss and a discriminator only to the final outputs, and by using the averaged softmax probabilities from the teacher ensemble as the supervision signal (a minimal sketch of this recipe follows this list). The method abstains from common enhancements such as architecture modifications or additional training data.
  2. Novel Distillation Perspectives:
    • Weight Decay: Suggesting that weight decay can be weakened or even removed, since the soft labels already provide a regularization effect.
    • Initialization: Highlighting the importance of a well-initialized student.
    • Soft Labels: Demonstrating the non-necessity of one-hot labels if student weights are appropriately initialized.
  3. Strong Numerical Results:
    • The framework achieves 80.67% Top-1 accuracy using a single 224×224 crop, surpassing prior state-of-the-art results by a clear margin without any architectural changes.
    • On the smaller ResNet-18, the framework improves Top-1 accuracy from 69.76% to 73.19%.
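
To make the recipe in item 1 concrete, below is a minimal PyTorch sketch of the ideas summarized above: averaged teacher soft labels, a KL similarity loss and a small discriminator applied only to the final outputs, a student initialized from pretrained weights, and an optimizer with weight decay removed. The specific teacher models, discriminator architecture, and hyperparameters are illustrative assumptions, not the authors' implementation; see the official repository for that.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision.models import resnet18, resnet50

NUM_CLASSES = 1000

# Student: vanilla ResNet-50 starting from a good (pretrained) initialization,
# which the paper identifies as critical.
student = resnet50(weights="IMAGENET1K_V1")

# Teacher ensemble (placeholders; the paper uses stronger pretrained teachers).
teachers = [resnet50(weights="IMAGENET1K_V2"), resnet18(weights="IMAGENET1K_V1")]
for t in teachers:
    t.eval()
    for p in t.parameters():
        p.requires_grad_(False)

# Small discriminator applied only to the final softmax outputs
# (the MLP shape here is an assumption for illustration).
discriminator = nn.Sequential(
    nn.Linear(NUM_CLASSES, 128),
    nn.ReLU(inplace=True),
    nn.Linear(128, 1),
)

# Weight decay removed, since the soft labels already act as a regularizer.
opt_s = torch.optim.SGD(student.parameters(), lr=0.01, momentum=0.9, weight_decay=0.0)
opt_d = torch.optim.SGD(discriminator.parameters(), lr=0.01, momentum=0.9)
bce = nn.BCEWithLogitsLoss()


def distill_step(images: torch.Tensor) -> float:
    """One optimization step: KL loss against the averaged teacher softmax plus
    an adversarial term from the discriminator. No one-hot labels are used."""
    with torch.no_grad():
        # Average of softmax probabilities across the teacher ensemble.
        teacher_probs = torch.stack(
            [F.softmax(t(images), dim=1) for t in teachers]
        ).mean(dim=0)

    student_log_probs = F.log_softmax(student(images), dim=1)
    student_probs = student_log_probs.exp()

    # Update the discriminator: distinguish teacher from (detached) student outputs.
    d_real = discriminator(teacher_probs)
    d_fake = discriminator(student_probs.detach())
    loss_d = bce(d_real, torch.ones_like(d_real)) + bce(d_fake, torch.zeros_like(d_fake))
    opt_d.zero_grad()
    loss_d.backward()
    opt_d.step()

    # Update the student: similarity (KL) loss plus fooling the discriminator.
    loss_kl = F.kl_div(student_log_probs, teacher_probs, reduction="batchmean")
    d_student = discriminator(student_probs)
    loss_adv = bce(d_student, torch.ones_like(d_student))
    loss = loss_kl + loss_adv
    opt_s.zero_grad()
    loss.backward()
    opt_s.step()
    return loss.item()


if __name__ == "__main__":
    dummy_batch = torch.randn(4, 3, 224, 224)  # stand-in for an ImageNet batch
    print(distill_step(dummy_batch))
```

Note that the step above omits any one-hot cross-entropy term, mirroring the paper's observation that hard labels are unnecessary once the student is well initialized.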

Implications

Theoretical Implications: The proposed method offers insights into the role of knowledge distillation in enhancing neural network performance, addressing common training pitfalls such as the inability of fixed one-hot labels to capture relationships between semantically similar classes.

Practical Implications: By improving vanilla architectures without increasing their complexity, the method is applicable in real-world scenarios where computational resources are constrained and model simplicity is preferred.

Future Developments in AI

Given the promising results of MEAL V2, future work could explore:

  • Extending Framework Capabilities: Investigating the framework's adaptability to other architectures and application domains.
  • Further Optimization: Exploring other regularization techniques that complement or replace existing components.
  • Scalability: Evaluating the performance of the proposed framework on even larger datasets and more complex classification tasks.

In conclusion, MEAL V2 offers a robust and efficient approach to model improvement through knowledge distillation, removing the necessity for complex tricks. This contribution not only advances the state-of-the-art but also reinforces the potential of knowledge distillation in achieving superior performance on constrained architectures, paving the way for its application in diverse AI systems.
