Learning From Multiple Experts: Self-paced Knowledge Distillation for Long-tailed Classification
This paper addresses a significant challenge in deep learning: classification over the long-tailed class distributions typical of real-world datasets. The authors propose a novel framework termed Learning From Multiple Experts (LFME) to improve classification performance in these settings. The LFME framework builds on the observation that neural networks trained on less imbalanced subsets of the data outperform those trained on the full distribution. This observation leads to the central hypothesis that knowledge distilled from models trained on such subsets can be used to construct a more robust unified model.
The proposed method employs a two-level adaptive learning framework, comprising Self-paced Expert Selection and Curriculum Instance Selection, to transfer knowledge effectively from individually trained expert models to a unified student model. The paper introduces "Experts": models trained on subsets of the data that suffer less from distribution imbalance. These subsets, referred to as cardinality-adjacent subsets, are formed by segmenting the full long-tailed dataset so that classes with similar sample counts are grouped together. The segmentation utilizes four metrics to quantify and reduce "longtailness," yielding more favorable training conditions than the full distribution affords. The unified student model is then refined through dual adaptive scheduling: model-level knowledge distillation and instance-level curriculum learning.
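The grouping of classes into cardinality-adjacent subsets can be sketched as follows. This is a minimal illustration: the three-way split and the count thresholds are assumptions for the example, not values taken from the paper.

```python
from collections import Counter

def cardinality_adjacent_subsets(labels, thresholds=(100, 20)):
    """Group classes of a long-tailed dataset by sample cardinality.

    labels     : iterable of integer class labels for the whole training set.
    thresholds : (hi, lo) cut-offs separating many-, medium-, and few-shot
                 classes (hypothetical values, chosen for illustration).
    Returns a list of class-index sets, ordered from head to tail; each
    subset is internally far less imbalanced than the full label set.
    """
    counts = Counter(labels)
    hi, lo = thresholds
    many = {c for c, n in counts.items() if n >= hi}
    medium = {c for c, n in counts.items() if lo <= n < hi}
    few = {c for c, n in counts.items() if n < lo}
    return [many, medium, few]

# Toy long-tailed label list: class 0 has 150 samples, class 1 has 30, class 2 has 5
labels = [0] * 150 + [1] * 30 + [2] * 5
print(cardinality_adjacent_subsets(labels))  # [{0}, {1}, {2}]
```

One expert model would then be trained on the examples of each subset before distillation into the student.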
The model-level self-paced expert selection moderates the knowledge transferred from each expert, dynamically adjusting the distillation weight according to the student's validation performance relative to that expert. This ensures that the student model does not merely mimic the experts but can ultimately surpass their individual performances. At the instance level, training data are sorted and presented in order of increasing difficulty, forming a curriculum that eases the transition to harder examples.
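The model-level scheduling described above can be illustrated with a small sketch. The specific weighting schedule (a clipped accuracy ratio) and the function names here are illustrative assumptions, not the paper's exact formulation; the point is only that the distillation term shrinks as the student catches up to the expert.

```python
import numpy as np

def softmax(z, T=1.0):
    """Temperature-scaled softmax over the last axis."""
    z = np.asarray(z, dtype=float) / T
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def self_paced_kd_loss(student_logits, expert_logits, targets,
                       student_val_acc, expert_val_acc, T=2.0):
    """Cross-entropy plus a self-paced distillation term (illustrative sketch).

    The weight w falls from 1 (student far behind the expert on validation
    accuracy) to 0 (student has matched or surpassed it), so the student
    stops mimicking experts it no longer needs.
    """
    w = max(0.0, 1.0 - student_val_acc / max(expert_val_acc, 1e-8))
    # Standard cross-entropy on the ground-truth labels
    probs = softmax(student_logits)
    ce = -np.mean(np.log(probs[np.arange(len(targets)), targets] + 1e-12))
    # KL(expert || student) on temperature-softened outputs, scaled by T^2
    p_s = softmax(student_logits, T)
    p_e = softmax(expert_logits, T)
    kd = np.mean(np.sum(p_e * (np.log(p_e + 1e-12) - np.log(p_s + 1e-12)),
                        axis=1)) * T * T
    return ce + w * kd
```

With this schedule, a student that has reached the expert's validation accuracy trains on the cross-entropy term alone, which is what allows it to exceed the expert rather than converge to it.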
Experiments on the benchmark long-tailed datasets ImageNet-LT, Places-LT, and CIFAR100-LT show that LFME achieves strong performance, surpassing several state-of-the-art methods. Notably, it improves many-shot class accuracy while remaining competitive on medium- and few-shot classes. Furthermore, the LFME framework integrates readily with pre-existing state-of-the-art methods, offering additional gains without significant overhead.
The implications of this research are noteworthy in both theoretical and practical terms. Theoretically, it advocates selective, performance-aware distillation over straightforward mimicking. Practically, the approach narrows the performance disparity across classes in long-tailed settings, with broad applicability to domains where balanced data collection remains a challenge.
Future research could explore extending the principles of LFME to other domains and architectures, improving expert model selection criteria, and further optimizing student model learning schedules to accommodate more diverse datasets and tasks. The insights from this framework promise to enhance model robustness and adaptability, crucial factors for broader AI system deployment in real-world applications.