
Learning From Multiple Experts: Self-paced Knowledge Distillation for Long-tailed Classification (2001.01536v3)

Published 6 Jan 2020 in cs.CV, cs.LG, and stat.ML

Abstract: In real-world scenarios, data tends to exhibit a long-tailed distribution, which increases the difficulty of training deep networks. In this paper, we propose a novel self-paced knowledge distillation framework, termed Learning From Multiple Experts (LFME). Our method is inspired by the observation that networks trained on less imbalanced subsets of the distribution often yield better performances than their jointly-trained counterparts. We refer to these models as 'Experts', and the proposed LFME framework aggregates the knowledge from multiple 'Experts' to learn a unified student model. Specifically, the proposed framework involves two levels of adaptive learning schedules: Self-paced Expert Selection and Curriculum Instance Selection, so that the knowledge is adaptively transferred to the 'Student'. We conduct extensive experiments and demonstrate that our method is able to achieve superior performances compared to state-of-the-art methods. We also show that our method can be easily plugged into state-of-the-art long-tailed classification algorithms for further improvements.

Authors (3)
  1. Liuyu Xiang (18 papers)
  2. Guiguang Ding (79 papers)
  3. Jungong Han (111 papers)
Citations (260)

Summary

Learning From Multiple Experts: Self-paced Knowledge Distillation for Long-tailed Classification

This paper addresses a significant challenge in deep learning, particularly in dealing with long-tailed distributions typical in real-world datasets. The authors propose a novel framework termed Learning From Multiple Experts (LFME) to improve classification performance in these contexts. The LFME framework builds on the observation that neural networks trained on less imbalanced subsets demonstrate enhanced performance compared to those trained on the full distribution. This observation leads to the central hypothesis that knowledge distilled from models trained on such subsets can construct a more robust unified model.

The proposed method employs a two-level adaptive learning framework, comprising Self-paced Expert Selection and Curriculum Instance Selection, to transfer knowledge effectively from individually trained expert models to a unified student model. The paper introduces the concept of "Experts": models trained on subsets of the data that are less affected by distribution imbalance. These subsets, referred to as cardinality-adjacent subsets, are formed by segmenting the full long-tailed dataset so that classes with similar sample counts are grouped together. Because each subset's "longtailness" is much lower than that of the whole distribution, each expert trains under far more favorable conditions. The unified student model is then refined through dual adaptive scheduling: model-level knowledge distillation and instance-level curriculum learning.
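The subset construction can be sketched as grouping classes into bands of similar training-set cardinality. This is a minimal illustration, not the paper's exact procedure: the band thresholds below are hypothetical placeholders, and the paper derives its splits from its own "longtailness" metrics.

```python
from collections import Counter

def cardinality_adjacent_subsets(labels, thresholds=(100, 20)):
    """Group classes into cardinality-adjacent subsets.

    Classes whose training-set size falls into the same band
    (here: >100, 21-100, or <=20 samples) are grouped together,
    so each subset is far less imbalanced than the full
    distribution. `thresholds` is illustrative only.
    """
    counts = Counter(labels)
    bands = [[] for _ in range(len(thresholds) + 1)]
    for cls, n in counts.items():
        for i, t in enumerate(thresholds):
            if n > t:
                bands[i].append(cls)
                break
        else:
            # fewer samples than the smallest threshold: tail band
            bands[-1].append(cls)
    return bands
```

One expert network would then be trained per band; within a band the class frequencies are comparable, which is the observation motivating the framework.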

The model-level self-paced expert selection moderates the influence of each expert's knowledge transfer, dynamically adjusting the distillation weight according to the student's validation performance on that expert's subset. This ensures that the student model does not merely mimic the experts but can ultimately surpass their individual performances. At the instance level, training data is sorted and presented in order of increasing difficulty, forming a curriculum that eases the transition to harder classification instances.
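The model-level schedule can be sketched as a per-expert weight that stays high while the student lags behind the expert and decays as the student catches up. This is a hedged sketch of one plausible rule, not the paper's exact formula; the `warmup` fraction is an assumed illustrative knob.

```python
def expert_weight(student_acc, expert_acc, warmup=0.6):
    """Self-paced weight on one expert's distillation loss term.

    Full weight (1.0) while the student's validation accuracy on
    the expert's subset is well below the expert's; linear decay
    to zero as the student approaches the expert. `warmup` is the
    fraction of expert accuracy at which decay starts -- an
    illustrative assumption, not the paper's exact schedule.
    """
    if expert_acc <= 0:
        return 0.0
    ratio = student_acc / expert_acc
    if ratio <= warmup:
        return 1.0      # student still far behind: distill fully
    if ratio >= 1.0:
        return 0.0      # student has matched the expert: stop distilling
    return (1.0 - ratio) / (1.0 - warmup)  # linear decay in between
```

The total training loss would then combine the ordinary classification loss with a sum of such weighted distillation terms, one per expert, so that knowledge transfer fades out exactly where the student no longer needs it.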

Experiments on the benchmark long-tailed datasets ImageNet-LT, Places-LT, and CIFAR100-LT show that LFME achieves strong results, surpassing several state-of-the-art methods. Notably, it improves many-shot class performance while remaining competitive on medium-shot and few-shot classes. Furthermore, the LFME framework can be plugged into pre-existing state-of-the-art methods, offering further improvement without significant overhead.

The implications of this research are noteworthy both in theoretical and practical terms. Theoretically, it introduces a paradigm shift in knowledge distillation strategies, advocating for selective, performance-aware distillation over straightforward mimicking. Practically, this approach effectively bridges the disparity across classes in long-tailed data settings, which has broad applicability across various domains where balanced data collection remains a challenge.

Future research could explore extending the principles of LFME to other domains and architectures, improving expert model selection criteria, and further optimizing student model learning schedules to accommodate more diverse datasets and tasks. The insights from this framework promise to enhance model robustness and adaptability, crucial factors for broader AI system deployment in real-world applications.