Learning From Multiple Experts: Self-paced Knowledge Distillation for Long-tailed Classification
This paper addresses a significant challenge in deep learning: classification over the long-tailed class distributions typical of real-world datasets. The authors propose a novel framework termed Learning From Multiple Experts (LFME) to improve classification performance in these settings. The LFME framework builds on the observation that neural networks trained on less imbalanced subsets of the data outperform those trained on the full distribution. This observation leads to the central hypothesis that knowledge distilled from models trained on such subsets can be used to construct a more robust unified model.
The proposed method employs a two-level adaptive learning framework, comprising Self-paced Expert Selection and Curriculum Instance Selection, to transfer knowledge effectively from individually trained expert models to a unified student model. The paper introduces "Experts": models trained on subsets of the data that suffer less from distribution imbalance. These subsets, referred to as cardinality-adjacent subsets, are formed by segmenting the full long-tailed dataset so that classes with similar sample counts are grouped together. The segmentation utilizes four metrics to quantify and reduce "longtailness," yielding more favorable training conditions than the full distribution affords. The unified student model is then refined through dual adaptive scheduling: model-level knowledge distillation and instance-level curriculum learning.
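The grouping of classes into cardinality-adjacent subsets can be sketched as follows. This is a minimal illustration: the three-way split and the count thresholds are assumptions for the example, not values taken from the paper.

```python
from collections import Counter

def cardinality_adjacent_subsets(labels, thresholds=(100, 20)):
    """Group classes of a long-tailed dataset by sample cardinality.

    labels     : iterable of integer class labels for the whole training set.
    thresholds : (hi, lo) cut-offs separating many-, medium-, and few-shot
                 classes (hypothetical values, chosen for illustration).
    Returns a list of class-index sets, ordered from head to tail; each
    subset is internally far less imbalanced than the full label set.
    """
    counts = Counter(labels)
    hi, lo = thresholds
    many = {c for c, n in counts.items() if n >= hi}
    medium = {c for c, n in counts.items() if lo <= n < hi}
    few = {c for c, n in counts.items() if n < lo}
    return [many, medium, few]

# Toy long-tailed label list: class 0 has 150 samples, class 1 has 30, class 2 has 5
labels = [0] * 150 + [1] * 30 + [2] * 5
print(cardinality_adjacent_subsets(labels))  # [{0}, {1}, {2}]
```

One expert model would then be trained on the examples of each subset before distillation into the student.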
The model-level self-paced expert selection moderates the knowledge transferred from each expert, dynamically adjusting the distillation weight according to the student's validation performance relative to that expert. This ensures that the student model does not merely mimic the experts but can ultimately surpass their individual performances. At the instance level, training data are sorted and presented in order of increasing difficulty, forming a curriculum that eases the transition to harder examples.
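The model-level scheduling described above can be illustrated with a small sketch. The specific weighting schedule (a clipped accuracy ratio) and the function names here are illustrative assumptions, not the paper's exact formulation; the point is only that the distillation term shrinks as the student catches up to the expert.

```python
import numpy as np

def softmax(z, T=1.0):
    """Temperature-scaled softmax over the last axis."""
    z = np.asarray(z, dtype=float) / T
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def self_paced_kd_loss(student_logits, expert_logits, targets,
                       student_val_acc, expert_val_acc, T=2.0):
    """Cross-entropy plus a self-paced distillation term (illustrative sketch).

    The weight w falls from 1 (student far behind the expert on validation
    accuracy) to 0 (student has matched or surpassed it), so the student
    stops mimicking experts it no longer needs.
    """
    w = max(0.0, 1.0 - student_val_acc / max(expert_val_acc, 1e-8))
    # Standard cross-entropy on the ground-truth labels
    probs = softmax(student_logits)
    ce = -np.mean(np.log(probs[np.arange(len(targets)), targets] + 1e-12))
    # KL(expert || student) on temperature-softened outputs, scaled by T^2
    p_s = softmax(student_logits, T)
    p_e = softmax(expert_logits, T)
    kd = np.mean(np.sum(p_e * (np.log(p_e + 1e-12) - np.log(p_s + 1e-12)),
                        axis=1)) * T * T
    return ce + w * kd
```

With this schedule, a student that has reached the expert's validation accuracy trains on the cross-entropy term alone, which is what allows it to exceed the expert rather than converge to it.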
Experiments on the benchmark long-tailed datasets ImageNet-LT, Places-LT, and CIFAR100-LT show that LFME achieves strong performance, surpassing several state-of-the-art methods. Notably, it improves many-shot class accuracy while remaining competitive on medium- and few-shot classes. Furthermore, the LFME framework integrates readily with pre-existing state-of-the-art methods, offering additional gains without significant overhead.
The implications of this research are noteworthy in both theoretical and practical terms. Theoretically, it advocates selective, performance-aware distillation over straightforward mimicking. Practically, the approach narrows the performance disparity across classes in long-tailed settings, with broad applicability to domains where balanced data collection remains a challenge.
Future research could explore extending the principles of LFME to other domains and architectures, improving expert model selection criteria, and further optimizing student model learning schedules to accommodate more diverse datasets and tasks. The insights from this framework promise to enhance model robustness and adaptability, crucial factors for broader AI system deployment in real-world applications.