Disentangling Label Distribution for Long-tailed Visual Recognition (2012.00321v2)

Published 1 Dec 2020 in cs.CV and cs.LG

Abstract: The current evaluation protocol of long-tailed visual recognition trains the classification model on the long-tailed source label distribution and evaluates its performance on the uniform target label distribution. Such protocol has questionable practicality since the target may also be long-tailed. Therefore, we formulate long-tailed visual recognition as a label shift problem where the target and source label distributions are different. One of the significant hurdles in dealing with the label shift problem is the entanglement between the source label distribution and the model prediction. In this paper, we focus on disentangling the source label distribution from the model prediction. We first introduce a simple but overlooked baseline method that matches the target label distribution by post-processing the model prediction trained by the cross-entropy loss and the Softmax function. Although this method surpasses state-of-the-art methods on benchmark datasets, it can be further improved by directly disentangling the source label distribution from the model prediction in the training phase. Thus, we propose a novel method, LAbel distribution DisEntangling (LADE) loss based on the optimal bound of Donsker-Varadhan representation. LADE achieves state-of-the-art performance on benchmark datasets such as CIFAR-100-LT, Places-LT, ImageNet-LT, and iNaturalist 2018. Moreover, LADE outperforms existing methods on various shifted target label distributions, showing the general adaptability of our proposed method.

Authors (6)

Youngkyu Hong
Seungju Han
Kwanghee Choi
Seokjun Seo
Beomsu Kim
Buru Chang

Summary

The paper introduces a novel framework for long-tailed visual recognition by framing it as a label shift problem.
It proposes a baseline method (PC Softmax) and a new LADE loss that outperform existing approaches on major benchmarks.
The approach enhances model adaptability to diverse, imbalanced data distributions, paving the way for realistic evaluations and future research.

Disentangling Label Distribution for Long-tailed Visual Recognition

The paper "Disentangling Label Distribution for Long-tailed Visual Recognition" addresses long-tailed label distributions in visual recognition. The standard evaluation protocol trains on a long-tailed source label distribution and evaluates on a uniform target label distribution, which has questionable practicality because the target distribution may itself be long-tailed. The paper therefore frames long-tailed recognition as a label shift problem, in which the source and target label distributions differ.
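
The label shift framing can be made concrete with Bayes' rule. The following is a sketch under the standard label shift assumption that the class-conditional distribution of inputs is shared between source and target, so that only the label prior changes. The symbols p_s, p_t, x, y, and c are notation chosen for this illustration and are not taken from the paper.

```latex
% Assumption (standard label shift): p_s(x \mid y) = p_t(x \mid y), while p_s(y) \neq p_t(y).
\begin{align}
p_t(y \mid x)
  &= \frac{p_t(x \mid y)\, p_t(y)}{p_t(x)}
   \;\propto\; p_s(x \mid y)\, p_t(y) \\
  &= \frac{p_s(y \mid x)\, p_s(x)}{p_s(y)}\, p_t(y)
   \;\propto\; p_s(y \mid x)\, \frac{p_t(y)}{p_s(y)} .
\end{align}
% Normalizing over all classes c gives the target posterior:
\begin{equation}
p_t(y \mid x)
  = \frac{p_s(y \mid x)\, p_t(y) / p_s(y)}
         {\sum_{c} p_s(c \mid x)\, p_t(c) / p_s(c)} .
\end{equation}
```

Read this way, a classifier trained on the source data estimates p_s(y | x), which carries the source label prior inside it. That is the entanglement between the source label distribution and the model prediction that the paper sets out to remove.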

Core Contributions

New Problem Formulation: The paper introduces a novel perspective by treating long-tailed visual recognition as a label shift problem, moving away from the traditional assumption of uniform target distributions. This reconceptualization acknowledges that real-world data exhibits diverse distribution patterns that models must adapt to.
Baseline Method - PC Softmax: The paper proposes Post-Compensated (PC) Softmax, a simple but overlooked baseline. It post-processes the predictions of a model trained with the cross-entropy loss and the Softmax function so that they match the target label distribution. Despite its simplicity, it surpasses state-of-the-art methods on benchmark datasets.
Novel LADE Loss: The central contribution is the LAbel distribution DisEntangling (LADE) loss, designed to disentangle the source label distribution from the model prediction during training rather than after it. It is based on the optimal bound of the Donsker-Varadhan representation, and it achieves state-of-the-art performance on CIFAR-100-LT, Places-LT, ImageNet-LT, and iNaturalist 2018.
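
The post-processing idea behind PC Softmax can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it assumes the adjustment subtracts the log source prior from each logit and adds the log target prior before applying Softmax. The function name `pc_softmax`, its parameters, and the three-class priors are hypothetical and chosen only for this example.

```python
import math


def pc_softmax(logits, source_prior, target_prior):
    """Adjust logits for label shift, then apply a numerically stable softmax.

    Assumed adjustment (illustrative): logit - log p_s(y) + log p_t(y).
    All three arguments are equal-length sequences with positive priors.
    """
    adjusted = [
        z - math.log(ps) + math.log(pt)
        for z, ps, pt in zip(logits, source_prior, target_prior)
    ]
    # Subtract the maximum before exponentiating to avoid overflow.
    peak = max(adjusted)
    exps = [math.exp(a - peak) for a in adjusted]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical 3-class setup: long-tailed source prior, uniform target prior.
source_prior = [0.7, 0.2, 0.1]
target_prior = [1 / 3, 1 / 3, 1 / 3]

# A model that has learned nothing but the source prior outputs its log as logits.
logits = [math.log(p) for p in source_prior]

# The adjustment cancels the source prior, leaving only the target prior.
probs = pc_softmax(logits, source_prior, target_prior)
```

In this toy case the prediction collapses to the uniform target prior, because the logits carried no evidence beyond the source prior. When the source and target priors are equal, the adjustment cancels and the function reduces to an ordinary Softmax.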

Key Results

According to the paper, LADE achieves state-of-the-art performance on CIFAR-100-LT, Places-LT, ImageNet-LT, and iNaturalist 2018, which span small-scale and large-scale datasets. It also outperforms existing methods on various shifted target label distributions, not only on the uniform target assumed by the conventional protocol. These results indicate that the method adapts to target distributions beyond the uniform case.

Implications and Future Directions

The implications of this work are two-fold. Practically, the methods outlined provide a potent framework for developing models that perform robustly under long-tailed conditions, a common characteristic of real-world data. Theoretically, the disentanglement strategy opens new avenues for exploring label shifts and adapting models dynamically based on distributional shifts.

Future research might explore extensions to other domains, such as object detection and segmentation, where data imbalance remains a critical challenge. Additionally, integrating this methodology with generative approaches could further enhance adaptability to unseen data distributions.

Conclusion

With PC Softmax and LADE, the paper addresses long-tailed label distributions in visual recognition by treating them as a label shift problem. This supports more realistic model evaluation, in which the target label distribution need not be uniform and may resemble the skewed distributions found in natural data. LADE's grounding in the Donsker-Varadhan representation gives the method a theoretical basis for separating the source label distribution from the model prediction.