Distribution Alignment: A Unified Framework for Long-tail Visual Recognition (2103.16370v1)

Published 30 Mar 2021 in cs.CV, cs.AI, and cs.LG

Abstract: Despite the recent success of deep neural networks, it remains challenging to effectively model the long-tail class distribution in visual recognition tasks. To address this problem, we first investigate the performance bottleneck of the two-stage learning framework via ablative study. Motivated by our discovery, we propose a unified distribution alignment strategy for long-tail visual recognition. Specifically, we develop an adaptive calibration function that enables us to adjust the classification scores for each data point. We then introduce a generalized re-weight method in the two-stage learning to balance the class prior, which provides a flexible and unified solution to diverse scenarios in visual recognition tasks. We validate our method by extensive experiments on four tasks, including image classification, semantic segmentation, object detection, and instance segmentation. Our approach achieves the state-of-the-art results across all four recognition tasks with a simple and unified framework. The code and models will be made publicly available at: https://github.com/Megvii-BaseDetection/DisAlign

Citations (265)

Summary

  • The paper's main contribution is a two-stage framework that aligns class distributions to improve long-tail visual recognition.
  • It leverages an adaptive calibration function and generalized re-weighting to boost accuracy for under-represented tail classes.
  • Evaluations on ImageNet-LT, iNaturalist 2018, ADE20k, and LVIS demonstrate state-of-the-art results with reduced hyper-parameter tuning.

Summary of "Distribution Alignment: A Unified Framework for Long-tail Visual Recognition"

This paper addresses the pervasive challenge of long-tail class distributions in visual recognition tasks. While deep neural networks have driven significant advances in computer vision, they struggle with datasets whose class frequencies follow a long-tail distribution, where a few classes have many examples (head classes) and many classes have few examples (tail classes). To tackle this problem, the authors propose a unified framework based on a distribution alignment strategy that calibrates classification scores.

The proposed approach is built on a two-stage learning framework. The first stage learns a robust representation from the imbalanced data using instance-balanced sampling. The second stage calibrates the classifier so that its prediction distribution aligns with a reference distribution compensating for the imbalanced class frequencies. This alignment is achieved with an adaptive calibration function and a generalized re-weighting strategy: the calibration function adjusts each class's scores with learnable parameters conditioned on the input features, while the generalized re-weighting uses class-frequency priors to re-balance learning in favor of under-represented categories.
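As a rough illustration of the calibration step, the sketch below shows one way such an adaptive calibration head could be wired in PyTorch-style Python: per-class scale and offset parameters adjust the classifier's logits, and an input-conditioned confidence gate blends the calibrated and original scores. The module name, tensor shapes, and the exact blending form are assumptions made for illustration, not the authors' released implementation (see the linked DisAlign repository for that).

```python
import torch
import torch.nn as nn


class AdaptiveCalibration(nn.Module):
    """Illustrative score-calibration head (hypothetical names and shapes).

    Re-scales and shifts the classifier's logits with per-class learnable
    parameters, gated by an input-dependent confidence so the adjustment
    can vary per data point.
    """

    def __init__(self, feat_dim: int, num_classes: int):
        super().__init__()
        # Per-class scale and offset applied to the original logits.
        self.scale = nn.Parameter(torch.ones(num_classes))
        self.offset = nn.Parameter(torch.zeros(num_classes))
        # Input-conditioned gate deciding how much calibration to apply.
        self.confidence = nn.Linear(feat_dim, 1)

    def forward(self, features: torch.Tensor, logits: torch.Tensor) -> torch.Tensor:
        sigma = torch.sigmoid(self.confidence(features))   # (B, 1), in [0, 1]
        calibrated = self.scale * logits + self.offset     # (B, C)
        return sigma * calibrated + (1.0 - sigma) * logits  # (B, C)
```

In the two-stage setting described above, only parameters of this kind would be trained in the second stage, with the representation learned in the first stage kept fixed.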

The performance of this method was evaluated on several standard benchmarks: image classification on ImageNet-LT, iNaturalist 2018, and Places-LT; semantic segmentation on the ADE20k dataset; and both object detection and instance segmentation on the LVIS dataset. Across all these tasks, the proposed framework consistently surpassed prior work, achieving state-of-the-art results. Notably, this was accomplished with a simple, unified framework that leaves the feature-extraction backbone of standard deep architectures unchanged.

Key findings include a significant improvement in classification accuracy for under-represented tail classes while maintaining performance on head classes. This balance is crucial for real-world applications where large variations in data distributions are common, such as in biodiversity studies or large-scale scene recognition.

The results also highlighted the practical implications of the research. Firstly, the generalized re-weighting strategy can be easily adapted to diverse visual recognition tasks, simplifying the process of training on long-tail datasets. Secondly, the proposed distribution alignment approach can effectively reduce the efforts required for hyper-parameter tuning, which is frequently a bottleneck in optimizing deep learning models for varied data distributions.
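To make the re-weighting idea concrete, the following is a minimal sketch of a frequency-based class re-weighting of the cross-entropy loss, with an exponent rho that interpolates between no re-weighting (rho = 0) and full inverse-frequency weighting (rho = 1). The function name and the exact weighting and normalization are illustrative assumptions; the paper's generalized re-weight may use a different parameterization.

```python
import torch
import torch.nn.functional as F


def generalized_reweight_loss(logits: torch.Tensor,
                              targets: torch.Tensor,
                              class_counts: torch.Tensor,
                              rho: float = 1.0) -> torch.Tensor:
    """Class-prior-weighted cross-entropy (illustrative form only).

    class_counts holds the number of training samples per class.
    rho = 0 recovers plain cross-entropy; rho = 1 weights each class
    by its inverse empirical frequency.
    """
    freq = class_counts.float() / class_counts.sum()
    weights = freq.pow(-rho)            # rarer classes get larger weights
    weights = weights / weights.mean()  # keep the overall loss scale comparable
    return F.cross_entropy(logits, targets, weight=weights)
```

Because the weighting depends only on class counts, the same loss can be dropped into classification, segmentation, or detection heads, which is consistent with the paper's claim that the strategy transfers across tasks with little tuning.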

In conclusion, the paper presents a robust and unified methodological advancement for long-tail learning in vision tasks through distribution alignment. It lays the groundwork for future research in AI by demonstrating an effective strategy to handle the natural imbalance in visual categories. Future work could further integrate this framework with emerging architectures or explore its applicability to other domains such as natural language processing, where class imbalance is also prevalent.