- The paper presents equalization losses, a family of loss functions that rebalance positive and negative gradients during training to tackle long-tailed object recognition.
- It details three loss functions—Sigmoid-EQL, Softmax-EQL, and Equalized Focal Loss—each tailored for different visual recognition tasks.
- Experimental results demonstrate significant performance improvements on datasets like LVIS and ImageNet-LT, validating the effectiveness of the gradient-driven approach.
The Equalization Losses: Gradient-Driven Training for Long-tailed Object Recognition
This paper addresses the challenge of long-tailed distributions in object recognition with an approach rooted in correcting gradient imbalances. The authors propose a family of loss functions, termed "equalization losses," that dynamically rebalance positive and negative gradients during training, ultimately improving the recognition of tail categories.
Key Contributions
The primary contribution is a set of gradient-driven loss functions that address the disparity between the positive and negative gradients each category accumulates during training. The approach is instantiated in three loss functions:
- Sigmoid Equalization Loss (Sigmoid-EQL): Introduces a gradient-driven re-weighting mechanism that adjusts the positive and negative gradients of each category independently within the binary cross-entropy framework (see the first sketch after this list). It is well suited to tasks that treat categories as independent binary classifiers, such as two-stage object detection.
- Softmax Equalization Loss (Softmax-EQL): Developed for image classification tasks that require preserving the ranking across categories, this loss applies a gradient-driven margin calibration to the softmax function (see the second sketch below).
- Equalized Focal Loss (EFL): Extends the conventional focal loss with a gradient-driven modulating mechanism composed of a focusing factor and a weighting factor (see the third sketch below). It is particularly suited to single-stage object detectors, which face severe foreground-background imbalance on top of the category imbalance.
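To make the mechanism concrete, below is a minimal PyTorch sketch of the gradient-driven re-weighting behind Sigmoid-EQL: per-category accumulated positive and negative gradient magnitudes form a ratio, which is mapped through a sigmoid-like function to down-weight negative gradients and up-weight positive ones. The class name `SigmoidEQLSketch`, the hyperparameter values, and the accumulation details are illustrative assumptions, not the authors' reference implementation.

```python
import torch
import torch.nn.functional as F

class SigmoidEQLSketch:
    """Minimal sketch of gradient-driven re-weighting for binary cross-entropy.

    Tracks accumulated positive/negative gradient magnitudes per category and
    re-weights the per-category BCE terms accordingly. Hyperparameter values
    (alpha, gamma, mu) are illustrative assumptions.
    """

    def __init__(self, num_classes, alpha=4.0, gamma=12.0, mu=0.8):
        self.alpha, self.gamma, self.mu = alpha, gamma, mu
        self.pos_grad = torch.zeros(num_classes)  # accumulated positive gradients
        self.neg_grad = torch.zeros(num_classes)  # accumulated negative gradients

    def _mapping(self, g):
        # Maps the gradient ratio into (0, 1): tail categories (small ratio)
        # get a small value, i.e. their negative gradients are suppressed.
        return 1.0 / (1.0 + torch.exp(-self.gamma * (g - self.mu)))

    def __call__(self, logits, targets):
        # logits: (N, C) raw scores; targets: (N, C) multi-hot float labels.
        prob = torch.sigmoid(logits)
        ratio = self.pos_grad / self.neg_grad.clamp(min=1e-10)
        f = self._mapping(ratio)
        pos_w = 1.0 + self.alpha * (1.0 - f)  # up-weight positive gradients
        neg_w = f                             # down-weight negative gradients
        weight = targets * pos_w + (1.0 - targets) * neg_w
        loss = F.binary_cross_entropy_with_logits(
            logits, targets, weight=weight, reduction="sum") / logits.shape[0]
        # Update the accumulated gradient statistics: |dL/dz| of the weighted BCE.
        with torch.no_grad():
            grad = weight * torch.abs(prob - targets)
            self.pos_grad += (grad * targets).sum(dim=0)
            self.neg_grad += (grad * (1.0 - targets)).sum(dim=0)
        return loss
```

The key design choice is that the ratio is cumulative over training, so the re-weighting adapts to how each category has actually been treated by the optimizer rather than to its static label frequency.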
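The margin calibration in Softmax-EQL follows the general logit-adjustment pattern: per-class margins are added to the logits before the softmax cross-entropy, except that the margins are driven by gradient statistics rather than label frequencies. The sketch below shows only this pattern; the margin rule (the log of a gradient-ratio statistic) is a placeholder assumption, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def margin_calibrated_ce(logits, targets, margins):
    """Softmax cross-entropy with additive per-class margins.

    logits:  (N, C) raw scores
    targets: (N,) integer class labels
    margins: (C,) per-class margins; in Softmax-EQL these would be derived
             from accumulated gradient statistics (placeholder here)
    """
    return F.cross_entropy(logits + margins, targets)

# Hypothetical margin rule for illustration: head categories (gradient ratio
# near 1) get a margin near 0, tail categories a large negative margin, which
# forces the model to raise tail logits to compensate (as in logit adjustment).
grad_ratio = torch.rand(1000).clamp(min=1e-6)  # stand-in per-class statistic
margins = torch.log(grad_ratio)                # assumption, not the paper's rule
loss = margin_calibrated_ce(torch.randn(8, 1000), torch.randint(0, 1000, (8,)), margins)
```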
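Finally, a sketch of EFL's category-dependent focusing: each category's focusing factor is a shared base value plus a variable term that grows as the category's accumulated gradient ratio shrinks, while a weighting factor rescales the loss so rare categories are not diluted. Here the gradient ratio is passed in as a fixed tensor for illustration; in the actual method it is updated from gradient statistics during training, and the hyperparameter values are assumptions.

```python
import torch

def equalized_focal_loss_sketch(logits, targets, grad_ratio, gamma_b=2.0, scale=8.0):
    """Sketch of a focal loss with gradient-driven, per-category focusing.

    logits:     (N, C) raw scores from a one-stage detector head
    targets:    (N, C) multi-hot float labels (foreground categories per anchor)
    grad_ratio: (C,) accumulated positive/negative gradient ratio in [0, 1]
    """
    prob = torch.sigmoid(logits)
    p_t = targets * prob + (1 - targets) * (1 - prob)  # prob of the correct side
    gamma_v = scale * (1.0 - grad_ratio)   # variable term: large for tail classes
    gamma_j = gamma_b + gamma_v            # per-category focusing factor
    weight = gamma_j / gamma_b             # weighting factor against dilution
    ce = -torch.log(p_t.clamp(min=1e-10))
    loss = weight * (1.0 - p_t) ** gamma_j * ce
    return loss.sum() / targets.sum().clamp(min=1)  # normalize by #positives
```

In a detector, `grad_ratio` would be maintained as in the Sigmoid-EQL sketch above; note that in the balanced case the ratio approaches 1, the variable term and the extra weight vanish, and the loss reduces to the standard focal loss.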
Experimental Evaluation
The paper provides an extensive evaluation across various datasets and tasks, demonstrating the versatility and effectiveness of the proposed loss functions:
- Two-Stage Long-tailed Object Detection: On the challenging LVIS dataset, Sigmoid-EQL consistently outperformed baseline methods, delivering large AP gains on rare categories by correcting the gradient imbalance without adding training time.
- Single-Stage Long-tailed Object Detection: EFL addresses the compounded imbalance of one-stage detectors by focusing training on samples of rare categories. It outperformed other methods, including more complex training strategies, while remaining simple and computationally efficient.
- Long-Tailed Image Classification: Softmax-EQL surpassed existing state-of-the-art results on datasets like ImageNet-LT and iNaturalist 2018, emphasizing the robustness of gradient-driven margin calibration over frequency-based approaches.
- Long-Tailed Semantic Segmentation: The equalization losses also delivered strong improvements on the ADE20K-LT and ADE20K datasets. Both Sigmoid-EQL and Softmax-EQL were effective in these dense prediction tasks, hinting at the broad applicability of gradient-driven methods.
Implications and Future Directions
The approach advances the state of the art in long-tailed object recognition and suggests a broader paradigm for training models under imbalance. By relying on gradient statistics rather than frequency-based adjustments, the method adapts to the actual training dynamics and is potentially applicable to other imbalance problems in machine learning. Future work might integrate gradient-driven approaches with other training techniques and investigate applications beyond visual tasks. The paper serves as a useful reference point for designing loss functions that tackle imbalance challenges across diverse domains.