Libra R-CNN: Towards Balanced Learning for Object Detection (1904.02701v1)

Published 4 Apr 2019 in cs.CV

Abstract: Compared with model architectures, the training process, which is also crucial to the success of detectors, has received relatively less attention in object detection. In this work, we carefully revisit the standard training practice of detectors, and find that the detection performance is often limited by the imbalance during the training process, which generally consists in three levels - sample level, feature level, and objective level. To mitigate the adverse effects caused thereby, we propose Libra R-CNN, a simple but effective framework towards balanced learning for object detection. It integrates three novel components: IoU-balanced sampling, balanced feature pyramid, and balanced L1 loss, respectively for reducing the imbalance at sample, feature, and objective level. Benefitted from the overall balanced design, Libra R-CNN significantly improves the detection performance. Without bells and whistles, it achieves 2.5 points and 2.0 points higher Average Precision (AP) than FPN Faster R-CNN and RetinaNet respectively on MSCOCO.

Authors (6)

Jiangmiao Pang (77 papers)
Kai Chen (512 papers)
Jianping Shi (76 papers)
Huajun Feng (18 papers)
Wanli Ouyang (358 papers)
Dahua Lin (336 papers)

Citations (1,221)

Summary

An Overview of Libra R-CNN: Towards Balanced Learning for Object Detection

The paper "Libra R-CNN: Towards Balanced Learning for Object Detection" argues that detector performance is often limited by imbalance in the training process rather than by model architecture alone. It revisits the standard training practice for object detectors and identifies three levels of imbalance: sample level, feature level, and objective level. The authors introduce Libra R-CNN, a framework that mitigates these imbalances with three components: IoU-balanced sampling, balanced feature pyramid, and balanced L1 loss. On MS COCO, the framework achieves 2.5 points higher Average Precision (AP) than FPN Faster R-CNN and 2.0 points higher than RetinaNet.

Key Contributions

IoU-Balanced Sampling: IoU-balanced sampling is introduced to handle sample-level imbalance by increasing the probability of selecting hard negative samples based on their IoU with ground-truth annotations. This approach rectifies the tendency of random sampling to be dominated by easy samples, which contribute less to the optimization process. The IoU-balanced sampling method ensures a more representative set of training samples without incurring additional computational costs associated with conventional hard mining techniques like OHEM.
Balanced Feature Pyramid: The research identifies that conventional feature pyramids are sequentially constructed, often leading to over-focus on adjacent resolutions while diluting information from non-adjacent levels. The balanced feature pyramid method proposed in this paper integrates low-level and high-level features more effectively by rescaling and averaging them, followed by a refinement step using techniques such as convolution operations or non-local attention modules. This enhances the information flow across all levels, leading to more discriminative features and improved detection performance.
Balanced L1 Loss: To address the imbalance at the objective level, the authors develop a balanced L1 loss function. This loss is designed to promote gradients from inliers (accurate samples with small regression errors) while limiting the influence of outliers (hard samples with large errors). Its hyperparameters amplify the gradient for errors close to zero and bound it for large errors, which makes training more stable and better balances the contributions of the classification and localization tasks.
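
The IoU-balanced sampling idea described above can be sketched in a few lines. The version below is a minimal illustration and not the authors' implementation: it assumes negatives are candidates whose IoU lies in [0, 0.5), splits that range into three equal-width bins, draws an equal share uniformly from each bin, and tops up from the remaining candidates when a bin is too sparse. The function name `iou_balanced_sample`, its parameters, and the top-up rule are hypothetical choices made for this sketch.

```python
import random


def iou_balanced_sample(neg_ious, num_samples, num_bins=3, max_iou=0.5, rng=None):
    """Pick negative-candidate indices spread evenly over IoU bins.

    neg_ious: IoU of each negative candidate with its best-matching ground truth.
    The bin count, IoU ceiling and top-up rule are assumptions of this sketch.
    """
    if rng is None:
        rng = random.Random()

    # Group candidate indices into equal-width IoU bins over [0, max_iou).
    bins = [[] for _ in range(num_bins)]
    for idx, iou in enumerate(neg_ious):
        if 0.0 <= iou < max_iou:
            k = min(int(iou / max_iou * num_bins), num_bins - 1)
            bins[k].append(idx)

    # Uniform sampling inside each bin; a sparse bin contributes all it has.
    per_bin = num_samples // num_bins
    chosen = []
    for members in bins:
        chosen.extend(rng.sample(members, min(per_bin, len(members))))

    # Top up from unused candidates if some bins fell short of their share.
    taken = set(chosen)
    leftovers = [i for members in bins for i in members if i not in taken]
    shortfall = min(num_samples - len(chosen), len(leftovers))
    chosen.extend(rng.sample(leftovers, shortfall))
    return chosen


# 100 candidates: 90 easy (IoU 0.02), 6 medium (IoU 0.2), 4 hard (IoU 0.4).
ious = [0.02] * 90 + [0.2] * 6 + [0.4] * 4
picked = iou_balanced_sample(ious, num_samples=9, rng=random.Random(0))
harder = [i for i in picked if i >= 90]
print(len(picked), len(harder))
```

In this toy pool only 10 of the 100 candidates have a non-trivial IoU, so uniform random sampling would rarely select them, whereas the binned scheme draws three samples from each bin. No forward or backward pass is needed to rank candidates, which is the cost advantage over OHEM noted above.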
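
The rescale, average, and refine steps of the balanced feature pyramid can be illustrated on toy data. The sketch below is a simplification under stated assumptions: feature maps are square, single-channel lists of rows, nearest-neighbour resizing stands in for whatever interpolation and pooling a real detector would use, and the refinement step is an identity placeholder where a convolution or non-local module would go. It also assumes the integrated map is rescaled back to each level and added as a residual. The names `resize_nearest` and `balanced_feature_pyramid` are hypothetical.

```python
def resize_nearest(fmap, size):
    """Nearest-neighbour resize of a square 2-D map (list of rows) to size x size."""
    src = len(fmap)
    return [[fmap[r * src // size][c * src // size] for c in range(size)]
            for r in range(size)]


def balanced_feature_pyramid(levels, target_index, refine=lambda fmap: fmap):
    """Integrate all levels at one resolution, refine, then redistribute.

    levels: list of square single-channel maps, finest resolution first.
    refine: placeholder for the refinement step (identity by default).
    """
    size = len(levels[target_index])

    # 1. Rescale every level to the shared intermediate resolution.
    resized = [resize_nearest(fmap, size) for fmap in levels]

    # 2. Average element-wise so that each level contributes equally.
    n = len(levels)
    balanced = [[sum(fmap[r][c] for fmap in resized) / n for c in range(size)]
                for r in range(size)]

    # 3. Refine the integrated map (a convolution or non-local block in practice).
    balanced = refine(balanced)

    # 4. Rescale back to each level's own size and add as a residual.
    outputs = []
    for fmap in levels:
        back = resize_nearest(balanced, len(fmap))
        outputs.append([[a + b for a, b in zip(row_f, row_b)]
                        for row_f, row_b in zip(fmap, back)])
    return outputs


# Three toy levels (4x4, 2x2, 1x1) filled with constant values 1, 2 and 3.
levels = [
    [[1.0] * 4 for _ in range(4)],
    [[2.0] * 2 for _ in range(2)],
    [[3.0]],
]
out = balanced_feature_pyramid(levels, target_index=1)
print([fmap[0][0] for fmap in out])
```

Because every level is averaged with equal weight before being redistributed, each output level receives the same integrated signal regardless of how far it sits from the other levels, in contrast to the sequential top-down construction described above.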
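
To make the behaviour of the balanced L1 loss concrete, the sketch below implements a piecewise form with a logarithmic branch for |x| < 1 and a linear branch otherwise, and compares its gradient with that of smooth L1. It is a minimal illustration under assumptions: the hyperparameter names alpha and gamma, their defaults (0.5 and 1.5), and the derived constant b, chosen so that alpha * ln(b + 1) = gamma, are intended to match the formulation in the paper but should be checked against it. The function names are hypothetical.

```python
import math


def balanced_l1(x, alpha=0.5, gamma=1.5):
    """Balanced L1 loss for a single regression residual x (sketch)."""
    # b follows from gradient continuity at |x| = 1: alpha * ln(b + 1) = gamma.
    b = math.exp(gamma / alpha) - 1.0
    ax = abs(x)
    if ax < 1.0:
        # Inlier branch: its derivative is alpha * ln(b * |x| + 1).
        return alpha / b * (b * ax + 1.0) * math.log(b * ax + 1.0) - alpha * ax
    # Outlier branch: linear, with an offset that keeps the loss continuous.
    return gamma * ax + gamma / b - alpha


def balanced_l1_grad(x, alpha=0.5, gamma=1.5):
    """Gradient magnitude: log-shaped for inliers, capped at gamma for outliers."""
    b = math.exp(gamma / alpha) - 1.0
    ax = abs(x)
    return alpha * math.log(b * ax + 1.0) if ax < 1.0 else gamma


def smooth_l1_grad(x):
    """Gradient magnitude of smooth L1 (threshold 1), for comparison."""
    return min(abs(x), 1.0)


# Compare gradient magnitudes for small, medium and large residuals.
for x in (0.1, 0.5, 2.0):
    print(x, round(balanced_l1_grad(x), 3), smooth_l1_grad(x))
```

For small residuals the logarithmic branch yields a larger gradient than smooth L1, which is how accurate samples gain influence, while residuals beyond 1 all receive the same bounded gradient gamma.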

Experimental Results

The proposed Libra R-CNN framework was evaluated on the MS COCO dataset, where it improved on baseline detectors such as FPN Faster R-CNN and RetinaNet. For example:

Using a ResNet-50 backbone, Libra R-CNN achieved a 2.5-point improvement in Average Precision (AP) over FPN Faster R-CNN.
With a more powerful backbone like ResNeXt-101, Libra R-CNN reached an AP of 43.0.
The framework’s balanced design is further supported by consistent improvements across small, medium, and large object scales.

The ablation studies provided in the paper further validate the effectiveness of each component:

IoU-balanced sampling increased AP by 0.9 points.
Balanced feature pyramid contributed an additional 0.9 points.
Balanced L1 loss added another 0.8 points to the overall AP.

Implications and Future Directions

The Libra R-CNN framework offers several theoretical and practical implications for the field of object detection:

The methodology underscores the need to address imbalances at multiple stages of the training process: improvements come not only from model architecture but also from balanced training practices.
It suggests that existing object detection frameworks can be improved without significant changes to model design, simply by adopting better sampling, feature integration, and loss balancing techniques.

Looking forward, future research could explore additional refinements to the balanced feature pyramid and further customize the balanced L1 loss for specific tasks or datasets. Also, more work can be done to generalize this approach to other domains within computer vision, such as semantic segmentation and instance segmentation, where similar imbalances might exist.

Conclusion

The paper introduces Libra R-CNN and validates it on MS COCO, where tackling imbalance at the sample, feature, and objective levels yields consistent AP gains over FPN Faster R-CNN and RetinaNet. Because its components target the training process and require no significant changes to model design, they are natural candidates for adoption in other detection pipelines and for further study in related computer vision tasks.
