Class Imbalance in Object Detection: An Experimental Diagnosis and Study of Mitigation Strategies (2403.07113v1)
Abstract: Object detection, a pivotal task in computer vision, is frequently hindered by dataset imbalance, particularly the under-explored problem of foreground-foreground class imbalance. This gap is even more pronounced for single-stage detectors. This study introduces a benchmarking framework that uses the YOLOv5 single-stage detector to study foreground-foreground class imbalance. We construct a novel 10-class long-tailed dataset from COCO, termed COCO-ZIPF, tailored to reflect common real-world detection scenarios with a limited number of object classes. Against this backdrop, we evaluate three established techniques: sampling, loss reweighting, and data augmentation. Our comparative analysis reveals that sampling and loss reweighting, although shown to be beneficial in two-stage detector settings, are less effective at improving YOLOv5's performance on COCO-ZIPF. In contrast, data augmentation methods, specifically mosaic and mixup, significantly improve the model's mean Average Precision (mAP) by introducing more variability and complexity into the training data. (Code available: https://github.com/craston/object_detection_cib)
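To make the long-tailed setting concrete, the following is a minimal sketch of a Zipf-shaped class profile and of the inverse-frequency class weights commonly used for loss reweighting. It is not the construction used for COCO-ZIPF, which the abstract does not specify: the function names, the head-class count of 10000, and the exponent of 1.0 are all assumptions chosen for illustration.

```python
from typing import List


def zipf_counts(num_classes: int, head_count: int, exponent: float = 1.0) -> List[int]:
    """Hypothetical target instance count per class.

    The class at rank r (1-based, most frequent first) receives
    head_count / r**exponent instances, rounded, with a floor of 1.
    """
    return [
        max(1, round(head_count / (rank ** exponent)))
        for rank in range(1, num_classes + 1)
    ]


def inverse_frequency_weights(counts: List[int]) -> List[float]:
    """Per-class weights proportional to 1 / count, normalized to mean 1.

    Rare classes receive weights above 1 and frequent classes below 1.
    """
    inverse = [1.0 / c for c in counts]
    mean_inverse = sum(inverse) / len(inverse)
    return [w / mean_inverse for w in inverse]


# Assumed values: 10 classes (as in the abstract) and an arbitrary head count.
counts = zipf_counts(num_classes=10, head_count=10000)
weights = inverse_frequency_weights(counts)

for rank, (count, weight) in enumerate(zip(counts, weights), start=1):
    print(f"rank {rank:2d}: {count:6d} instances, loss weight {weight:.3f}")
```

With an exponent of 1.0, the rarest of the 10 classes has one tenth of the head class's instances, so its weight is ten times larger. Whether weights of this kind help is exactly what the study questions for YOLOv5.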