Enhancing Object Detection Performance for Small Objects through Synthetic Data Generation and Proportional Class-Balancing Technique: A Comparative Study in Industrial Scenarios (2401.12729v2)
Abstract: Object Detection (OD) has proven to be a significant computer vision method for extracting localized class information and has multiple applications in industry. Although many state-of-the-art (SOTA) OD models perform well on medium- and large-sized objects, they tend to underperform on small objects. In most industrial use cases, it is difficult to collect and annotate data for small objects, as doing so is time-consuming and prone to human error. Moreover, such datasets are likely to be unbalanced and often result in inefficient model convergence. To tackle this challenge, this study presents a novel approach that injects additional data points to improve the performance of OD models. Through synthetic data generation, the difficulties of data collection and annotation for small objects can be minimized, and a dataset with a balanced class distribution can be created. This paper discusses the effects of a simple proportional class-balancing technique that enables better anchor matching in OD models. A comparison was carried out of the performance of the SOTA OD models YOLOv5, YOLOv7, and SSD on combinations of real and synthetic datasets within an industrial use case.
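The paper does not spell out the balancing rule in the abstract, but a proportional class-balancing step of the kind described can be sketched as follows: count the instances per class and generate enough synthetic samples to bring each minority class up to the majority-class count. This is a minimal illustration under that assumption; the class names and counts are hypothetical, not from the paper.

```python
from collections import Counter

def proportional_balance(class_counts):
    """Return how many synthetic samples to generate per class so that
    every class reaches the count of the most frequent class."""
    max_count = max(class_counts.values())
    return {cls: max_count - n for cls, n in class_counts.items()}

# Hypothetical instance counts for small-object classes in an industrial dataset
counts = Counter({"screw": 120, "washer": 45, "nut": 30})
to_generate = proportional_balance(counts)
print(to_generate)  # {'screw': 0, 'washer': 75, 'nut': 90}
```

The resulting per-class quotas could then drive a synthetic renderer (e.g. a Unity Perception pipeline) so that the final real-plus-synthetic dataset presents each class to the detector's anchor-matching stage with roughly equal frequency.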