A Comprehensive Review of YOLO Architectures in Computer Vision: From YOLOv1 to YOLOv8 and YOLO-NAS (2304.00501v7)

Published 2 Apr 2023 in cs.CV

Abstract: YOLO has become a central real-time object detection system for robotics, driverless cars, and video monitoring applications. We present a comprehensive analysis of YOLO's evolution, examining the innovations and contributions in each iteration from the original YOLO up to YOLOv8, YOLO-NAS, and YOLO with Transformers. We start by describing the standard metrics and postprocessing; then, we discuss the major changes in network architecture and training tricks for each model. Finally, we summarize the essential lessons from YOLO's development and provide a perspective on its future, highlighting potential research directions to enhance real-time object detection systems.
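
The review grounds its comparisons in the standard evaluation metric (average precision, built on intersection over union) and the usual postprocessing step (non-maximum suppression). As a minimal, generic illustration of these two building blocks (a sketch, not code from the paper), here is IoU plus greedy NMS in NumPy:

```python
import numpy as np

def iou(box, boxes):
    """IoU between one box and an array of boxes, all in (x1, y1, x2, y2) format."""
    x1 = np.maximum(box[0], boxes[:, 0])
    y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2])
    y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area_a = (box[2] - box[0]) * (box[3] - box[1])
    area_b = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area_a + area_b - inter)

def nms(boxes, scores, iou_thresh=0.5):
    """Greedy non-maximum suppression; returns indices of kept boxes."""
    order = np.argsort(scores)[::-1]  # highest-scoring boxes first
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(i)
        if order.size == 1:
            break
        rest = order[1:]
        # Drop every remaining box that overlaps the kept box too much.
        order = rest[iou(boxes[i], boxes[rest]) < iou_thresh]
    return keep
```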

Overview of YOLO Architectures in Computer Vision

This essay provides a comprehensive overview of the paper "A Comprehensive Review of YOLO Architectures in Computer Vision," tracing the evolution from YOLOv1 to YOLOv8 and YOLO-NAS. The YOLO (You Only Look Once) framework has been pivotal in real-time object detection, securing its place in applications such as robotics, autonomous vehicles, and surveillance due to its balance of speed and accuracy.

Key Developments

  1. YOLOv1 to YOLOv3:
    • YOLOv1 introduced a novel approach, leveraging a single convolutional network for object detection without relying on sliding windows or region proposals.
    • YOLOv2 enhanced this by incorporating anchor boxes and dimension clustering to improve localization accuracy, achieving an AP of 78.6% on VOC2007 (see the anchor-box decoding sketch after this list).
    • YOLOv3 expanded capabilities with multi-scale predictions and a larger backbone (Darknet-53), marking a significant improvement on the COCO benchmark.
  2. YOLOv4 to YOLOv6:
    • YOLOv4 integrated "bag-of-freebies" and "bag-of-specials," optimizing training while incorporating architectural changes like CSPDarknet53 and PANet, achieving 43.5% AP on COCO.
    • YOLOv5, developed by Ultralytics in PyTorch, offered multiple scaled versions, further refining speed-accuracy tradeoffs.
    • Scaled-YOLOv4 introduced a scalable architecture for both cloud and embedded systems, achieving up to 56% AP on COCO.
    • YOLOv6, developed by Meituan for industrial applications, adopted a hardware-friendly RepVGG-style backbone, an anchor-free design, and self-distillation during training.
  3. YOLOv7 to YOLOv8 and Beyond:
    • YOLOv7 optimized the architecture with E-ELAN and trainable bag-of-freebies, achieving state-of-the-art performance with fewer parameters.
    • YOLOv8 by Ultralytics included an anchor-free model and decoupled head, achieving an AP of 53.9% on COCO.
    • YOLO-NAS incorporated neural architecture search and hybrid quantization, delivering models tailored for real-time edge-device applications.
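
To make the anchor-box mechanism concrete, the sketch below applies the published YOLOv2/YOLOv3 decoding equations, turning raw network offsets (t_x, t_y, t_w, t_h) into a box via the grid-cell position and an anchor prior; the function name and pixel-unit convention are illustrative:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def decode_anchor_box(t, cell_xy, anchor_wh, stride):
    """Decode one YOLOv2/v3-style prediction into pixel coordinates.

    t         : raw outputs (t_x, t_y, t_w, t_h) for one anchor at one cell
    cell_xy   : (c_x, c_y) grid-cell offsets
    anchor_wh : (p_w, p_h) anchor prior, in grid units
    stride    : pixels per grid cell
    """
    bx = (sigmoid(t[0]) + cell_xy[0]) * stride   # b_x = sigma(t_x) + c_x
    by = (sigmoid(t[1]) + cell_xy[1]) * stride   # b_y = sigma(t_y) + c_y
    bw = anchor_wh[0] * np.exp(t[2]) * stride    # b_w = p_w * exp(t_w)
    bh = anchor_wh[1] * np.exp(t[3]) * stride    # b_h = p_h * exp(t_h)
    return bx, by, bw, bh

# Example: a prediction at grid cell (7, 7) on a stride-32 feature map.
print(decode_anchor_box(np.array([0.2, -0.1, 0.3, 0.1]), (7, 7), (3.0, 2.5), 32))
```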

Innovations in Techniques

The paper details several innovations through the evolution of YOLO:

  • Transition from anchor-based to anchor-free models, simplifying the detection head and speeding up postprocessing while maintaining accuracy.
  • Incorporation of neural architecture search (NAS) in YOLO-NAS for automated architecture design.
  • Introduction of advanced label assignment techniques and decoupled heads in YOLOX and YOLOv8, addressing the alignment of the classification and regression tasks (a decoupled-head sketch follows this list).
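
To illustrate the decoupled head idea, here is a simplified PyTorch sketch with separate classification and box-regression branches; this is an assumed minimal structure, not the exact YOLOX or Ultralytics implementation:

```python
import torch
import torch.nn as nn

class DecoupledHead(nn.Module):
    """Simplified decoupled detection head: separate cls and reg branches."""
    def __init__(self, in_ch, num_classes, reg_ch=4):
        super().__init__()
        self.cls_branch = nn.Sequential(
            nn.Conv2d(in_ch, in_ch, 3, padding=1), nn.SiLU(),
            nn.Conv2d(in_ch, num_classes, 1),   # per-cell class logits
        )
        self.reg_branch = nn.Sequential(
            nn.Conv2d(in_ch, in_ch, 3, padding=1), nn.SiLU(),
            nn.Conv2d(in_ch, reg_ch, 1),        # per-cell box offsets
        )

    def forward(self, x):
        return self.cls_branch(x), self.reg_branch(x)

# One feature map from the neck: batch 1, 256 channels, 80x80 grid.
feats = torch.randn(1, 256, 80, 80)
cls_out, reg_out = DecoupledHead(256, num_classes=80)(feats)
print(cls_out.shape, reg_out.shape)  # [1, 80, 80, 80] and [1, 4, 80, 80]
```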

Applications and Implications

The YOLO architectures have been instrumental across multiple domains (a minimal inference example follows this list):

  • Autonomous Vehicles: Facilitating rapid object recognition and decision-making.
  • Surveillance and Security: Enabling real-time monitoring with high accuracy.
  • Medical Imaging and Agriculture: Providing tools for enhanced diagnostics and precision farming.
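
For a sense of how these models are deployed in such applications, the following sketch runs a pretrained YOLOv8 model through the Ultralytics Python API; it assumes the `ultralytics` package is installed and that an image file named `bus.jpg` exists locally:

```python
from ultralytics import YOLO  # pip install ultralytics

# Load a pretrained YOLOv8 nano model (weights download on first use).
model = YOLO("yolov8n.pt")

# Run inference; the image path is illustrative.
results = model("bus.jpg")

# Print class name, confidence, and (x1, y1, x2, y2) box for each detection.
for r in results:
    for box, conf, cls in zip(r.boxes.xyxy, r.boxes.conf, r.boxes.cls):
        print(model.names[int(cls)], float(conf), box.tolist())
```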

With ongoing advancements, YOLO models are poised to adapt better to hardware constraints, expand into multi-modal frameworks, and continue pushing the speed-accuracy frontier.

Conclusion

The paper on YOLO's progression illustrates a robust trajectory of development that aligns with contemporary demands for real-time, efficient object detection solutions. The integration of cutting-edge architectures, innovative training methodologies, and broad adaptability underscores YOLO's relevance and potential in future computer vision technologies. As the framework evolves, its applications will likely broaden, encapsulating more complex tasks across diverse fields.

Authors (2)
  1. Juan Terven (4 papers)
  2. Diana Cordova-Esparza (1 paper)
Citations (668)