
YOLO Evolution: A Comprehensive Benchmark and Architectural Review of YOLOv12, YOLO11, and Their Previous Versions

Published 31 Oct 2024 in cs.CV (arXiv:2411.00201v4)

Abstract: This study presents a comprehensive benchmark analysis of the YOLO (You Only Look Once) family of algorithms. It is the first experimental evaluation spanning YOLOv3 through the latest version, YOLOv12, on a range of object detection challenges, including varying object sizes, diverse aspect ratios, and small objects of a single class, ensuring assessment across datasets with distinct characteristics. For a robust evaluation, we employ a full set of metrics: Precision, Recall, Mean Average Precision (mAP), processing time, GFLOPs, and model size. Our analysis highlights the distinctive strengths and limitations of each YOLO version. For example, YOLOv9 achieves substantial accuracy but struggles with small-object detection and efficiency, whereas YOLOv10 excels in speed and efficiency but shows relatively lower accuracy, owing to architectural choices that hurt its performance on overlapping objects. The YOLO11 family consistently performs best, maintaining a remarkable balance of accuracy and efficiency. YOLOv12, however, delivered underwhelming results, its more complex architecture introducing computational overhead without commensurate performance gains. These results provide critical insights for industry and academia alike, guiding both the selection of the most suitable YOLO algorithm for a given application and future enhancements.

Explain it Like I'm 14

Overview

This paper compares many versions of a popular computer vision tool called YOLO (short for "You Only Look Once"). YOLO is used to find and label objects in pictures or videos, like spotting a stop sign on the road, a zebra in a photo, or a ship in satellite images. The paper especially focuses on testing the newest versions, YOLO11 and YOLOv12, against earlier versions (like YOLOv3, v5, v8, v9, and v10) to see which ones are the most accurate and the fastest.

What questions did the researchers ask?

The researchers wanted to answer simple, practical questions:

  • Which YOLO version is the most accurate at finding objects?
  • Which versions run the fastest and use the least computer power?
  • How do different YOLO versions handle hard situations, like very small objects (tiny ships), overlapping objects (animals in wildlife photos), or many classes (lots of types of traffic signs)?
  • Is YOLO11 better than older versions, and if so, in what ways?

How did they do the study?

To keep things fair, the team used the same training setup and rules for all models and tested them on three different, challenging datasets:

  • Traffic Signs: Many types and sizes of signs; tricky because small signs can look similar.
  • African Wildlife: Four animal classes (buffalo, elephant, rhino, zebra); images often have overlapping animals.
  • Ships and Vessels: One class (“ship”); ships are small and can be rotated, which is hard for detectors.

They used models from the Ultralytics YOLO library so the setup was consistent. For each model, they measured:

  • Accuracy: Precision (how many detections are correct), Recall (how many real objects are found), mAP50 (accuracy when a detection counts as correct at 50% overlap with the true box), and mAP50-95 (the same accuracy averaged over stricter and stricter overlap thresholds).
  • Speed: Preprocessing time (getting images ready), Inference time (the model’s “thinking” time), and Postprocessing time (cleaning up predictions). Think of this like a stopwatch for every step.
  • Computational load: GFLOPs (how much math the model needs) and model size in megabytes (like how big the app is on disk).
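To make the accuracy metrics above concrete, here is a minimal pure-Python sketch (not the paper's evaluation code) of how IoU-based matching turns boxes into precision and recall; real benchmarks also rank detections by confidence and integrate a precision-recall curve to get mAP:

```python
def iou(a, b):
    """Intersection-over-Union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def precision_recall(preds, gts, iou_thresh=0.5):
    """Greedily match each predicted box to at most one ground-truth box.
    Precision = TP / predictions, Recall = TP / ground truths."""
    matched, tp = set(), 0
    for p in preds:
        for i, g in enumerate(gts):
            if i not in matched and iou(p, g) >= iou_thresh:
                matched.add(i)
                tp += 1
                break
    return tp / len(preds), tp / len(gts)

# Toy example: two predictions, two ground truths, only one good overlap.
preds = [(0, 0, 10, 10), (50, 50, 60, 60)]
gts = [(1, 1, 11, 11), (80, 80, 90, 90)]
p, r = precision_recall(preds, gts)
print(p, r)  # prints: 0.5 0.5
```

The mAP50 metric uses exactly this 0.5 overlap threshold; mAP50-95 repeats the computation at thresholds from 0.5 to 0.95 and averages the results.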

They trained and fine-tuned each model in the same way, then compared the results.
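The per-stage "stopwatch" timing described above can be sketched with a simple pattern; the stage functions below are hypothetical placeholders standing in for a real pipeline, not the paper's measurement code:

```python
import time

def preprocess(img):    # placeholder: resizing/normalizing would go here
    return img

def infer(img):         # placeholder for the model's forward pass
    return [(0, 0, 10, 10)]

def postprocess(dets):  # placeholder: confidence filtering / NMS cleanup
    return dets

def timed_ms(fn, x):
    """Run one pipeline stage and return (result, elapsed milliseconds)."""
    t0 = time.perf_counter()
    out = fn(x)
    return out, (time.perf_counter() - t0) * 1000.0

img = "dummy image"
pre, t_pre = timed_ms(preprocess, img)
dets, t_inf = timed_ms(infer, pre)
final, t_post = timed_ms(postprocess, dets)
print(f"pre={t_pre:.3f} ms  infer={t_inf:.3f} ms  post={t_post:.3f} ms")
```

In practice, benchmarks average these timings over many images (after a few warm-up runs) so that one-off startup costs do not distort the numbers.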

Main findings and why they matter

Big picture:

  • YOLO11 models (especially YOLO11m) performed best overall, balancing accuracy, speed, and size.
  • YOLOv9 was very accurate but struggled with small objects and was less efficient.
  • YOLOv10 was extremely fast and efficient, but its accuracy dropped in trickier cases like overlapping objects, because of its architectural choices (it skips a common cleanup step called NMS, non-maximum suppression, by using a different training strategy).
  • Older models improved over time, but the newest versions generally do better across tasks.

Highlights:

  • YOLO11m was the “best balance” model. On average, it was both accurate and fast:
    • mAP50-95 scores: 0.795 (Traffic Signs), 0.81 (African Wildlife), 0.325 (Ships). The ships score is lower because small objects are hard, but 0.325 is still competitive for tiny targets.
    • Speed: about 2.4 milliseconds per image (very fast).
    • Size: about 38.8 MB (not too large).
    • GFLOPs: around 67.6 (a reasonable amount of computation).
  • YOLOv10 stood out for speed and efficiency. If you need real-time results with limited hardware, it’s excellent, but it can miss more objects when they overlap.
  • YOLOv9 was strong in accuracy, especially on simpler or larger objects, but used more compute and wasn’t as good at tiny objects.
  • Across all tests, YOLO11’s new building blocks (C3k2 and C2PSA) helped the model pay attention to the right parts of the image—like focusing on small or overlapping objects more effectively.
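The speed figures above translate directly into throughput: at the reported ~2.4 ms per image for YOLO11m, the conversion is simple arithmetic, as this small sketch shows:

```python
def fps_from_latency_ms(latency_ms):
    """Images processed per second at a given per-image latency."""
    return 1000.0 / latency_ms

# Reported YOLO11m inference latency from the study: ~2.4 ms per image.
print(round(fps_from_latency_ms(2.4)))  # prints: 417
```

So a ~2.4 ms latency corresponds to roughly 417 images per second, comfortably beyond the 30-60 frames per second that real-time video applications typically require.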

Why this matters:

  • Traffic signs: You need high accuracy and speed for self-driving and road safety. YOLO11m and YOLO11l did very well here.
  • Wildlife: Overlapping animals are tough. The YOLOv9 and YOLO11 families showed strong results, but large models can overfit on small datasets.
  • Ships: Tiny, rotated ships are tricky. YOLO11 still did well relative to the challenge, and its speed helps for scanning large satellite images quickly.

What’s the impact?

This study helps engineers, researchers, and companies pick the right YOLO version for their needs:

  • If you want the best all-around performer, choose YOLO11m: it’s fast, accurate, and not too heavy.
  • If your system is very limited (like a small device), YOLOv10’s speed is a big win.
  • If your task focuses on accuracy and you can afford more compute, consider YOLOv9 or larger YOLO11 variants.

For future work, the results suggest:

  • Keep improving detection of very small objects and overlapping objects.
  • Keep making models both faster and smarter, especially for real-time applications.
  • Use fair, consistent benchmarks (like in this paper) to compare new versions.

In short, YOLO has improved a lot over the years, and YOLO11 shows the best mix of accuracy and speed so far, making it a strong choice for many real-world tasks.
