YOLOBench: Benchmarking Efficient Object Detectors on Embedded Systems (2307.13901v2)

Published 26 Jul 2023 in cs.CV

Abstract: We present YOLOBench, a benchmark comprised of 550+ YOLO-based object detection models on 4 different datasets and 4 different embedded hardware platforms (x86 CPU, ARM CPU, Nvidia GPU, NPU). We collect accuracy and latency numbers for a variety of YOLO-based one-stage detectors at different model scales by performing a fair, controlled comparison of these detectors with a fixed training environment (code and training hyperparameters). Pareto-optimality analysis of the collected data reveals that, if modern detection heads and training techniques are incorporated into the learning process, multiple architectures of the YOLO series achieve a good accuracy-latency trade-off, including older models like YOLOv3 and YOLOv4. We also evaluate training-free accuracy estimators used in neural architecture search on YOLOBench and demonstrate that, while most state-of-the-art zero-cost accuracy estimators are outperformed by a simple baseline like MAC count, some of them can be effectively used to predict Pareto-optimal detection models. We showcase that by using a zero-cost proxy to identify a YOLO architecture competitive against a state-of-the-art YOLOv8 model on a Raspberry Pi 4 CPU. The code and data are available at https://github.com/Deeplite/deeplite-torch-zoo

Summary

  • The paper introduces YOLOBench, a comprehensive benchmark evaluating over 550 YOLO-based models on diverse embedded hardware to quantify accuracy-latency trade-offs.
  • It employs an extensive architecture search space spanning YOLOv3 through YOLOv8 and Pareto-optimality analysis to determine optimal configurations across multiple datasets and devices.
  • The findings reveal that even traditional YOLO models, when upgraded with modern detection methods, can achieve competitive performance, informing real-time deployment strategies.

Insights into YOLOBench: Benchmarking Efficient Object Detectors on Embedded Systems

The paper "YOLOBench: Benchmarking Efficient Object Detectors on Embedded Systems" presents an in-depth empirical evaluation of YOLO-based object detection models tailored for embedded systems. It introduces YOLOBench, a comprehensive benchmark comprising over 550 YOLO-based object detection models evaluated on four datasets and four embedded hardware platforms. The aim is to give researchers and developers insight into the trade-off between model accuracy and inference latency, which is crucial when deploying neural architectures on resource-constrained devices.

Overview of Methodology

The authors designed an extensive architecture search space comprising various combinations of existing YOLO architectures, namely YOLOv3 through YOLOv8, characterized by different backbone and neck structures. The models were evaluated across a range of depth-width multipliers and input resolutions to determine optimal configurations for each device-dataset pair. By pairing every backbone with modern detection head designs and training methodologies, this exhaustive approach highlights the versatility and enduring efficiency of several YOLO variants, including those traditionally considered outdated.
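Such a search space is, in essence, a Cartesian product of design axes. The following sketch enumerates one; the specific backbones, multiplier values, and resolutions are illustrative assumptions, not the paper's exact grid:

```python
from itertools import product

# Illustrative design axes (assumed values, not the paper's exact grid).
backbones = ["yolov3", "yolov4", "yolov5", "yolov7", "yolov8"]
depth_mults = [0.33, 0.67, 1.0]     # scale the number of blocks
width_mults = [0.25, 0.5, 1.0]      # scale the number of channels
resolutions = [160, 224, 320, 416, 480]  # input image size

# Every combination of backbone, depth, width, and resolution is a candidate.
search_space = [
    {"backbone": b, "depth": d, "width": w, "resolution": r}
    for b, d, w, r in product(backbones, depth_mults, width_mults, resolutions)
]
print(len(search_space))  # 5 * 3 * 3 * 5 = 225 candidate configurations
```

In practice each candidate would then be trained under the same fixed recipe and measured for accuracy and on-device latency, which is what makes the comparison across the space fair.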

The paper uses Pareto-optimality analysis to identify models that effectively balance accuracy and latency. The embedded hardware platforms used for benchmarking include an Intel x86 CPU, an ARM CPU, an NVIDIA GPU, and a Khadas VIM3 NPU, offering a diverse landscape for evaluating the practical performance of each YOLO configuration.
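Pareto-optimality here means keeping only those models for which no other model is both faster and more accurate. A minimal sketch of such a filter (the model names and numbers below are illustrative, not measurements from the paper):

```python
def pareto_front(models):
    """Return the models not dominated by any other model.

    A model is dominated if some other model has latency <= its latency
    AND accuracy >= its accuracy, with at least one comparison strict.
    models: list of (name, latency_ms, accuracy) tuples.
    """
    front = []
    for name, lat, acc in models:
        dominated = any(
            (l <= lat and a >= acc) and (l < lat or a > acc)
            for _, l, a in models
        )
        if not dominated:
            front.append((name, lat, acc))
    return front

# Illustrative numbers only.
candidates = [
    ("yolov3-small-320", 12.0, 0.31),
    ("yolov5n-416", 15.0, 0.35),
    ("yolov8s-480", 40.0, 0.44),
    ("yolov4-480", 45.0, 0.42),  # slower and less accurate than yolov8s-480
]
print([name for name, _, _ in pareto_front(candidates)])
# -> ['yolov3-small-320', 'yolov5n-416', 'yolov8s-480']
```

Running this filter per device-dataset pair is what reveals that different hardware yields different Pareto-optimal sets.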

Numerical Results and Key Findings

The benchmark results reveal that older architectures, such as YOLOv3 and YOLOv4, can achieve competitive accuracy-latency trade-offs when modern detection heads and training techniques are incorporated. This suggests that incremental architectural changes combined with state-of-the-art training can extract significant performance gains even from older models. Furthermore, differences in device architecture and processing capabilities strongly influence which YOLO models and configurations are optimal, as evidenced by the distinct Pareto-optimal sets identified across platforms.

A compelling aspect of the paper is its exploration of zero-cost accuracy estimators from the neural architecture search literature. While many well-known estimators underperformed against a straightforward baseline such as MAC count, the NWOT (Neural Architecture Search Without Training) metric emerged as a reliable predictor. Its ability to suggest Pareto-optimal YOLO configurations without any training offers potential for streamlining the architecture design phase for embedded deployment.
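The intuition behind NWOT (from Mellor et al.) is that an untrained network whose ReLU on/off patterns differ more across inputs tends to train to higher accuracy; the score is the log-determinant of a kernel built from those binary patterns over a minibatch. The sketch below takes binary activation codes as given (extracting them from a real model requires framework hooks, omitted here), so it illustrates the scoring step only:

```python
import numpy as np

def nwot_score(codes):
    """NWOT-style score from binary ReLU activation codes.

    codes: (batch, units) 0/1 array, one on/off pattern per input.
    Builds the kernel K[i, j] = (#units) - Hamming(codes[i], codes[j])
    and returns log|det K|; higher means the inputs are more
    distinguishable by the untrained network.
    """
    codes = np.asarray(codes, dtype=float)
    # Agreements on active units plus agreements on inactive units.
    k = codes @ codes.T + (1 - codes) @ (1 - codes).T
    _, logdet = np.linalg.slogdet(k)
    return logdet

rng = np.random.default_rng(0)
diverse = rng.integers(0, 2, size=(8, 64))                      # distinct patterns
collapsed = np.tile(rng.integers(0, 2, size=(1, 64)), (8, 1))   # identical patterns
print(nwot_score(diverse) > nwot_score(collapsed))
```

Identical patterns make the kernel singular (log-determinant of negative infinity), while diverse patterns score high, which is exactly the behavior the proxy exploits to rank architectures without training them.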

Implications and Future Directions

The paper's contributions are twofold: it furnishes a robust benchmark that aids informed decision-making for deploying object detection models on embedded systems, and it evaluates zero-cost predictors that could profoundly impact the efficiency of neural architecture search methods.

Practically, YOLOBench can serve as a reference for real-time applications across sectors such as autonomous vehicles and surveillance systems, where quick and efficient object detection is paramount. Theoretically, this resource can inspire further research into the scalability of YOLO models and stimulate development in architecture designs that factor in the specificities of embedded hardware.

Moving forward, integrating INT8 quantization results would extend YOLOBench's applicability, reflecting the growing shift towards quantization-aware techniques for maximizing on-device performance. Additionally, exploring architecture spaces beyond the YOLO family and expanding the range of datasets could broaden the benchmark's impact.

In summary, "YOLOBench: Benchmarking Efficient Object Detectors on Embedded Systems" offers a data-centric foundation for enhancing YOLO-based object detectors' efficacy on embedded platforms, marking a progressive step in object detection research and deployment strategies.
