Multi-resolution Rescored ByteTrack for Video Object Detection on Ultra-low-power Embedded Systems (2404.11488v1)

Published 17 Apr 2024 in cs.CV and cs.AI

Abstract: This paper introduces Multi-Resolution Rescored ByteTrack (MR2-ByteTrack), a novel video object detection framework for ultra-low-power embedded processors. The method reduces the average compute load of an off-the-shelf Deep Neural Network (DNN)-based object detector by up to 2.25× by alternating the processing of high-resolution images (320×320 pixels) with multiple down-sized frames (192×192 pixels). To counter the accuracy degradation caused by the smaller input size, MR2-ByteTrack correlates the output detections over time using the ByteTrack tracker and corrects potential misclassifications with a novel probabilistic Rescore algorithm. Interleaving two down-sized images for every high-resolution one as the input of several state-of-the-art DNN object detectors, and processing their outputs with MR2-ByteTrack, yields an average accuracy increase of 2.16% and a latency reduction of 43% on the GAP9 microcontroller compared to a baseline frame-by-frame inference scheme that uses only full-resolution images. Code available at: https://github.com/Bomps4/Multi_Resolution_Rescored_ByteTrack
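The cost of the interleaved schedule can be estimated with a short sketch. It assumes that detector compute scales with the number of input pixels, which is only a rough proxy for real DNN cost, and the helpers `schedule` and `relative_cost` are hypothetical illustrations, not code from the authors' repository. Under that assumption, one 320×320 frame followed by two 192×192 frames costs about 57% of the all-full-resolution baseline, a reduction of roughly 43%, close to the latency reduction reported above.

```python
# Minimal sketch (assumption): pixel count is used as a proxy for detector
# compute. Real DNN cost does not scale exactly with input pixels.
# `schedule` and `relative_cost` are hypothetical helpers for illustration.

HIGH_RES = 320  # full-resolution input side length in pixels (from the abstract)
LOW_RES = 192   # down-sized input side length in pixels (from the abstract)


def schedule(num_frames: int, low_per_high: int = 2) -> list[int]:
    """Return the input side length for each frame: one high-res frame
    followed by `low_per_high` down-sized frames, repeated."""
    period = low_per_high + 1
    return [HIGH_RES if i % period == 0 else LOW_RES for i in range(num_frames)]


def relative_cost(sizes: list[int]) -> float:
    """Average pixels per frame relative to an all-high-res baseline."""
    baseline = HIGH_RES * HIGH_RES
    return sum(s * s for s in sizes) / (len(sizes) * baseline)


if __name__ == "__main__":
    frames = schedule(9, low_per_high=2)
    print(frames)
    cost = relative_cost(frames)
    # Prints "relative cost: 0.573, reduction: 42.7%"
    print(f"relative cost: {cost:.3f}, reduction: {1 - cost:.1%}")
```

Raising `low_per_high` lowers the average cost further, but under this pixel-count proxy it can never drop below (192/320)² = 0.36 of the baseline; the speedups actually measured on GAP9 depend on the detector architecture.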
