Multi-resolution Rescored ByteTrack for Video Object Detection on Ultra-low-power Embedded Systems (2404.11488v1)
Abstract: This paper introduces Multi-Resolution Rescored ByteTrack (MR2-ByteTrack), a novel video object detection framework for ultra-low-power embedded processors. The method reduces the average compute load of an off-the-shelf Deep Neural Network (DNN) based object detector by up to 2.25$\times$ by alternating high-resolution frames (320$\times$320 pixels) with multiple down-sized frames (192$\times$192 pixels). To counter the accuracy degradation caused by the smaller input size, MR2-ByteTrack correlates the output detections over time using the ByteTrack tracker and corrects potential misclassifications using a novel probabilistic Rescore algorithm. By interleaving two down-sized images for every high-resolution one as the input to different state-of-the-art DNN object detectors, MR2-ByteTrack achieves an average accuracy increase of 2.16% and a latency reduction of 43% on the GAP9 microcontroller compared to a baseline frame-by-frame inference scheme that uses exclusively full-resolution images. Code available at: https://github.com/Bomps4/Multi_Resolution_Rescored_ByteTrack
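The interleaving scheme described in the abstract can be illustrated with a minimal sketch. It is not taken from the authors' repository: the function names `resolution_schedule` and `relative_compute` are hypothetical, and the cost model assumes that detector compute scales linearly with the number of input pixels, which is only a rough approximation for a real DNN on real hardware.

```python
from itertools import cycle, islice

HIGH_RES = 320  # side length in pixels, as stated in the abstract
LOW_RES = 192   # side length in pixels, as stated in the abstract


def resolution_schedule(low_per_high, n_frames):
    """Return the input side length for each frame.

    Hypothetical helper: one high-resolution frame followed by
    `low_per_high` down-sized frames, repeated over the video.
    """
    pattern = [HIGH_RES] + [LOW_RES] * low_per_high
    return list(islice(cycle(pattern), n_frames))


def relative_compute(low_per_high):
    """Average per-frame cost relative to an all-high-resolution baseline.

    Assumption (not from the paper): cost is proportional to pixel count.
    """
    low_cost = (LOW_RES / HIGH_RES) ** 2  # cost of one down-sized frame
    return (1 + low_per_high * low_cost) / (1 + low_per_high)


schedule = resolution_schedule(low_per_high=2, n_frames=6)
ratio = relative_compute(low_per_high=2)
print(schedule)
print(f"estimated relative compute: {ratio:.3f}")
```

Under this pixel-count assumption, two down-sized frames per high-resolution frame cost roughly 0.57 of the baseline per frame. That is in the same range as the 43% latency reduction quoted in the abstract, although the reported figure is a measurement on the GAP9 and not a result of this estimate.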