TinyVQA: Compact Multimodal Deep Neural Network for Visual Question Answering on Resource-Constrained Devices (2404.03574v1)

Published 4 Apr 2024 in cs.CV, cs.AI, and cs.LG

Abstract: Traditional machine learning models often require powerful hardware, making them unsuitable for deployment on resource-limited devices. Tiny Machine Learning (tinyML) has emerged as a promising approach for running machine learning models on these devices, but integrating multiple data modalities into tinyML models remains a challenge due to increased complexity, latency, and power consumption. This paper proposes TinyVQA, a novel multimodal deep neural network for visual question answering tasks that can be deployed on resource-constrained tinyML hardware. TinyVQA leverages a supervised attention-based model to learn how to answer questions about images using both vision and language modalities. Knowledge distilled from the supervised attention-based VQA model is used to train the memory-aware compact TinyVQA model, and a low bit-width quantization technique is employed to further compress the model for deployment on tinyML devices. The TinyVQA model was evaluated on the FloodNet dataset, which is used for post-disaster damage assessment. The compact model achieved an accuracy of 79.5%, demonstrating the effectiveness of TinyVQA for real-world applications. Additionally, the model was deployed on a Crazyflie 2.0 drone equipped with an AI deck and a GAP8 microprocessor, where it achieved a low latency of 56 ms and consumed 693 mW of power, showcasing its suitability for resource-constrained embedded systems.

Authors (5)
  1. Hasib-Al Rashid
  2. Argho Sarkar
  3. Aryya Gangopadhyay
  4. Maryam Rahnemoonfar
  5. Tinoosh Mohsenin
Citations (3)

Summary

  • The paper demonstrates a novel approach to distill a baseline VQA model into a compact version tailored for tinyML devices.
  • It employs knowledge distillation and low bit-width quantization to reach 79.5% accuracy with a drastically reduced memory footprint.
  • The model’s deployment on platforms like the Crazyflie 2.0 drone highlights its real-world viability for rapid, autonomous decision-making.

TinyVQA: A Novel Approach for Visual Question Answering on Resource-Limited Devices

Introduction to TinyVQA

In the field of tiny Machine Learning (tinyML), TinyVQA marks a significant stride toward deploying multimodal deep neural networks on devices with limited resources. It introduces a compact, efficient framework for visual question answering (VQA), a task that requires integrating visual and textual data to answer natural-language questions about images. Historically, deploying such sophisticated models on constrained hardware has been difficult, chiefly because of their complexity and the substantial computational resources they demand. Through its design and strategic optimizations, TinyVQA overcomes these barriers, enabling VQA on tinyML hardware with minimal compromise on performance.

TinyVQA Model Architecture

The architecture of TinyVQA is divided into two primary components, echoing the model's objective to balance performance with efficiency:

  • The Baseline VQA Model leverages a supervised attention-based mechanism, integrating visual and textual cues to answer questions about images. While highly accurate, this model is not optimized for deployment on resource-constrained devices; its role is to act as the teacher, providing a high-quality knowledge base to distill into the more compact TinyVQA model.
  • The Memory-Aware Compact VQA Model is the core of TinyVQA's contribution. It distills the knowledge of the baseline model and applies low bit-width quantization to drastically reduce model size without significantly compromising accuracy. Designed around the limitations of tinyML hardware, it achieves a substantial reduction in model size while maintaining functional integrity (a minimal sketch of the distillation objective follows this list).
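
To make the training recipe concrete, here is a minimal PyTorch sketch of a standard soft-target distillation objective of the kind TinyVQA builds on. The temperature `T`, mixing weight `alpha`, and all model/tensor names are illustrative assumptions, not values taken from the paper.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits: torch.Tensor,
                      teacher_logits: torch.Tensor,
                      labels: torch.Tensor,
                      T: float = 4.0,
                      alpha: float = 0.7) -> torch.Tensor:
    """Weighted sum of a soft-target KD term and the usual hard-label loss."""
    # Soft targets: match the student's tempered answer distribution
    # to the (frozen) teacher's.
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)  # rescale so gradients are comparable across temperatures
    # Hard targets: standard cross-entropy against ground-truth answers.
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1.0 - alpha) * hard

# Typical training step (teacher frozen, student trainable):
# with torch.no_grad():
#     teacher_logits = teacher(image, question)
# loss = distillation_loss(student(image, question), teacher_logits, answers)
```

The key design choice is that the student never needs the teacher at inference time: once trained, only the compact model ships to the device.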

Evaluation of TinyVQA

The effectiveness of TinyVQA was measured on the FloodNet dataset, chosen for its relevance to real-world post-disaster scenarios. The dataset, derived from aerial imagery collected after Hurricane Harvey, provides a diverse set of visual and textual queries, including damage assessment and environmental condition questions. The results are commendable:

  • The TinyVQA model achieved an accuracy of 79.5%, only 1.5 percentage points below the baseline model, while drastically reducing memory usage.
  • These outcomes underscore the model's potential for executing complex VQA tasks within the stringent limitations of tinyML devices, pointing to broader applicability in edge computing (a minimal evaluation sketch follows this list).
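
For concreteness, here is a minimal, hypothetical sketch of how such a closed-vocabulary VQA accuracy is typically computed, treating each question as classification over a fixed answer set. The `model` interface and `eval_loader` batch layout are assumptions for illustration, not the paper's exact pipeline.

```python
import torch

@torch.no_grad()
def vqa_accuracy(model, eval_loader, device="cpu") -> float:
    """Fraction of questions whose top-1 predicted answer matches the label."""
    model.eval()
    correct, total = 0, 0
    for image, question, answer_idx in eval_loader:
        logits = model(image.to(device), question.to(device))
        pred = logits.argmax(dim=-1).cpu()
        correct += (pred == answer_idx).sum().item()
        total += answer_idx.numel()
    return correct / total  # e.g. 0.795 for the reported 79.5%
```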

Deployment on Resource-Constrained Hardware

The deployment of TinyVQA on the Crazyflie 2.0 drone equipped with an AI deck and powered by the GAP8 microprocessor is a testament to its real-world viability. The deployment highlights include:

  • Implementation within the tight memory constraints of the GAP8 architecture, utilizing a mix of model compression techniques to fit within the available resources.
  • The model's operational efficiency, with low latencies of 56 ms and minimal power consumption of 0.7 W, paves the way for real-time, autonomous VQA applications in scenarios where rapid, informed decision-making is crucial.

Conclusion and Future Perspectives

TinyVQA represents a significant leap forward in deploying multimodal deep learning models on resource-limited devices. By demonstrating high accuracy in visual question answering tasks with remarkably low resource consumption, TinyVQA paves the way for advanced, intelligent applications in areas previously constrained by hardware limitations. As tinyML continues to evolve, the principles and methodologies underpinning TinyVQA offer a blueprint for future research and development in the field, especially in scenarios demanding rapid, on-site intelligence, such as disaster response and remote sensing.

With TinyVQA demonstrating that efficiency and performance can coexist at the edge, the outlook for tinyML is bright, promising substantial advances in how computational intelligence is deployed in the real world.
