TinyVQA: Compact Multimodal Deep Neural Network for Visual Question Answering on Resource-Constrained Devices (2404.03574v1)
Abstract: Traditional machine learning models often require powerful hardware, making them unsuitable for deployment on resource-limited devices. Tiny Machine Learning (tinyML) has emerged as a promising approach for running machine learning models on these devices, but integrating multiple data modalities into tinyML models remains a challenge because of the added complexity, latency, and power consumption. This paper proposes TinyVQA, a novel multimodal deep neural network for visual question answering that can be deployed on resource-constrained tinyML hardware. TinyVQA first trains a supervised attention-based model to answer questions about images using both the vision and language modalities. Knowledge distilled from this supervised attention-based VQA model is then used to train the memory-aware, compact TinyVQA model, and low-bit-width quantization further compresses the model for deployment on tinyML devices. TinyVQA was evaluated on the FloodNet dataset, which is used for post-disaster damage assessment. The compact model achieved an accuracy of 79.5%, demonstrating the effectiveness of TinyVQA for real-world applications. The model was also deployed on a Crazyflie 2.0 drone equipped with an AI deck and a GAP8 microprocessor. On the drone, TinyVQA achieved a latency of 56 ms and consumed 693 mW of power, showcasing its suitability for resource-constrained embedded systems.
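The two compression steps named in the abstract, knowledge distillation and low-bit-width quantization, can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not give the loss formulation, temperature, weighting, or bit width, so the Hinton-style loss, `temperature=4.0`, `alpha=0.5`, and `bits=8` below are all illustrative assumptions, and every function name is hypothetical.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits, with temperature scaling."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, label,
                      temperature=4.0, alpha=0.5):
    """Assumed Hinton-style objective (not confirmed by the source):
    alpha * T^2 * KL(teacher || student) + (1 - alpha) * CE(student, label).
    """
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    # KL divergence between the softened teacher and student distributions.
    kl = sum(pt * math.log(pt / ps)
             for pt, ps in zip(p_teacher, p_student) if pt > 0)
    # Ordinary cross-entropy of the student against the ground-truth answer.
    ce = -math.log(softmax(student_logits)[label])
    return alpha * temperature ** 2 * kl + (1 - alpha) * ce


def quantize_symmetric(weights, bits=8):
    """Uniform symmetric quantization: signed integers sharing one scale factor."""
    qmax = 2 ** (bits - 1) - 1
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    quantized = [max(-qmax, min(qmax, round(w / scale))) for w in weights]
    return quantized, scale


def dequantize(quantized, scale):
    """Map quantized integers back to approximate float weights."""
    return [q * scale for q in quantized]


if __name__ == "__main__":
    # Hypothetical logits over three candidate answers.
    teacher = [2.0, 0.5, -1.0]
    student = [1.2, 0.8, -0.5]
    print("distillation loss:", distillation_loss(student, teacher, label=0))

    # Hypothetical weights from one layer of the compact model.
    weights = [0.5, -1.0, 0.25, 0.03]
    q, scale = quantize_symmetric(weights, bits=8)
    print("quantized:", q, "scale:", scale)
    print("reconstructed:", dequantize(q, scale))
```

Lowering `bits` shrinks the stored weights at the cost of a larger rounding error, which is bounded by half of `scale` per weight in this scheme.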
Authors:
- Hasib-Al Rashid
- Argho Sarkar
- Aryya Gangopadhyay
- Maryam Rahnemoonfar
- Tinoosh Mohsenin