Memory-efficient Energy-adaptive Inference of Pre-Trained Models on Batteryless Embedded Systems (2405.10426v1)
Abstract: Batteryless systems frequently face power failures, requiring extra runtime buffers to maintain inference progress and leaving only a memory space for storing ultra-tiny deep neural networks (DNNs). Besides, making these models responsive to stochastic energy harvesting dynamics during inference requires a balance between inference accuracy, latency, and energy overhead. Recent works on compression mostly focus on time and memory, but often ignore energy dynamics or significantly reduce the accuracy of pre-trained DNNs. Existing energy-adaptive inference works modify the architecture of pre-trained models and have significant memory overhead. Thus, energy-adaptive and accurate inference of pre-trained DNNs on batteryless devices with extreme memory constraints is more challenging than traditional microcontrollers. We combat these issues by proposing FreeML, a framework to optimize pre-trained DNN models for memory-efficient and energy-adaptive inference on batteryless systems. FreeML comprises (1) a novel compression technique to reduce the model footprint and runtime memory requirements simultaneously, making them executable on extremely memory-constrained batteryless platforms; and (2) the first early exit mechanism that uses a single exit branch for all exit points to terminate inference at any time, making models energy-adaptive with minimal memory overhead. Our experiments showed that FreeML reduces the model sizes by up to $95 \times$, supports adaptive inference with a $2.03-19.65 \times$ less memory overhead, and provides significant time and energy benefits with only a negligible accuracy drop compared to the state-of-the-art.
- Efficient intermittent computing with differential checkpointing. In Proceedings of the 20th ACM SIGPLAN/SIGBED International Conference on Languages, Compilers, and Tools for Embedded Systems. 70–81.
- The Internet of Batteryless Things. Commun. ACM 67, 3 (2024), 64–73.
- Khakim Akhunov and Kasim Sinan Yildirim. 2022. AdaMICA: Adaptive Multicore Intermittent Computing. Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies 6, 3 (2022), 1–30.
- Ambiq. 2022. Ambiq Accelerates the Development of Optimized AI Features with neuralSPOT. https://ambiq.com/news/ambiq-accelerates-the-development-of-optimized-ai-features-with-neuralspot/.
- Anonymous. 2023. FreeML Github Repo. https://.
- Protean: An energy-efficient and heterogeneous platform for adaptive and hardware-accelerated battery-free computing. In Proceedings of the 20th ACM Conference on Embedded Networked Sensor Systems. 207–221.
- Adaptive Intelligence for Batteryless Sensors Using Software-Accelerated Tsetlin Machines. In Proceedings of SenSys.
- REHASH: A Flexible, Developer Focused, Heuristic Adaptation Platform for Intermittently Powered Computing. Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies 5, 3 (2021), 1–42.
- Sourav Bhattacharya and Nicholas D Lane. 2016. Sparsification and separation of deep learning layers for constrained resource inference on wearables. In Proceedings of the 14th ACM Conference on Embedded Network Sensor Systems.
- Enable deep learning on mobile devices: Methods, systems, and applications. ACM Transactions on Design Automation of Electronic Systems (TODAES) 27, 3 (2022), 1–50.
- Alexei Colin and Brandon Lucia. 2016. Chain: tasks and channels for reliable intermittent programs. In Proceedings of the ACM SIGPLAN International Conference on Object-Oriented Programming, Systems, Languages, and Applications.
- Camaroptera: A long-range image sensor with local inference for remote sensing applications. ACM Transactions on Embedded Computing Systems (TECS) (2022).
- Analog Devices. 2020. Artificial Intelligence Microcontroller with Ultra-Low-Power Convolutional Neural Network Accelerator. Retrieved June 15, 2022 from https://datasheets.maximintegrated.com/en/ds/MAX78000.pdf
- Intelligence beyond the edge: Inference on intermittent embedded systems. In Proceedings of the Twenty-Fourth International Conference on Architectural Support for Programming Languages and Operating Systems. 199–213.
- Protonn: Compressed and accurate knn for resource-scarce devices. In International conference on machine learning. PMLR.
- Deep compression: Compressing deep neural networks with pruning, trained quantization and huffman coding. arXiv preprint arXiv:1510.00149 (2015).
- Learning both weights and connections for efficient neural network. Advances in neural information processing systems 28 (2015).
- Deep residual learning for image recognition. arXiv 2015. arXiv preprint arXiv:1512.03385 14 (2015).
- Edge Impulse: An MLOps Platform for Tiny Machine Learning. arXiv (2022).
- Andrey Ignatov. [n. d.]. HAR. https://github.com/aiff22/HAR
- Texas Instruments Inc. 2017. MSP430FR59xx Mixed-Signal Microcontrollers (Rev. F). https://www.ti.com/lit/ds/symlink/msp430fr5969.pdf
- Bashima Islam and Shahriar Nirjon. 2020. Zygarde: Time-Sensitive On-Device Deep Inference and Adaptation on Intermittently-Powered Systems. Proc. ACM Interact. Mob. Wearable Ubiquitous Technol. 4, 3.
- HarvNet: Resource-Optimized Operation of Multi-Exit Deep Neural Networks on Energy Harvesting Devices. In Proceedings of the 21st Annual International Conference on Mobile Systems, Applications and Services. 42–55.
- Compression of deep convolutional neural networks for fast and low power mobile applications. arXiv preprint arXiv:1511.06530 (2015).
- Alex Krizhevsky. 2009. Learning multiple layers of features from tiny images. (2009).
- Seulki Lee and Shahriar Nirjon. 2019. Neuro. zero: a zero-energy neural network accelerator for embedded sensing and inference systems. In Proceedings of the 17th Conference on Embedded Networked Sensor Systems. 138–152.
- Seulki Lee and Shahriar Nirjon. 2020. Fast and scalable in-memory deep multitask learning via neural weight virtualization. In Proceedings of the 18th International Conference on Mobile Systems, Applications, and Services. 175–190.
- Seulki Lee and Shahriar Nirjon. 2022. Weight Separation for Memory-Efficient and Accurate Deep Multitask Learning. In 2022 IEEE International Conference on Pervasive Computing and Communications (PerCom). IEEE, 13–22.
- Mcunet: Tiny deep learning on iot devices. Advances in Neural Information Processing Systems 33 (2020), 11711–11722.
- Rethinking the value of network pruning. arXiv preprint arXiv:1810.05270 (2018).
- Alpaca: Intermittent execution without checkpoints. Proceedings of the ACM on Programming Languages 1, OOPSLA (2017), 1–30.
- Dynamic task-based intermittent execution for energy-harvesting devices. ACM Transactions on Sensor Networks (TOSN) 16, 1 (2020), 1–24.
- Intermittent-aware neural architecture search. ACM Transactions on Embedded Computing Systems (TECS) 20, 5s (2021), 1–27.
- ePerceptive: energy reactive embedded intelligence for batteryless sensors. In Proceedings of the 18th Conference on Embedded Networked Sensor Systems. 382–394.
- Global solution of fully-observed variational Bayesian matrix factorization is column-wise independent. Advances in Neural Information Processing Systems 24 (2011).
- Branchynet: Fast inference via early exiting from deep neural networks. In 2016 23rd International Conference on Pattern Recognition (ICPR). IEEE, 2464–2469.
- Pete Warden. 2018. Speech commands: A dataset for limited-vocabulary speech recognition. arXiv preprint arXiv:1804.03209 (2018).
- Fashion-MNIST: a Novel Image Dataset for Benchmarking Machine Learning Algorithms. CoRR abs/1708.07747 (2017). arXiv:1708.07747 http://arxiv.org/abs/1708.07747
- Fastdeepiot: Towards understanding and optimizing neural network execution time on mobile and embedded devices. In Proceedings of the 16th ACM Conference on Embedded Networked Sensor Systems. 278–291.
- Deepiot: Compressing deep neural network structures for sensing systems with a compressor-critic framework. In Proceedings of the 15th ACM Conference on Embedded Network Sensor Systems. 1–14.
- Ink: Reactive kernel for tiny batteryless sensors. In Proceedings of the 16th ACM Conference on Embedded Networked Sensor Systems. 41–53.
- Efficient and Safe I/O Operations for Intermittent Systems. In Proceedings of the Eighteenth European Conference on Computer Systems. 63–78.
- Immortal Threads: Multithreaded Event-driven Intermittent Computing on {{\{{Ultra-Low-Power}}\}} Microcontrollers. In 16th USENIX Symposium on Operating Systems Design and Implementation (OSDI 22). 339–355.