Memory-efficient Energy-adaptive Inference of Pre-Trained Models on Batteryless Embedded Systems (2405.10426v1)

Published 16 May 2024 in cs.LG, cs.AI, and cs.CL

Abstract: Batteryless systems frequently face power failures, requiring extra runtime buffers to maintain inference progress and leaving only enough memory to store ultra-tiny deep neural networks (DNNs). Moreover, making these models responsive to stochastic energy harvesting dynamics during inference requires balancing inference accuracy, latency, and energy overhead. Recent works on compression mostly focus on time and memory, but often ignore energy dynamics or significantly reduce the accuracy of pre-trained DNNs. Existing energy-adaptive inference works modify the architecture of pre-trained models and incur significant memory overhead. Thus, energy-adaptive and accurate inference of pre-trained DNNs on batteryless devices with extreme memory constraints is more challenging than on traditional microcontrollers. We combat these issues by proposing FreeML, a framework to optimize pre-trained DNN models for memory-efficient and energy-adaptive inference on batteryless systems. FreeML comprises (1) a novel compression technique that reduces the model footprint and runtime memory requirements simultaneously, making models executable on extremely memory-constrained batteryless platforms; and (2) the first early exit mechanism that uses a single exit branch for all exit points to terminate inference at any time, making models energy-adaptive with minimal memory overhead. Our experiments show that FreeML reduces model sizes by up to $95\times$, supports adaptive inference with $2.03-19.65\times$ less memory overhead, and provides significant time and energy benefits with only a negligible accuracy drop compared to the state-of-the-art.
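
The single-shared-exit idea can be illustrated with a short sketch. The PyTorch snippet below is a minimal illustration, not the authors' implementation: the backbone layers, the 1x1 adapter convolutions, the pooling choice, and the `has_energy` callback are all assumptions introduced here to show how one classifier head can serve every exit point and let inference terminate whenever harvested energy runs out.

```python
# Minimal sketch of an early-exit scheme with one shared exit branch, in the
# spirit of the single-exit-branch idea described in the abstract.
# All layer sizes and the energy check are illustrative assumptions.
import torch
import torch.nn as nn

class SharedExitNet(nn.Module):
    def __init__(self, num_classes=10, exit_dim=32):
        super().__init__()
        # Backbone stages; inference may terminate after any of them.
        self.stages = nn.ModuleList([
            nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU()),
            nn.Sequential(nn.Conv2d(16, 32, 3, padding=1), nn.ReLU()),
            nn.Sequential(nn.Conv2d(32, 32, 3, padding=1), nn.ReLU()),
        ])
        # 1x1 adapters project each stage's output to a common channel count,
        # so a single classifier head can serve every exit point.
        self.adapters = nn.ModuleList([
            nn.Conv2d(16, exit_dim, 1),
            nn.Conv2d(32, exit_dim, 1),
            nn.Conv2d(32, exit_dim, 1),
        ])
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.shared_exit = nn.Linear(exit_dim, num_classes)  # the one shared exit branch

    def forward(self, x, has_energy=lambda: True):
        logits = None
        for stage, adapter in zip(self.stages, self.adapters):
            x = stage(x)
            feat = self.pool(adapter(x)).flatten(1)   # (N, exit_dim)
            logits = self.shared_exit(feat)           # prediction available at this exit
            if not has_energy():                      # e.g. harvested-energy budget exhausted
                break                                 # terminate inference early
        return logits

# Example: the (hypothetical) energy check fails after the second stage,
# so the prediction produced at exit 2 is returned.
model = SharedExitNet()
budget = iter([True, False])
out = model(torch.randn(1, 3, 32, 32), has_energy=lambda: next(budget))
print(out.shape)  # torch.Size([1, 10])
```

Sharing one head across exits is what keeps the memory overhead small: only a single set of exit weights is stored, regardless of how many exit points the backbone exposes.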
