Efficient IoT Inference via Context-Awareness (2310.19112v2)
Abstract: Existing strategies for executing deep learning-based classification on low-power platforms assume the model is trained on all classes of interest. This paper posits that adopting context-awareness, i.e., narrowing a classification task to the current deployment context (the small set of classes appearing in recent inference queries), can substantially enhance performance in resource-constrained environments. We propose CACTUS, a new paradigm for scalable and efficient context-aware classification in which a micro-classifier recognizes a small set of classes relevant to the current context and, when the context changes (e.g., a new class comes into the scene), rapidly switches to another suitable micro-classifier. CACTUS features several innovations, including optimizing the training cost of context-aware classifiers, enabling on-the-fly context-aware switching between classifiers, and balancing context-switching costs against performance gains via simple yet effective switching policies. We show that CACTUS achieves significant benefits in accuracy, latency, and compute budget across a range of datasets and IoT platforms.
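The switching mechanism the abstract describes can be illustrated with a toy simulation. This is a minimal sketch under assumed names and a deliberately naive policy (switch whenever a query's class falls outside the active context): the paper's actual micro-classifiers, context-change detection, and switching policies are more sophisticated.

```python
from collections import deque

def run_cactus(queries, contexts, window=3):
    """Toy simulation of CACTUS-style context-aware switching.

    `contexts` is a list of class sets, each served by its own cheap
    micro-classifier; here a micro-classifier simply recognizes the
    labels in its set. When a query's label falls outside the active
    context, we switch to a context that covers it and count the
    switch. All names and the policy itself are illustrative
    assumptions, not the paper's exact algorithm.
    """
    active = contexts[0]          # start in an arbitrary context
    switches = 0                  # number of context switches incurred
    handled = 0                   # queries served by a micro-classifier
    recent = deque(maxlen=window) # recent labels defining the context
    for label in queries:
        if label in active:
            handled += 1          # active micro-classifier covers it
        else:
            # Context change detected: pick a context covering the new class.
            candidates = [c for c in contexts if label in c]
            if candidates:
                active = candidates[0]
                switches += 1
                handled += 1      # re-run query on the new micro-classifier
        recent.append(label)
    return handled, switches

# Example: two contexts of two classes each; the query stream drifts
# from one context to the other and back, forcing two switches.
handled, switches = run_cactus(
    ["cat", "dog", "car", "car", "dog"],
    [{"cat", "dog"}, {"car", "truck"}],
)
```

The trade-off the paper's switching policies balance is visible even here: a wider context reduces `switches` but makes each micro-classifier larger and slower, while a narrower one keeps inference cheap at the cost of more frequent switching.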