Adaptive Inference: Theoretical Limits and Unexplored Opportunities (2402.04359v1)
Published 6 Feb 2024 in cs.LG
Abstract: This paper introduces the first theoretical framework for quantifying the opportunity size of efficiency and performance gains achievable by adaptive inference algorithms. We provide new approximate and exact bounds on the achievable efficiency and performance gains, supported by empirical evidence demonstrating the potential for 10-100x efficiency improvements in both Computer Vision and Natural Language Processing tasks without incurring any performance penalty. Additionally, we offer insights on improving achievable efficiency gains through the optimal selection and design of adaptive inference state spaces.
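To make the "opportunity size" idea concrete, below is a minimal Python sketch of one common way such an upper bound is estimated: an oracle adaptive policy over a cascade of models of increasing cost that always runs the cheapest model sufficient for each input. The costs, accuracies, and the assumption that models err independently are illustrative placeholders, not the paper's actual state spaces or bounds.

```python
import numpy as np

# Hedged sketch: oracle upper bound on the efficiency gain of an
# adaptive inference state space, here a 4-model cascade. All numbers
# are hypothetical; real models have correlated errors, which shrinks
# the oracle's advantage.

rng = np.random.default_rng(0)
n_samples = 10_000

costs = np.array([1.0, 4.0, 16.0, 64.0])    # relative per-inference cost, cheapest first
accs = np.array([0.70, 0.80, 0.88, 0.92])   # standalone accuracy of each model

# Per-sample correctness of each model, drawn independently (an
# optimistic simplification).
correct = rng.random((n_samples, len(costs))) < accs

# Static baseline: always run the largest model.
static_cost = costs[-1]
static_acc = correct[:, -1].mean()

# Oracle adaptive policy: per sample, run the cheapest model that is
# correct; if no model is correct, fall back to the largest one.
any_correct = correct.any(axis=1)
first_correct = np.where(any_correct, correct.argmax(axis=1), len(costs) - 1)
oracle_cost = costs[first_correct].mean()
oracle_acc = any_correct.mean()

print(f"static : cost {static_cost:.2f}, acc {static_acc:.3f}")
print(f"oracle : cost {oracle_cost:.2f}, acc {oracle_acc:.3f}")
print(f"upper-bound efficiency gain: {static_cost / oracle_cost:.1f}x")
```

The gap between this oracle bound and what a deployable exit policy realizes is exactly the kind of quantity the paper's framework is meant to characterize; the sketch only illustrates why the bound can be large when most inputs are easy.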