Designing Large Foundation Models for Efficient Training and Inference: A Survey
Abstract: This survey covers modern techniques for efficient training and inference of foundation models, organized along two perspectives: model design and system design. Both optimize LLM training and inference from different angles to reduce computational cost, making LLMs more efficient, affordable, and accessible. The paper list repository is available at https://github.com/NoakLiu/Efficient-Foundation-Models-Survey.
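As a flavor of the model-design techniques the survey covers, the sketch below shows symmetric per-tensor int8 post-training quantization, one of the simplest ways to shrink model memory. This is a minimal illustration, not code from the survey or any cited system; all names are hypothetical.

```python
# Minimal sketch of symmetric int8 post-training weight quantization,
# a model-design technique for reducing LLM memory footprint.
# Storing int8 instead of float32 cuts weight memory by 4x,
# at the cost of a small rounding error bounded by scale / 2.

def quantize_int8(weights):
    """Map float weights to int8 using a single per-tensor scale."""
    scale = max(abs(w) for w in weights) / 127.0
    q = [max(-128, min(127, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Recover approximate float weights from the int8 values."""
    return [v * scale for v in q]

weights = [0.12, -0.5, 0.33, 1.27, -1.0]
q, scale = quantize_int8(weights)
recovered = dequantize(q, scale)
max_err = max(abs(w - r) for w, r in zip(weights, recovered))
```

Real quantization schemes (e.g. activation-aware or per-channel variants) refine this by choosing scales that protect the weights most important to model accuracy.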