
Designing Large Foundation Models for Efficient Training and Inference: A Survey

Published 3 Sep 2024 in cs.DC and cs.LG | (2409.01990v5)

Abstract: This survey reviews modern efficient training and inference techniques for foundation models from two perspectives: model design and system design. Both optimize LLM training and inference in complementary ways to save computational resources, making LLMs more efficient, affordable, and accessible. The paper list repository is available at https://github.com/NoakLiu/Efficient-Foundation-Models-Survey.
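
To make the model-design perspective concrete, below is a minimal sketch (not from the paper) of one representative efficiency technique in the survey's scope: post-training int8 weight quantization. The function names, shapes, and per-channel scheme are illustrative assumptions, not the survey's implementation.

# Minimal sketch of post-training int8 weight quantization (illustrative only).
import numpy as np

def quantize_per_channel(w: np.ndarray):
    """Symmetric int8 quantization with one scale per output channel (row)."""
    scale = np.abs(w).max(axis=1, keepdims=True) / 127.0   # per-row scale factor
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: np.ndarray) -> np.ndarray:
    """Recover an approximate fp32 weight matrix from int8 values and scales."""
    return q.astype(np.float32) * scale

# Usage: an fp32 weight matrix shrinks to roughly 1/4 of its memory footprint,
# at the cost of a small reconstruction error.
w = np.random.randn(8, 16).astype(np.float32)
q, s = quantize_per_channel(w)
err = np.abs(w - dequantize(q, s)).max()
print(q.dtype, f"max abs error = {err:.4f}")

Methods surveyed in the paper (e.g., activation-aware quantization such as AWQ) refine this idea by choosing scales with respect to activation statistics rather than weights alone.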

