Efficient Large Foundation Model Inference: A Perspective From Model and System Co-Design (2409.01990v2)
Published 3 Sep 2024 in cs.DC and cs.LG
Abstract: As LLMs grow in popularity, so does the need for efficient model design for LLM-based systems. While LLMs produce impressive outputs, contemporary models still suffer from slow inference and large memory footprints. This paper surveys modern efficient inference techniques for LLMs from two perspectives: model design and system design. These methodologies optimize LLM inference from different angles to save computational resources, making LLMs more efficient, affordable, and accessible.
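As a concrete illustration of the model-design side, the sketch below shows symmetric per-tensor INT8 weight quantization, a standard model-compression technique in this literature. It is a minimal, hypothetical example rather than the paper's method; the function names and layer size are assumptions made for illustration.

```python
import numpy as np

def quantize_int8(w: np.ndarray):
    """Symmetric per-tensor INT8 quantization of a weight matrix.

    Returns the quantized weights and the scale needed to dequantize.
    """
    # Map the largest weight magnitude to 127 (guard against all-zero weights).
    scale = max(np.abs(w).max(), 1e-8) / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize_int8(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover an approximate float32 weight matrix."""
    return q.astype(np.float32) * scale

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    w = rng.standard_normal((1024, 1024)).astype(np.float32)  # a hypothetical layer
    q, scale = quantize_int8(w)
    w_hat = dequantize_int8(q, scale)
    # INT8 storage is 4x smaller than float32, at the cost of a small reconstruction error.
    print("max abs error:", np.abs(w - w_hat).max())
    print("memory ratio :", q.nbytes / w.nbytes)  # ~0.25
```

In practice, methods surveyed by the paper refine this basic idea (e.g., per-channel scales or activation-aware weight selection) to keep accuracy while shrinking memory and speeding up inference.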
Authors: Dong Liu, Zhixin Lai, Yite Wang, Jing Wu, Yanxuan Yu, Zhongwei Wan, Benjamin Lengerich, Ying Nian Wu