MeRino: Entropy-driven Design for Generative Language Models on IoT Devices (2403.07921v2)
Abstract: Generative large language models (LLMs) stand as a revolutionary advancement in the modern era of AI. However, scaling down LLMs for resource-constrained hardware, such as Internet-of-Things (IoT) devices, requires non-trivial effort and domain knowledge. In this paper, we propose a novel information-entropy framework for designing mobile-friendly generative LLMs. The whole design procedure involves solving a mathematical programming (MP) problem, which can be done on the CPU within minutes, making it nearly zero-cost. We evaluate our designed models, termed MeRino, across fourteen NLP downstream tasks, showing their competitive performance against state-of-the-art autoregressive transformer models in the mobile setting. Notably, MeRino achieves similar or better performance on both language modeling and zero-shot learning tasks compared to the 350M-parameter OPT, while being 4.9x faster on an NVIDIA Jetson Nano with a 5.5x reduction in model size.
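The abstract does not spell out the optimization itself, but the idea of posing architecture design as a small mathematical program that a CPU can solve in minutes can be illustrated with a toy sketch. The entropy proxy, the parameter estimate, the budget, and the search ranges below are all illustrative assumptions, not MeRino's actual objective or constraints.

```python
# Toy sketch (assumed, not the paper's formulation): pick a decoder-only
# transformer's depth and width by maximizing an entropy-style proxy
# subject to a parameter budget, via cheap CPU-side enumeration.
from itertools import product
import math

PARAM_BUDGET = 70e6  # assumed mobile-scale parameter budget

def approx_params(depth: int, width: int, vocab: int = 50257) -> float:
    """Rough decoder-only transformer size: embeddings plus per-block weights."""
    per_block = 12 * width * width  # approximate attention + MLP weights
    return vocab * width + depth * per_block

def entropy_proxy(depth: int, width: int) -> float:
    """Illustrative information-entropy proxy that grows with depth and width."""
    return depth * math.log(width)

best = None
for depth, width in product(range(4, 25, 2), range(256, 1025, 64)):
    if approx_params(depth, width) > PARAM_BUDGET:
        continue  # violates the resource constraint
    score = entropy_proxy(depth, width)
    if best is None or score > best[0]:
        best = (score, depth, width)

score, depth, width = best
print(f"selected depth={depth}, width={width}, entropy proxy={score:.2f}")
```

Because the search space is a small grid and the objective is a closed-form score rather than trained-model accuracy, the whole selection runs in well under a second on a laptop CPU, which is the spirit of the "nearly zero-cost" design claim.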
Authors: Youpeng Zhao, Ming Lin, Huadong Tang, Qiang Wu, Jun Wang