Evolving Subnetwork Training for Large Language Models (2406.06962v1)

Published 11 Jun 2024 in cs.CL and cs.AI

Abstract: LLMs have ushered in a new era of artificial intelligence research. However, their substantial training costs hinder further development and widespread adoption. In this paper, inspired by the redundancy in the parameters of LLMs, we propose a novel training paradigm: Evolving Subnetwork Training (EST). EST samples subnetworks from the layers of the LLM and from the commonly used modules within each layer, Multi-Head Attention (MHA) and Multi-Layer Perceptron (MLP). By gradually increasing the size of the subnetworks during training, EST reduces the cost of training. We apply EST to train the GPT2 and TinyLlama models, saving 26.7% of training FLOPs for GPT2 and 25.0% for TinyLlama with no increase in loss on the pre-training dataset. Moreover, EST yields performance improvements on downstream tasks, indicating that it benefits generalization. Additionally, we provide intuitive theoretical studies based on training dynamics and Dropout theory to support the feasibility of EST. Our code is available at https://github.com/OpenDFM/EST.
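
To make the abstract's idea concrete, here is a minimal Python sketch of one EST-style training step: sample a subnetwork, train only that subnetwork, and grow the sampled fraction over the run. The linear schedule, the layer-only sampling, and every name below (est_ratio, sample_subnetwork, forward_subnet, train_step) are illustrative assumptions, not the authors' implementation, which lives at https://github.com/OpenDFM/EST.

    import math
    import random

    import torch
    import torch.nn as nn


    def est_ratio(step: int, total_steps: int, start: float = 0.5) -> float:
        # Fraction of the network trained at this step. Grows linearly from
        # `start` to 1.0, so early steps run a cheap subnetwork and the full
        # model is recovered by the end of training (an assumed schedule; the
        # paper's actual schedule may differ).
        return min(1.0, start + (1.0 - start) * step / total_steps)


    def sample_subnetwork(layers: nn.ModuleList, ratio: float) -> list[nn.Module]:
        # Randomly keep a `ratio` fraction of transformer layers. EST also
        # samples inside each layer (MHA heads and MLP hidden units); this
        # sketch samples whole layers only, for brevity.
        n_keep = max(1, math.ceil(ratio * len(layers)))
        kept_indices = sorted(random.sample(range(len(layers)), n_keep))
        return [layers[i] for i in kept_indices]


    def forward_subnet(sub_layers: list[nn.Module], x: torch.Tensor) -> torch.Tensor:
        # Layers left out of the sample act as identity maps, as in
        # stochastic depth / Dropout, the connection the paper's
        # theoretical analysis draws on.
        for layer in sub_layers:
            x = layer(x)
        return x


    def train_step(layers, x, target, loss_fn, optimizer, step, total_steps):
        # Hypothetical training step: sample, forward, then backprop
        # through the sampled layers only.
        sub = sample_subnetwork(layers, est_ratio(step, total_steps))
        loss = loss_fn(forward_subnet(sub, x), target)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        return loss.item()

Since only the sampled layers appear in the forward graph, per-step FLOPs scale roughly with the sampling ratio, which is where savings on the order of the reported 25.0-26.7% would come from under a schedule like this one.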
