STEP: Staged Parameter-Efficient Pre-training for Large Language Models (2504.04151v1)

Published 5 Apr 2025 in cs.CL

Abstract: Pre-training large language models (LLMs) faces significant memory challenges because of the large number of model parameters. We introduce STaged parameter-Efficient Pre-training (STEP), which integrates parameter-efficient tuning techniques with model growth. We conduct experiments on pre-training LLMs of various sizes and demonstrate that STEP achieves up to a 53.9% reduction in maximum memory requirements compared to vanilla pre-training while maintaining equivalent performance. Furthermore, we show that models pre-trained with STEP perform comparably to vanilla pre-trained models on downstream tasks after instruction tuning.

Authors (3)
