LLMem: Estimating GPU Memory Usage for Fine-Tuning Pre-Trained LLMs (2404.10933v1)
Abstract: Fine-tuning pre-trained LLMs on limited hardware is challenging because of GPU memory constraints. Various distributed fine-tuning methods have been proposed to relieve this memory pressure, but it remains unclear which method fine-tunes fastest while avoiding GPU out-of-memory errors in a given environment. To address this challenge, we introduce LLMem, a solution that estimates the GPU memory consumption of each distributed fine-tuning method across multiple GPUs and identifies the optimal one. We estimate GPU memory usage before fine-tuning begins, leveraging the fundamental structure of transformer-based decoder models and the memory usage distribution of each method. Experimental results show that LLMem estimates peak GPU memory usage on a single GPU with error rates of up to 1.6%, and achieves an average error rate of 3.0% when applying distributed fine-tuning methods to LLMs with more than a billion parameters on multi-GPU setups.
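To make the estimation idea concrete, the sketch below breaks peak fine-tuning memory into weights, gradients, optimizer states, and activations for a transformer decoder. This is not LLMem's actual estimator: the function names, the 2/2/12-byte mixed-precision Adam accounting, the 12·h² per-layer parameter count, the activation constant, and the ZeRO-style sharding flag are all illustrative assumptions.

```python
# Hypothetical back-of-the-envelope estimator, NOT LLMem's algorithm.
# Assumes mixed-precision training with Adam: fp16 weights (2 B) + fp16 gradients (2 B)
# + fp32 master weights, momentum, and variance (12 B) per parameter, plus a crude
# per-layer activation term. All constants here are illustrative assumptions.

def decoder_param_count(layers: int, hidden: int, vocab: int) -> int:
    """Rough parameter count of a transformer decoder:
    ~12*h^2 per layer (attention + MLP) plus the embedding table."""
    return layers * 12 * hidden * hidden + vocab * hidden

def estimate_peak_bytes(layers: int, hidden: int, vocab: int,
                        batch: int, seq_len: int,
                        num_gpus: int = 1, shard_states: bool = False) -> float:
    params = decoder_param_count(layers, hidden, vocab)
    weights_and_grads = params * (2 + 2)   # fp16 weights + fp16 gradients
    optimizer_states = params * 12         # fp32 master copy + Adam m and v
    if shard_states:                       # ZeRO-style partitioning across GPUs
        optimizer_states /= num_gpus
    # Very coarse activation estimate: a few fp16 tensors of shape (b, s, h) per layer.
    activations = layers * 16 * batch * seq_len * hidden * 2
    return weights_and_grads + optimizer_states + activations

# Example: a ~1.3B-parameter GPT-style model (24 layers, hidden 2048, vocab ~50k)
# fine-tuned with batch size 4 and sequence length 1024 on 4 GPUs with sharded states.
print(f"{estimate_peak_bytes(24, 2048, 50257, 4, 1024, num_gpus=4, shard_states=True) / 2**30:.1f} GiB")
```

The `shard_states` flag mirrors, in a simplified way, how a distributed method changes the per-GPU footprint; capturing such per-method differences accurately, including activation peaks, is the kind of accounting the paper's estimator performs.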
Authors: Taeho Kim, Yanming Wang, Vatshank Chaturvedi, Lokesh Gupta, Seyeon Kim, Yongin Kwon, Sangtae Ha