Parameter-Efficient Sparse Fine-Tuning
The substantial parameter count of LLMs like Falcon, LLaMA 2, and Mistral necessitates fine-tuning approaches that avoid updating every parameter of the model. Sparse Fine-Tuning (SFT) has been recognized for striking a balance between parameter economy and strong downstream performance. However, its memory requirements grow in proportion to model size, which has limited the scalability of SFT. To address this constraint, this work scales SFT to LLMs by developing memory-efficient variants of the approach.
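To make the idea concrete, here is a minimal sketch of the sparse-delta parameterization behind SFT: only a small set of positions in a frozen weight matrix carries trainable values. This is not the paper's implementation; the function name, sizes, and density are illustrative.

```python
import torch

def apply_sparse_delta(weight: torch.Tensor, indices: torch.Tensor, values: torch.Tensor) -> torch.Tensor:
    """Add a sparse delta (flat positions + trainable values) to a frozen weight matrix."""
    delta = torch.zeros(weight.numel(), dtype=weight.dtype, device=weight.device)
    delta = delta.scatter(0, indices, values)   # out-of-place, so gradients flow to `values`
    return weight + delta.view_as(weight)

# Example: train ~0.1% of a 4096x4096 projection; optimizer state covers only `values`.
W = torch.randn(4096, 4096)
k = int(0.001 * W.numel())
indices = torch.randperm(W.numel())[:k]
values = torch.zeros(k, requires_grad=True)
W_adapted = apply_sparse_delta(W, indices, values)
```

Because the delta is merged back into the dense weight, the forward pass remains an ordinary dense matrix multiplication; only `values` and its optimizer state add trainable memory.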
Memory-Efficient SFT
A novel iterative paradigm for SFT is introduced, which cycles through three phases: updating the active parameter deltas, pruning active indices whose deltas changed least in magnitude, and regrowing new indices. Regrowth is guided either by gradients or by momenta estimated with the SM3 optimizer, distinguishing the procedure from criteria used in dense pretraining methods. Because the deltas are merged into the frozen weights, the model remains dense from an operational standpoint, side-stepping the inefficient sparse tensor operations that current hardware handles poorly.
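The following sketch illustrates one prune-and-regrow step of this cycle, assuming the gradient-based growth criterion; the function and argument names are illustrative rather than taken from the paper's code.

```python
import torch

def sft_prune_and_regrow(indices, values, values_start, grads_dense, k_replace):
    """One prune/regrow step of the iterative SFT cycle (schematic).

    indices:      positions of the currently active deltas (1-D, int64)
    values:       delta values after the latest update phase
    values_start: delta values at the start of that phase
    grads_dense:  gradient of the loss w.r.t. all (flattened) weights
    k_replace:    number of active positions to drop and regrow
    """
    # Prune: drop the deltas whose magnitude changed least during the update phase.
    change = values.abs() - values_start.abs()
    keep = torch.topk(change, k=indices.numel() - k_replace).indices
    indices, values = indices[keep], values.detach()[keep]

    # Regrow: activate the inactive positions with the largest gradient magnitude
    # (alternatively, momenta estimated by SM3 can serve as the growth score).
    score = grads_dense.abs().clone()
    score[indices] = float("-inf")                      # exclude already-active positions
    new_indices = torch.topk(score, k=k_replace).indices
    new_values = torch.zeros(k_replace, dtype=values.dtype, device=values.device)

    return torch.cat([indices, new_indices]), torch.cat([values, new_values])
```

In the full procedure, a step like this runs periodically between phases of ordinary gradient updates applied only to the active delta values.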
Experimental Validation
The efficiency of SFT is evaluated via instruction-tuning on standard dataset mixtures. Experiments show that SFT often outperforms established parameter-efficient fine-tuning methods such as Low-Rank Adaptation (LoRA) in downstream quality while remaining comparable in runtime. Compatibility with quantization and with memory-efficient optimizers is also demonstrated, scaling SFT to LLM sizes previously deemed impractical due to memory constraints.
Quantization and Efficiency Results
The combination of quantization with SFT, denoted "qSFT", underscores the method's adaptability to heavily memory-constrained environments. qSFT remains competitive with 4-bit quantized LLMs fine-tuned using other techniques. The approach also combines well with activation checkpointing, offering guidance on which techniques to prioritize when optimizing the memory efficiency of LLM fine-tuning.
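As a rough illustration of the qSFT idea, the base weights can be stored quantized and frozen while only a small sparse delta is kept in full precision and trained. The sketch below is not the paper's implementation: it uses a simple int8 scheme in plain PyTorch instead of 4-bit quantization, and the class and parameter names are made up.

```python
import torch

class QuantizedSparseLinear(torch.nn.Module):
    """Frozen int8-quantized base weight plus a trainable sparse float delta (illustrative)."""

    def __init__(self, weight: torch.Tensor, num_deltas: int):
        super().__init__()
        # Simple per-row symmetric int8 quantization of the frozen base weight.
        scale = (weight.abs().amax(dim=1, keepdim=True) / 127.0).clamp_min(1e-8)
        self.register_buffer("w_int8", torch.round(weight / scale).to(torch.int8))
        self.register_buffer("scale", scale)
        # Sparse delta: fixed positions in this sketch, values are the only trainable tensor.
        self.register_buffer("idx", torch.randperm(weight.numel())[:num_deltas])
        self.values = torch.nn.Parameter(torch.zeros(num_deltas))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        w = self.w_int8.float() * self.scale                 # dequantize the frozen base weight
        delta = torch.zeros(w.numel(), device=w.device).scatter(0, self.idx, self.values)
        return x @ (w + delta.view_as(w)).t()                # dense matmul, as in SFT

# Usage: only `layer.values` (and its optimizer state) is trained in full precision.
layer = QuantizedSparseLinear(torch.randn(4096, 4096), num_deltas=16_384)
out = layer(torch.randn(2, 4096))
```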
This research positions SFT as a leading strategy for both parameter- and memory-efficient LLM adaptation. Further exploration could refine the growth criteria and extend SFT to all model parameters, including embedding layers, continuing the field's ongoing refinement of LLM fine-tuning methodologies.