
DELLA-Merging: Reducing Interference in Model Merging through Magnitude-Based Sampling (2406.11617v1)

Published 17 Jun 2024 in cs.CL

Abstract: With the proliferation of domain-specific models, model merging has emerged as a set of techniques that combine the capabilities of multiple models into one that can multitask without the cost of additional training. In this paper, we propose a new model merging technique, Drop and rEscaLe via sampLing with mAgnitude (DELLA-Merging), that employs a novel pruning technique, MAGPRUNE, which shows significant advantages over DARE and TIES. MAGPRUNE first ranks the parameters in order of their magnitude and assigns higher dropout probabilities (p) to parameters with lower ranks corresponding to lower magnitudes. To approximate the original embeddings, MAGPRUNE employs a rescaling operation on the parameters that survive the random dropping by 1/(1 - p). On three different expert models considered for merging (LM, Math, Code) and corresponding benchmark datasets (AlpacaEval, GSM8K, MBPP), DELLA shows an average improvement of 2.4 points over baseline methods employing delta parameter pruning (an improvement of 3.6 points over TIES, 1.2 points over DARE), and 11.1 points over the no-pruning baseline (TA). We release the source code at: https://github.com/declare-lab/della.

Citations (13)

Summary

  • The paper introduces Della-Merging, a novel method that reduces interference in model merging by leveraging magnitude-based pruning and a rescaling operation.
  • It employs a systematic three-step process—Drop, Elect, and Fuse—to prune lower magnitude parameters and integrate selected parameters effectively.
  • Empirical evaluations show an average improvement of 2.4 points over state-of-the-art methods, underscoring its efficiency in resource-constrained settings.

Della-Merging: Reducing Interference in Model Merging through Magnitude-Based Sampling

The paper "DELLA-Merging: Reducing Interference in Model Merging through Magnitude-Based Sampling" addresses the emerging challenge of model merging techniques, which combine the capabilities of multiple models into a multitask model without incurring additional training costs. The authors introduce a novel technique named Della-Merging, which leverages a pruning strategy termed MagPrune. This method prioritizes parameters based on their magnitude, assigning higher dropout probabilities to those with lower magnitudes. The pruning is followed by a rescaling operation to approximate the original embeddings effectively.

Summary of Approach

The essence of Della-Merging lies in its three-step process: Drop, Elect, and Fuse.

  1. Drop: This step is critical to reduce interference among model parameters by employing MagPrune. Parameters are ranked by magnitude, with higher dropout probabilities assigned to lower-ranked (lower magnitude) parameters. Subsequently, parameters that withstand the dropout are rescaled using a factor of $1/(1-p)$.
  2. Elect: In this phase, the method further mitigates interference by electing delta parameters through a sign-based selection mechanism that minimizes directional conflicts.
  3. Fuse: This final stage consolidates the elected delta parameters through element-wise addition, integrating them into a single merged model. A code sketch covering all three steps follows this list.
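
The sketch below strings the three steps together for a single parameter tensor, reusing the illustrative magprune function from above. The helper name della_merge, the scaling coefficient lam, and the majority-sign election rule (summed magnitude per sign) are assumptions made for illustration rather than the paper's reference code.

```python
from typing import List
import torch

def della_merge(base: torch.Tensor,
                deltas: List[torch.Tensor],
                p: float = 0.5,
                lam: float = 1.0) -> torch.Tensor:
    """Illustrative Drop -> Elect -> Fuse over one parameter tensor."""
    # Drop: magnitude-based stochastic pruning with rescaling (magprune above).
    dropped = [magprune(d, p=p) for d in deltas]

    # Elect: pick the dominant sign per coordinate by summed magnitude,
    # then keep only delta entries that agree with it.
    stacked = torch.stack(dropped)                       # (num_models, *shape)
    pos_mass = stacked.clamp(min=0).sum(dim=0)
    neg_mass = (-stacked).clamp(min=0).sum(dim=0)
    elected_sign = torch.where(pos_mass >= neg_mass,
                               torch.ones_like(pos_mass),
                               -torch.ones_like(pos_mass))
    agree = (torch.sign(stacked) == elected_sign).to(stacked.dtype)

    # Fuse: element-wise sum of the agreeing deltas, added back to the base.
    fused = (stacked * agree).sum(dim=0)
    return base + lam * fused
```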

Results and Performance

The empirical evaluation is comprehensive, involving three expert models (LM, Math, Code) and corresponding benchmark datasets (AlpacaEval, GSM8K, MBPP). Della demonstrates an average improvement of 2.4 points over existing merging methods that use delta-parameter pruning (3.6 points over Ties and 1.2 points over Dare), and surpasses the no-pruning baseline (TA) by 11.1 points.

Implications and Future Directions

The Della-Merging technique represents a significant advancement in the field of model merging by effectively utilizing magnitude-based pruning to reduce parameter interference. Its success in outperforming state-of-the-art approaches such as Dare and Ties suggests profound implications for optimizing model merging strategies, particularly in settings where conserving computational resources and storage is vital.

The methodology's adaptability is noteworthy: Della is structured not only to encompass current methods (NoDrop, Dare, Ties) but also to offer new configurations via its versatile parameter settings. This adaptability could potentially be extended to heterogeneous model backbones, presenting a promising avenue for future exploration.
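
One way to picture this flexibility is as a small configuration space over the drop and elect steps. The sketch below is hypothetical: the field names and the correspondences to the baselines are a reading of the summary above, not the paper's exact ablation settings.

```python
from dataclasses import dataclass

@dataclass
class MergeConfig:
    """Hypothetical merge configuration; field names are illustrative assumptions."""
    drop: str = "magprune"    # "none" | "random" | "topk_magnitude" | "magprune"
    elect: bool = True        # apply sign-based election before fusing
    drop_rate: float = 0.5
    scale: float = 1.0

# Illustrative settings that roughly correspond to the methods discussed:
TA_LIKE    = MergeConfig(drop="none",           elect=False)  # no-pruning baseline
DARE_LIKE  = MergeConfig(drop="random",         elect=False)  # uniform random drop + rescale
TIES_LIKE  = MergeConfig(drop="topk_magnitude", elect=True)   # deterministic magnitude pruning
DELLA_LIKE = MergeConfig(drop="magprune",       elect=True)   # magnitude-sampled drop + elect
```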

From a theoretical standpoint, the approach underscores the utility of magnitude as a metric for parameter pruning, prompting further research into how parameter importance can be quantitatively assessed and leveraged across various tasks.

Conclusion

Overall, Della-Merging, with its introduction of MagPrune, is a robust addition to the toolkit of model merging strategies. It adeptly preserves individual model performance post-merging, making it a valuable contribution to the resource-efficient deployment of multitask models. Continued exploration in this direction could yield even more refined techniques for handling parameter interference and model integration, particularly as the scale and complexity of models continue to grow.
