Model merging with SVD to tie the KnOTS (2410.19735v1)

Published 25 Oct 2024 in cs.CV

Abstract: Recent model merging methods demonstrate that the parameters of fully-finetuned models specializing in distinct tasks can be combined into one model capable of solving all tasks without retraining. Yet, this success does not transfer well when merging LoRA finetuned models. We study this phenomenon and observe that the weights of LoRA finetuned models showcase a lower degree of alignment compared to their fully-finetuned counterparts. We hypothesize that improving this alignment is key to obtaining better LoRA model merges, and propose KnOTS to address this problem. KnOTS uses the SVD to jointly transform the weights of different LoRA models into an aligned space, where existing merging methods can be applied. In addition, we introduce a new benchmark that explicitly evaluates whether merged models are general models. Notably, KnOTS consistently improves LoRA merging by up to 4.3% across several vision and language benchmarks, including our new setting. We release our code at: https://github.com/gstoica27/KnOTS.


Summary

  • The paper presents KnOTS, a novel technique leveraging SVD to transform LoRA weight updates for improved model merging.
  • It enhances merging accuracy by aligning models in a shared representation space, achieving up to 4.3% improvement on key benchmarks.
  • Experiments, including a new joint-task evaluation, demonstrate KnOTS' robustness and scalability across diverse datasets and architectures.

Model Merging with SVD: Enhancing LoRA Alignment

Introduction

The concept of model merging has gained traction as a technique to consolidate the abilities of multiple task-specific models into a single multitask model. While effective for fully-finetuned models, these methods often falter when applied to Low-Rank Adaptation (LoRA) finetuned models due to poor parameter alignment. The paper introduces "KnOTS" (Knowledge Orientation Through SVD), leveraging Singular Value Decomposition (SVD) to align the weights of LoRA models, facilitating improved model merging.
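Since KnOTS operates on LoRA updates, it helps to recall the standard LoRA parameterization. The sketch below follows the common formulation (the `alpha` scaling and the shapes are generic LoRA conventions, not details taken from this paper): each task's finetuning is captured by a low-rank update ΔW = (α/r)·BA added to the frozen base weight, and merging methods act on these updates.

```python
# Minimal sketch of a standard LoRA update (generic convention, not
# paper-specific): the finetuned layer weight is W + (alpha / r) * B @ A.
import torch

def lora_update(base_weight, A, B, alpha=16.0):
    """base_weight: (out_dim, in_dim); A: (r, in_dim); B: (out_dim, r)."""
    r = A.shape[0]
    delta_w = (alpha / r) * (B @ A)   # low-rank, task-specific update
    return base_weight + delta_w, delta_w
```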

Key Contributions

  1. LoRA Alignment with SVD: The authors identify that LoRA finetuned models exhibit lower alignment than fully-finetuned models, as evidenced by diminished pairwise centered kernel alignment (CKA) scores. KnOTS uses SVD to transform the task-specific updates into a shared representation space, improving the alignment of these models (a minimal CKA sketch follows this list).
  2. Improved Merging Performance: KnOTS augments existing merging algorithms by aligning models in a common space, yielding up to 4.3% improvement in accuracy across vision and language benchmarks.
  3. Joint-Task Evaluation: A novel benchmark that assesses a merged model's generality by evaluating on the union of inputs and labels across multiple datasets, advancing the understanding of merging efficacy in more comprehensive settings.
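For reference, the pairwise CKA score mentioned in the first contribution can be computed, in its common linear form, as below. This is a generic sketch of linear CKA; which matrices the paper actually compares (weight updates or intermediate representations) is not specified in this summary and is left as an assumption.

```python
# Minimal sketch of linear CKA between two representations of the same inputs.
import torch

def linear_cka(x, y):
    """x: (n, d1), y: (n, d2) -- two feature matrices for the same n examples."""
    x = x - x.mean(dim=0, keepdim=True)        # center each feature
    y = y - y.mean(dim=0, keepdim=True)
    cross = torch.linalg.norm(y.T @ x) ** 2    # ||Y^T X||_F^2
    denom = torch.linalg.norm(x.T @ x) * torch.linalg.norm(y.T @ y)
    return (cross / denom).item()
```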

Methodology

KnOTS improves model merging by applying SVD layer-wise to the LoRA weight updates. The per-task updates are jointly decomposed so that the shared UΣ factors define a common aligned space while the task-specific V factors carry each task's contribution. Existing merging methods such as Task Arithmetic (TA) and TIES are then applied to these aligned parameters to produce a single merged model (a minimal sketch follows).
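The sketch below illustrates this pipeline for a single layer. It is a simplified rendering under explicit assumptions, not the authors' exact implementation: the per-task updates are concatenated column-wise before a joint SVD, and the aligned V blocks are merged with a simple TIES-style trim, sign-election, and averaging step; the function name and the `keep_frac` threshold are illustrative.

```python
import torch

def knots_style_merge(delta_ws, keep_frac=0.2):
    """delta_ws: list of T per-task updates, each of shape (out_dim, in_dim)."""
    T = len(delta_ws)
    in_dim = delta_ws[0].shape[1]

    # Joint SVD over the horizontally concatenated updates:
    # [dW_1, ..., dW_T] = U @ diag(S) @ Vh, where U @ diag(S) is shared and
    # Vh splits into one aligned block per task.
    concat = torch.cat(delta_ws, dim=1)                  # (out_dim, T * in_dim)
    U, S, Vh = torch.linalg.svd(concat, full_matrices=False)
    v_blocks = [Vh[:, t * in_dim:(t + 1) * in_dim] for t in range(T)]

    # TIES-like merge of the aligned task-specific blocks: trim small entries,
    # resolve sign conflicts by majority, then average the surviving values.
    stacked = torch.stack(v_blocks)                      # (T, rank, in_dim)
    k = max(1, int((1 - keep_frac) * stacked[0].numel()))
    thresh = stacked.abs().flatten(1).kthvalue(k, dim=1).values.view(T, 1, 1)
    trimmed = torch.where(stacked.abs() >= thresh, stacked, torch.zeros_like(stacked))
    sign = torch.sign(trimmed.sum(dim=0))
    keep = (torch.sign(trimmed) == sign) & (trimmed != 0)
    merged_v = (trimmed * keep).sum(dim=0) / keep.sum(dim=0).clamp(min=1)

    # Map the merged, aligned block back to the original weight space.
    return U @ torch.diag(S) @ merged_v
```

In this sketch, the returned merged update would then be added back onto the layer's shared pretrained weight to form the merged model.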

Experimental Results

  1. Vision and Language Tasks: KnOTS consistently outperformed traditional merging methods across various datasets, achieving higher normalized accuracy. For instance, merging eight models finetuned with LoRA on distinct image classification tasks using KnOTS-TIES surpassed baseline accuracy by 4.3%.
  2. Scalability with Model Size: On larger models like ViT-L/14 and LLaMA3-8B, KnOTS maintained or enhanced effectiveness, suggesting robustness and scalability across different model architectures and sizes.
  3. Positive Impact on Joint-Task Performance: In the newly introduced joint-task setting, KnOTS achieved superior Hits@k scores, demonstrating its ability to form general models that operate beyond task-specific label sets (a minimal Hits@k sketch follows this list).
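Below is a minimal sketch of a Hits@k computation for the joint-task setting, under the assumption that each example is scored against the union of labels from all tasks; names and shapes are illustrative.

```python
import torch

def hits_at_k(scores, targets, k=1):
    """scores: (n, num_joint_labels) logits over the unioned label set;
    targets: (n,) indices into that same unioned label set."""
    topk = scores.topk(k, dim=1).indices                 # (n, k)
    hits = (topk == targets.unsqueeze(1)).any(dim=1)     # true label in top-k?
    return hits.float().mean().item()
```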

Implications and Future Work

KnOTS signifies a methodological advancement in efficiently merging LoRA finetuned models without the need for additional finetuning. This contribution holds promise not only for improving model merging practices but also for encouraging new developments in creating more generalizable AI systems. Future research could explore extending KnOTS to other parameter-efficient finetuning methods and further refine the joint-task evaluation framework for broader applicability.

In conclusion, the KnOTS approach provides a substantial enhancement to the model merging landscape by effectively addressing the alignment issues inherent in LoRA models. Its adaptability across tasks and architectures underscores its potential impact on the development of more integrated AI systems.
