Model merging with SVD to tie the KnOTS (2410.19735v1)
Abstract: Recent model merging methods demonstrate that the parameters of fully-finetuned models specializing in distinct tasks can be combined into one model capable of solving all tasks without retraining. Yet, this success does not transfer well when merging LoRA finetuned models. We study this phenomenon and observe that the weights of LoRA finetuned models showcase a lower degree of alignment compared to their fully-finetuned counterparts. We hypothesize that improving this alignment is key to obtaining better LoRA model merges, and propose KnOTS to address this problem. KnOTS uses the SVD to jointly transform the weights of different LoRA models into an aligned space, where existing merging methods can be applied. In addition, we introduce a new benchmark that explicitly evaluates whether merged models are general models. Notably, KnOTS consistently improves LoRA merging by up to 4.3% across several vision and language benchmarks, including our new setting. We release our code at: https://github.com/gstoica27/KnOTS.
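To make the abstract's description concrete, here is a minimal sketch of the general idea in PyTorch: reconstruct each task's LoRA update, take one joint SVD over their row-wise concatenation so every task is expressed in the same shared right singular basis, merge the task-specific factors in that aligned space, and map the result back to weight space. The function name `knots_style_merge`, the row-wise stacking, and the element-wise averaging used as the merge step are illustrative assumptions rather than the exact KnOTS procedure; the linked repository contains the authors' implementation.

```python
import torch


def knots_style_merge(lora_updates, merge_fn=None):
    """Hypothetical sketch: align several LoRA task updates with a joint SVD,
    then merge them in the shared (aligned) space.

    lora_updates: list of (B_i, A_i) pairs, each giving a low-rank update
                  delta_W_i = B_i @ A_i of shape (out_dim, in_dim).
    merge_fn:     how to combine the aligned task-specific factors;
                  defaults to a simple element-wise average (any existing
                  merging method could be substituted here).
    """
    # Reconstruct the dense per-task updates from their LoRA factors.
    deltas = [B @ A for B, A in lora_updates]            # each (out_dim, in_dim)

    # Jointly decompose: stack the task updates and take one SVD, so all
    # tasks share the same right singular basis (the aligned space).
    stacked = torch.cat(deltas, dim=0)                    # (k * out_dim, in_dim)
    U, S, Vh = torch.linalg.svd(stacked, full_matrices=False)

    # Split U back into per-task blocks, so delta_W_i = U_i @ diag(S) @ Vh.
    out_dim = deltas[0].shape[0]
    U_tasks = list(torch.split(U, out_dim, dim=0))

    # Merge the task-specific factors in the aligned coordinates.
    if merge_fn is None:
        merge_fn = lambda mats: torch.stack(mats).mean(dim=0)
    U_merged = merge_fn(U_tasks)

    # Map back to weight space: this is the merged update to add to the
    # pretrained weight matrix.
    return U_merged @ torch.diag(S) @ Vh


# Toy usage with random LoRA factors (rank 4, three "tasks").
torch.manual_seed(0)
loras = [(torch.randn(64, 4), torch.randn(4, 128)) for _ in range(3)]
merged_delta = knots_style_merge(loras)
print(merged_delta.shape)  # torch.Size([64, 128])
```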