Training-Free Pretrained Model Merging (2403.01753v3)

Published 4 Mar 2024 in cs.CV

Abstract: Recently, model merging techniques have surfaced as a solution for combining multiple single-talent models into a single multi-talent model. However, previous endeavors in this field have either required additional training or fine-tuning, or assumed that the models share the same pre-trained initialization. In this work, we identify a common drawback in prior works: the inconsistency of unit similarity between the weight space and the activation space. To address this inconsistency, we propose an innovative model merging framework, coined merging under dual-space constraints (MuDSC). Specifically, instead of solely maximizing the objective of a single space, we advocate exploring permutation matrices situated in a region of unified high similarity in the dual space, achieved through a linear combination of the activation and weight similarity matrices. To enhance usability, we have also incorporated adaptations for group structures, including Multi-Head Attention and Group Normalization. Comprehensive experimental comparisons demonstrate that MuDSC significantly boosts the performance of merged models across various task combinations and architectures. Furthermore, visualization of the merged model within the multi-task loss landscape reveals that MuDSC enables the merged model to reside in the overlapping region, with a uniformly lower loss for each task. Our code is publicly available at https://github.com/zju-vipa/training_free_model_merging.
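
The dual-space matching described in the abstract can be made concrete with a short sketch. The snippet below is a minimal illustration, not the authors' released implementation: it assumes cosine similarity, a single linear layer, and a Hungarian solver for the permutation search, and the names dual_space_permutation, merge_layer, and the trade-off parameter lam are illustrative choices rather than the repository's API.

    import numpy as np
    from scipy.optimize import linear_sum_assignment

    def cosine_sim_matrix(A, B):
        """Pairwise cosine similarity between rows of A and rows of B."""
        A = A / (np.linalg.norm(A, axis=1, keepdims=True) + 1e-8)
        B = B / (np.linalg.norm(B, axis=1, keepdims=True) + 1e-8)
        return A @ B.T

    def dual_space_permutation(W_a, W_b, acts_a, acts_b, lam=0.5):
        """Match units of model B to units of model A by maximizing a
        linear combination of weight-space and activation-space similarity.

        W_a, W_b : (units, in_features) weight matrices of one layer
        acts_a/b : (samples, units) activations of that layer on shared inputs
        lam      : trade-off between activation and weight similarity
        """
        S_weight = cosine_sim_matrix(W_a, W_b)         # unit similarity in weight space
        S_act = cosine_sim_matrix(acts_a.T, acts_b.T)  # unit similarity in activation space
        S = lam * S_act + (1.0 - lam) * S_weight       # dual-space objective
        # Hungarian algorithm: negate so the assignment maximizes total similarity.
        _, perm = linear_sum_assignment(-S)
        return perm                                    # perm[i] = unit of B matched to unit i of A

    def merge_layer(W_a, W_b, perm):
        """Permute B's units to align with A, then average the weights."""
        return 0.5 * (W_a + W_b[perm])

In a full multi-layer merge, the permutation found for one layer must also be applied to the input dimension of the following layer so that each network's function is preserved, and the abstract notes that the released code further adapts this matching to grouped units such as Multi-Head Attention and Group Normalization.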
