Concrete Subspace Learning based Interference Elimination for Multi-task Model Fusion

Published 11 Dec 2023 in cs.LG (arXiv:2312.06173v1)

Abstract: Merging models fine-tuned from a common, extensively pre-trained large model but specialized for different tasks has been demonstrated as a cheap and scalable strategy to construct a multi-task model that performs well across diverse tasks. Recent research, exemplified by task arithmetic, highlights that this multi-task model can be derived through arithmetic operations on task vectors. Nevertheless, current merging techniques frequently resolve potential conflicts among parameters from task-specific models by evaluating individual attributes, such as a parameter's magnitude or sign, overlooking their collective impact on the overall functionality of the model. In this work, we propose the CONtinuous relaxation of disCRETE (Concrete) subspace learning method, which identifies a common low-dimensional subspace and utilizes its shared information to tackle the interference problem without sacrificing much performance. Specifically, we model the problem as a bi-level optimization problem and introduce a meta-learning framework to find the Concrete subspace mask through gradient-based techniques. At the upper level, we learn a shared Concrete mask that identifies the subspace, while at the lower level, model merging is performed to maximize the performance of the merged model. We conduct extensive experiments in both the vision and language domains, and the results demonstrate the effectiveness of our method. The code is available at https://github.com/tanganke/subspace_fusion
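To make the mechanism concrete, below is a minimal PyTorch sketch of the idea as described in the abstract: a binary mask over the flattened parameter space is relaxed via the binary Concrete (Gumbel-sigmoid) distribution so it can be learned with gradients, and the merged model is formed by task arithmetic restricted to the masked subspace. This is an illustrative sketch, not the paper's implementation: the toy linear model, the random "task vectors", and the entropy objective at the upper level (in the spirit of test-time adaptation) are all assumptions made only so the example runs end to end.

```python
import torch
import torch.nn.functional as F

def sample_concrete_mask(logits: torch.Tensor, temperature: float = 0.5) -> torch.Tensor:
    """Sample a relaxed binary mask from the binary Concrete (Gumbel-sigmoid) distribution.

    Entries lie in (0, 1) and sharpen toward {0, 1} as the temperature decreases,
    so gradients can flow back into the mask logits.
    """
    u = torch.rand_like(logits).clamp(1e-6, 1 - 1e-6)
    noise = torch.log(u) - torch.log1p(-u)  # logistic noise (difference of two Gumbels)
    return torch.sigmoid((logits + noise) / temperature)

def merge_with_mask(pretrained, task_vectors, mask, scaling=0.3):
    """Task-arithmetic merge restricted to the masked subspace:
    theta = theta_pre + scaling * sum_t (mask * tau_t), with tau_t = theta_t - theta_pre."""
    merged = pretrained.clone()
    for tau in task_vectors:
        merged = merged + scaling * mask * tau
    return merged

# --- Toy stand-ins (assumptions): a flat linear classifier and random task vectors ---
num_classes, feat_dim = 10, 512
pretrained = torch.randn(num_classes * feat_dim) * 0.01
task_vectors = [torch.randn_like(pretrained) * 0.01 for _ in range(3)]

def model_forward(theta, x):
    # Interpret the flat parameter vector as a linear classifier's weight matrix.
    return x @ theta.view(num_classes, feat_dim).t()

adaptation_loader = [torch.randn(32, feat_dim) for _ in range(100)]  # unlabeled toy batches

# --- Bi-level loop (sketch) ---
# Upper level: update the shared mask logits by gradient descent on an unsupervised
# objective (here, prediction entropy -- an assumed choice, not the paper's exact loss).
# Lower level: merge the models with the current mask sample.
mask_logits = torch.zeros_like(pretrained, requires_grad=True)
optimizer = torch.optim.Adam([mask_logits], lr=1e-3)

for batch in adaptation_loader:
    mask = sample_concrete_mask(mask_logits)                 # differentiable mask sample
    theta = merge_with_mask(pretrained, task_vectors, mask)  # lower level: merging
    outputs = model_forward(theta, batch)
    probs = F.softmax(outputs, dim=-1)
    entropy = -(probs * torch.log(probs + 1e-8)).sum(dim=-1).mean()
    optimizer.zero_grad()
    entropy.backward()                                       # upper level: mask update
    optimizer.step()
```

Because the Gumbel-sigmoid sample is a differentiable function of `mask_logits`, the upper-level gradient flows through the merged parameters, which is what lets the shared subspace mask be meta-learned rather than chosen by per-parameter heuristics such as magnitude or sign.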
