HM3: Hierarchical Multi-Objective Model Merging for Pretrained Models (2409.18893v1)

Published 27 Sep 2024 in cs.LG

Abstract: Model merging is a technique that combines multiple large pretrained models into a single model with enhanced performance and broader task adaptability. It has gained popularity in large pretrained model development due to its ability to bypass the need for original training data and further training processes. However, most existing model merging approaches focus solely on exploring the parameter space, merging models with identical architectures. Merging within the architecture space, despite its potential, remains in its early stages due to the vast search space and the challenges of layer compatibility. This paper marks a significant advance toward more flexible and comprehensive model merging techniques by modeling the architecture-space merging process as a reinforcement learning task. We train policy and value networks using offline sampling of weight vectors, which are then employed for the online optimization of merging strategies. Moreover, a multi-objective optimization paradigm is introduced to accommodate users' diverse task preferences, learning the Pareto front of optimal models to offer customized merging suggestions. Experimental results across multiple tasks, including text translation, mathematical reasoning, and code generation, validate the effectiveness and superiority of the proposed framework in model merging. The code will be made publicly available after the review process.

Summary

  • The paper introduces a hierarchical multi-objective merging approach that optimizes both parameters and architectures using reinforcement learning.
  • It employs weight re-scaling and task arithmetic to resolve parameter conflicts and enhance model robustness across diverse tasks.
  • Experimental results demonstrate that HM3 outperforms traditional methods in translation accuracy, reasoning, and code generation efficiency.

Hierarchical Multi-Objective Model Merging for Pretrained Models

Introduction

The paper "HM3: Hierarchical Multi-Objective Model Merging for Pretrained Models" presents a novel approach to model merging by introducing a hierarchical framework that explores both parameter and architecture spaces. Traditional methods focus primarily on the parameter space and merging models with identical architectures, limiting the flexibility and potential of model merging. This paper models the architecture-space merging process as a reinforcement learning (RL) task, enabling more diverse and efficient model combinations.

Methodology

Parameter-Level Optimization

In parameter-level optimization, HM3 assigns weight vectors to pretrained models tailored for different tasks and leverages established methods such as DARE and TIES-Merging to optimize parameter configurations. The merged model's robustness is enhanced by applying a drop-and-rescale mechanism to the delta parameters. This hierarchical approach conducts multi-objective optimization efficiently, preserving key features from multiple models via task arithmetic and resolving parameter conflicts after merging.
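The drop-and-rescale step can be sketched as follows. This is an illustrative implementation, not the paper's code: it treats each model as a flat NumPy array, randomly drops a fraction of each task vector (the delta from the base model), rescales the survivors, and combines the results via weighted task arithmetic.

```python
import numpy as np

def dare_merge(base, finetuned_models, drop_rate=0.9, weights=None, seed=0):
    """Sketch of DARE-style merging: drop a random fraction of each
    model's delta parameters, rescale the survivors, then combine the
    deltas into the base model via weighted task arithmetic."""
    if weights is None:
        weights = [1.0 / len(finetuned_models)] * len(finetuned_models)
    rng = np.random.default_rng(seed)
    merged_delta = np.zeros_like(base)
    for w, model in zip(weights, finetuned_models):
        delta = model - base                          # task vector
        mask = rng.random(delta.shape) >= drop_rate   # keep ~(1 - drop_rate)
        delta = delta * mask / (1.0 - drop_rate)      # rescale survivors
        merged_delta += w * delta                     # task arithmetic
    return base + merged_delta
```

The rescaling by `1 / (1 - drop_rate)` keeps the expected magnitude of each task vector unchanged, which is what lets a high drop rate thin out conflicting parameters without collapsing task performance.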

Architecture-Level Optimization

Architecture-level optimization employs reinforcement learning to explore effective model architectures over a layer-granularity search space, restructuring the model across architectural configurations to find an optimal merge. The search is modeled as a Markov decision process (MDP), and proximal policy optimization (PPO) is used to explore candidate architectural transformations. The approach also accommodates users' diverse task preferences through a multi-objective framework that identifies Pareto-efficient model solutions.
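A minimal sketch of what a layer-granularity MDP could look like, under the assumption that at each step the policy chooses which candidate model supplies the next layer (the names `rollout`, `num_layers`, and `num_candidates` are illustrative, not from the paper):

```python
import random

def rollout(policy, num_layers, num_candidates):
    """Roll out one episode of an architecture-merging MDP.
    State: the sequence of layer choices made so far.
    Action: index of the candidate model providing the next layer."""
    state, trajectory = [], []
    for _ in range(num_layers):
        action = policy(state, num_candidates)  # e.g. sampled from a PPO policy net
        trajectory.append((tuple(state), action))
        state.append(action)
    return state, trajectory

def random_policy(state, num_candidates):
    # Placeholder for a learned policy network.
    return random.randrange(num_candidates)

arch, traj = rollout(random_policy, num_layers=4, num_candidates=3)
# `arch` is a per-layer assignment such as [2, 0, 1, 2]; in HM3 the reward
# would come from evaluating the merged model built from this assignment.
```

In the actual framework, the trajectory would feed a PPO update, with the policy and value networks pretrained via offline sampling of weight vectors as the paper describes.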

Experimental Results

HM3 was evaluated on a variety of tasks, including text translation, mathematical reasoning, and code generation. The results demonstrated significant improvements over traditional model merging methods: the merged models obtained with HM3 outperformed those of competing approaches in translation accuracy and in mathematical-reasoning and code-generation performance.

Significance and Implications

The hierarchical approach allows for model merging across both parameters and architectures, overcoming the limitations of traditional methods, which are constrained to identical architectures. This is particularly beneficial in the context of LLMs and diverse task requirements. The proposed multi-objective optimization framework provides scalability and customizability for integrated model solutions.
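The multi-objective side reduces to finding the non-dominated set of merged models. A minimal Pareto-front filter, assuming each candidate carries a tuple of per-task scores where higher is better (an illustrative helper, not the paper's implementation):

```python
def pareto_front(candidates):
    """Return the names of Pareto-optimal candidates.
    Each candidate is (name, scores); higher is better on every objective.
    A candidate is dominated if some other candidate is at least as good
    on all objectives and strictly better on at least one."""
    front = []
    for name, scores in candidates:
        dominated = any(
            all(o >= s for o, s in zip(other, scores))
            and any(o > s for o, s in zip(other, scores))
            for _, other in candidates
        )
        if not dominated:
            front.append(name)
    return front

models = [("A", (0.9, 0.2)), ("B", (0.5, 0.8)), ("C", (0.4, 0.1))]
pareto_front(models)  # C is dominated by both A and B, so only A and B remain
```

Given user task preferences, a merging suggestion can then be drawn from this front rather than from a single scalarized objective, which is what lets the framework offer customized trade-offs.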

Deploying HM3 in real-world scenarios could enhance the adaptability of AI systems across different domains and applications, making it possible to fine-tune models efficiently without the necessity for extensive training data or computational resources.

Future Directions

Expanding the HM3 methodology to accommodate larger and more complex models will be crucial for maintaining its efficacy as model scales increase. Likewise, integrating more sophisticated reinforcement learning strategies and fine-tuning the multi-objective optimization procedures can lead to even more robust model merging processes.

Conclusion

The hierarchical framework outlined in "HM3: Hierarchical Multi-Objective Model Merging for Pretrained Models" represents a significant contribution to the field of model merging, providing a scalable, efficient, and user-centered approach to combining pre-trained models. By addressing both parameter and architecture spaces, and considering multi-objective scenarios, HM3 sets a new benchmark for flexibility and performance in model merging practices within AI development.
