- The paper introduces CAT Merging, a novel training-free method for resolving conflicts and negative transfer when merging pre-trained task-specific models into a single multi-task model.
- CAT Merging employs a consensus-based approach using quadratic optimization to find an optimal task vector that minimizes the Frobenius norm of prediction differences without requiring fine-tuning.
- Empirical evaluation on ViT models for visual tasks shows CAT Merging achieves competitive performance, such as 82.7% average accuracy for ViT-B/32, often outperforming other training-free methods.
CAT Merging: A Training-Free Approach for Resolving Conflicts in Model Merging
The paper "CAT Merging: A Training-Free Approach for Resolving Conflicts in Model Merging" introduces a novel methodology aimed at addressing the challenge of negative transfer in multi-task learning by proposing a method for model merging without additional training. The primary focus of the paper lies in the mathematical formulation of merging several pre-trained task-specific models into a single multi-task model while minimizing conflict among tasks.
Objective and Methodology
The paper uses a consensus-based technique to handle task interference during merging: it seeks an optimal task vector that minimizes the Frobenius norm of the difference between model predictions before and after merging. Because the procedure requires no fine-tuning, it remains viable in settings with limited computational resources or no labeled data.
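As a sketch of this objective under assumed notation (the paper's exact symbols may differ), let $\theta_0$ denote the pre-trained backbone weights, $\tau_k$ the task vector of task $k$ out of $K$ tasks, and $\tau$ the merged task vector. The merging problem described above can then be written as:

$$
\tau^{*} = \arg\min_{\tau} \sum_{k=1}^{K} \left\| f(x_k;\, \theta_0 + \tau) - f(x_k;\, \theta_0 + \tau_k) \right\|_F^2,
$$

where $f(x_k;\,\cdot\,)$ denotes the model's predictions on task-$k$ inputs. Each term penalizes how far the merged model's predictions drift from those of the corresponding task-specific model.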
The central optimization iteratively computes the transformation needed to merge the individual task models, reducing the fusion discrepancy through quadratic optimization. The paper's proofs simplify this objective step by step, expanding the Frobenius norm and deriving the gradient used in the iterative solution.
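Below is a minimal sketch of such an iterative quadratic minimization, assuming (hypothetically) that each task's prediction difference is linear in the task vector, so the per-task discrepancy takes the form $\|X_k(\tau - \tau_k)\|_F^2$ for some feature matrix $X_k$. The variable names and the plain gradient-descent loop are illustrative assumptions, not the paper's exact algorithm:

```python
import numpy as np

# Hypothetical illustration: tau_k (per-task weight deltas), X_k (per-task
# feature matrices), and the step size lr are assumptions for this sketch.
# We minimize sum_k || X_k @ (tau - tau_k) ||_F^2 by gradient descent.

rng = np.random.default_rng(0)
d, h, n_tasks, n_samples = 16, 8, 3, 32

task_vectors = [rng.normal(size=(d, h)) for _ in range(n_tasks)]   # tau_k
features = [rng.normal(size=(n_samples, d)) for _ in range(n_tasks)]  # X_k

def loss_and_grad(tau):
    """Quadratic fusion discrepancy and its gradient w.r.t. the merged vector."""
    loss, grad = 0.0, np.zeros_like(tau)
    for X, tau_k in zip(features, task_vectors):
        diff = X @ (tau - tau_k)          # prediction difference for task k
        loss += np.sum(diff ** 2)         # squared Frobenius norm
        grad += 2.0 * X.T @ diff          # gradient of ||X (tau - tau_k)||_F^2
    return loss, grad

tau = np.mean(task_vectors, axis=0)       # initialize at the simple average
lr = 1e-3
for step in range(500):
    loss, grad = loss_and_grad(tau)
    tau -= lr * grad

print(f"fusion discrepancy after optimization: {loss:.4f}")
```

Because the objective is quadratic, the same minimizer is also available in closed form by solving $\big(\sum_k X_k^\top X_k\big)\tau = \sum_k X_k^\top X_k \tau_k$; the loop above merely makes the iterative derivation concrete.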
Results
The empirical evaluation demonstrates the efficacy of CAT Merging with Vision Transformer (ViT) models across multiple visual tasks. Results are comprehensively tabulated across several merging methodologies, notably comparing training-free methods such as Weight Averaging and Fisher Merging against test-time-training methods such as AdaMerging.
For ViT-B/32 models, CAT Merging achieves an average accuracy of 82.7%, outperforming several baselines even though it does not lead the per-dataset "#best" count. This competitive performance extends to ViT-L/14 models, where CAT Merging reaches an average accuracy of 90.5%. These results support the premise that the approach captures task commonalities and divergences efficiently without incurring the cost of further training.
Implications and Future Directions
The CAT Merging approach has substantial practical implications for integrating models trained on disparate tasks. Such capability is particularly relevant when deploying AI systems in production environments where retraining is impractical due to time or data constraints, and training-free methods align well with the real-time adaptability needs of edge computing applications.
Theoretically, the proposed method expands model-integration strategies beyond traditional multi-task learning, opening avenues for consensus-based model fusion in domains beyond computer vision.
Future research could extend this approach to sequential models and further improve its conflict-resolution mechanisms, broadening applicability to more complex task mixtures. Integrating domain-specific strategies could refine the accuracy and efficiency of training-free model fusion, and experiments with other transformer-based architectures, including on NLP tasks, could test how well the framework generalizes.
Overall, CAT Merging offers a distinctive advance in simplifying model merging while maintaining strong task performance, contributing substantively to the field of multi-task learning in AI.