
Model Fusion via Optimal Transport (1910.05653v6)

Published 12 Oct 2019 in cs.LG and stat.ML

Abstract: Combining different models is a widely used paradigm in machine learning applications. While the most common approach is to form an ensemble of models and average their individual predictions, this approach is often rendered infeasible by given resource constraints in terms of memory and computation, which grow linearly with the number of models. We present a layer-wise model fusion algorithm for neural networks that utilizes optimal transport to (soft-) align neurons across the models before averaging their associated parameters. We show that this can successfully yield "one-shot" knowledge transfer (i.e., without requiring any retraining) between neural networks trained on heterogeneous non-i.i.d. data. In both i.i.d. and non-i.i.d. settings, we illustrate that our approach significantly outperforms vanilla averaging, as well as how it can serve as an efficient replacement for the ensemble with moderate fine-tuning, for standard convolutional networks (like VGG11), residual networks (like ResNet18), and multi-layer perceptrons on CIFAR10, CIFAR100, and MNIST. Finally, our approach also provides a principled way to combine the parameters of neural networks with different widths, and we explore its application for model compression. The code is available at the following link, https://github.com/sidak/otfusion.

Citations (198)

Summary

  • The paper presents a novel layer-wise model fusion algorithm that uses optimal transport to align neurons across models and merge their parameters in one shot.
  • The method outperforms traditional averaging, achieving higher accuracy on both i.i.d. and non-i.i.d. datasets across various architectures.
  • It offers practical benefits for model compression and federated learning by reducing computational overhead while preserving performance.

Model Fusion via Optimal Transport: A Comprehensive Overview

This paper presents a novel approach for combining neural network models using a framework based on optimal transport (OT) theory. Traditionally, models are combined via ensemble methods, where the outputs of multiple models are averaged to improve prediction performance. However, ensembles incur memory and runtime costs that grow with the number of models. The authors propose an alternative, layer-wise model fusion, which merges the parameters of neural networks through optimal transport, providing an efficient way to combine models without requiring retraining.
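To make the mechanism concrete, below is a minimal sketch (not the authors' released code) of fusing a single fully connected layer: each neuron is represented by its vector of incoming weights, an optimal transport plan between the two sets of neurons is computed with the POT library, and model A's neurons are projected onto model B's before the parameters are averaged. The function name fuse_layer, the uniform neuron weights, and the use of exact OT (ot.emd) are illustrative assumptions.

```python
import numpy as np
import ot  # POT: Python Optimal Transport (pip install pot)

def fuse_layer(W_a, W_b):
    """Fuse one fully connected layer from two models.

    W_a: (n_a, d) weight matrix of model A, one row per neuron.
    W_b: (n_b, d) weight matrix of model B; n_a and n_b may differ.
    """
    n_a, n_b = W_a.shape[0], W_b.shape[0]
    # Uniform mass over each model's neurons.
    mu = np.full(n_a, 1.0 / n_a)
    nu = np.full(n_b, 1.0 / n_b)
    # Ground cost: squared Euclidean distance between incoming-weight vectors.
    cost = ot.dist(W_a, W_b)
    # Exact transport plan; ot.sinkhorn(mu, nu, cost, reg) gives a softer alignment.
    T = ot.emd(mu, nu, cost)            # shape (n_a, n_b)
    # Barycentric projection: express A's neurons in B's neuron ordering.
    W_a_aligned = n_b * T.T @ W_a       # shape (n_b, d)
    # Average the aligned parameters.
    return 0.5 * (W_a_aligned + W_b)
```

Because the transport plan may be rectangular, the same construction applies when the two layers contain different numbers of neurons, which is what makes the compression use case discussed below possible.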

Key Contributions

  1. Layer-wise Fusion Using Optimal Transport: The approach leverages optimal transport to align the neurons of corresponding layers across models, enabling their weights to be fused. By computing a soft correspondence between neurons of different models, the method aligns the neurons and averages the associated parameters to form a single model, effectively transferring knowledge in a one-shot manner (see the layer-wise sketch following this list).
  2. Performance Across Disparate Data Distributions: The method is capable of fusing neural networks trained on both i.i.d. and non-i.i.d. data distributions, significantly outperforming vanilla averaging techniques. This indicates robustness even when the models were exposed to different training conditions or data characteristics.
  3. Applications in Model Compression and Federated Learning: The fusion method is particularly beneficial in scenarios where models differ in architecture, such as varying model widths, demonstrating usability in model compression tasks. It shows potential in federated learning, where direct data exchange is restricted due to privacy concerns, facilitating model exchange to reduce communication overhead.
  4. Experimental Results and Practical Implications: The fusion technique was evaluated on standard datasets such as MNIST, CIFAR10, and CIFAR100, showing improvements in tasks including structured pruning and model adaptation. Notably, the method delivered significant accuracy gains in scenarios requiring model compression and adaptation, such as preserving the performance of pruned networks.
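The layer-wise character of the algorithm is what allows the alignment to propagate through the network: the transport plan found for one layer is used to re-index the incoming weights of the next layer before that layer's own alignment is computed. The sketch below extends the single-layer example above to a plain multi-layer perceptron; it is a simplified reading of the method (biases and the special handling of the final classification layer are omitted), not a faithful reimplementation.

```python
import numpy as np
import ot  # POT: Python Optimal Transport

def ot_fuse_mlp(weights_a, weights_b):
    """Fuse two MLPs given as lists of (n_out, n_in) weight matrices."""
    fused, T_prev = [], None
    for W_a, W_b in zip(weights_a, weights_b):
        # 1) Re-express A's incoming weights in B's input-neuron ordering,
        #    using the transport plan found for the previous layer.
        if T_prev is not None:
            W_a = (W_a @ T_prev) * T_prev.shape[1]
        # 2) Transport plan between this layer's neurons (one row per neuron).
        n_a, n_b = W_a.shape[0], W_b.shape[0]
        mu = np.full(n_a, 1.0 / n_a)
        nu = np.full(n_b, 1.0 / n_b)
        T = ot.emd(mu, nu, ot.dist(W_a, W_b))
        # 3) Align A's neurons to B's and average the parameters.
        fused.append(0.5 * (n_b * T.T @ W_a + W_b))
        T_prev = T
    return fused
```

Since only the weight lists are required, the same loop can in principle merge networks of different hidden widths, e.g. to fold a wider model into a narrower one as in the compression setting.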

Strong Numerical Results and Claims

  • In empirical studies, the method consistently showed enhanced performance compared to simpler averaging techniques. For instance, the fused models obtained higher test accuracies across various network architectures like VGG11 and ResNet18, thereby providing a more efficient substitute for ensemble methods.
  • The paper presents a compelling case for structured pruning, where pruned models regain performance through fusion with the original unpruned models, achieving up to 90% sparsity levels with minimal loss in accuracy.

Implications and Future Directions

The introduction of optimal transport for parameter alignment and fusion in neural networks presents a transformative advancement in how models can be efficiently combined, especially in resource-constrained environments. This paradigm shift from data exchange to model exchange could spur advancements in privacy-preserving technologies such as federated learning.

Furthermore, OT fusion could be extended to more complex architectures and tasks, including scenarios such as continual learning and generative models. Future work could focus on improving the scalability of the approach and further reducing computational overhead, particularly when fusing models with different depths or topologies.

In conclusion, the proposed model fusion via optimal transport demonstrates a promising direction for enhancing machine learning models' efficiency and adaptability while overcoming the limitations imposed by traditional ensemble methods. Its applications in diverse fields highlight the versatile nature and potential for future impact in both theoretical and applied AI research arenas.
