FedMD: Heterogenous Federated Learning via Model Distillation (1910.03581v1)

Published 8 Oct 2019 in cs.LG and stat.ML

Abstract: Federated learning enables the creation of a powerful centralized model without compromising data privacy of multiple participants. While successful, it does not incorporate the case where each participant independently designs its own model. Due to intellectual property concerns and heterogeneous nature of tasks and data, this is a widespread requirement in applications of federated learning to areas such as health care and AI as a service. In this work, we use transfer learning and knowledge distillation to develop a universal framework that enables federated learning when each agent owns not only their private data, but also uniquely designed models. We test our framework on the MNIST/FEMNIST dataset and the CIFAR10/CIFAR100 dataset and observe fast improvement across all participating models. With 10 distinct participants, the final test accuracy of each model on average receives a 20% gain on top of what's possible without collaboration and is only a few percent lower than the performance each model would have obtained if all private datasets were pooled and made directly available for all participants.

Citations (712)

Summary

  • The paper introduces FedMD, a framework that enables independently designed models to collaboratively improve through transfer learning and knowledge distillation.
  • The methodology achieves about a 20% accuracy boost by aggregating class scores on public data, addressing challenges of data and architectural heterogeneity.
  • FedMD paves the way for scalable federated learning applications, maintaining intellectual property while enhancing performance in both i.i.d. and non-i.i.d. settings.

FedMD: Heterogeneous Federated Learning via Model Distillation

The paper "FedMD: Heterogeneous Federated Learning via Model Distillation" introduces a novel federated learning framework that enables the creation of a centralized model from heterogeneous models owned by independent participants. This addresses the critical challenge of heterogeneity in model architecture and data distribution, primarily occurring in applications such as healthcare and AI services.

Overview

Traditional federated learning frameworks require participants to agree on a common model architecture, which limits flexibility and applicability in real-world scenarios where participants may have different computational resources and proprietary models. The paper proposes "FedMD," a framework that leverages transfer learning and knowledge distillation to allow each participant to retain its uniquely designed model while contributing to a collective learning process.

Methodology

The framework, FedMD, involves several key components:

  • Transfer Learning: Each participant initially trains its model using a public dataset and subsequently fine-tunes it with its private dataset. This establishes a baseline performance before collaborative training.
  • Communication Protocol: Participants submit class scores computed on the public dataset to a central server. The server aggregates these scores into a consensus output on the public data, which participants then use to update their own models via knowledge distillation.
  • Iterative Collaboration: Throughout the process, participants iteratively refine their models by aligning with the consensus and then revisiting their private data; a minimal sketch of one such round appears below.
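
The following Python sketch illustrates one FedMD communication round under assumptions that are not taken from the paper's code: each participant is a hypothetical object exposing predict_scores, fit_scores (the digest step), and fit_private (the revisit step), and the server aggregates class scores by simple averaging. It is an illustration of the protocol described above, not the authors' implementation.

import numpy as np

def fedmd_round(participants, public_x, distill_epochs=1, private_epochs=1):
    """One FedMD communication round (illustrative sketch).

    participants: list of objects with
        - predict_scores(x): class-score matrix of shape (n_samples, n_classes)
        - fit_scores(x, target_scores, epochs): train to match the consensus (digest)
        - fit_private(epochs): fine-tune on the participant's private data (revisit)
    public_x: the shared public dataset (or a subset sampled for this round)
    """
    # Communicate: every participant computes class scores on the public data.
    all_scores = [p.predict_scores(public_x) for p in participants]

    # Aggregate: the central server averages the scores into a consensus target.
    consensus = np.mean(all_scores, axis=0)

    # Digest: each participant distills the consensus back into its own model.
    for p in participants:
        p.fit_scores(public_x, consensus, epochs=distill_epochs)

    # Revisit: each participant fine-tunes on its own private data.
    for p in participants:
        p.fit_private(epochs=private_epochs)

    return consensus

In the full framework, each participant first completes the transfer-learning phase (training on the public dataset, then fine-tuning on its private dataset) before rounds such as this are repeated for a fixed number of iterations.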

This approach is validated on the MNIST/FEMNIST and CIFAR10/CIFAR100 dataset pairs, where the individual models improve in test accuracy by approximately 20% on average compared with training without collaboration.

Results and Implications

The experimental results show that FedMD enables heterogeneous models to approach the performance they would attain if all private datasets were pooled and shared. In both i.i.d. and non-i.i.d. settings, the federated algorithm allows participants to retain intellectual property autonomy while benefiting from collective model improvements.

Notably, FedMD provides a practical solution to the statistical heterogeneity challenge where data distributions vary significantly among participants. By supporting model independence and customization, this framework opens up new avenues for federated learning applications across diverse sectors, particularly those impacted by data privacy and intellectual property concerns.

Future Directions

The paper hints at several potential directions for future research:

  • Advanced Communication Protocols: Implementing feature transformations or emergent communication protocols could further enhance the efficiency and effectiveness of the model collaboration process.
  • Extension to Diverse Tasks: While the paper focuses predominantly on classification tasks, extending this framework to natural language processing and reinforcement learning scenarios would broaden its applicability.
  • Handling Extreme Heterogeneity: Future work may extend the framework to extreme cases involving large discrepancies in data volume, model capacity, and task nature.

Conclusion

FedMD represents a significant step toward enabling heterogeneity in federated learning. By combining transfer learning and knowledge distillation, the framework mitigates the limitations of traditional approaches, allowing independent models to collaboratively enhance their performance without compromising autonomy or data privacy. This methodology is poised to be an essential tool in the evolution of AI services, particularly in environments requiring robust and adaptable learning systems.

