Federated Mutual Learning (FML)

Updated 1 July 2026
  • Federated Mutual Learning (FML) is a federated learning paradigm where clients share soft predictions to mutually distill knowledge across heterogeneous models and data.
  • It enhances generalization, personalization, and communication efficiency by incorporating a KL divergence-based objective in both centralized and decentralized protocols.
  • FML reduces communication overhead by exchanging soft labels instead of full model weights, and several variants add robustness to adversarial participants and to data heterogeneity.

Federated Mutual Learning (FML) defines a broad family of federated learning (FL) frameworks wherein participating clients collaborate by sharing model predictions, soft labels, or models for mutual knowledge distillation, rather than, or in addition to, exchanging raw gradient updates or full model weights. Motivated by the limitations of canonical FL (especially FedAvg) under data, model, and objective heterogeneity, FML algorithms employ distributed or bidirectional knowledge transfer based on Kullback-Leibler (KL) divergence over model outputs. FML encompasses both centralized (server-orchestrated) and decentralized (peer-to-peer) protocols, supports homogeneous or heterogeneous client architectures, and can be instantiated with or without auxiliary public data. This paradigm demonstrably improves generalization, personalization, robustness to data heterogeneity, and communication efficiency compared to classical parameter-averaging approaches.

1. Core Principles and Formal Objectives

In the canonical FML protocol, prototyped by "Federated Learning Framework via Distributed Mutual Learning", a system consists of $K$ clients, each with private data $D_i$ and model parameters $\theta_i$ (Gupta, 3 Mar 2025). Clients also have access to a small public dataset $X_\text{pub} = \{x_1,\dots,x_M\}$ for exchanging knowledge. The training objective for each client $i$ augments the vanilla local empirical loss with a KL-based mutual knowledge distillation term:

$$\mathcal{L}_i^{\text{FML}}(\theta_i) = L_i(\theta_i) + \lambda \frac{1}{M} \sum_{m=1}^M \mathrm{KL}\left(p_i(x_m \mid \theta_i)~\|~\bar p_{-i}(x_m)\right)$$

where $L_i(\theta_i)$ is the cross-entropy loss on $D_i$, $p_i(x_m \mid \theta_i)$ is the softmax output (with optional temperature $T$) of client $i$ on public example $x_m$, and $\bar p_{-i}(x_m)$ denotes the arithmetic mean of all other clients' outputs on $x_m$.

By setting $\lambda = 0$, the formulation reduces to standard FL; for $\lambda > 0$, each client is encouraged to align its output distribution with the consensus of its peers on the public set. This mechanism supports model-architecture heterogeneity (since only output logits are exchanged) and can incorporate variants such as bi-directional distillation, clustering-based peer selection, or extension to decentralized topologies (Matsuda et al., 2021, Li et al., 2020, Khalil et al., 2024, Bai et al., 11 Jun 2025, Shen et al., 2020).
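To make the objective concrete, the following minimal sketch evaluates it for one client from already-computed softmax outputs. The function names (`peer_mean`, `fml_objective`) and the nested-list data layout are assumptions made for illustration, not part of any cited implementation.

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp((z - m) / temperature) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions given as lists."""
    return sum(pk * math.log((pk + eps) / (qk + eps)) for pk, qk in zip(p, q))

def peer_mean(all_probs, i):
    """Arithmetic mean of every client's outputs except client i.

    all_probs[j][m] is client j's probability vector on public example m
    (hypothetical layout; requires at least two clients).
    """
    peers = [probs for j, probs in enumerate(all_probs) if j != i]
    n_examples = len(peers[0])
    n_classes = len(peers[0][0])
    return [
        [sum(p[m][k] for p in peers) / len(peers) for k in range(n_classes)]
        for m in range(n_examples)
    ]

def fml_objective(local_loss, all_probs, i, lam):
    """local_loss + lam * (1/M) * sum_m KL(p_i(x_m) || p_bar_{-i}(x_m))."""
    targets = peer_mean(all_probs, i)
    kl_terms = [kl_divergence(p, q) for p, q in zip(all_probs[i], targets)]
    return local_loss + lam * sum(kl_terms) / len(kl_terms)
```

With `lam=0` the function returns `local_loss` unchanged, and the KL term is zero whenever client `i` already agrees with the peer mean on every public example.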

2. Algorithmic Instantiations and Protocol Variants

FML implementations differ in topology, communication medium, and aggregation strategies but share key stages:

  1. Local private update: Each client runs one or more epochs of optimization on its private data.
  2. Prediction exchange: Clients compute output vectors (typically softmax probabilities) on a shared public set or their own data (depending on the protocol).
  3. Aggregation: Server or peer(s) aggregate these output vectors (average or otherwise).
  4. Distillation update: Clients incorporate a KL-regularized objective based on peer outputs.

A representative centralized FML training loop is given by (Gupta, 3 Mar 2025); the listing itself is not reproduced here.
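As a rough stand-in, the four stages listed above can be sketched with toy linear softmax clients. This is an illustrative sketch under stated assumptions, not the algorithm from the cited paper: the `Client` class, the plain SGD updates, and the defaults `lam=1.0` and `lr=0.1` are all hypothetical.

```python
import math
import random

EPS = 1e-12

def softmax(logits):
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def kl(p, q):
    return sum(pk * math.log((pk + EPS) / (qk + EPS)) for pk, qk in zip(p, q))

class Client:
    """Toy client: a linear softmax classifier trained with plain SGD."""

    def __init__(self, dim, n_classes, data, seed=0):
        rng = random.Random(seed)
        self.W = [[rng.uniform(-0.1, 0.1) for _ in range(dim)]
                  for _ in range(n_classes)]
        self.data = data  # private list of (features, label) pairs

    def predict(self, x):
        return softmax([sum(w * v for w, v in zip(row, x)) for row in self.W])

    def _sgd_step(self, x, grad_logits, lr):
        # For logits = W x, dLoss/dW[k][j] = grad_logits[k] * x[j].
        for k, row in enumerate(self.W):
            for j in range(len(row)):
                row[j] -= lr * grad_logits[k] * x[j]

    def local_update(self, lr):
        # Stage 1: one epoch of cross-entropy SGD on private data.
        for x, y in self.data:
            p = self.predict(x)
            grad = [pk - (1.0 if k == y else 0.0) for k, pk in enumerate(p)]
            self._sgd_step(x, grad, lr)

    def distill(self, x_pub, targets, lam, lr):
        # Stage 4: SGD on lam * KL(p_i || target) over the public set.
        # d KL(p || q) / d logit_k = p_k * (log(p_k / q_k) - KL(p || q)).
        for x, q in zip(x_pub, targets):
            p = self.predict(x)
            d = kl(p, q)
            grad = [lam * pk * (math.log((pk + EPS) / (qk + EPS)) - d)
                    for pk, qk in zip(p, q)]
            self._sgd_step(x, grad, lr)

def fml_round(clients, x_pub, lam=1.0, lr=0.1):
    """One synchronous round; needs at least two clients."""
    for c in clients:
        c.local_update(lr)
    # Stage 2: every client reports soft predictions on the public set.
    preds = [[c.predict(x) for x in x_pub] for c in clients]
    # Stage 3: the server builds a leave-one-out mean for each client.
    n = len(clients)
    for i, c in enumerate(clients):
        targets = [
            [sum(preds[j][m][k] for j in range(n) if j != i) / (n - 1)
             for k in range(len(preds[i][m]))]
            for m in range(len(x_pub))
        ]
        c.distill(x_pub, targets, lam, lr)
```

Only probability vectors on `x_pub` cross the client boundary in this sketch, so clients could use different model classes as long as they share the label space.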

Variants include:

  • FedMe: Clients exchange full models (of potentially different architectures), perform deep mutual learning (DML) over local data, and use validation loss-based model selection for automatic architectural tuning (Matsuda et al., 2021).
  • Def-KT and DFML: Decentralized, peer-to-peer protocols where clients pair with others, exchange models, and conduct mutual distillation locally, obviating the need for a central server or public data (Li et al., 2020, Khalil et al., 2024).
  • FedMLAC: Clients maintain both a personalized local model and a lightweight, globally shared "Plug-in" model, enforcing bidirectional KL-based distillation and employing layer-wise pruning in aggregation to enhance robustness in heterogeneous and potentially adversarial environments (Bai et al., 11 Jun 2025).
  • FML ("meme+personal"): Each client simultaneously trains a global ("meme") model and a local personalized model, exchanging meme updates with the server and mutually distilling between meme and local on private data (Shen et al., 2020).
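Several of these variants rest on two models distilling into each other on the same private batch. The sketch below shows one plausible form of that pairwise objective for a "meme" and a local model. The mixing weights `alpha` and `beta`, and the choice to treat the peer's distribution as the reference in each KL term, are assumptions borrowed from the common deep-mutual-learning convention rather than details stated above.

```python
import math

def softmax(logits):
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(p, label, eps=1e-12):
    """Negative log-likelihood of the true label under distribution p."""
    return -math.log(p[label] + eps)

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions given as lists."""
    return sum(pk * math.log((pk + eps) / (qk + eps)) for pk, qk in zip(p, q))

def mutual_losses(local_logits, meme_logits, label, alpha=0.5, beta=0.5):
    """Return (local_loss, meme_loss) for one labelled private example.

    Each model mixes its own cross-entropy with a KL term that pulls it
    toward the other model's prediction (alpha and beta are hypothetical).
    """
    p_local = softmax(local_logits)
    p_meme = softmax(meme_logits)
    local_loss = (alpha * cross_entropy(p_local, label)
                  + (1 - alpha) * kl_divergence(p_meme, p_local))
    meme_loss = (beta * cross_entropy(p_meme, label)
                 + (1 - beta) * kl_divergence(p_local, p_meme))
    return local_loss, meme_loss
```

In a real training step each loss would be differentiated only with respect to its own model's parameters, with the other model's output treated as a constant.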

3. Communication and Computational Efficiency

FML protocols typically achieve substantial reductions in communication volume compared to weight-sharing baselines. For example, in a standard centralized FML, the per-round per-client communication cost scales with the size of the soft-label payload (sharing $M$ softmax vectors, one per public example, each with one entry per class), whereas full model exchange costs an amount proportional to the parameter count (Gupta, 3 Mar 2025). A practical case showed a 50x reduction in transmitted floats (soft outputs versus weights). Model-exchange variants (FedMe, Def-KT, DFML), by their nature, transmit model parameters, but they address heterogeneity and personalization trade-offs that weight averaging cannot.
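The arithmetic behind this comparison is simple enough to sketch. The sizes used below are hypothetical and chosen only for illustration; they are not the figures from any cited paper, and the helper names are assumptions.

```python
def soft_label_floats(n_public, n_classes):
    """Floats a client uploads per round when sharing soft predictions."""
    return n_public * n_classes

def weight_floats(n_params):
    """Floats a client uploads per round when sharing full model weights."""
    return n_params

def reduction_factor(n_public, n_classes, n_params):
    """How many times smaller the soft-label payload is than the weights."""
    return weight_floats(n_params) / soft_label_floats(n_public, n_classes)

# Hypothetical sizes: 500 public examples, 10 classes, 1M parameters.
ratio = reduction_factor(n_public=500, n_classes=10, n_params=1_000_000)
print(ratio)  # 200.0
```

The ratio depends only on the public-set size, the number of classes, and the model size, so a larger public set or label space shrinks the advantage proportionally.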

Computation overhead for each client may increase relative to FedAvg due to the need for dual-model forward and backward passes (e.g., meme and local in FML, personalized and plug-in in FedMLAC), most notably doubling per-batch cost in vanilla mutual learning (Shen et al., 2020, Bai et al., 11 Jun 2025).

4. Privacy Properties and Security Considerations

FML mitigates many privacy and inference risks associated with gradient or weight sharing. Since only soft predictions on a public or non-sensitive set are exchanged (rather than gradients on private data or full weights), the risk of model inversion attacks is reduced (Gupta, 3 Mar 2025). The research to date provides no formal differential privacy proofs; rather, privacy claims rely on the difficulty of inferring private data from soft outputs on known public inputs. Some FML extensions propose adding noise to logit vectors or using further aggregation protocols to amplify privacy (Gupta, 3 Mar 2025, Shen et al., 2020).
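The logit-noise idea can be sketched as follows. This is an illustrative assumption about how such a mechanism might look: the Gaussian noise model, the `sigma` parameter, and the function name are hypothetical, and the sketch by itself carries no differential privacy guarantee, since calibrating `sigma` to a privacy budget would need a sensitivity analysis that the text above does not provide.

```python
import math
import random

def softmax(logits):
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def noisy_soft_labels(logits, sigma, rng):
    """Add zero-mean Gaussian noise to each logit, then apply softmax.

    sigma is a hypothetical noise scale; rng is a random.Random instance
    so that runs can be reproduced.
    """
    noisy = [z + rng.gauss(0.0, sigma) for z in logits]
    return softmax(noisy)

# Example: perturb one prediction before sending it to the server.
shared = noisy_soft_labels([2.0, 1.0, 0.0], sigma=0.5, rng=random.Random(0))
```

Because the noise is applied before the softmax, the shared vector is still a valid probability distribution and can be averaged with peers' outputs as usual.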

Robustness to adversarial and Byzantine participants is addressed explicitly in recent frameworks, such as FedMLAC's layer-wise pruning aggregation, which filters outlier updates based on parameter deviation statistics to defend against poisoning or corrupted-label attacks (Bai et al., 11 Jun 2025).
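A generic version of deviation-based filtering can be sketched per layer. This is not FedMLAC's actual pruning rule, which is not spelled out here; the median/MAD threshold, the `z_max` parameter, and the dict-of-lists update format are assumptions chosen for illustration.

```python
import math
import statistics

def layerwise_filtered_mean(updates, z_max=2.0):
    """Average client updates layer by layer, dropping outliers.

    updates: list of dicts mapping a layer name to a flat list of floats.
    For each layer, a client is dropped when its distance to the
    coordinate-wise median exceeds median + z_max * MAD of all distances.
    """
    aggregated = {}
    for layer in updates[0]:
        vectors = [u[layer] for u in updates]
        center = [statistics.median(col) for col in zip(*vectors)]
        dists = [math.sqrt(sum((a - b) ** 2 for a, b in zip(v, center)))
                 for v in vectors]
        med = statistics.median(dists)
        mad = statistics.median([abs(d - med) for d in dists])
        threshold = med + z_max * mad + 1e-12
        kept = [v for v, d in zip(vectors, dists) if d <= threshold]
        aggregated[layer] = [sum(col) / len(kept) for col in zip(*kept)]
    return aggregated
```

Filtering each layer separately means a client whose update is corrupted in one layer can still contribute its other layers to the aggregate.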

5. Empirical Results and Application Domains

Across multiple studies, FML frameworks yield superior accuracy, generalization, and personalization under both IID and non-IID data splits, often with faster or more stable convergence than FedAvg or parameter-averaging protocols.

For instance, on a face-mask detection task, distributed FML achieved 94.45% average accuracy, outperforming vanilla FedAvg (92.65%) and asynchronous weight aggregation (92.74%), with markedly lower communication burden (Gupta, 3 Mar 2025). In decentralized FML (e.g., Def-KT, DFML), accuracy gains of 2–5% over baselines are observed under severe heterogeneity in both data and model space (Li et al., 2020, Khalil et al., 2024). FedMLAC demonstrates 1–5% gains in F1 or accuracy for federated audio classification scenarios, particularly robust to noisy or adversarial data (Bai et al., 11 Jun 2025).

Application domains include computer vision (MNIST, CIFAR-10/100), speech and audio recognition (GSC, IEMOCAP), environmental sound recognition, and natural language tasks (Shakespeare) (Matsuda et al., 2021, Bai et al., 11 Jun 2025, Shen et al., 2020).

6. Extensions, Limitations, and Open Questions

FML algorithms have been extended to address several axes of heterogeneity, including the data, model, and objective heterogeneity that motivate the paradigm.

Key limitations include the reliance, in some variants, on public datasets for prediction exchange, which may not be feasible in certain privacy settings (Gupta, 3 Mar 2025). Most theoretical analyses rely on FedAvg's convergence properties; formal non-convex convergence and privacy amplification guarantees remain largely open (Shen et al., 2020, Bai et al., 11 Jun 2025). The management of distillation hyperparameters (e.g., the weight $\lambda$ and temperature $T$ of Section 1, KL annealing, mutual learning weights) and their interaction with advanced privacy-preserving mechanisms (DP, encrypted aggregation) are active areas of inquiry. Adaptive scheduling and integration with robust aggregation further strengthen these frameworks (Gupta, 3 Mar 2025, Bai et al., 11 Jun 2025).

7. Representative Algorithms and Comparison Table

The FML landscape includes multiple algorithmic variants, distinguished by architecture, communication, and privacy strategies:

| Framework | Centralized/Decentralized | Heterogeneous Model Support | Public Data Req. | Robustness Components | Reference |
|---|---|---|---|---|---|
| FML (loss exchange) | Centralized | Yes | Yes | Soft-prediction only | (Gupta, 3 Mar 2025) |
| FedMe | Centralized | Yes | Optional (for clustering) | Deep mutual learning + model selection | (Matsuda et al., 2021) |
| Def-KT | Decentralized | Yes | No | Peer-to-peer bidirectional distillation | (Li et al., 2020) |
| DFML | Decentralized | Yes (non-restrictive) | No | WSM, cyclic distillation, peer aggregation | (Khalil et al., 2024) |
| FedMLAC | Centralized | Partial (Plug-in fixed) | No | Layer-wise pruning aggregation (LPA) | (Bai et al., 11 Jun 2025) |
| FML (meme+personal) | Centralized | Yes | No | Personalized-local + meme mutual learning | (Shen et al., 2020) |

Each framework demonstrates improved generalization and/or personalization under conditions of data and/or model heterogeneity, with varying privacy and robustness properties. FML has shifted the focus of federated optimization from direct parameter fusion to output-level consensus, providing a flexible foundation for next-generation FL systems.
