
FedDSR: Federated Deep Supervision & Regularization

Updated 14 December 2025
  • FedDSR is a federated learning paradigm that integrates intermediate deep supervision via mutual information (MI) losses and negative-entropy (NE) regularization to tackle non-IID challenges.
  • It applies architecture-agnostic intermediate-layer selection to mitigate client drift and improve class-discriminative feature learning in autonomous driving scenarios.
  • Empirical evaluations show up to an 8.93% relative mIoU improvement and fewer training rounds, demonstrating its efficiency and robustness over traditional FL techniques.

Federated Deep Supervision and Regularization (FedDSR) is a federated learning (FL) paradigm developed to address generalization and convergence deficiencies in autonomous driving (AD) applications stemming from heterogeneous, non-independent and identically distributed (non-IID) data across distributed vehicles. By integrating multi-access supervision via mutual information (MI) losses and regularization via negative entropy (NE) at architecture-agnostic intermediate network layers, FedDSR mitigates client drift, strengthens class-discriminative feature learning, and accelerates overall optimization without sacrificing the $\mathcal{O}(1/\sqrt{T})$ convergence guarantees of nonconvex stochastic gradient descent (Kou et al., 7 Dec 2025).

1. Motivation and Core Concepts

Traditional federated aggregation, such as FedAvg, is susceptible to slow convergence and client-specific overfitting under non-IID distributions, common in AD scenarios (e.g., disparate urban and rural driving). FedDSR introduces deep supervision at selected intermediate layer transitions and applies regularization penalties during local vehicle updates. Multi-access MI loss encourages intermediate features to encode discriminative representations aligned with ground truth, while NE regularization counteracts overconfident activations and improves optimization robustness in federated settings (Kou et al., 7 Dec 2025). This combination yields a unified objective, enhancing both global generalization and federated model convergence.

2. Architecture-Agnostic Intermediate Layer Selection

FedDSR defines $M$ intermediate points $\{G_1,\dots,G_M\}$ in the network using architecture-independent criteria:

  • Before/after downsampling operations (pooling, strided convolution)
  • Between macro-blocks (e.g., ResNet stages, transformer layers)
  • At bottleneck layers (feature compression points)
  • Before/after attention modules
  • At feature-fusion junctions in multi-branch architectures

This method permits application to diverse backbone types, including CNNs, transformers, and hybrid networks, enabling supervision and regularization at locations representing significant shifts in feature abstraction (Kou et al., 7 Dec 2025).
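As an illustrative sketch (not the paper's code), the selection criteria above can be expressed as a rule over an ordered layer description; the `kind` tags, the backbone listing, and the `select_intermediate_points` helper below are hypothetical assumptions for demonstration:

```python
# Hypothetical sketch: choose supervision points G_1..G_M from a backbone
# described as an ordered list of (name, kind) records. The `kind` tags and
# selection heuristic are illustrative, not the paper's implementation.

DOWNSAMPLING = {"maxpool", "strided_conv"}   # before/after downsampling
BLOCK_BOUNDARY = {"stage_end"}               # between macro-blocks

def select_intermediate_points(layers, max_points=3):
    """Return indices of layers after which to attach supervision heads."""
    candidates = [
        i for i, (_, kind) in enumerate(layers)
        if kind in DOWNSAMPLING | BLOCK_BOUNDARY
    ]
    if len(candidates) <= max_points:
        return candidates
    # Keep a moderate number M of evenly spread points (the ablation in
    # Section 7 favors M around 3 with moderate spacing).
    step = len(candidates) / max_points
    return [candidates[int(j * step)] for j in range(max_points)]

backbone = [
    ("conv1", "conv"), ("pool1", "maxpool"),
    ("stage1", "stage_end"), ("stage2", "stage_end"),
    ("down2", "strided_conv"), ("stage3", "stage_end"),
    ("head", "conv"),
]
points = select_intermediate_points(backbone)  # indices of G_1..G_M
```

Because the rule inspects only layer kinds, not layer internals, the same helper applies unchanged to a CNN stage list or a transformer block list.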

3. Formalization: MI Loss, NE Regularizer, and Unified Objective

FedDSR operates locally on each client (vehicle) $n$ with dataset $\mathcal{D}_n$ and model $\theta_n$, applying the following loss components:

  • Output-layer cross-entropy:

$$\mathcal{L}_{\mathrm{CE}}^{n} = -\frac{1}{|\mathcal{D}_n|}\sum_{(x_i,y_i)\in\mathcal{D}_n}\sum_{p\in\Omega}\sum_{k=1}^{K} y_{i,p,k}\,\log p_k(x_{i,p};\theta_n)$$

  • Intermediate-point mutual information loss at $G_m$:

$$\mathcal{L}_{\mathrm{MI}}^{m,n} = -\frac{1}{|\mathcal{D}_n|}\sum_{(x_i,y_i)\in\mathcal{D}_n}\sum_{p\in\Omega}\sum_{k=1}^{K} y_{i,p,k}\,\log q_k^m(z_{i,p}^m;\phi_m^n)$$

  • Intermediate-point negative entropy regularizer at $G_m$:

$$\mathcal{L}_{\mathrm{NE}}^{m,n} = \frac{1}{|\mathcal{D}_n|}\sum_{(x_i,y_i)\in\mathcal{D}_n}\sum_{p\in\Omega}\sum_{k=1}^{K} q_k^m(z_{i,p}^m;\phi_m^n)\,\log q_k^m(z_{i,p}^m;\phi_m^n)$$

The overall local loss optimized by each client:

$$\mathcal{L}^{n} = \mathcal{L}_{\mathrm{CE}}^{n} + \sum_{m=1}^{M}\left(\lambda_{\mathrm{MI}}\,\mathcal{L}_{\mathrm{MI}}^{m,n} + \lambda_{\mathrm{NE}}\,\mathcal{L}_{\mathrm{NE}}^{m,n}\right)$$

where the hyperparameters $\lambda_{\mathrm{MI}}$ and $\lambda_{\mathrm{NE}}$ balance supervision and regularization. Central aggregation minimizes the weighted sum $\sum_n w_n \mathcal{L}^{n}$, with weights $w_n \propto |\mathcal{D}_n|$ proportional to client dataset size (Kou et al., 7 Dec 2025).
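The per-client loss components can be sketched in NumPy, assuming per-pixel softmax probabilities are already available; the shapes (pixels × classes), the `lam_mi`/`lam_ne` names, and the specific default values are illustrative assumptions, not the paper's settings:

```python
import numpy as np

def cross_entropy(probs, onehot):
    """One-hot cross-entropy, the shared form of the CE and MI losses.
    probs, onehot: arrays of shape (num_pixels, num_classes)."""
    return -np.mean(np.sum(onehot * np.log(probs + 1e-12), axis=-1))

def negative_entropy(probs):
    """NE regularizer: sum_k q_k log q_k, averaged over pixels.
    Zero-ish for uniform predictions, near 0 from below when confident."""
    return np.mean(np.sum(probs * np.log(probs + 1e-12), axis=-1))

def local_loss(out_probs, aux_probs_list, onehot, lam_mi=0.1, lam_ne=0.01):
    """Unified local objective: output CE plus, for each intermediate
    point G_m, a weighted MI loss and NE regularizer on its auxiliary head."""
    loss = cross_entropy(out_probs, onehot)
    for q in aux_probs_list:  # one auxiliary prediction per point G_m
        loss += lam_mi * cross_entropy(q, onehot)
        loss += lam_ne * negative_entropy(q)
    return loss
```

Note the sign convention: NE is largest (closest to 0) for confident, low-entropy predictions, so adding it with a positive weight penalizes overconfident intermediate activations.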

4. Training Workflow in Federated Setting

FedDSR implements a client-server workflow summarized in Algorithm 1 (Kou et al., 7 Dec 2025):

  1. Server samples participating vehicles and broadcasts the current global parameters $\theta$ (with auxiliary-head parameters $\phi_1,\dots,\phi_M$)
  2. Each client:
    • Initializes local model and adapter parameters
    • For each local epoch and minibatch: performs forward pass, computes output and intermediate losses, backpropagates gradients, updates $\theta_n$ and $\phi_m^n$
    • Returns updated parameters to server
  3. Server aggregates updates via weighted averaging:

$$\theta \leftarrow \sum_{n} w_n\,\theta_n, \qquad w_n \propto |\mathcal{D}_n|$$

Repeat for $T$ communication rounds to produce the final global model and auxiliary heads.
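The round structure above can be illustrated with a toy NumPy sketch in which model "parameters" are flat vectors and each client's FedDSR local training is abstracted into a stand-in `local_update` on a made-up quadratic loss; all names and the loss itself are hypothetical:

```python
import numpy as np

def local_update(theta, target, lr=0.5, epochs=3):
    """Stand-in for local FedDSR training: gradient steps on the toy
    loss ||theta - target||^2 (gradient = 2 * (theta - target))."""
    theta = theta.copy()
    for _ in range(epochs):
        theta -= lr * 2 * (theta - target)
    return theta

def federated_round(theta, client_targets, client_sizes):
    """One communication round: broadcast, local updates, weighted average."""
    updates = [local_update(theta, t) for t in client_targets]
    w = np.array(client_sizes, dtype=float)
    w /= w.sum()  # weights w_n proportional to |D_n|
    return sum(wi * u for wi, u in zip(w, updates))

theta = np.zeros(2)                      # initial global model
targets = [np.array([1.0, 0.0]),         # client 1's local optimum
           np.array([0.0, 1.0])]         # client 2's local optimum
for _ in range(20):                      # T communication rounds
    theta = federated_round(theta, targets, client_sizes=[3, 1])
```

With dataset sizes 3:1 the global model settles at the size-weighted average of the client optima, which is exactly the fixed point of the weighted aggregation rule.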

5. Empirical Evaluation and Benchmarking

FedDSR is benchmarked in semantic segmentation tasks using CamVid, Cityscapes, and SynthiaSF datasets. Tested architectures include DeepLabv3+ (ResNet-50 backbone), SeaFormer, and TopFormer. Comparisons span multiple FL baselines: FedAvg, FedProx, FedDyn, FedAvgM, FedIR, MOON, SCAFFOLD, BalanceFL, FedGau (Kou et al., 7 Dec 2025).

mIoU Gains (FedDSR vs. FedAvg, DeepLabv3+):

| Dataset    | FedAvg mIoU | FedDSR mIoU | Relative Improvement |
|------------|-------------|-------------|----------------------|
| CamVid     | 72.78%      | 73.24%      | +0.63%               |
| Cityscapes | 47.91%      | 50.49%      | +5.39%               |
| SynthiaSF  | 26.65%      | 29.03%      | +8.93%               |

FedDSR achieves up to an 8.93% relative mIoU improvement and up to a 28.6% reduction in training rounds, requiring 28.6% fewer rounds than the 90 FedAvg needs to reach target mIoU on Cityscapes. When combined with other FL algorithms (e.g., FedGau, FedAvgM), FedDSR typically delivers an additional 2-5% mIoU uplift (Kou et al., 7 Dec 2025).
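A quick arithmetic check confirms that the improvement column is relative, i.e. (FedDSR − FedAvg) / FedAvg rather than an absolute mIoU difference:

```python
# Verify the relative-improvement arithmetic behind the table above.
rows = {
    "CamVid":     (72.78, 73.24),
    "Cityscapes": (47.91, 50.49),
    "SynthiaSF":  (26.65, 29.03),
}
# relative gain in percent: 100 * (feddsr - fedavg) / fedavg
rel = {name: round(100 * (b - a) / a, 2) for name, (a, b) in rows.items()}
```

For CamVid, for instance, the absolute gain is only 0.46 mIoU points, but 0.46 / 72.78 ≈ 0.63% relative, matching the table.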

6. Qualitative Outcomes and Model Robustness

FedDSR demonstrably reduces segmentation boundary errors, notably improving separation of classes such as sky and road, and improves segmentation of small objects. MI supervision at intermediate layers drives the extraction of hierarchy-aware, class-discriminative features, curbing local-client overfitting. NE regularization dampens overly confident hidden activations, resulting in smoother global optimization landscapes and effective mitigation of client drift (Kou et al., 7 Dec 2025). These mechanisms jointly harmonize local gradient directions and foster rapid convergence to globally generalizable solutions.

7. Ablation Studies: Selection and Positioning of Intermediate Points

FedDSR’s ablation analysis offers insights into the configuration of supervision and regularization points:

  • Number of intermediate transitions ($M$):
    • 1 point: 50.49% mIoU
    • 3 points: 51.52% mIoU (peak)
    • 5 points: 50.49% mIoU
    • A moderate $M$ yields the optimal trade-off
  • Spacing between points:
    • Two-base spacing yields the highest mIoU (51.72%)
    • Features spaced moderately apart maximize benefit
  • Positioning of points:
    • Close-to-input supervision: 52.98% mIoU, surpassing central or output-proximal supervision
    • Early supervision leads to superior gains, confirming the importance of guiding shallow representations

In summary, FedDSR systematically integrates intermediate-layer MI-based supervision and NE-based regularization in FL for AD. The paradigm is model-agnostic, preserves theoretical convergence rates, and yields substantial improvements in generalization and training efficiency across standard segmentation benchmarks (Kou et al., 7 Dec 2025).

References (1)
