FedDSR: Federated Deep Supervision & Regularization
- FedDSR is a federated learning paradigm that integrates intermediate deep supervision via mutual information (MI) losses and negative entropy (NE) regularization to tackle non-IID challenges.
- It applies architecture-agnostic intermediate layer selection to mitigate client drift and improve class-discriminative feature learning in autonomous driving scenarios.
- Empirical evaluations show up to an 8.93% relative mIoU improvement and fewer training rounds, demonstrating its efficiency and robustness over traditional FL techniques.
Federated Deep Supervision and Regularization (FedDSR) is a federated learning (FL) paradigm developed to address generalization and convergence deficiencies in autonomous driving (AD) applications stemming from heterogeneous, non-independent and identically distributed (non-IID) data across distributed vehicles. By integrating multi-access supervision via mutual information (MI) losses and regularization via negative entropy (NE) at architecture-agnostic intermediate network layers, FedDSR mitigates client drift, strengthens class-discriminative feature learning, and accelerates overall optimization without sacrificing the convergence guarantees of nonconvex stochastic gradient descent (Kou et al., 7 Dec 2025).
1. Motivation and Core Concepts
Traditional federated aggregation, such as FedAvg, is susceptible to slow convergence and client-specific overfitting under non-IID distributions, common in AD scenarios (e.g., disparate urban and rural driving). FedDSR introduces deep supervision at selected intermediate layer transitions and applies regularization penalties during local vehicle updates. Multi-access MI loss encourages intermediate features to encode discriminative representations aligned with ground truth, while NE regularization counteracts overconfident activations and improves optimization robustness in federated settings (Kou et al., 7 Dec 2025). This combination yields a unified objective, enhancing both global generalization and federated model convergence.
2. Architecture-Agnostic Intermediate Layer Selection
FedDSR defines intermediate points in the network using architecture-independent criteria:
- Before/after downsampling operations (pooling, strided convolution)
- Between macro-blocks (e.g., ResNet stages, transformer layers)
- At bottleneck layers (feature compression points)
- Before/after attention modules
- At feature-fusion junctions in multi-branch architectures
This method permits application to diverse backbone types, including CNNs, transformers, and hybrid networks, enabling supervision and regularization at locations representing significant shifts in feature abstraction (Kou et al., 7 Dec 2025).
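As an illustrative sketch (not the paper's implementation), the selection criteria above can be expressed as a single pass over an ordered layer description; the layer names, kinds, and stride fields below are hypothetical:

```python
# Illustrative sketch: architecture-agnostic selection of intermediate points
# from a backbone described as an ordered list of layer records.
# All layer names/kinds here are hypothetical, not from the paper.

def select_intermediate_points(layers):
    """Return indices of layers after which supervision/regularization
    adapters could be attached, following the criteria above."""
    points = []
    for i, layer in enumerate(layers):
        if layer.get("stride", 1) > 1:          # after a downsampling op
            points.append(i)
        elif layer.get("block_end", False):     # between macro-blocks
            points.append(i)
        elif layer.get("kind") == "attention":  # after an attention module
            points.append(i)
    return sorted(set(points))

# A toy ResNet-like layer list (hypothetical):
backbone = [
    {"name": "stem_conv", "kind": "conv", "stride": 2},        # 0: downsampling
    {"name": "stage1_conv", "kind": "conv"},                   # 1
    {"name": "pool1", "kind": "pool", "stride": 2},            # 2: downsampling
    {"name": "stage2_attn", "kind": "attention"},              # 3: attention
    {"name": "stage2_conv", "kind": "conv"},                   # 4
    {"name": "stage2_out", "kind": "conv", "block_end": True}, # 5: macro-block end
    {"name": "head", "kind": "conv"},                          # 6
]

points = select_intermediate_points(backbone)
print(points)  # → [0, 2, 3, 5]
```

Because the criteria are phrased over generic layer properties rather than a specific architecture, the same pass applies to CNN, transformer, or hybrid backbones.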
3. Formalization: MI Loss, NE Regularizer, and Unified Objective
FedDSR operates locally on each client (vehicle) $k$ with dataset $D_k$ and model parameters $\theta_k$, applying the following loss components:
- Output-layer cross-entropy: $\mathcal{L}_{\mathrm{CE}} = -\frac{1}{|D_k|}\sum_{(x,y)\in D_k} \log p_{\theta_k}(y \mid x)$
- Intermediate-point mutual information loss at point $p$: $\mathcal{L}_{\mathrm{MI}}^{(p)}$, which maximizes the estimated mutual information $I(Z_p; Y)$ between the features $Z_p$ at point $p$ and the ground-truth labels $Y$
- Intermediate-point negative entropy regularizer at point $p$: $\mathcal{L}_{\mathrm{NE}}^{(p)} = \sum_c q_c^{(p)} \log q_c^{(p)}$, where $q^{(p)}$ is the class distribution predicted from $Z_p$

The overall local loss optimized by each client:

$$\mathcal{L}_k = \mathcal{L}_{\mathrm{CE}} + \lambda_{\mathrm{MI}} \sum_p \mathcal{L}_{\mathrm{MI}}^{(p)} + \lambda_{\mathrm{NE}} \sum_p \mathcal{L}_{\mathrm{NE}}^{(p)}$$

where $\lambda_{\mathrm{MI}}$ and $\lambda_{\mathrm{NE}}$ balance supervision and regularization. Central aggregation minimizes the global objective $\sum_k w_k \mathcal{L}_k$, with weights $w_k = |D_k| / \sum_j |D_j|$ proportional to client dataset size (Kou et al., 7 Dec 2025).
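A minimal sketch of the unified local objective, assuming the MI term is realized as auxiliary cross-entropy at each intermediate point (a common deep-supervision surrogate; the function names and weight values here are illustrative, not from the paper):

```python
import math

def softmax(logits):
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    s = sum(exps)
    return [e / s for e in exps]

def cross_entropy(probs, label):
    return -math.log(probs[label])

def negative_entropy(probs):
    # NE = sum_c q_c log q_c; minimizing it discourages overconfident
    # (peaked) intermediate predictions.
    return sum(q * math.log(q) for q in probs if q > 0)

def local_loss(output_logits, intermediate_logits, label,
               lam_mi=0.5, lam_ne=0.1):
    """L_k = L_CE + lam_mi * sum_p L_MI^(p) + lam_ne * sum_p L_NE^(p).
    The MI term is approximated by auxiliary cross-entropy at each
    intermediate point (an assumption of this sketch); lam_mi/lam_ne
    are illustrative values."""
    loss = cross_entropy(softmax(output_logits), label)
    for logits_p in intermediate_logits:
        q = softmax(logits_p)
        loss += lam_mi * cross_entropy(q, label)  # MI surrogate
        loss += lam_ne * negative_entropy(q)      # NE regularizer
    return loss

# One sample with a single intermediate point:
print(local_loss([2.0, 0.5, 0.1], [[1.0, 0.2, 0.1]], label=0))
```

Note that the NE term is most negative for high-entropy (uncertain) predictions, so adding it to the minimized loss pushes intermediate predictions away from overconfidence.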
4. Training Workflow in Federated Setting
FedDSR implements a client-server workflow summarized in Algorithm 1 (Kou et al., 7 Dec 2025):
- Server samples participating vehicles and broadcasts the current global parameters $\theta^t$
- Each client $k$:
  - Initializes local model and adapter parameters from $\theta^t$
  - For each local epoch and minibatch: performs a forward pass, computes the output and intermediate losses, backpropagates gradients, and updates the backbone parameters $\theta_k$ and adapter parameters $\phi_k$
  - Returns updated parameters to the server
- Server aggregates updates via weighted averaging:

$$\theta^{t+1} = \sum_k w_k \, \theta_k^t, \qquad w_k = \frac{|D_k|}{\sum_j |D_j|}$$

This repeats for $T$ communication rounds to produce the final model parameters $\theta^T$.
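One communication round of this workflow can be sketched as plain weighted averaging; the client dataset sizes and the stubbed local update below are hypothetical stand-ins for the real local training:

```python
# Minimal sketch of one FedDSR-style communication round (weighted
# FedAvg-style aggregation). Client sizes and the local-update stub
# are hypothetical, not from the paper.

def aggregate(client_params, client_sizes):
    """theta^{t+1} = sum_k w_k * theta_k, with w_k proportional to |D_k|."""
    total = sum(client_sizes)
    dim = len(client_params[0])
    agg = [0.0] * dim
    for params, n in zip(client_params, client_sizes):
        w = n / total
        for i, p in enumerate(params):
            agg[i] += w * p
    return agg

def local_update(global_params, client_id):
    # Stand-in for local epochs minimizing L_k (CE + MI + NE terms);
    # here we just perturb the broadcast parameters deterministically.
    return [p + 0.01 * (client_id + 1) for p in global_params]

global_params = [0.0, 0.0]
client_sizes = [100, 300]  # |D_k| per participating vehicle
updates = [local_update(global_params, k) for k in range(len(client_sizes))]
global_params = aggregate(updates, client_sizes)
print(global_params)  # aggregated global parameters
```

With sizes 100 and 300, the second client's update receives weight 0.75, illustrating how aggregation weights track dataset size.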
5. Empirical Evaluation and Benchmarking
FedDSR is benchmarked in semantic segmentation tasks using CamVid, Cityscapes, and SynthiaSF datasets. Tested architectures include DeepLabv3+ (ResNet-50 backbone), SeaFormer, and TopFormer. Comparisons span multiple FL baselines: FedAvg, FedProx, FedDyn, FedAvgM, FedIR, MOON, SCAFFOLD, BalanceFL, FedGau (Kou et al., 7 Dec 2025).
mIoU Gains (FedDSR vs. FedAvg, DeepLabv3+):
| Dataset | FedAvg mIoU | FedDSR mIoU | Relative improvement |
|---|---|---|---|
| CamVid | 72.78% | 73.24% | +0.63% |
| Cityscapes | 47.91% | 50.49% | +5.39% |
| SynthiaSF | 26.65% | 29.03% | +8.93% |
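The improvement column is consistent with a relative gain, (FedDSR − FedAvg) / FedAvg × 100; a quick check against the table's mIoU values:

```python
# Verify that the reported improvements are relative gains
# computed from the two mIoU columns.
results = {
    "CamVid":     (72.78, 73.24),
    "Cityscapes": (47.91, 50.49),
    "SynthiaSF":  (26.65, 29.03),
}
for name, (fedavg, feddsr) in results.items():
    gain = (feddsr - fedavg) / fedavg * 100
    print(f"{name}: +{gain:.2f}%")
# → CamVid: +0.63%, Cityscapes: +5.39%, SynthiaSF: +8.93%
```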
FedDSR achieves up to an 8.93% relative mIoU improvement and up to a 28.6% reduction in training rounds, reaching the target mIoU on Cityscapes in substantially fewer rounds than the 90 required by FedAvg. When combined with other FL algorithms (e.g., FedGau, FedAvgM), FedDSR typically delivers an additional 2-5% mIoU uplift (Kou et al., 7 Dec 2025).
6. Qualitative Outcomes and Model Robustness
FedDSR demonstrably reduces segmentation boundary errors, notably improving the separation of classes such as sky and road, and corrects small-object segmentation errors. MI supervision at intermediate layers drives the extraction of hierarchy-aware, class-discriminative features, curbing local-client overfitting. NE regularization dampens overly confident hidden activations, resulting in smoother global optimization landscapes and effective mitigation of client drift (Kou et al., 7 Dec 2025). These mechanisms jointly harmonize local gradient directions and foster rapid convergence to globally generalizable solutions.
7. Ablation Studies: Selection and Positioning of Intermediate Points
FedDSR’s ablation analysis offers insights into the configuration of supervision and regularization points:
- Number of intermediate points:
  - 1 point: 50.49% mIoU
  - 3 points: 51.52% mIoU (peak)
  - 5 points: 50.49% mIoU
  - A moderate number of points yields the optimal trade-off
- Spacing between points:
  - A spacing of two blocks between points yields the highest mIoU (51.72%)
  - Features spaced moderately apart maximize the benefit
- Positioning of points:
  - Close-to-input supervision: 52.98% mIoU, surpassing central or output-proximal supervision
  - Early supervision yields superior gains, confirming the importance of guiding shallow representations

In summary, FedDSR systematically integrates intermediate-layer MI-based supervision and NE-based regularization in FL for AD. The paradigm is model-agnostic, preserves theoretical convergence rates, and yields substantial improvements in generalization and training efficiency across standard segmentation benchmarks (Kou et al., 7 Dec 2025).
In summary, FedDSR systematically integrates intermediate-layer MI-based supervision and NE-based regularization in FL for AD. The paradigm is model-agnostic, preserves theoretical convergence rates, and yields substantial improvements in generalization and training efficiency across standard segmentation benchmarks (Kou et al., 7 Dec 2025).