FedDSR: Federated Deep Supervision & Regularization
- FedDSR is a federated learning paradigm that integrates intermediate deep supervision via mutual information (MI) losses and negative entropy (NE) regularization to tackle non-IID challenges.
- It applies architecture-agnostic intermediate layer selection to mitigate client drift and improve class-discriminative feature learning in autonomous driving scenarios.
- Empirical evaluations show up to an 8.93% relative mIoU improvement and fewer training rounds, demonstrating its efficiency and robustness over traditional FL techniques.
Federated Deep Supervision and Regularization (FedDSR) is a federated learning (FL) paradigm developed to address generalization and convergence deficiencies in autonomous driving (AD) applications stemming from heterogeneous, non-independent and identically distributed (non-IID) data across distributed vehicles. By integrating multi-access supervision via mutual information (MI) losses and regularization via negative entropy (NE) at architecture-agnostic intermediate network layers, FedDSR mitigates client drift, strengthens class-discriminative feature learning, and accelerates overall optimization without sacrificing the convergence guarantees of nonconvex stochastic gradient descent (Kou et al., 7 Dec 2025).
1. Motivation and Core Concepts
Traditional federated aggregation, such as FedAvg, is susceptible to slow convergence and client-specific overfitting under non-IID distributions, common in AD scenarios (e.g., disparate urban and rural driving). FedDSR introduces deep supervision at selected intermediate layer transitions and applies regularization penalties during local vehicle updates. Multi-access MI loss encourages intermediate features to encode discriminative representations aligned with ground truth, while NE regularization counteracts overconfident activations and improves optimization robustness in federated settings (Kou et al., 7 Dec 2025). This combination yields a unified objective, enhancing both global generalization and federated model convergence.
2. Architecture-Agnostic Intermediate Layer Selection
FedDSR defines intermediate points in the network using architecture-independent criteria:
- Before/after downsampling operations (pooling, strided convolution)
- Between macro-blocks (e.g., ResNet stages, transformer layers)
- At bottleneck layers (feature compression points)
- Before/after attention modules
- At feature-fusion junctions in multi-branch architectures
This method permits application to diverse backbone types, including CNNs, transformers, and hybrid networks, enabling supervision and regularization at locations representing significant shifts in feature abstraction (Kou et al., 7 Dec 2025).
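As an illustrative sketch (not the paper's implementation), the selection criteria above can be expressed as a single pass over an ordered layer description; the layer names, kinds, and stride fields below are hypothetical:

```python
# Illustrative sketch: architecture-agnostic selection of intermediate points
# from a backbone described as an ordered list of layer records.
# All layer names/kinds here are hypothetical, not from the paper.

def select_intermediate_points(layers):
    """Return indices of layers after which supervision/regularization
    adapters could be attached, following the criteria above."""
    points = []
    for i, layer in enumerate(layers):
        if layer.get("stride", 1) > 1:          # after a downsampling op
            points.append(i)
        elif layer.get("block_end", False):     # between macro-blocks
            points.append(i)
        elif layer.get("kind") == "attention":  # after an attention module
            points.append(i)
    return sorted(set(points))

# A toy ResNet-like layer list (hypothetical):
backbone = [
    {"name": "stem_conv", "kind": "conv", "stride": 2},        # 0: downsampling
    {"name": "stage1_conv", "kind": "conv"},                   # 1
    {"name": "pool1", "kind": "pool", "stride": 2},            # 2: downsampling
    {"name": "stage2_attn", "kind": "attention"},              # 3: attention
    {"name": "stage2_conv", "kind": "conv"},                   # 4
    {"name": "stage2_out", "kind": "conv", "block_end": True}, # 5: macro-block end
    {"name": "head", "kind": "conv"},                          # 6
]

points = select_intermediate_points(backbone)
print(points)  # → [0, 2, 3, 5]
```

Because the criteria are phrased over generic layer properties rather than a specific architecture, the same pass applies to CNN, transformer, or hybrid backbones.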
3. Formalization: MI Loss, NE Regularizer, and Unified Objective
FedDSR operates locally on each client (vehicle) $k$ with dataset $D_k$ and model parameters $\theta_k$, applying the following loss components:
- Output-layer cross-entropy: $\mathcal{L}_{\mathrm{CE}} = -\frac{1}{|D_k|}\sum_{(x,y)\in D_k} \log p_{\theta_k}(y \mid x)$
- Intermediate-point mutual information loss at point $p$: $\mathcal{L}_{\mathrm{MI}}^{(p)}$, which maximizes the estimated mutual information $I(Z_p; Y)$ between the features $Z_p$ at point $p$ and the ground-truth labels $Y$
- Intermediate-point negative entropy regularizer at point $p$: $\mathcal{L}_{\mathrm{NE}}^{(p)} = \sum_c q_c^{(p)} \log q_c^{(p)}$, where $q^{(p)}$ is the class distribution predicted from $Z_p$

The overall local loss optimized by each client:

$$\mathcal{L}_k = \mathcal{L}_{\mathrm{CE}} + \lambda_{\mathrm{MI}} \sum_p \mathcal{L}_{\mathrm{MI}}^{(p)} + \lambda_{\mathrm{NE}} \sum_p \mathcal{L}_{\mathrm{NE}}^{(p)}$$

where $\lambda_{\mathrm{MI}}$ and $\lambda_{\mathrm{NE}}$ balance supervision and regularization. Central aggregation minimizes the global objective $\sum_k w_k \mathcal{L}_k$, with weights $w_k = |D_k| / \sum_j |D_j|$ proportional to client dataset size (Kou et al., 7 Dec 2025).
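A minimal sketch of the unified local objective, assuming the MI term is realized as auxiliary cross-entropy at each intermediate point (a common deep-supervision surrogate; the function names and weight values here are illustrative, not from the paper):

```python
import math

def softmax(logits):
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    s = sum(exps)
    return [e / s for e in exps]

def cross_entropy(probs, label):
    return -math.log(probs[label])

def negative_entropy(probs):
    # NE = sum_c q_c log q_c; minimizing it discourages overconfident
    # (peaked) intermediate predictions.
    return sum(q * math.log(q) for q in probs if q > 0)

def local_loss(output_logits, intermediate_logits, label,
               lam_mi=0.5, lam_ne=0.1):
    """L_k = L_CE + lam_mi * sum_p L_MI^(p) + lam_ne * sum_p L_NE^(p).
    The MI term is approximated by auxiliary cross-entropy at each
    intermediate point (an assumption of this sketch); lam_mi/lam_ne
    are illustrative values."""
    loss = cross_entropy(softmax(output_logits), label)
    for logits_p in intermediate_logits:
        q = softmax(logits_p)
        loss += lam_mi * cross_entropy(q, label)  # MI surrogate
        loss += lam_ne * negative_entropy(q)      # NE regularizer
    return loss

# One sample with a single intermediate point:
print(local_loss([2.0, 0.5, 0.1], [[1.0, 0.2, 0.1]], label=0))
```

Note that the NE term is most negative for high-entropy (uncertain) predictions, so adding it to the minimized loss pushes intermediate predictions away from overconfidence.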
4. Training Workflow in Federated Setting
FedDSR implements a client-server workflow summarized in Algorithm 1 (Kou et al., 7 Dec 2025):
- Server samples participating vehicles and broadcasts the current global parameters $\theta^t$
- Each client $k$:
  - Initializes local model and adapter parameters from $\theta^t$
  - For each local epoch and minibatch: performs a forward pass, computes the output and intermediate losses, backpropagates gradients, and updates the backbone parameters $\theta_k$ and adapter parameters $\phi_k$
  - Returns updated parameters to the server
- Server aggregates updates via weighted averaging:

$$\theta^{t+1} = \sum_k w_k \, \theta_k^t, \qquad w_k = \frac{|D_k|}{\sum_j |D_j|}$$

This repeats for $T$ communication rounds to produce the final model parameters $\theta^T$.
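One communication round of this workflow can be sketched as plain weighted averaging; the client dataset sizes and the stubbed local update below are hypothetical stand-ins for the real local training:

```python
# Minimal sketch of one FedDSR-style communication round (weighted
# FedAvg-style aggregation). Client sizes and the local-update stub
# are hypothetical, not from the paper.

def aggregate(client_params, client_sizes):
    """theta^{t+1} = sum_k w_k * theta_k, with w_k proportional to |D_k|."""
    total = sum(client_sizes)
    dim = len(client_params[0])
    agg = [0.0] * dim
    for params, n in zip(client_params, client_sizes):
        w = n / total
        for i, p in enumerate(params):
            agg[i] += w * p
    return agg

def local_update(global_params, client_id):
    # Stand-in for local epochs minimizing L_k (CE + MI + NE terms);
    # here we just perturb the broadcast parameters deterministically.
    return [p + 0.01 * (client_id + 1) for p in global_params]

global_params = [0.0, 0.0]
client_sizes = [100, 300]  # |D_k| per participating vehicle
updates = [local_update(global_params, k) for k in range(len(client_sizes))]
global_params = aggregate(updates, client_sizes)
print(global_params)  # aggregated global parameters
```

With sizes 100 and 300, the second client's update receives weight 0.75, illustrating how aggregation weights track dataset size.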
5. Empirical Evaluation and Benchmarking
FedDSR is benchmarked in semantic segmentation tasks using CamVid, Cityscapes, and SynthiaSF datasets. Tested architectures include DeepLabv3+ (ResNet-50 backbone), SeaFormer, and TopFormer. Comparisons span multiple FL baselines: FedAvg, FedProx, FedDyn, FedAvgM, FedIR, MOON, SCAFFOLD, BalanceFL, FedGau (Kou et al., 7 Dec 2025).
mIoU Gains (FedDSR vs. FedAvg, DeepLabv3+):
| Dataset | FedAvg mIoU | FedDSR mIoU | Relative improvement |
|---|---|---|---|
| CamVid | 72.78% | 73.24% | +0.63% |
| Cityscapes | 47.91% | 50.49% | +5.39% |
| SynthiaSF | 26.65% | 29.03% | +8.93% |
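The improvement column is consistent with a relative gain, (FedDSR − FedAvg) / FedAvg × 100; a quick check against the table's mIoU values:

```python
# Verify that the reported improvements are relative gains
# computed from the two mIoU columns.
results = {
    "CamVid":     (72.78, 73.24),
    "Cityscapes": (47.91, 50.49),
    "SynthiaSF":  (26.65, 29.03),
}
for name, (fedavg, feddsr) in results.items():
    gain = (feddsr - fedavg) / fedavg * 100
    print(f"{name}: +{gain:.2f}%")
# → CamVid: +0.63%, Cityscapes: +5.39%, SynthiaSF: +8.93%
```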
FedDSR achieves up to an 8.93% relative mIoU improvement and up to a 28.6% reduction in training rounds, reaching the target mIoU on Cityscapes in substantially fewer rounds than the 90 required by FedAvg. When combined with other FL algorithms (e.g., FedGau, FedAvgM), FedDSR typically delivers an additional 2-5% mIoU uplift (Kou et al., 7 Dec 2025).
6. Qualitative Outcomes and Model Robustness
FedDSR demonstrably reduces segmentation boundary errors, notably improving the separation of classes such as sky and road, and corrects small-object segmentation errors. MI supervision at intermediate layers drives the extraction of hierarchy-aware, class-discriminative features, curbing local-client overfitting. NE regularization dampens overly confident hidden activations, resulting in smoother global optimization landscapes and effective mitigation of client drift (Kou et al., 7 Dec 2025). These mechanisms jointly harmonize local gradient directions and foster rapid convergence to globally generalizable solutions.
7. Ablation Studies: Selection and Positioning of Intermediate Points
FedDSR’s ablation analysis offers insights into the configuration of supervision and regularization points:
- Number of intermediate points:
  - 1 point: 50.49% mIoU
  - 3 points: 51.52% mIoU (peak)
  - 5 points: 50.49% mIoU
  - A moderate number of points yields the optimal trade-off
- Spacing between points:
  - A spacing of two blocks between points yields the highest mIoU (51.72%)
  - Features spaced moderately apart maximize the benefit
- Positioning of points:
  - Close-to-input supervision: 52.98% mIoU, surpassing central or output-proximal supervision
  - Early supervision yields superior gains, confirming the importance of guiding shallow representations

In summary, FedDSR systematically integrates intermediate-layer MI-based supervision and NE-based regularization in FL for AD. The paradigm is model-agnostic, preserves theoretical convergence rates, and yields substantial improvements in generalization and training efficiency across standard segmentation benchmarks (Kou et al., 7 Dec 2025).
In summary, FedDSR systematically integrates intermediate-layer MI-based supervision and NE-based regularization in FL for AD. The paradigm is model-agnostic, preserves theoretical convergence rates, and yields substantial improvements in generalization and training efficiency across standard segmentation benchmarks (Kou et al., 7 Dec 2025).