FedBug: Debug & Gradual Unfreezing in FL

Updated 5 May 2026
  • FedBug is a dual-approach term describing both a systematic debugging framework for label-free fault localization and a gradual unfreezing protocol to mitigate client drift in federated learning.
  • The debugging framework employs record-and-replay infrastructure combined with differential testing via neuron activations to pinpoint faulty client behavior without accessing raw data.
  • The gradual unfreezing protocol sequentially unfreezes model layers during local training, ensuring cross-client alignment and improved convergence under heterogeneous conditions.

FedBug is a term associated with two distinct lines of research in federated learning (FL): (1) a systematic debugging framework, formally called FedDebug, designed for interactive, label-free fault localization in distributed FL applications (Gill et al., 2023); and (2) a bottom-up gradual unfreezing training methodology for mitigating client drift in heterogeneous FL, proposed as Federated Learning with Bottom-Up Gradual Unfreezing (Kao et al., 2023). Both approaches target fundamental operational and optimization challenges in FL but constitute unrelated frameworks and methodologies.

1. FedBug as Systematic Debugging and Fault Localization in Federated Learning

FedBug (as a shorthand for FedDebug) addresses critical limitations in the debugging of FL deployments where data privacy prohibits direct access to client data, making conventional debugging methods infeasible. This framework introduces record-and-replay infrastructure combined with differential testing based on neuron activations, enabling fine-grained, interactive diagnosis and pinpointing of faulty clients or rounds without accessing raw data or labels (Gill et al., 2023).

1.1 Architectural Components

  • Record-and-Replay Infrastructure: Captures the observable state (model weights, reported metrics, hyperparameters) at the end of each FL round. This state is logged per client without recording raw data. Debugging is enabled by spawning a parallel session at developer-specified breakpoints, supporting operations such as step-in (client aggregation granularity), step-out (global model view), step-next, and step-back.
  • Differential Testing via Neuron Activations: Fault localization is performed by generating synthetic inputs and recording the set of active neurons (activations at or above a threshold τ) for each client. By comparing those sets via a leave-one-out strategy, FedBug identifies the client(s) with the most deviant behavior, even in the absence of test labels or data.
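The record-and-replay component can be sketched as a minimal round log plus a replay cursor. This is an illustrative layout with hypothetical names (RoundRecord, ReplaySession); FedDebug's actual implementation differs in detail.

```python
# Sketch of record-and-replay telemetry for FL rounds; only observable
# state (weights, metrics, hyperparameters) is logged, never raw data.
from dataclasses import dataclass

@dataclass
class RoundRecord:
    round_id: int
    client_weights: dict   # client_id -> model parameters at round end
    metrics: dict          # client_id -> reported scalar metrics
    hyperparams: dict      # learning rate, batch size, etc.

class ReplaySession:
    """Replays logged rounds, stopping at a developer-specified breakpoint."""
    def __init__(self, log):
        self.log = sorted(log, key=lambda r: r.round_id)
        self.cursor = 0

    def run_to_breakpoint(self, round_id):
        # step-next until the breakpoint round is reached
        while (self.cursor < len(self.log)
               and self.log[self.cursor].round_id < round_id):
            self.cursor += 1
        return self.log[self.cursor]

    def step_in(self):
        # client-aggregation granularity: per-client states of current round
        return self.log[self.cursor].client_weights

    def step_back(self):
        # revisit the previous round's global view
        self.cursor = max(0, self.cursor - 1)
        return self.log[self.cursor]
```

A debugging session would replay to a suspicious round, step in to inspect per-client models, and step back to compare against earlier rounds.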

1.2 Core Algorithms

  • Inference-Guided Test Input Selection: Random synthetic inputs are chosen to maximize label agreement among subsets of clients, facilitating clear differential comparisons.
  • Faulty Client Localization: For each test input, the system computes the intersection of active-neuron sets for all (n−1)-client subsets, identifying the benign set as the group with maximal overlap and labeling the excluded client(s) as faulty.
  • Neuron Activation Fingerprinting:

A_c(x) = \{(\ell, k) \mid h_k^{(\ell)}(x; c) \ge \tau\}, \qquad \tau \approx 0.003

The fault decision is made by maximizing S(c^*) = \left|\bigcap_{c \in C \setminus \{c^*\}} A_c(x)\right| over all clients c^*.
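The fingerprinting and leave-one-out scoring above can be sketched in a few lines. This is a simplified illustration over flat activation vectors; function names are assumptions, not FedDebug's API.

```python
# Sketch of neuron-activation fingerprinting and leave-one-out
# faulty-client localization (label-free: only activations are compared).

def active_neurons(activations, tau=0.003):
    """Fingerprint A_c(x): indices of neurons with activation >= tau."""
    return {i for i, a in enumerate(activations) if a >= tau}

def localize_faulty(client_activations, tau=0.003):
    """Return the client whose exclusion maximizes the size of the
    intersection of the remaining clients' fingerprints."""
    fingerprints = {c: active_neurons(a, tau)
                    for c, a in client_activations.items()}
    best_client, best_score = None, -1
    for c_star in fingerprints:
        rest = [fp for c, fp in fingerprints.items() if c != c_star]
        score = len(set.intersection(*rest))  # |∩_{c != c*} A_c(x)|
        if score > best_score:
            best_client, best_score = c_star, score
    return best_client
```

With two agreeing clients and one deviant client, the deviant one is excluded because removing it yields the largest common fingerprint.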

1.3 Implementation Details

All telemetry (model checksums, scalar metrics) is recorded on the aggregator at round boundaries. A simulator can replay FL rounds, exposing the fusion process for interactive forensic analysis. Model checkpoints are managed such that only the latest per-client model is memory-resident; others are streamed as needed.

1.4 Evaluation and Performance

  • Datasets: CIFAR-10, FEMNIST; Models: ResNet, DenseNet, VGG; FL: IID and non-IID partitions.
  • Accuracy: 100% fault localization for single faulty clients over all test configurations; ~90.3% average accuracy for multi-fault scenarios with up to 7 faulty clients; DenseNet achieves ~99.7% in these cases.
  • Overhead: Net training time overhead is ≤1.2% per round; localization time is ≤2.1% of round duration for up to 30 clients.
  • Scalability: Diagnostic accuracy degrades gracefully (to ~75%) at 400 clients, while debugging time remains linear in client count.

1.5 Practical Workflow

Example usage involves setting a breakpoint on a suspicious round, stepping into the partial aggregation, optionally inspecting individual client models, invoking automated fault localization, interpreting neuron activations, and—if needed—removing the faulty client and replaying subsequent rounds for real-time remediation.

2. FedBug as Bottom-Up Gradual Unfreezing for Federated Optimization

FedBug also denotes a training protocol aiming to address client drift arising from data heterogeneity in FL. In this approach, the client model parameters distributed from the server serve as anchors for cross-client alignment, and training proceeds through a bottom-up gradual unfreezing (GU) schedule, beginning with all modules frozen and sequentially unfreezing them starting from the input towards the output layers (Kao et al., 2023).

2.1 Algorithm and Protocol Description

  • Model Decomposition: The global model \theta = [\theta^1, \ldots, \theta^M] is partitioned into M modules (e.g., layers or blocks).
  • Gradual Unfreezing Schedule: For the initial fraction P of the K local iterations, only the first m modules are trainable, with m growing as local training progresses; the upper modules remain frozen and identical to the server-provided parameters.
  • Cross-Client Alignment: Frozen modules anchor a common feature-to-decision hyperplane, ensuring consistent representations across clients throughout the GU stage, thereby suppressing drift.
  • Aggregation: At the conclusion of local training, models are averaged as in FedAvg:

\theta \leftarrow \sum_{c=1}^{N} \frac{n_c}{n}\,\theta_c, \qquad n = \sum_{c=1}^{N} n_c

  • No Additional Regularization: The cross-client consistency effect is achieved implicitly via freezing; no extra regularization terms are used.
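The aggregation step is standard FedAvg: a sample-count-weighted average of the returned client models. A minimal numpy sketch (names are illustrative):

```python
import numpy as np

def fedavg_aggregate(client_models, client_sizes):
    """Weighted average of client parameter vectors with weights n_c / n,
    as in FedAvg; no FedBug-specific logic is needed on the server."""
    total = sum(client_sizes)
    return sum((n / total) * np.asarray(theta)
               for theta, n in zip(client_models, client_sizes))
```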

Pseudocode Summary (condensed)

    Server: broadcast global model θ = [θ^1, …, θ^M] to participating clients
    Client c (K local iterations, GU fraction P):
        for k = 0 … K−1:
            m ← number of bottom modules unfrozen at iteration k (m = M once k ≥ PK)
            update θ^1, …, θ^m by local SGD; keep θ^{m+1}, …, θ^M frozen at server values
        return θ_c
    Server: θ ← FedAvg weighted average of the returned θ_c
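The schedule can be made concrete with a small helper that maps a local iteration to the number of unfrozen bottom modules. The evenly spaced per-module timing is an assumption for illustration; the paper's exact schedule may differ.

```python
def trainable_modules(k, K, M, P):
    """Number of bottom modules unfrozen at local iteration k (0-indexed).

    During the first fraction P of the K local iterations, modules are
    unfrozen one by one from input toward output; afterwards all M train.
    """
    gu_iters = int(P * K)        # length of the gradual-unfreezing stage
    if k >= gu_iters:
        return M                 # GU stage over: full model is trainable
    per_module = gu_iters / M    # iterations before each new unfreeze
    return min(M, 1 + int(k / per_module))
```

A client training loop would query this helper each iteration and zero out (or detach) gradients for modules above the returned index.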

2.2 Theoretical Analysis

  • Objective: Identical to FedAvg: \min_\theta F(\theta) = \sum_{c=1}^{N} \frac{n_c}{n} F_c(\theta), where F_c is client c's local empirical risk.
  • Convergence Analysis (Overparameterized Setting): In a two-client, two-layer scenario, FedBug attains a strictly better contraction ratio for the client discrepancy (the distance between the two clients' iterates) than FedAvg, with the additional contraction attributable to the sequential freezing schedule. This result implies accelerated linear convergence when using FedBug in over-parameterized linear setups.
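In symbols, the claimed contraction behavior can be restated as follows. This is an illustrative summary of the statement, not the paper's exact bound:

```latex
% Client discrepancy after round t (two-client setting)
\Delta_t = \bigl\| \theta_t^{(1)} - \theta_t^{(2)} \bigr\|
% Both methods contract the discrepancy linearly, FedBug strictly faster:
\Delta_{t+1} \le \rho \, \Delta_t, \qquad
\rho_{\mathrm{FedBug}} < \rho_{\mathrm{FedAvg}} < 1
```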

2.3 Empirical Validation

  • Datasets: CIFAR-10, CIFAR-100, Tiny-ImageNet; IID and Dirichlet non-IID partitioning.
  • Architectures: Standard CNNs, ResNet-18/34 with GroupNorm.
  • Main Results: On CIFAR-10 (IID, 1% participation), FedAvg test accuracy is 78.61%, while FedBug (10–40% GU) improves to 80.2–80.6%. On CIFAR-100, gains exceed 6% at best GU settings.
  • Ablations: Even 10% GU yields improvements; 20–40% is optimal. Bottom-up GU outperforms both top-down GU and schedules that freeze only the top layers.

2.4 Compatibility and Usability

  • Compatibility: FedBug composes with advanced FL methods (FedProx, FedDyn, FedExp, FedDecorr), producing consistent 0.3–3% accuracy improvements without altering server aggregation logic.
  • Practical Recommendations: Use a GU fraction of roughly 20–40%; select modules to balance parameter counts across layers; keep modules beyond the currently active ones frozen per the GU schedule. Implementation entails only minor modifications to client-side training loops.
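Composition with a proximal method can be sketched as a masked local step. This is a minimal illustration: mu is FedProx's proximal coefficient, the mask follows the GU schedule, and all names are assumptions rather than either paper's code.

```python
import numpy as np

def local_step(theta, grad, anchor, mask, lr=0.1, mu=0.01):
    """One client-side update combining GU freezing with a FedProx-style
    proximal pull toward the server-provided anchor.

    mask: 1 for unfrozen coordinates, 0 for frozen ones; frozen
    coordinates receive no update and so stay at the anchor values."""
    theta, grad, anchor, mask = map(np.asarray, (theta, grad, anchor, mask))
    update = grad + mu * (theta - anchor)   # FedProx proximal gradient
    return theta - lr * mask * update       # GU mask zeroes frozen entries
```

Because the GU mechanism only masks client-side updates, the server aggregation logic is untouched, which is what makes this composition cheap.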

3. Comparative Perspective and Key Distinctions

| FedBug (Debugging) (Gill et al., 2023) | FedBug (Gradual Unfreezing) (Kao et al., 2023) |
|---|---|
| Systematic, label-free fault localization in FL | Training protocol for mitigating client drift |
| Interactive record/replay; neuron-activation differencing | Client-side bottom-up gradual layer unfreezing |
| Requires no raw data or labels | Requires no extra penalties or communication |
| Targets operational debugging and forensics | Targets convergence and optimization improvement |

The two frameworks share a focus on practical FL limitations—fault diagnosis and statistical heterogeneity, respectively—but operate independently and are applicable at different system layers.

4. Implementation Considerations

For the debugging application, integrating FedBug involves telemetry logging at the aggregator and implementing record/replay simulation and neuron activation differencing for forensic analysis. For the gradual unfreezing protocol, implementation involves augmenting local training processes with GU scheduling, no changes to aggregation, and careful module decomposition for consistent unfreezing intervals.

5. Practical Applications and Guidelines

  • FedBug (Debugging): Deployed in privacy-critical FL scenarios (e.g., medical imaging) where rapid, interactive diagnosis and exclusion of faulty clients is essential; supports label-free investigation and mitigation of round-specific anomalies with minimal runtime overhead.
  • FedBug (Gradual Unfreezing): Recommended for FL deployments suffering from severe client drift—particularly in non-IID or low-participation contexts and for deep architectures where drift predominantly impacts higher layers. Suitable for systems where ancillary communication or regularization terms are undesirable.

6. Significance and Further Implications

FedBug, in both incarnations, exemplifies the ongoing advancement of operational and algorithmic debugging, transparency, and optimization in federated learning. The debugging framework demonstrates that reliable client-level diagnosis can be achieved efficiently without compromising data privacy, while the gradual unfreezing schedule offers a generalizable, overhead-free mechanism for harmonizing distributed optimization across heterogeneous clients. Each approach is compatible with the broader ecosystem of FL algorithms, and their modular design supports practical adoption without fundamental system redesign.

References:

  • Gill et al. (2023). FedDebug: Systematic Debugging for Federated Learning Applications.
  • Kao et al. (2023). FedBug: A Bottom-Up Gradual Unfreezing Framework for Federated Learning.
