Federated Learning with Feedback Alignment
- FLFA is a federated learning technique that employs global weight feedback to align local client updates and mitigate drift in non-IID data settings.
- It modifies the traditional backpropagation by substituting local weight transposes with fixed global matrices, leading to improvements in convergence and model accuracy.
- Empirical results demonstrate robust gains, including up to a 20% boost in representation quality, with negligible computational cost and zero additional communication.
Federated Learning with Feedback Alignment (FLFA) refers to a class of techniques in federated learning (FL) where feedback alignment (FA) is incorporated into local model training to reduce the adverse effects of client data heterogeneity and local drift. FLFA achieves alignment of local client updates with the global objective by modifying the backpropagation procedure to use global model weights as fixed feedback matrices during backward passes. This yields robust empirical improvements in model accuracy, representation quality, and convergence with minimal additional computational and communication cost, especially under non-IID scenarios (Baek et al., 14 Dec 2025). FLFA should be contrasted with direct feedback alignment (DFA), which replaces local gradients with random fixed feedback matrices—a direction explored for resource-constrained federated learning (Colombo et al., 25 Nov 2024).
1. Federated Learning under Non-IID Data and Local Drift
In the canonical FL setup, each of $K$ clients holds a local dataset $\mathcal{D}_k$ of size $n_k$; the global dataset has size $n = \sum_{k=1}^{K} n_k$ and aggregation weights $p_k = n_k / n$. Each client minimizes its local expected loss $F_k(w)$, while the server aims to minimize the weighted average objective $F(w) = \sum_{k=1}^{K} p_k F_k(w)$. The most established algorithm is FedAvg, where in each round $t$, clients initialize $w_k^t = w^t$ (the global model), perform $\tau$ local SGD steps, and communicate their resulting weights to the server, which sets $w^{t+1} = \sum_{k=1}^{K} p_k w_k^{t,\tau}$.
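A minimal NumPy sketch of one such round (the function name and the `local_grad` stochastic-gradient oracle are illustrative assumptions, not the paper's API):

```python
import numpy as np

def fedavg_round(global_w, client_datasets, local_steps, lr, local_grad):
    """One FedAvg round: broadcast, local SGD, weighted aggregation.

    `local_grad(w, data)` is a hypothetical stochastic-gradient oracle."""
    sizes = np.array([len(d) for d in client_datasets], dtype=float)
    p = sizes / sizes.sum()                  # aggregation weights p_k = n_k / n
    client_ws = []
    for data in client_datasets:
        w = global_w.copy()                  # initialize from the global model
        for _ in range(local_steps):
            w -= lr * local_grad(w, data)    # local SGD step
        client_ws.append(w)
    # server update: w^{t+1} = sum_k p_k * w_k
    return sum(pk * wk for pk, wk in zip(p, client_ws))
```

With quadratic local losses, the aggregate lands at the size-weighted average of the clients' local optima, which is exactly the behavior FedAvg is designed to produce.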
A central challenge in federated settings is data heterogeneity: when clients' data distributions are non-identically and independently distributed (non-IID), local updates diverge, leading to "local drift." The degree of drift is quantified by
$$\Delta^t = \frac{1}{K} \sum_{k=1}^{K} \left\| \Delta_k^t - \bar{\Delta}^t \right\|^2,$$
where $\Delta_k^t = w_k^{t,\tau} - w^t$ is the update from client $k$ and $\bar{\Delta}^t = \frac{1}{K} \sum_{k=1}^{K} \Delta_k^t$ is the mean update. Large values of $\Delta^t$ impede global model convergence by causing update mismatch.
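This dispersion measure is straightforward to compute from the per-client updates; a short NumPy sketch:

```python
import numpy as np

def local_drift(client_updates):
    """Dispersion of client updates around their mean:
    (1/K) * sum_k ||delta_k - mean_delta||^2."""
    deltas = np.stack(client_updates)        # shape (K, d)
    mean = deltas.mean(axis=0)               # mean update across clients
    return float(np.mean(np.sum((deltas - mean) ** 2, axis=1)))
```

Identical updates give zero drift; the more the clients' update directions disagree, the larger the value.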
2. FLFA Algorithm: Incorporating Global Weight Feedback
The FLFA algorithm modifies the local backward pass on each client by using a fixed feedback matrix derived from current global model weights, rather than local weight transposes, in some or all layers. This process has negligible extra computation and incurs zero additional communication since the required global weights are already available at synchronization.
Let $d$ denote the number of layers. For a given layer $l$, client $k$'s local weight is $W_k^{(l)}$, the global weight is $W^{(l),t}$, and the set of layers where FA is applied is denoted $\mathcal{S}$. Forward activations $a^{(l)} = \phi(z^{(l)})$, with pre-activations $z^{(l)} = W_k^{(l)} a^{(l-1)}$, are propagated as usual. In standard backpropagation (BP), error signals are given by
$$\delta_k^{(l)} = \left( (W_k^{(l+1)})^\top \delta_k^{(l+1)} \right) \odot \phi'(z^{(l)}),$$
with corresponding weight updates $\Delta W_k^{(l)} = -\eta\, \delta_k^{(l)} \, (a^{(l-1)})^\top$.
In FLFA, for layers $l \in \mathcal{S}$, the backward computation is
$$\delta_k^{(l)} = \left( B_k^{(l+1)} \, \delta_k^{(l+1)} \right) \odot \phi'(z^{(l)}),$$
where the feedback matrix $B_k^{(l+1)}$ is initialized to $(W^{(l+1),t})^\top$ (the transpose of the current global weight at layer $l+1$ at round $t$) and adaptively scaled to maintain norm parity with $(W_k^{(l+1)})^\top$. Other layers use standard BP.
Adaptive Scaling: After each batch, the feedback matrices are rescaled to match the norm of the corresponding local weights:
$$B_k^{(l)} \leftarrow \frac{\|W_k^{(l)}\|_F}{\|B_k^{(l)}\|_F}\, B_k^{(l)}.$$
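The modified backward pass and the rescaling step can be sketched together for a two-layer ReLU MLP (shapes, loss, and function name are illustrative assumptions; only the substitution of a global-derived feedback matrix for the local transpose is taken from the description above):

```python
import numpy as np

def flfa_backward(x, y, W1, W2, B2, lr):
    """One FLFA step for a 2-layer ReLU MLP with squared error.

    B2 (shape of W2.T) stands in for W2.T in the backward pass; it is
    initialized from the transposed global weights at synchronization."""
    # forward pass
    z1 = W1 @ x
    a1 = np.maximum(z1, 0.0)                 # ReLU activation
    y_hat = W2 @ a1                          # linear output layer
    # backward pass
    d2 = y_hat - y                           # output-layer error signal
    d1 = (B2 @ d2) * (z1 > 0)                # FA: global feedback, not W2.T
    # weight updates
    W2 -= lr * np.outer(d2, a1)
    W1 -= lr * np.outer(d1, x)
    # adaptive scaling: keep ||B2|| in line with the evolving local ||W2||
    B2 *= np.linalg.norm(W2) / np.linalg.norm(B2)
    return W1, W2, B2
```

When `B2` equals `W2.T` exactly, the step reduces to standard backpropagation; in the federated setting `B2` instead carries the global model's weights, so every client's hidden-layer error signal is computed against the same reference.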
Server Aggregation: After $E$ local epochs, clients send their updated weights to the server, which computes a weighted average.
This FA insertion mitigates local drift by ensuring that all clients receive backward signals informed by the same global reference, aligning the directions of local updates even under severe heterogeneity (Baek et al., 14 Dec 2025).
3. Theoretical Foundations and Convergence
FLFA's theoretical analysis rests on several key assumptions:
- A1 (Lipschitz gradients): $\|\nabla F_k(u) - \nabla F_k(v)\| \le L \|u - v\|$ for all $u, v$.
- A2 (unbiased stochastic gradients): $\mathbb{E}[g_k(w)] = \nabla F_k(w)$, with variance $\mathbb{E}\|g_k(w) - \nabla F_k(w)\|^2 \le \sigma^2$.
- A3 (bounded heterogeneity): $\frac{1}{K} \sum_{k=1}^{K} \|\nabla F_k(w) - \nabla F(w)\|^2 \le \zeta^2$.
- A4 (bounded FA approximation error): for the FA gradient $\tilde{g}_k(w)$, $\|\tilde{g}_k(w) - g_k(w)\| \le \epsilon_{\mathrm{FA}}$.
Main theoretical results include:
- Lemma 1 (local decrease): for a sufficiently small step size (e.g., $\eta \le 1/(4L)$), each local step decreases the local objective up to noise and bias terms, via a descent inequality of the form
$$\mathbb{E}\, F_k(w_k^{s+1}) \le F_k(w_k^s) - \frac{\eta}{4} \left\| \nabla F_k(w_k^s) \right\|^2 + \eta\, \epsilon_{\mathrm{FA}}^2 + \frac{L \eta^2 \sigma^2}{2}.$$
- Lemma 2 (global decrease): Analogous result for the global objective, with a similar form but joint dependencies on the heterogeneity bound $\zeta^2$ and the FA error $\epsilon_{\mathrm{FA}}$.
- Convergence Outline: These bounds imply convergence to a neighborhood of a stationary point of $F$, with neighborhood size controlled by $\epsilon_{\mathrm{FA}}$ (FA error) and $\zeta^2$ (heterogeneity). Critically, setting the feedback to $B_k^{(l)} = (W^{(l),t})^\top$ keeps the feedback aligned with the global weights, minimizing $\epsilon_{\mathrm{FA}}$ and further suppressing drift by reducing the update dispersion $\Delta^t$ between clients.
4. Empirical Results and Practical Considerations
FLFA was empirically validated on a diverse set of architectures (MobileNetV2, ResNet-50) and datasets (BloodMNIST, OrganCMNIST, OrganSMNIST, PathMNIST, Fashion-MNIST, CIFAR-10, CIFAR-100, ImageNet-100), with up to 200 participating clients and strong non-IID partitioning using the Dirichlet-$\alpha$ scheme (with small $\alpha$ for maximal heterogeneity).
The experimental protocol comprised:
- 100 rounds (medical/FashionMNIST), 500 rounds (CIFAR-10), 5 local epochs per round.
- Training with SGD, momentum $0.9$, learning rate $0.01$ (decayed), batch size $64$.
- Randomly selecting a fraction of the clients per round.
Key evaluation metrics included test accuracy, relative training time, drift (update dispersion per round), and representation quality (intra/inter-class variance and separability ratio).
Highlights from results:
- Test accuracy: FLFA improved on all baselines—e.g., FedAvg +2.47% (BloodMNIST), +6.45% (FMNIST); advanced baselines (FedRS, FedLC) gain +1–2%.
- Overhead: Computational overhead is negligible, with zero added communication.
- Drift reduction: FLFA consistently reduced local drift versus BP, especially in early rounds.
- Representation: ~20% improvement in separability ratio on CIFAR-10 with FedAvg+FLFA.
- Robustness: Effective even under extreme data skew, low client participation (5%), and deep local training (15 epochs).
- Ablations: Using random feedback or dropping adaptive scaling degrades performance; single-layer FA often suffices; best gains achieved by choosing FA layers by lowest gradient cosine similarity.
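The layer-selection heuristic from the ablations can be sketched as follows (the helper name and the assumption that per-layer local and global gradients are available are illustrative): compute the cosine similarity between each layer's local and global gradients, and apply FA where alignment is worst.

```python
import numpy as np

def select_fa_layer(local_grads, global_grads):
    """Pick the layer whose local gradient is least aligned with the
    global gradient (lowest cosine similarity) as the FA layer."""
    def cos(a, b):
        a, b = a.ravel(), b.ravel()
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))
    sims = [cos(lg, gg) for lg, gg in zip(local_grads, global_grads)]
    return int(np.argmin(sims))              # index of the least-aligned layer
```

Intuitively, the layer where local and global gradients disagree most is where global feedback has the most misalignment to correct.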
5. Extensions and Related Approaches
FLFA's core innovation—using the current global model weights as feedback matrices—differs fundamentally from methods such as direct feedback alignment (DFA), which employs fixed random matrices. DFA is beneficial for resource-limited settings (low-precision, TinyML) but struggles to match BP's accuracy on high-dimensional tasks and convolutional architectures (Colombo et al., 25 Nov 2024).
Related empirical strategies:
- Random Feedback (DFA): Not effective on convolutional networks in federated contexts.
- Single-layer versus Multi-layer FA: Single FA layer is often sufficient; optimal layer selection depends on gradient cosine similarity statistics.
- Adaptive Feedback Scaling: Necessary for stable training; omitting scaling harms performance.
- Representation Learning Effects: FA improves latent representation separability beyond overall accuracy increases.
6. Symbol Table
| Symbol | Definition |
|---|---|
| $W_k^{(l),t,s}$ | Client $k$'s weight at round $t$, layer $l$, step $s$ |
| $W^{(l),t}$ | Global weight at layer $l$, round $t$ |
| $B_k^{(l)}$ | Feedback matrix for client $k$, layer $l$ |
| $z^{(l)}, a^{(l)}$ | Pre-activation and activation at layer $l$ |
| $\delta_k^{(l)}$ | Error signal at layer $l$ for client $k$ |
| $\odot$ | Element-wise (Hadamard) product |
| $\phi'$ | Derivative of the nonlinearity |
| $\eta$ | Learning rate |
| $\tau$ | Number of local steps |
| $E$ | Number of local epochs |
| $L$, $\sigma^2$, $\zeta^2$, $\epsilon_{\mathrm{FA}}$ | Lipschitz gradient constant, gradient noise bound, heterogeneity bound, FA error |
All symbols and workflow steps align directly with those stated in the original framework description (Baek et al., 14 Dec 2025).
7. Summary and Implications
FLFA provides an effective, efficient modification to the federated learning process, leveraging global model weights as fixed feedback matrices in the backward pass to align local updates, suppress local drift, and robustly improve convergence and downstream accuracy. Its minimal compute and communication overhead makes it well suited to practical deployment, especially in highly heterogeneous and large-scale federated environments. Empirical and theoretical analyses confirm that FA, when instantiated with global weights and adaptive scaling, offers significant benefits over both standard BP and direct/random feedback methods. Representative benchmarks demonstrate consistent gains across modalities, architectural choices, and data regimes (Baek et al., 14 Dec 2025).