Federated Learning with Feedback Alignment

Updated 21 December 2025

FLFA is a federated learning technique that employs global weight feedback to align local client updates and mitigate drift in non-IID data settings.
It modifies the traditional backpropagation by substituting local weight transposes with fixed global matrices, leading to improvements in convergence and model accuracy.
Empirical results demonstrate robust gains, including up to a 20% boost in representation quality, with negligible computational cost and zero additional communication.

Federated Learning with Feedback Alignment (FLFA) refers to a class of techniques in federated learning (FL) where feedback alignment (FA) is incorporated into local model training to reduce the adverse effects of client data heterogeneity and local drift. FLFA achieves alignment of local client updates with the global objective by modifying the backpropagation procedure to use global model weights as fixed feedback matrices during backward passes. This yields robust empirical improvements in model accuracy, representation quality, and convergence with minimal additional computational and communication cost, especially under non-IID scenarios (Baek et al., 14 Dec 2025). FLFA should be contrasted with direct feedback alignment (DFA), which replaces local gradients with random fixed feedback matrices—a direction explored for resource-constrained federated learning (Colombo et al., 25 Nov 2024).

1. Federated Learning under Non-IID Data and Local Drift

In the canonical FL setup, $N$ clients each hold local datasets $D_i$ of size $|D_i|$ ; the global dataset has size $|D| = \sum_i |D_i|$ and weights $\pi_i = |D_i| / |D|$ . Each client minimizes its local expected loss $J_i(w) = \mathbb{E}_{(x,y)\sim D_i}[\ell(w; x, y)]$ , while the server aims to minimize the weighted average objective $J(w) = \sum_i \pi_i J_i(w)$ . The most established algorithm is FedAvg, where in each round $r$ , clients initialize $w_i^{(r,0)} = W^r$ (the global model), perform $s$ local SGD steps, and communicate their resulting updates to the server, which sets $W^{(r+1)} = \sum_i \pi_i w_i^{(r,s)}$ .

A central challenge in federated settings is data heterogeneity: when clients' data distributions are non-identically and independently distributed (non-IID), local updates divaricate, leading to "local drift." The degree of drift is quantified by

$H = \frac{1}{K} \sum_{i=1}^K \| \Delta w_i - \overline{\Delta w} \|_2,$

where $\Delta w_i$ is the update from client $i$ and $\overline{\Delta w}$ is the mean update. Large $H$ values impede global model convergence by causing update mismatch.

2. FLFA Algorithm: Incorporating Global Weight Feedback

The FLFA algorithm modifies the local backward pass on each client by using a fixed feedback matrix derived from current global model weights, rather than local weight transposes, in some or all layers. This process has negligible extra computation and incurs zero additional communication since the required global weights are already available at synchronization.

Let $L$ denote the number of layers. For a given layer $l$ , client $i$ 's local weight is $w_{i,l}$ , global weight is $W_l$ , and the set of layers where FA is applied is denoted $\mathcal{F}$ . Forward activations and error signals are propagated as usual. In standard backpropagation (BP), error signals are given by

$\delta_{i,l} = (w_{i,l+1}^\top \delta_{i,l+1}) \odot f'(z_{i,l}),$

with corresponding weight updates.

In FLFA, for layers $l \in \mathcal{F}$ , the backward computation is

$\delta_{i,l} = (B_{i,l+1}^\top \delta_{i,l+1}) \odot f'(z_{i,l}),$

where the feedback matrix $B_{i,l+1}$ is initialized to $W_{l+1}^r$ (current global weight at layer $l+1$ at round $r$ ) and adaptively scaled to maintain norm parity with $w_{i,l+1}$ . Other layers use standard BP.

Adaptive Scaling: After each batch, the feedback matrices are rescaled:

$B_{i,l+1} \leftarrow \left( \frac{\| w_{i,l+1} \|}{\| W_{l+1}^r \|} \right) W_{l+1}^r.$

Server Aggregation: After $E$ local epochs, clients send their updated weights to the server, which computes a weighted average.

This FA insertion mitigates local drift by ensuring that all clients receive backward signals informed by the same global reference, aligning the directions of local updates even under severe heterogeneity (Baek et al., 14 Dec 2025).

3. Theoretical Foundations and Convergence

FLFA's theoretical analysis rests on several key assumptions:

A1: Lipschitz gradients: $\|\nabla J_i(w) - \nabla J_i(v)\| \leq M \|w-v\|$ for all $i, w, v$ .
A2: Unbiased stochastic gradients: $E[\nabla \ell(w;x,y)] = \nabla J_i(w)$ , variance $\leq \sigma^2$ .
A3: Bounded heterogeneity: $\sum_i \pi_i \|\nabla J_i(w) - \nabla J(w)\|^2 \leq \gamma^2$ .
A4: Bounded FA approximation error: for the FA gradient $\nabla^B \ell$ , $\|\nabla^B \ell - \nabla \ell\|^2 \leq G^2$ .

Main theoretical results include:

Lemma 1 (local decrease): Local objective decreases by

$\mathbb{E}[J_i(w_i^r) - J_i(w_i^{r+1})] \geq \eta S (1 - M\eta) \|\nabla J_i(w_i^r)\|^2 - \eta S G \|\nabla J_i(w_i^r)\| - (M\eta^2 S/2)(\sigma^2 + G^2).$

Lemma 2 (global decrease): Analogous result for the global objective, with a similar form but joint dependencies on $G$ and $\gamma$ .
Convergence Outline: These bounds imply convergence to a neighborhood of a stationary point ( $\|\nabla J\| \approx 0$ ), with neighborhood size controlled by $G$ (FA error) and $\gamma$ (heterogeneity). Critically, setting feedback $B = W^r$ ensures the global and local weights are well-aligned, minimizing $G$ and further suppressing drift by reducing $\|w_{i,l+1} - w_{j,l+1}\|$ between clients.

4. Empirical Results and Practical Considerations

FLFA was empirically validated on a diverse set of architectures (MobileNetV2, ResNet-50) and datasets (BloodMNIST, OrganCMNIST, OrganSMNIST, PathMNIST, Fashion-MNIST, CIFAR-10, CIFAR-100, ImageNet-100), with up to 200 participating clients and strong non-IID partitioning using the Dirichlet- $\beta$ scheme (e.g., $\beta = 0.1$ for maximal heterogeneity).

The experimental protocol comprised:

100 rounds (medical/FashionMNIST), 500 rounds (CIFAR-10), 5 local epochs per round.
Training with SGD, momentum $0.9$, learning rate $0.01$ (decayed), batch size $64$.
Randomly selecting $10\%$ of clients per round.

Key evaluation metrics included test accuracy, relative training time, drift ( $H$ per round), and representation quality (intra/inter-class variance and separability ratio).

Highlights from results:

Test accuracy: FLFA improved on all baselines—e.g., FedAvg +2.47% (BloodMNIST), +6.45% (FMNIST); advanced baselines (FedRS, FedLC) gain +1–2%.
Overhead: Computational overhead is negligible ( $1–2\%$ ), with zero added communication.
Drift reduction: FLFA consistently reduced local drift $H$ versus BP, especially in early rounds.
Representation: ~20% improvement in separability ratio on CIFAR-10 with FedAvg+FLFA.
Robustness: Effective even under extreme data skew, low client participation (5%), and deep local training (15 epochs).
Ablations: Using random feedback or dropping adaptive scaling degrades performance; single-layer FA often suffices; best gains achieved by choosing FA layers by lowest gradient cosine similarity.

FLFA's core innovation—using the current global model weights as feedback matrices—differs fundamentally from methods such as direct feedback alignment (DFA), which employs fixed random matrices. DFA is beneficial for resource-limited settings (low-precision, TinyML) but struggles to match BP's accuracy on high-dimensional tasks and convolutional architectures (Colombo et al., 25 Nov 2024).

Related empirical strategies:

Random Feedback (DFA): Not effective on convolutional networks in federated contexts.
Single-layer versus Multi-layer FA: Single FA layer is often sufficient; optimal layer selection depends on gradient cosine similarity statistics.
Adaptive Feedback Scaling: Necessary for stable training; omitting scaling harms performance.
Representation Learning Effects: FA improves latent representation separability beyond overall accuracy increases.

6. Symbol Table

Symbol	Definition
$w_{i,l}^{(r,k)}$	Client $i$ 's weight at round $r$ , layer $l$ , step $k$
$W_l^r$	Global weight at layer $l$ , round $r$
$B_{i,l}$	Feedback matrix for client $i$ , layer $l$
$z_{i,l}, h_{i,l}$	Pre-activation, activation at layer $l$
$\delta_{i,l}$	Error signal at layer $l$ for client $i$
$\odot$	Element-wise (Hadamard) product
$f'$	Derivative of nonlinearity
$\eta$	Learning rate
$S,s$	Number of local steps
$E$	Number of local epochs
$M, \sigma, \gamma, G$	Lipschitz gradient constant, gradient noise bound, heterogeneity bound, FA error

All symbols and workflow steps align directly with those stated in the original framework description (Baek et al., 14 Dec 2025).

7. Summary and Implications

FLFA provides an effective, efficient modification to the federated learning process, leveraging global model weights as fixed feedback matrices in the backward pass to align local updates, suppress local drift, and robustly improve convergence and downstream accuracy. Its minimal compute and communication overhead make it conducive for practical deployment, especially in highly heterogeneous and large-scale federated environments. Empirical and theoretical analyses confirm that FA—when instantiated with global weights and adaptive scaling—offers significant benefits over both standard BP and direct/random feedback methods. Representative benchmarks demonstrate consistent gains across modalities, architectural choices, and data regimes (Baek et al., 14 Dec 2025).

PDF Markdown Chat (Pro)

References (2)

Federated Learning with Feedback Alignment (2025)

TIFeD: a Tiny Integer-based Federated learning algorithm with Direct feedback alignment (2024)

Whiteboard

Generate a whiteboard explanation of this topic.

Follow Topic

Get notified by email when new papers are published related to Federated Learning with Feedback Alignment (FLFA).

Federated Learning with Feedback Alignment

1. Federated Learning under Non-IID Data and Local Drift

2. FLFA Algorithm: Incorporating Global Weight Feedback

3. Theoretical Foundations and Convergence

4. Empirical Results and Practical Considerations

5. Extensions and Related Approaches

6. Symbol Table

7. Summary and Implications

Sponsor

Whiteboard

Follow Topic

Continue Learning

Related Topics