
Federated Learning with Feedback Alignment

Updated 21 December 2025
  • FLFA is a federated learning technique that employs global weight feedback to align local client updates and mitigate drift in non-IID data settings.
  • It modifies the traditional backpropagation by substituting local weight transposes with fixed global matrices, leading to improvements in convergence and model accuracy.
  • Empirical results demonstrate robust gains, including up to a 20% boost in representation quality, with negligible computational cost and zero additional communication.

Federated Learning with Feedback Alignment (FLFA) refers to a class of techniques in federated learning (FL) where feedback alignment (FA) is incorporated into local model training to reduce the adverse effects of client data heterogeneity and local drift. FLFA aligns local client updates with the global objective by modifying the backpropagation procedure to use global model weights as fixed feedback matrices during backward passes. This yields robust empirical improvements in model accuracy, representation quality, and convergence with minimal additional computational and communication cost, especially under non-IID scenarios (Baek et al., 14 Dec 2025). FLFA should be contrasted with direct feedback alignment (DFA), which propagates error signals through fixed random feedback matrices instead of the network's own weight transposes, a direction explored for resource-constrained federated learning (Colombo et al., 25 Nov 2024).

1. Federated Learning under Non-IID Data and Local Drift

In the canonical FL setup, $N$ clients each hold local datasets $D_i$ of size $|D_i|$; the global dataset has size $|D| = \sum_i |D_i|$ and weights $\pi_i = |D_i| / |D|$. Each client minimizes its local expected loss $J_i(w) = \mathbb{E}_{(x,y)\sim D_i}[\ell(w; x, y)]$, while the server aims to minimize the weighted average objective $J(w) = \sum_i \pi_i J_i(w)$. The most established algorithm is FedAvg, where in each round $r$, clients initialize $w_i^{(r,0)} = W^r$ (the global model), perform $s$ local SGD steps, and communicate their resulting updates to the server, which sets $W^{(r+1)} = \sum_i \pi_i w_i^{(r,s)}$.
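
The server-side averaging step can be written as a minimal NumPy sketch; function and variable names here are illustrative, not the paper's reference implementation.

```python
import numpy as np

def fedavg_aggregate(client_weights, client_sizes):
    """Server-side FedAvg step: W^{r+1} = sum_i pi_i * w_i^{(r,s)},
    with pi_i = |D_i| / |D|. `client_weights` is a list of flattened
    parameter vectors returned after the s local SGD steps."""
    sizes = np.asarray(client_sizes, dtype=float)
    pi = sizes / sizes.sum()                      # aggregation weights pi_i
    return sum(p * w for p, w in zip(pi, client_weights))
```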

A central challenge in federated settings is data heterogeneity: when clients' data distributions are not independent and identically distributed (non-IID), local updates diverge, leading to "local drift." The degree of drift is quantified by

$$H = \frac{1}{K} \sum_{i=1}^K \| \Delta w_i - \overline{\Delta w} \|_2,$$

where $\Delta w_i$ is the update from client $i$ and $\overline{\Delta w}$ is the mean update. Large $H$ values impede global model convergence by causing update mismatch.
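
As a concrete reading of this definition, the drift metric can be computed from flattened client updates as in the sketch below (array shapes are assumptions).

```python
import numpy as np

def local_drift(client_updates):
    """H = (1/K) * sum_i || Delta w_i - mean(Delta w) ||_2 over K clients.
    `client_updates` is a list of flattened update vectors Delta w_i."""
    deltas = np.stack(client_updates)         # shape (K, num_params)
    mean_delta = deltas.mean(axis=0)          # \overline{\Delta w}
    return float(np.linalg.norm(deltas - mean_delta, axis=1).mean())
```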

2. FLFA Algorithm: Incorporating Global Weight Feedback

The FLFA algorithm modifies the local backward pass on each client by using a fixed feedback matrix derived from current global model weights, rather than local weight transposes, in some or all layers. This process has negligible extra computation and incurs zero additional communication since the required global weights are already available at synchronization.

Let $L$ denote the number of layers. For a given layer $l$, client $i$'s local weight is $w_{i,l}$, the global weight is $W_l$, and the set of layers where FA is applied is denoted $\mathcal{F}$. The forward pass is unchanged. In standard backpropagation (BP), error signals are given by

$$\delta_{i,l} = (w_{i,l+1}^\top \delta_{i,l+1}) \odot f'(z_{i,l}),$$

with corresponding weight updates.

In FLFA, for layers $l \in \mathcal{F}$, the backward computation is

$$\delta_{i,l} = (B_{i,l+1}^\top \delta_{i,l+1}) \odot f'(z_{i,l}),$$

where the feedback matrix $B_{i,l+1}$ is initialized to $W_{l+1}^r$ (the current global weight at layer $l+1$ in round $r$) and adaptively scaled to maintain norm parity with $w_{i,l+1}$. Other layers use standard BP.
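
A minimal sketch of this substitution for a fully connected network is given below; layer indexing, shapes, and helper names (`flfa_deltas`, `act_grad`) are illustrative assumptions rather than the paper's implementation. The weight-update rule itself is unchanged.

```python
import numpy as np

def flfa_deltas(weights, feedback, fa_layers, zs, delta_out, act_grad):
    """Error-signal propagation for an MLP where layers l in `fa_layers`
    substitute the fixed feedback matrix B_{l+1} (initialized from the
    global weights W_{l+1}^r) for the local transpose w_{l+1}^T.

    weights[l], feedback[l] : (out_dim_l, in_dim_l) matrices
    zs[l]                   : pre-activation at layer l
    delta_out               : error signal at the top layer L-1
    act_grad                : elementwise derivative f' of the nonlinearity
    """
    L = len(weights)
    deltas = [None] * L
    deltas[L - 1] = delta_out
    for l in range(L - 2, -1, -1):
        back = feedback[l + 1] if l in fa_layers else weights[l + 1]  # FA vs. BP
        deltas[l] = (back.T @ deltas[l + 1]) * act_grad(zs[l])
        # weight updates follow the usual rule, e.g. grad_{l+1} = delta_{l+1} h_l^T
    return deltas
```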

Adaptive Scaling: After each batch, the feedback matrices are rescaled:

$$B_{i,l+1} \leftarrow \frac{\| w_{i,l+1} \|}{\| W_{l+1}^r \|} \, W_{l+1}^r.$$
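
A one-line realization of this rescaling, assuming Frobenius norms and NumPy arrays, might look as follows.

```python
import numpy as np

def rescale_feedback(local_w, global_w):
    """Adaptive scaling: keep the feedback matrix pointing along the global
    weights W_{l+1}^r while matching the norm of the current local w_{i,l+1}."""
    return (np.linalg.norm(local_w) / np.linalg.norm(global_w)) * global_w
```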

Server Aggregation: After $E$ local epochs, clients send their updated weights to the server, which computes a weighted average.

This FA insertion mitigates local drift by ensuring that all clients receive backward signals informed by the same global reference, aligning the directions of local updates even under severe heterogeneity (Baek et al., 14 Dec 2025).

3. Theoretical Foundations and Convergence

FLFA's theoretical analysis rests on several key assumptions:

  • A1: Lipschitz gradients: $\|\nabla J_i(w) - \nabla J_i(v)\| \leq M \|w-v\|$ for all $i, w, v$.
  • A2: Unbiased stochastic gradients: $\mathbb{E}[\nabla \ell(w;x,y)] = \nabla J_i(w)$, with variance bounded by $\sigma^2$.
  • A3: Bounded heterogeneity: $\sum_i \pi_i \|\nabla J_i(w) - \nabla J(w)\|^2 \leq \gamma^2$.
  • A4: Bounded FA approximation error: for the FA gradient $\nabla^B \ell$, $\|\nabla^B \ell - \nabla \ell\|^2 \leq G^2$.

Main theoretical results include:

  • Lemma 1 (local decrease): the expected per-round decrease of the local objective is lower-bounded as

$$\mathbb{E}[J_i(w_i^r) - J_i(w_i^{r+1})] \geq \eta S (1 - M\eta) \|\nabla J_i(w_i^r)\|^2 - \eta S G \|\nabla J_i(w_i^r)\| - \frac{M\eta^2 S}{2}(\sigma^2 + G^2).$$

  • Lemma 2 (global decrease): an analogous result holds for the global objective, with a similar form but joint dependence on $G$ and $\gamma$.
  • Convergence Outline: these bounds imply convergence to a neighborhood of a stationary point ($\|\nabla J\| \approx 0$), with neighborhood size controlled by $G$ (FA error) and $\gamma$ (heterogeneity); a schematic of the telescoping argument is sketched below. Critically, setting the feedback $B = W^r$ keeps the global and local weights well aligned, minimizing $G$ and further suppressing drift by reducing $\|w_{i,l+1} - w_{j,l+1}\|$ between clients.
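
For intuition only, the following LaTeX fragment sketches how per-round decrease bounds of this form are typically telescoped over $R$ rounds into a stationarity guarantee. The constants $c_1, c_2$ are placeholders; this is a schematic, not the paper's exact statement.

```latex
% Schematic only: telescoping a Lemma-2-style per-round decrease over R rounds.
% c_1, c_2 are placeholder constants, not values from the paper.
\[
  \frac{1}{R}\sum_{r=0}^{R-1} \mathbb{E}\,\|\nabla J(W^r)\|^2
  \;\lesssim\;
  \frac{J(W^0) - J^\star}{c_1\, \eta S R}
  \;+\; c_2\big(G^2 + \gamma^2 + M \eta\, \sigma^2\big).
\]
```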

4. Empirical Results and Practical Considerations

FLFA was empirically validated on a diverse set of architectures (MobileNetV2, ResNet-50) and datasets (BloodMNIST, OrganCMNIST, OrganSMNIST, PathMNIST, Fashion-MNIST, CIFAR-10, CIFAR-100, ImageNet-100), with up to 200 participating clients and strong non-IID partitioning using the Dirichlet-$\beta$ scheme (e.g., $\beta = 0.1$ for maximal heterogeneity).
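
The Dirichlet-based label partitioning referenced here is commonly implemented as below; this is a sketch of the standard recipe (smaller $\beta$ gives stronger skew), and details may differ from the paper's own partitioning code.

```python
import numpy as np

def dirichlet_partition(labels, num_clients, beta=0.1, seed=0):
    """Label-skewed non-IID split: for each class, draw client proportions
    from Dirichlet(beta) and assign that class's samples accordingly."""
    rng = np.random.default_rng(seed)
    client_indices = [[] for _ in range(num_clients)]
    for c in np.unique(labels):
        idx = np.flatnonzero(labels == c)
        rng.shuffle(idx)
        proportions = rng.dirichlet(beta * np.ones(num_clients))
        cuts = (np.cumsum(proportions)[:-1] * len(idx)).astype(int)
        for client, shard in enumerate(np.split(idx, cuts)):
            client_indices[client].extend(shard.tolist())
    return client_indices
```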

The experimental protocol comprised:

  • 100 rounds (medical/FashionMNIST), 500 rounds (CIFAR-10), 5 local epochs per round.
  • Training with SGD, momentum $0.9$, learning rate $0.01$ (decayed), batch size $64$.
  • Random selection of 10% of clients per round (see the configuration sketch after this list).
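
For reference, the reported hyperparameters can be bundled as in the following illustrative configuration; the dataclass itself is not from the paper.

```python
from dataclasses import dataclass

@dataclass
class FLFAConfig:
    rounds: int = 100            # 100 (medical / Fashion-MNIST), 500 (CIFAR-10)
    local_epochs: int = 5
    optimizer: str = "sgd"
    momentum: float = 0.9
    lr: float = 0.01             # decayed during training
    batch_size: int = 64
    client_fraction: float = 0.1 # 10% of clients sampled per round
    dirichlet_beta: float = 0.1  # non-IID partition concentration
```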

Key evaluation metrics included test accuracy, relative training time, drift ($H$ per round), and representation quality (intra-/inter-class variance and separability ratio).
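
The separability ratio is not formally defined in this summary; one plausible instantiation, inter-class variance of class means over mean intra-class variance, is sketched below as an assumption.

```python
import numpy as np

def separability_ratio(features, labels):
    """Assumed reading of the metric: variance of class means around the
    global mean, divided by the mean within-class variance of the features."""
    classes = np.unique(labels)
    means = np.stack([features[labels == c].mean(axis=0) for c in classes])
    global_mean = features.mean(axis=0)
    inter = ((means - global_mean) ** 2).sum(axis=1).mean()
    intra = np.mean([((features[labels == c] - m) ** 2).sum(axis=1).mean()
                     for c, m in zip(classes, means)])
    return inter / intra
```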

Highlights from results:

  • Test accuracy: FLFA improved on all baselines, e.g., +2.47% over FedAvg (BloodMNIST) and +6.45% (FMNIST), and added +1–2% on top of advanced baselines (FedRS, FedLC).
  • Overhead: Computational overhead is negligible (1–2%), with zero added communication.
  • Drift reduction: FLFA consistently reduced local drift $H$ versus BP, especially in early rounds.
  • Representation: ~20% improvement in separability ratio on CIFAR-10 with FedAvg+FLFA.
  • Robustness: Effective even under extreme data skew, low client participation (5%), and deep local training (15 epochs).
  • Ablations: Using random feedback or dropping adaptive scaling degrades performance; a single FA layer often suffices; the best gains are achieved by choosing FA layers with the lowest gradient cosine similarity (see the layer-selection sketch after this list).
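
The layer-selection heuristic mentioned in the ablations could be realized as follows; which reference gradients are compared is an assumption made for illustration.

```python
import numpy as np

def pick_fa_layer(local_grads, global_grads):
    """Pick the layer whose local gradient is least aligned (lowest cosine
    similarity) with a global reference gradient, per the ablation heuristic.
    Inputs are per-layer flattened gradient vectors."""
    def cos(a, b):
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))
    sims = [cos(g_loc, g_glob) for g_loc, g_glob in zip(local_grads, global_grads)]
    return int(np.argmin(sims))   # index of the least-aligned layer
```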

5. Comparison with Direct Feedback Alignment and Related Strategies

FLFA's core innovation, using the current global model weights as feedback matrices, differs fundamentally from methods such as direct feedback alignment (DFA), which employs fixed random matrices. DFA is beneficial for resource-limited settings (low-precision, TinyML) but struggles to match BP's accuracy on high-dimensional tasks and convolutional architectures (Colombo et al., 25 Nov 2024).

Related empirical strategies:

  • Random Feedback (DFA): Not effective on convolutional networks in federated contexts.
  • Single-layer versus Multi-layer FA: Single FA layer is often sufficient; optimal layer selection depends on gradient cosine similarity statistics.
  • Adaptive Feedback Scaling: Necessary for stable training; omitting scaling harms performance.
  • Representation Learning Effects: FA improves latent representation separability beyond overall accuracy increases.

6. Symbol Table

Symbol : Definition
$w_{i,l}^{(r,k)}$ : Client $i$'s weight at round $r$, layer $l$, step $k$
$W_l^r$ : Global weight at layer $l$, round $r$
$B_{i,l}$ : Feedback matrix for client $i$, layer $l$
$z_{i,l}, h_{i,l}$ : Pre-activation and activation at layer $l$
$\delta_{i,l}$ : Error signal at layer $l$ for client $i$
$\odot$ : Element-wise (Hadamard) product
$f'$ : Derivative of the nonlinearity
$\eta$ : Learning rate
$S, s$ : Number of local SGD steps
$E$ : Number of local epochs
$M, \sigma, \gamma, G$ : Lipschitz gradient constant, gradient noise bound, heterogeneity bound, FA approximation error bound

All symbols and workflow steps align directly with those stated in the original framework description (Baek et al., 14 Dec 2025).

7. Summary and Implications

FLFA provides an effective, efficient modification to the federated learning process, leveraging global model weights as fixed feedback matrices in the backward pass to align local updates, suppress local drift, and robustly improve convergence and downstream accuracy. Its minimal compute and communication overhead makes it well suited to practical deployment, especially in highly heterogeneous and large-scale federated environments. Empirical and theoretical analyses confirm that FA, when instantiated with global weights and adaptive scaling, offers significant benefits over both standard BP and direct/random feedback methods. Representative benchmarks demonstrate consistent gains across modalities, architectural choices, and data regimes (Baek et al., 14 Dec 2025).
