Federated Learning Methodologies
- Federated learning methodologies are distributed approaches that enable collaborative model training while maintaining data privacy by keeping raw data local.
- They employ advanced optimization algorithms, communication compression, and cryptographic techniques to address statistical, system, and model heterogeneity.
- Practical implementations emphasize secure aggregation, adaptive client sampling, and local update strategies to ensure robust, scalable, and efficient learning.
Federated learning is a distributed machine learning paradigm that orchestrates the collaborative optimization of a global model across decentralized data silos—ranging from edge devices and personal mobiles to organizational data centers—while ensuring that raw data remains strictly local. This approach is distinguished by its capacity to mitigate privacy risks associated with centralized aggregation, adapt to data and system heterogeneity, and offer rigorous algorithmic and software solutions leveraging first- and second-order optimization, communication compression, and cryptographic techniques. Modern FL systems must navigate five interdependent methodological challenges: statistical and system heterogeneity, communication efficiency, privacy/security, implementation-to-theory feedback, and robust software design (Burlachenko, 9 Sep 2025, Horváth, 2022).
1. Formal Problem Statement and Objective Functions
Classical federated learning seeks a global model $x \in \mathbb{R}^d$ that (approximately) minimizes the global empirical risk, a weighted sum of local client losses:
$$\min_{x \in \mathbb{R}^d} f(x) := \sum_{i=1}^{n} w_i f_i(x),$$
where $f_i$ is the empirical loss (plus local regularizer, as needed) on client $i$. Weights $w_i$ are typically proportional to local dataset size (Burlachenko, 9 Sep 2025). Each $f_i$ is generally written as
$$f_i(x) = \frac{1}{n_i} \sum_{j=1}^{n_i} \ell(x; \xi_{ij}) + \lambda R_i(x),$$
covering both supervised and regularized objectives. The paradigm encompasses both global model training and, via appropriate coupling/decoupling, personalized objectives (e.g., local models $x_i$ tied together by consensus or regularization terms) (Horváth, 2022).
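As a minimal illustration of this objective, the sketch below evaluates the weighted global risk given per-client loss callables; the helper name and signature are illustrative assumptions, not taken from the cited works.

```python
import numpy as np

def global_objective(x, local_losses, num_samples):
    """Weighted global empirical risk: f(x) = sum_i w_i * f_i(x),
    with weights w_i proportional to local dataset sizes n_i."""
    weights = np.asarray(num_samples, dtype=float)
    weights = weights / weights.sum()
    return sum(w * f_i(x) for w, f_i in zip(weights, local_losses))
```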
2. Optimization Algorithms: Core and Advanced Federated Methods
2.1 Federated Averaging (FedAvg) and Extensions
The canonical FedAvg protocol [McMahan et al.]:
- Server samples a subset of clients $S^t$ and broadcasts the current global model $x^t$.
- Each client $i \in S^t$ initializes $x_i^{t,0} = x^t$ and performs $K$ steps of local SGD: $x_i^{t,k+1} = x_i^{t,k} - \eta \nabla f_i(x_i^{t,k}; \xi_i^{t,k})$.
- Each client computes the update $\Delta_i^t = x_i^{t,K} - x^t$; the server aggregates as $x^{t+1} = x^t + \sum_{i \in S^t} \tilde{w}_i \Delta_i^t$, with $\tilde{w}_i$ the renormalized client weights (Burlachenko, 9 Sep 2025).
Under strong convexity and smoothness this yields $O(1/T)$ convergence; general non-convex settings retain $O(1/\sqrt{T})$ rates on the squared gradient norm (Burlachenko, 9 Sep 2025).
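The following NumPy sketch shows one such round under the notation above; `client_grad_fns` (per-client stochastic-gradient callables) and the other names are illustrative assumptions, not the cited implementations.

```python
import numpy as np

def local_sgd(x_global, grad_fn, num_steps, lr):
    """K steps of local SGD from the broadcast model; returns the client update Delta_i."""
    x = x_global.copy()
    for _ in range(num_steps):
        x = x - lr * grad_fn(x)        # grad_fn: stochastic gradient of f_i on local data
    return x - x_global

def fedavg_round(x_global, client_grad_fns, weights, sample_size, num_steps=10, lr=0.1):
    """One FedAvg round: sample clients, run local SGD, aggregate weighted updates."""
    rng = np.random.default_rng()
    sampled = rng.choice(len(client_grad_fns), size=sample_size, replace=False)
    w = np.array([weights[i] for i in sampled], dtype=float)
    w = w / w.sum()                     # renormalize over the sampled cohort
    deltas = [local_sgd(x_global, client_grad_fns[i], num_steps, lr) for i in sampled]
    return x_global + sum(wi * d for wi, d in zip(w, deltas))
```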
2.2 Heterogeneity-Robust Algorithms
- FedProx: Adds a proximal term $\frac{\mu}{2}\|x - x^t\|^2$ to each local subproblem to restrict client drift under statistical heterogeneity, with $\mu$ tuned according to the degree of divergence (Burlachenko, 9 Sep 2025).
- SCAFFOLD: Introduces client and server control variates $c_i$ and $c$ that correct for client drift; each local step subtracts the local control variate $c_i$ and adds the global one $c$, i.e., moves along $\nabla f_i(x) - c_i + c$ (Burlachenko, 9 Sep 2025).
- Custom local step-size adaptation (per-client step sizes scaled to the amount of local work) removes the objective bias induced by heterogeneous workloads, as in FedShuffle (Horváth, 2022); the FedProx and SCAFFOLD local update rules are sketched below.
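A minimal sketch of the two corrected local steps, assuming `grad_fn` returns a stochastic gradient of $f_i$ and all arguments are NumPy arrays (function names are illustrative):

```python
def fedprox_local_step(x, x_global, grad_fn, lr, mu):
    """One FedProx step: stochastic gradient of f_i plus the proximal pull mu*(x - x_global)."""
    return x - lr * (grad_fn(x) + mu * (x - x_global))

def scaffold_local_step(x, grad_fn, lr, c_local, c_global):
    """One SCAFFOLD step: local gradient corrected by the control variates (- c_i + c)."""
    return x - lr * (grad_fn(x) - c_local + c_global)
```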
2.3 Communication-Efficient Algorithms
Compression and quantization are expressed via a (possibly randomized) operator $\mathcal{C}: \mathbb{R}^d \to \mathbb{R}^d$:
- Unbiased compressors: $\mathbb{E}[\mathcal{C}(x)] = x$, $\mathbb{E}\|\mathcal{C}(x) - x\|^2 \le \omega \|x\|^2$.
- Methods: natural compression (randomized power-of-two quantization, $\mathcal{C}_{\mathrm{nat}}$), natural dithering, Top-$k$ sparsification, QSGD (Horváth, 2022).
- Error-feedback: EF21 and EF21-W maintain a client-local "shift" $g_i$ to correct the bias of contractive compressors, with step sizes tuned via the contraction parameter $\alpha$ for optimal convergence under $L_i$-smoothness (Burlachenko, 9 Sep 2025); example compressors and the EF21 shift update are sketched below.
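A short sketch of a contractive Top-$k$ compressor, an unbiased Rand-$k$ compressor, and the EF21 shift update that transmits only the compressed difference; this is an illustrative rendering, not the reference implementations.

```python
import numpy as np

def topk(x, k):
    """Contractive Top-k: keep the k largest-magnitude coordinates, zero the rest."""
    out = np.zeros_like(x)
    idx = np.argpartition(np.abs(x), -k)[-k:]
    out[idx] = x[idx]
    return out

def randk(x, k, rng=None):
    """Unbiased Rand-k: keep k random coordinates, rescaled by d/k so E[C(x)] = x."""
    rng = rng or np.random.default_rng()
    out = np.zeros_like(x)
    idx = rng.choice(x.size, size=k, replace=False)
    out[idx] = x[idx] * (x.size / k)
    return out

def ef21_shift_update(g_i, grad, k):
    """EF21: transmit m_i = C(grad - g_i) and update the local shift g_i <- g_i + m_i."""
    m_i = topk(grad - g_i, k)
    return g_i + m_i, m_i
```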
2.4 Client Sampling and Aggregation
When only $m$ of $n$ clients participate per round, optimal independent sampling assigns inclusion probabilities $p_i$ (roughly proportional to the norms of the client updates), paired with variance-minimizing aggregation weights $w_i/p_i$ (Horváth, 2022). This approach reduces gradient variance and improves convergence/fairness versus uniform sampling; a simplified sketch follows.
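The sketch below is a simplified approximation of such variance-aware sampling (the exact optimal probabilities require an extra correction when some probabilities saturate at 1); names and signatures are assumptions for illustration.

```python
import numpy as np

def independent_client_sampling(update_norms, budget_m, rng=None):
    """Sample each client independently with probability ~ its update norm,
    returning inverse-probability aggregation weights (unbiased estimator)."""
    rng = rng or np.random.default_rng()
    norms = np.asarray(update_norms, dtype=float)
    p = np.clip(budget_m * norms / norms.sum(), 1e-12, 1.0)  # approximate optimal probabilities
    included = rng.random(len(p)) < p
    agg_weights = np.where(included, 1.0 / p, 0.0)
    return included, agg_weights
```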
3. Handling Heterogeneity: Data, System, and Model
3.1 Statistical Heterogeneity
Non-IID data leads to local objectives $f_i$ with disparate minimizers and smoothness parameters. Remedies include:
- Proximal regularization (FedProx)
- Variance reduction (SCAFFOLD, MARINA, DIANA)
3.2 Systems Heterogeneity
Device variability (capability, connectivity) leads to stragglers and partial participation. Approaches:
- Asynchronous aggregation: Updates weighted by staleness or deadline-aware discarding (Burlachenko, 9 Sep 2025).
- Dropout-based elasticity: Ordered Dropout (FjORD) samples nested submodels of width fraction $p$ matched to each client device's capability, aggregates only overlapping coordinates, and applies knowledge distillation (Horváth, 2022); see the sketch after this list.
- Aggregation alignment: Slicing model weights across clients to match subnetwork granularity (Horváth, 2022).
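A simplified illustration of nested submodel extraction and overlap-aware averaging in the spirit of Ordered Dropout, restricted to a single dense weight matrix; the helper names are assumptions, not the FjORD API.

```python
import numpy as np

def ordered_dropout_slice(weight, p):
    """Keep the left-most fraction p of a dense layer's output units (nested submodel)."""
    rows = max(1, int(round(p * weight.shape[0])))
    return weight[:rows, :].copy()

def aggregate_overlapping(client_slices, full_shape):
    """Average submodels coordinate-wise over the rows each client actually trained."""
    acc = np.zeros(full_shape)
    counts = np.zeros(full_shape[0])
    for w in client_slices:
        acc[: w.shape[0], :] += w
        counts[: w.shape[0]] += 1
    return acc / np.maximum(counts, 1)[:, None]
```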
3.3 Model Heterogeneity
Personalization is effected by:
- Base + personalization layer splits: Shared backbone with private "head" (cf. FedPer, APFL) (Horváth, 2022)
- Regularization-based decoupling: Multi-task or consensus constraints (see Section 1).
- Knowledge distillation or mutual learning: Alignment of outputs/logits in lieu of parameter exchange.
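The base + personalization split above can be realized as follows; this is a minimal sketch assuming parameters are stored as dictionaries of arrays, with names chosen for illustration rather than taken from FedPer or APFL code.

```python
class PersonalizedClient:
    """Shared backbone is synchronized with the server; the private head stays on-device."""

    def __init__(self, backbone, head):
        self.backbone = backbone   # dict[str, array], shared across clients
        self.head = head           # dict[str, array], never communicated

    def load_global(self, global_backbone):
        # Only the shared backbone is overwritten by the broadcast model.
        self.backbone = {k: v.copy() for k, v in global_backbone.items()}

    def upload(self):
        # Only backbone parameters are sent to the server for aggregation.
        return self.backbone
```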
4. Privacy, Security, and Cryptographic Primitives
4.1 Differential Privacy (DP)
- Local/gradient perturbation: Adds calibrated Gaussian noise $\mathcal{N}(0, \sigma^2 I)$ to clipped per-round updates, guaranteeing $(\epsilon, \delta)$-DP (Burlachenko, 9 Sep 2025, Horváth, 2022); a sketch follows this list.
- Privacy-utility trade-off: the noise required per update scales with the clipping threshold and $1/\epsilon$, while its effect on aggregate statistics shrinks as the number of participating clients grows; over-noising degrades convergence except at population scale.
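A minimal sketch of per-update clipping plus Gaussian noise (the $(\epsilon,\delta)$ accounting across rounds, e.g. via a moments accountant, is omitted); the function name and signature are illustrative.

```python
import numpy as np

def dp_sanitize_update(delta, clip_norm, noise_multiplier, rng=None):
    """Clip the update to L2 norm `clip_norm`, then add Gaussian noise with
    std = noise_multiplier * clip_norm (the Gaussian mechanism on one update)."""
    rng = rng or np.random.default_rng()
    scale = min(1.0, clip_norm / (np.linalg.norm(delta) + 1e-12))
    noise = rng.normal(0.0, noise_multiplier * clip_norm, size=delta.shape)
    return delta * scale + noise
```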
4.2 Secure Aggregation and Homomorphic Encryption
- Secure multi-party computation: Server computes only aggregate, never sees individual updates (Burlachenko, 9 Sep 2025).
- Homomorphic encryption (CKKS): Aggregates encrypted gradients homomorphically, at high computational and memory cost.
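The first bullet's idea can be illustrated with additive pairwise masking, a simplified core of secure-aggregation protocols; real protocols add key agreement and secret sharing for dropout tolerance, and `seed_fn` is a hypothetical shared-seed derivation.

```python
import numpy as np

def masked_updates(updates, seed_fn):
    """Additive pairwise masking: clients i < j share a seed; i adds +PRG(seed),
    j adds -PRG(seed). Masks cancel in the sum, so the server learns only the aggregate."""
    n, d = len(updates), updates[0].size
    masked = [u.astype(float) for u in updates]
    for i in range(n):
        for j in range(i + 1, n):
            mask = np.random.default_rng(seed_fn(i, j)).standard_normal(d)
            masked[i] += mask
            masked[j] -= mask
    return masked
```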
4.3 Lightweight Cryptography and Correlated Compression
- PermK+AES: Clients compress and encrypt disjoint model blocks; aggregation is pure concatenation and MAC verification, with no server arithmetic (Burlachenko, 9 Sep 2025). Ensures semantic security, MAC-based integrity, and $O(d/n)$ per-client communication. A simplified sketch follows.
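The sketch below illustrates the PermK-style partition and per-block authenticated encryption. It substitutes AES-GCM (from the `cryptography` package) for the AES-EAX mode mentioned above; both provide authenticated encryption, and the key handling and block indexing are simplified assumptions.

```python
import os
import numpy as np
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

def permk_blocks(dim, n_clients, seed=0):
    """PermK-style partition: a shared permutation splits coordinates into disjoint blocks."""
    perm = np.random.default_rng(seed).permutation(dim)
    return np.array_split(perm, n_clients)

def client_message(update, block, key):
    """Authenticated-encrypt only this client's coordinate block (key from AESGCM.generate_key)."""
    payload = update[block].astype(np.float32).tobytes()
    nonce = os.urandom(12)
    return nonce, AESGCM(key).encrypt(nonce, payload, None)
```

The server only forwards or concatenates the ciphertexts; parties holding the shared key decrypt each block and place it at its permuted coordinates, so aggregation requires no server arithmetic.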
5. Software Architecture, Implementation, and Theoretical Feedback
5.1 Modular FL Frameworks
- Research simulators (e.g., FL_PyTorch) expose essential FedAvg skeletons with decoupled functional modules: initialization, local gradients, optimizers, aggregation, state updates. Support for plugin compressors, optimizers, DP, encryption routines (Burlachenko, 9 Sep 2025).
- Design patterns: Strict broadcast→local update→send back→aggregate→update cycles, per-client in-memory state for compressors and control variates, minimal or no server arithmetic in secure/compressed modes.
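One way to realize the per-client in-memory state mentioned above is a small container kept by the simulator between rounds; this is an illustrative sketch, not the FL_PyTorch API.

```python
from dataclasses import dataclass
from typing import Optional
import numpy as np

@dataclass
class ClientState:
    """Per-client in-memory state a simulator keeps between rounds."""
    control_variate: Optional[np.ndarray] = None   # SCAFFOLD-style c_i
    ef_shift: Optional[np.ndarray] = None          # EF21-style shift g_i
    compressor_seed: int = 0                       # reproducible randomized compression
```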
5.2 Implementation-Driven Theoretical Discoveries
- Weighted client aggregation by local smoothness constants $L_i$, observed during the EF21 implementation, led to sharper convergence proofs and the EF21-W algorithm (Burlachenko, 9 Sep 2025).
- Elimination of server-side computation via PermK+AES demonstrated feasibility of zero-arithmetic secure aggregation.
5.3 Empirical Benchmarks
- Small, custom autodiff frameworks (e.g., BurTorch) enable substantial speedups versus PyTorch/TF/JAX for certain compute graphs (Burlachenko, 9 Sep 2025).
6. Comparative Algorithmic Trade-offs
| Method | Statistical Heterogeneity | Communication | State Overhead | Security/Privacy | Theoretical Rate |
|---|---|---|---|---|---|
| FedAvg | Poor | Full model ($O(d)$/round) | Minimal | Baseline | $O(1/\sqrt{T})$ (non-convex) |
| FedProx | Improved | Full model | Prox update | Baseline | Matches FedAvg, drift-bounded |
| SCAFFOLD | Strong | Full model + control variate ($\approx 2d$) | Control variate | Baseline | Heterogeneity-independent (practical) |
| EF21(–W) | Strong | Compressed ($O(k)$/round) | Shift state | Baseline | $O(1/T)$ under contractive compression |
| PermK+AES | Baseline | $O(d/n)$ per client | Encryption key | Semantic/MAC | Exact aggregation |
| HE (CKKS) | Baseline | Low | Encryption | High | High compute/memory cost |
| DP | Baseline | Full model | N/A | $(\epsilon,\delta)$-DP | Utility decreases with noise level $\sigma$ |
FedAvg is simple but suffers with high data heterogeneity and scales linearly in model size. SCAFFOLD and FedProx mitigate client drift but have increased state. Compression with EF21(–W) and related schemes offers strong communication savings at the cost of additional local memory. Secure aggregation and DP provide strong privacy guarantees, but with commensurate trade-offs in computation and/or statistical efficiency.
7. Practical Recommendations and Guidelines
- Employ natural compression ($\mathcal{C}_{\mathrm{nat}}$) or advanced dithering for communication-sensitive deployments (Horváth, 2022).
- Use variance-aware client sampling (inclusion probabilities $p_i$ with inverse-probability aggregation weights) to achieve fairness and minimize variance in mini-batch SGD.
- Implement adaptive aggregation and per-step local learning rates in the presence of workload imbalance or device heterogeneity (Horváth, 2022).
- For secure FL, combine correlated compression and lightweight symmetric encryption (e.g., AES-EAX) to eliminate server compute and preserve privacy.
- Translate practical findings into algorithmic improvements: monitor implementation bottlenecks to discover unanticipated theoretical refinements (e.g., EF21-W).
- Benchmark efficiency on deployment-relevant models with compact, modular codebases, exploiting parallelism at both thread and device level (Burlachenko, 9 Sep 2025).
In summary, federated learning methodologies now integrate optimization theory, advanced communication- and privacy-preserving techniques, and both software-practical and theoretical frameworks to enable scalable, robust, and provably secure distributed learning in realistic, heterogeneous environments. The field continues to advance through a dynamic interplay between theoretical innovation, practical engineering, and systematic empirical validation (Burlachenko, 9 Sep 2025, Horváth, 2022).