Energy-Efficient Decentralized Federated Learning
- Energy-efficient decentralized federated learning is a distributed training paradigm that minimizes energy consumption using adaptive protocols, device scheduling, and model compression while maintaining privacy.
- It employs decentralized update schemes such as mixing matrix design and peer selection to balance convergence speed and energy savings under heterogeneous device constraints.
- The approach integrates adaptive local computations, network sparsification, and energy-communication trade-offs to optimize performance on resource-constrained, large-scale systems.
Energy-efficient decentralized federated learning (EE-DFL) comprises a spectrum of algorithmic, system-level, and architectural approaches that minimize energy expenditure in large-scale, privacy-preserving distributed training, typically in the absence of a central coordinator. Energy efficiency is achieved through scheme selection, device scheduling, workload adaptation, communication minimization, model compression, and topology-aware protocol engineering, with guarantees for convergence and resilience under device heterogeneity and stochastic constraints.
1. Fundamental Principles and Problem Formulation
EE-DFL systems distribute model training across edge devices or silos, each constrained by local energy budgets, computational resources, and communication costs. The global objective is typically formulated as the minimization of aggregate empirical risk: $\min_{w} F(w) = \frac{1}{N} \sum_{i=1}^{N} F_i(w)$, where $F_i(w) = \mathbb{E}_{\xi \sim \mathcal{D}_i}[\ell(w; \xi)]$ is the local risk of device $i$, and $N$ is the number of devices.
Key constraints include:
- Per-device energy budgets
- Communication bandwidth/time constraints
- Heterogeneous data distributions (non-IID settings)
- Decentralized aggregation (peer-to-peer or clustered communication)
The joint optimization involves deciding local training rounds, peer aggregation schedules, and communication topologies to minimize total (or maximum per-device) energy consumption subject to target accuracy and latency budgets (Yan et al., 2024, Zhang et al., 30 Dec 2025, Kim et al., 2021).
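The global objective above can be made concrete with a small sketch. The function names (`local_risk`, `global_risk`), the toy least-squares loss, and the energy-budget values are illustrative assumptions, not drawn from any cited paper:

```python
import numpy as np

# Hypothetical sketch of the EE-DFL global objective: the average of
# per-device empirical risks, with per-device energy budgets that any
# scheduling decision must respect. All names and numbers are illustrative.

rng = np.random.default_rng(0)
N = 4                                   # number of devices
w = np.zeros(3)                         # shared model parameters

# Non-IID local data: each device draws features around a different mean.
local_data = [rng.normal(loc=i, size=(20, 3)) for i in range(N)]
local_targets = [X.sum(axis=1) + rng.normal(size=20) for X in local_data]

def local_risk(w, X, y):
    """Per-device empirical risk F_i(w): mean squared error here."""
    return np.mean((X @ w - y) ** 2)

def global_risk(w):
    """Global objective F(w) = (1/N) * sum_i F_i(w)."""
    return float(np.mean([local_risk(w, X, y)
                          for X, y in zip(local_data, local_targets)]))

# Each device also carries an energy budget (joules, illustrative) that
# constrains how many local steps and transmissions it can afford.
energy_budget = np.array([5.0, 3.0, 4.0, 2.0])

print(round(global_risk(w), 3))
```

The non-IID local datasets make the point that no single device's risk is a good proxy for the global objective, which is why aggregation schedules matter.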
2. Decentralized Update Schemes and Communication Protocols
Decentralized FL moves away from the central aggregator paradigm. Protocols are built on synchronous, asynchronous, and semi-synchronous rounds, with decentralized peer aggregation via mixing matrices or region leaders.
- Mixing Matrix Design: At each round $t$, devices synchronize via a time-varying mixing matrix $W^{(t)}$, taking local stochastic gradient steps and aggregating with neighbors: $x_i^{(t+1)} = \sum_{j} W_{ij}^{(t)} \big( x_j^{(t)} - \eta\, g_j^{(t)} \big)$, where $g_j^{(t)}$ is device $j$'s local stochastic gradient. Optimizing $W^{(t)}$ over time and phase splits trades off “mixing speed” and per-node transmission energy (Zhang et al., 30 Dec 2025).
- Peer Selection & Clustering: Opportunistic communication-efficient schemes select collaboration peers by maximizing knowledge gain per unit energy, subject to neighbor set cardinality and regularization constraints (Masmoudi et al., 2024). Hierarchical and overlapped clustering further reduce communication overhead, with core clients or bridge devices aggregating compressed updates (Zhu et al., 2024, Al-Abiad et al., 2022).
- Aggregation Energy Models: Approaches include broadcast (single transmission per round) and unicast (per-link transmissions) modeling, with device activation probabilities tuned to energy budgets (Zhang et al., 30 Dec 2025).
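A single decentralized round of the mixing-matrix scheme can be sketched as follows. The ring topology, mixing weights, toy quadratic objective, and step size are all illustrative assumptions:

```python
import numpy as np

# Minimal sketch of one decentralized round: each device takes a local
# SGD step, then averages with its neighbors through a doubly stochastic
# mixing matrix W. Ring topology and weights are illustrative.

rng = np.random.default_rng(1)
N, d = 4, 3
X = rng.normal(size=(N, d))             # one model row per device
eta = 0.1                               # local step size

# Ring topology: each node mixes with itself and its two ring neighbors.
W = np.zeros((N, N))
for i in range(N):
    W[i, i] = 0.5
    W[i, (i - 1) % N] = 0.25
    W[i, (i + 1) % N] = 0.25

def local_grad(x):
    """Stand-in local stochastic gradient (of the toy loss 0.5*||x||^2)."""
    return x

# One round: gradient step followed by neighbor averaging,
# x_i <- sum_j W_ij (x_j - eta * g_j).
G = np.vstack([local_grad(x) for x in X])
X = W @ (X - eta * G)
```

Because $W$ is doubly stochastic, the network-wide average is preserved at every round; sparser $W$ (fewer nonzero links) costs less transmission energy per round but mixes information more slowly.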
3. Local Computation and Scheduling Optimization
Energy usage is tightly controlled by adjusting the number of local gradients (mini-batch steps) and computation frequency per device:
- Adaptive local rounds: Devices allocate training steps over communication rounds by solving per-device constrained minimization problems that choose the number of local steps per round subject to total energy and latency limits. Early rounds are allocated fewer steps, preserving resources for later optimization (Yan et al., 2024).
- Device Selection and Execution Targets: RL-based schedulers select which devices participate and which execution targets (CPU, DVFS state, GPU) are used per aggregation round to minimize convergence time and total energy (Kim et al., 2021). The joint objective takes a weighted time-energy form, $\alpha T + (1 - \alpha) E$, with stochastic adaptation to runtime system variance.
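A weighted time-energy objective of this kind can be evaluated directly when per-target time and energy estimates are available. The cost numbers and the weight `alpha` below are illustrative assumptions, not values from the cited work:

```python
# Sketch of a per-round execution-target choice under a weighted
# time-energy objective, alpha * T + (1 - alpha) * E, in the spirit of
# RL-based schedulers. All numbers and names are illustrative.

round_time = {"cpu": 2.0, "cpu_dvfs": 3.0, "gpu": 1.0}    # seconds/round
round_energy = {"cpu": 1.5, "cpu_dvfs": 0.8, "gpu": 2.0}  # joules/round
alpha = 0.5                                               # time vs. energy weight

def cost(target):
    """Weighted time-energy objective for one execution target."""
    return alpha * round_time[target] + (1 - alpha) * round_energy[target]

# Pick the target with the lowest joint cost for this round.
best = min(round_time, key=cost)
print(best, round(cost(best), 2))
```

An RL scheduler would learn such per-target cost estimates online from observed runtime variance rather than using fixed tables, but the per-round decision it makes has this shape.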
4. Model Compression and Network Sparsification Techniques
Reducing communication and computation energy is realized through model sparsification, quantization, and lossy compression:
- Model pruning and random projection: Selective thresholding removes low-magnitude weights, followed by dimensionality reduction via shared random matrices. Compression is applied at local devices before aggregation, further reducing transmitted payloads (Zhu et al., 2024, Domini et al., 10 Jul 2025).
- Sparse self-organization: Devices form proximity-based federations and exchange only compressed (sparse) models with neighbors. Aggregated models are reconstructed by region leaders using FedAvg (Domini et al., 10 Jul 2025). Moderate sparsity roughly halves energy with negligible accuracy loss; higher sparsity trades further energy savings for reduced model performance.
- Slimmable and dynamic model architectures: Devices may train “width-adjustable” models (e.g., SlimFL) and report different model sizes per round. Aggregation adopts ordered-dropout mechanisms so that parameter alignment is maintained under heterogeneous model rates (Baek et al., 2021, Kumar et al., 2024).
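The pruning-plus-random-projection pipeline can be sketched in a few lines. The keep ratio, model size, projection dimension, and shared-seed convention are illustrative assumptions:

```python
import numpy as np

# Sketch of magnitude pruning followed by random-projection compression,
# assuming sender and aggregator share the projection matrix via a common
# seed. Sizes and the keep ratio are illustrative.

rng = np.random.default_rng(42)
w = rng.normal(size=256)                 # local model update to transmit

# 1) Magnitude pruning: keep only the top 25% of weights by magnitude.
k = len(w) // 4
thresh = np.partition(np.abs(w), -k)[-k]
sparse = np.where(np.abs(w) >= thresh, w, 0.0)

# 2) Random projection to a lower dimension. A shared seed lets the
#    aggregator regenerate P and aggregate in the compressed space.
proj_rng = np.random.default_rng(7)      # shared seed across devices
d = 64
P = proj_rng.normal(size=(d, len(w))) / np.sqrt(d)
compressed = P @ sparse                  # 256 floats -> 64 floats sent

print(len(compressed), int(np.count_nonzero(sparse)))
```

Pruning reduces local compute and makes the projection act on a mostly-zero vector, while the projection fixes the transmitted payload size regardless of which weights survived pruning.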
5. Energy-Communication Trade-offs and Theoretical Guarantees
Convergence and energy analyses yield explicit trade-offs:
- Time-varying mixing vs. sparsity: Multi-phase schedules employ sparse communication early, saving energy, followed by denser mixing for fast convergence, balancing network lifetime and accuracy (Zhang et al., 30 Dec 2025).
- Decentralized graph-based aggregation: Minimum Spanning Tree (MST), Ring-AllReduce, and randomized gossip protocols adapt aggregation energy to dynamic link costs, with MST and ring structures yielding dramatic aggregation energy reductions versus naive gossip (Yan et al., 2024).
- Over-the-air digital aggregation: Multi-bit AirComp permits simultaneous channel aggregation with quantized gradients, minimizing spectrum resource usage and device transmission energy. Communication energy is integrated into the global rounds via closed-form expressions (Li et al., 2022).
- Peer selection impact: Trade-offs between energy and model loss are governed by the density of peer selection regularizers and collaboration topology; sparse peering can save up to 80% energy at sub-2% accuracy cost relative to all-to-all schemes (Masmoudi et al., 2024).
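The aggregation-energy gap between tree-structured and all-to-all exchange can be seen on a toy link-cost matrix. The cost values are illustrative; only the MST construction (Prim's algorithm) is standard:

```python
import heapq

# Sketch comparing aggregation energy along a minimum spanning tree (MST)
# with naive all-to-all exchange over a symmetric link-cost matrix.
# The link costs are illustrative.

cost = [
    [0, 4, 1, 3],
    [4, 0, 2, 5],
    [1, 2, 0, 6],
    [3, 5, 6, 0],
]
N = len(cost)

def mst_energy(cost):
    """Total link energy of a minimum spanning tree (Prim's algorithm)."""
    seen, total = {0}, 0.0
    heap = [(cost[0][j], j) for j in range(1, N)]
    heapq.heapify(heap)
    while len(seen) < N:
        c, j = heapq.heappop(heap)
        if j in seen:
            continue                      # stale entry for an added node
        seen.add(j)
        total += c
        for k in range(N):
            if k not in seen:
                heapq.heappush(heap, (cost[j][k], k))
    return total

# All-to-all gossip pays every pairwise link once per round.
all_to_all = sum(cost[i][j] for i in range(N) for j in range(i + 1, N))
print(mst_energy(cost), all_to_all)     # MST uses only N-1 links
```

The MST pays for only $N-1$ of the cheapest links that keep the network connected, which is why tree-structured aggregation adapts well to dynamic link costs compared to dense gossip.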
6. Experimental Benchmarks and Quantitative Outcomes
Extensive simulation and real-world experiments demonstrate the following:
- Multi-phase mixing matrix design achieves up to 50% maximum per-node energy reduction at 1% test error compared to previous baselines (Zhang et al., 30 Dec 2025).
- Sparse self-federated learning with sparsity ratios up to $0.5$ delivers bandwidth and energy savings of 30–70%, with negligible test-accuracy degradation (Domini et al., 10 Jul 2025).
- LeanFed adapts data usage per device to battery constraints, consistently outperforming vanilla FedAvg by 7–10% in test accuracy and preserving client availability across rounds (Pereira et al., 2024).
- Hierarchical FL with adaptive clustering and model compression cuts per-round energy by 30–50% compared to multi-layer or quantization–only methods (Zhu et al., 2024).
- AutoFL’s RL-based device scheduling achieves 3.6× faster model convergence and 4.7–5.2× higher energy efficiency than static participant selection (Kim et al., 2021).
- Over-the-air computation yields up to 85% energy savings and orders of magnitude reduction in spectrum utilization without compromising model accuracy (Li et al., 2022).
- FL-EOCD enables fully decentralized aggregation with up to 55% energy and 50% latency reductions compared to centralized or hierarchical alternatives (Al-Abiad et al., 2022).
7. Limitations, Extensions, and Future Directions
Current EE-DFL frameworks face several open challenges and directions:
- Accurate energy consumption estimation and runtime profiling remain necessary for dynamic optimization (Pereira et al., 2024).
- Heterogeneous sparse aggregation, lossy wireless links, and coded aggregation protocols present theoretical and engineering frontiers (Domini et al., 10 Jul 2025).
- Adaptive phase scheduling and topology evolution are critical for balancing energy savings and accuracy under device and network churn (Zhang et al., 30 Dec 2025).
- Extensions toward peer-to-peer SVD merge, incremental joining/removal, secure aggregation, and dynamic compression policies are being considered for large-scale, robust deployments (Fontenla-Romero et al., 2023, Domini et al., 10 Jul 2025).
- Benchmarking frameworks require enhanced support for proximity federation, dynamic topology, and self-organizing coordination regions (Domini et al., 10 Jul 2025).
Collectively, energy-efficient decentralized federated learning architectures synthesize adaptive protocol engineering, compression, and scheduling techniques for scalable, sustainable, and privacy-preserving collaborative intelligence on energy-constrained distributed systems.