Flower Federated Orchestration
- Federated Orchestration (Flower) is a modular framework that coordinates distributed ML training using pluggable strategies, protocols, and architectural abstractions.
- It supports various topologies including centralized, decentralized (gossip-based), and serverless paradigms while integrating resource-aware scheduling and container orchestration.
- Experimental results show near-linear scaling, reduced round duration, and robust performance on heterogeneous clients with minimal orchestration overhead.
Federated Orchestration (Flower) is the set of algorithms, protocols, and architectural abstractions, built on the Flower framework, that enable scalable, fault-tolerant, and extensible coordination of ML workloads across geographically distributed clients while minimizing direct data centralization. Flower supports a broad range of federated learning (FL) scenarios, from classical synchronous server–client architectures to fully decentralized and serverless topologies, and adds resource-aware scheduling and integration with container orchestration and edge platforms.
1. Flower Framework: Orchestration Core
Flower distinguishes itself by a modular, strategy-centric orchestration model that decouples client implementation, communication transport, and FL protocol. The Flower server maintains a global event loop (the “FL loop”), invoking a pluggable Strategy at each round to select clients, distribute training/evaluation instructions, aggregate model updates, and perform any required validation or policy control (Beutel et al., 2020).
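A typical FL round $t$ in Flower is orchestrated via:
- `Strategy.configure_round(t, available_clients)` selects a subset $S_t$ of the available clients, together with a round configuration $\text{config}_t$.
- The server dispatches a `FitIns` message carrying $\text{config}_t$ and the current global parameters $\theta_t$ to each client $k \in S_t$.
- Clients perform local training (often multiple epochs) and return their update $\Delta\theta_k$, i.e., locally trained parameters $\theta_t^k = \theta_t + \Delta\theta_k$, together with their sample counts $n_k$.
- The server invokes `Strategy.aggregate_round(t, results)` on the results $\{(\Delta\theta_k, n_k)\}_{k \in S_t}$; for FedAvg this yields the sample-weighted average
$$\theta_{t+1} = \sum_{k=1}^{|S_t|} \frac{n_k}{\sum_{j=1}^{|S_t|} n_j} \theta_t^k.$$
- Optional: the server orchestrates model evaluation and logs/monitors round statistics.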
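The FedAvg aggregation formula can be checked with a few lines of plain Python. This is a minimal, framework-free sketch, not Flower's own `FedAvg` implementation; the function name `fedavg_aggregate` and the representation of each model as a flat list of floats are assumptions made for illustration.

```python
from typing import List, Sequence, Tuple

# Each client result is (theta_t^k, n_k): a flat parameter vector and its sample count.
ClientResult = Tuple[Sequence[float], int]


def fedavg_aggregate(results: List[ClientResult]) -> List[float]:
    """Compute theta_{t+1} = sum_k (n_k / sum_j n_j) * theta_t^k."""
    if not results:
        raise ValueError("need at least one client result")
    total = sum(n for _, n in results)
    if total <= 0:
        raise ValueError("total sample count must be positive")
    dim = len(results[0][0])
    new_theta = [0.0] * dim
    for theta_k, n_k in results:
        if len(theta_k) != dim:
            raise ValueError("all parameter vectors must have the same length")
        weight = n_k / total  # this client's share of all training samples
        for i, value in enumerate(theta_k):
            new_theta[i] += weight * value
    return new_theta


if __name__ == "__main__":
    # The client with 300 samples gets weight 0.75, the one with 100 gets 0.25.
    print(fedavg_aggregate([([1.0, 2.0], 300), ([5.0, 6.0], 100)]))  # → [2.0, 3.0]
```

Weighting by $n_k$ means a client holding more data pulls the global model proportionally harder, which is what distinguishes FedAvg from a plain mean of client models.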
The server performs orchestration without embedding ML- or device-specific logic: extensibility is ensured at both the server (via new `Strategy` subclasses) and the client (by implementing `fit` and `evaluate` hooks) (Beutel et al., 2020, Roth et al., 2024).
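The decoupling described above can be sketched as a server loop that only calls into two abstract interfaces. This is a hypothetical, self-contained sketch: the classes `Client`, `Strategy`, `SampleAndAverage`, and `ToyClient` and the function `run_fl_loop` are illustrative, and the method names follow this section's `configure_round`/`aggregate_round` notation rather than Flower's actual `Strategy` API.

```python
import random
from abc import ABC, abstractmethod
from typing import Dict, List, Sequence, Tuple

Params = List[float]
FitResult = Tuple[Params, int]  # (locally trained parameters, sample count)


class Client(ABC):
    """Hypothetical client interface: only fit/evaluate hooks, no transport code."""

    @abstractmethod
    def fit(self, params: Params, config: Dict) -> FitResult: ...

    @abstractmethod
    def evaluate(self, params: Params, config: Dict) -> float: ...


class Strategy(ABC):
    """Hypothetical strategy interface: owns client selection and aggregation."""

    @abstractmethod
    def configure_round(
        self, t: int, available: Sequence[Client]
    ) -> Tuple[List[Client], Dict]: ...

    @abstractmethod
    def aggregate_round(self, t: int, results: List[FitResult]) -> Params: ...


class SampleAndAverage(Strategy):
    """Samples a fraction of clients uniformly and averages by sample count."""

    def __init__(self, fraction: float, seed: int = 0) -> None:
        self.fraction = fraction
        self.rng = random.Random(seed)

    def configure_round(self, t, available):
        k = max(1, int(self.fraction * len(available)))
        return self.rng.sample(list(available), k), {"round": t, "local_epochs": 1}

    def aggregate_round(self, t, results):
        total = sum(n for _, n in results)
        dim = len(results[0][0])
        return [sum(p[i] * n for p, n in results) / total for i in range(dim)]


def run_fl_loop(
    strategy: Strategy, clients: Sequence[Client], params: Params, num_rounds: int
) -> Params:
    """Server loop: delegates every policy decision to the strategy."""
    for t in range(num_rounds):
        selected, config = strategy.configure_round(t, clients)
        results = [client.fit(params, config) for client in selected]
        params = strategy.aggregate_round(t, results)
    return params


class ToyClient(Client):
    """Toy client whose 'training' moves each parameter halfway toward a target."""

    def __init__(self, target: float, num_examples: int) -> None:
        self.target = target
        self.num_examples = num_examples

    def fit(self, params, config):
        return [p + 0.5 * (self.target - p) for p in params], self.num_examples

    def evaluate(self, params, config):
        return sum((p - self.target) ** 2 for p in params)


if __name__ == "__main__":
    clients = [ToyClient(1.0, 10), ToyClient(3.0, 10)]
    print(run_fl_loop(SampleAndAverage(fraction=1.0), clients, [0.0], num_rounds=2))  # → [1.5]
```

Swapping `SampleAndAverage` for another `Strategy` subclass changes selection or aggregation without touching `run_fl_loop` or any client code, which is the design point the paragraph makes.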
2. Orchestration Topologies: Centralized, Decentralized, and Serverless
Flower’s orchestration abstractions enable multiple deployment and topology paradigms:
Centralized Orchestration
Default FL scenarios employ a single server orchestrating synchronous rounds, with clients connecting over gRPC/TCP (Mathur et al., 2021, Belenguer et al., 15 Jan 2025). The server manages client selection, message serialization (e.g., Protobuf, Avro), and aggregation. Strategies like FedAvg, FedAdam, and FedProx are supported natively (Roth et al., 2024).
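As a rough illustration of the serialization step, the sketch below turns per-layer model parameters into byte blobs that a transport message could carry, and back. The helper names `params_to_bytes` and `bytes_to_params` are hypothetical, and the little-endian float64 layout is an assumption; Flower's own Protobuf messages and serialization helpers differ in detail.

```python
import struct
from typing import List

Layer = List[float]


def params_to_bytes(params: List[Layer]) -> List[bytes]:
    """Encode each layer as one little-endian float64 byte blob."""
    return [struct.pack(f"<{len(layer)}d", *layer) for layer in params]


def bytes_to_params(blobs: List[bytes]) -> List[Layer]:
    """Decode blobs produced by params_to_bytes (8 bytes per float64)."""
    return [list(struct.unpack(f"<{len(blob) // 8}d", blob)) for blob in blobs]


if __name__ == "__main__":
    weights = [[0.5, -1.25], [3.0]]
    blobs = params_to_bytes(weights)
    print([len(b) for b in blobs])  # → [16, 8]
    print(bytes_to_params(blobs) == weights)  # → True
```

Keeping one blob per layer lets the receiver restore the layer boundaries without any extra shape metadata in this toy case.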
Decentralized and Gossip-Based Orchestration
GLow extends Flower by “rewiring” the FL loop for fully decentralized, gossip-style learning (Belenguer et al., 15 Jan 2025). In GLow:
- No central server exists. Each peer can act as a client or (rotating) head node.
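- A Topology Generator builds a neighbor graph; in each iteration, the peer $h$ selected as head requests local weights from its neighbors $\mathcal{N}(h)$, aggregates them with its own by equal-weight averaging across the models, and updates its model:
$$\theta_h \leftarrow \frac{1}{|\mathcal{N}(h)| + 1}\left(\theta_h + \sum_{j \in \mathcal{N}(h)} \theta_j\right).$$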
- Orchestration logic is implemented as a “strategy” injected into Flower’s FL loop, preserving simulation semantics but inverting the flow to support peer-to-peer handshakes. This enables simulation of ring, double ring, fully connected, or arbitrary topologies (Belenguer et al., 15 Jan 2025).
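The gossip mechanics above can be sketched without any server. This is a hypothetical illustration, not GLow's code: `build_topology`, `gossip_step`, and `run_gossip` are invented names, round-robin head rotation is an assumption, and local training between gossip steps is omitted.

```python
from typing import Dict, List

Params = List[float]


def build_topology(n: int, kind: str) -> Dict[int, List[int]]:
    """Hypothetical topology generator: maps each peer id to its sorted neighbor ids."""
    if kind == "full":
        return {i: [j for j in range(n) if j != i] for i in range(n)}
    offsets = {"ring": [1], "double_ring": [1, 2]}.get(kind)
    if offsets is None:
        raise ValueError(f"unknown topology: {kind}")
    graph: Dict[int, List[int]] = {}
    for i in range(n):
        neighbors = {(i + o) % n for o in offsets} | {(i - o) % n for o in offsets}
        neighbors.discard(i)  # tiny rings can wrap around to the peer itself
        graph[i] = sorted(neighbors)
    return graph


def gossip_step(models: Dict[int, Params], graph: Dict[int, List[int]], head: int) -> None:
    """The head pulls its neighbors' weights and keeps the equal-weight mean."""
    pulled = [models[head]] + [models[j] for j in graph[head]]
    dim = len(models[head])
    models[head] = [sum(m[i] for m in pulled) / len(pulled) for i in range(dim)]


def run_gossip(
    models: Dict[int, Params], graph: Dict[int, List[int]], iterations: int
) -> Dict[int, Params]:
    for it in range(iterations):
        head = it % len(models)  # round-robin head rotation (an assumption)
        # Local training on each peer would happen here; it is omitted in this sketch.
        gossip_step(models, graph, head)
    return models


if __name__ == "__main__":
    ring = build_topology(4, "ring")
    print(ring)  # → {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [0, 2]}
    models = {0: [0.0], 1: [3.0], 2: [6.0], 3: [9.0]}
    gossip_step(models, ring, head=0)
    print(models[0])  # → [4.0]
```

Only the head's model changes in each step, so information spreads hop by hop along the graph; denser topologies such as the double ring or fully connected graph mix models in fewer iterations at the cost of more pulls per step.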
Serverless and Asynchronous Orchestration
The flwr-serverless extension further generalizes orchestration by eliminating the server: clients synchronize and aggregate updates purely via a shared object store (“weight store”), implementing synchronous or fully asynchronous (FedAvgAsync) protocols (Namjoshi et al., 2023). Each client runs an AsyncFederatedNode, pushing and pulling weights via S3, Azure, or local folders and aggregating locally using standard Flower strategies. Asynchronous execution removes the straggler bottleneck of lockstep rounds, since no client waits for the slowest peer, and makes orchestration resilient to slow or failed clients (Namjoshi et al., 2023).
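The push/pull pattern can be sketched with a local folder standing in for the shared object store. This is a minimal sketch under stated assumptions: `FolderWeightStore` and `AsyncNode` are hypothetical classes, not the flwr-serverless `AsyncFederatedNode` API, and aggregation is a plain sample-weighted average over whatever peers have published.

```python
import json
import os
import tempfile
from typing import Dict, List, Tuple

Params = List[float]


class FolderWeightStore:
    """Hypothetical shared weight store backed by a local folder (stand-in for S3/Azure)."""

    def __init__(self, root: str) -> None:
        self.root = root
        os.makedirs(root, exist_ok=True)

    def push(self, node_id: str, params: Params, num_examples: int) -> None:
        path = os.path.join(self.root, f"{node_id}.json")
        tmp = path + ".tmp"
        with open(tmp, "w") as f:
            json.dump({"params": params, "n": num_examples}, f)
        os.replace(tmp, path)  # rename into place so readers never see a partial file

    def pull_all(self) -> Dict[str, Tuple[Params, int]]:
        out: Dict[str, Tuple[Params, int]] = {}
        for name in os.listdir(self.root):
            if name.endswith(".json"):
                with open(os.path.join(self.root, name)) as f:
                    rec = json.load(f)
                out[name[: -len(".json")]] = (rec["params"], rec["n"])
        return out


class AsyncNode:
    """Publishes its own weights, then aggregates whatever peers have published; never waits."""

    def __init__(self, node_id: str, store: FolderWeightStore) -> None:
        self.node_id = node_id
        self.store = store

    def sync(self, params: Params, num_examples: int) -> Params:
        self.store.push(self.node_id, params, num_examples)
        entries = list(self.store.pull_all().values())
        total = sum(n for _, n in entries)
        return [sum(p[i] * n for p, n in entries) / total for i in range(len(params))]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        store = FolderWeightStore(root)
        a, b = AsyncNode("a", store), AsyncNode("b", store)
        print(a.sync([1.0], num_examples=1))  # → [1.0]: only a has published so far
        print(b.sync([3.0], num_examples=1))  # → [2.0]: b also sees a's weights
```

Because `sync` never blocks on peers, a node simply aggregates the latest snapshot it can see; this sketch treats all published weights alike regardless of age, so it shares the staleness-unaware behavior noted among the limitations below.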
3. Advanced Scheduling, Profiling, and Resource Awareness
Resource heterogeneity and stragglers present core challenges for federated orchestration at scale. Flower provides several extensibility points and built-in mechanisms to profile, schedule, and adapt to client capabilities:
- Protea introduces a client-side profiling layer integrated with Flower, collecting per-round CPU, memory, and GPU utilization, and reporting a single resource score to the server (Zhao et al., 2022). The orchestration strategy (`ProteaFedAvg`) selects clients for each round based on resource profiling, prioritizing the most capable clients to maximize throughput and reduce round duration.
- Scheduling enhancements include support for per-client timeouts, weighted aggregation by partial work, and importance sampling to improve robustness under device/network variability (Beutel et al., 2020).
- Profiling and instrumentation add ∼1% overhead per client round, and resource-aware orchestration achieves a 1.29× speedup and ∼30% shorter round duration compared to standard FedAvg (Zhao et al., 2022).
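Resource-aware selection of the kind described for Protea can be sketched as "collapse a utilization profile into one score, then pick the top-k clients". The score formula below (a weighted sum of spare capacity with weights 0.4/0.2/0.4) is purely an assumption for illustration, and `ResourceProfile`, `resource_score`, and `select_clients` are hypothetical names, not `ProteaFedAvg`'s actual logic.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class ResourceProfile:
    """Per-round utilization fractions in [0, 1], as a client-side profiler might report."""

    cpu: float
    memory: float
    gpu: float


def resource_score(
    p: ResourceProfile, weights: Tuple[float, float, float] = (0.4, 0.2, 0.4)
) -> float:
    """Hypothetical scalar score: weighted spare capacity (higher means more capable)."""
    w_cpu, w_mem, w_gpu = weights
    return w_cpu * (1 - p.cpu) + w_mem * (1 - p.memory) + w_gpu * (1 - p.gpu)


def select_clients(scores: Dict[str, float], k: int) -> List[str]:
    """Pick the k highest-scoring clients; ties are broken by client id for determinism."""
    return sorted(scores, key=lambda cid: (-scores[cid], cid))[:k]


if __name__ == "__main__":
    profiles = {
        "a": ResourceProfile(cpu=0.9, memory=0.5, gpu=0.9),  # busy device
        "b": ResourceProfile(cpu=0.2, memory=0.3, gpu=0.1),  # mostly idle device
        "c": ResourceProfile(cpu=0.5, memory=0.5, gpu=0.5),
    }
    scores = {cid: resource_score(p) for cid, p in profiles.items()}
    print(select_clients(scores, k=2))  # → ['b', 'c']
```

Sending one scalar per client keeps the reporting overhead small, which is consistent with the low profiling cost quoted above; the trade-off is that always favoring the most capable clients can under-sample slower devices and their data.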
4. Integration with External Orchestration Platforms
Federated orchestration using Flower extends into cloud-native and production environments:
- Kubernetes clusters can integrate Flower-based FL controllers for large-scale orchestration, as demonstrated in carbon-aware container scheduling scenarios (Saad et al., 4 Oct 2025). Here, Flower orchestrates XGBoost model training across distributed Kepler-instrumented agents, using specialized aggregation strategies (FedXgbBagging), and integrates with the cluster's scheduling plugins to route workloads toward nodes with optimal carbon–power profiles.
- By wrapping Flower apps as jobs within the NVIDIA FLARE production runtime, federated orchestration can benefit from additional resilience, certificate management, and multi-host reliability, while retaining Flower’s strategy flexibility (Roth et al., 2024).
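The "tree union" idea behind FedXgbBagging can be shown with toy decision stumps: the server appends each client's newly grown trees to the global ensemble, and predictions remain additive over all trees. This is a hypothetical sketch; `Stump`, `bagging_aggregate`, and `ensemble_predict` are illustrative names, and the real strategy operates on actual XGBoost models rather than toy Python objects.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class Stump:
    """Toy depth-1 regression tree standing in for an XGBoost tree."""

    feature: int
    threshold: float
    left: float
    right: float

    def predict(self, x: Sequence[float]) -> float:
        return self.left if x[self.feature] < self.threshold else self.right


def bagging_aggregate(
    global_trees: List[Stump], client_new_trees: List[List[Stump]]
) -> List[Stump]:
    """Tree-union aggregation: append every client's newly grown trees to the global ensemble."""
    merged = list(global_trees)
    for trees in client_new_trees:
        merged.extend(trees)
    return merged


def ensemble_predict(trees: List[Stump], x: Sequence[float], base: float = 0.0) -> float:
    """Additive ensemble output (a raw margin; no link function applied)."""
    return base + sum(t.predict(x) for t in trees)


if __name__ == "__main__":
    global_trees = [Stump(feature=0, threshold=0.5, left=-1.0, right=1.0)]
    client_trees = [
        [Stump(feature=1, threshold=0.0, left=0.25, right=0.5)],   # grown on client 1
        [Stump(feature=0, threshold=2.0, left=0.25, right=-0.25)],  # grown on client 2
    ]
    merged = bagging_aggregate(global_trees, client_trees)
    print(len(merged), ensemble_predict(merged, [1.0, 1.0]))  # → 3 1.75
```

Unlike parameter averaging, tree union never mixes client models numerically; the ensemble grows by the number of participating clients' trees each round.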
5. Scalability, Extensibility, and Experimental Findings
Flower supports large-scale federated simulation and real-device experiments. Via its Virtual Client Engine (VCE), Flower can simulate up to 15 million clients on two V100 GPUs by instantiating proxies and swapping models/data on demand (Beutel et al., 2020):
- Experiments show near-linear wall-clock scaling as simulated active clients increase.
- Heterogeneity experiments on Jetson, Raspberry Pi, and Android devices evaluate round time and energy.
- Federated orchestration overhead is minimal compared to client-local training (Beutel et al., 2020).
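The proxy-and-swap idea behind the Virtual Client Engine can be sketched as follows: the simulation holds only lightweight client handles, and a client's data and local model exist only while that client is training, with a bounded pool limiting how many run concurrently. This is a hypothetical sketch; `VirtualClientProxy`, `make_fit_fn`, and `simulate_round` are invented names, the toy update rule is arbitrary, and the real engine's scheduling and resource management are not reproduced.

```python
import random
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Tuple

Params = List[float]
FitFn = Callable[[int, Params], Tuple[Params, int]]


class VirtualClientProxy:
    """Lightweight handle: stores only a client id, never a model or a dataset."""

    __slots__ = ("cid",)

    def __init__(self, cid: int) -> None:
        self.cid = cid


def make_fit_fn(load_data: Callable[[int], List[float]]) -> FitFn:
    """Builds a fit function that materializes client state only while it runs."""

    def fit(cid: int, params: Params) -> Tuple[Params, int]:
        data = load_data(cid)  # load this client's data on demand
        target = sum(data) / len(data)
        local = [p + 0.5 * (target - p) for p in params]  # toy local update
        return local, len(data)  # data and local model are released on return

    return fit


def simulate_round(
    proxies: List[VirtualClientProxy],
    params: Params,
    fit: FitFn,
    sample_size: int,
    max_parallel: int,
    rng: random.Random,
) -> Params:
    """Samples proxies, runs at most max_parallel clients at once, then averages by sample count."""
    selected = rng.sample(proxies, sample_size)
    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        results = list(pool.map(lambda proxy: fit(proxy.cid, params), selected))
    total = sum(n for _, n in results)
    return [sum(r[i] * n for r, n in results) / total for i in range(len(params))]


if __name__ == "__main__":
    proxies = [VirtualClientProxy(cid) for cid in range(100_000)]  # cheap: ids only
    fit = make_fit_fn(lambda cid: [float(cid % 10)] * (1 + cid % 3))
    print(simulate_round(proxies, [0.0], fit, sample_size=8, max_parallel=2, rng=random.Random(0)))
```

Memory then scales with `max_parallel` active clients rather than with the total population, which is why very large simulated populations remain feasible on a few accelerators.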
Decentralized orchestration with GLow reaches accuracy competitive with centralized and classical federated strategies on MNIST, with a modest gap on CIFAR10. On MNIST, GLow achieves 0.987 accuracy (FedAvg: 0.985–0.986, Central: 0.989); on CIFAR10, 0.754 vs. 0.778–0.791 (FedAvg) (Belenguer et al., 15 Jan 2025).
In serverless settings, flwr-serverless achieves similar test accuracy and modest speedup (8–15%) compared to lockstep federated rounds, with only slight degradation in heavy label-skew scenarios (Namjoshi et al., 2023).
6. Limitations and Areas for Future Research
Flower’s current orchestration design, while modular, is subject to several limitations identified in the literature:
- Extremely large, non-IID, or highly variable client populations can slow convergence due to statistical heterogeneity (Beutel et al., 2020).
- Existing secure aggregation modules (e.g., Salvia) scale linearly with client count, but further research is required for millions of clients and for efficient decentralized privacy primitives (Beutel et al., 2020).
- Serverless orchestration is not yet staleness- or recency-aware, and scaling to 10+ real-network nodes with high concurrency requires further validation (Namjoshi et al., 2023).
- GLow and related decentralized strategies are in early development; Byzantine fault-tolerance and dynamic topology adaptation remain open topics (Belenguer et al., 15 Jan 2025).
Planned future directions include hierarchical orchestration (multi-level cloud/edge federations), adaptive resource- and heterogeneity-aware strategies, direct integration with containerized cloud scheduling primitives, and formal incorporation of privacy-preserving mechanisms such as differential privacy and hardware-enforced attestation (Beutel et al., 2020, Saad et al., 4 Oct 2025).
7. Summary Table: Key Flower Orchestration Extensions
| Name/Extension | Orchestration Mode | Distinguishing Mechanism |
|---|---|---|
| Standard FedAvg | Central server, sync rounds | Weighted averaging; stateless clients |
| GLow (Belenguer et al., 15 Jan 2025) | Decentralized/Gossip | Rotating head, peer-to-peer pulls, topology-driven |
| flwr-serverless (Namjoshi et al., 2023) | Serverless, async/sync | Shared object store for push/pull, local aggregation |
| Protea (Zhao et al., 2022) | Resource-aware scheduling | Per-client profiling, adaptive client selection |
| FedXgbBagging (Saad et al., 4 Oct 2025) | Container orchestration | Tree union aggregation (XGBoost), tight Kubernetes binding |
All orchestrations inherit from Flower’s extensible strategy and client interface, enabling custom aggregation, sampling, evaluation, and fault/scaling policies, and providing a reusable foundation for experimental and production federated ML systems.