
Independent Client Sampling in Federated Learning

Updated 5 November 2025
  • Independent client sampling selects clients via independent Bernoulli or multinomial draws, enabling unbiased gradient aggregation in distributed learning.
  • This methodology enhances scalability, fairness, and privacy by accommodating non-i.i.d. data and reducing the impact of stragglers in federated learning settings.
  • Adaptive and stratified extensions optimize convergence and resource allocation by tuning inclusion probabilities based on client statistics, system constraints, and privacy budgets.

Independent client sampling refers to a family of methodologies in distributed statistical learning and quality control—most prominently in federated learning (FL)—that select clients (or units) for participation in each round or stage via independent Bernoulli or multinomial draws, often with arbitrary inclusion probabilities. This design contrasts with fully deterministic or coordinated (dependent) sampling and has become foundational for scalability, privacy, fairness, and statistical efficiency in modern multi-device and privacy-aware distributed learning settings.

1. Principles and Mathematical Formulation

In independent client sampling, each client $i \in \{1, \dots, N\}$ is independently sampled in round $t$ with probability $q_i^t$, possibly dependent on system, statistical, or privacy constraints. Formally:

$$\mathbb{I}_i^t \sim \mathrm{Bernoulli}(q_i^t)$$

where $\mathbb{I}_i^t$ indicates client $i$'s participation. The set of sampled clients is $S^t = \{\, i : \mathbb{I}_i^t = 1 \,\}$. Aggregation rules are typically designed for unbiasedness; for FL gradient aggregation,

$$\hat{g}^t = \sum_{i=1}^N a_i \frac{\mathbb{I}_i^t}{q_i^t} g_i^t$$

where $a_i$ is client $i$'s data weight and $g_i^t$ its local (stochastic) update. This structure ensures $\mathbb{E}[\hat{g}^t] = \sum_{i=1}^N a_i g_i^t$ for any choice of inclusion probabilities, provided each $q_i^t > 0$.
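A minimal NumPy sketch of one such round (illustrative only; the client count, weights, and probabilities below are arbitrary placeholders), with an empirical check of the unbiasedness property above:

```python
import numpy as np

def sample_and_aggregate(gradients, weights, q, rng):
    """One round: draw I_i ~ Bernoulli(q_i) independently, then debias the
    aggregate by inverse-probability weighting, a_i * (I_i / q_i) * g_i."""
    mask = rng.random(len(q)) < q            # independent Bernoulli draws
    return (weights * mask / q) @ gradients  # unbiased for any q_i > 0

rng = np.random.default_rng(0)
N, d = 100, 5
g = rng.normal(size=(N, d))                  # stand-in local updates g_i^t
a = np.full(N, 1.0 / N)                      # data weights a_i
q = rng.uniform(0.2, 0.9, size=N)            # arbitrary inclusion probabilities

# Averaging many independent rounds recovers the full aggregate sum_i a_i g_i.
est = np.mean([sample_and_aggregate(g, a, q, rng) for _ in range(20000)], axis=0)
print(np.allclose(est, a @ g, atol=1e-2))    # True
```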

The analysis of convergence and efficiency depends on the properties of the induced random aggregation weights (variance, covariance), the system and statistical heterogeneity across clients, and possibly additional constraints (privacy budgets, bandwidth allocations). Notable theoretical results show that even non-uniform, arbitrary $q_i^t$ can be accommodated while retaining convergence guarantees, provided the aggregation is appropriately debiased (Fraboni et al., 2021, Luo et al., 2021, Grudzień et al., 2022, Geng et al., 15 Feb 2024).

2. Historical Context and Motivating Applications

Independent client sampling has evolved from earlier acceptance sampling in industrial quality control (Steland, 2014), where independent test (unit) sampling enables tractable, unbiased inference under unknown distributions. In federated learning, where global data centralization is infeasible, independent sampling supports scalable participation under communication constraints, privacy preservation, and robustness to stragglers and intermittent client availability.

Early works treat uniform random sampling as the default; later work extends and refines it into stratified, importance-weighted, and fairness-aware variants.

3. Advanced Methodologies: Adaptive, Stratified, Privacy- and Fairness-Aware Sampling

A range of strategies build upon the independent client sampling paradigm to address practical challenges:

a) Adaptive Probability Tuning

Sampling probabilities are optimized to minimize convergence time or estimator variance, leveraging statistical heterogeneity (local data size, gradient norms) and system constraints (bandwidth, computation), as in the sketch below.
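One standard instance, sketched here under simplifying assumptions: with a per-round budget of $K$ expected participants, the variance of the debiased aggregate is minimized (ignoring clipping at 1) by $q_i^t \propto a_i \lVert g_i^t \rVert$. The rescaling loop below is one simple way to handle probabilities that clip at 1; a real system would estimate the norms from stale or compressed updates.

```python
import numpy as np

def importance_probs(weights, grad_norms, budget):
    """Inclusion probabilities q_i proportional to a_i * ||g_i|| (assumed
    positive), clipped to [0, 1] and rescaled so the expected number of
    participants equals `budget`."""
    scores = np.asarray(weights, float) * np.asarray(grad_norms, float)
    q = budget * scores / scores.sum()
    while True:
        clipped = q >= 1.0
        q[clipped] = 1.0
        free = ~clipped
        deficit = budget - q.sum()
        if deficit <= 1e-12 or not free.any():
            return q
        # Redistribute the clipped-away mass; loop in case new entries clip.
        q[free] *= 1.0 + deficit / q[free].sum()
```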

b) Stratified and Importance Sampling

Clients are grouped into strata based on compressed gradients or metadata, with Neyman allocation minimizing estimator variance under round-level sampling quotas (Slessor et al., 18 Dec 2024). Importance sampling adjusts $q_i^t$ in proportion to informative statistics (e.g., gradient norms, diversity metrics).
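A minimal sketch of Neyman allocation, the classical stratified-sampling rule (the mapping to FedSTaS's exact quantities is schematic): stratum $h$ with $N_h$ clients and within-stratum update dispersion $\sigma_h$ receives a share of the per-round budget proportional to $N_h \sigma_h$.

```python
import numpy as np

def neyman_allocation(strata_sizes, strata_stds, budget):
    """Split a per-round client budget across strata with n_h ∝ N_h * σ_h,
    the allocation minimizing stratified-estimator variance for a fixed
    total budget; largest-remainder rounding keeps the total exact."""
    sizes = np.asarray(strata_sizes, dtype=float)
    stds = np.asarray(strata_stds, dtype=float)
    shares = budget * sizes * stds / np.sum(sizes * stds)
    n = np.floor(shares).astype(int)
    order = np.argsort(shares - n)[::-1]       # largest fractional parts first
    n[order[: budget - n.sum()]] += 1
    return np.minimum(n, sizes.astype(int))    # cannot exceed stratum size

# High-dispersion strata are sampled more heavily than their size alone implies.
print(neyman_allocation([50, 30, 20], [1.0, 4.0, 0.5], budget=10))  # [3 7 0]
```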

c) Privacy-Aware and Individualized Sampling

To achieve personalized or heterogeneous differential privacy (DP) guarantees:

  • Individualized DP through Sampling: Each client sets a privacy budget $\varepsilon_i$. The server computes a group-specific sampling rate $q_i$ such that cumulative participation attains $(\varepsilon_i, \delta)$-DP with a fixed noise multiplier (Lange et al., 29 Jan 2025); see the calibration sketch after this list.
  • Game-Theoretic and Incentive-Aligned Sampling: Sampling probabilities are coordinated via economic mechanisms—clients declare privacy cost functions, and Stackelberg equilibria yield an optimal trade-off between privacy, participation, and model utility (Yuan et al., 7 Dec 2024).
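A schematic sketch of the rate calibration in the individualized-DP bullet above. The `privacy_loss` argument is a placeholder for a real DP accountant (e.g., a subsampled-Gaussian Rényi accountant); only the generic bisection is shown, relying on the fact that privacy loss grows monotonically with the sampling rate.

```python
def calibrate_rate(eps_target, delta, sigma, rounds, privacy_loss,
                   lo=0.0, hi=1.0, iters=50):
    """Largest sampling rate q whose cumulative privacy loss over `rounds`
    participations with fixed noise multiplier `sigma` stays within
    eps_target at the given delta. `privacy_loss(q, sigma, rounds, delta)`
    must be monotonically increasing in q (true for subsampled mechanisms)."""
    for _ in range(iters):
        mid = (lo + hi) / 2.0
        if privacy_loss(mid, sigma, rounds, delta) <= eps_target:
            lo = mid   # mid is private enough: try a larger rate
        else:
            hi = mid
    return lo

# Per-group rates: clients declaring larger eps_i are sampled more often, e.g.
# rates = {eps: calibrate_rate(eps, 1e-5, sigma=1.0, rounds=500,
#                              privacy_loss=accountant_fn)
#          for eps in (1.0, 2.0, 4.0)}
```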

d) Fairness-Driven and Diversity-Promoting Approaches

Limitations of uniform sampling in representing all clients equitably motivate submodular maximization approaches (SUBTRUNC, UNIONFL) (Jiménez et al., 24 Aug 2024), or graph-based diversity constraints (Wang et al., 2022), to ensure balanced or diverse participation over time.
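As a concrete (hedged) instance of the submodular approach, the sketch below greedily maximizes a facility-location objective $G(S) = \sum_j \max_{i \in S} \mathrm{sim}(j, i)$ over a client-similarity matrix; greedy gives the usual $(1 - 1/e)$ approximation for monotone submodular objectives. The SUBTRUNC/UNIONFL objectives add further terms (e.g., a truncated fairness component) that this sketch omits.

```python
import numpy as np

def greedy_facility_location(sim, k):
    """Greedily pick k clients maximizing G(S) = sum_j max_{i in S} sim[j, i],
    where sim[j, i] is a similarity between clients j and i (e.g., computed
    from compressed updates). G is monotone submodular, so greedy selection
    achieves a (1 - 1/e) approximation."""
    sim = np.asarray(sim, dtype=float)
    cover = np.zeros(sim.shape[0])    # best similarity each client has to S
    selected = []
    for _ in range(k):
        gains = np.maximum(sim - cover[:, None], 0.0).sum(axis=0)
        gains[selected] = -np.inf     # never re-pick a selected client
        i = int(np.argmax(gains))
        selected.append(i)
        cover = np.maximum(cover, sim[:, i])
    return selected
```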

4. Implications: Convergence, Efficiency, Privacy, and Fairness

Empirical and theoretical studies support clear distinctions between independent and non-independent sampling:

| Dimension | Independent sampling (with optimal $q_i^t$) | Uniform/dependent sampling |
| --- | --- | --- |
| Convergence | Tight bounds; rate scales with inverse of $q_i^t$ | Slower for heterogeneous $g_i$ or $t_i$ |
| Wall-clock time | Minimized via $q_i^t$ adaptive to slow clients | Bottlenecked by stragglers |
| Variance | Can be minimized/adapted round-wise | Higher; may include redundant clients |
| Privacy | Enables individualized budgets; efficient under DP | Weak for heterogeneous privacy needs |
| Fairness | Customizable for inclusion/diversity; resilient | Exclusion or bias possible |
| Practicality | Robust to client dropout, varying availability | Less robust |

Empirical benchmarks show wall-clock speedups of $1.5$–$7\times$ over uniform sampling (Geng et al., 15 Feb 2024, Luo et al., 2021), reduction in regret by a factor proportional to the communication budget (Zeng et al., 2023), and up to several percent improvement in global model accuracy and fairness (Lange et al., 29 Jan 2025, Jiménez et al., 24 Aug 2024).

5. Specialized Domains and Extensions

a) Streaming and Online Sampling

Under streaming, non-i.i.d. client data, sample selection for local labeling must be performed instantaneously and independently, often under memory and budget constraints. Recent work introduces numerically robust online batch selection using volume sampling and Cholesky updates in high-dimensional embedding spaces (Röder et al., 30 Aug 2024).
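A numerically simple sketch of volume-style greedy batch selection (assumption: this is the textbook greedy max-volume heuristic via incremental orthogonalization, equivalent to pivoted Cholesky on the Gram matrix, not the exact routine of Röder et al.):

```python
import numpy as np

def greedy_volume_batch(X, k, tol=1e-12):
    """Select up to k rows of X greedily maximizing the volume (determinant of
    the Gram matrix) of the chosen set. Each step picks the point with the
    largest squared residual after projecting out already-selected directions,
    i.e. pivoted Cholesky on X @ X.T in disguise."""
    X = np.asarray(X, dtype=float)
    residual_sq = np.einsum("ij,ij->i", X, X)   # squared norms, updated below
    basis, chosen = [], []
    for _ in range(k):
        i = int(np.argmax(residual_sq))
        if residual_sq[i] < tol:                # remaining points are in span
            break
        chosen.append(i)
        v = X[i].copy()
        for b in basis:                          # Gram-Schmidt against selection
            v -= (v @ b) * b
        v /= np.linalg.norm(v)
        basis.append(v)
        proj = X @ v
        residual_sq = np.maximum(residual_sq - proj**2, 0.0)
    return chosen
```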

b) Acceptance Sampling and Quality Control

In industrial settings (e.g., photovoltaics), independent sampling underpins control-inspection schemes where OC curves, decision limits, and sample size planning are derived under arbitrary distributional assumptions using nonparametric quantile estimates (Steland, 2014).
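For intuition, here is a textbook single-stage acceptance plan and its OC curve (a generic illustration, not the specific nonparametric scheme of Steland, 2014):

```python
from math import comb

def oc_curve(n, c, defect_rates):
    """Operating-characteristic curve of an (n, c) plan: probability of
    accepting a lot with defect rate p, i.e. P(#defects <= c) for a
    Binomial(n, p) sample: sum_{d=0}^{c} C(n, d) p^d (1 - p)^(n - d)."""
    return [sum(comb(n, d) * p**d * (1 - p) ** (n - d) for d in range(c + 1))
            for p in defect_rates]

# Sample 50 units, accept the lot if at most 2 are defective.
print([round(pa, 3) for pa in oc_curve(50, 2, [0.01, 0.05, 0.10])])
```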

c) Privacy-Preserving Aggregation

FedSTaS and related approaches integrate locally differentially private reporting of client data statistics in sampling and aggregation (Slessor et al., 18 Dec 2024).

d) Arbitrary Client Availability

FedGS demonstrates independent diversity- and fairness-aware sampling under arbitrary, possibly adversarial, client availability modes using data-distribution graphs and constrained variance optimization (Wang et al., 2022).

6. Open Challenges and Future Directions

Despite broad utility, several open challenges remain:

  • Non-i.i.d. Data Regimes: Variance in updates and DP noise have amplified impacts when data is highly skewed and clients have limited, heterogeneous datasets, limiting achievable utility (Lange et al., 29 Jan 2025, Slessor et al., 18 Dec 2024).
  • Real-time Optimization: Estimating optimal $q_i^t$ online in resource-constrained or privacy-limited settings (especially for rapidly changing environments) remains an active problem (Zeng et al., 2023, Zhao et al., 2021).
  • Scalability of Complex Sampling Algorithms: Approaches using submodular maximization, graph constraints, or advanced privacy mechanisms must be engineered for massive-scale deployment with thousands to millions of devices (Jiménez et al., 24 Aug 2024, Wang et al., 2022).
  • Usability and User-Centric Privacy: Enabling informed user selection of privacy budgets and exposing the consequences to end users is an unsolved system and UI problem (Lange et al., 29 Jan 2025).

7. Summary Table of Representative Algorithms and Theoretical Guarantees

| Method/class | Objective (key formula) | Primary guarantee/result |
| --- | --- | --- |
| Variance-minimizing IS | $p_i^t \propto \sqrt{\alpha_1\zeta_{G,i,t}^2 + \alpha_2\sigma_{L,i}^2}$ | Optimal variance under unbiasedness (Wang et al., 2022) |
| Bandit/online OSMD | Minimize $l_t(q) = \frac{1}{K}\sum_{i}\frac{a_i^t}{q_i}$ | Dynamic regret bounds; adaptivity (Zhao et al., 2021) |
| Privacy-aware/IDP | $q_i = \mathrm{getSampleRate}(\varepsilon_i, \delta, \sigma)$ | Per-client $(\varepsilon_i, \delta)$-DP (Lange et al., 29 Jan 2025) |
| Fairness/submodular | $\max_S G(S) + \lambda\min(b, F(S))$ | Improved client dissimilarity, strong convergence (Jiménez et al., 24 Aug 2024) |
| Adaptive bandwidth | Joint $\min_{\mathbf{q}}$ expected wall-clock time | $1.5$–$7\times$ speedup, full heterogeneity (Geng et al., 15 Feb 2024) |

References to Seminal Works

  • "A General Theory for Client Sampling in Federated Learning" (Fraboni et al., 2021)
  • "Adaptive Federated Learning in Heterogeneous Wireless Networks with Independent Sampling" (Geng et al., 15 Feb 2024)
  • "Federated Learning With Individualized Privacy Through Client Sampling" (Lange et al., 29 Jan 2025)
  • "Enhanced Federated Optimization: Adaptive Unbiased Client Sampling with Reduced Variance" (Zeng et al., 2023)
  • "FedGS: Federated Graph-based Sampling with Arbitrary Client Availability" (Wang et al., 2022)
  • "SUBTRUNC and UNIONFL: Submodular Maximization Approaches for Equitable Client Selection in Federated Learning" (Jiménez et al., 24 Aug 2024)
  • "LOCKS: User Differentially Private and Federated Optimal Client Sampling" (Mulay, 2022)
  • "FedSTaS: Client Stratification and Client Level Sampling for Efficient Federated Learning" (Slessor et al., 18 Dec 2024)

The field continues to rapidly develop novel independent sampling paradigms optimized for privacy, fairness, statistical efficiency, and practical deployment constraints.

