Weighted Sampling Strategy
- Weighted sampling strategy is a method where each item is selected with a probability proportional to its assigned weight, ensuring representativeness in diverse data applications.
- It employs various algorithmic techniques such as reservoir, WOR, and coordinated sampling to efficiently balance computational constraints and statistical accuracy.
- Applications span multiple domains—from temporal knowledge graphs and subgraph counting to language modeling—yielding improvements in estimation accuracy and model performance.
Weighted sampling strategy refers to any method in which items, sets, or interactions are drawn at random from a population with probabilities proportional to specified weights associated with the items. Weighted sampling appears as a central component in numerous subfields (e.g., streaming algorithms, graph analysis, machine learning, temporal knowledge graphs, survey inference, privacy-preserving computation). It encompasses a diverse array of algorithmic approaches depending on theoretical goals, structural constraints, and computational architectures.
1. Mathematical Foundations and Weight Construction
Weighted sampling is formally defined by associating with each unit $i$ a nonnegative weight $w_i$. For a population of $n$ items, the probability of drawing item $i$ can be either
- With replacement: $p_i = w_i / \sum_{j=1}^{n} w_j$ for each draw, or
- Without replacement: entries are drawn one by one, each time proportionally to their remaining (unpicked) weight, so the exact probability that $i$ appears in a sample of size $k$ is more intricate and involves combinatorial weighting over sampling orders (Ben-Hamou et al., 2016; Hübschle-Schneider et al., 2019).
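The two regimes above can be contrasted in a short Python sketch. The function names are illustrative, not taken from any cited work, and the without-replacement routine is the naive sequential procedure (linear scan per draw), chosen for clarity rather than speed.

```python
import random


def sample_with_replacement(items, weights, k, rng=None):
    """Draw k items independently, each with probability w_i / sum(w)."""
    rng = rng if rng is not None else random.Random()
    return rng.choices(items, weights=weights, k=k)


def sample_without_replacement(items, weights, k, rng=None):
    """Draw k distinct items one by one, each time proportionally
    to the weights of the items that have not been picked yet."""
    rng = rng if rng is not None else random.Random()
    pool = list(zip(items, weights))
    if k > len(pool):
        raise ValueError("k exceeds the population size")
    chosen = []
    for _ in range(k):
        total = sum(w for _, w in pool)
        if total <= 0:
            raise ValueError("remaining weights must have a positive sum")
        threshold = rng.random() * total
        running = 0.0
        pick = len(pool) - 1  # fallback guards against float rounding
        for idx, (_, w) in enumerate(pool):
            running += w
            if threshold < running:
                pick = idx
                break
        chosen.append(pool.pop(pick)[0])
    return chosen
```

In the without-replacement case the inclusion probability of an item depends on the order in which other items were removed, which is why no simple closed form like $w_i / \sum_j w_j$ applies.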
In many applications, the weighting function is not static but dynamically determined by properties of the data (e.g., frequency, importance, RL-predicted utility) or by statistical requirements (e.g., inverse-probability weights, stratification, sample design adjustments). For example, in temporal knowledge graphs, the weight of a quadruple $(s, r, o, t)$ is set as a symmetric function $f$ of the inverse frequencies of $s$ and $o$, using statistics over the training stream so that rare entities are up-sampled (Mirtaheri et al., 25 Jul 2025):
$$w(s, r, o, t) = f\left(\frac{1}{\mathrm{freq}(s)}, \frac{1}{\mathrm{freq}(o)}\right)$$
In streaming subgraph counting, edge weights $w(e)$ are determined by local and temporal feature vectors, which may be optimized using RL to minimize estimation error for downstream tasks (Wang et al., 2022).
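As a rough illustration of the inverse-frequency weighting described for temporal knowledge graphs, the sketch below counts how often each entity appears as subject or object and combines the two inverse frequencies with a symmetric function. The function name, the `combine` parameter, and the counting convention are assumptions made for illustration; the cited work may define frequencies differently.

```python
from collections import Counter


def inverse_frequency_weights(quads, combine=max):
    """Assign each (s, r, o, t) quadruple a weight built from the
    inverse frequencies of its subject and object entities.

    `combine` is any symmetric two-argument function, for example
    min, max, or a mean (hypothetical choice, not from the source).
    """
    freq = Counter()
    for s, _r, o, _t in quads:
        freq[s] += 1
        freq[o] += 1
    return [combine(1.0 / freq[s], 1.0 / freq[o]) for s, _r, o, _t in quads]


# Usage: quadruples touching a rare entity receive larger weights.
quads = [("a", "r", "b", 0), ("a", "r", "c", 1), ("a", "r", "b", 2)]
weights = inverse_frequency_weights(quads, combine=max)
```

With `combine=max` a quadruple is up-weighted if either endpoint is rare; with `combine=min` both endpoints must be rare.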
2. Weighted Sampling Algorithms: Core Procedures
Weighted sampling algorithms are distinguished by both sampling paradigm (with/without replacement, sequential/parallel) and by the structure of the population (flat sets, graphs, streams, joins, key-value maps):
- Reservoir Sampling for Streams: In sequential streaming, weighted reservoir sampling ensures that at all times the reservoir contains $k$ i.i.d. samples drawn proportionally to the current weights (Meligrana, 2024; Jayaram et al., 2019). For with-replacement sampling, each new arrival $i$ with weight $w_i$ replaces existing reservoir entries with probability $w_i / W_i$, where $W_i = \sum_{j \le i} w_j$ is the running total. A skip-based generalization computes, in expectation, the number of items to skip before the next replacement, greatly increasing efficiency when the reservoir is small relative to the stream.
- Without Replacement ("WOR") Sampling: Statistically, the concentration behavior of weighted sampling without replacement is controlled via martingale couplings and submartingale inequalities (Ben-Hamou et al., 2016). Algorithmically, WOR is often implemented using bottom-k or priority-key constructions: assign each item $i$ a key $u_i^{1/w_i}$ where $u_i \sim \mathrm{Uniform}(0, 1)$ or related, and select the top $k$ keys as the sample (Cohen et al., 2020; Hübschle-Schneider et al., 2019).
- Parallel/Distributed Settings: Efficient constructions (e.g., distributed alias tables, mapping-based reductions) support shared- and distributed-memory execution for high-velocity streams or large populations, achieving near-linear speedup (Hübschle-Schneider et al., 2019).
- Batch/Minibatch Sampling in ML: Batches in which a fixed fraction of examples is drawn from a weighted distribution (e.g., inverse-frequency) and the remainder uniformly are used in TKG and masked language modeling to prioritize rare or poorly learned items while maintaining generalization (Mirtaheri et al., 25 Jul 2025; Zhang et al., 2023).
- Coordinated/Correlated Sampling: For multiple related weight assignments (e.g., multi-period, multi-objective, multi-attribute data), coordinated bottom-k sampling via shared random seeds provides order-of-magnitude variance reduction for estimating aggregate functions involving max, min, or $L_1$ differences (0906.4560).
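The priority-key construction for without-replacement sampling can be sketched for a stream as follows. This is a minimal single-pass version in the style of the Efraimidis-Spirakis scheme, not the specific algorithm of any work cited above. It ranks items by $\log(u_i)/w_i$, which orders items identically to $u_i^{1/w_i}$ but is numerically safer for very large or very small weights. The function name is hypothetical.

```python
import heapq
import math
import random


def weighted_reservoir_wor(stream, k, rng=None):
    """Keep a weighted without-replacement sample of size k from a
    stream of (item, weight) pairs using random priority keys."""
    rng = rng if rng is not None else random.Random()
    heap = []  # min-heap of (key, arrival_index, item); root is the weakest key
    for n, (item, w) in enumerate(stream):
        if w <= 0:
            continue  # zero-weight items can never be selected
        u = 1.0 - rng.random()  # uniform on (0, 1], so log(u) is finite
        key = math.log(u) / w   # same ordering as u ** (1 / w)
        if len(heap) < k:
            heapq.heappush(heap, (key, n, item))
        elif key > heap[0][0]:
            heapq.heapreplace(heap, (key, n, item))
    return [item for _, _, item in heap]
```

Because each item's key depends only on its own weight and its own random draw, the same idea extends to coordinated sampling: reusing the same $u_i$ across several weight assignments makes the resulting samples overlap as much as possible.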
3. Adaptive and Optimized Weighted Sampling
Optimizing weighted sampling schedules is essential for efficiency and variance reduction. Typical adaptive strategies include:
- Variance-driven bin allocation (weighted ensemble sampling): In multiscale/Markov chain contexts, particles/replicas are allocated according to the square root of local variance (as estimated from a coarse model), minimizing mean squared error in time- or steady-state averages. The allocation formula is (Aristoff et al., 2018; Aristoff, 2016):
$$N_u \propto \sqrt{\hat{\sigma}_u^2}$$
where $\hat{\sigma}_u^2$ is an estimate of the local mutation variance in bin $u$.
- Reinforcement-learning optimized weights: In online streaming, RL is used to adapt edge weights dynamically for subgraph-reservoir sampling, balancing the value of immediate vs. future subgraph closures (Wang et al., 2022).
- Active and stratified weighted walks: In high-skew graphs, stratified weighted random walks modulate edge weights according to strata and variance proxies, efficiently oversampling small or important categories while controlling Markov chain mixing (Kurant et al., 2011).
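The square-root-of-variance allocation rule can be illustrated with a small helper that splits a fixed particle budget across bins. This sketch implements only the proportionality stated above; it ignores any additional factors (such as current bin weights) that a full weighted ensemble scheme may include, and the function name and the largest-remainder rounding are assumptions made for illustration.

```python
import math


def allocate_by_sqrt_variance(variances, total):
    """Split `total` particles across bins in proportion to the square
    root of each bin's variance estimate, rounding by largest remainder."""
    scores = [math.sqrt(max(v, 0.0)) for v in variances]
    norm = sum(scores)
    if norm == 0:
        scores = [1.0] * len(variances)  # no information: allocate evenly
        norm = float(len(variances))
    raw = [total * s / norm for s in scores]
    counts = [math.floor(r) for r in raw]
    leftover = total - sum(counts)
    # Hand out the remaining particles to the largest fractional parts.
    order = sorted(range(len(raw)), key=lambda i: raw[i] - counts[i], reverse=True)
    for i in order[:leftover]:
        counts[i] += 1
    return counts
```

For variance estimates `[1, 4, 9]` and a budget of 60, the square roots are 1, 2, and 3, so the bins receive 10, 20, and 30 particles.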
4. Applications Across Domains
Weighted sampling serves as a fundamental primitive in many research areas:
| Domain | Objective | Weighted Sampling Role |
|---|---|---|
| Streaming/Sketches | Sketch-based estimates of aggregates, heavy hitters | Bottom-$k$, Poisson, $\ell_p$-norm, and reservoir techniques (Cohen et al., 2020; Hübschle-Schneider et al., 2019) |
| Survey Inference | Design-based estimation with unequal inclusion probs | Weighted likelihood bootstrap, sandwich variance adjustment (Das et al., 15 Apr 2025) |
| Knowledge Graphs | Robust link prediction in long-tail, incremental graphs | Batch selection favoring rare-entity quadruples (Mirtaheri et al., 25 Jul 2025) |
| LLMs | Unbiased token embedding for rare-word representations | Token-masking probability proportional to inverse frequency or loss (Zhang et al., 2023) |
| Differential Privacy | Release of private samples/summary statistics | Post-processing nonprivate samples with DP-optimally adjusted weights (Cohen et al., 2020) |
| Graph Sampling | Extraction of representative subgraphs in massive graphs | Adaptive edge weighting and local update rules (Yousuf et al., 2019) |
| Multi-Criteria Optimization | Pareto front approximation in MCDM | Systematic grid, Dirichlet, stratified simple sampling (Williams et al., 2024) |
| Joins and Relational Data | Sampling from huge relational joins | Dynamic-programming weights, join-tree sampling (Shekelyan et al., 2022) |
Each setting tailors the notion of "importance" or "rarity" to a problem-specific signal measured by the weighting scheme, and the sampling algorithm is correspondingly adapted to exploit computational structure (e.g., streaming, batch, parallel).
5. Empirical Impact and Trade-offs
Reported results show that weighted sampling can improve estimation accuracy, model performance, and computational efficiency. For example, upweighting rare entities in TKG completion methods yields MRR improvements over uniform sampling, with negligible overhead when applied at the data-loader level (Mirtaheri et al., 25 Jul 2025). In streaming subgraph estimation, fine-tuned RL-based weighting delivers lower relative error and faster updates compared to uniform sampling of edges (Wang et al., 2022). In unsupervised LLM training, dynamic or frequency-based weighted masking raises sentence-representation quality (Spearman's $\rho$) on STS tasks, mainly via improved rare-token embeddings (Zhang et al., 2023).
Key trade-offs include:
- Tuning the fraction of weighted vs. uniform sampling to balance focus on rare examples against generalizability; the best-performing fraction is found empirically.
- Computational complexity vs. statistical benefit: skip-based reservoir sampling improves over naive per-item updating for small sample-to-population ratios, but its overhead dominates at high ratios (Meligrana, 2024).
- Memory and message complexity: distributed weighted SWOR achieves near-optimal communication, in contrast to naive global coordination (Jayaram et al., 2019).
- Redundancy vs. coverage in weight simplex sampling: grid-based approaches guarantee uniformity but scale poorly as the number of objectives grows; random Dirichlet or stratified LHS/LHHS offer scalable alternatives with stochastic coverage (Williams et al., 2024).
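The first trade-off in the list above, mixing weighted and uniform draws within a batch, can be sketched as follows. The name `weighted_fraction` and the function itself are hypothetical; the sketch samples with replacement for simplicity, whereas a real data loader may sample without replacement within an epoch.

```python
import random


def mixed_batch(items, weights, batch_size, weighted_fraction, rng=None):
    """Build a batch in which a fraction of examples is drawn in proportion
    to `weights` and the remainder is drawn uniformly at random."""
    if not 0.0 <= weighted_fraction <= 1.0:
        raise ValueError("weighted_fraction must lie in [0, 1]")
    rng = rng if rng is not None else random.Random()
    n_weighted = round(batch_size * weighted_fraction)
    weighted_part = rng.choices(items, weights=weights, k=n_weighted)
    uniform_part = rng.choices(items, k=batch_size - n_weighted)
    return weighted_part + uniform_part
```

Setting `weighted_fraction` to 0 recovers plain uniform batches and setting it to 1 gives fully weighted batches, so the parameter can be swept as an ordinary hyperparameter.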
6. Theoretical Guarantees and Statistical Properties
Weighted sampling algorithms are subject to rigorous unbiasedness and concentration guarantees:
- Horvitz-Thompson estimators: For any function $f$, $\hat{S} = \sum_{i \in S} f(i) / p_i$ is unbiased for $\sum_i f(i)$ when each $i$ is included in the sample $S$ with known probability $p_i > 0$ (0906.4560).
- Martingale and submartingale coupling: Sums sampled without replacement exhibit sub-Gaussian concentration similar to the with-replacement case, and the variance improves as the unsampled mass decreases (Ben-Hamou et al., 2016).
- Bounds on sample complexity for sum estimation: In the proportional-sampling model, $\Theta(\sqrt{n}/\varepsilon)$ samples suffice and are necessary for estimating the total weight $W = \sum_i w_i$ to relative error $\varepsilon$ with constant probability (Beretta et al., 2021).
For ensemble and stratified methods, rigorous optimization of allocation variables delivers provably minimal variance subject to budget constraints (Aristoff et al., 2018; Aristoff, 2016; Kurant et al., 2011). For private weighted sampling, the calibrated inclusion probabilities maximize reporting consistent with $(\varepsilon, \delta)$-DP constraints and rigorously outperform baseline histogram methods (Cohen et al., 2020).
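The unbiasedness of the Horvitz-Thompson estimator is easy to check numerically. The sketch below uses Poisson sampling (each item included independently with its own known probability), which is one of several designs for which the estimator applies; the function names are illustrative.

```python
import random


def poisson_sample(values, probs, rng=None):
    """Include each value independently with its own known probability.
    Returns (value, inclusion_probability) pairs for the sampled items."""
    rng = rng if rng is not None else random.Random()
    return [(v, p) for v, p in zip(values, probs) if rng.random() < p]


def horvitz_thompson_sum(sample):
    """Estimate the population total by inverse-probability weighting."""
    return sum(v / p for v, p in sample)


# Usage: averaging many independent estimates approaches the true total.
rng = random.Random(0)
values = list(range(1, 11))   # true total is 55
probs = [0.5] * len(values)
estimates = [horvitz_thompson_sum(poisson_sample(values, probs, rng))
             for _ in range(20000)]
mean_estimate = sum(estimates) / len(estimates)
```

Any single estimate can be far from the true total; unbiasedness is a statement about the average over repeated samples, and the variance shrinks as the inclusion probabilities of large values approach 1.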
7. Implementation, Tuning, and Best Practices
Best practices for weighted sampling depend on the application context and computational regime:
- Maintain efficient data structures (alias tables, Fenwick trees, hash-based maps) for $O(1)$ or $O(\log n)$ draw/update for static populations (Hübschle-Schneider et al., 2019).
- In streaming/minibatch contexts, update weighting statistics incrementally, avoiding reliance on full-data precomputation (Mirtaheri et al., 25 Jul 2025; Zhang et al., 2023).
- Empirically tune the weighted-sampling fraction, weighting functions (min, max, mean), smoothing parameters, and batch sizes to optimize out-of-sample performance or estimation error.
- For parallel/distributed settings, partition sampling responsibilities (e.g., a multinomial split over per-partition total weights, followed by independent local skip-based sampling (Meligrana, 2024)), then merge the local results to obtain the correct output distribution.
- For multiple objectives/attributes, construct coordinated sketches with shared randomization, and always use the inclusive estimator for multi-assignment aggregates (0906.4560).
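As an example of the constant-time structures mentioned in the first practice above, here is a minimal sequential alias table in the style of Vose's construction. It is a sketch for a static weight vector, not the distributed construction of the cited work, and the function names are hypothetical.

```python
import random


def build_alias_table(weights):
    """Preprocess nonnegative weights in O(n) so that each later draw
    costs O(1). Returns the (prob, alias) arrays of the alias method."""
    n = len(weights)
    total = sum(weights)
    if n == 0 or total <= 0 or any(w < 0 for w in weights):
        raise ValueError("weights must be nonnegative with a positive sum")
    scaled = [w * n / total for w in weights]  # average cell mass is 1
    prob = [1.0] * n
    alias = list(range(n))
    small = [i for i, s in enumerate(scaled) if s < 1.0]
    large = [i for i, s in enumerate(scaled) if s >= 1.0]
    while small and large:
        s, l = small.pop(), large.pop()
        prob[s] = scaled[s]          # cell s keeps its own mass ...
        alias[s] = l                 # ... and borrows the rest from l
        scaled[l] -= 1.0 - scaled[s]
        (small if scaled[l] < 1.0 else large).append(l)
    return prob, alias               # leftover cells keep prob 1.0


def alias_draw(prob, alias, rng):
    """Draw one index with probability proportional to its weight."""
    i = rng.randrange(len(prob))
    return i if rng.random() < prob[i] else alias[i]
```

The table must be rebuilt when weights change, which is why tree-based structures such as Fenwick trees are preferred when updates are frequent.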
Weighted sampling is thus a unifying paradigm underpinning variance reduction, fairness, rare-event capture, and scalable analytics in modern computational data science. Its rigorous theoretical footing and broad empirical success make it foundational in both classical statistical and modern machine learning pipelines.