Weighted Quantile Sketch: Algorithms & Applications
- Weighted quantile sketching is a technique using statistical methods and streaming algorithms to efficiently compute quantiles for data with arbitrary positive weights.
- It generalizes traditional quantile estimators by adapting structures such as the Greenwald–Khanna (GK) summary and the KLL compactor family, ensuring low memory usage and fast update times.
- Its applications span network monitoring, real-time analytics, and regression analysis, providing robust quantile estimations in massive, weighted data streams.
Weighted quantile sketching is an area comprising statistical methods and streaming algorithms designed to efficiently compute quantiles and related distributional summaries when observations carry arbitrary positive weights. Weighted quantile sketches generalize classic unweighted quantile estimators and streaming data structures by considering the importance or replication factor of each data point, allowing applications ranging from robust regression and risk estimation to online monitoring and cardinality estimation in massive data streams.
1. Principles and Motivation
Weighted quantile sketch techniques address scenarios where each observation $x_i$ is equipped with an associated nonnegative weight $w_i$. The generalized empirical quantile function tracks locations in the sorted list of observations, with location determined not by indices but by normalized weights $\bar{w}_i = w_i / \sum_j w_j$ and their cumulative sums $S_i = \sum_{j \le i} \bar{w}_{(j)}$. This approach is essential in cases with duplicated, replicated, or differently significant records, as encountered in mixture modeling, time series with temporal decay, or data streams with frequency counts.
A central requirement for weighted quantile sketching is consistency: if all weights $w_i$ are equal, the estimator must reduce to its classical unweighted form. Additionally, zero-weight observations must not affect the outcome, and the method should be stable under small perturbations of the weights.
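To make the definition concrete, the following is a minimal sketch of a weighted empirical quantile function satisfying these requirements; the function name and interface are illustrative rather than drawn from any cited work:

```python
import numpy as np

def weighted_quantile(values, weights, p):
    """Empirical p-quantile of `values` under nonnegative `weights`.

    Sorts the sample, accumulates normalized weights, and returns the
    first order statistic whose cumulative weight reaches p. With equal
    weights this reduces to the classical empirical quantile, and
    zero-weight observations never influence the result.
    """
    values = np.asarray(values, dtype=float)
    weights = np.asarray(weights, dtype=float)
    order = np.argsort(values)
    v, w = values[order], weights[order]
    cum = np.cumsum(w) / w.sum()          # cumulative normalized weights S_i
    idx = np.searchsorted(cum, p, side="left")
    return v[min(idx, len(v) - 1)]

x = [3.0, 1.0, 4.0, 1.5, 9.0]
assert weighted_quantile(x, [1, 1, 1, 1, 1], 0.5) == 3.0               # classical median
assert weighted_quantile(x + [100.0], [1, 1, 1, 1, 1, 0], 0.5) == 3.0  # zero weight ignored
```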
2. Algorithmic Streaming Sketches
Greenwald–Khanna (GK) Extension
Classical streaming quantile summaries use deterministic comparison-based algorithms such as the Greenwald–Khanna (GK) structure, which achieves $O(\frac{1}{\varepsilon}\log(\varepsilon n))$ space for $n$ elements and guarantees additive rank error $\varepsilon n$. When each element arrives with a weight $w_i$, the naive approach of inserting the item $w_i$ times causes update time to scale linearly with $w_i$, which is computationally prohibitive. The first nontrivial deterministic extension of GK to weighted inputs maintains a compact representation, preserving the $O(\frac{1}{\varepsilon}\log(\varepsilon n))$ space bound with polylogarithmic update time per element, provided weights are bounded; the additive rank error becomes $\varepsilon W$ with respect to the total weight $W$. This matches tight space lower bounds for comparison-based summaries (Assadi et al., 2023).
The critical data structure adapts the GK summary: it stores tuples $(v_i, g_i, \Delta_i)$ tracking a value $v_i$, a lower-bound increment $g_i$ on the cumulative weight below $v_i$, and an uncertainty $\Delta_i$ on the weight window. Update procedures merge segments of consecutive items, adjusting $g_i$ and $\Delta_i$ to accommodate arbitrary weights and ensuring the summary maintains the quantile rank invariant with respect to cumulative weights.
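The sketch below is a deliberately simplified Python rendering of such a summary: the class name, the uniform error budget $2\varepsilon W$, and the greedy merge rule are didactic simplifications of the GK invariant, not the exact construction of Assadi et al.:

```python
import bisect

class WeightedGKSummary:
    """Didactic GK-style quantile summary for weighted streams.

    Each tuple [v, g, d] stores a value v, the gap g (lower-bound
    increment of cumulative weight below v), and the uncertainty d of
    v's weight window. Real implementations add banding rules and
    protect the extreme tuples; this version keeps only the core idea.
    """

    def __init__(self, eps):
        self.eps = eps
        self.tuples = []   # kept sorted by value
        self.W = 0.0       # total weight observed

    def insert(self, v, w):
        self.W += w
        i = bisect.bisect_left([t[0] for t in self.tuples], v)
        # Interior insertions have rank uncertainty up to the error
        # budget; the current minimum and maximum have exact rank.
        d = 0.0 if i in (0, len(self.tuples)) else 2 * self.eps * self.W
        self.tuples.insert(i, [v, w, d])
        self._compress()

    def _compress(self):
        # Merge a tuple into its right neighbour whenever the combined
        # weight window still fits the 2*eps*W budget (rank invariant).
        budget = 2 * self.eps * self.W
        i = 0
        while i + 1 < len(self.tuples):
            _, g1, _ = self.tuples[i]
            v2, g2, d2 = self.tuples[i + 1]
            if g1 + g2 + d2 <= budget:
                self.tuples[i + 1] = [v2, g1 + g2, d2]
                del self.tuples[i]
            else:
                i += 1

    def query(self, p):
        # First stored value whose maximal cumulative weight covers p*W.
        target = p * self.W
        cum = 0.0
        for v, g, d in self.tuples:
            cum += g
            if cum + d >= target:
                return v
        return self.tuples[-1][0]
```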
KLL and Weighted Compactor Algorithms
Algorithms in the KLL family extend naturally to weighted streams, providing randomized sketches with improved space–accuracy trade-offs and $O(\log w)$ update complexity for an arrival of weight $w$ (Ivkin et al., 2019). Weighted items are supported either by binary decomposition of weights (the base-2 representation of $w$ feeds updates across compactor levels) or by weight-aware compactor structures in which pairs $(v_i, w_i)$ and $(v_j, w_j)$ are merged via probabilistic retention and weight summing. These methods avoid the linear cost in $w$ associated with naive duplication.
Furthermore, memory usage is minimized by employing shared packed arrays and lazy compaction, ensuring practical efficiency in embedded and high-throughput scenarios.
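A toy illustration of the binary-decomposition approach, assuming integer weights; the fixed per-level capacity is a simplification (real KLL compactors have geometrically varying capacities), and all names are ours:

```python
import random

class WeightedKLL:
    """Toy KLL-style sketch with weighted updates via binary decomposition.

    Items at level h carry implicit weight 2**h. An update of integer
    weight w writes the value into every level whose bit is set in w,
    costing O(log w) rather than the O(w) of naive duplication. A full
    level is compacted: sorted, then a random-parity half is promoted
    one level up, doubling each survivor's implicit weight.
    """

    def __init__(self, capacity=64):
        self.capacity = capacity
        self.levels = [[]]            # levels[h] holds values of weight 2**h

    def update(self, value, weight):
        h = 0
        while weight:
            if weight & 1:            # bit h of the weight is set
                self._insert(h, value)
            weight >>= 1
            h += 1

    def _insert(self, h, value):
        while h >= len(self.levels):
            self.levels.append([])
        self.levels[h].append(value)
        if len(self.levels[h]) > self.capacity:
            buf = sorted(self.levels[h])
            self.levels[h] = []
            for v in buf[random.randint(0, 1)::2]:   # unbiased random half
                self._insert(h + 1, v)

    def quantile(self, p):
        """Approximate p-quantile over the weighted contents of all levels."""
        pairs = sorted((v, 1 << h) for h, lvl in enumerate(self.levels) for v in lvl)
        total = sum(w for _, w in pairs)
        cum = 0
        for v, w in pairs:
            cum += w
            if cum >= p * total:
                return v
        return pairs[-1][0] if pairs else None
```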
3. Weighted Quantile Estimators
Generic frameworks for weighted quantile estimation revise classical estimators such as the linear-interpolation (Hyndman–Fan) types or the Harrell–Davis estimator (Akinshin, 2023). For a sample $x_1, \dots, x_n$ with normalized weights $\bar{w}_i = w_i / \sum_j w_j$, the cumulative sums $S_i = \sum_{j=1}^{i} \bar{w}_{(j)}$ (indexed by order statistics) replace the uniform spacing $i/n$.
The weighted Harrell–Davis estimator takes the form
$$\hat{Q}(p) = \sum_{i=1}^{n} W_i \, x_{(i)}, \qquad W_i = I_{S_i}(a, b) - I_{S_{i-1}}(a, b),$$
where $I_t(a, b)$ is the regularized incomplete beta function with $a = p(n^* + 1)$ and $b = (1 - p)(n^* + 1)$, and the effective sample size $n^* = (\sum_i w_i)^2 / \sum_i w_i^2$ replaces $n$.
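Under these definitions, a direct transcription into Python (assuming SciPy provides the regularized incomplete beta function as the Beta CDF) might look as follows:

```python
import numpy as np
from scipy.stats import beta

def weighted_harrell_davis(x, w, p):
    """Weighted Harrell-Davis p-quantile per the construction above.

    Uses the Kish effective sample size n* = (sum w)^2 / sum(w^2) and
    gives order statistic x_(i) the Beta(p(n*+1), (1-p)(n*+1)) mass
    lying between cumulative normalized weights S_{i-1} and S_i.
    """
    x = np.asarray(x, dtype=float)
    w = np.asarray(w, dtype=float)
    order = np.argsort(x)
    x, w = x[order], w[order]
    n_eff = w.sum() ** 2 / (w ** 2).sum()                # Kish effective sample size
    a, b = p * (n_eff + 1), (1 - p) * (n_eff + 1)
    s = np.concatenate([[0.0], np.cumsum(w) / w.sum()])  # S_0 = 0, ..., S_n = 1
    coeffs = beta.cdf(s[1:], a, b) - beta.cdf(s[:-1], a, b)
    return float(np.dot(coeffs, x))

# With equal weights this agrees with the classical Harrell-Davis estimate.
print(weighted_harrell_davis([1, 2, 3, 4, 5], [1, 1, 1, 1, 1], 0.5))  # 3.0
```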
For time series quantile exponential smoothing, exponentially decaying weights (e.g., $w_i = 2^{-\Delta t_i / h}$ for an observation of age $\Delta t_i$ under half-life $h$) are employed, yielding adaptable estimates of evolving distributional tails. Robust "trimmed" versions further increase resistance to outliers by restricting the weight function to highest density intervals.
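For instance, half-life weights can be fed into the estimator sketched above (reusing `weighted_harrell_davis`); the helper name and age convention are illustrative:

```python
import numpy as np

def half_life_weights(ages, half_life):
    """Weight 2**(-age/half_life): an observation one half-life old counts half."""
    return np.power(2.0, -np.asarray(ages, dtype=float) / half_life)

latencies = np.array([12.0, 15.0, 11.0, 80.0, 14.0])
ages = np.array([0.0, 1.0, 2.0, 3.0, 4.0])       # e.g., days since each observation
w = half_life_weights(ages, half_life=2.0)        # [1.0, 0.71, 0.5, 0.35, 0.25]
q = weighted_harrell_davis(latencies, w, 0.99)    # recent observations dominate
```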
In mixture models, quantile estimation can be performed via weighted combinations of separate component CDFs, with the estimator respecting mixture weights and continuity requirements.
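A sketch of this idea, assuming SciPy frozen distributions for the components; the helper name and bracketing tolerances are ours:

```python
import numpy as np
from scipy.stats import norm
from scipy.optimize import brentq

def mixture_quantile(p, components, mix_weights):
    """p-quantile of a mixture by inverting the weighted combination of CDFs.

    The mixture CDF F(x) = sum_k pi_k F_k(x) is continuous and monotone,
    so root finding on F(x) - p recovers the unique quantile.
    """
    pi = np.asarray(mix_weights, dtype=float)
    pi = pi / pi.sum()                          # respect and normalize mixture weights
    cdf = lambda x: sum(p_k * c.cdf(x) for p_k, c in zip(pi, components))
    lo = min(c.ppf(1e-9) for c in components)   # bracket containing the quantile
    hi = max(c.ppf(1 - 1e-9) for c in components)
    return brentq(lambda x: cdf(x) - p, lo, hi)

# Median of a two-component Gaussian mixture.
print(mixture_quantile(0.5, [norm(0, 1), norm(5, 2)], [0.7, 0.3]))
```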
4. Applications in Data Streams and Real-Time Analytics
Weighted quantile sketches are indispensable in massive streaming applications:
- Network Monitoring: Quantile summaries (e.g., p99 latencies) over weighted streams quickly detect performance anomalies, with sketches such as KLL or DDSketch adapted for weighted samples. DDSketch and its variants (UDDSketch) provide relative error guarantees and support full mergeability, maintaining accuracy across distributed systems and arbitrary data distributions (Masson et al., 2019, Epicoco et al., 2020); a minimal weighted variant of the DDSketch bucket scheme is sketched after this list.
- Per-Item Quantiles: In network analytics, SQUAD and similar algorithms efficiently track quantiles for heavy hitters in the presence of weights, combining reservoir sampling and sketching to minimize space complexity and enable accurate tail estimation (Shahout et al., 2022).
- Weighted Cardinality Estimation: QSketch applies quantization to weighted cardinality, estimating sums of weights of unique elements with extreme memory efficiency (8 bits per register) and $O(1)$ update time, outperforming traditional continuous-register sketches such as Lemiesz’s method or FastGM (Qi et al., 27 Jun 2024).
- Regression Analysis: Weighted quantile regression techniques exploit replication in covariate groups, yielding more efficient estimators in contexts such as climate modeling or financial risk analysis. Weighted-average quantile regression further interfaces with quantile sketching by modeling integrated effects over the conditional quantile function (Jana et al., 2019, Chetverikov et al., 2022).
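As referenced above, the following is a minimal weighted adaptation of the DDSketch bucket scheme; bucket collapsing, which production implementations use to cap memory, is omitted:

```python
import math
from collections import defaultdict

class WeightedDDSketch:
    """Minimal DDSketch-style summary with weighted inserts.

    Positive values map to geometric buckets with boundaries gamma**k,
    gamma = (1 + alpha) / (1 - alpha), so any returned quantile carries
    at most relative error alpha. Weights accumulate in bucket counters,
    which also makes two sketches mergeable bucket-by-bucket.
    """

    def __init__(self, alpha=0.01):
        self.alpha = alpha
        self.gamma = (1 + alpha) / (1 - alpha)
        self.buckets = defaultdict(float)    # bucket index -> total weight
        self.total = 0.0

    def insert(self, value, weight=1.0):
        assert value > 0, "the basic scheme handles positive values only"
        k = math.ceil(math.log(value, self.gamma))
        self.buckets[k] += weight
        self.total += weight

    def quantile(self, p):
        target = p * self.total
        cum = 0.0
        for k in sorted(self.buckets):
            cum += self.buckets[k]
            if cum >= target:
                # Midpoint estimate for bucket (gamma**(k-1), gamma**k].
                return 2 * self.gamma ** k / (self.gamma + 1)
        raise ValueError("empty sketch")

    def merge(self, other):
        for k, w in other.buckets.items():
            self.buckets[k] += w
        self.total += other.total
```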
5. Theoretical Guarantees and Complexity
Deterministic streaming summaries for weighted quantiles match known lower bounds of $\Omega(\frac{1}{\varepsilon}\log(\varepsilon n))$ space for comparison-based summaries (Assadi et al., 2023). Randomized sketches, including KLL variants, achieve improved constant factors and fast (near-constant) update time for weighted items (Ivkin et al., 2019).
Weighted quantile estimators based on linear combinations of order statistics and effective sample size provide stable, consistent inference even under arbitrary weighting, and are robust to zero-weight items and small perturbations (Akinshin, 2023).
In streaming contexts, algorithms maintain error guarantees in rank or relative quantile estimation. For instance, DDSketch guarantees for any quantile query $q$ that the returned estimate $\hat{x}_q$ satisfies $|\hat{x}_q - x_q| \le \alpha x_q$, with memory requirements scaling logarithmically in dataset size (Masson et al., 2019).
QSketch’s quantization and dynamic updating yield $O(1)$ update complexity and retain accuracy within 30% of the state of the art while using only 1/8 of the memory (Qi et al., 27 Jun 2024).
6. Extensions and Open Directions
Weighted quantile sketching remains an active research area with several open directions:
- Robust Regression and Adaptive Weighting: Extensions to nonlinear or high-dimensional quantile regression using machine learning for conditional CDF estimation, adaptive weight function selection, and streaming updates (Chetverikov et al., 2022).
- Cardinality, Frequency Moments, ℓₚ‐Sampling: Generalizing sketches for cardinality to support efficient sampling without replacement (WOR ℓₚ) allows accurate estimation of frequency-based statistics and tail quantiles even under heavy-weight skew (Cohen et al., 2020).
- Distributed and Concurrent Algorithms: Multi-threaded and distributed sketch architectures (e.g., Quancurrent) support high-throughput, low-latency quantile inference, with potential extensions for weighted inputs via modified hierarchical weighting logic (Elias-Zada et al., 2022).
- Real-Time Monitoring and Anomaly Detection: Applications extend to networked systems, cache management, and anomaly detection where weighted events and updates drive statistical summaries under tight constraints (Ben-Basat et al., 2022, Qi et al., 27 Jun 2024).
Weighted quantile sketching thus offers foundational tools for modern data analytics, combining statistical rigor, computational efficiency, and adaptability to arbitrary weighting schemes. Its integration into streaming systems, regression modeling, and density estimation continues to foster advances in scalable, online, and robust statistical computing.