
Heap-Based Prioritized Sampling

Updated 8 January 2026
  • Heap-based prioritized sampling is a method that uses heap data structures to dynamically track and select items based on assigned numerical priorities, ensuring unbiased and scalable sampling.
  • The approach achieves efficient O(log k) update operations, enabling real-time selection in applications such as data stream aggregation, reinforcement learning, and large-scale sequence generation.
  • Empirical results demonstrate that these algorithms outperform uniform sampling by significantly reducing estimation error and improving task-specific performance metrics.

Heap-based prioritized sampling refers to a family of algorithms that leverage heap (priority queue) data structures to efficiently select, retain, or generate a subset of items—such as problems, rollouts, data points, or tokens—according to dynamically assigned priorities. This mechanism provides unbiased, scalable, and theoretically grounded alternatives to uniform or naively random sampling, with broad applicability in large-scale data summaries, reinforcement learning, and sequence generation.

1. Fundamental Principles and Definitions

Heap-based prioritized sampling uses heaps (typically min-heaps or max-heaps) to maintain a dynamically selected set of the most "important" items according to an application-specific priority function. The core idea is that each item (e.g., a key in stream aggregation, a problem in RL, a node in sequence generation) is assigned a numerical priority, and the heap efficiently tracks the top-$k$ priorities (as in reservoir-style sampling), the maximum, or other order statistics among them.

Priorities are generally assigned based on:

  • A function of the item's intrinsic weight (for weighted data streams: $q_i = w_i / h_i$, with $h_i \sim \mathrm{Uniform}(0,1]$)
  • A data-driven informativeness metric (RL success statistics: $\omega = p(1-p)$)
  • Model confidence for sequence generation ($p(\mathbf{t} \mid \mathrm{prefix})$)

The heap structure enables $O(\log k)$ insertion, update, and removal operations, ensuring scalability to millions of items or updates.
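
As a minimal, generic sketch of this pattern (using Python's heapq with hypothetical item/priority inputs rather than any cited paper's interface), a fixed-size min-heap can retain the $k$ highest-priority items seen so far at $O(\log k)$ per update:

```python
import heapq

def retain_top_k(stream, k):
    """Keep the k items with the largest priorities from an iterable of
    (item, priority) pairs, using a size-k min-heap whose root is always
    the current inclusion threshold."""
    heap = []  # entries are (priority, tie_breaker, item); heapq is a min-heap
    for idx, (item, priority) in enumerate(stream):
        entry = (priority, idx, item)
        if len(heap) < k:
            heapq.heappush(heap, entry)       # O(log k) insertion
        elif priority > heap[0][0]:
            heapq.heapreplace(heap, entry)    # evict current minimum, O(log k)
    return [(item, priority) for priority, _, item in heap]
```

The same skeleton underlies the instantiations in the next section; only the priority key, the heap direction, and the eviction rule change.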

2. Algorithmic Instantiations and Data Structures

Stream Aggregation (Priority-Based Aggregation, PBA)

  • Each key $k$ in a data stream is retained in a fixed-size min-heap whose priority $r_k$ is the ratio $W_k/u_k$, where $W_k$ is the key's running weight and $u_k$ a fixed random variable.
  • The heap tracks the top $m$ priorities; the lowest among them serves as the threshold $\tau$.
  • A deferred-update mechanism applies estimator corrections only when a key is next updated or read, amortizing the computational cost to $O(1)$ per update and $O(\log m)$ per insertion (Duffield et al., 2017); a simplified sketch follows this list.
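
The bookkeeping can be sketched as follows (a heavily simplified, illustrative version assuming Python's heapq with lazily invalidated entries; PBA's deferred estimator corrections are omitted, and the class name PBAReservoir is hypothetical):

```python
import heapq
import random

class PBAReservoir:
    """Simplified priority-based aggregation reservoir (illustrative only).

    Each key draws a fixed u_k ~ Uniform(0,1] on first arrival and carries
    priority r_k = W_k / u_k, where W_k is its running weight. A min-heap of
    (priority, key) pairs identifies the lowest-priority key to evict once
    more than m keys are retained; stale heap entries are discarded lazily.
    """
    def __init__(self, m):
        self.m = m
        self.weight = {}   # key -> running weight W_k (retained keys only)
        self.u = {}        # key -> fixed random u_k
        self.heap = []     # (r_k, key), possibly stale after weight updates

    def _priority(self, key):
        return self.weight[key] / self.u[key]

    def update(self, key, w):
        if key in self.weight:                       # hit: accumulate weight
            self.weight[key] += w
            heapq.heappush(self.heap, (self._priority(key), key))
            return
        self.weight[key] = w                         # miss: admit the new key
        self.u[key] = random.uniform(1e-12, 1.0)     # u_k ~ Uniform(0,1]
        heapq.heappush(self.heap, (self._priority(key), key))
        while len(self.weight) > self.m:             # evict lowest-priority key
            r, k = heapq.heappop(self.heap)
            if k in self.weight and r == self._priority(k):
                del self.weight[k]
                del self.u[k]
```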

RL Post-training (Problem-level Heap-based Prioritized Replay)

  • Tasks or rollouts are maintained in a max-heap keyed by the priority $\omega = p(1-p)$, favoring intermediate-difficulty instances.
  • Side pools (min-heaps keyed by timestamps) hold "solved" and "unsolved" items for periodic retesting, preventing starvation and forgetting.
  • Sampling from the heap can be deterministic (top-$C$ extraction) or probabilistic (sum-tree variant) (Fatemi, 6 Jan 2026); see the sketch after this list.
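
A minimal sketch of this structure (Python's heapq with negated priorities to emulate a max-heap; the class name, the eps cutoff for routing items to side pools, and the method names are illustrative assumptions, not the cited paper's API):

```python
import heapq

class ProblemReplayHeap:
    """Illustrative problem-level prioritized replay with side pools."""

    def __init__(self, eps=0.05):
        self.eps = eps       # success rates within eps of 0 or 1 go to side pools
        self.heap = []       # (-omega, problem_id): max-heap via negation
        self.solved = []     # (step_set_aside, problem_id), p close to 1
        self.unsolved = []   # (step_set_aside, problem_id), p close to 0

    def push(self, problem_id, p, step):
        """Route a problem by its estimated success rate p."""
        if p >= 1.0 - self.eps:
            heapq.heappush(self.solved, (step, problem_id))
        elif p <= self.eps:
            heapq.heappush(self.unsolved, (step, problem_id))
        else:
            omega = p * (1.0 - p)                  # informativeness, peaks at p = 0.5
            heapq.heappush(self.heap, (-omega, problem_id))

    def sample_batch(self, c):
        """Deterministic top-C extraction of the most informative problems."""
        return [heapq.heappop(self.heap)[1] for _ in range(min(c, len(self.heap)))]

    def retest_oldest(self):
        """Pop the longest-waiting solved/unsolved items for periodic retesting."""
        pools = [p for p in (self.solved, self.unsolved) if p]
        return [heapq.heappop(p)[1] for p in pools]
```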

Sequence Generation in LLMs (Priority Sampling)

  • Prefixes in sequence decoding are stored in a max-heap keyed by local model confidence $p(\mathrm{token} \mid \mathrm{prefix})$.
  • At every step, the highest-priority incomplete continuation is expanded, guaranteeing unique, highest-confidence outputs in order (Grubisic et al., 2024); a decoding sketch follows this list.
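
A best-first decoding sketch of this idea is shown below; it assumes a placeholder next_token_logprobs(prefix) call returning the model's top candidate tokens with their log-probabilities, and it keys the heap on cumulative log-probability to keep the sketch self-contained (the cited method keys on local token confidence):

```python
import heapq

def priority_decode(next_token_logprobs, eos_id, num_samples, max_len=64):
    """Expand the most confident incomplete prefix first; completed sequences
    are emitted without duplicates, in decreasing order of total log-probability."""
    heap = [(0.0, [])]                      # (-cumulative logprob, token prefix)
    outputs = []
    while heap and len(outputs) < num_samples:
        neg_lp, prefix = heapq.heappop(heap)
        if prefix and (prefix[-1] == eos_id or len(prefix) >= max_len):
            outputs.append((prefix, -neg_lp))            # finished sequence
            continue
        for tok, lp in next_token_logprobs(prefix):      # model call (placeholder)
            heapq.heappush(heap, (neg_lp - lp, prefix + [tok]))
    return outputs
```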

Common Heap Structure and Operations

Algorithmic Context | Heap Type | Priority Key
Stream/PBA | min-heap | $r_k = W_k / u_k$
RL replay | max-heap | $\omega = p(1-p)$
Token sampling | max-heap | $p(\mathrm{token} \mid \mathrm{prefix})$

All variants use parallel arrays or maps to link IDs to priorities and maintain the heap property for efficient access and updates.
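
A minimal indexed min-heap illustrating this ID-to-position bookkeeping (a generic, hypothetical implementation rather than code from the cited papers) might look as follows:

```python
class IndexedMinHeap:
    """Min-heap with an id -> position map so priorities can be changed in O(log k)."""

    def __init__(self):
        self.heap = []   # list of (priority, item_id)
        self.pos = {}    # item_id -> index of its entry in self.heap

    def _swap(self, i, j):
        self.heap[i], self.heap[j] = self.heap[j], self.heap[i]
        self.pos[self.heap[i][1]] = i
        self.pos[self.heap[j][1]] = j

    def _sift_up(self, i):
        while i > 0 and self.heap[i][0] < self.heap[(i - 1) // 2][0]:
            self._swap(i, (i - 1) // 2)
            i = (i - 1) // 2

    def _sift_down(self, i):
        n = len(self.heap)
        while True:
            smallest = i
            for child in (2 * i + 1, 2 * i + 2):
                if child < n and self.heap[child][0] < self.heap[smallest][0]:
                    smallest = child
            if smallest == i:
                return
            self._swap(i, smallest)
            i = smallest

    def push_or_update(self, item_id, priority):
        """Insert a new id or change the priority of an existing one, in O(log k)."""
        if item_id in self.pos:
            i = self.pos[item_id]
            old_priority = self.heap[i][0]
            self.heap[i] = (priority, item_id)
            if priority < old_priority:
                self._sift_up(i)
            else:
                self._sift_down(i)
        else:
            self.heap.append((priority, item_id))
            self.pos[item_id] = len(self.heap) - 1
            self._sift_up(len(self.heap) - 1)

    def pop_min(self):
        """Remove and return the (priority, item_id) pair with the smallest priority."""
        top = self.heap[0]
        last = self.heap.pop()
        del self.pos[top[1]]
        if self.heap:
            self.heap[0] = last
            self.pos[last[1]] = 0
            self._sift_down(0)
        return top
```

Evicting or re-prioritizing a specific item ID then costs $O(\log k)$ instead of a linear scan.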

3. Priority Functions and Theoretical Guarantees

Stream Aggregation and Weighted Sets

  • Item priorities are $q_i = w_i/h_i$, ensuring that inclusion probabilities permit unbiased estimation for arbitrary subset sums and heavy-tailed distributions (Thorup, 2013).
  • The estimate for a sampled item, given threshold $\tau$, is $\widehat{w}_i = \max\{w_i, \tau\} \cdot \mathbf{1}_{q_i > \tau}$.
  • These schemes are unbiased, $\mathbb{E}[\widehat{w}_i] = w_i$ for all $i$, with strong concentration bounds: $O(1/\sqrt{k})$ relative error even with only 2-independent hash functions. The full scheme is sketched below.
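
Under these definitions, the whole scheme fits in a few lines; the sketch below is an offline, batch form (assuming Python's random and heapq) rather than a streaming implementation:

```python
import heapq
import random

def priority_sample_estimates(weights, k, rng=random):
    """Priority sampling of k items from a weighted set, with unbiased estimates.

    Each item gets priority q_i = w_i / h_i, h_i ~ Uniform(0,1]; the k largest
    priorities are kept, tau is the (k+1)-th largest priority, and each kept
    item is estimated as w_hat_i = max(w_i, tau), so E[w_hat_i] = w_i.
    """
    prioritized = [(w / rng.uniform(1e-12, 1.0), i, w) for i, w in enumerate(weights)]
    top = heapq.nlargest(k + 1, prioritized)       # top k+1 priorities, descending
    tau = top[k][0] if len(top) > k else 0.0       # threshold: (k+1)-th largest
    return {i: max(w, tau) for _, i, w in top[:k]}
```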

RL and Informativeness

  • In reinforcement learning, the mean-squared advantage $p(1-p)$, computed from the empirical success rate $p$, formalizes the learning signal and peaks at intermediate $p$.
  • The heap concentrates sampling on the tasks that supply the most gradient information, accelerating training relative to uniform approaches (Fatemi, 6 Jan 2026).

Sequence Generation

  • Heap-based priority sampling for LLMs deterministically enumerates outputs by model confidence, providing optimal coverage of high-likelihood samples.
  • The approach avoids duplicate sequences and requires no temperature tuning, contrasting with stochastic methods such as nucleus sampling (Grubisic et al., 2024).

4. Computational Complexity and Scalability

Heap-based prioritized sampling methods achieve high throughput, as the dominant per-update cost is $O(\log k)$ for heap operations, with $k$ the heap size or sample budget. This complexity applies across settings:

  • Stream/PBA: Heap size $m$ is much smaller than the total key space; an update costs either $O(1)$ (hit) or $O(\log m)$ (eviction/insertion) (Duffield et al., 2017).
  • RL replay: Heap operations are $O(\log M)$, where $M$ is the number of tracked problems; memory cost is $O(M)$ (Fatemi, 6 Jan 2026).
  • Sequence sampling: Heap size is $O(N)$, with $O(\log N)$ per pop/push; the overall cost is dominated by model inference (Grubisic et al., 2024).
  • Empirically, heap overhead is negligible compared to core modeling costs in both RL and LLM sampling.

5. Empirical Results and Application Domains

Stream Aggregation

  • PBA achieves weighted relative error reductions of 40–65% over Adaptive Sample and Hold (ASH) at low sampling rates (5–17%), with $O(\log m)$ computational cost.
  • Subpopulation and rank queries are accurately estimated; heavy-hitter detection precision/recall improves by 10–20 points at 5% sampling.
  • PBA and PBASH outperform Sample and Hold (SH) and ASH both statistically and computationally, especially on heavy-tailed data (Duffield et al., 2017).

RL Post-training

  • Heap-based prioritized sampling during RL post-training leads to consistently faster improvements in pass@1 and pass@4 on MATH-500 and AIME 2024 compared to uniform sampling.
  • Experiments demonstrate focused training on informative, mid-difficulty problems, with migration dynamics traceable in heap and side-pool statistics.
  • Prioritization by $p(1-p)$ enables superior final accuracy within constrained training steps (Fatemi, 6 Jan 2026).

LLM Generation

  • Priority sampling on compiler pass prediction with LLMs provides 100% unique, high-confidence outputs per batch, outperforming both nucleus and autotuner baselines.
  • The advantage is consistent across 50k held-out examples, demonstrating both completeness and optimal ordering (Grubisic et al., 2024).

6. Variations, Hyperparameters, and Tuning

Key hyperparameters and algorithmic variations include:

  • Heap (reservoir) size $k$ or $m$: larger sizes reduce variance and improve coverage, at the cost of memory.
  • Priority function: application-specific; $w_i/h_i$ for aggregation, $p(1-p)$ for RL, model confidence for LLM sampling.
  • For RL, moving-average parameters ($\alpha$), retesting intervals, batch sizes, and exploration rates ($\rho$) influence coverage, gradient magnitudes, and forgetting avoidance (Fatemi, 6 Jan 2026).
  • Sampling variants: sum-tree proportional sampling allows a trade-off between deterministic and probabilistic selection (Fatemi, 6 Jan 2026); a minimal sum-tree sketch follows this list.
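
A minimal sum-tree sketch for the proportional variant (a generic implementation under the usual array-based layout; capacity handling and names are illustrative, not the cited paper's code):

```python
import random

class SumTree:
    """Sum-tree for proportional (stochastic) prioritized sampling.

    Leaves hold item priorities; internal nodes hold subtree sums, so both
    updating a priority and sampling an index with probability proportional
    to its priority walk one root-to-leaf path.
    """
    def __init__(self, capacity):
        self.capacity = capacity
        self.tree = [0.0] * (2 * capacity)       # 1-based binary tree in an array

    def update(self, idx, priority):
        pos = idx + self.capacity                # leaf slot for item idx
        delta = priority - self.tree[pos]
        while pos >= 1:                          # propagate the change to the root
            self.tree[pos] += delta
            pos //= 2

    def sample(self):
        r = random.uniform(0.0, self.tree[1])    # tree[1] holds the total mass
        pos = 1
        while pos < self.capacity:               # descend toward a leaf
            left = 2 * pos
            if r <= self.tree[left]:
                pos = left
            else:
                r -= self.tree[left]
                pos = left + 1
        return pos - self.capacity               # leaf slot back to item index
```

Both update and sample touch a single root-to-leaf path, so each costs $O(\log n)$ for $n$ stored priorities.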

7. Comparison with Alternative Methods

Heap-based prioritized sampling provides improvements in efficiency, statistical correctness, and adaptivity not matched by alternative approaches:

  • Uniform sampling lacks focus on informative or high-weight items, leading to slower convergence or higher error.
  • Sample and Hold fails to support fixed-size caches.
  • Adaptive Sample and Hold requires costly global resampling/rescaling when probabilities change ($O(m)$ or higher per step).
  • Sketch-based (e.g., Count-Min) methods are space-efficient but do not yield unbiased per-key or subpopulation estimates.
  • Static curricula or difficulty tiers do not adapt to evolving data or model dynamics (Thorup, 2013; Duffield et al., 2017; Fatemi, 6 Jan 2026).

Heap-based prioritized sampling is distinguished by its theoretical guarantees, computational efficiency ($O(\log k)$ per update), and ability to maintain high-fidelity summaries or prioritization in diverse high-volume and large-scale settings.
