
Stochastic Beam Search (SBS)

Updated 25 November 2025
  • Stochastic Beam Search (SBS) is a decoding algorithm for sequence models that combines stochastic sampling with beam search to improve output diversity and estimation accuracy.
  • It leverages techniques like the Gumbel-Top-k trick and temperature-controlled softmax to balance exploration and exploitation in candidate selection.
  • Variants incorporating conditional Poisson sampling and self-evaluation guidance enable calibrated multi-step reasoning in applications such as neural translation and large language models.

Stochastic Beam Search (SBS) is a family of decoding algorithms for sequence models that blend the structured exploration of beam search with the stochasticity of sampling. The method addresses longstanding limitations of deterministic beam search—such as low diversity and overemphasis on a narrow mode of the model’s distribution—while avoiding the inefficiency and high variance of pure sampling. SBS has foundational connections to the Gumbel-Top-k trick for sampling without replacement, and recent variants further incorporate evaluation signals (e.g., LLM self-critique) to robustly guide multi-step reasoning, especially in LLMs. The SBS framework unifies a spectrum of inference regimes, including deterministic beam search as a limiting case, and is now a critical component in settings requiring diverse candidate generation, low-variance expectation estimation, and calibrated multi-step reasoning.

1. Core Principles and Algorithmic Foundations

Stochastic Beam Search operates by propagating a set (beam) of $k$ partial hypotheses through the exponentially large space of possible sequences. At each decoding step, a larger candidate set is generated—typically by expanding each beam item with multiple possible continuations—and a stochastic, without-replacement selection is performed using a probability distribution over these candidates. The distribution is frequently parameterized by a temperature hyperparameter to interpolate between deterministic top-k (pure exploitation) and uniform random selection (pure exploration) (Kool et al., 2019, Xie et al., 2023).

A pivotal technique underlying SBS is the Gumbel-Top-k trick, which enables sampling $k$ items without replacement from a categorical or structured sequence distribution. At each expansion, Gumbel noise is added to (potentially unnormalized) log-probabilities, and the $k$ highest perturbed scores determine the selected candidates. This approach preserves the marginal sequence distribution while enforcing sample diversity (Kool et al., 2019).
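
The following is a minimal NumPy sketch of the Gumbel-Top-k trick for a flat categorical distribution; the function name and the example values are illustrative assumptions, not the reference implementation from Kool et al. (2019).

```python
import numpy as np

def gumbel_top_k(log_probs: np.ndarray, k: int, rng=None) -> np.ndarray:
    """Sample k distinct indices without replacement from a categorical
    distribution given by (possibly unnormalized) log-probabilities, using
    the Gumbel-Top-k trick: perturb each log-probability with independent
    Gumbel(0, 1) noise and keep the k largest perturbed scores."""
    rng = np.random.default_rng() if rng is None else rng
    gumbel_noise = rng.gumbel(loc=0.0, scale=1.0, size=log_probs.shape)
    perturbed = log_probs + gumbel_noise
    # argpartition returns the indices of the k largest perturbed scores.
    return np.argpartition(-perturbed, k - 1)[:k]

# Example: draw 3 distinct tokens from a 5-way distribution.
log_p = np.log(np.array([0.4, 0.3, 0.15, 0.1, 0.05]))
print(gumbel_top_k(log_p, k=3))
```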

Variants such as Conditional Poisson Stochastic Beam Search generalize the candidate sampling operation by employing conditional Poisson designs, giving rise to rigorously specified inclusion probabilities for unbiased downstream estimation (Meister et al., 2021).

2. Mathematical Formulation

In standard sequence modeling, the probability of a sequence $R = [s^1, \ldots, s^T]$ given prompt $x$ is

$$P(R \mid x) = \prod_{t=1}^{T} P(s^t \mid x, s^{1:t-1})$$

SBS defines a stochastic pruning operator at each step. At decoding step $t$, for each of the current $k$ beams, $n$ continuations are sampled, forming $kn$ candidates. Each candidate $i$ is assigned an accumulated log-score $L(R^{1:t}_i)$ (optionally incorporating auxiliary signals such as self-evaluation confidence). The normalized selection probability for candidate $i$ is

$$P_{\text{beam}}(R^{1:t}_i) = \frac{\exp\bigl(L(R^{1:t}_i) / \tau\bigr)}{\sum_{j \in S} \exp\bigl(L(R^{1:t}_j) / \tau\bigr)}$$

where $\tau$ is a temperature parameter and $S$ is the candidate set. $k$ sequences are sampled without replacement from this distribution to form the next beam (Xie et al., 2023).
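
A minimal NumPy sketch of this pruning step follows; it draws $k$ distinct candidates by successive weighted sampling without replacement (via `numpy.random.choice` with `replace=False`), which is one simple way to realize the without-replacement selection described above. The function and variable names are illustrative assumptions.

```python
import numpy as np

def stochastic_prune(log_scores: np.ndarray, k: int, tau: float, rng=None) -> np.ndarray:
    """Select k of the candidates without replacement, with selection
    probabilities given by a temperature-controlled softmax over their
    accumulated log-scores L(R^{1:t}_i)."""
    rng = np.random.default_rng() if rng is None else rng
    scaled = log_scores / tau
    scaled = scaled - scaled.max()        # subtract max for numerical stability
    probs = np.exp(scaled)
    probs /= probs.sum()
    # Successive weighted sampling without replacement.
    return rng.choice(len(log_scores), size=k, replace=False, p=probs)

# Example: prune 6 candidates down to a beam of size 2.
scores = np.array([-1.2, -0.7, -3.5, -0.9, -2.1, -1.8])
print(stochastic_prune(scores, k=2, tau=0.5))
```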

In Gumbel-Top-k SBS, each sequence candidate is perturbed by an independent Gumbel(0, 1) variable and the $k$ highest perturbed scores are selected. For sequence models, this is implemented implicitly in a top-down manner, maintaining only $O(kL)$ model evaluations, where $L$ is the sequence length (Kool et al., 2019).

Conditional Poisson SBS replaces the softmax/Gumbel sampling with an exact conditional Poisson sampling scheme, robustly enabling unbiased estimation of expectations and more stable sample inclusion probabilities (Meister et al., 2021).

3. Algorithmic Procedure

The typical SBS decoding cycle is as follows (parameters: beam size $k$, rollouts per beam $n$, temperature $\tau$, optional mixture weight $\lambda$); a minimal code sketch of the full loop follows the list:

  • Initialization: Start with an empty sequence or prompt. Set the initial temperature $\tau_0$ (e.g., $0.2 \leq \tau_0 \leq 0.5$).
  • Expansion: For each beam, sample $n$ candidate continuations from the sequence model.
  • Scoring: For each candidate, compute the accumulated log-probability, optionally augmenting with external guidance. In self-evaluation-guided SBS, the score is a mixture of model likelihood and a correctness confidence produced by a verifier model:

$$L(R^{1:t}) = \sum_{j=1}^{t} \left[ \lambda \log P(s^j \mid x, s^{1:j-1}) + (1-\lambda) \log C(s^j) \right]$$

  • Sampling: Compute a temperature-controlled softmax over all candidate log-scores. Sample $k$ distinct items without replacement according to these probabilities.
  • Annealing: Decrease $\tau$ (e.g., $\tau \leftarrow \alpha \tau$ with $0 < \alpha < 1$), shifting gradually from exploration to exploitation.
  • Iteration: Repeat for each sequence step up to a preset maximum, then collect final beam.
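
The sketch below assembles these steps into a single loop. It assumes a hypothetical `model` object exposing `sample_continuation` and `log_prob` methods, and it uses the same temperature-softmax, without-replacement pruning as the helper sketched in Section 2; it is an illustrative skeleton, not the reference procedure of any cited paper.

```python
import numpy as np

def stochastic_beam_search(model, prompt, k=4, n=4, max_steps=10,
                           tau0=0.3, alpha=0.5, rng=None):
    """Illustrative SBS loop: expand, score, stochastically prune, anneal.
    `prompt` is a list of tokens; `model.sample_continuation(seq)` is assumed
    to return one continuation step and `model.log_prob(seq, step)` its
    log-probability (both interfaces are hypothetical)."""
    rng = np.random.default_rng() if rng is None else rng
    beams = [(list(prompt), 0.0)]     # (partial sequence, accumulated log-score)
    tau = tau0
    for _ in range(max_steps):
        # Expansion: n rollouts per beam -> up to k*n candidates.
        candidates = []
        for seq, score in beams:
            for _ in range(n):
                step = model.sample_continuation(seq)
                candidates.append((seq + [step], score + model.log_prob(seq, step)))
        # Scoring + Sampling: temperature softmax, k draws without replacement.
        log_scores = np.array([s for _, s in candidates])
        scaled = log_scores / tau
        probs = np.exp(scaled - scaled.max())
        probs /= probs.sum()
        idx = rng.choice(len(candidates), size=min(k, len(candidates)),
                         replace=False, p=probs)
        beams = [candidates[i] for i in idx]
        # Annealing: shift from exploration toward exploitation.
        tau = max(alpha * tau, 1e-3)
    return beams
```

Bookkeeping for terminated sequences (e.g., removing hypotheses that have emitted an end-of-sequence token from the active beam) is omitted here for brevity.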

Key resource consumption per step is $O(kn)$ forward passes and scoring operations. The overall cost is $O(Tkn)$ model invocations, where $T$ is the sequence depth (Xie et al., 2023).

4. Temperature Hyperparameter and Exploration–Exploitation Balance

The temperature $\tau$ directly governs the exploration–exploitation trade-off in candidate selection:

  • As $\tau \to 0$, the softmax sharpens, recovering deterministic beam search where the $k$ highest-scoring candidates are chosen.
  • As $\tau$ increases, lower-scoring candidates gain weight, increasing diversity and the likelihood of escaping local maxima.
  • A high initial $\tau$ with subsequent annealing (e.g., $\alpha \approx 0.5$ per step) balances early diversity with later-stage focus, empirically improving multi-step reasoning and mitigating error accumulation (Xie et al., 2023).

This mechanism parallels the role of temperature in softmax sampling and stochastic optimization, and can be adapted for different diversity/accuracy requirements.
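
The effect is easy to see numerically. The short example below, with arbitrary illustrative scores, shows the selection distribution over five candidates collapsing onto the top scorer as $\tau$ shrinks.

```python
import numpy as np

def selection_probs(log_scores, tau):
    """Temperature-controlled softmax over candidate log-scores."""
    z = np.array(log_scores) / tau
    z -= z.max()                      # numerical stability
    p = np.exp(z)
    return p / p.sum()

scores = [-0.5, -0.9, -1.0, -2.0, -3.0]
for tau in (2.0, 1.0, 0.3, 0.05):
    print(tau, np.round(selection_probs(scores, tau), 3))
# High tau: probabilities are nearly uniform (exploration);
# tau -> 0: nearly all mass on the best candidate (exploitation).
```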

5. Variants and Theoretical Connections

Gumbel-Top-k SBS (Kool et al., 2019):

  • Employs the Gumbel-Top-k trick to sample $k$ sequences without replacement from the model's sequence distribution.
  • Bridges the regime between deterministic beam search and pure sequence sampling, interpolating from mode-seeking to distribution-covering behavior as $\tau$ is varied.
  • Enables low-variance, unbiased estimators for expectations of sequence-level functions (e.g., expected BLEU, entropy) by leveraging the properties of without-replacement sampling.
  • Theoretical cost is linear in $k$ and $L$ (sequence length), avoiding exponential blowup.

Conditional Poisson SBS (Meister et al., 2021):

  • Replaces the softmax or Gumbel procedure with a conditional Poisson sampling process (sketched below), allowing exact computation of inclusion probabilities and unbiased Horvitz–Thompson estimation.
  • Reduces variance in expectation estimation tasks, especially under high entropy, compared with Gumbel-based SBS and ancestral sampling.
  • Adds $O(NK)$ dynamic-programming overhead, but remains practical with vocabulary truncation.
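
For concreteness, here is a minimal sketch of drawing a size-$k$ subset under a conditional Poisson design, the sampling primitive used by Conditional Poisson SBS. The sequential dynamic-programming formulation is standard survey-sampling machinery; the function name and example weights are illustrative assumptions.

```python
import numpy as np

def conditional_poisson_sample(weights, k, rng=None):
    """Draw a size-k subset S of {0, ..., N-1} with probability proportional
    to the product of the weights of its members (a conditional Poisson
    design). Z[j][m] is the total weight of all size-m subsets of items
    j..N-1, computed by the recursion Z[j][m] = Z[j+1][m] + w[j]*Z[j+1][m-1]."""
    rng = np.random.default_rng() if rng is None else rng
    w = np.asarray(weights, dtype=float)
    N = len(w)
    Z = np.zeros((N + 1, k + 1))
    Z[N][0] = 1.0
    for j in range(N - 1, -1, -1):
        Z[j][0] = 1.0
        for m in range(1, k + 1):
            Z[j][m] = Z[j + 1][m] + w[j] * Z[j + 1][m - 1]
    # Sequentially decide inclusion of each item given the remaining slots.
    selected, remaining = [], k
    for j in range(N):
        if remaining == 0:
            break
        p_include = w[j] * Z[j + 1][remaining - 1] / Z[j][remaining]
        if rng.random() < p_include:
            selected.append(j)
            remaining -= 1
    return selected

# Example: sample 2 of 5 candidates with weights proportional to exp(score).
print(conditional_poisson_sample([0.5, 1.0, 0.2, 0.8, 0.3], k=2))
```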

Comparative Table

| Decoding Variant | Sampling Scheme | Variance in Expectation Estimators |
|---|---|---|
| Deterministic Beam Search | Top-k deterministic | Biased, low coverage |
| SBS (Gumbel-Top-k) | Gumbel without replacement | Lower variance, some bias |
| Conditional Poisson SBS | Conditional Poisson without replacement | Lowest variance, unbiased |

6. Role of Self-Evaluation and Guided Reasoning

Recent adaptations of SBS integrate stepwise self-evaluation guidance for multi-step reasoning, notably in LLMs. At each step:

  • The continuation $s^t$ is both sampled and scored by the base model and evaluated by a self-evaluation LLM or component, which outputs a correctness confidence $C(s^t) \in [0, 1]$.
  • The score used in pruning and sampling is a convex combination of model likelihood and verifier confidence, calibrated via the parameter $\lambda$:

$$E(R^{1:T}) = \prod_{t=1}^{T} \left[ P(s^t \mid x, s^{1:t-1})^{\lambda} \cdot C(s^t)^{1-\lambda} \right]$$

  • This mechanism inhibits the propagation of early-step reasoning errors and increases robustness, particularly evident in benchmarks for arithmetic, symbolic, and commonsense reasoning. Improvements over Codex-backboned baselines include accuracy increases of 6.34%, 9.56%, and 5.46% on GSM8K, AQuA, and StrategyQA, respectively (Xie et al., 2023).

Self-evaluation-guided SBS realizes a lightweight, model-agnostic calibration strategy that integrates fluency and logical soundness without requiring model finetuning.
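
A minimal sketch of the combined step score in log space is shown below; the helper name and the stand-in confidence values are assumptions for illustration, and in practice $C(s^t)$ would be produced by a separate self-evaluation prompt to the model.

```python
import math

def guided_log_score(step_log_probs, step_confidences, lam=0.5, eps=1e-9):
    """Accumulated log-score L(R^{1:t}) mixing model likelihood and verifier
    confidence: sum_t [lam * log P(s^t | ...) + (1 - lam) * log C(s^t)]."""
    return sum(lam * lp + (1.0 - lam) * math.log(max(c, eps))
               for lp, c in zip(step_log_probs, step_confidences))

# Example: three reasoning steps with their log-probs and confidences.
print(guided_log_score([-0.4, -1.1, -0.7], [0.9, 0.6, 0.8], lam=0.7))
```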

7. Applications, Trade-offs, and Empirical Insights

SBS is broadly applied in:

  • Neural machine translation, where it outperforms deterministic and diverse beam search in balancing BLEU and output diversity, and provides lower-variance estimators for sentence-level metrics (Kool et al., 2019, Meister et al., 2021).
  • Multi-step reasoning in LLMs, where it reduces error accumulation and increases solution consistency (Xie et al., 2023).
  • Settings requiring diverse solution sets, such as minimum Bayes risk decoding and expectation estimation in RL.

Key trade-offs include:

  • Larger $k$ and $n$ yield better coverage and accuracy but increase computational cost and latency.
  • Smaller $\tau$ expedites convergence but risks omitting viable alternatives.
  • Self-evaluation injects additional forward passes for confidence computation.

Empirical results indicate that SBS (with or without enhancements like conditional Poisson sampling or self-evaluation) achieves superior diversity–quality profiles and substantially lower estimator variance than classical alternatives (Kool et al., 2019, Meister et al., 2021, Xie et al., 2023).


References:

  • "Self-Evaluation Guided Beam Search for Reasoning" (Xie et al., 2023)
  • "Conditional Poisson Stochastic Beam Search" (Meister et al., 2021)
  • "Stochastic Beams and Where to Find Them: The Gumbel-Top-k Trick for Sampling Sequences Without Replacement" (Kool et al., 2019)