Conditional Poisson Stochastic Beam Search
- Conditional Poisson Stochastic Beam Search (CPSBS) is a decoding algorithm that replaces greedy top-K selection with conditional Poisson sampling to generate diverse candidate sets.
- It leverages dynamic programming to compute the set-sampling distribution and per-step inclusion probabilities, enabling unbiased or consistent expectation estimation via the Horvitz–Thompson estimator.
- Empirical evaluations demonstrate that CPSBS improves diversity and lowers RMSE in machine translation tasks, balancing quality and diversity better than alternative methods.
Conditional Poisson Stochastic Beam Search (CPSBS) is a stochastic decoding algorithm designed for sequence generation tasks using locally normalized probabilistic models. It generalizes the standard deterministic beam search by replacing its greedy top-K selection with a conditional Poisson sampling design, allowing for set-valued sampling without replacement at each step. CPSBS provides a principled mechanism for sampling diverse hypotheses and enables the construction of unbiased or consistent estimators for arbitrary expectations under the sequence model, with quantifiable inclusion probabilities for each candidate in the final set (Meister et al., 2021).
1. Formal Framework and Motivation
Standard beam search iteratively selects the top-K scoring continuations (extensions) at each time step for sequence models defined by

$$p(\mathbf{y} \mid \mathbf{x}) = \prod_{t=1}^{T} p(y_t \mid \mathbf{y}_{<t}, \mathbf{x}),$$

by maximizing a set function:

$$Y_t = \operatorname*{argmax}_{Y' \subseteq B_t,\; |Y'| = K} \;\prod_{\mathbf{y} \in Y'} p(\mathbf{y} \mid \mathbf{x}),$$

with $B_t$ the set of all one-token extensions of the previous beam $Y_{t-1}$. This approach suffers from two major drawbacks for expectation estimation: (i) high overlap among the $K$ returned sequences, resulting in poor coverage of $p$'s support, and (ii) the induced summary set often leads to biased and high-variance estimates for model expectations such as $\mathbb{E}_{\mathbf{y} \sim p}[f(\mathbf{y})]$.
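Since all weights are positive and the set score is a product of per-hypothesis terms, this subset maximization is solved exactly by greedy top-$K$ selection; a minimal brute-force check on hypothetical scores:

```python
from itertools import combinations
from math import prod

scores = [0.40, 0.25, 0.20, 0.10, 0.05]  # hypothetical local probabilities
K = 2

# Maximize the set function directly: argmax over size-K subsets
# of the product of member scores.
best = max(combinations(range(len(scores)), K),
           key=lambda S: prod(scores[i] for i in S))

# Greedy top-K yields the same set, since the product is monotone
# in each (positive) factor.
topk = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:K]
assert set(best) == set(topk)
```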
CPSBS addresses both issues by introducing a conditional Poisson sampling (CPS) scheme (Hájek, 1964; Tillé, 2006) that samples K candidates without replacement at each timestep, according to a distribution proportional to the product of their local probabilities. The sampling distribution at time $t$ is given by

$$P(Y_t = Y') = \frac{\prod_{\mathbf{y} \in Y'} w(\mathbf{y})}{Z} \quad \text{for } Y' \subseteq B_t,\ |Y'| = K,$$

where $w(\mathbf{y}) = p(\mathbf{y} \mid \mathbf{x})^{1/\tau}$ with temperature $\tau > 0$, and normalization $Z = \sum_{Y'' \subseteq B_t,\, |Y''| = K} \prod_{\mathbf{y} \in Y''} w(\mathbf{y})$.
Sampling sets rather than individual hypotheses at each stage reduces hypothesis overlap and provides a more accurate representation of the model's support. Moreover, as the temperature is annealed ($\tau \to 0$), the CPS distribution concentrates on the highest-scoring subset, so CPSBS recovers deterministic beam search in the limit.
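On a toy candidate set, the CPS distribution over size-$K$ subsets can be enumerated exactly; a minimal sketch (hypothetical probabilities) that also shows the annealing behavior:

```python
from itertools import combinations
from math import prod

def cps_distribution(p, K, tau):
    """Exact CPS distribution over size-K subsets:
    P(Y') proportional to prod_{y in Y'} p(y)**(1/tau)."""
    w = [q ** (1.0 / tau) for q in p]
    subsets = list(combinations(range(len(p)), K))
    scores = [prod(w[i] for i in S) for S in subsets]
    Z = sum(scores)  # normalizer over all size-K subsets
    return {S: s / Z for S, s in zip(subsets, scores)}

p = [0.5, 0.3, 0.15, 0.05]  # hypothetical local probabilities
print(cps_distribution(p, K=2, tau=1.0))   # mass spread over all size-2 subsets
print(cps_distribution(p, K=2, tau=0.05))  # mass concentrates on the top-2 set (0, 1)
```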
2. Algorithmic Structure and Execution
The essential CPSBS algorithm involves:
- Initializing the beam with the beginning-of-sequence symbol, $Y_0 = \{\text{BOS}\}$.
- Iteratively constructing the candidate set $B_t$ of all one-token extensions of the current beam $Y_{t-1}$.
- Assigning each candidate $y_n \in B_t$ ($n = 1, \dots, N$, with $N = |B_t|$) a weight $w_n = p(y_n \mid \mathbf{x})^{1/\tau}$ and computing the normalizer $Z$ by dynamic programming:
```python
# DP over W[n][k] = total weight of all size-k subsets of candidates 1..n
# (the elementary symmetric polynomial e_k(w_1, ..., w_n));
# w is the length-N list of candidate weights.
W = [[0.0] * (K + 1) for _ in range(N + 1)]
for n in range(N + 1):
    W[n][0] = 1.0  # the empty subset contributes weight 1
for n in range(1, N + 1):
    for k in range(1, K + 1):
        # Either exclude candidate n, or include it and pick k-1 from the rest.
        W[n][k] = W[n - 1][k] + w[n - 1] * W[n - 1][k - 1]
Z = W[N][K]  # normalizer over all size-K subsets
```
- Drawing a $K$-element set $Y_t$ from this distribution by running a standard CPS sampling procedure over the DP table (a sketch follows this list); the same table yields each candidate's step-wise inclusion probability $\pi_n = \Pr(y_n \in Y_t)$ in closed form.
- Recursing until the final time step $T$ and returning the completed beam $Y_T$.
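A minimal sketch of one exact CPS draw, assuming the DP table `W` and weight list `w` from the snippet above (the routine name is illustrative): scan candidates from last to first and include candidate $n$ with probability $w_n\,W[n-1][k-1]/W[n][k]$, where $k$ is the number of slots still to fill.

```python
import random

def cps_sample(w, W, K):
    """Draw one size-K subset S with P(S) proportional to prod_{i in S} w[i],
    given the filled DP table W (illustrative sketch)."""
    S, k = [], K
    for n in range(len(w), 0, -1):
        if k == 0:
            break
        # Include candidate n with the conditional probability that it
        # belongs to the set, given k slots remain among candidates 1..n.
        if random.random() < w[n - 1] * W[n - 1][k - 1] / W[n][k]:
            S.append(n - 1)  # 0-based index into the candidate list
            k -= 1
    return S
```

Note that these DP quantities give per-step inclusion probabilities in closed form; it is only the inclusion probability in the *final* set, marginalized over all beam trajectories, that must be estimated (Section 3).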
Compared to Kool et al. (2019)’s Stochastic Beam Search (SBS), which employs Gumbel-top-K noise and does not provide closed-form set-inclusion probabilities, CPSBS allows efficient, dynamic-program-based calculation of inclusion probabilities and maintains a natural connection to beam search through temperature annealing.
3. Statistical Properties and Consistent Estimation
A primary advantage of CPSBS is the availability of set-inclusion probabilities, enabling unbiased estimation of model expectations via the Horvitz–Thompson (HT) estimator. For a sampled set $Y_T$ of size $K$ and a function $f$ of interest,

$$G_{\mathrm{HT}} = \sum_{\mathbf{y} \in Y_T} \frac{p(\mathbf{y} \mid \mathbf{x})}{\pi(\mathbf{y})}\, f(\mathbf{y}),$$

where $\pi(\mathbf{y})$ is the marginal probability that $\mathbf{y}$ appears in $Y_T$. The HT estimator is unbiased, i.e., $\mathbb{E}[G_{\mathrm{HT}}] = \mathbb{E}_{\mathbf{y} \sim p}[f(\mathbf{y})]$.
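A minimal sketch of the HT estimator itself, assuming the sampled sequences' $f$-values, model probabilities, and (estimated) inclusion probabilities are given as parallel lists (names are illustrative):

```python
def horvitz_thompson(f_vals, p_vals, pi_vals):
    """HT estimate of E_{y~p}[f(y)] from a single sampled set Y_T:
    sum over y in Y_T of p(y)/pi(y) * f(y). Inputs are parallel lists
    aligned with the members of Y_T (illustrative sketch)."""
    return sum(p / pi * f for f, p, pi in zip(f_vals, p_vals, pi_vals))
```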
Direct computation of $\pi(\mathbf{y})$ is intractable, since it marginalizes over all possible beam trajectories; instead, two approaches are provided:
- Naïve Monte Carlo: estimate $\pi(\mathbf{y})$ as the empirical fraction of $M$ independent CPSBS runs in which $\mathbf{y}$ appears in the final set $Y_T$.
- Importance Sampling (IS): use a hindsight proposal that conditions each step on keeping $\mathbf{y}$ in the current beam. The IS estimator is consistent, with bounded asymptotic variance under mild conditions (Proposition 4.2).
Variance analyses reveal that the naïve MC plug-in estimate of the reciprocal $1/\pi(\mathbf{y})$ has infinite variance for any finite number of runs $M$ (the empirical inclusion frequency can be zero), while the IS-based reciprocal estimator remains consistent with controlled variance.
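As a concrete illustration of the naïve MC approach, a minimal sketch assuming a hypothetical `run_cpsbs()` callable that returns one sampled final set $Y_T$ per invocation:

```python
from collections import Counter

def estimate_inclusion_probs(run_cpsbs, M):
    """Naive Monte Carlo: estimate pi(y) = Pr(y in Y_T) as the fraction
    of M independent CPSBS runs whose final set contains y.
    (run_cpsbs is a hypothetical sampler returning an iterable of sequences.)"""
    counts = Counter()
    for _ in range(M):
        for y in run_cpsbs():
            counts[tuple(y)] += 1
    return {y: c / M for y, c in counts.items()}
```

Any $\mathbf{y}$ absent from all $M$ runs receives $\hat{\pi} = 0$ here, which is exactly why the plug-in reciprocal $1/\hat{\pi}$ misbehaves and the IS-based estimate is preferred.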
4. Empirical Evaluation and Practical Implications
CPSBS is empirically evaluated on WMT'14 En→Fr translation using a pre-trained Transformer across a range of temperature settings $\tau$. The primary tasks include sentence-level BLEU and conditional-entropy expectation estimation. Competing baselines include Monte Carlo (MC) sampling, Sum-and-Sample (SAS; Kool et al., 2020), and Stochastic Beam Search (SBS).
Using root mean squared error (RMSE) against high-precision MC estimates as the metric, CPSBS with the HT estimator and an $M = 1$ IS estimate of the inclusion probabilities achieves the lowest RMSE across all temperatures and sample sizes, with the greatest improvements at low $\tau$ (peaky, low-entropy distributions). At higher temperatures, a slight bias is attributable to Horvitz–Thompson estimation with estimated inclusion probabilities, but RMSE remains lower than for the alternatives.
In diverse-set sampling, CPSBS generates final $K$-sets with higher average $n$-gram diversity than Diverse Beam Search (Vijayakumar et al., 2018) and pure ancestral sampling, while maintaining a competitive BLEU score range. SBS still spans a broader quality–diversity trade-off, but CPSBS serves as a robust approach balancing both aspects.
5. Comparative Analysis with Related Decoding Methods
| Method | Sampling Mechanism | Set-Inclusion Probabilities |
|---|---|---|
| Standard Beam Search | Deterministic top-K | None (greedy) |
| Stochastic Beam Search | Gumbel-top-K noise | Not explicit; needs integration |
| CPSBS | Conditional Poisson SWOR | Explicit, DP-computable |
CPSBS distinguishes itself from both traditional deterministic and contemporaneous stochastic methods by directly modeling the set-wise sampling distribution (rather than augmenting with noise), exactly recovering beam search in the low-temperature limit, and supporting explicit inclusion probability computation, which is instrumental for unbiased or consistent statistical estimation.
6. Extensions and Applications
CPSBS generalizes to any structured prediction problem where locally normalized sequence models and standard beam search are feasible. This encompasses sequence-to-sequence tasks beyond translation (summarization, parsing, dialogue generation), as well as structured regression/classification trees, graph generation, and compound output domains such as image captioning.
Key applications include:
- Consistent estimation of expectations for loss-aware decoding, minimum-risk training, REINFORCE gradients, MBR decoding, and uncertainty quantification.
- Generation of diverse candidate lists for n-best or k-best decoding settings, where diversity and representative coverage are desired.
- Potential integration with diversity-promoting scoring or regularization schemes by modulating the candidate weights $w_n$.
Overall, Conditional Poisson Stochastic Beam Search is positioned as a mathematically principled, tractable, and empirically validated stochastic generalization of beam search that retains its structural advantages while overcoming its key statistical limitations (Meister et al., 2021).