Conditional Poisson Stochastic Beam Search

Updated 20 November 2025
  • Conditional Poisson Stochastic Beam Search (CPSBS) is a decoding algorithm that replaces greedy top-K selection with conditional Poisson sampling to generate diverse candidate sets.
  • It leverages dynamic programming to compute explicit set inclusion probabilities, enabling unbiased expectation estimation via the Horvitz–Thompson estimator.
  • Empirical evaluations demonstrate that CPSBS improves diversity and lowers RMSE in machine translation tasks, balancing quality and diversity better than alternative methods.

Conditional Poisson Stochastic Beam Search (CPSBS) is a stochastic decoding algorithm designed for sequence generation tasks using locally normalized probabilistic models. It generalizes the standard deterministic beam search by replacing its greedy top-K selection with a conditional Poisson sampling design, allowing for set-valued sampling without replacement at each step. CPSBS provides a principled mechanism for sampling diverse hypotheses and enables the construction of unbiased or consistent estimators for arbitrary expectations under the sequence model, with quantifiable inclusion probabilities for each candidate in the final set (Meister et al., 2021).

1. Formal Framework and Motivation

Standard beam search iteratively selects the top-K scoring continuations (extensions) at each time step for sequence models defined by

p(y) = \prod_{t=1}^T p(y_t \mid y_{<t}),

by maximizing a set function:

Q_t(Y \mid Y_{t-1}) = \begin{cases} \prod_{y \in Y} w_y & \text{if } |Y| = K, \\ 0 & \text{otherwise}, \end{cases}

with w_y = p(y \mid y_{<t}). This approach suffers from two major drawbacks for expectation estimation: (i) high overlap among the K returned sequences, resulting in poor coverage of p's support, and (ii) the induced summary set often leads to biased, high-variance estimates of model expectations such as E_p[f(y)].
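To make the set-function view concrete, a small brute-force check (toy weights only, not from the paper) confirms that maximizing \prod_{y \in Y} w_y over all K-subsets recovers exactly the greedy top-K choice:

# Toy check: the K-subset maximizing the product of weights is the top-K set.
from itertools import combinations
from math import prod

w = {"the": 0.41, "a": 0.30, "dog": 0.17, "cat": 0.09, "ran": 0.03}  # local probs
K = 2

best = max(combinations(w, K), key=lambda Y: prod(w[y] for y in Y))
topk = sorted(w, key=w.get, reverse=True)[:K]
print(set(best) == set(topk))  # True: deterministic beam search = argmax of Q_t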

CPSBS addresses both issues by introducing a conditional Poisson sampling (CPS) scheme (Hájek, 1964; Tillé, 2006) that samples K candidates without replacement at each timestep, according to a distribution proportional to the product of their local probabilities. The sampling distribution at time t is given by:

Q_t(Y \mid Y_{t-1}) = \frac{1}{Z_t} \prod_{y \in Y} w_y,

where B_t = Y_{t-1} \otimes V is the candidate set of all one-token extensions of the current beams, and the normalization constant is Z_t = \sum_{Y' \subseteq B_t,\, |Y'| = K} \prod_{y \in Y'} w_y.

Sampling sets rather than individual hypotheses at each stage reduces hypothesis overlap and provides a more accurate representation of the model's support. Moreover, as the temperature is annealed (w_y \propto p(y)^{1/T}), CPSBS recovers deterministic beam search in the limit T \to 0.
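The following sketch (toy numbers, brute-force normalizer; illustrative rather than the authors' implementation) computes the CPS probability of the top-K subset under annealed weights w_y \propto p(y)^{1/T}, showing the mass concentrating on the deterministic beam as T shrinks:

# Probability that CPS selects the top-K subset, as a function of temperature T.
from itertools import combinations
from math import prod

p = [0.5, 0.3, 0.1, 0.07, 0.03]               # toy next-token probabilities
K = 2

def q_of_topk(p, K, T):
    w = [pi ** (1.0 / T) for pi in p]          # annealed weights w_y ∝ p(y)^{1/T}
    subsets = list(combinations(range(len(p)), K))
    scores = [prod(w[i] for i in Y) for Y in subsets]
    return scores[subsets.index(tuple(range(K)))] / sum(scores)

for T in (1.0, 0.5, 0.2, 0.1):
    print(T, round(q_of_topk(p, K, T), 4))     # mass on the top-K set → 1 as T → 0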

2. Algorithmic Structure and Execution

The essential CPSBS algorithm involves:

  • Initializing the beam with Y_0 = \{\text{BOS}\}.
  • Iteratively constructing candidate sets B_t of all one-token extensions of the current beams.
  • Assigning each candidate a weight

w_n = \frac{p(y^{(n)} \mid y^{(n)}_{<t})}{1 - p(y^{(n)} \mid y^{(n)}_{<t})},

and computing the normalizer Z_t by dynamic programming (sketched below in Python):

def cps_dp_table(w, K):                      # w: candidate weights w_1..w_N
    N = len(w)
    W = [[0.0] * (K + 1) for _ in range(N + 1)]
    for n in range(N + 1):
        W[n][0] = 1.0                        # the empty subset has weight 1
    for n in range(1, N + 1):
        for k in range(1, K + 1):
            # either exclude candidate n, or include it and multiply in w_n
            W[n][k] = W[n - 1][k] + w[n - 1] * W[n - 1][k - 1]
    return W                                 # normalizer Z_t = W[N][K]

  • Drawing K-element sets Y_t according to inclusion probabilities

\pi_n(t) = w_n \cdot \frac{Z_{n-1,\,K-1}}{Z_{N,\,K}},

by running a standard CPS acceptance procedure.

  • Recursing until t = T and returning Y_T (a runnable sketch of the per-step set draw is given below).
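As a minimal, self-contained sketch of the per-step set draw (names are illustrative; it follows the standard sequential CPS scheme driven by the DP table, not the authors' released code):

import random

def cps_sample(w, K, rng=random):
    """Draw a K-subset of indices 0..N-1 with probability proportional to the
    product of the selected weights (conditional Poisson sampling)."""
    N = len(w)
    # Rebuild the DP table (same recursion as sketched above).
    W = [[0.0] * (K + 1) for _ in range(N + 1)]
    for n in range(N + 1):
        W[n][0] = 1.0
    for n in range(1, N + 1):
        for k in range(1, K + 1):
            W[n][k] = W[n - 1][k] + w[n - 1] * W[n - 1][k - 1]
    # Walk backwards over candidates, deciding inclusion one at a time.
    Y, k = [], K
    for n in range(N, 0, -1):
        if k == 0:
            break
        if rng.random() < w[n - 1] * W[n - 1][k - 1] / W[n][k]:
            Y.append(n - 1)
            k -= 1
    return Y

# Sanity check: empirical inclusion frequencies over many draws sum to K,
# as required of any fixed-size sampling design.
w = [0.9, 0.6, 0.4, 0.25, 0.1]
counts = [0] * len(w)
for _ in range(20000):
    for i in cps_sample(w, K=2):
        counts[i] += 1
print([round(c / 20000, 3) for c in counts])   # the five entries sum to 2.0

At decoding time, w would hold the odds-form weights w_n computed above for each candidate in B_t.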

Compared to Kool et al. (2019)’s Stochastic Beam Search (SBS), which employs Gumbel-top-K noise and does not provide closed-form set-inclusion probabilities, CPSBS allows efficient, dynamic-program-based calculation of inclusion probabilities and maintains a natural connection to beam search through temperature annealing.

3. Statistical Properties and Consistent Estimation

A primary advantage of CPSBS is the availability of set-inclusion probabilities, enabling unbiased estimation of model expectations via the Horvitz–Thompson (HT) estimator. For a sampled set Y_T of size K and a function f(y),

\hat{\mu}_{\mathrm{HT}} = \sum_{y \in Y_T} \frac{p(y)\, f(y)}{\Pi(y)},

where \Pi(y) is the marginal probability that y appears in Y_T. The HT estimator is unbiased, i.e., E[\hat{\mu}_{\mathrm{HT}}] = E_{y \sim p}[f(y)], provided \Pi(y) > 0 whenever p(y) f(y) \neq 0.
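As a minimal sketch (illustrative names and made-up numbers, assuming the inclusion probabilities \Pi(y) are already available):

def horvitz_thompson(sample, p, Pi, f):
    """HT estimate of E_p[f(y)] from a sampled set, given model probabilities p
    and inclusion probabilities Pi (both dicts) and a statistic f."""
    return sum(p[y] * f(y) / Pi[y] for y in sample)

# Toy usage.
sample = ["y1", "y2", "y3"]
p  = {"y1": 0.20, "y2": 0.05, "y3": 0.01}      # model probabilities p(y)
Pi = {"y1": 0.90, "y2": 0.40, "y3": 0.10}      # inclusion probabilities Π(y)
f  = lambda y: float(len(y))                   # any statistic of interest
print(horvitz_thompson(sample, p, Pi, f))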

Direct computation of \Pi(y) is intractable; instead, two approaches are provided:

  • Naïve Monte Carlo: estimate \Pi(y) as the empirical fraction of M independent CPSBS runs in which y appears in Y_T.
  • Importance Sampling (IS): use a hindsight proposal that conditions each step on keeping y in the current beam. The IS estimator is consistent, with bounded asymptotic variance under mild conditions (Proposition 4.2).

Variance analyses reveal that the naïve MC estimator of 1/\Pi(y) has infinite variance as \Pi(y) \to 0, whereas the IS-based reciprocal estimator remains consistent with controlled variance.
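A sketch of the naïve Monte Carlo route (the decoder call here is a hypothetical stand-in, not an actual CPSBS API):

import random

def mc_inclusion_prob(y, run_decoder, M=1000):
    """Estimate Π(y) as the fraction of M independent runs whose final set contains y.
    Unreliable when Π(y) is small, since 1/Π(y) estimates then blow up."""
    return sum(1 for _ in range(M) if y in run_decoder()) / M

# Dummy stochastic decoder standing in for CPSBS, purely for illustration.
dummy = lambda: random.sample(["y1", "y2", "y3", "y4"], 2)
print(mc_inclusion_prob("y1", dummy, M=5000))  # ≈ 0.5 for this uniform dummy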

4. Empirical Evaluation and Practical Implications

CPSBS is empirically evaluated on WMT'14 En→Fr translation using a pre-trained Transformer at temperature settings T \in \{0.1, 0.2, 0.3, 0.5\}. The primary estimation targets are expected sentence-level BLEU and the model's conditional entropy. Competing baselines include Monte Carlo (MC) sampling, Sum-and-Sample (SAS) (Kool et al., 2020), and Stochastic Beam Search (SBS).

Using root mean squared error (RMSE) against a high-precision MC reference as the metric, CPSBS with the HT estimator (inclusion probabilities estimated with a single IS run, M = 1) achieves the lowest RMSE across all temperatures and sample sizes, with the greatest improvements at low T (peaky, low-entropy distributions). At higher temperatures, a slight bias arises from plugging estimated inclusion probabilities into the Horvitz–Thompson estimator, but RMSE remains lower than for the alternatives.

In diverse-set sampling, CPSBS generates final K-sets Y_T with higher average n-gram diversity than Diverse Beam Search (Vijayakumar et al., 2018) and pure ancestral sampling, while maintaining a competitive BLEU range. SBS still spans a broader quality–diversity trade-off, but CPSBS offers a robust balance between the two.

5. Comparison with Related Decoding Strategies

Method                   Sampling Mechanism          Set-Inclusion Probabilities
Standard Beam Search     Deterministic top-K         None (greedy)
Stochastic Beam Search   Gumbel-top-K noise          Not explicit; requires integration
CPSBS                    Conditional Poisson SWOR    Explicit, DP-computable

CPSBS distinguishes itself from both traditional deterministic and contemporaneous stochastic methods by directly modeling the set-wise sampling distribution (rather than augmenting with noise), exactly recovering beam search in the low-temperature limit, and supporting explicit inclusion probability computation, which is instrumental for unbiased or consistent statistical estimation.

6. Extensions and Applications

CPSBS generalizes to any structured prediction problem where locally normalized sequence models and standard beam search are feasible. This encompasses sequence-to-sequence tasks beyond translation (summarization, parsing, dialogue generation), as well as structured regression/classification trees, graph generation, and compound output domains such as image captioning.

Key applications include:

  • Consistent estimation of expectations E_p[f] for loss-aware decoding, minimum-risk training, REINFORCE gradients, MBR decoding, and uncertainty quantification (see the MBR sketch after this list).
  • Generation of diverse candidate lists for n-best or k-best decoding settings, where diversity and representative coverage are desired.
  • Potential integration with diversity-promoting scoring or regularization schemes by modulating the candidate weights w_y.
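For instance, the first application can be sketched as HT-weighted MBR decoding over a CPSBS sample set, reusing the weights p(y)/\Pi(y) from the estimator above (function and variable names are illustrative, not from the paper):

def mbr_decode(sample, p, Pi, risk):
    """Pick the candidate with the lowest HT-estimated expected risk under p."""
    def expected_risk(h):
        return sum(p[y] / Pi[y] * risk(h, y) for y in sample)
    return min(sample, key=expected_risk)

# Toy usage: 0/1 disagreement as the risk; any task loss (e.g. 1 − BLEU) works.
sample = ["a b c", "a b d", "x y z"]
p    = {"a b c": 0.30, "a b d": 0.25, "x y z": 0.01}
Pi   = {"a b c": 0.80, "a b d": 0.70, "x y z": 0.10}
risk = lambda h, y: 0.0 if h == y else 1.0
print(mbr_decode(sample, p, Pi, risk))          # → "a b c"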

Overall, Conditional Poisson Stochastic Beam Search is positioned as a mathematically principled, tractable, and empirically validated stochastic generalization of beam search that retains its structural advantages while overcoming its key statistical limitations (Meister et al., 2021).
