Conditional Poisson Stochastic Beam Search
- Conditional Poisson Stochastic Beam Search (CPSBS) is a stochastic decoding algorithm that employs conditional Poisson sampling to select diverse candidate sequences without replacement.
- It efficiently computes selection probabilities using an O(NK) recurrence, bridging the gap between deterministic beam search and unbiased stochastic sampling.
- CPSBS enhances neural machine translation and related tasks by reducing estimator variance and redundancy while providing robust expectation estimates.
Conditional Poisson Stochastic Beam Search (CPSBS) is a stochastic decoding algorithm designed for sequence generation tasks, such as neural machine translation. CPSBS replaces the deterministic selection of top-K candidates in traditional beam search with a sampling method based on conditional Poisson sampling, ensuring that each candidate is chosen without replacement and according to model-based probabilities. This approach offers improved diversity, lower variance in expectation estimation, and better consistency in high-entropy settings compared to alternatives such as stochastic beam search (SBS) and standard beam search.
1. Foundations and Motivation
Beam search is the default decoding method in sequence generation models for natural language processing, where the K-best candidates are selected at each step by maximizing model probabilities. However, this deterministic selection leads to high redundancy among candidates and bias when estimating expectations under the model. Alternatives such as stochastic sampling address diversity but may induce high variance or inefficiency. CPSBS provides a principled stochasticization by directly connecting the beam search updating process with a classical statistical sampling scheme.
The central motivation for CPSBS is to produce diverse candidate sets while maintaining statistical consistency for expectation estimation tasks, such as BLEU score or model entropy. By sampling without replacement and respecting the model’s probability structure, CPSBS bridges the gap between greedy search and unbiased statistical sampling (Meister et al., 2021).
2. Methodological Framework
CPSBS operationalizes stochastic beam selection via conditional Poisson sampling. At each decoding step $t$, given a set of $N$ candidates with a weight $w_i$ for each candidate $i$ (commonly based on the model probabilities, or a numerically stable transformation of them), the method samples exactly $K$ candidates without replacement according to the distribution

$$P(S) = \frac{1}{Z} \prod_{i \in S} w_i,$$

where $|S| = K$ and $Z$ is the normalizing constant [Equation 32, (Meister et al., 2021)].

Calculating the normalizing constant across all subsets of size $K$ requires evaluating

$$Z = \sum_{S' : |S'| = K} \; \prod_{i \in S'} w_i$$

[Equation 13, (Meister et al., 2021)]. Efficient $O(NK)$ computation is achieved via a recurrence analogous to the computation of elementary symmetric polynomials.
Candidate inclusion is determined by iteratively updating probabilities as elements are considered, guaranteeing that exactly K items are selected per time step. As the weights are annealed (i.e., sharpened such that the top-K candidates dominate the distribution), CPSBS’s sampling behavior converges to that of deterministic beam search; this connection underpins CPSBS as a "faithful stochasticization" of beam search.
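The selection step above can be sketched in Python (an illustrative implementation of conditional Poisson sampling, not the authors' code): the normalizers are built with the elementary-symmetric-polynomial recurrence, and items are then included one at a time with matching conditional probabilities so that exactly K items are always returned.

```python
import random

def subset_normalizers(w, K):
    """Z[n][k] = sum over size-k subsets of the first n items of the
    product of their weights (elementary symmetric polynomials), via
    Z[n][k] = Z[n-1][k] + w[n-1] * Z[n-1][k-1]."""
    N = len(w)
    Z = [[0.0] * (K + 1) for _ in range(N + 1)]
    for n in range(N + 1):
        Z[n][0] = 1.0  # exactly one empty subset, with product 1
    for n in range(1, N + 1):
        for k in range(1, K + 1):
            Z[n][k] = Z[n - 1][k] + w[n - 1] * Z[n - 1][k - 1]
    return Z

def conditional_poisson_sample(w, K, rng=random):
    """Draw a size-K index set S with P(S) proportional to the product
    of w[i] over i in S."""
    N = len(w)
    Z = subset_normalizers(w, K)
    S, k = [], K
    # Walk items from last to first; item n-1 is included with the
    # conditional probability w[n-1] * Z[n-1][k-1] / Z[n][k].  When
    # only k items remain this probability is exactly 1, so the
    # sample always has size K.
    for n in range(N, 0, -1):
        if k == 0:
            break
        p_include = w[n - 1] * Z[n - 1][k - 1] / Z[n][k]
        if rng.random() < p_include:
            S.append(n - 1)
            k -= 1
    return sorted(S)
```

Note that annealing the weights (raising them to a large power) concentrates the subset distribution on the top-K candidates, recovering deterministic beam search as described above.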
3. Relationship to Stochastic Beam Search and the Gumbel-Top-k Trick
CPSBS functions similarly to the stochastic beam search (SBS) formulated via the Gumbel-Top-k trick (Kool et al., 2019). Both achieve sampling without replacement and avoid overlap in generated sequences. In SBS, sequence generation uses perturbed log-probabilities

$$G_i = \log p_i + g_i, \qquad g_i \sim \mathrm{Gumbel}(0),$$

and the top $K$ indices, $i_1, \dots, i_K$, are selected, forming an exact sample without replacement. The probability of a given selection sequence is

$$P(i_1, \dots, i_K) = \prod_{j=1}^{K} \frac{p_{i_j}}{\sum_{i \in R_j} p_i},$$

where $R_j$ is the set of remaining items at step $j$, i.e., $R_j = \{1, \dots, N\} \setminus \{i_1, \dots, i_{j-1}\}$.
CPSBS offers an alternative by directly leveraging conditional Poisson sampling and the associated inclusion probabilities, while SBS propagates stochastic perturbations recursively along a sequence tree. Computationally, CPSBS provides efficient selection via recurrence, whereas SBS’s efficiency is rooted in the structure of the Gumbel-Top-k mechanism.
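For comparison, the Gumbel-Top-k mechanism at a single step can be sketched as follows (a minimal illustration in our own notation, not Kool et al.'s implementation): each log-probability is perturbed with independent Gumbel(0) noise and the K largest perturbed values are kept, which is equivalent to sampling K items without replacement.

```python
import math
import random

def gumbel_top_k(logps, K, rng=random):
    """Sample K distinct indices without replacement by perturbing
    each log-probability with independent Gumbel(0) noise and taking
    the K largest perturbed values (the Gumbel-Top-k trick)."""
    keys = []
    for i, lp in enumerate(logps):
        # Inverse-CDF draw from a standard Gumbel distribution.
        g = -math.log(-math.log(rng.random()))
        keys.append((lp + g, i))
    keys.sort(reverse=True)  # largest perturbed log-probs first
    return [i for _, i in keys[:K]]
```

In SBS proper, these perturbations are propagated down the sequence tree so that whole sequences are sampled without replacement; the sketch shows only the core top-k mechanism.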
4. Statistical Estimators and Diversity
CPSBS samples can be leveraged to build consistent estimators for model-based expectations, such as expected sentence-level BLEU or entropy. The inclusion probabilities derived from conditional Poisson sampling enable use of Horvitz–Thompson-type estimators, which substantially reduce variance compared to naive Monte Carlo or SBS estimators.
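A Horvitz–Thompson-type estimate of this kind can be sketched as follows (illustrative only; for clarity the inclusion probabilities are computed here by brute-force enumeration over subsets rather than the paper's efficient recurrence).

```python
import itertools

def inclusion_probs(w, K):
    """pi[i] = P(i in S) under conditional Poisson sampling, computed
    by explicit enumeration of all size-K subsets (fine for small N)."""
    N = len(w)
    Z = 0.0
    pi = [0.0] * N
    for S in itertools.combinations(range(N), K):
        m = 1.0
        for i in S:
            m *= w[i]
        Z += m          # normalizer accumulates every subset's weight
        for i in S:
            pi[i] += m  # each member credits its subsets' weight
    return [p / Z for p in pi]

def horvitz_thompson(sample, pi, p, f):
    """Estimate the expectation of f under p from one
    without-replacement sample: sum of p[i] / pi[i] * f(i)."""
    return sum(p[i] / pi[i] * f(i) for i in sample)
```

Because the inverse inclusion probabilities exactly offset each item's chance of being sampled, the estimator is unbiased: averaging it over all subsets, weighted by their sampling probabilities, recovers the true expectation.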
By ensuring sampling without replacement, CPSBS increases support coverage within the candidate space and achieves higher diversity; overlap between returned hypotheses is reduced, mitigating redundancy common in deterministic beam search and even SBS under certain settings. In high-entropy sampling regimes (e.g., at higher sampling temperatures), empirical results show that CPSBS maintains favorable trade-offs between translation quality (mean/max BLEU) and diversity (fraction of unique n-grams).
5. Computational Efficiency and Scalability
Though conditional Poisson sampling naïvely requires evaluating exponentially many candidate subsets of size K, the use of recurrence relations allows O(NK)-time computation of the sampling normalizer and decision process at every step. This scalability is comparable to deterministic beam search and SBS, where the number of model evaluations is linear in K and the sequence length. Practical implementations avoid explicit enumeration and efficiently update inclusion probabilities as candidate sets evolve.
Numerical stability considerations arise in both SBS and CPSBS when computing perturbed probabilities, particularly for large vocabularies or low-probability events. The CPSBS framework can use numerically stable forms of weight calculation, such as carrying out the weight and normalizer computations in log space. A plausible implication is that further engineering advances in numerical computation could further extend CPSBS’s tractability for large-scale models.
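As one illustration of such a stabilization (our own sketch, not necessarily the paper's exact scheme), the normalizer recurrence can be run entirely in log space with a log-add-exp accumulator, so that very small weights (very negative log-probabilities) do not underflow.

```python
import math

def log_subset_normalizer(logw, K):
    """log Z for conditional Poisson sampling, computed entirely in
    log space so tiny weights never underflow to zero."""
    def logaddexp(a, b):
        # Stable log(exp(a) + exp(b)), tolerating -inf inputs.
        if a == -math.inf:
            return b
        if b == -math.inf:
            return a
        m = max(a, b)
        return m + math.log(math.exp(a - m) + math.exp(b - m))

    # logZ[k] = log of the elementary symmetric polynomial e_k over
    # the weights processed so far; start with e_0 = 1, e_k = 0.
    logZ = [0.0] + [-math.inf] * K
    for lw in logw:
        for k in range(K, 0, -1):  # backwards: update one row in place
            logZ[k] = logaddexp(logZ[k], lw + logZ[k - 1])
    return logZ[K]
```

The backwards inner loop lets a single row be reused in place, mirroring the standard in-place evaluation of elementary symmetric polynomials.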
6. Applications in Sequence Generation
CPSBS has been primarily validated on neural machine translation tasks, using pretrained Transformer models for WMT’14 English–French translation. In experiments, CPSBS consistently produces lower root mean square error (RMSE) for expectation estimation than SBS, Monte Carlo sampling, and sum-and-sample estimators.
CPSBS is a generic decoding strategy and can be applied to other structured output tasks, such as dialogue modeling, text summarization, image captioning, speech recognition, or structured object prediction. The method’s ability to produce diverse, unbiased candidate sets and enable robust estimator construction positions CPSBS as a promising tool for risk-sensitive generation tasks and minimum Bayes risk decoding.
7. Impact, Extensions, and Future Directions
CPSBS provides a rigorous bridge between deterministic and stochastic beam search algorithms, combining classical statistical sampling principles (conditional Poisson) with modern neural sequence models. The design offers variance reduction and estimation efficiency, particularly in high-entropy or diversity-demanding scenarios.
The availability of CPSBS codebases supports further adoption and methodological innovations. Potential future extensions include integrating CPSBS within training loops (e.g., to optimize sequence-level metrics using low-variance estimators), exploring variable or adaptive temperatures and noise models, and extending statistical estimator frameworks.
This suggests that conditional inclusion probabilities and sampling designs from statistical literature may play a growing role in the development of efficient, unbiased decoding strategies for neural models.
Table: Comparison of Beam Search Variants
| Method | Diversity Control | Estimator Variance | Candidate Overlap |
|---|---|---|---|
| Beam Search | None (Deterministic) | High | High |
| Stochastic Beam Search (SBS) | Temperature/Noise | Moderate | Moderate/Low |
| CPSBS | Conditional Poisson | Low | Low |
Deterministic beam search is efficient but non-diverse; SBS introduces stochasticity via perturbations and temperature control; CPSBS ensures diversity and minimal estimator variance by explicitly managing inclusion probabilities using conditional Poisson sampling.
Conclusion
Conditional Poisson Stochastic Beam Search (CPSBS) advances sequence decoding by integrating conditional Poisson sampling into the beam search framework, thereby achieving diversity, efficiency, and statistical consistency. CPSBS reconciles the diversity of stochastic sampling with the structured exploration of beam search, providing lower variance and greater estimator efficiency for sequence generation models. The framework’s extensibility and empirical performance on translation tasks suggest broad applicability within contemporary and future research on structured sequence generation and statistical estimation.