
Constrained Beam Search & Sampling

Updated 6 April 2026
  • Constrained Beam Search and Sampling are advanced methods that enforce hard or soft constraints during autoregressive decoding to guide generated outputs.
  • They utilize techniques such as grid beam, finite-state, and stochastic beam search to ensure constraint satisfaction across tasks like translation, image captioning, and dialogue.
  • These approaches balance output fidelity and diversity, though they increase computational cost and require careful tuning for optimal performance.

Constrained beam search and sampling constitute core methodologies in sequence generation, providing algorithmic mechanisms for enforcing hard or soft constraints during decoding with autoregressive models. Historically motivated by the need to steer outputs toward specified content, structural properties, or safety guarantees, these approaches generalize standard beam search and sampling by integrating constraint satisfaction into hypothesis expansion, pruning, and selection. The landscape includes grid beam search, finite-state constrained decoding, stochastic beam search via the Gumbel-Top-k trick, deterministic and plug-and-play constraint enforcement, and hybridizations with nucleus or top-k sampling. Their relevance spans neural machine translation, image captioning, dialogue, reinforcement learning planning, and privacy risk quantification in LLMs.

1. Constraint-Enforcing Beam Search Variants

Standard beam search maintains a fixed-size set of top-scoring partial sequences, expanding each candidate at every step to find likely completions under an autoregressive model. Constrained beam search generalizes this by ensuring generated sequences satisfy prespecified constraints such as the inclusion of designated tokens or phrases, avoidance of forbidden words, or adherence to state-dependent combinatorial rules.
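As a reference point, vanilla beam search over an abstract next-token interface can be sketched as follows; the `step_logprobs` callable, toy vocabulary, and `eos` convention are illustrative, not any particular library's API:

```python
def beam_search(step_logprobs, beam_size, max_len, eos=0):
    """Minimal beam search sketch.
    step_logprobs(prefix) -> {token: log-probability} (hypothetical interface)."""
    beams = [((), 0.0)]  # (token tuple, cumulative log-prob)
    for _ in range(max_len):
        candidates = []
        for prefix, score in beams:
            if prefix and prefix[-1] == eos:
                candidates.append((prefix, score))  # finished hypothesis: carry over
                continue
            for tok, lp in step_logprobs(prefix).items():
                candidates.append((prefix + (tok,), score + lp))
        # keep only the top-k scoring partial sequences
        beams = sorted(candidates, key=lambda h: h[1], reverse=True)[:beam_size]
    return beams
```

Constrained variants below modify exactly this expand-score-prune loop, either by partitioning the beam by constraint state or by altering the scores themselves.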

Grid Beam Search introduces a two-dimensional beam structure indexed by output time and number of constraint tokens covered. Partial hypotheses are tracked along both axes, with transitions corresponding to unconstrained generation, starting a new constraint, or continuing an active constraint phrase. Beams are pruned and advanced such that only completions fully covering all constraints are considered feasible outputs (Hokamp et al., 2017).
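The two-axis bookkeeping can be sketched for the simple case of single-token constraints. The banked-beam structure below is a simplification of the grid of Hokamp et al. (phrase constraints and the explicit generate/start/continue transition split are omitted, and the `step_logprobs` interface is hypothetical):

```python
def grid_beam_search(step_logprobs, constraints, beam_size, max_len):
    """Grid Beam Search sketch for single-token constraints.
    Beams are banked by the number of constraint tokens already covered;
    only the full-coverage bank yields feasible outputs."""
    C = len(constraints)
    banks = {0: [((), 0.0)]}  # coverage count -> list of (prefix, score)
    for _ in range(max_len):
        new_banks = {}
        for c, beam in banks.items():
            for prefix, score in beam:
                for tok, lp in step_logprobs(prefix).items():
                    # emitting a new constraint token advances one bank
                    nc = c + (1 if tok in constraints and tok not in prefix else 0)
                    new_banks.setdefault(min(nc, C), []).append(
                        (prefix + (tok,), score + lp))
        # prune each bank independently to the beam size
        banks = {c: sorted(b, key=lambda h: h[1], reverse=True)[:beam_size]
                 for c, b in new_banks.items()}
    return banks.get(C, [])  # only fully covered hypotheses are feasible
```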

Finite-State Constrained Beam Search further generalizes the framework by representing constraint satisfaction with the state space of a finite-state machine (FSM). Each FSM state encodes which constraints have been satisfied, and separate beams are maintained for each state (Anderson et al., 2016). Expanding a hypothesis transitions it to a new state according to the emitted token and the current satisfaction flags, guaranteeing that only hypotheses reaching an accepting state (full constraint coverage) can terminate.
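For the common special case where the FSM tracks which of a set of required tokens have appeared, a state can be represented directly as the set of satisfied constraints; the sketch below uses that encoding (the interface is again hypothetical, and a real FSM can encode arbitrary regular constraints):

```python
def fsm_beam_search(step_logprobs, constraint_tokens, beam_size, max_len):
    """FSM-constrained beam search sketch.
    State = frozenset of constraint tokens satisfied so far; one beam is
    kept per state, and only the accepting state (all tokens satisfied)
    yields feasible outputs."""
    accept = frozenset(constraint_tokens)
    beams = {frozenset(): [((), 0.0)]}  # state -> list of (prefix, score)
    for _ in range(max_len):
        new_beams = {}
        for state, beam in beams.items():
            for prefix, score in beam:
                for tok, lp in step_logprobs(prefix).items():
                    # FSM transition: satisfying a constraint extends the state
                    nstate = frozenset(state | ({tok} & accept))
                    new_beams.setdefault(nstate, []).append(
                        (prefix + (tok,), score + lp))
        beams = {s: sorted(b, key=lambda h: h[1], reverse=True)[:beam_size]
                 for s, b in new_beams.items()}
    return beams.get(accept, [])
```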

Directed Beam Search (DBS) offers a plug-and-play alternative using logit manipulation and soft bias toward upcoming guide words. Constraints can be lexically enforced in order, with cosine similarity-based logit augmentations and segment-level quality functions guiding beam pruning. This approach works without retraining the model or implementing explicit automata, but coverage of all constraints is empirical rather than guaranteed (Pascual et al., 2020).
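The core logit-bias idea can be sketched as below. The bias strength, the clipping of negative similarities, and the embedding source are illustrative choices, not DBS's exact quality function:

```python
import numpy as np

def biased_logits(logits, guide_id, embeddings, strength=5.0):
    """DBS-style soft bias sketch (parameters hypothetical): boost tokens
    whose embedding is cosine-similar to the next guide word, nudging
    decoding toward it without imposing a hard constraint."""
    emb = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sim = emb @ emb[guide_id]              # cosine similarity to the guide word
    return logits + strength * np.clip(sim, 0.0, None)
```

Because only the logits are modified, the sketch applies to any pretrained model without retraining, which is the plug-and-play property the paragraph above describes; full coverage of the guide words remains empirical.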

Reinforcement Learning Constrained Beam Search (RLCBS) applies similar constrained decoding logic to combinatorial optimization and planning contexts, supporting exclusion (negative constraints), forced inclusion (positive constraints), and local transition constraints via action masking and FSM-style bookkeeping (Chen et al., 21 Jan 2025).
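Exclusion and forced-inclusion constraints in such settings typically reduce to action masking: infeasible actions receive probability zero before sampling or beam expansion. A generic sketch (not the exact RLCBS implementation) is:

```python
import numpy as np

def mask_actions(logits, allowed):
    """Action-masking sketch for constraint-aware decoding: infeasible
    actions get -inf logits, so they receive exactly zero probability
    after the softmax."""
    masked = np.where(allowed, logits, -np.inf)
    p = np.exp(masked - masked.max())      # stable softmax over feasible actions
    return p / p.sum()
```

FSM-style bookkeeping then determines the `allowed` mask at each step from the constraint-progress state.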

2. Algorithmic Structures and Complexity

Constrained decoding incurs algorithmic overhead stemming from a combinatorial explosion in the number of constraint states or constraint-coverage axes:

  • Grid and DBA (dynamic beam allocation) beams: Time and space complexity increase by a factor proportional to the number of constraint tokens or constraints, reaching O(k·T·C) for beam size k, sequence length T, and C constraint tokens (Hokamp et al., 2017, Chousa et al., 2021).
  • FSM beam tracking: The number of beams is exponential in the number of constraints (2^m for m binary constraints) if all orderings and combinations are distinct (Anderson et al., 2016).
  • DBS and soft-constrained methods: Complexity is dominated by candidate expansions and quality scoring, but not by the number of constraints directly (Pascual et al., 2020).
  • RLCBS: Complexity is determined by the number of beams and action proposals, with dynamic allocation to constraint progress “banks” (Chen et al., 21 Jan 2025).

Sampling-based constrained search (e.g., constrained top-k or nucleus sampling at each step) partially mitigates complexity by injecting diversity but does not alleviate the need to track satisfaction state. For large output spaces or tight constraints, pruning and constraint-viability checks are essential to control memory and compute cost (Anderson et al., 2016).

3. Stochastic Beam Search and Sampling Without Replacement

Stochastic Beam Search (SBS) leverages the Gumbel-Top-k trick to perform exact sampling of k distinct sequences without replacement from an underlying factorized sequence model. The method applies Gumbel noise perturbations to sequence log-probabilities, propagates max-perturbations top-down in the decoding tree, and uses consistent truncations to maintain the correct marginal distributions for all partial hypotheses. The resulting algorithm grows linearly in both the number of samples and sequence length (O(kT)), in contrast to the exponential cost of naive full-tree sampling (Kool et al., 2019).
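The underlying Gumbel-Top-k trick is easiest to see on a flat categorical distribution: perturbing each log-probability with i.i.d. Gumbel noise and taking the k largest perturbed scores draws k distinct items from the softmax distribution without replacement. SBS propagates the same trick top-down through the decoding tree; the sketch below shows only the flat case:

```python
import numpy as np

def gumbel_top_k(logprobs, k, rng):
    """Gumbel-Top-k sketch: sample k distinct indices without replacement
    from the categorical distribution softmax(logprobs)."""
    g = rng.gumbel(size=len(logprobs))     # i.i.d. standard Gumbel noise
    return np.argsort(logprobs + g)[::-1][:k]
```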

SBS forms a strict probabilistic analog to deterministic beam search: with fixed Gumbel noise, it degenerates to beam search on perturbed scores; averaged over perturbations, it yields unbiased, duplicate-free samples from the target distribution. Empirically, SBS dominates both deterministic beam search (in diversity) and naive sampling (in mean/max quality), and provides low-variance estimators for expected BLEU and entropy using importance-weighting derived from priority sampling (Kool et al., 2019).

4. Practical Applications and Empirical Findings

Constrained beam search and sampling underpin state-of-the-art systems in several domains:

  • Neural Machine Translation: Input-augmented and grid/DBA-based constrained decoding (e.g., LeCA+LCD) achieves high BLEU and perfect constraint coverage (Sent% = 100) while reducing inference cost via smaller beam sizes. Simultaneous soft (data-driven) and hard constraint incorporation enables both fluency and content control (Chousa et al., 2021).
  • Image Captioning: FSM-based constrained beam search integrates taggers at inference time, guaranteeing inclusion of arbitrary words (even OOV, using fixed pretrained embeddings), resulting in state-of-the-art out-of-domain performance (Anderson et al., 2016).
  • Privacy Risk Estimation: Decoding-constrained beam search yields tight, deterministic lower bounds on near-verbatim extraction risk by enumerating high-probability candidate continuations within a specified Hamming or Levenshtein radius, dramatically outperforming Monte Carlo estimators in efficiency and coverage (Cooper et al., 26 Mar 2026).
  • Combinatorial Optimization: RLCBS enables RL-based planners to satisfy flexible constraint sets at inference time, outperforming evolutionary baselines under complex design constraints and yielding 2–7× speedups via efficient beam and FSM management (Chen et al., 21 Jan 2025).

| Method | Constraint Type | Guarantee |
| --- | --- | --- |
| Grid Beam Search | Token/phrase (hard) | Enforced (deterministic) |
| FSM-constrained Beam | Arbitrary FSM (hard) | Enforced (deterministic) |
| Stochastic Beam Search | None (diversity) | Sampling w/o replacement |
| Directed Beam Search | Lexical (soft/ordered) | Empirical (no guarantee) |
| RLCBS | Combinatorial (mixed) | Enforced if modeled |

5. Deterministic and Probabilistic Hybridization

Recent work has hybridized beam search with sampling-based approaches to exploit both diversity and deterministic quality. Deterministic nucleus (top-p) search methods such as p-exact and dynamic beam search restrict candidate expansions to tokens covering a fixed probability mass p, adapting beam size based on entropy or the local token distribution. While these modifications offer marginal improvements in coverage and quality, empirical results find no significant advantage over classical beam search on standard NMT and summarization metrics (Shaham et al., 2021).
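The candidate-restriction step these methods share can be sketched as follows: keep the smallest set of tokens whose cumulative probability reaches p, so the effective expansion width adapts to the local distribution (a peaked distribution yields few candidates, a flat one yields many):

```python
import numpy as np

def top_p_candidates(probs, p=0.9):
    """Nucleus-style candidate restriction sketch for beam expansion."""
    order = np.argsort(probs)[::-1]        # tokens by descending probability
    cum = np.cumsum(probs[order])
    cutoff = np.searchsorted(cum, p) + 1   # smallest prefix with mass >= p
    return order[:cutoff]
```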

Decoding-constrained methods for extraction risk estimation apply strict path-pruning criteria (e.g., maximum allowable edit distance from a target) within standard beam search to yield deterministic lower bounds on risk, substantially enhancing sensitivity to rare or near-verbatim outputs (Cooper et al., 26 Mar 2026).
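For a Hamming-distance constraint, the path-pruning criterion is simple: any prefix whose distance from the corresponding target prefix already exceeds the radius can never recover, so it is discarded. The sketch below (interface hypothetical, and a simplification of the method of Cooper et al.) sums the surviving probability mass as a deterministic lower bound:

```python
import math

def hamming_pruned_beam(step_logprobs, target, radius, beam_size):
    """Decoding-constrained beam search sketch: enumerate high-probability
    continuations within a Hamming radius of `target` and return their
    total probability mass (a lower bound on near-verbatim risk)."""
    beams = [((), 0.0)]
    for _ in range(len(target)):
        candidates = []
        for prefix, score in beams:
            for tok, lp in step_logprobs(prefix).items():
                new = prefix + (tok,)
                d = sum(a != b for a, b in zip(new, target))
                if d <= radius:            # viability check: still inside radius
                    candidates.append((new, score + lp))
        beams = sorted(candidates, key=lambda h: h[1], reverse=True)[:beam_size]
    return sum(math.exp(s) for _, s in beams)
```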

6. Limitations, Theoretical Insights, and Future Directions

Constrained auto-regressive decoding on structured search spaces (e.g., generative retrieval) reveals fundamental limitations. When constraints are enforced via step-wise marginal renormalization, KL divergence gaps emerge between the model’s constrained marginals and the true joint under all constraints. Classical beam search, which selects hypotheses by marginal score at each step, can fail to maximize set-level or recall-optimized metrics, especially under sparse relevance distributions. Alternative aggregation and amplification strategies (e.g., max-heap training or document clustering) can reduce these mismatches, but are nontrivial to realize at scale (Wu et al., 14 Apr 2025).
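The gap between step-wise renormalization and true conditioning can be seen on a toy example. Below, a hypothetical joint over two-token sequences is constrained to an allowed set; step-wise renormalization of the next-token marginals yields a distribution that differs (in KL) from exactly conditioning the joint on the constraint:

```python
import math

vocab = ("a", "b")
# Toy joint over two-token sequences (numbers are illustrative)
joint = {("a","a"): 0.4, ("a","b"): 0.1, ("b","a"): 0.1, ("b","b"): 0.4}
allowed = {("a","a"), ("b","a"), ("b","b")}   # hypothetical constraint set

# Exact conditioning: renormalize the joint over the allowed set
z = sum(joint[s] for s in allowed)
true_cond = {s: joint[s] / z for s in allowed}

def stepwise(seq):
    """Step-wise constrained decoding: at each position, renormalize the
    next-token marginal over tokens with some allowed completion."""
    prob, prefix = 1.0, ()
    for tok in seq:
        viable = [t for t in vocab
                  if any(a[:len(prefix) + 1] == prefix + (t,) for a in allowed)]
        marg = {t: sum(p for s, p in joint.items()
                       if s[:len(prefix) + 1] == prefix + (t,)) for t in viable}
        prob *= marg[tok] / sum(marg.values())
        prefix += (tok,)
    return prob

# A strictly positive KL divergence demonstrates the mismatch
kl = sum(stepwise(s) * math.log(stepwise(s) / true_cond[s]) for s in allowed)
```

Intuitively, the first-step marginal still credits mass flowing through continuations that the constraint later removes, which is exactly the mismatch the analysis above identifies.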

Proper tuning of search strategies, constraint management, and post-hoc calibration (e.g., recalibrated marginals, lookahead, or multi-branch expansion) is recommended to improve both recall and diversity in strictly constrained settings.

A plausible implication is that for scenarios requiring precise statistical guarantees under hard constraints, deterministic or explicit FSM-based beam management remains essential, while stochastic or plug-and-play approaches offer tractable, high-quality alternatives for soft constraints and lightweight control (Kool et al., 2019, Pascual et al., 2020).

7. Comparative Evaluation and Recommendations

Empirical analyses consistently demonstrate the superiority of constraint-aware decoding over naive or post-hoc modification approaches. Notable findings include:

  • Grid/FSM beam search yields large BLEU gains for interactive MT and domain adaptation, as well as perfect constraint coverage (Hokamp et al., 2017, Chousa et al., 2021).
  • Stochastic beam search (Gumbel-Top-k) enables sampling without replacement, combining the statistical fidelity of MC methods with the set-level diversity and quality of deterministic search (Kool et al., 2019).
  • RL-constrained beam search enforces complex design constraints robustly and tractably, with demonstrated speedup and feasibility benefits in process optimization (Chen et al., 21 Jan 2025).
  • Efficient deterministic search for privacy risk estimation uncovers significantly greater risk masses than verbatim-only techniques (Cooper et al., 26 Mar 2026).
  • Nucleus search methods provide deterministic, entropy-adaptive heuristics yet do not fundamentally outperform standard beam search on established NLG benchmarks (Shaham et al., 2021).

Use of grid or FSM-based constrained beam search is recommended when hard satisfaction of arbitrary constraints is required. Stochastic beam search or adaptive pruning (nucleus search) is preferred to balance diversity with fidelity. In highly structured or combinatorial domains, constraint-aware beam partitioning, dynamic beam allocation, and dual-proposal expansion (top-k with constraint-specific candidates) maximize both feasibility and quality.


References:

  • Stochastic Beams and Where to Find Them: The Gumbel-Top-k Trick for Sampling Sequences Without Replacement (Kool et al., 2019)
  • Lexically Constrained Decoding for Sequence Generation Using Grid Beam Search (Hokamp et al., 2017)
  • Guided Open Vocabulary Image Captioning with Constrained Beam Search (Anderson et al., 2016)
  • Input Augmentation Improves Constrained Beam Search for Neural Machine Translation: NTT at WAT 2021 (Chousa et al., 2021)
  • Constrained Auto-Regressive Decoding Constrains Generative Retrieval (Wu et al., 14 Apr 2025)
  • Estimating near-verbatim extraction risk in LLMs with decoding-constrained beam search (Cooper et al., 26 Mar 2026)
  • Directed Beam Search: Plug-and-Play Lexically Constrained Language Generation (Pascual et al., 2020)
  • What Do You Get When You Cross Beam Search with Nucleus Sampling? (Shaham et al., 2021)
  • Reinforcement Learning Constrained Beam Search for Parameter Optimization of Paper Drying Under Flexible Constraints (Chen et al., 21 Jan 2025)
