Grid Beam Search for Constrained Decoding
- Grid Beam Search is an algorithm that restructures beam search into a grid to guarantee the inclusion of specified lexical constraints.
- It organizes hypotheses by output token position and constraint token coverage, ensuring precise control over constraint placement.
- Empirical results show that GBS improves translation quality in domain adaptation and interactive NMT without modifying model parameters.
Grid Beam Search (GBS) is an extension of standard beam search for sequence generation that guarantees the inclusion of pre-specified lexical constraints—sequences of tokens that must appear verbatim in the output. GBS forms a two-dimensional search space, or "grid", organizing partial output hypotheses according to both the generated-token position and the number of constraint tokens covered. Introduced by Hokamp and Liu (Hokamp et al., 2017), GBS was motivated by the challenge of enforcing terminology coverage in tasks like neural machine translation (NMT) without retraining or modifying model parameters.
1. Problem Formulation and Motivation
Consider a sequence-generation model, such as an attentional encoder–decoder, that defines the output probability as

$$p(y \mid x) = \prod_{t=1}^{T} p(y_t \mid x, y_{<t}),$$

where $y = (y_1, \dots, y_T)$ is the generated sequence, $x$ the input, and $y_{<t}$ the prefix generated before step $t$. For lexically constrained decoding, a set of constraints $C = \{c_1, \dots, c_n\}$ is specified, where each $c_i$ is a contiguous phrase (token sequence) to be covered exactly once in the hypothesis.
Traditional beam search lacks a mechanism to enforce arbitrary multi-token or phrase-level constraints. GBS instead ensures that all constraints appear as intended in any valid output. This is crucial in machine translation pipelines for accurate terminology injection, particularly in specialized or rapidly evolving domains where correct translation of technical terms is imperative (Hokamp et al., 2017; Odermatt et al., 2023).
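For concreteness, a constraint set can be represented as plain token sequences. The sketch below is illustrative only; the tokens shown are whole words, whereas a real system would use the model's own subword vocabulary:

```python
# Lexical constraints as tuples of model tokens (illustrative word-level tokens).
constraints = [
    ("black", "box"),  # multi-token phrase: must appear contiguously
    ("failure",),      # single-token constraint
]
total_constraint_tokens = sum(len(c) for c in constraints)  # C_tot = 3
```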
2. Algorithmic Framework
GBS structures the search space as a $(T_{\max}+1) \times (C_{\mathrm{tot}}+1)$ grid, where $T_{\max}$ is the maximum output length and $C_{\mathrm{tot}} = \sum_{i=1}^{n} |c_i|$ is the total number of constraint tokens across all $n$ constraints (with $|c_i|$ the length of $c_i$). Each cell $\mathrm{Grid}[t][c]$ maintains up to $k$ (the beam size) hypotheses that have produced $t$ tokens and covered exactly $c$ constraint tokens.
Hypotheses are classified as:
- Open: not in the midst of outputting a multi-token constraint; may generate freely or start a new constraint.
- Closed: currently outputting a constraint; must continue it until completion.
Transitions into $\mathrm{Grid}[t][c]$ at each time step are:
- Generate: From $\mathrm{Grid}[t-1][c]$, open hypotheses generate freely from the vocabulary $V$.
- Start Constraint: From $\mathrm{Grid}[t-1][c-1]$, open hypotheses optionally start a yet-unused constraint $c_i$ by emitting its first token $c_{i,1}$, entering the closed state.
- Continue Constraint: From $\mathrm{Grid}[t-1][c-1]$, closed hypotheses continue emitting tokens of the currently active constraint $c_i$.
After generating candidates for each transition type, only the top $k$ scoring hypotheses in each cell $\mathrm{Grid}[t][c]$ are retained (beam pruning by cumulative log-probability). At termination, the top-scoring complete hypothesis in the full-coverage row $\mathrm{Grid}[\cdot][C_{\mathrm{tot}}]$ that ends with the EOS token is selected as output (Hokamp et al., 2017).
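The grid loop can be made concrete with a short sketch. The following is a minimal, self-contained Python illustration, not the authors' reference implementation: the `Hypothesis` fields, the `successors` and `grid_beam_search` helpers, and the `log_prob` stub (standing in for a real model's $\log p(y_t \mid x, y_{<t})$) are all assumptions made for demonstration.

```python
# Minimal sketch of Grid Beam Search. The log_prob() stub stands in for a
# real model's log p(y_t | x, y_<t); all names here are illustrative.
from dataclasses import dataclass

EOS = "<eos>"

@dataclass(frozen=True)
class Hypothesis:
    tokens: tuple    # generated prefix y_1..y_t
    score: float     # cumulative log-probability s(h)
    coverage: tuple  # constraint tokens emitted so far, per constraint
    active: int      # index of the constraint being emitted, or -1 if open

def log_prob(prefix, token):
    # Stub scorer; mildly prefers EOS once the prefix is long enough so
    # that toy runs terminate. A real decoder queries the NMT model here.
    if token == EOS:
        return -0.5 if len(prefix) >= 4 else -5.0
    return -1.0 - 0.01 * len(token)

def successors(hyp, constraints, vocab):
    """Enumerate generate / start-constraint / continue-constraint moves."""
    if hyp.tokens and hyp.tokens[-1] == EOS:
        return []  # finished hypotheses are not extended further
    moves = []
    def extend(token, coverage, active):
        moves.append(Hypothesis(hyp.tokens + (token,),
                                hyp.score + log_prob(hyp.tokens, token),
                                coverage, active))
    if hyp.active >= 0:  # closed: must continue the active constraint
        i = hyp.active
        token = constraints[i][hyp.coverage[i]]
        cov = list(hyp.coverage); cov[i] += 1
        extend(token, tuple(cov), -1 if cov[i] == len(constraints[i]) else i)
    else:                # open: generate freely, or start an unused constraint
        for token in vocab:
            extend(token, hyp.coverage, -1)
        for i, c in enumerate(constraints):
            if hyp.coverage[i] == 0:
                cov = list(hyp.coverage); cov[i] = 1
                extend(c[0], tuple(cov), -1 if len(c) == 1 else i)
    return moves

def grid_beam_search(constraints, vocab, k=4, max_len=12):
    n_tok = sum(len(c) for c in constraints)
    grid = {(0, 0): [Hypothesis((), 0.0, (0,) * len(constraints), -1)]}
    for t in range(1, max_len + 1):
        buckets = {c: [] for c in range(n_tok + 1)}
        for c in range(n_tok + 1):
            for hyp in grid.get((t - 1, c), []):
                for succ in successors(hyp, constraints, vocab):
                    buckets[sum(succ.coverage)].append(succ)
        for c, cands in buckets.items():  # beam pruning per grid cell
            grid[(t, c)] = sorted(cands, key=lambda h: h.score, reverse=True)[:k]
    # Best hypothesis that covered all constraint tokens and ended with EOS.
    done = [h for t in range(1, max_len + 1) for h in grid[(t, n_tok)]
            if h.tokens and h.tokens[-1] == EOS]
    return max(done, key=lambda h: h.score, default=None)
```

Each time step expands every nonempty cell of the previous column and routes successors to the cell matching their new constraint-token coverage, mirroring the three transitions above.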
3. Mathematical Details and Complexity
Each hypothesis $h$ in $\mathrm{Grid}[t][c]$ encapsulates:
- A generated prefix $y_{1:t}$
- A pointer into the active constraint (the position within the current $c_i$, or $\varnothing$ if open)
- A constraint coverage vector (indicating completion status)
- A cumulative score $s(h)$

The score of the successor hypothesis $h'$ obtained by emitting token $y_t$ is:

$$s(h') = s(h) + \log p(y_t \mid x, y_{<t}),$$

where $h$ is a predecessor from the relevant cell depending on the transition type.
Standard beam search has complexity $O(k T_{\max})$ in the number of hypothesis expansions. GBS increases this to $O(k T_{\max} C_{\mathrm{tot}})$ in the worst case, due to maintaining an additional beam per level of constraint coverage. However, many cells remain empty in practice, and significant parallelization across the constraint-coverage dimension is possible. Empirical evidence confirms practicality for moderate beam and constraint sizes (Hokamp et al., 2017).
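A back-of-the-envelope illustration of this overhead, with assumed sizes for the beam, output length, and constraints:

```python
# Assumed sizes: beam k=10, max output length T=50, C_tot=6 constraint tokens.
k, T, C_tot = 10, 50, 6
beam_slots = k * T                # standard beam search: 500 hypothesis slots
gbs_slots = k * T * (C_tot + 1)   # GBS worst case: 3,500 slots, a 7-fold increase
print(beam_slots, gbs_slots)
```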
4. Applications and Empirical Results
Neural Interactive-Predictive Translation: In interactive scenarios, human correction is modeled by successively adding missing constraints and re-decoding with GBS. Each additional three-token constraint yields a 4–9 BLEU improvement per iteration; four corrections yield more than 20 BLEU of total gain on WMT EN→DE/FR/PT benchmarks (Hokamp et al., 2017).
Domain Adaptation via Terminology Injection: Source–target domain-specific phrase pairs are extracted (e.g., by high PMI). For test sentences triggering constraints, GBS improves BLEU by +1.8 (EN→DE), +2.6 (EN→FR), and +13.7 (EN→PT) compared to a strong general-domain baseline, without retraining. Ablations show proper constraint placement by GBS is essential for these gains (Hokamp et al., 2017).
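As a hedged sketch of the extraction step, pointwise mutual information over phrase co-occurrence counts can be computed as follows; the counts, corpus size, and threshold are placeholders, not values from the paper:

```python
import math

def pmi(joint, src, tgt, n_pairs):
    """PMI of a source/target phrase pair from sentence-level co-occurrence counts."""
    return math.log((joint / n_pairs) / ((src / n_pairs) * (tgt / n_pairs)))

# Hypothetical counts from an in-domain parallel corpus of 10,000 sentence pairs.
pairs = {("black box", "Blackbox"): (40, 45, 42)}  # (joint, src, tgt) counts
terminology = {}
for pair, (j, s, t) in pairs.items():
    score = pmi(j, s, t, 10_000)
    if score > 3.0:  # selection threshold is an assumption
        terminology[pair] = score
```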
Plug-and-Play Extensions: Cascaded Beam Search (Odermatt et al., 2023) integrates GBS with logit-boosting for constraint tokens and demonstrates competitive performance on terminology-forcing tasks, rivaling systems with heavily customized models.
5. Worked Example
Suppose the constraints are $C = \{(\text{black}, \text{box}), (\text{failure})\}$ for a translation whose reference output is “the system suffered a failure in the black box” (so $C_{\mathrm{tot}} = 3$). The grid tracks progress both in the number of output tokens and the number of constraint tokens covered. Emitting "black" via a start-constraint transition moves a hypothesis from $\mathrm{Grid}[t-1][c]$ to $\mathrm{Grid}[t][c+1]$ and closes it on the constraint (black, box); the next step must continue with "box", incrementing $c$ again and reopening the hypothesis. Separately, the single-token constraint "failure" can be started and completed at any time. Remaining positions generate unconstrained output. The grid ensures all constraint tokens are incorporated exactly once, with the path alternating between constraint emission and free generation (Hokamp et al., 2017).
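Running the hypothetical `grid_beam_search` sketch from Section 2 on this example, with a toy vocabulary and the stub scorer (so the exact output is illustrative only):

```python
# Toy run of the grid_beam_search sketch above; output depends on the stub
# scorer and is for illustration only.
constraints = [("black", "box"), ("failure",)]
vocab = ["the", "system", "suffered", "a", "in", EOS]
best = grid_beam_search(constraints, vocab)
# Any returned hypothesis contains "black box" contiguously and "failure"
# exactly once, and ends with EOS.
print(best.tokens)
```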
6. Relation to Other Beam Search Extensions
GBS generalizes standard beam search by adding constraint coverage as a secondary axis, yielding guaranteed satisfaction of multi-token or phrase constraints. By contrast, classical beam search tracks only hypothesis score at each time step and cannot ensure that required constraints appear.
Variants like Cascaded Beam Search (Odermatt et al., 2023) incorporate logit manipulation to bias the model towards constraint tokens, optionally relaxing tokenization requirements (e.g., via character-prefix matching) and enabling more flexible integration with LLMs. Disjunctive constraints and advanced filtering (e.g., ordering, minimum separation) can be integrated atop GBS with minimal algorithmic changes.
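The logit-manipulation idea can be sketched as an additive bonus on constraint-token logits before the softmax; the additive form, the `boost` value, and the function name below are assumptions rather than Odermatt et al.'s exact formulation:

```python
import numpy as np

def boost_constraint_logits(logits, constraint_token_ids, boost=5.0):
    # Add a fixed bonus to the logits of tokens belonging to unmet constraints,
    # biasing (but not forcing) the model toward emitting them.
    boosted = logits.copy()
    boosted[list(constraint_token_ids)] += boost
    return boosted

# Toy usage: token ids 7 and 12 belong to an as-yet-unsatisfied constraint.
logits = np.random.randn(32).astype(np.float32)
probs = np.exp(boost_constraint_logits(logits, {7, 12}))
probs /= probs.sum()
```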
| Method | Constraint Guarantee | Training Modification | Complexity Increase (vs. Beam) |
|---|---|---|---|
| Standard Beam Search | None | No | Baseline |
| Grid Beam Search (Hokamp et al., 2017) | Hard | No | Linear in constraint tokens ($\times\,C_{\mathrm{tot}}$ in the worst case) |
| Cascaded Beam Search (Odermatt et al., 2023) | Hard (with extensions) | No | Linear in number of constraints |
7. Advantages, Limitations, and Practical Considerations
Advantages:
- Guarantees satisfaction of arbitrary lexical or phrase constraints, given sufficient beam width and a reachable search space.
- Does not require training or parameter modification; operates as a generic decoding procedure for any autoregressive model.
- Flexible for interactive, domain adaptation, and plug-and-play applications.
Limitations:
- Linear increase in runtime and memory with the number of constraints/constraint tokens.
- Decoding latency grows by up to a factor of $C_{\mathrm{tot}}$ in the worst case, though parallelization mitigates the impact.
- Hypotheses require augmented state (coverage vector, open/closed status), increasing computational overhead.
- Constraints must align exactly with the model's tokenization unless extended approaches (e.g., character-based matching) are employed (Odermatt et al., 2023); see the sketch after this list.
- Overlapping or discontinuous constraints require custom handling for correctness.
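To illustrate the tokenization requirement named above: the constraint passed to GBS must be the model's own subword sequence. The segmentation shown is assumed, and real BPE output varies by model:

```python
# Hypothetical illustration: GBS enforces exact token sequences, so a
# constraint phrase must first be segmented with the model's tokenizer.
term = "Blackbox"                        # target-side term to force
subword_constraint = ("Black@@", "box")  # assumed BPE segmentation
# If the model's tokenizer would never produce this sequence, the constraint
# cannot be matched; character-level matching relaxes this requirement.
```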
This suggests GBS is most practical when the number of constraints is moderate and precise placement of prescribed terminology is essential, such as in technical NMT or post-editing pipelines.
References
- Hokamp, C., & Liu, Q. (2017). "Lexically Constrained Decoding for Sequence Generation Using Grid Beam Search." Proceedings of ACL 2017.
- Odermatt, F., Egressy, B., & Wattenhofer, R. (2023). "Cascaded Beam Search: Plug-and-Play Terminology-Forcing For Neural Machine Translation."