
Grid Beam Search for Constrained Decoding

Updated 6 April 2026
  • Grid Beam Search is an algorithm that restructures beam search into a grid to guarantee the inclusion of specified lexical constraints.
  • It organizes hypotheses by output token position and constraint token coverage, ensuring precise control over constraint placement.
  • Empirical results show that GBS improves translation quality in domain adaptation and interactive NMT without modifying model parameters.

Grid Beam Search (GBS) is an extension of standard beam search for sequence generation that guarantees the inclusion of pre-specified lexical constraints (sequences of tokens that must appear verbatim in the output). GBS forms a two-dimensional search space, or "grid", organizing partial output hypotheses according to both the generated-token position and the number of constraint tokens covered. GBS was introduced by Hokamp and Liu (Hokamp et al., 2017), motivated by the challenge of enforcing terminology coverage in tasks like neural machine translation (NMT) without retraining or modifying model parameters.

1. Problem Formulation and Motivation

Consider a sequence-generation model such as an attentional encoder–decoder that defines the output probability as

p_\theta(y \mid x) = \prod_{t=0}^{T} p_\theta(y_t \mid x; y_{<t}),

where y is the generated sequence, x the input, and y_{<t} the prefix up to position t-1. For lexically constrained decoding, a set of n constraints C = \{c_0, \ldots, c_{n-1}\} is specified, where each c_i is a contiguous phrase (token sequence) to be covered exactly once in the hypothesis.
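
In code, constraints are simply token sequences that must appear contiguously in the output, and their total token count determines the size of the search grid. A minimal sketch, using the phrases from the worked example below (the whitespace tokenization is illustrative):

```python
# Each constraint is a contiguous token sequence that must appear verbatim
# in the output; tokenization here is illustrative whitespace splitting.
constraints = [phrase.split() for phrase in ["black box", "failure"]]

# The total number of constraint tokens sets the grid's second dimension.
n_constraint_tokens = sum(len(c) for c in constraints)  # 2 + 1 = 3
```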

Traditional beam search lacks a mechanism to enforce arbitrary multi-token or phrase-level constraints. GBS instead ensures that all constraints appear as intended in any valid output. This is crucial in machine translation pipelines for accurate terminology injection—particularly in specialized or rapidly-evolving domains, where correct translation of technical terms is imperative (Hokamp et al., 2017, Odermatt et al., 2023).

2. Algorithmic Framework

GBS structures the search space as a (T_{\max}+1) \times (N+1) grid, where T_{\max} is the maximum output length and N = \sum_i |c_i| is the total number of constraint tokens across all n constraints (with |c_i| the length of c_i). Each cell Grid[t][c] maintains up to k hypotheses that have produced t tokens and covered exactly c constraint tokens.

Hypotheses are classified as:

  • Open: not in the midst of outputting a multi-token constraint; may generate freely or start a new constraint.
  • Closed: currently outputting a constraint; must continue it until completion.
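
The open/closed bookkeeping can be captured in a small hypothesis record (a sketch; the field names are illustrative, not taken from the paper):

```python
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass
class Hypothesis:
    tokens: Tuple[str, ...] = ()      # generated prefix so far
    score: float = 0.0                # cumulative log-probability
    coverage: Tuple[bool, ...] = ()   # which constraints are fully covered
    active: Optional[int] = None      # constraint being emitted, None if open
    position: int = 0                 # next token index in the active constraint

    @property
    def is_open(self) -> bool:
        # Open hypotheses may generate freely or start a new constraint;
        # closed ones must continue their active constraint.
        return self.active is None
```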

Transitions into Grid[t][c] at each time step t are:

  • Generate: From Grid[t-1][c], open hypotheses consider free generation of any token from the model's vocabulary.
  • Start Constraint: From Grid[t-1][c-1], open hypotheses optionally start a yet-unused constraint c_i by emitting its initial token, entering the closed state.
  • Continue Constraint: From Grid[t-1][c-1], closed hypotheses continue emitting tokens of the currently active constraint until it is complete.

After generating candidates for each transition type, only the top k scoring hypotheses at each cell Grid[t][c] are retained (beam pruning by cumulative log-probability). At t = T_{\max}, the top-scoring hypothesis in the full-coverage beams Grid[·][N] that has emitted the EOS token is selected as output (Hokamp et al., 2017).
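
Putting the grid and the three transition types together, a compact reference sketch (a toy `score_fn` stands in for the model's per-token log-probability; all names are illustrative, not from the paper's code):

```python
def grid_beam_search(score_fn, vocab, constraints, k, t_max, eos="</s>"):
    """Sketch of Grid Beam Search. A hypothesis is a tuple
    (tokens, score, coverage, active, pos):
      coverage -- bool per constraint (fully covered or not)
      active   -- index of the constraint being emitted, or None (open)
      pos      -- next token index within the active constraint
    score_fn(prefix, token) returns a log-probability."""
    n_tokens = sum(len(c) for c in constraints)
    grid = {(0, 0): [((), 0.0, (False,) * len(constraints), None, 0)]}
    for t in range(1, t_max + 1):
        for c in range(0, min(t, n_tokens) + 1):
            cands = []
            # Generate: open hypotheses in Grid[t-1][c] emit any vocab token.
            for toks, s, cov, act, pos in grid.get((t - 1, c), []):
                if act is None:
                    cands += [(toks + (w,), s + score_fn(toks, w), cov, None, 0)
                              for w in vocab]
            for toks, s, cov, act, pos in grid.get((t - 1, c - 1), []):
                if act is None:
                    # Start Constraint: first token of any unused constraint.
                    for i, con in enumerate(constraints):
                        if not cov[i]:
                            done = len(con) == 1
                            new_cov = cov[:i] + (True,) + cov[i + 1:] if done else cov
                            cands.append((toks + (con[0],), s + score_fn(toks, con[0]),
                                          new_cov, None if done else i, 1))
                else:
                    # Continue Constraint: closed hypotheses must emit the next token.
                    con, w = constraints[act], constraints[act][pos]
                    done = pos + 1 == len(con)
                    new_cov = cov[:act] + (True,) + cov[act + 1:] if done else cov
                    cands.append((toks + (w,), s + score_fn(toks, w),
                                  new_cov, None if done else act, pos + 1))
            if cands:  # beam pruning: keep the top k by cumulative log-probability
                grid[(t, c)] = sorted(cands, key=lambda h: h[1], reverse=True)[:k]
    # Output: best full-coverage hypothesis that has emitted EOS.
    finished = [h for t in range(1, t_max + 1)
                for h in grid.get((t, n_tokens), []) if h[0][-1] == eos]
    return max(finished, key=lambda h: h[1]) if finished else None
```

With a uniform toy `score_fn` (every token equally likely), the returned hypothesis is simply the shortest output that covers every constraint and ends with EOS, since each extra token lowers the cumulative log-probability.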

3. Mathematical Details and Complexity

Each hypothesis h in Grid[t][c] encapsulates:

  • A generated prefix y_{<t}
  • A pointer into the active constraint (the position within it, or null if the hypothesis is open)
  • A constraint coverage vector (indicating which constraints are complete)
  • A cumulative score \log p_\theta(y_{<t} \mid x)

The score of a successor hypothesis h' after a transition is

score(h') = score(h) + \log p_\theta(y_t \mid x; y_{<t}),

where h is a predecessor from the relevant cell depending on the transition type (Grid[t-1][c] for Generate, Grid[t-1][c-1] for Start or Continue Constraint).

Standard beam search has complexity O(k \cdot T_{\max}). GBS increases this to O(k \cdot T_{\max} \cdot N) in the worst case, due to maintaining additional beams per level of constraint coverage. However, many cells remain empty in practice, and significant parallelization across the constraint-coverage dimension is possible. Empirical evidence confirms practicality for moderate beam/constraint sizes (Hokamp et al., 2017).
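
The extra factor can be made concrete with a back-of-the-envelope count of scored candidates (a rough upper bound: it treats every cell as scoring the full vocabulary, which overstates the constraint transitions):

```python
def candidate_evaluations(k, t_max, n_constraint_tokens, vocab_size):
    """Rough worst-case count of scored candidates.
    Standard beam search scores about k*|V| candidates at each of t_max
    steps; GBS repeats this for each of the N+1 coverage levels."""
    standard = t_max * k * vocab_size
    gbs = t_max * (n_constraint_tokens + 1) * k * vocab_size
    return standard, gbs

std, gbs = candidate_evaluations(k=5, t_max=50, n_constraint_tokens=4,
                                 vocab_size=30000)
```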

4. Applications and Empirical Results

Neural Interactive-Predictive Translation: In interactive scenarios, human correction is modeled by successively adding missing constraints and re-decoding with GBS. Each additional three-token constraint yields 4–9 BLEU improvement per iteration; four corrections surpass 20 BLEU total gain on WMT EN→DE/FR/PT benchmarks.

Domain Adaptation via Terminology Injection: Source–target domain-specific phrase pairs are extracted (e.g., by high PMI). For test sentences triggering constraints, GBS improves BLEU by +1.8 (EN→DE), +2.6 (EN→FR), and +13.7 (EN→PT) compared to a strong general-domain baseline, without retraining. Ablations show proper constraint placement by GBS is essential for these gains (Hokamp et al., 2017).

Plug-and-Play Extensions: Cascaded Beam Search (Odermatt et al., 2023) integrates GBS with logit-boosting for constraint tokens and demonstrates competitive performance on terminology-forcing tasks, rivaling systems with heavily customized models.

5. Worked Example

Suppose C = {"black box", "failure"} for translation of “the system suffered a failure in the black box”. The grid tracks progress both in the number of output tokens and the number of constraint tokens covered. Emitting "black" via a start-constraint transition moves a hypothesis from Grid[t-1][c] to Grid[t][c+1] and closes it on the constraint "black box"; the next step continues with "box", incrementing coverage to c+2 and reopening the hypothesis. Separately, "failure" can be started and completed at any step. Remaining positions generate unconstrained output. The grid ensures all constraint tokens are incorporated exactly once, with the path alternating between constraint emission and free generation (Hokamp et al., 2017).
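
The grid path for this example can be traced explicitly: each emitted token advances the output position t by one, and each constraint token additionally advances the coverage count c by one. A toy trace (the free tokens shown are illustrative):

```python
# One possible decoding path for C = {("black", "box"), ("failure",)}.
# Each step records (t, c): output position and constraint tokens covered.
steps = [("the", False), ("system", False), ("suffered", False), ("a", False),
         ("failure", True),   # start + complete the one-token constraint
         ("in", False), ("the", False),
         ("black", True),     # start "black box" (hypothesis becomes closed)
         ("box", True)]       # continue and complete it (reopens)
path, c = [], 0
for t, (token, is_constraint) in enumerate(steps, start=1):
    c += is_constraint
    path.append((t, c, token))
# Final cell (t=9, c=3): all three constraint tokens covered exactly once.
```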

6. Relation to Other Beam Search Extensions

GBS generalizes standard beam search by adding constraint-coverage as a secondary axis, yielding guaranteed constraint satisfaction for multi-token or phrase constraints. By contrast, classical beam search tracks only hypothesis score at each time step and cannot ensure that any constraints appear.

Variants like Cascaded Beam Search (Odermatt et al., 2023) incorporate logit manipulation to bias the model towards constraint tokens, optionally relaxing tokenization requirements (e.g., via character-prefix matching) and enabling more flexible integration with LLMs. Disjunctive constraints and advanced filtering (e.g., ordering, minimum separation) can be integrated atop GBS with minimal algorithmic changes.

Method                                         Constraint Guarantee     Training Modification   Complexity Increase (vs. Beam)
Standard Beam Search                           None                     No                      Baseline
Grid Beam Search (Hokamp et al., 2017)         Hard                     No                      Linear in constraint tokens
Cascaded Beam Search (Odermatt et al., 2023)   Hard (with extensions)   No                      Linear in constraints

7. Advantages, Limitations, and Practical Considerations

Advantages:

  • Guarantees satisfaction of arbitrary lexical or phrase constraints, given sufficient beam width and reachable search space.
  • Does not require training or parameter modification; operates as a generic decoding procedure for any autoregressive model.
  • Flexible for interactive, domain adaptation, and plug-and-play applications.

Limitations:

  • Linear increase in runtime and memory with the number of constraints/constraint tokens.
  • Decoding latency grows by approximately a factor of N (the total number of constraint tokens) in the worst case, though parallelization mitigates the impact.
  • Hypotheses require augmented state (coverage vector, open/closed status), increasing computational overhead.
  • Constraints must align exactly with model tokenization unless extended approaches (e.g., character-based matching) are employed (Odermatt et al., 2023).
  • Overlapping or discontinuous constraints require custom handling for correctness.

This suggests GBS is most practical when the number of constraints is moderate and precise placement of required terminology is essential, as in technical NMT or post-editing pipelines.

References

  • Hokamp, C., & Liu, Q. (2017). "Lexically Constrained Decoding for Sequence Generation Using Grid Beam Search." Proceedings of ACL 2017.
  • Odermatt, F., Egressy, B., & Wattenhofer, R. (2023). "Cascaded Beam Search: Plug-and-Play Terminology-Forcing For Neural Machine Translation."
