Grid Beam Search for Constrained Decoding
- Grid Beam Search is an algorithm that restructures beam search into a grid to guarantee the inclusion of specified lexical constraints.
- It organizes hypotheses by output token position and constraint token coverage, ensuring precise control over constraint placement.
- Empirical results show that GBS improves translation quality in domain adaptation and interactive NMT without modifying model parameters.
Grid Beam Search (GBS) is an extension of standard beam search for sequence generation that guarantees the inclusion of pre-specified lexical constraints—sequences of tokens that must appear verbatim in the output. GBS forms a two-dimensional search space, or "grid", organizing partial output hypotheses according to both the generated-token position and the number of constraint tokens covered. Introduced by Hokamp and Liu (Hokamp et al., 2017), GBS was motivated by the challenge of enforcing terminology coverage in tasks like neural machine translation (NMT) without retraining or modifying model parameters.
1. Problem Formulation and Motivation
Consider a sequence-generation model, such as an attentional encoder–decoder, that defines the output probability as

$$p(y \mid x) = \prod_{t=1}^{T} p(y_t \mid x, y_{<t}),$$

where $y = (y_1, \dots, y_T)$ is the generated sequence, $x$ the input, and $y_{<t}$ the prefix generated before step $t$. For lexically constrained decoding, a set of constraints $C = \{c_1, \dots, c_n\}$ is specified, where each $c_i$ is a contiguous phrase (token sequence) to be covered exactly once in the hypothesis.
Traditional beam search lacks a mechanism to enforce arbitrary multi-token or phrase-level constraints. GBS instead ensures that all constraints appear as intended in any valid output. This is crucial in machine translation pipelines for accurate terminology injection, particularly in specialized or rapidly evolving domains where correct translation of technical terms is imperative (Hokamp et al., 2017; Odermatt et al., 2023).
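For concreteness, a constraint set can be represented as plain token sequences. The sketch below is illustrative only; the tokens shown are whole words, whereas a real system would use the model's own subword vocabulary:

```python
# Lexical constraints as tuples of model tokens (illustrative word-level tokens).
constraints = [
    ("black", "box"),  # multi-token phrase: must appear contiguously
    ("failure",),      # single-token constraint
]
total_constraint_tokens = sum(len(c) for c in constraints)  # C_tot = 3
```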
2. Algorithmic Framework
GBS structures the search space as a $(T_{\max}+1) \times (C_{\mathrm{tot}}+1)$ grid, where $T_{\max}$ is the maximum output length and $C_{\mathrm{tot}} = \sum_{i=1}^{n} |c_i|$ is the total number of constraint tokens across all $n$ constraints (with $|c_i|$ the length of $c_i$). Each cell $\mathrm{Grid}[t][c]$ maintains up to $k$ (the beam size) hypotheses that have produced $t$ tokens and covered exactly $c$ constraint tokens.
Hypotheses are classified as:
- Open: not in the midst of outputting a multi-token constraint; may generate freely or start a new constraint.
- Closed: currently outputting a constraint; must continue it until completion.
Transitions into $\mathrm{Grid}[t][c]$ at each time step are:
- Generate: From $\mathrm{Grid}[t-1][c]$, open hypotheses generate freely from the vocabulary $V$.
- Start Constraint: From $\mathrm{Grid}[t-1][c-1]$, open hypotheses optionally start a yet-unused constraint $c_i$ by emitting its first token $c_{i,1}$, entering the closed state.
- Continue Constraint: From $\mathrm{Grid}[t-1][c-1]$, closed hypotheses continue emitting tokens of the currently active constraint $c_i$.
After generating candidates for each transition type, only the top $k$ scoring hypotheses in each cell $\mathrm{Grid}[t][c]$ are retained (beam pruning by cumulative log-probability). At termination, the top-scoring complete hypothesis in the full-coverage row $\mathrm{Grid}[\cdot][C_{\mathrm{tot}}]$ that ends with the EOS token is selected as output (Hokamp et al., 2017).
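The grid loop can be made concrete with a short sketch. The following is a minimal, self-contained Python illustration, not the authors' reference implementation: the `Hypothesis` fields, the `successors` and `grid_beam_search` helpers, and the `log_prob` stub (standing in for a real model's $\log p(y_t \mid x, y_{<t})$) are all assumptions made for demonstration.

```python
# Minimal sketch of Grid Beam Search. The log_prob() stub stands in for a
# real model's log p(y_t | x, y_<t); all names here are illustrative.
from dataclasses import dataclass

EOS = "<eos>"

@dataclass(frozen=True)
class Hypothesis:
    tokens: tuple    # generated prefix y_1..y_t
    score: float     # cumulative log-probability s(h)
    coverage: tuple  # constraint tokens emitted so far, per constraint
    active: int      # index of the constraint being emitted, or -1 if open

def log_prob(prefix, token):
    # Stub scorer; mildly prefers EOS once the prefix is long enough so
    # that toy runs terminate. A real decoder queries the NMT model here.
    if token == EOS:
        return -0.5 if len(prefix) >= 4 else -5.0
    return -1.0 - 0.01 * len(token)

def successors(hyp, constraints, vocab):
    """Enumerate generate / start-constraint / continue-constraint moves."""
    if hyp.tokens and hyp.tokens[-1] == EOS:
        return []  # finished hypotheses are not extended further
    moves = []
    def extend(token, coverage, active):
        moves.append(Hypothesis(hyp.tokens + (token,),
                                hyp.score + log_prob(hyp.tokens, token),
                                coverage, active))
    if hyp.active >= 0:  # closed: must continue the active constraint
        i = hyp.active
        token = constraints[i][hyp.coverage[i]]
        cov = list(hyp.coverage); cov[i] += 1
        extend(token, tuple(cov), -1 if cov[i] == len(constraints[i]) else i)
    else:                # open: generate freely, or start an unused constraint
        for token in vocab:
            extend(token, hyp.coverage, -1)
        for i, c in enumerate(constraints):
            if hyp.coverage[i] == 0:
                cov = list(hyp.coverage); cov[i] = 1
                extend(c[0], tuple(cov), -1 if len(c) == 1 else i)
    return moves

def grid_beam_search(constraints, vocab, k=4, max_len=12):
    n_tok = sum(len(c) for c in constraints)
    grid = {(0, 0): [Hypothesis((), 0.0, (0,) * len(constraints), -1)]}
    for t in range(1, max_len + 1):
        buckets = {c: [] for c in range(n_tok + 1)}
        for c in range(n_tok + 1):
            for hyp in grid.get((t - 1, c), []):
                for succ in successors(hyp, constraints, vocab):
                    buckets[sum(succ.coverage)].append(succ)
        for c, cands in buckets.items():  # beam pruning per grid cell
            grid[(t, c)] = sorted(cands, key=lambda h: h.score, reverse=True)[:k]
    # Best hypothesis that covered all constraint tokens and ended with EOS.
    done = [h for t in range(1, max_len + 1) for h in grid[(t, n_tok)]
            if h.tokens and h.tokens[-1] == EOS]
    return max(done, key=lambda h: h.score, default=None)
```

Each time step expands every nonempty cell of the previous column and routes successors to the cell matching their new constraint-token coverage, mirroring the three transitions above.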
3. Mathematical Details and Complexity
Each hypothesis $h$ in $\mathrm{Grid}[t][c]$ encapsulates:
- A generated prefix $y_{1:t}$
- A pointer into the active constraint (the position within the current $c_i$, or $\varnothing$ if open)
- A constraint coverage vector (indicating completion status)
- A cumulative score $s(h)$

The score of the successor hypothesis $h'$ obtained by emitting token $y_t$ is:

$$s(h') = s(h) + \log p(y_t \mid x, y_{<t}),$$

where $h$ is a predecessor from the relevant cell depending on the transition type.
Standard beam search has complexity $O(k T_{\max})$ in the number of hypothesis expansions. GBS increases this to $O(k T_{\max} C_{\mathrm{tot}})$ in the worst case, due to maintaining an additional beam per level of constraint coverage. However, many cells remain empty in practice, and significant parallelization across the constraint-coverage dimension is possible. Empirical evidence confirms practicality for moderate beam and constraint sizes (Hokamp et al., 2017).
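A back-of-the-envelope illustration of this overhead, with assumed sizes for the beam, output length, and constraints:

```python
# Assumed sizes: beam k=10, max output length T=50, C_tot=6 constraint tokens.
k, T, C_tot = 10, 50, 6
beam_slots = k * T                # standard beam search: 500 hypothesis slots
gbs_slots = k * T * (C_tot + 1)   # GBS worst case: 3,500 slots, a 7-fold increase
print(beam_slots, gbs_slots)
```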
4. Applications and Empirical Results
Neural Interactive-Predictive Translation: In interactive scenarios, human correction is modeled by successively adding missing constraints and re-decoding with GBS. Each additional three-token constraint yields a 4–9 BLEU improvement per iteration; four corrections yield more than 20 BLEU of total gain on WMT EN→DE/FR/PT benchmarks (Hokamp et al., 2017).
Domain Adaptation via Terminology Injection: Source–target domain-specific phrase pairs are extracted (e.g., by high PMI). For test sentences triggering constraints, GBS improves BLEU by +1.8 (EN→DE), +2.6 (EN→FR), and +13.7 (EN→PT) compared to a strong general-domain baseline, without retraining. Ablations show proper constraint placement by GBS is essential for these gains (Hokamp et al., 2017).
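As a hedged sketch of the extraction step, pointwise mutual information over phrase co-occurrence counts can be computed as follows; the counts, corpus size, and threshold are placeholders, not values from the paper:

```python
import math

def pmi(joint, src, tgt, n_pairs):
    """PMI of a source/target phrase pair from sentence-level co-occurrence counts."""
    return math.log((joint / n_pairs) / ((src / n_pairs) * (tgt / n_pairs)))

# Hypothetical counts from an in-domain parallel corpus of 10,000 sentence pairs.
pairs = {("black box", "Blackbox"): (40, 45, 42)}  # (joint, src, tgt) counts
terminology = {}
for pair, (j, s, t) in pairs.items():
    score = pmi(j, s, t, 10_000)
    if score > 3.0:  # selection threshold is an assumption
        terminology[pair] = score
```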
Plug-and-Play Extensions: Cascaded Beam Search (Odermatt et al., 2023) integrates GBS with logit-boosting for constraint tokens and demonstrates competitive performance on terminology-forcing tasks, rivaling systems with heavily customized models.
5. Worked Example
Suppose the constraints are $C = \{(\text{black}, \text{box}), (\text{failure})\}$ for a translation whose reference output is “the system suffered a failure in the black box” (so $C_{\mathrm{tot}} = 3$). The grid tracks progress both in the number of output tokens and the number of constraint tokens covered. Emitting "black" via a start-constraint transition moves a hypothesis from $\mathrm{Grid}[t-1][c]$ to $\mathrm{Grid}[t][c+1]$ and closes it on the constraint (black, box); the next step must continue with "box", incrementing $c$ again and reopening the hypothesis. Separately, the single-token constraint "failure" can be started and completed at any time. Remaining positions generate unconstrained output. The grid ensures all constraint tokens are incorporated exactly once, with the path alternating between constraint emission and free generation (Hokamp et al., 2017).
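Running the hypothetical `grid_beam_search` sketch from Section 2 on this example, with a toy vocabulary and the stub scorer (so the exact output is illustrative only):

```python
# Toy run of the grid_beam_search sketch above; output depends on the stub
# scorer and is for illustration only.
constraints = [("black", "box"), ("failure",)]
vocab = ["the", "system", "suffered", "a", "in", EOS]
best = grid_beam_search(constraints, vocab)
# Any returned hypothesis contains "black box" contiguously and "failure"
# exactly once, and ends with EOS.
print(best.tokens)
```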
6. Relation to Other Beam Search Extensions
GBS generalizes standard beam search by adding constraint coverage as a secondary axis, yielding guaranteed satisfaction of multi-token or phrase constraints. By contrast, classical beam search tracks only hypothesis score at each time step and cannot ensure that required constraints appear.
Variants like Cascaded Beam Search (Odermatt et al., 2023) incorporate logit manipulation to bias the model towards constraint tokens, optionally relaxing tokenization requirements (e.g., via character-prefix matching) and enabling more flexible integration with LLMs. Disjunctive constraints and advanced filtering (e.g., ordering, minimum separation) can be integrated atop GBS with minimal algorithmic changes.
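The logit-manipulation idea can be sketched as an additive bonus on constraint-token logits before the softmax; the additive form, the `boost` value, and the function name below are assumptions rather than Odermatt et al.'s exact formulation:

```python
import numpy as np

def boost_constraint_logits(logits, constraint_token_ids, boost=5.0):
    # Add a fixed bonus to the logits of tokens belonging to unmet constraints,
    # biasing (but not forcing) the model toward emitting them.
    boosted = logits.copy()
    boosted[list(constraint_token_ids)] += boost
    return boosted

# Toy usage: token ids 7 and 12 belong to an as-yet-unsatisfied constraint.
logits = np.random.randn(32).astype(np.float32)
probs = np.exp(boost_constraint_logits(logits, {7, 12}))
probs /= probs.sum()
```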
| Method | Constraint Guarantee | Training Modification | Complexity Increase (vs. Beam) |
|---|---|---|---|
| Standard Beam Search | None | No | Baseline |
| Grid Beam Search (Hokamp et al., 2017) | Hard | No | Linear in constraint tokens ($\times\,C_{\mathrm{tot}}$ in the worst case) |
| Cascaded Beam Search (Odermatt et al., 2023) | Hard (with extensions) | No | Linear in number of constraints |
7. Advantages, Limitations, and Practical Considerations
Advantages:
- Guarantees satisfaction of arbitrary lexical or phrase constraints, given sufficient beam width and a reachable search space.
- Does not require training or parameter modification; operates as a generic decoding procedure for any autoregressive model.
- Flexible for interactive, domain adaptation, and plug-and-play applications.
Limitations:
- Linear increase in runtime and memory with the number of constraints/constraint tokens.
- Decoding latency grows by up to a factor of $C_{\mathrm{tot}}$ in the worst case, though parallelization mitigates the impact.
- Hypotheses require augmented state (coverage vector, open/closed status), increasing computational overhead.
- Constraints must align exactly with the model's tokenization unless extended approaches (e.g., character-based matching) are employed (Odermatt et al., 2023); see the sketch after this list.
- Overlapping or discontinuous constraints require custom handling for correctness.
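To illustrate the tokenization requirement named above: the constraint passed to GBS must be the model's own subword sequence. The segmentation shown is assumed, and real BPE output varies by model:

```python
# Hypothetical illustration: GBS enforces exact token sequences, so a
# constraint phrase must first be segmented with the model's tokenizer.
term = "Blackbox"                        # target-side term to force
subword_constraint = ("Black@@", "box")  # assumed BPE segmentation
# If the model's tokenizer would never produce this sequence, the constraint
# cannot be matched; character-level matching relaxes this requirement.
```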
This suggests GBS is most practical when the number of constraints is moderate and precise placement of prescribed terminology is essential, such as in technical NMT or post-editing pipelines.
References
- Hokamp, C., & Liu, Q. (2017). "Lexically Constrained Decoding for Sequence Generation Using Grid Beam Search." Proceedings of ACL 2017.
- Odermatt, F., Egressy, B., & Wattenhofer, R. (2023). "Cascaded Beam Search: Plug-and-Play Terminology-Forcing For Neural Machine Translation."