
Grid Beam Search (GBS)

Updated 6 April 2026
  • Grid Beam Search (GBS) is an extension of beam search designed to enforce user-specified lexical constraints exactly once in generated sequences.
  • It organizes the search in a two-dimensional grid that efficiently manages open and closed hypotheses without modifying model parameters.
  • Empirical results show significant translation quality gains, with BLEU improvements of up to +9.20 in interactive post-editing and +13.74 in domain adaptation.

Grid Beam Search (GBS) is an extension of the classical left-to-right beam search algorithm that enables the incorporation of arbitrary lexical constraints—user-specified words or phrases that must be present exactly once in a generated output sequence. Unlike standard beam search, which seeks to maximize model likelihood without enforcing constraint inclusion, GBS ensures that every output sequence returned satisfies all such constraints, without requiring modification of model parameters or retraining. The algorithm is formulated for general sequence generation models, making it directly applicable in multiple scenarios such as interactive neural machine translation and domain adaptation (Hokamp et al., 2017).

1. Motivation and Problem Statement

Conventional beam search aims to identify the highest-probability sequence

$$\hat{\mathbf y} = \arg\max_{\mathbf y} p_\theta(\mathbf y \mid \mathbf x) = \arg\max_{\mathbf y} \prod_{t=0}^{T} p_\theta(y_t \mid \mathbf x, y_{<t})$$

but lacks any mechanism to guarantee that specified lexical elements appear in the generated output. This becomes limiting in use-cases like interactive post-editing and domain-specific translation, where it is often crucial to force the decoder to include a set of constraints $\{\mathbf c_1, \dots, \mathbf c_M\}$ (each being a single- or multi-token word or phrase) in the output. GBS addresses this by restricting the search to the space $\mathcal Y(\mathbf c)$ of sequences containing each $\mathbf c_i$ as a contiguous subsequence exactly once.
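The membership condition defining the constrained space can be made concrete with a small Python sketch (function names are illustrative, not from the paper):

```python
def contains_exactly_once(seq, constraint):
    """Count contiguous occurrences of `constraint` in `seq`
    and require exactly one."""
    n, m = len(seq), len(constraint)
    count = sum(1 for i in range(n - m + 1) if seq[i:i + m] == constraint)
    return count == 1

def satisfies_all(seq, constraints):
    """Membership test for the constrained output space: every
    constraint phrase must appear as a contiguous subsequence
    exactly once."""
    return all(contains_exactly_once(seq, c) for c in constraints)

# "She visited Paris" satisfies both single-token constraints;
# "She saw Paris" misses "visited".
print(satisfies_all(["She", "visited", "Paris"], [["visited"], ["Paris"]]))
print(satisfies_all(["She", "saw", "Paris"], [["visited"], ["Paris"]]))
```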

2. Formalization and Search Structure

GBS frames constrained decoding as the following maximization:

$$\hat{\mathbf y} = \arg\max_{\mathbf y \in \mathcal Y(\mathbf c)} \log p_\theta(\mathbf y \mid \mathbf x) = \arg\max_{\mathbf y \in \mathcal Y(\mathbf c)} \sum_{t=0}^{T} \log p_\theta(y_t \mid \mathbf x, y_{<t})$$

where $\mathcal Y(\mathbf c)$ is the set of possible outputs meeting all lexical constraints. The search is organized in a two-dimensional grid of beams parameterized by $(t, c)$: $t$ is the output timestep, and $c$ tracks the total number of constraint tokens covered. Each cell $\texttt{Grid}[t][c]$ stores up to $k$ best hypotheses with $t$ generated tokens and $c$ covered constraint tokens.

Hypotheses are labeled as either:

  • Open: permitted to freely generate any token (via GENERATE) or to START a new, unused constraint.
  • Closed: currently in the middle of realizing a constraint and required to CONTINUE emitting its tokens.

The output space is fully traversed until a hypothesis covering all $C = \sum_{j=1}^{M} n_j$ constraint tokens (with $n_j$ the length of $\mathbf c_j$) has been generated and EOS has been emitted.
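The open/closed distinction can be captured in a small hypothesis record. The sketch below is a hypothetical representation (field names are illustrative): a hypothesis is closed exactly while it is mid-way through realizing one constraint.

```python
from dataclasses import dataclass

@dataclass
class Hypothesis:
    tokens: tuple          # generated output so far (including BOS)
    score: float           # cumulative log-probability
    coverage: tuple        # per-constraint: number of tokens consumed
    active: int = -1       # index of constraint being realized (-1 = none)

    @property
    def is_open(self):
        # Open: free to GENERATE any token or START an unused constraint.
        # Closed: mid-constraint, must CONTINUE emitting its tokens.
        return self.active == -1

    def covered(self):
        # Total constraint tokens covered: the grid column index c.
        return sum(self.coverage)
```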

3. Algorithmic Operation

The high-level GBS decoding procedure is as follows:

  • The grid is initialized such that $\texttt{Grid}[0][0]$ contains the initial (BOS) hypothesis.
  • At each timestep $t$ and constraint coverage $c$, candidates are assembled from:
    • Extending open hypotheses in $\texttt{Grid}[t-1][c]$ using GENERATE.
    • Starting new, unused constraints from open hypotheses in $\texttt{Grid}[t-1][c-1]$ with START.
    • Continuing in-progress constraints from closed hypotheses in $\texttt{Grid}[t-1][c-1]$ via CONTINUE.
  • After scoring, only the $k$-best hypotheses are retained in each grid cell.
  • Finished hypotheses in $\texttt{Grid}[t][C]$ (full constraint coverage) that produce EOS are considered; the best-scoring one is output.

The pseudocode explicitly implements this control flow, managing open/closed hypothesis status and ensuring each constraint is handled precisely once.
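This control flow can be sketched compactly in Python. The toy scoring function, variable names, and data layout below are illustrative assumptions, not the authors' implementation; a hypothesis is a tuple of (tokens, score, set of fully covered constraints, in-progress constraint state).

```python
def grid_beam_search(score_fn, vocab, constraints, k=4, max_len=6,
                     bos="<s>", eos="</s>"):
    """Minimal GBS sketch. `score_fn(prefix, token)` returns a
    log-probability; each constraint (a list of tokens) must be
    included exactly once. `act` is None for open hypotheses, or
    (constraint_index, next_position) for closed ones."""
    C = sum(len(con) for con in constraints)
    grid = {(0, 0): [((bos,), 0.0, frozenset(), None)]}
    finished = []
    for t in range(1, max_len + 1):
        for c in range(0, min(t, C) + 1):
            cands = []
            # GENERATE: extend open hypotheses from Grid[t-1][c].
            for toks, s, used, act in grid.get((t - 1, c), []):
                if act is None:
                    for w in vocab:
                        cands.append((toks + (w,), s + score_fn(toks, w),
                                      used, None))
            # START / CONTINUE: consume one constraint token
            # from Grid[t-1][c-1].
            for toks, s, used, act in grid.get((t - 1, c - 1), []):
                if act is None:          # START a new, unused constraint
                    for i, con in enumerate(constraints):
                        if i in used:
                            continue
                        done = len(con) == 1
                        cands.append((toks + (con[0],),
                                      s + score_fn(toks, con[0]),
                                      used | {i} if done else used,
                                      None if done else (i, 1)))
                else:                    # CONTINUE the active constraint
                    i, pos = act
                    w = constraints[i][pos]
                    done = pos + 1 == len(constraints[i])
                    cands.append((toks + (w,), s + score_fn(toks, w),
                                  used | {i} if done else used,
                                  None if done else (i, pos + 1)))
            cands.sort(key=lambda h: h[1], reverse=True)
            grid[(t, c)] = cands[:k]     # keep k-best per cell
            if c == C:                   # full coverage: may emit EOS
                for toks, s, used, act in grid[(t, c)]:
                    if act is None:
                        finished.append((toks + (eos,),
                                         s + score_fn(toks, eos)))
    return max(finished, key=lambda h: h[1]) if finished else None

# Toy usage: a bigram preference table stands in for a trained model.
vocab = ["She", "He", "saw"]
constraints = [["visited"], ["Paris"]]
prefs = {("<s>", "She"): 0.0, ("She", "visited"): 0.0,
         ("visited", "Paris"): 0.0}
def score_fn(prefix, tok):
    return prefs.get((prefix[-1], tok), -2.0)

best = grid_beam_search(score_fn, vocab, constraints, k=4, max_len=4)
print(best[0])   # the highest-scoring output containing both constraints
```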

4. Computational and Practical Considerations

The runtime complexity of GBS is $O(ktC)$, compared to $O(kt)$ for unconstrained beam search, where $C$ is the total number of constraint tokens. In practice, because $C$ is typically small, efficiency is maintained via parallelization across the beams in each column of the grid and conservative beam sizes.
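The overhead relative to plain beam search is easy to quantify (all numbers below are hypothetical, for illustration only):

```python
k, T, C = 10, 25, 4          # hypothetical beam size, length, constraint tokens

beam_search_cells = T * 1    # standard search: one beam per timestep
gbs_cells = T * (C + 1)      # GBS: one beam per (timestep, coverage) cell

# Each cell does O(k) hypothesis expansions in both cases, so the
# relative overhead of GBS is linear in C.
print(gbs_cells / beam_search_cells)   # 5.0
```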

Additional practical measures include:

  • Aggressive hypothesis pruning, or imposing a cap on the maximum output length.
  • Using subword vocabularies (such as BPE) for robust constraint matching, including previously unseen words.
  • Merging overlapping constraints or tokenizing all constraints with the same pre-processing as the main model for alignment.
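A possible sketch of this preprocessing step (the merge policy shown, dropping constraints fully contained in longer ones, is one hypothetical simplification of overlap handling):

```python
def is_contiguous_subseq(short, long_):
    """True if `short` occurs as a contiguous run inside `long_`."""
    m = len(short)
    return any(long_[i:i + m] == short for i in range(len(long_) - m + 1))

def preprocess_constraints(raw_constraints, tokenize):
    """Tokenize each constraint with the same tokenizer as the model's
    inputs, then drop any constraint fully contained in another."""
    toks = [tokenize(c) for c in raw_constraints]
    toks.sort(key=len, reverse=True)      # consider longest first
    kept = []
    for c in toks:
        if not any(is_contiguous_subseq(c, longer) for longer in kept):
            kept.append(c)
    return kept

# With whitespace tokenization, "York" is absorbed into "New York".
print(preprocess_constraints(["New York", "York"], str.split))
```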

5. Illustrative Example

A toy scenario constrains the output to include “Paris” and “visited” as single-token constraints. At $t=0$, only the initial hypothesis is present. At $t=1$, possible operations include:

  • GENERATE yielding “He” or “She” (remaining at coverage $c=0$),
  • START “Paris” or “visited” (moving to coverage $c=1$).

As decoding proceeds, the search can alternate between generating unconstrained tokens and starting further constraints, ensuring that all orderings (“She visited Paris”, “Paris was visited”) that include both constraints are considered. The grid structure guarantees exhaustive but efficient exploration of the constrained output space.
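The orderings reachable in this toy example can be enumerated directly; this brute-force illustration (not part of GBS itself) shows that every three-token arrangement containing both constraints is a valid candidate:

```python
from itertools import permutations

free = ["She", "was"]                 # hypothetical unconstrained tokens
constraint_tokens = ["visited", "Paris"]

# All three-token orderings drawn from free + constrained tokens
# that include both constraints.
outputs = {" ".join(p)
           for p in permutations(free + constraint_tokens, 3)
           if all(c in p for c in constraint_tokens)}
print(sorted(outputs))
```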

6. Empirical Results

Two empirical domains highlight GBS benefits:

  • Interactive Post-Editing (Pick–Revise): Simulated 4-cycle iterative translation, adding one up-to-3-word constraint per cycle, yielded progressive BLEU increases (EN→DE): 18.44 (baseline), 27.64 (+9.20), 36.66 (+9.01), 43.92 (+7.26).
  • Domain Adaptation (Terminology Injection): Domain-agnostic NMT constrained using automatically extracted terminology pairs achieved BLEU gains for Autodesk IT: EN→DE 26.17→27.99 (+1.82), EN→FR 32.45→35.05 (+2.60), EN→PT 15.41→29.15 (+13.74).

These results confirm large improvements in translation quality for both interactive and zero-shot domain adaptation scenarios, solely by imposing lexical constraints at inference (Hokamp et al., 2017).

7. Relationship to Alternative Methods

Standard beam search is incapable of guaranteeing constraint satisfaction—in experiments, BLEU remains unchanged because desired phrasings often do not occur. Prefix-based interactive translation systems can generate outputs consistent with a fixed initial constraint, but can only enforce a single prefix, not multiple or internal constraints. Phrase-based SMT Pick–Revise approaches require phrase tables and explicit alignment, unlike GBS’s token/subword approach that needs no retraining. Soft-constraint and joint attention models require additional training and architectural complexity, whereas GBS operates out-of-the-box atop any pretrained sequence model.

8. Extensions, Limitations, and Best Practices

GBS can accommodate discontinuous constraints (such as phrasal verbs with intervening tokens) by filtering valid start/continue points. Subword vocabularies enable the handling of out-of-vocabulary constraint tokens. The principal limitation is linear runtime scaling with the total number of constraint tokens $C$; efficient implementation entails capping the maximum output length, keeping $C$ small, and leveraging beam-level parallelism. Merging overlapping constraints and tokenizing constraints identically to main inputs further enhances efficiency. Early exit upon full constraint coverage and EOS generation is recommended, avoiding unnecessary grid expansion.

In summary, Grid Beam Search represents a straightforward but effective generalization of beam search that tightly integrates arbitrary lexical constraints into output sequences, with demonstrable gains across interactive and domain-adaptation use-cases, and with broad applicability to sequence generation tasks (Hokamp et al., 2017).

References

Hokamp, C. and Liu, Q. (2017). Lexically Constrained Decoding for Sequence Generation Using Grid Beam Search. In Proceedings of ACL 2017.
