
Force Sampling Beam Search (FSBS)

Updated 25 February 2026
  • FSBS is a hybrid decoding algorithm that integrates deterministic beam search with forced sampling to mitigate duplication and broaden output coverage.
  • It forces a fixed number of stochastic samples per decoding step, ensuring a balance between high-probability branches and diverse hypothesis generation.
  • Empirical evaluations show FSBS reduces estimator variance and improves uncertainty quantification metrics compared to standard methods.

Force Sampling Beam Search (FSBS) is a hybrid decoding algorithm that integrates the rigorous, deterministic exploration of beam search with stochasticity injected by sampling at controlled points within the beam expansion. FSBS was developed in response to the problems of duplication, low coverage of the model's output space, and high variance in uncertainty quantification (UQ) that arise when relying on pure multinomial sampling. FSBS is used in LLM generation and uncertainty estimation, where diverse, faithful hypotheses are required without sacrificing the mass coverage and reproducibility typically associated with beam search (Fadeeva et al., 10 Dec 2025).

1. Formulation and Algorithmic Procedure

FSBS prescribes beam search with explicit mechanisms to force a fixed number of sampled continuations into the beam at each generation step. Two integers define the process: the total beam width M and the number of forced samples F ≤ M. At every decoding step t, FSBS:

  1. Computes the next-token distribution p(v ∣ y_{<t}, x).
  2. Expands (M − F) beams deterministically using the highest-probability next tokens (as in standard beam search).
  3. Expands the remaining F beams by drawing i.i.d. samples from p(· ∣ y_{<t}, x), optionally with temperature scaling.
  4. From all M hypotheses, re-ranks by sequence probability and retains the top M for the next step.
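As a concrete sketch, one FSBS expansion step can be written as follows. All names are illustrative rather than from the paper, and parents for the forced samples are chosen uniformly among the live beams, which is one plausible reading of step 3:

```python
import math
import random

def fsbs_step(beams, next_logprobs, M, F, temperature=1.0, rng=random):
    """One FSBS decoding step over a beam set (illustrative sketch).

    beams: list of (token_list, cumulative_logprob) pairs.
    next_logprobs: maps a token_list to {token: logprob} for the next position.
    Returns the top-M expanded hypotheses, ranked by sequence log-probability.
    """
    # All single-token extensions of every live beam, scored cumulatively.
    pool = [(tokens + [tok], score + lp)
            for tokens, score in beams
            for tok, lp in next_logprobs(tokens).items()]
    pool.sort(key=lambda c: c[1], reverse=True)
    kept = pool[:M - F]                          # deterministic expansions
    for _ in range(F):                           # forced stochastic expansions
        tokens, score = rng.choice(beams)        # parent beam (uniform choice)
        dist = next_logprobs(tokens)
        toks = list(dist)
        w = [math.exp(dist[t] / temperature) for t in toks]
        tok = rng.choices(toks, weights=w, k=1)[0]
        kept.append((tokens + [tok], score + dist[tok]))
    kept.sort(key=lambda c: c[1], reverse=True)  # re-rank all M hypotheses
    return kept[:M]
```

Because (M − F) deterministic expansions plus F forced samples give exactly M candidates, the final re-ranking orders the full hypothesis set for the next step.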

The formal procedure is expressed as follows:

  • Let the final FSBS beam set be B_{F,M}(x) = {b^{(1)}, …, b^{(M)}}, with model-assigned probabilities p(b^{(i)} ∣ x).
  • Importance weights for candidates are given by self-normalizing the beam probabilities,

w^{(i)} = p(b^{(i)} ∣ x) / ∑_{j=1}^{M} p(b^{(j)} ∣ x).

  • These weights are used for UQ estimation.
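A minimal sketch of the self-normalized weights, computed from sequence log-probabilities in a numerically stable way (the function name is illustrative):

```python
import math

def importance_weights(beam_logprobs):
    """w^(i) proportional to p(b^(i) | x), normalized over the final beam set."""
    m = max(beam_logprobs)                  # subtract the max for stability
    p = [math.exp(lp - m) for lp in beam_logprobs]
    z = sum(p)
    return [pi / z for pi in p]
```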

FSBS pseudocode instantiates these steps, with deterministic expansions from the highest-probability branches and explicit forced sampling for a user-specified fraction of the beam (Fadeeva et al., 10 Dec 2025).

2. Theoretical Properties and Comparison to Standard Approaches

FSBS provides a middle ground between deterministic beam search and fully stochastic multinomial sampling:

  • Coverage and Diversity: By construction, FSBS provides at least F distinct, stochastically sampled candidates per step, increasing coverage of the output distribution and systematically reducing duplicates. Pure sampling is prone to high duplication under peaked distributions; FSBS's beam structure remedies this.
  • Variance and Mean-Squared Error: The estimator variance for UQ under FSBS is strictly lower than under multinomial sampling when the total probability mass covered by the beams is sufficiently large; a distribution-free sufficient condition on this covered mass guarantees the reduction (Fadeeva et al., 10 Dec 2025).

  • Bias–Variance Tradeoff: FSBS admits a small bias, determined by the probability mass and distributional difference between covered (in-beam) and uncovered hypotheses, but gains a substantial reduction in estimator variance and more reliable output sets.
  • Computational Cost: The per-step cost remains that of standard beam search, as forced-sampled expansions simply substitute for deterministic expansions; no extra forward passes are required (Fadeeva et al., 10 Dec 2025).
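The bias–variance tradeoff can be illustrated with a toy expectation estimate: deterministically covering the top-M outcomes and reweighting by their normalized mass gives a zero-variance but slightly biased estimate, whereas plain Monte-Carlo sampling is unbiased but noisy. The names and the toy distribution below are illustrative, not from the paper:

```python
import random

def mc_estimate(p, values, n, rng):
    """Unbiased multinomial Monte-Carlo estimate of E[values] under p."""
    idx = rng.choices(range(len(p)), weights=p, k=n)
    return sum(values[i] for i in idx) / n

def covered_estimate(p, values, M):
    """Deterministic estimate over the top-M outcomes, reweighted by their
    normalized mass: zero variance, with bias from the uncovered tail."""
    top = sorted(range(len(p)), key=lambda i: p[i], reverse=True)[:M]
    z = sum(p[i] for i in top)
    return sum(p[i] / z * values[i] for i in top)
```

Running both on a distribution whose top 3 outcomes carry 98% of the mass shows the covered estimate is identical across runs with only a small bias, while the sampling estimate fluctuates from seed to seed.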

Table: Key Differences Among Decoding Strategies

Method                 Stochasticity Injected       Candidate Diversity              Mass Coverage Control
Beam Search            None                         Low–Moderate                     High (top M)
Multinomial Sampling   All candidates               Low (duplicative)                Varies (depends on sampling temperature)
FSBS                   Partial (F of M per step)    High (F guaranteed distinct)     Consistently high with ranking

FSBS’s construction addresses the duplication and poor support coverage endemic to multinomial sampling while retaining beam search’s interpretable ranking by probability (Fadeeva et al., 10 Dec 2025).

3. Mathematical Framework for Uncertainty Quantification

For an answer a (reference or produced), FSBS underpins UQ estimation using semantic-similarity–based metrics (e.g., NLI or STS cross-encoders). The target quantity is the model's expected similarity to a,

μ(a) = E_{y ∼ p(· ∣ x)} [ s(a, y) ],

where s(a, y) is a semantic similarity score. The FSBS estimator is

μ̂(a) = ∑_{i=1}^{M} w^{(i)} s(a, b^{(i)}),

with w^{(i)} as above.

The reduction in variance arises because FSBS deterministically explores high-probability beams while ensuring that sampled continuations add randomness only among lower-probability candidates, stabilizing the set of proposals and their weights across runs (Fadeeva et al., 10 Dec 2025).
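The probability-weighted estimator described in this section can be sketched directly. The `similarity` callable below stands in for an NLI or STS cross-encoder score s(a, b); all names are illustrative:

```python
import math

def fsbs_uq_estimate(answer, beams, beam_logprobs, similarity):
    """Probability-weighted semantic-consistency estimate over the beam set."""
    m = max(beam_logprobs)
    w = [math.exp(lp - m) for lp in beam_logprobs]   # unnormalized p(b | x)
    z = sum(w)
    return sum(wi / z * similarity(answer, b) for wi, b in zip(w, beams))
```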

4. Empirical Evaluation and Implementation Considerations

FSBS has been evaluated on six short-form QA datasets (TriviaQA, WebQuestions, CoQA, HotpotQA, CommonSenseQA, ARC-Challenge) and three LLMs (Gemma 3 4B, Llama 3 8B, Qwen 3 8B; both base and instruct variants). For consistency-based UQ metrics—Dissimilarity, Eccentricity, Eigenvectors Dissimilarity, CoCoA—FSBS, at the paper's typical settings of beam width, forced-sample count, and temperature, achieves a higher Prediction–Rejection Ratio (PRR) than standard beam search (uniform weighting) or multinomial sampling with an equal sample budget in 23/24 experiments.

Summary performance on Gemma 3 4B base model (averaged):

  • Dissimilarity (multinomial): PRR = 0.630
  • Beam search (uniform): PRR ≈ 0.620
  • FSBS (prob-weighted): PRR = 0.650

ROC-AUC and PR-AUC reflect parallel gains, and the variance of PRR estimates decreases. FSBS is robust to the similarity metric used (NLI-based entailment or RoBERTa-STS) and to the tuning hyperparameters M, F, and sampling temperature. Performance saturates at moderate beam budgets for open-ended QA and at even smaller budgets for multiple-choice tasks. A small normalization floor on beam probabilities before reweighting is recommended for probabilistic stability (Fadeeva et al., 10 Dec 2025).

5. Relationship to Deterministic Nucleus-Decoding Approaches

Prior to FSBS, deterministic nucleus-based beam search algorithms were introduced (p-exact search, dynamic beam search) (Shaham et al., 2021). These methods prune the candidate set at each step by a probability-mass threshold p (as in nucleus sampling) but operate deterministically, never drawing samples:

  • p-Exact search solves for the most probable sequence constrained to tokens in the p-nucleus at every step, using Dijkstra’s algorithm over the induced pruned graph. This maintains optimality with respect to the pruned token space.
  • Dynamic beam search adapts the beam width to the entropy of the probability distribution, expanding when uncertainty is high and contracting when it is low. At each step, the minimal nucleus supporting mass p determines the next beam width.

Despite their probabilistic roots, these methods behave similarly to standard beam search in terms of BLEU and ROUGE metrics on translation and summarization—with differences less than 0.2 points—and do not outperform well-tuned small-beam baselines. The key insight from these studies is that probabilistic tail-pruning (as in nucleus sampling) can be inserted within deterministic beam-search frameworks without loss of fidelity, and that such mechanisms can be safely combined with forced sampling to construct FSBS (Shaham et al., 2021).
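The dynamic-beam-width rule above, where the minimal nucleus supporting mass p sets the next beam width, can be sketched as follows (names are illustrative):

```python
def nucleus_width(probs, p=0.9):
    """Size of the smallest set of highest-probability tokens whose
    total mass reaches the nucleus threshold p."""
    total, k = 0.0, 0
    for q in sorted(probs, reverse=True):
        total += q
        k += 1
        if total >= p:
            break
    return k
```

A peaked distribution yields a narrow beam, while a flat one expands it, matching the entropy-adaptive behavior described above.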

6. Design Implications and Extensions

The core insight of the FSBS paradigm is the seamless blending of deterministic and stochastic hypotheses: forced sampling provides guaranteed diversity, while beam search ensures probability mass coverage and stable candidate support. A plausible implication is that FSBS can be further extended:

  • By combining with nucleus pruning—e.g., sampling or expanding beams only within the top-p nucleus at each step.
  • By adopting diverse beam search for deterministic candidates, in concert with random forced sampling for others.
  • By tuning the ratio F/M adaptively according to model uncertainty or entropy.

Since trimming beams to a p-nucleus is empirically harmless or slightly helpful, lightweight probabilistic pruning can be stacked atop FSBS mechanisms without detrimental quality impact, enabling further control over runtime and memory (Shaham et al., 2021).
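One illustrative reading of the adaptive F/M extension, not a published rule, scales the forced-sample count with the normalized entropy of the next-token distribution:

```python
import math

def adaptive_forced_samples(probs, M):
    """Choose F in [1, M] proportional to normalized entropy: more forced
    sampling when the model is uncertain, mostly deterministic when peaked."""
    H = -sum(q * math.log(q) for q in probs if q > 0)
    H_max = math.log(len(probs))
    frac = H / H_max if H_max > 0 else 0.0
    return max(1, min(M, round(M * frac)))
```

Under this sketch a uniform distribution forces all M slots to be sampled, while a sharply peaked one keeps only a single forced sample.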

7. Summary and Availability

FSBS is a practical, theoretically motivated extension of beam search for autoregressive LLM decoding and UQ that introduces forced stochasticity via sampled beam continuations. It delivers:

  • Higher support coverage versus multinomial sampling and standard beam search.
  • Lower output duplication, particularly with short-answer distributions.
  • Reduced estimator variance and more stable UQ measurements across random seeds and runs.
  • No increase in computational cost beyond standard beam search.

A reference implementation is provided in the extended LM-Polygraph toolkit (Fadeeva et al., 10 Dec 2025).
