Force Sampling Beam Search (FSBS)
- FSBS is a hybrid decoding algorithm that integrates deterministic beam search with forced sampling to mitigate duplication and broaden output coverage.
- It forces a fixed number of stochastic samples per decoding step, ensuring a balance between high-probability branches and diverse hypothesis generation.
- Empirical evaluations show FSBS reduces estimator variance and improves uncertainty quantification metrics compared to standard methods.
Force Sampling Beam Search (FSBS) is a hybrid decoding algorithm that integrates the rigorous, deterministic exploration of beam search with stochasticity injected by sampling at controlled points within the beam expansion. FSBS was developed in response to duplication, low coverage of the model's output space, and high variance in uncertainty quantification (UQ) when relying on pure multinomial sampling. FSBS is used in LLM generation and uncertainty estimation, where diverse, faithful hypotheses are required without sacrificing the mass coverage and reproducibility typically associated with beam search (Fadeeva et al., 10 Dec 2025).
1. Formulation and Algorithmic Procedure
FSBS prescribes beam search with an explicit mechanism that forces a fixed number of sampled continuations into the beam at each generation step. Two integers define the process: the total beam width $k$ and the number of forced samples $s \le k$. At every decoding step $t$, FSBS:
- Computes the next-token distribution $p(x_t \mid x_{<t})$ for each beam.
- Expands $k - s$ beams deterministically using the highest-probability next tokens (as in standard beam search).
- Expands the remaining $s$ beams by drawing i.i.d. samples from $p(x_t \mid x_{<t})$, optionally with temperature scaling.
- From all candidate hypotheses, re-ranks by sequence probability and retains the top $k$ for the next step.
The formal procedure is expressed as follows:
- Let the final FSBS beam set be $\mathcal{B} = \{y^{(1)}, \dots, y^{(k)}\}$, with model-assigned probabilities $p(y^{(i)} \mid x) > 0$.
- Importance weights for candidates are given by

$$w_i = \frac{p(y^{(i)} \mid x)}{\sum_{j=1}^{k} p(y^{(j)} \mid x)}.$$

- These weights are used for UQ estimation.
FSBS pseudocode instantiates these steps, with deterministic expansions from the highest-probability branches and explicit forced sampling for a user-specified fraction of the beam (Fadeeva et al., 10 Dec 2025).
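One expansion step can be sketched in Python. Everything here is illustrative: the function name `fsbs_step`, the flat probability table standing in for a model forward pass, and the dict-based deduplication are assumptions, not the reference implementation:

```python
import numpy as np

def fsbs_step(beams, next_token_probs, k, s, rng):
    """One FSBS expansion step (sketch).

    beams: list of (token_tuple, log_prob) pairs
    next_token_probs: maps each beam index to a 1-D probability vector
        over the vocabulary (stand-in for a model forward pass)
    k: total beam width; s: number of forced stochastic expansions
    """
    candidates = {}
    for i, (seq, lp) in enumerate(beams):
        p = next_token_probs[i]
        # Deterministic part: the (k - s) highest-probability next tokens.
        top = np.argsort(p)[::-1][: k - s]
        # Forced-sampling part: s i.i.d. draws from the same distribution.
        sampled = rng.choice(len(p), size=s, p=p)
        for tok in list(top) + list(sampled):
            cand = seq + (int(tok),)
            # One copy per hypothesis; score = sequence log-probability.
            candidates[cand] = lp + float(np.log(p[tok]))
    # Re-rank all expansions by sequence probability, retain the top k.
    ranked = sorted(candidates.items(), key=lambda kv: kv[1], reverse=True)
    return ranked[:k]
```

At each step the deterministic slots guarantee that the highest-probability continuations survive, while the sampled slots inject the diversity that pure beam search lacks.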
2. Theoretical Properties and Comparison to Standard Approaches
FSBS provides a middle ground between deterministic beam search and fully stochastic multinomial sampling:
- Coverage and Diversity: By construction, FSBS provides at least $s$ distinct, stochastically sampled candidates per step, increasing coverage of the output distribution and systematically reducing duplicates. Pure sampling is prone to high duplication under peaked distributions; FSBS's beam structure remedies this.
- Variance and Mean-Squared Error: The estimator variance for UQ using FSBS is strictly lower than that of multinomial sampling when the total probability mass $P_{\mathcal{B}} = \sum_{y \in \mathcal{B}} p(y \mid x)$ covered by the beams is sufficiently large; a distribution-free sufficient condition on $P_{\mathcal{B}}$ is also given.
- Bias–Variance Tradeoff: FSBS admits a small bias, determined by the probability mass and distributional difference between covered ($\mathcal{B}$) and uncovered hypotheses, but gains a substantial reduction in estimator variance and increases the reliability of output sets.
- Computational Cost: The per-step cost remains that of standard beam search, as forced-sampled expansions simply substitute for deterministic expansions; no extra forward passes are required (Fadeeva et al., 10 Dec 2025).
Table: Key Differences Among Decoding Strategies
| Method | Stochasticity Injected | Candidate Diversity | Mass Coverage Control |
|---|---|---|---|
| Beam Search | None | Low–Moderate | High (top-$k$) |
| Multinomial Sampling | All candidates | Low (duplicative) | Varies (distribution-dependent) |
| FSBS | Partial ($s$ of $k$ per step) | High (guaranteed $s$ distinct samples) | Consistently high with ranking |
FSBS’s construction addresses the duplication and poor support coverage endemic to multinomial sampling while retaining beam search’s interpretable ranking by probability (Fadeeva et al., 10 Dec 2025).
3. Mathematical Framework for Uncertainty Quantification
For an answer $a$ (reference or produced), FSBS underpins UQ estimation using semantic-similarity–based metrics (e.g., NLI or STS cross-encoders). The target quantity is the expected similarity

$$\mathbb{E}_{y \sim p(\cdot \mid x)}\big[g(a, y)\big],$$

where $g(a, y)$ is a semantic similarity score. The FSBS estimator is

$$\widehat{U}(a) = \sum_{i=1}^{k} w_i \, g\big(a, y^{(i)}\big),$$

with the importance weights $w_i$ as above.
The reduction in variance arises because FSBS deterministically explores high-probability beams while ensuring that sampled continuations add randomness only among lower-probability candidates, stabilizing the set of proposals and their weights across runs (Fadeeva et al., 10 Dec 2025).
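With self-normalized importance weights, the estimator reduces to a weighted average over the beam set. This is a minimal sketch; the names `fsbs_uq_estimate`, `seq_probs`, and `sims` are illustrative, and a real pipeline would obtain the similarity scores from an NLI or STS cross-encoder:

```python
import numpy as np

def fsbs_uq_estimate(seq_probs, sims):
    """Importance-weighted similarity estimate over an FSBS beam set (sketch).

    seq_probs: model probabilities p(y_i | x) of the k retained hypotheses
    sims: semantic-similarity scores g(a, y_i) against the answer a
    """
    p = np.asarray(seq_probs, dtype=float)
    g = np.asarray(sims, dtype=float)
    w = p / p.sum()      # self-normalized importance weights w_i
    return float(w @ g)  # sum_i w_i * g(a, y_i)
```

Because the weights are renormalized over the retained beams, the estimate is a convex combination of the similarity scores and therefore always lies between their minimum and maximum.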
4. Empirical Evaluation and Implementation Considerations
FSBS has been evaluated on six short-form QA datasets (TriviaQA, WebQuestions, CoQA, HotpotQA, CommonSenseQA, ARC-Challenge) and three LLMs (Gemma 3 4B, Llama 3 8B, Qwen 3 8B; both base and instruct variants). For consistency-based UQ metrics (Dissimilarity, Eccentricity, Eigenvectors Dissimilarity, CoCoA), FSBS in its typical configuration of beam width $k$, forced-sample count $s$, and sampling temperature achieves a higher Prediction–Rejection Ratio (PRR) than standard beam search (uniform weighting) or multinomial sampling with an equal sample budget in 23/24 experiments.
Summary performance on Gemma 3 4B base model (averaged):
- Dissimilarity (multinomial): PRR = 0.630
- Beam search (uniform): PRR ≈ 0.620
- FSBS (prob-weighted): PRR = 0.650
ROC-AUC and PR-AUC reflect parallel gains, and the variance of PRR estimates decreases correspondingly. FSBS is robust to the similarity metric used (NLI-based entailment or RoBERTa-STS) and to its tuning hyperparameters (beam width $k$, forced-sample count $s$, and temperature). Performance saturates at moderate beam widths for open-ended QA and at even smaller widths for multiple-choice. A small normalization floor on hypothesis probabilities before reweighting is recommended for stable probability weights (Fadeeva et al., 10 Dec 2025).
5. Relationship to Deterministic Nucleus-Decoding Approaches
Prior to FSBS, deterministic nucleus-based beam search algorithms were introduced (p-exact search, dynamic beam search) (Shaham et al., 2021). These methods prune the candidate set at each step by a probability-mass threshold $p$ (as in nucleus sampling) but operate deterministically, never drawing samples:
- p-Exact search solves for the most probable sequence constrained to tokens in the $p$-nucleus at every step, using Dijkstra's algorithm over the induced pruned graph. This maintains optimality with respect to the pruned token space.
- Dynamic beam search adapts the beam width to the entropy of the probability distribution, expanding when uncertainty is high and contracting when it is low. At each step, the minimal nucleus supporting mass $p$ determines the next beam width.
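The width rule of dynamic beam search can be sketched as a minimal-nucleus computation; `nucleus_size` is a hypothetical helper, not the authors' code:

```python
import numpy as np

def nucleus_size(probs, p_mass):
    """Smallest number of top tokens whose cumulative probability
    reaches p_mass (sketch of dynamic beam search's width rule).

    Flat, high-entropy distributions need more tokens to cover
    p_mass, so the beam widens; peaked distributions contract it.
    """
    sorted_p = np.sort(np.asarray(probs, dtype=float))[::-1]
    cum = np.cumsum(sorted_p)
    # Index of the first prefix whose mass reaches p_mass, plus one.
    return int(np.searchsorted(cum, p_mass) + 1)
```

For a peaked distribution a single token may already cover the target mass, while a uniform distribution forces the widest beam, which is exactly the entropy-tracking behavior described above.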
Despite their probabilistic roots, these methods behave similarly to standard beam search in terms of BLEU and ROUGE metrics on translation and summarization—with differences less than 0.2 points—and do not outperform well-tuned small-beam baselines. The key insight from these studies is that probabilistic tail-pruning (as in nucleus sampling) can be inserted within deterministic beam-search frameworks without loss of fidelity, and that such mechanisms can be safely combined with forced sampling to construct FSBS (Shaham et al., 2021).
6. Design Implications and Extensions
The core insight of the FSBS paradigm is the seamless blending of deterministic and stochastic hypotheses: forced sampling provides guaranteed diversity, while beam search ensures probability mass coverage and stable candidate support. A plausible implication is that FSBS can be further extended:
- By combining with nucleus pruning, e.g., sampling or expanding beams only within the $p$-nucleus at each step.
- By adopting diverse beam search for deterministic candidates, in concert with random forced sampling for others.
- By tuning the ratio $s/k$ adaptively according to model uncertainty or entropy.
Since trimming beams to a $p$-nucleus is empirically harmless or slightly helpful, lightweight probabilistic pruning can be stacked atop FSBS mechanisms without detrimental quality impact, enabling further control over runtime and memory (Shaham et al., 2021).
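As one hypothetical instantiation of the adaptive-ratio idea, the forced-sample count could be tied to the normalized entropy of the next-token distribution; both the rule and the name `adaptive_forced_samples` are assumptions for illustration, not part of the published method:

```python
import numpy as np

def adaptive_forced_samples(probs, k, s_min=1):
    """Pick the forced-sample count s from distribution entropy (sketch).

    Allocates a larger stochastic share of the beam when the next-token
    distribution is flat (high normalized entropy), and falls back to
    mostly deterministic expansion when it is peaked.
    """
    p = np.asarray(probs, dtype=float)
    p = p / p.sum()
    nz = p[p > 0]                       # avoid log(0) on zero-mass tokens
    ent = -np.sum(nz * np.log(nz))
    h = ent / np.log(p.size)            # normalized entropy in [0, 1]
    # At least s_min forced samples, at most k - 1 (keep one greedy slot).
    return int(np.clip(round(h * k), s_min, k - 1))
```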
7. Summary and Availability
FSBS is a practical, theoretically motivated extension of beam search for autoregressive LLM decoding and UQ that introduces forced stochasticity via sampled beam continuations. It delivers:
- Higher support coverage versus multinomial sampling and standard beam search.
- Lower output duplication, particularly with short-answer distributions.
- Reduced estimator variance and more stable UQ measurements across random seeds and runs.
- No increase in computational cost beyond standard beam search.
A reference implementation is provided in the extended LM-Polygraph toolkit (Fadeeva et al., 10 Dec 2025).