Force Sampling Beam Search (FSBS)
- FSBS is a hybrid decoding algorithm that integrates deterministic beam search with forced sampling to mitigate duplication and broaden output coverage.
- It forces a fixed number of stochastic samples per decoding step, ensuring a balance between high-probability branches and diverse hypothesis generation.
- Empirical evaluations show FSBS reduces estimator variance and improves uncertainty quantification metrics compared to standard methods.
Force Sampling Beam Search (FSBS) is a hybrid decoding algorithm that integrates the rigorous, deterministic exploration of beam search with stochasticity injected by sampling at controlled points within the beam expansion. FSBS was developed in response to duplication, low coverage of the model's output space, and high variance in uncertainty quantification (UQ) when relying on pure multinomial sampling. FSBS is used in LLM generation and uncertainty estimation, where diverse, faithful hypotheses are required without sacrificing the mass coverage and reproducibility typically associated with beam search (Fadeeva et al., 10 Dec 2025).
1. Formulation and Algorithmic Procedure
FSBS prescribes beam search with an explicit mechanism that forces a fixed number of sampled continuations into the beam at each generation step. Two integers define the process: the total beam width $k$ and the number of forced samples $s \le k$. At every decoding step $t$, FSBS:
- Computes the next-token distribution $p(x_t \mid x_{<t})$ for each beam.
- Expands $k - s$ beams deterministically using the highest-probability next tokens (as in standard beam search).
- Expands the remaining $s$ beams by drawing i.i.d. samples from $p(x_t \mid x_{<t})$, optionally with temperature scaling.
- From all candidate hypotheses, re-ranks by sequence probability and retains the top $k$ for the next step.
The formal procedure is expressed as follows:
- Let the final FSBS beam set be $\mathcal{B} = \{y^{(1)}, \dots, y^{(k)}\}$, with model-assigned probabilities $p(y^{(i)} \mid x) > 0$.
- Importance weights for candidates are given by

$$w_i = \frac{p(y^{(i)} \mid x)}{\sum_{j=1}^{k} p(y^{(j)} \mid x)}.$$

- These weights are used for UQ estimation.
FSBS pseudocode instantiates these steps, with deterministic expansions from the highest-probability branches and explicit forced sampling for a user-specified fraction of the beam (Fadeeva et al., 10 Dec 2025).
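One expansion step can be sketched in Python. Everything here is illustrative: the function name `fsbs_step`, the flat probability table standing in for a model forward pass, and the dict-based deduplication are assumptions, not the reference implementation:

```python
import numpy as np

def fsbs_step(beams, next_token_probs, k, s, rng):
    """One FSBS expansion step (sketch).

    beams: list of (token_tuple, log_prob) pairs
    next_token_probs: maps each beam index to a 1-D probability vector
        over the vocabulary (stand-in for a model forward pass)
    k: total beam width; s: number of forced stochastic expansions
    """
    candidates = {}
    for i, (seq, lp) in enumerate(beams):
        p = next_token_probs[i]
        # Deterministic part: the (k - s) highest-probability next tokens.
        top = np.argsort(p)[::-1][: k - s]
        # Forced-sampling part: s i.i.d. draws from the same distribution.
        sampled = rng.choice(len(p), size=s, p=p)
        for tok in list(top) + list(sampled):
            cand = seq + (int(tok),)
            # One copy per hypothesis; score = sequence log-probability.
            candidates[cand] = lp + float(np.log(p[tok]))
    # Re-rank all expansions by sequence probability, retain the top k.
    ranked = sorted(candidates.items(), key=lambda kv: kv[1], reverse=True)
    return ranked[:k]
```

At each step the deterministic slots guarantee that the highest-probability continuations survive, while the sampled slots inject the diversity that pure beam search lacks.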
2. Theoretical Properties and Comparison to Standard Approaches
FSBS provides a middle ground between deterministic beam search and fully stochastic multinomial sampling:
- Coverage and Diversity: By construction, FSBS provides at least $s$ distinct, stochastically sampled candidates per step, increasing coverage of the output distribution and systematically reducing duplicates. Pure sampling is prone to high duplication under peaked distributions; FSBS's beam structure remedies this.
- Variance and Mean-Squared Error: The estimator variance for UQ using FSBS is strictly lower than that of multinomial sampling when the total probability mass $P_{\mathcal{B}} = \sum_{y \in \mathcal{B}} p(y \mid x)$ covered by the beams is sufficiently large; a distribution-free sufficient condition on $P_{\mathcal{B}}$ is also given.
- Bias–Variance Tradeoff: FSBS admits a small bias, determined by the probability mass and distributional difference between covered ($\mathcal{B}$) and uncovered hypotheses, but gains a substantial reduction in estimator variance and increases the reliability of output sets.
- Computational Cost: The per-step cost remains that of standard beam search, as forced-sampled expansions simply substitute for deterministic expansions; no extra forward passes are required (Fadeeva et al., 10 Dec 2025).
Table: Key Differences Among Decoding Strategies
| Method | Stochasticity Injected | Candidate Diversity | Mass Coverage Control |
|---|---|---|---|
| Beam Search | None | Low–Moderate | High (top-$k$) |
| Multinomial Sampling | All candidates | Low (duplicative) | Varies (distribution-dependent) |
| FSBS | Partial ($s$ of $k$ per step) | High (guaranteed $s$ distinct samples) | Consistently high with ranking |
FSBS’s construction addresses the duplication and poor support coverage endemic to multinomial sampling while retaining beam search’s interpretable ranking by probability (Fadeeva et al., 10 Dec 2025).
3. Mathematical Framework for Uncertainty Quantification
For an answer $a$ (reference or produced), FSBS underpins UQ estimation using semantic-similarity–based metrics (e.g., NLI or STS cross-encoders). The target quantity is the expected similarity

$$\mathbb{E}_{y \sim p(\cdot \mid x)}\big[g(a, y)\big],$$

where $g(a, y)$ is a semantic similarity score. The FSBS estimator is

$$\widehat{U}(a) = \sum_{i=1}^{k} w_i \, g\big(a, y^{(i)}\big),$$

with the importance weights $w_i$ as above.
The reduction in variance arises because FSBS deterministically explores high-probability beams while ensuring that sampled continuations add randomness only among lower-probability candidates, stabilizing the set of proposals and their weights across runs (Fadeeva et al., 10 Dec 2025).
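With self-normalized importance weights, the estimator reduces to a weighted average over the beam set. This is a minimal sketch; the names `fsbs_uq_estimate`, `seq_probs`, and `sims` are illustrative, and a real pipeline would obtain the similarity scores from an NLI or STS cross-encoder:

```python
import numpy as np

def fsbs_uq_estimate(seq_probs, sims):
    """Importance-weighted similarity estimate over an FSBS beam set (sketch).

    seq_probs: model probabilities p(y_i | x) of the k retained hypotheses
    sims: semantic-similarity scores g(a, y_i) against the answer a
    """
    p = np.asarray(seq_probs, dtype=float)
    g = np.asarray(sims, dtype=float)
    w = p / p.sum()      # self-normalized importance weights w_i
    return float(w @ g)  # sum_i w_i * g(a, y_i)
```

Because the weights are renormalized over the retained beams, the estimate is a convex combination of the similarity scores and therefore always lies between their minimum and maximum.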
4. Empirical Evaluation and Implementation Considerations
FSBS has been evaluated on six short-form QA datasets (TriviaQA, WebQuestions, CoQA, HotpotQA, CommonSenseQA, ARC-Challenge) and three LLMs (Gemma 3 4B, Llama 3 8B, Qwen 3 8B; both base and instruct variants). For consistency-based UQ metrics (Dissimilarity, Eccentricity, Eigenvectors Dissimilarity, CoCoA), FSBS in its typical configuration of beam width $k$, forced-sample count $s$, and sampling temperature achieves a higher Prediction–Rejection Ratio (PRR) than standard beam search (uniform weighting) or multinomial sampling with an equal sample budget in 23/24 experiments.
Summary performance on Gemma 3 4B base model (averaged):
- Dissimilarity (multinomial): PRR = 0.630
- Beam search (uniform): PRR ≈ 0.620
- FSBS (prob-weighted): PRR = 0.650
ROC-AUC and PR-AUC reflect parallel gains, and the variance of PRR estimates decreases correspondingly. FSBS is robust to the similarity metric used (NLI-based entailment or RoBERTa-STS) and to its tuning hyperparameters (beam width $k$, forced-sample count $s$, and temperature). Performance saturates at moderate beam widths for open-ended QA and at even smaller widths for multiple-choice. A small normalization floor on hypothesis probabilities before reweighting is recommended for stable probability weights (Fadeeva et al., 10 Dec 2025).
5. Relationship to Deterministic Nucleus-Decoding Approaches
Prior to FSBS, deterministic nucleus-based beam search algorithms were introduced (p-exact search, dynamic beam search) (Shaham et al., 2021). These methods prune the candidate set at each step by a probability-mass threshold $p$ (as in nucleus sampling) but operate deterministically, never drawing samples:
- p-Exact search solves for the most probable sequence constrained to tokens in the $p$-nucleus at every step, using Dijkstra's algorithm over the induced pruned graph. This maintains optimality with respect to the pruned token space.
- Dynamic beam search adapts the beam width to the entropy of the probability distribution, expanding when uncertainty is high and contracting when it is low. At each step, the minimal nucleus supporting mass $p$ determines the next beam width.
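The width rule of dynamic beam search can be sketched as a minimal-nucleus computation; `nucleus_size` is a hypothetical helper, not the authors' code:

```python
import numpy as np

def nucleus_size(probs, p_mass):
    """Smallest number of top tokens whose cumulative probability
    reaches p_mass (sketch of dynamic beam search's width rule).

    Flat, high-entropy distributions need more tokens to cover
    p_mass, so the beam widens; peaked distributions contract it.
    """
    sorted_p = np.sort(np.asarray(probs, dtype=float))[::-1]
    cum = np.cumsum(sorted_p)
    # Index of the first prefix whose mass reaches p_mass, plus one.
    return int(np.searchsorted(cum, p_mass) + 1)
```

For a peaked distribution a single token may already cover the target mass, while a uniform distribution forces the widest beam, which is exactly the entropy-tracking behavior described above.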
Despite their probabilistic roots, these methods behave similarly to standard beam search in terms of BLEU and ROUGE metrics on translation and summarization—with differences less than 0.2 points—and do not outperform well-tuned small-beam baselines. The key insight from these studies is that probabilistic tail-pruning (as in nucleus sampling) can be inserted within deterministic beam-search frameworks without loss of fidelity, and that such mechanisms can be safely combined with forced sampling to construct FSBS (Shaham et al., 2021).
6. Design Implications and Extensions
The core insight of the FSBS paradigm is the seamless blending of deterministic and stochastic hypotheses: forced sampling provides guaranteed diversity, while beam search ensures probability mass coverage and stable candidate support. A plausible implication is that FSBS can be further extended:
- By combining with nucleus pruning, e.g., sampling or expanding beams only within the $p$-nucleus at each step.
- By adopting diverse beam search for deterministic candidates, in concert with random forced sampling for others.
- By tuning the ratio $s/k$ adaptively according to model uncertainty or entropy.
Since trimming beams to a $p$-nucleus is empirically harmless or slightly helpful, lightweight probabilistic pruning can be stacked atop FSBS mechanisms without detrimental quality impact, enabling further control over runtime and memory (Shaham et al., 2021).
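As one hypothetical instantiation of the adaptive-ratio idea, the forced-sample count could be tied to the normalized entropy of the next-token distribution; both the rule and the name `adaptive_forced_samples` are assumptions for illustration, not part of the published method:

```python
import numpy as np

def adaptive_forced_samples(probs, k, s_min=1):
    """Pick the forced-sample count s from distribution entropy (sketch).

    Allocates a larger stochastic share of the beam when the next-token
    distribution is flat (high normalized entropy), and falls back to
    mostly deterministic expansion when it is peaked.
    """
    p = np.asarray(probs, dtype=float)
    p = p / p.sum()
    nz = p[p > 0]                       # avoid log(0) on zero-mass tokens
    ent = -np.sum(nz * np.log(nz))
    h = ent / np.log(p.size)            # normalized entropy in [0, 1]
    # At least s_min forced samples, at most k - 1 (keep one greedy slot).
    return int(np.clip(round(h * k), s_min, k - 1))
```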
7. Summary and Availability
FSBS is a practical, theoretically motivated extension of beam search for autoregressive LLM decoding and UQ that introduces forced stochasticity via sampled beam continuations. It delivers:
- Higher support coverage versus multinomial sampling and standard beam search.
- Lower output duplication, particularly with short-answer distributions.
- Reduced estimator variance and more stable UQ measurements across random seeds and runs.
- No increase in computational cost beyond standard beam search.
A reference implementation is provided in the extended LM-Polygraph toolkit (Fadeeva et al., 10 Dec 2025).