BED-LLM: Intelligent Information Gathering with LLMs and Bayesian Experimental Design (2508.21184v1)

Published 28 Aug 2025 in cs.CL, cs.AI, and stat.ML

Abstract: We propose a general-purpose approach for improving the ability of LLMs to intelligently and adaptively gather information from a user or other external source using the framework of sequential Bayesian experimental design (BED). This enables LLMs to act as effective multi-turn conversational agents and interactively interface with external environments. Our approach, which we call BED-LLM (Bayesian Experimental Design with LLMs), is based on iteratively choosing questions or queries that maximize the expected information gain (EIG) about the task of interest given the responses gathered previously. We show how this EIG can be formulated in a principled way using a probabilistic model derived from the LLM's belief distribution and provide detailed insights into key decisions in its construction. Further key to the success of BED-LLM are a number of specific innovations, such as a carefully designed estimator for the EIG, not solely relying on in-context updates for conditioning on previous responses, and a targeted strategy for proposing candidate queries. We find that BED-LLM achieves substantial gains in performance across a wide range of tests based on the 20-questions game and using the LLM to actively infer user preferences, compared to direct prompting of the LLM and other adaptive design strategies.

Summary

  • The paper introduces BED-LLM, which maximizes expected information gain using sequential Bayesian experimental design combined with LLMs for adaptive multi-turn querying.
  • It employs a joint probabilistic model and a sample-then-filter belief update to maintain historical consistency and mitigate mode collapse.
  • Experimental results show BED-LLM significantly outperforms baselines in 20-Questions and user preference elicitation tasks, achieving success rates of up to 95%.

BED-LLM: Intelligent Information Gathering with LLMs and Bayesian Experimental Design

Introduction and Motivation

The paper introduces BED-LLM, a principled framework for enabling LLMs to perform adaptive, multi-turn information gathering by leveraging sequential Bayesian experimental design (BED). The motivation stems from the observation that while LLMs can generate coherent questions in one-shot settings, they often fail to adaptively tailor queries based on previously collected responses, limiting their effectiveness in interactive tasks such as multi-step guessing games, user preference elicitation, and iterative tool use. BED-LLM addresses this by formulating information gathering as a sequential experimental design problem, iteratively selecting queries that maximize expected information gain (EIG) about a latent target $\theta$.
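In standard sequential BED notation (ours, not verbatim from the paper): writing $h_{t-1}$ for the history of query-response pairs collected so far, the next query is chosen as

$$x_t = \operatorname*{arg\,max}_x \operatorname{EIG}(x), \qquad \operatorname{EIG}(x) = \operatorname{H}\big[p(\theta \mid h_{t-1})\big] \;-\; \mathbb{E}_{y \sim p(y \mid x,\, h_{t-1})}\Big[\operatorname{H}\big[p(\theta \mid h_{t-1}, x, y)\big]\Big],$$

i.e., the mutual information between $\theta$ and the response $y$ given the query and the history.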

Sequential Bayesian Experimental Design for LLMs

BED-LLM operationalizes sequential BED in the LLM context by constructing a joint probabilistic model over hypotheses $\theta$ and outcomes $y$ given queries $x$. The framework iterates over four key steps (a minimal code sketch follows the list):

  1. Candidate Question Generation: Propose a diverse set of candidate questions using the LLM, either via unconstrained sampling or conditional generation based on a pool of candidate hypotheses.
  2. EIG Estimation: For each candidate, estimate the EIG using a Rao-Blackwellized Monte Carlo estimator, leveraging the LLM's logits for entropy calculations.
  3. Optimal Question Selection: Select and pose the question with maximal EIG, observe the response, and update the history.
  4. Belief State Update: Update the joint model by filtering sampled hypotheses for compatibility with the observed history, mitigating mode collapse and ensuring historical coherence.
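
The following is a minimal, illustrative sketch of this loop in Python, assuming hypothetical LLM-backed helpers (`sample_hypotheses`, `propose_questions`, `estimate_eig`, `ask`, `is_consistent`); it captures the control flow only, not the paper's actual implementation.

```python
def bed_llm_loop(sample_hypotheses, propose_questions, estimate_eig,
                 ask, is_consistent, n_turns=20, n_hypotheses=32):
    history = []                                   # (question, answer) pairs
    hypotheses = sample_hypotheses(history, n_hypotheses)

    for _ in range(n_turns):
        # 1. Generate candidate questions, conditioned on current hypotheses.
        candidates = propose_questions(history, hypotheses)

        # 2. Estimate the EIG of each candidate question.
        scored = [(estimate_eig(q, hypotheses), q) for q in candidates]

        # 3. Ask the question with maximal estimated EIG and record the answer.
        _, best_question = max(scored, key=lambda pair: pair[0])
        answer = ask(best_question)
        history.append((best_question, answer))

        # 4. Belief update: resample from the in-context distribution, then
        #    filter out hypotheses inconsistent with the full history.
        hypotheses = [h for h in sample_hypotheses(history, n_hypotheses)
                      if is_consistent(h, history)]

    return history, hypotheses
```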

This approach is distinguished by its use of a prior-likelihood pairing for joint model construction, careful filtering of hypotheses, and explicit avoidance of predictive entropy as a proxy for EIG.

Model Construction and Updating

A central technical challenge is constructing a faithful joint model from the LLM. The prior-likelihood pairing $p(\theta)\,p(y \mid \theta, x)$ is preferred when the hypothesis space of $\theta$ is more complex than the outcome space of $y$, as is typical in open-ended tasks. Belief updating is performed via a sample-then-filter strategy: hypotheses are sampled from the in-context LLM distribution and rejected if incompatible with the history, with diversity encouraged through high-temperature sampling and batch generation. This approach avoids the computational intractability of full Bayesian updates and the inadequacy of naive in-context learning, which often fails to enforce historical consistency.
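
A minimal sketch of the sample-then-filter update, assuming a hypothetical `llm` interface with a batch `sample` method and a `judge_consistent` method that checks one hypothesis against one question-answer pair; neither name comes from the paper.

```python
def build_hypothesis_prompt(history):
    # Hypothetical prompt: show the question/answer history and request
    # a fresh candidate hypothesis consistent with it.
    qa = "\n".join(f"Q: {q}\nA: {a}" for q, a in history)
    return f"Dialogue so far:\n{qa}\nPropose one candidate hypothesis."

def sample_then_filter(llm, history, n_samples=64, temperature=1.2):
    # High-temperature batch sampling encourages hypothesis diversity.
    raw = llm.sample(build_hypothesis_prompt(history),
                     n=n_samples, temperature=temperature)
    # Reject hypotheses that contradict any observed answer, enforcing the
    # historical consistency that naive in-context updates often miss.
    return [h for h in raw
            if all(llm.judge_consistent(h, q, a) for q, a in history)]
```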

EIG Estimation and the Pitfalls of Predictive Entropy

The paper rigorously demonstrates that predictive entropy is not a suitable surrogate for EIG in adaptive information gathering. The expected conditional entropy of the likelihood varies across questions, and omitting it leads to the selection of ambiguous or uninformative queries. The Rao-Blackwellized estimator directly utilizes LLM logits for entropy calculations, yielding lower variance and more faithful information gain estimates.
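A sketch of this estimator under simplifying assumptions (uniform weights over the filtered hypotheses, a fixed finite answer set whose probabilities are read off the LLM's logits; the function name is ours):

```python
import numpy as np

def rao_blackwellized_eig(answer_probs):
    """EIG estimate for one candidate question.

    answer_probs: shape (n_hypotheses, n_answers); row i holds the LLM
    likelihood p(y | theta_i, x) over a fixed answer set (e.g. yes/no),
    read off the model's logits. Hypotheses are weighted uniformly, as
    they are after the sample-then-filter update.
    """
    p = np.asarray(answer_probs, dtype=float)
    p = p / p.sum(axis=1, keepdims=True)            # normalize each row

    def entropy(q):
        q = np.clip(q, 1e-12, 1.0)
        return -(q * np.log(q)).sum(axis=-1)

    marginal = p.mean(axis=0)                       # p(y | x) under the belief state
    # EIG = marginal entropy minus expected conditional entropy. Using exact
    # per-hypothesis entropies from the logits, rather than sampled answers,
    # is the Rao-Blackwellization and reduces estimator variance.
    return entropy(marginal) - entropy(p).mean()
```

For a yes/no question that cleanly splits the hypotheses in half, this returns $\ln 2 \approx 0.69$ nats, the maximum for a binary answer; an ambiguous question whose per-hypothesis answer distributions all sit near 50/50 scores close to zero, even though its predictive entropy is just as high, which is exactly the failure mode of maximizing predictive entropy alone.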

Figure 1: Success rate against turn number for the 20-Questions game. Error bars represent the standard error of the mean across 100 target entities in each dataset. The same model is used as both questioner and answerer.

Experimental Evaluation

BED-LLM is evaluated on two tasks: the 20-Questions game and active user preference elicitation. In 20-Questions, the hypothesis space is unconstrained, and the questioner must identify a hidden target entity via binary queries. BED-LLM consistently outperforms both the Entropy baseline (predictive entropy maximization) and Naive QA (direct LLM question generation), achieving success rates of up to 95% (Mistral-Large, Animals dataset), compared to 14–45% for Naive QA and 43–88% for Entropy. The advantage of BED-LLM becomes pronounced after several turns, as the effect of principled EIG-driven question selection accumulates.

Figure 2: Success rates for the 20-Questions game with different questioner (Q) and answerer (A) models.

Robustness to model misspecification is demonstrated by using different LLMs for the questioner and answerer, with BED-LLM maintaining superior performance relative to baselines.

In active preference elicitation, where $\theta$ is a free-form user persona, BED-LLM again outperforms baselines in recommending films aligned with user preferences, as measured by LLM-as-judge ratings. The framework is shown to be robust to both model misspecification and the increased complexity of the hypothesis space.

Figure 3: Qwen2.5-72B model performance in active preference elicitation.

Figure 4: Cross-model evaluation: Qwen2.5-72B as questioner, GPT-4o-mini as answerer.

Ablations and Theoretical Insights

Ablation studies confirm the superiority of the prior-likelihood pairing over data-estimation pairings, especially in large or unbounded hypothesis spaces. The sample-then-filter updating procedure is shown to be critical for maintaining historical consistency and avoiding premature mode collapse. Theoretical analysis clarifies that the choice of joint model factorization should be guided by the relative complexity of $\theta$ and $y$, and by the intended belief updating procedure.
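
Written out explicitly (notation as above; this is a standard identity, with conditioning on the history left implicit), the two factorizations of the joint model are

$$p(\theta, y \mid x) \;=\; \underbrace{p(\theta)\, p(y \mid \theta, x)}_{\text{prior-likelihood}} \;=\; \underbrace{p(y \mid x)\, p(\theta \mid y, x)}_{\text{data-estimation}}.$$

The prior-likelihood form only requires the LLM to supply the comparatively simple conditional $p(y \mid \theta, x)$, whereas the data-estimation form asks it to represent a posterior over the hypothesis space directly, which is why the former is preferred when $\theta$ is richer than $y$.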

Figure 5: Average information gain for each strategy at each turn in the Qwen2.5-72B self-play setting. Initial questions achieve similar information gains, but BED-LLM's gains become markedly larger in later turns, explaining its superior performance.

Implications and Future Directions

BED-LLM demonstrates that principled, information-theoretic strategies can substantially improve the adaptive information gathering capabilities of LLMs, enabling more effective multi-turn conversational agents and interactive systems. The framework is broadly applicable to domains such as medical diagnostics, personalized tutoring, automated surveys, and scientific discovery, where adaptive querying is essential. The results highlight the importance of careful joint model construction, robust belief updating, and faithful EIG estimation.

Future work may explore policy-based BED approaches for non-myopic planning, integration with external simulators for scientific applications, and extensions to richer interaction modalities (e.g., document retrieval, function calling). Further research into improving LLM uncertainty quantification and belief state extraction will enhance the reliability and applicability of BED-LLM in real-world deployments.

Conclusion

BED-LLM provides a rigorous, general-purpose framework for adaptive information gathering with LLMs, grounded in sequential Bayesian experimental design. By maximizing expected information gain through principled joint model construction and robust belief updating, BED-LLM achieves substantial gains over existing baselines in both deductive reasoning and user preference elicitation tasks. The approach sets a new standard for interactive intelligence in LLMs and opens avenues for future research in adaptive, multi-turn AI systems.
