- The paper demonstrates that combining LLMs with Bayesian models achieves near-human performance in generating informative Battleship questions.
- The LIPS framework translates natural-language questions into symbolic programs so that candidates can be scored by expected information gain (EIG) and the most informative ones selected.
- Results indicate that textual board state inputs modestly boost efficiency, while visual representations do not significantly enhance performance.
Modeling Question Generation in a Grounded Task Using Language-Informed Program Sampling
Introduction
Understanding how humans generate informative questions in a constrained environment, such as a game, presents a unique challenge at the intersection of cognitive science and artificial intelligence. This paper studies question generation in the classic board game Battleship using a novel framework named Language-Informed Program Sampling (LIPS). The framework's central move is to translate natural-language questions into symbolic programs, so that each question's informativeness can be scored by its expected information gain (EIG).
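For orientation, EIG can be written in a standard form (the notation here is illustrative and may differ in detail from the paper's). For a question q over a hypothesis space of hidden boards h with current belief p(h) and deterministic answers d = q(h):

```latex
\mathrm{EIG}(q) = H\big[p(h)\big] - \mathbb{E}_{d \sim p(d \mid q)}\Big[H\big[p(h \mid q, d)\big]\Big],
\qquad
p(d \mid q) = \sum_{h} p(h)\,\mathbb{1}\big[q(h) = d\big].
```

Because the answer is a deterministic function of the hidden board, this quantity reduces to the entropy of the answer distribution under the current belief, which is what makes Monte Carlo estimation over sampled boards straightforward.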
Models and Experimentation
The paper introduces a dual-role design in which LLMs serve both as a distribution over candidate questions and as a mechanism for translating questions from natural language into a "language of thought" (LoT). This leverages the statistical structure of language to model human-like question generation. Concretely, candidate questions are drawn from one of several priors, a probabilistic context-free grammar (PCFG) or one of two LLMs (GPT-4 and CodeLlama-7b), and the candidates are then translated and filtered to retain those that maximize EIG.
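A minimal sketch of these two roles, assuming a generic `llm.sample(prompt)` text-completion interface; the prompts, helper names, and example LoT expression are illustrative, not the paper's exact setup:

```python
# Illustrative only: `llm` is any text-completion client exposing .sample(prompt).

def propose_questions(llm, board_text: str, k: int) -> list[str]:
    """Role 1: sample k candidate natural-language questions, conditioned on the board."""
    prompt = (
        "Current Battleship board:\n"
        f"{board_text}\n"
        "Ask one question that would help you locate the remaining ships."
    )
    return [llm.sample(prompt) for _ in range(k)]

def translate_to_program(llm, question: str) -> str:
    """Role 2: translate a natural-language question into a LoT expression,
    e.g. 'How long is the red ship?' -> '(size Red)'."""
    prompt = (
        "Translate the question into the Battleship question language.\n"
        f"Q: {question}\n"
        "Program:"
    )
    return llm.sample(prompt)
```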
The LIPS model operates by sampling a subset of k questions, translating them into symbolic programs, and estimating each one's informativeness by simulation. The sample size k acts as a knob on computational effort, letting the authors investigate how resource constraints affect the efficiency and quality of the questions produced.
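The scoring step can be sketched as a small Monte Carlo routine (a sketch under stated assumptions, not the paper's implementation): boards consistent with the current observations are sampled from the posterior, each translated program is executed on each board, and, since answers are deterministic given a board, the EIG reduces to the entropy of the resulting answer distribution. The `execute` function and the board samples are assumed to come from the Battleship environment and the Bayesian belief model.

```python
import math
from collections import Counter

def estimate_eig(program, sampled_boards, execute) -> float:
    """Monte Carlo EIG estimate for one translated question.

    `sampled_boards` are hypothetical boards drawn from the current posterior;
    `execute(program, board)` returns the (deterministic) answer on that board.
    With deterministic answers, EIG equals the entropy of the answer
    distribution induced by the sampled boards.
    """
    counts = Counter(execute(program, board) for board in sampled_boards)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def select_question(questions, programs, sampled_boards, execute) -> str:
    """Score each of the k candidates and return the most informative one."""
    scores = [estimate_eig(p, sampled_boards, execute) for p in programs]
    best_index = max(range(len(scores)), key=scores.__getitem__)
    return questions[best_index]
```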
Results
The paper's findings highlight the capability of LLMs, combined with Bayesian models, to approximate human performance at generating informative questions in the Battleship setting. Notably, even with modest values of k, the models reached near-human performance, suggesting that a relatively simple sampling-based approach captures much of human question-asking behavior. The models nonetheless showed limitations in grounding, at times producing redundant or uninformative questions.
Experimental results indicated that textual representations of the board state modestly improved the efficiency of question generation, whereas visual representations yielded no significant gains, pointing to difficulties LLMs have in extracting and using structured visual information. Comparisons between models with PCFG-based and LLM-provided priors also revealed differences in the kinds of questions generated, underscoring how the choice of prior shapes the model's inductive biases.
Theoretical and Practical Implications
Analyzing human-like question generation through the lens of LIPS offers several theoretical insights into how language and computational reasoning intertwine in information-seeking behaviors. Practically, this research could inform the development of AI systems capable of engaging in more human-like dialogues, particularly in educational, gaming, and interactive information retrieval systems.
Future Directions
The paper points towards several potential paths for future research, including exploring more sophisticated inference techniques for question selection and investigating the integration of multimodal data sources to enhance grounding capabilities. Additionally, applying the LIPS framework to more complex, multi-turn interaction contexts could further unveil the nuances of human question-asking strategies.
Conclusion
This paper contributes to our understanding of question generation in grounded tasks by introducing and evaluating the LIPS framework. The results underscore the potential of combining Bayesian models with LLMs to closely model human-like question generation, while also highlighting areas for future improvement and exploration.