Systematic Exploration of Admissible Hypothesis Sets by LLMs under Underdetermination
Determine whether large language models can systematically explore the full set of admissible explanatory hypotheses, i.e., all hypotheses consistent with a fixed set of observations, in controlled underdetermination settings. The test is whether a model generates multiple valid, non-redundant hypotheses rather than converging on a single answer or a small subset.
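To make "admissible set", "valid", and "non-redundant" concrete, here is a minimal sketch on a toy problem that is an assumption of this example, not a task from the source: a hidden Boolean function of two inputs, observed on only some input rows. The admissible set is every truth table that agrees with the observations. The function names (`admissible_set`, `score`) and the three scores (`valid_fraction`, `distinct_fraction`, `coverage`) are hypothetical illustrations and should not be read as the metrics defined in the cited paper.

```python
from itertools import product

# All input rows of a two-input Boolean function (toy setting, assumed here).
INPUTS = list(product([0, 1], repeat=2))  # (0,0), (0,1), (1,0), (1,1)


def admissible_set(observations):
    """Enumerate every truth table that agrees with the observations.

    A truth table is a tuple of outputs, one per row of INPUTS.
    """
    tables = []
    for outputs in product([0, 1], repeat=len(INPUTS)):
        table = dict(zip(INPUTS, outputs))
        if all(table[x] == y for x, y in observations.items()):
            tables.append(outputs)
    return tables


def score(proposals, observations):
    """Score a list of proposed truth tables (hypothetical metrics)."""
    admissible = set(admissible_set(observations))
    valid = [p for p in proposals if p in admissible]
    n = len(proposals)
    return {
        # Share of proposals consistent with the observations.
        "valid_fraction": len(valid) / n if n else 0.0,
        # Share of proposals that are not repeats of one another.
        "distinct_fraction": len(set(proposals)) / n if n else 0.0,
        # Share of the admissible set reached by distinct valid proposals.
        "coverage": len(set(valid)) / len(admissible),
    }


# Two of four rows are observed, so four truth tables remain admissible.
observations = {(0, 0): 0, (1, 1): 1}

# Stand-in for model outputs: one duplicate and one inconsistent proposal.
proposals = [(0, 0, 0, 1), (0, 0, 0, 1), (0, 1, 1, 1), (1, 0, 0, 1)]

result = score(proposals, observations)
print(result)
```

In this toy run, a sampler that keeps returning the same consistent table would score well on `valid_fraction` but poorly on `coverage`, which is the gap the problem statement is concerned with.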
References
Yet contemporary LLM evaluations largely reward one-shot correctness~\citep{shojaee2025llm,koblischke2025gravity,shojaee2024llm,wang2024mmlu,coignion2024performance,hendrycks2020measuring}, leaving open whether models can systematically explore sets of valid explanations under controlled underdetermination.
— HypoSpace: Evaluating LLM Creativity as Set-Valued Hypothesis Generators under Underdetermination
(2510.15614 - Chen et al., 17 Oct 2025) in Section 1 (Introduction), Page 1