Reliability of Agentic LLMs in Physics-Governed Planning Domains
Determine whether current agentic large language model (LLM) systems can reliably operate in complex, real-world planning domains governed by physical laws, and establish how robust and effective they remain under strict physical constraints and long-horizon decision-making requirements.
References
Consequently, it remains unclear whether current agentic systems can reliably operate in complex real-world planning domains governed by physical laws.
— AstroReason-Bench: Evaluating Unified Agentic Planning across Heterogeneous Space Planning Problems
(2601.11354 - Wang et al., 16 Jan 2026) in Section 1 (Introduction)