Forecasting the Persistence of LLM Failures on MAPF

Determine how long large language models will continue to fail on challenging multi-agent path finding (MAPF) scenarios, given the failure modes observed on room and maze maps and the rapid evolution of model capabilities.

Background

Empirical results from Chen et al. (2024) show that LLMs succeed in simple, obstacle-free environments but fail on more complex maps, often because of oscillations and long detours. Despite rapid LLM progress, the authors emphasize that it is uncertain when such failures will abate, which raises the question of how long current and near-future models will remain inadequate for MAPF.

This open question invites systematic tracking and analysis of capability shifts in LLM generations and the identification of metrics and benchmarks that can signal when core MAPF failure modes have been overcome.
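As one illustration of what such metrics might look like, the sketch below scores a single agent's trajectory for the two failure modes named above: oscillation and long detours. It is a minimal example under stated assumptions, not a metric taken from the cited paper. The function names, the A-B-A definition of an oscillation, the 4-connected grid encoding (1 marks an obstacle), and the detour ratio (moves taken divided by the BFS shortest-path length) are all hypothetical choices made for illustration.

```python
from collections import deque


def count_oscillations(path):
    """Count A-B-A patterns: the agent steps back to the cell it just left.

    `path` is a list of (row, col) cells, one per timestep (assumed format).
    """
    return sum(
        1
        for t in range(2, len(path))
        if path[t] == path[t - 2] and path[t] != path[t - 1]
    )


def shortest_path_length(grid, start, goal):
    """Breadth-first search on a 4-connected grid; grid[r][c] == 1 is an obstacle.

    Returns the number of moves, or None if the goal is unreachable.
    """
    rows, cols = len(grid), len(grid[0])
    queue = deque([(start, 0)])
    seen = {start}
    while queue:
        (r, c), dist = queue.popleft()
        if (r, c) == goal:
            return dist
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nxt = (r + dr, c + dc)
            if (
                0 <= nxt[0] < rows
                and 0 <= nxt[1] < cols
                and grid[nxt[0]][nxt[1]] == 0
                and nxt not in seen
            ):
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None


def detour_ratio(grid, path):
    """Moves actually taken divided by the shortest possible number of moves.

    Waiting in place is not counted as a move. Returns None when the ratio is
    undefined (start equals end, or the end cell is unreachable).
    """
    moves = sum(1 for a, b in zip(path, path[1:]) if a != b)
    optimal = shortest_path_length(grid, path[0], path[-1])
    if not optimal:
        return None
    return moves / optimal


# Hypothetical example: an open 3x3 map and a path that wavers before advancing.
open_map = [[0, 0, 0], [0, 0, 0], [0, 0, 0]]
wavering_path = [(0, 0), (0, 1), (0, 0), (0, 1), (0, 2)]
oscillations = count_oscillations(wavering_path)
ratio = detour_ratio(open_map, wavering_path)
```

Logged per model generation and per map type (for example empty, room, and maze), scores like these could be compared over time. An oscillation count near zero and a detour ratio near 1 on the harder maps would be one possible signal that these failure modes are receding.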

References

Because LLMs are evolving rapidly, it is unclear how long LLMs will still fail.

— Why Solving Multi-agent Path Finding with Large Language Model has not Succeeded Yet (2401.03630 - Chen et al., 8 Jan 2024) in Section 4 Cause of Failures
