Forecasting the Persistence of LLM Failures on MAPF
Determine how long large language models (LLMs) will continue to fail on challenging multi-agent path finding (MAPF) scenarios, given the failure modes observed on room and maze maps and the rapid evolution of model capabilities.
References
Because LLMs are evolving rapidly, it is unclear how long LLMs will still fail.
— Why Solving Multi-agent Path Finding with Large Language Model has not Succeeded Yet
(2401.03630, Chen et al., 8 Jan 2024), Section 4, "Cause of Failures"