Does the time-budget performance trend extend to simpler tasks?

Determine whether the reduced performance gains from increased time budgets observed for complex AI R&D tasks also hold for simpler, long-running tasks, in order to isolate and assess long-term coherence capabilities independently of task complexity.

References

METR's investigation focused on very complex tasks (specifically, AI R&D), but it is not clear if this trend holds for more simple tasks.

Vending-Bench: A Benchmark for Long-Term Coherence of Autonomous Agents (Backlund et al., 20 Feb 2025) in Introduction