Does the time-budget performance trend extend to simpler tasks?

Determine whether the diminishing performance gains from larger time budgets, observed for complex AI R&D tasks, also hold for simpler, long-running tasks, so that long-term coherence can be isolated and assessed independently of task complexity.

References

METR's investigation focused on very complex tasks (specifically, AI R&D), but it is not clear if this trend holds for more simple tasks.

— Vending-Bench: A Benchmark for Long-Term Coherence of Autonomous Agents (Backlund et al., 20 Feb 2025) in Introduction

