Causal Effect of Agentic Task Artifacts on Difficulty
Establish whether agentic task artifacts (specifically repository state, test patches, and solution patches) causally influence task difficulty in agentic coding benchmarks, as opposed to merely revealing latent information already present in the problem statement. If they do, determine the direction and magnitude of the effect, for example by constructing counterfactual tasks that vary one artifact property while holding the problem statement fixed.
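One way such a counterfactual comparison could be analyzed is as a paired design: each problem statement is run once with its original artifacts and once with a single artifact property changed, and the per-task difference in solve rate is averaged. The sketch below is only an illustration under those assumptions. The function name `paired_effect`, the solve-rate inputs, and the choice of a percentile bootstrap are all hypothetical and are not taken from the source, which does not specify an estimator.

```python
import random
from statistics import mean


def paired_effect(base_outcomes, variant_outcomes, n_boot=2000, seed=0):
    """Mean paired difference in solve rate (variant minus base) with a
    95% percentile-bootstrap interval.

    base_outcomes[i] and variant_outcomes[i] are solve rates in [0, 1] for
    the SAME problem statement i, run with the original artifacts and with
    one artifact property changed (hypothetical inputs, assumed here).
    """
    if not base_outcomes or len(base_outcomes) != len(variant_outcomes):
        raise ValueError("need two non-empty sequences of equal length")

    # Pairing by problem statement holds the statement fixed, so each
    # difference reflects only the artifact change (plus run-to-run noise).
    diffs = [v - b for b, v in zip(base_outcomes, variant_outcomes)]

    # Resample tasks with replacement to approximate the sampling
    # variability of the mean difference.
    rng = random.Random(seed)
    n = len(diffs)
    boots = sorted(mean(rng.choices(diffs, k=n)) for _ in range(n_boot))
    lower = boots[int(0.025 * n_boot)]
    upper = boots[int(0.975 * n_boot) - 1]
    return mean(diffs), (lower, upper)


if __name__ == "__main__":
    # Made-up numbers for illustration only, not experimental results.
    base = [0.5, 0.5, 0.5]
    variant = [0.25, 0.25, 0.25]
    print(paired_effect(base, variant))
```

A negative estimate would suggest that the artifact change lowers solve rates, and a positive one that it raises them. That reading is causal only if the variant differs from the base task in the one varied property and the agent, budget, and evaluation harness are otherwise identical.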
References
However, because our experiments relied on predictive modeling, we cannot conclude that agentic task artifacts have a causal effect on difficulty: we cannot distinguish whether they expose latent information that is already present in the problem statement or whether aspects of the artifacts, such as the thoroughness of the test patch, inherently generate difficulty. A potential avenue for future work is to investigate a causal relationship by constructing counterfactual tasks in which one aspect of these artifacts is varied.