Human Effort and Resource Expenditure in Kaggle Competitions Underlying MLE-Bench Comparisons

Determine the amount of time and computational resources expended by human participants on the Kaggle competitions from which MLE-Bench tasks are sourced, to enable cost- and resource-matched comparisons between AI systems and human performance.

Background

The authors note that MLE-Bench reports metrics tied to percentiles of human performance on Kaggle competitions, but cost-matched comparisons with AI agents are not possible because the time and resources that human participants spent are unknown. Teams often work for months with significant compute resources, which further complicates fair comparison.

Quantifying human time and resources would allow resource-matched benchmarking and a clearer interpretation of agent results relative to the human effort invested in these competitions.

References

A more significant issue is that the amount of time and resources that are spent by humans on the competitions is unknown, so it's impossible to compare AI systems with the same amount of resources.

HCAST: Human-Calibrated Autonomy Software Tasks (2503.17354 - Rein et al., 21 Mar 2025) in Appendix, Section Related Work