Benchmarks to quantify cost–accuracy frontiers for surrogate training
Develop open benchmarks that pair instrumented corpora with surrogate models to quantify when training surrogates amortizes the generation cost of instrumented data and to characterize the resulting cost–accuracy trade-offs.
References
Nine open questions will determine whether instrumented data matures into a recognised substrate for scientific machine learning. Cost--accuracy frontiers. Open benchmarks pairing instrumented corpora with surrogates are needed to quantify when surrogate training amortises the substrate's cost.
— Instrumented data for causal scientific machine learning
(2606.07865 - Wilke, 5 Jun 2026) in Section 7, Methodological questions for the community, Item 5