Empirically estimate data’s contribution in AI production functions
Establish the functional form by which data enters AI production (e.g., Cobb–Douglas or CES), derive the marginal product of data and its elasticities of substitution with compute and labor, and estimate these parameters empirically using firm-level training datasets, model performance, and revenue outcomes, complemented by controlled and natural experiments where firm-level data are unavailable.
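The estimation step can be illustrated with a minimal sketch. It is not the agenda's method, and it rests on stated assumptions. It assumes a three-input Cobb–Douglas technology Y = A·D^α·C^β·L^γ (data D, compute C, labor L), generates synthetic, hypothetical firm data, and recovers the output elasticities α, β, γ by OLS on logs. Under this form the marginal product of data is ∂Y/∂D = αY/D, and the elasticity of substitution between any two inputs is fixed at 1. The CES helper shows the marginal product when that elasticity, 1/(1−ρ), is left free. Estimating ρ would require nonlinear least squares, which is not shown. All function names, scales, and parameter values are hypothetical.

```python
import math
import random


def solve_linear(a, b):
    """Solve the square system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix [a | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def ols(rows, y):
    """Ordinary least squares via the normal equations (X'X) beta = X'y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve_linear(xtx, xty)


def cobb_douglas_output(log_a, alpha, beta, gamma, data, comp, lab):
    """Y = A * D^alpha * C^beta * L^gamma (hypothetical three-input technology)."""
    return math.exp(log_a) * data**alpha * comp**beta * lab**gamma


def simulate_firms(n, log_a, alpha, beta, gamma, noise_sd, seed=0):
    """Synthetic (data, compute, labor, output) tuples; NOT real firm data."""
    rng = random.Random(seed)
    firms = []
    for _ in range(n):
        data = math.exp(rng.gauss(10, 1))   # e.g. training tokens, arbitrary scale
        comp = math.exp(rng.gauss(5, 1))    # compute, arbitrary scale
        lab = math.exp(rng.gauss(3, 0.5))   # labor, arbitrary scale
        y = cobb_douglas_output(log_a, alpha, beta, gamma, data, comp, lab)
        firms.append((data, comp, lab, y * math.exp(rng.gauss(0, noise_sd))))
    return firms


def estimate_cobb_douglas(firms):
    """Regress log Y on [1, log D, log C, log L]; returns (log_A, alpha, beta, gamma)."""
    rows = [[1.0, math.log(d), math.log(c), math.log(l)] for d, c, l, _ in firms]
    y = [math.log(out) for _, _, _, out in firms]
    return tuple(ols(rows, y))


def marginal_product_of_data(log_a, alpha, beta, gamma, data, comp, lab):
    """Cobb-Douglas: dY/dD = alpha * Y / D."""
    return alpha * cobb_douglas_output(log_a, alpha, beta, gamma, data, comp, lab) / data


def ces_output(a, delta_d, delta_c, delta_l, rho, nu, data, comp, lab):
    """CES: Y = A * (dD*D^rho + dC*C^rho + dL*L^rho)^(nu/rho), rho != 0."""
    s = delta_d * data**rho + delta_c * comp**rho + delta_l * lab**rho
    return a * s ** (nu / rho)


def ces_marginal_product_of_data(a, delta_d, delta_c, delta_l, rho, nu, data, comp, lab):
    """CES: dY/dD = nu * A * dD * D^(rho-1) * S^(nu/rho - 1); sigma = 1/(1 - rho)."""
    s = delta_d * data**rho + delta_c * comp**rho + delta_l * lab**rho
    return nu * a * delta_d * data ** (rho - 1) * s ** (nu / rho - 1)


firms = simulate_firms(2000, log_a=0.5, alpha=0.3, beta=0.5, gamma=0.2, noise_sd=0.1)
log_a, alpha, beta, gamma = estimate_cobb_douglas(firms)
print(f"output elasticities: data={alpha:.3f}, compute={beta:.3f}, labor={gamma:.3f}")
print(f"returns to scale: {alpha + beta + gamma:.3f}")
mp_d = marginal_product_of_data(log_a, alpha, beta, gamma, math.exp(10), math.exp(5), math.exp(3))
print(f"marginal product of data at the median firm: {mp_d:.3e}")
```

With real data, the synthetic generator would be replaced by observed firm-level inputs and outcomes. OLS on observational data is then exposed to the standard simultaneity problem in production-function estimation, since firms choose inputs knowing their own productivity. Experimental variation in the data input is one way around it.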
References
Building on these foundations, we outline four open research problems foundational to data economics: measuring context-dependent value, balancing governance with privacy, estimating data's contribution to production, and designing mechanisms for heterogeneous, compositional goods.
— The Economics of AI Training Data: A Research Agenda
(2510.24990 - Oderinwale et al., 28 Oct 2025) in Abstract