
Empirically estimate data’s contribution in AI production functions

Establish the functional form by which data enters AI production (e.g., Cobb–Douglas or CES), and derive the marginal product of data and its elasticities with compute and labor. Then estimate these parameters empirically using firm-level training datasets, model performance, and revenue outcomes, complemented by controlled and natural experiments when firm-level data is unavailable.
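As an illustration of the two candidate forms named above, the following is a textbook sketch, not a specification taken from the paper. The symbols are assumptions introduced here: $Y$ is output (e.g., model performance or revenue), $C$ compute, $L$ labor, $D$ data, and $A$ a productivity term.

```latex
% Cobb-Douglas: output elasticities are the constant exponents.
Y = A\, C^{\alpha} L^{\beta} D^{\gamma},
\qquad
\frac{\partial Y}{\partial D} = \gamma\,\frac{Y}{D},
\qquad
\frac{\partial \ln Y}{\partial \ln D} = \gamma .
% Returns to scale are \alpha + \beta + \gamma; the elasticity of
% substitution between any pair of inputs is fixed at 1.

% CES with share weights a_C, a_L, a_D > 0, scale parameter \nu > 0,
% and substitution parameter \rho \le 1, \rho \ne 0:
Y = A \left( a_C C^{\rho} + a_L L^{\rho} + a_D D^{\rho} \right)^{\nu/\rho},
\qquad
\frac{\partial Y}{\partial D}
  = \frac{\nu\, a_D D^{\rho-1}\, Y}{a_C C^{\rho} + a_L L^{\rho} + a_D D^{\rho}},
\qquad
\sigma = \frac{1}{1-\rho} .
% Returns to scale are \nu; \sigma is the common elasticity of substitution,
% and the Cobb-Douglas case is recovered in the limit \rho \to 0.
```

Under Cobb–Douglas, the complementarity between data and the other inputs is fixed by assumption, whereas CES leaves it as the estimable parameter $\rho$. This is one reason the choice of functional form matters for the empirical exercise.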


Background

The paper argues that data should be modeled explicitly as a factor in production functions, distinct from capital, labor, and technology because of its nonrivalry and the emergent rivalry that arises via contamination. However, it does not specify the functional form, and it emphasizes the need to estimate marginal products and elasticities empirically.

Section 6 of the paper proposes combining controlled experiments, natural experiments, and evidence on firms’ data investments to isolate causal effects and quantify returns to scale and complementarities.
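To make the controlled-experiment idea concrete, here is a minimal sketch in Python of how the elasticities might be recovered if a Cobb–Douglas form were assumed. Everything in it is hypothetical: the data are synthetic, the "true" parameter values are chosen arbitrarily, and the function names `simulate_experiment` and `estimate_cobb_douglas` are introduced only for this example. Inputs are drawn independently of the noise, which mimics random assignment; with observational firm data, input choices are endogenous and plain OLS would not identify causal effects.

```python
import numpy as np

def simulate_experiment(n, alpha, beta, gamma, noise_sd, seed=0):
    """Generate synthetic log inputs and log output from an assumed
    Cobb-Douglas technology: ln Y = ln A + a ln C + b ln L + g ln D + e."""
    rng = np.random.default_rng(seed)
    log_c = rng.normal(0.0, 1.0, n)   # log compute (randomly assigned)
    log_l = rng.normal(0.0, 1.0, n)   # log labor (randomly assigned)
    log_d = rng.normal(0.0, 1.0, n)   # log data (randomly assigned)
    log_a = 0.5                       # arbitrary log productivity level
    noise = rng.normal(0.0, noise_sd, n)
    log_y = log_a + alpha * log_c + beta * log_l + gamma * log_d + noise
    return log_c, log_l, log_d, log_y

def estimate_cobb_douglas(log_c, log_l, log_d, log_y):
    """Ordinary least squares on the log-linearized production function."""
    X = np.column_stack([np.ones_like(log_y), log_c, log_l, log_d])
    coef, *_ = np.linalg.lstsq(X, log_y, rcond=None)
    return {
        "log_A": coef[0],
        "alpha": coef[1],                  # output elasticity of compute
        "beta": coef[2],                   # output elasticity of labor
        "gamma": coef[3],                  # output elasticity of data
        "returns_to_scale": coef[1:].sum(),
    }

# Hypothetical "true" elasticities used only to generate the synthetic sample.
log_c, log_l, log_d, log_y = simulate_experiment(
    n=5000, alpha=0.4, beta=0.2, gamma=0.3, noise_sd=0.1
)
est = estimate_cobb_douglas(log_c, log_l, log_d, log_y)
print({k: round(float(v), 3) for k, v in est.items()})
```

In this setup the estimated `gamma` is the output elasticity of data, and the sum of the three exponents measures returns to scale. Testing for complementarities beyond what Cobb–Douglas imposes would require a more flexible form, such as CES or a translog with interaction terms.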

References

Building on these foundations, we outline four open research problems foundational to data economics: measuring context-dependent value, balancing governance with privacy, estimating data's contribution to production, and designing mechanisms for heterogeneous, compositional goods.

The Economics of AI Training Data: A Research Agenda (2510.24990 - Oderinwale et al., 28 Oct 2025) in Abstract