Identify inputs and procedures used to train frontier foundation models

Identify and document the training datasets, preprocessing, and training procedures employed in frontier foundation models developed by private laboratories, to enable independent evaluation, replication, and governance of such systems.

Background

The authors note that because commercial AI labs train frontier models in secrecy, external stakeholders do not know the specific inputs and processes used. This opacity undermines efforts to evaluate safety and performance and complicates the design of appropriate regulatory regimes, which makes transparency about training inputs a key unresolved need.

References

And because frontier models are now trained with considerable secrecy within private labs, we don't even know what goes into such models.

— An Economy of AI Agents (2509.01063 - Hadfield et al., 1 Sep 2025) in Institutions for AI agents, Subsection “Rethinking the legal boundaries of the corporation”
