
The Societal Impact of Foundation Models: Advancing Evidence-based AI Policy (2506.23123v1)

Published 29 Jun 2025 in cs.AI, cs.CY, and cs.ET

Abstract: Artificial intelligence is humanity's most promising technology because of the remarkable capabilities offered by foundation models. Yet, the same technology brings confusion and consternation: foundation models are poorly understood and they may precipitate a wide array of harms. This dissertation explains how technology and society coevolve in the age of AI, organized around three themes. First, the conceptual framing: the capabilities, risks, and the supply chain that grounds foundation models in the broader economy. Second, the empirical insights that enrich the conceptual foundations: transparency created via evaluations at the model level and indexes at the organization level. Finally, the transition from understanding to action: superior understanding of the societal impact of foundation models advances evidence-based AI policy. Viewed together, this dissertation makes inroads into achieving better societal outcomes in the age of AI by building the scientific foundations and research-policy interface required for better AI governance.

Summary

  • The paper establishes foundation models as a distinct paradigm by linking emergent capabilities with algorithmic homogeneity and associated risks.
  • The paper introduces the HELM evaluation platform and FMTI composite index to assess performance, transparency, fairness, and systemic opacities.
  • The paper demonstrates how empirical methods inform evidence-based AI policy, influencing key US, EU, and G7 regulatory initiatives.

The Societal Impact of Foundation Models: Advancing Evidence-based AI Policy

This dissertation provides a comprehensive, multi-layered analysis of the societal impact of foundation models, with a particular focus on the intersection of technical developments, empirical transparency, and the evolution of AI policy. The work is organized around three central themes: conceptual framing of foundation models, empirical methods for transparency and evaluation, and the translation of research insights into evidence-based policy.

Conceptual Framing: Foundation Models as a Technological Paradigm

The dissertation positions foundation models as a distinct technological paradigm, characterized by the dual axes of emergence and homogeneity. Emergence refers to the appearance of qualitatively new capabilities as models are scaled, while homogeneity denotes the consolidation of methods and architectures, leading to algorithmic monoculture. This framing is supported by a historical analysis of AI paradigms, tracing the evolution from classical AI to machine learning, deep learning, and finally to foundation models.

The author’s definition of foundation models—large-scale, self-supervised models trained on broad data and adaptable to a wide range of downstream tasks—has been widely adopted in both academic and policy contexts. The work highlights the risks associated with the paradigm, including concentration of power, systemic single points of failure, and the unpredictable emergence of new capabilities and risks.

Empirical Transparency: Evaluation Platforms and Composite Indexes

A major contribution of the dissertation is the development of empirical methods to increase transparency in the foundation model ecosystem. Two complementary approaches are advanced:

  1. Model-level Evaluation Platforms (HELM): The Holistic Evaluation of Language Models (HELM) platform is introduced as the first large-scale, third-party evaluation suite for foundation models. HELM systematically evaluates models across a broad set of scenarios and metrics, including accuracy, robustness, fairness, calibration, efficiency, and generative harms. The platform’s design emphasizes coverage, context-sensitivity, and standardization, enabling meaningful comparison across models and surfacing trade-offs between different performance dimensions.

Key findings from HELM include:

  • Strong correlation between accuracy, robustness, and fairness, but weak or scenario-dependent relationships with calibration and generative harms.
  • Significant variance in model performance across tasks and adaptation strategies, highlighting the brittleness of prompt-based adaptation.
  • Open models, while generally less performant than closed models on certain benchmarks, are competitive in many scenarios and offer greater transparency.
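The HELM approach described above can be sketched as a minimal evaluation harness: every model is run against every scenario, and each output is scored under multiple metrics, yielding a (model, scenario, metric) grid that surfaces trade-offs. The scenario, model, and metric names below are illustrative placeholders, not HELM's actual API.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Scenario:
    """A named set of prompts with gold-standard references."""
    name: str
    prompts: list[str]
    references: list[str]

def exact_match(prediction: str, reference: str) -> float:
    # One of many possible metrics; HELM uses far richer ones.
    return 1.0 if prediction.strip() == reference.strip() else 0.0

def evaluate(models: dict[str, Callable[[str], str]],
             scenarios: list[Scenario],
             metrics: dict[str, Callable[[str, str], float]]) -> dict:
    """Mean score for every (model, scenario, metric) cell."""
    results = {}
    for model_name, model in models.items():
        for sc in scenarios:
            preds = [model(p) for p in sc.prompts]
            for metric_name, metric in metrics.items():
                scores = [metric(p, r) for p, r in zip(preds, sc.references)]
                results[(model_name, sc.name, metric_name)] = sum(scores) / len(scores)
    return results

# Toy usage: a constant "model" on a single QA-style scenario.
toy = Scenario("toy_qa", prompts=["2+2=?"], references=["4"])
res = evaluate({"const": lambda p: "4"}, [toy], {"exact_match": exact_match})
print(res[("const", "toy_qa", "exact_match")])  # 1.0
```

The key design point mirrored here is standardization: because every model passes through the same scenarios and metrics, cells in the resulting grid are directly comparable.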

  2. Organization-level Composite Indexes (FMTI): The Foundation Model Transparency Index (FMTI) is introduced as a novel application of composite indices to AI, inspired by methodologies from economics and social science. FMTI scores major foundation model developers on 100 indicators spanning upstream (data, labor, compute), model-level (capabilities, risks, mitigations), and downstream (distribution, usage policy, impact) transparency.

The 2023 and 2024 FMTI results reveal:

  • Systemic opacity across the industry, especially regarding upstream data, labor, and compute.
  • Open model developers consistently outperform closed developers on transparency, particularly in upstream domains.
  • Most transparency indicators are feasible, as evidenced by at least one developer achieving each, suggesting that industry-wide improvement is attainable.
  • The process of index-based assessment itself catalyzes new disclosures and improved practices among developers.
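A composite index of this kind reduces, at its core, to scoring binary indicators grouped into domains. The sketch below illustrates that aggregation logic with made-up indicator names and only a handful of indicators per domain; FMTI's actual 100 indicators and domain definitions differ.

```python
# Hypothetical indicators grouped into FMTI-style domains.
INDICATORS = {
    "upstream": ["data_sources_disclosed", "labor_practices_disclosed", "compute_disclosed"],
    "model": ["capabilities_reported", "risks_reported", "mitigations_reported"],
    "downstream": ["distribution_channels", "usage_policy", "impact_reporting"],
}

def transparency_score(disclosures: set[str]) -> dict[str, float]:
    """Per-domain and overall scores in [0, 1]: fraction of indicators satisfied."""
    scores = {}
    satisfied_total = indicator_total = 0
    for domain, indicators in INDICATORS.items():
        satisfied = sum(1 for i in indicators if i in disclosures)
        scores[domain] = satisfied / len(indicators)
        satisfied_total += satisfied
        indicator_total += len(indicators)
    scores["overall"] = satisfied_total / indicator_total
    return scores

# A developer disclosing one indicator per domain scores 1/3 overall.
dev = {"data_sources_disclosed", "capabilities_reported", "usage_policy"}
print(transparency_score(dev))
```

Because each indicator is independently verifiable, the same scoring function can be rerun year over year, which is what makes the index a tool for tracking (and incentivizing) disclosure over time.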

Policy Interface: Evidence-based AI Policy

The dissertation advances a vision for evidence-based AI policy, arguing for a research-policy interface that is both scientifically rigorous and responsive to societal needs. The author demonstrates how systematic evidence review and the development of bespoke empirical tools can directly inform policy decisions, such as legislative processes in the US and EU.

Notable policy impacts include:

  • Uptake of the foundation model terminology and conceptual framing in major US and EU policy instruments (e.g., Executive Order 14110, EU AI Act).
  • Influence of HELM and FMTI on regulatory initiatives, including the UK Competition and Markets Authority’s market surveillance and the G7 International Code of Conduct.
  • Direct engagement with policymakers to clarify the implications of emergent capabilities, algorithmic monoculture, and supply chain transparency.

The work also critically examines the limitations of transparency as a policy tool, noting risks of transparency-washing, gamification, and the need for complementary mechanisms such as independent audits and post-deployment monitoring.

Implications and Future Directions

Practical Implications:

  • The empirical tools and frameworks developed are directly applicable for organizations seeking to benchmark, audit, or improve the transparency and accountability of their AI systems.
  • The composite index methodology provides a scalable template for tracking and incentivizing responsible practices across the AI industry.
  • The evaluation platform approach can be extended to new modalities, languages, and deployment contexts, supporting both internal governance and external regulatory compliance.

Theoretical Implications:

  • The paradigm analysis foregrounds the need for sociotechnical approaches in AI research, integrating technical, organizational, and policy perspectives.
  • The work challenges the field to move beyond narrow, task-centric evaluation toward holistic, ecosystem-level analysis.
  • The findings on emergence and homogeneity raise open questions about the predictability of AI capabilities and the systemic risks of monoculture.

Speculation on Future Developments:

  • As foundation models become further entrenched as public infrastructure, the demand for independent, proactive evaluation and transparency will intensify.
  • The interplay between open and closed development models will continue to shape the competitive and regulatory landscape, with open models driving transparency but also raising new challenges for monitoring and control.
  • The integration of empirical evidence into policy processes will become increasingly institutionalized, with composite indices and evaluation platforms serving as standard tools for governance.

Conclusion

This dissertation establishes a robust foundation for understanding and governing the societal impact of foundation models. By bridging conceptual analysis, empirical transparency, and policy engagement, it offers a model for how AI research can meaningfully inform public outcomes. The work demonstrates that academia, through multidisciplinary collaboration and proactive engagement, can play a central role in shaping the trajectory of AI for the public good.
