Foundation Model Transparency Index
- The Foundation Model Transparency Index (FMTI) is a standardized framework that quantifies transparency across the entire lifecycle of AI foundation models using 100 indicators.
- It evaluates aspects from data sourcing and model development to governance, enabling side-by-side benchmarking for accountability and regulatory review.
- Empirical findings show that the FMTI spurs improvements in disclosure practices, informing policy interventions and strengthening societal accountability.
The Foundation Model Transparency Index (FMTI) is a standardized, indicator-based framework designed to measure, benchmark, and catalyze transparency within the rapidly advancing ecosystem of foundation models. Developed in response to societal, regulatory, and scientific demands for greater accountability in artificial intelligence, the FMTI transforms transparency from an abstract ideal into a quantifiable, actionable, and policy-ready property. It enables systematic evaluation of how openly foundation model developers disclose information about the entire supply chain and lifecycle of their AI systems, thereby supporting evidence-based AI governance and improved societal outcomes.
1. Origin, Rationale, and Scope
Initiated by the Center for Research on Foundation Models at Stanford, the FMTI was designed to address growing concerns over the opacity of foundation models and their developmental pipelines. Foundation models—large-scale, general-purpose AI models underpinning a wide array of applications—are often constructed using multi-terabyte, heterogeneous data, enormous compute resources, and distributed labor. The lack of visibility into these processes creates risks at both technical and societal levels, including the undocumented propagation of harms, imbalanced market power, and insufficient means for external audit or regulatory oversight (2310.12941, 2506.23123).
FMTI employs a composite, multi-dimensional approach, evaluating not just the end artifacts (models themselves), but also upstream resources (data, compute, labor), model characteristics and documentation, and downstream use, deployment, and impact. Its explicit aim is to provide a standardized, public measure enabling side-by-side comparison, cross-temporal tracking, and policy intervention.
2. Architecture: Indicators, Domains, and Scoring
The index comprises 100 fine-grained transparency indicators, structured hierarchically by domain and subdomain. The taxonomy reflects the end-to-end AI supply chain and associated organizational practices (an illustrative code sketch of this structure follows the table):
| Domain | Example Indicators |
|---|---|
| Upstream Resources | Data sources, selection criteria, compute hardware, energy |
| Model Details | Model architecture, size, training process, documentation |
| Downstream/Deployment | Usage policies, distribution channels, impact, feedback |
| Governance | Safety audits, recourse mechanisms, evaluation procedures |
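As an illustration, the hierarchical taxonomy above can be represented as a simple nested data structure. The domain and example-indicator names below follow the table; the Python structure itself is an illustrative assumption, not an official FMTI schema.

```python
# Illustrative (non-official) representation of the FMTI indicator taxonomy.
# Domain and example-indicator names follow the table above; the schema is assumed.
FMTI_TAXONOMY = {
    "Upstream Resources": ["data sources", "selection criteria", "compute hardware", "energy"],
    "Model Details": ["model architecture", "size", "training process", "documentation"],
    "Downstream/Deployment": ["usage policies", "distribution channels", "impact", "feedback"],
    "Governance": ["safety audits", "recourse mechanisms", "evaluation procedures"],
}
```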
Each indicator is scored for a developer’s flagship model (e.g., GPT-4, PaLM 2, Llama 2):
- 0: No disclosure
- 0.5: Partial or vague disclosure
- 1: Complete, high-quality public disclosure
Overall and domain scores are computed via simple averaging or weighted sums. For a developer $d$, the overall score is

$$
T_d = \frac{1}{N} \sum_{i=1}^{N} s_{d,i},
$$

where $N$ is the number of indicators and $s_{d,i}$ is the score on indicator $i$ for developer $d$. The formula can be extended with per-indicator weights $w_i$, giving $T_d = \sum_{i=1}^{N} w_i\, s_{d,i} \big/ \sum_{i=1}^{N} w_i$.
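A minimal sketch of this aggregation, assuming per-indicator scores in {0, 0.5, 1} as described above; the function and variable names are illustrative rather than part of the FMTI specification.

```python
from typing import Mapping, Optional

def fmti_score(scores: Mapping[str, float],
               weights: Optional[Mapping[str, float]] = None) -> float:
    """Aggregate per-indicator scores (0, 0.5, or 1) into an overall score.

    Without weights this is the simple average T_d = (1/N) * sum_i s_{d,i};
    with weights it is the normalized weighted sum.
    """
    if not scores:
        raise ValueError("at least one indicator score is required")
    if weights is None:
        return sum(scores.values()) / len(scores)
    total_weight = sum(weights[i] for i in scores)
    return sum(weights[i] * s for i, s in scores.items()) / total_weight

# Example: three indicators for a hypothetical developer.
example = {"data sources": 1.0, "compute hardware": 0.5, "safety audits": 0.0}
overall = fmti_score(example)  # (1.0 + 0.5 + 0.0) / 3 = 0.5
```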
3. Empirical Findings and Evolution (v1.0 to v1.1)
The inaugural FMTI in 2023 (v1.0) assessed 10 major foundation model developers, revealing that transparency was low and uneven: no developer scored above 65 out of 100, with a median of 37. Systemic opacity was especially acute in areas such as upstream data, compute usage, and downstream societal impact (2407.12929).
The v1.1 iteration six months later documented substantial improvement: average transparency increased to 58/100, with many of the new disclosures spurred by the index itself and the competitive and peer pressure it generated. Developers published new transparency reports that directly reference FMTI indicators. However, persistent opacity remained in sensitive areas: data composition, licensing, data labor practices, model architecture, and actual real-world impact (2407.12929).
Notably, the process of repeated, indicator-level measurement enabled both longitudinal tracking (temporal analysis of progress) and targeted intervention—regulators and advocacy groups can point to explicit gaps to direct future policy or transparency demands.
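For instance, indicator-level results from two index releases can be diffed to show exactly where a developer improved and which gaps persist; the scores and indicator names below are hypothetical, chosen only to illustrate the mechanics of longitudinal tracking.

```python
# Hypothetical indicator-level scores for one developer across two FMTI releases.
v1_0 = {"data sources": 0.0, "model architecture": 0.5, "usage policies": 1.0}
v1_1 = {"data sources": 0.5, "model architecture": 0.5, "usage policies": 1.0}

# Longitudinal diff: which indicators improved, and which remain gaps.
improved = {k: (v1_0[k], v1_1[k]) for k in v1_0 if v1_1[k] > v1_0[k]}
persistent_gaps = sorted(k for k, score in v1_1.items() if score < 1.0)

print("improved:", improved)                 # {'data sources': (0.0, 0.5)}
print("persistent gaps:", persistent_gaps)   # ['data sources', 'model architecture']
```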
4. Regulatory and Policy Interface
FMTI’s structure aligns or overlaps with transparency requirements in emerging global AI policy frameworks, including the EU AI Act, US Executive Order on AI Safety, G7 Hiroshima Code of Conduct, and the proposed US Foundation Model Transparency Act. The majority of these policies address only subsets of FMTI indicators; for example, the EU AI Act overlaps with 30/100 indicators, while most others overlap with fewer than 15 (2402.16268).
Reporting against the FMTI can reduce compliance costs by pre-aligning developer documentation with policy-mandated requirements, enhancing cross-jurisdictional comparability, and lowering the barrier for regulatory review and market entry. Policymakers are encouraged to benchmark minimum transparency requirements using FMTI-style indexes, promote standardized disclosure templates, and make regulatory approvals or procurement contingent on sufficient FMTI-aligned reports (2407.12929, 2402.16268).
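One way to operationalize this alignment is a crosswalk from FMTI indicators to the policy instruments that demand comparable disclosures, from which per-policy overlap counts can be derived. The mapping below is a toy illustration under that assumption, not the published crosswalk from (2402.16268).

```python
from collections import Counter

# Toy crosswalk: FMTI indicator -> policy instruments with a comparable requirement.
# The specific assignments are illustrative, not the published analysis.
crosswalk = {
    "data sources": ["EU AI Act"],
    "compute hardware": ["EU AI Act", "US Executive Order"],
    "energy": ["EU AI Act"],
    "usage policies": ["G7 Code of Conduct"],
    "safety audits": ["EU AI Act", "US Executive Order"],
}

# Count how many FMTI indicators each policy instrument covers.
overlap = Counter(policy for policies in crosswalk.values() for policy in policies)
print(overlap.most_common())
# [('EU AI Act', 4), ('US Executive Order', 2), ('G7 Code of Conduct', 1)]
```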
5. Integration With Transparency Mechanisms and Industry Practice
The FMTI incentivizes and systematizes the adoption of a broad suite of transparency artifacts and tools:
- Data Documentation: Data statements, datasheets for datasets, and data membership-testing artifacts (“Data Portraits” (2303.03919)) facilitate traceability and auditability of training corpora.
- Executable and Machine-Readable Documentation: Incorporation of code, scripts, and data lineage for reproducible processing steps.
- Transparency Reports: Comprehensive, indicator-linked reports are increasingly published by developers, often mapping directly to FMTI domains (a minimal machine-readable sketch follows this list).
- Safety, Governance, and Use Policies: Standardization and disclosure of acceptable use policies, safety measures (e.g., PRISM frameworks for modular, independent safety in open-source models (2406.10415)), and feedback or recourse mechanisms.
- Benchmarking and Evaluation: Integration with model evaluation leaderboards and holistic evaluation frameworks (e.g., HELM).
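A minimal sketch of what an indicator-linked, machine-readable report entry might look like, assuming a simple JSON layout; the field names, developer, model, and URL are hypothetical, and no standardized schema is implied.

```python
import json

# Hypothetical machine-readable transparency-report entry, keyed to FMTI-style
# domains and indicators; the schema and values are illustrative assumptions.
report_entry = {
    "developer": "ExampleAI",
    "flagship_model": "example-model-1",
    "domain": "Upstream Resources",
    "indicator": "data sources",
    "score_claimed": 1.0,
    "evidence": {
        "disclosure_url": "https://example.com/transparency/data-sources",
        "last_updated": "2024-05-01",
    },
}

print(json.dumps(report_entry, indent=2))
```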
Empirical findings indicate that structured benchmarking through FMTI encourages the release of accompanying transparency reports, catalyzes competition on transparency, and enhances the overall baseline of information accessible to researchers, civil society, and regulators (2506.23123).
6. Persistent Challenges and Areas of Opacity
Despite progress, the FMTI consistently reveals enduring deficits:
- Upstream Data and Labor: Dataset composition, copyright/licensing, data labor wages, and curation procedures are rarely fully disclosed due to commercial sensitivity, privacy/IP concerns, and lack of standardization.
- Model Architecture and Training: Detailed architectures and training regimens are often omitted.
- Downstream Impact and Feedback: Information about real-world usage, societal risks, harm mitigation, and user/community recourse is especially rare (2310.12941, 2409.03307).
- Enforcement of Use Policies: Acceptable use policies (AUPs) are heterogeneous, and enforcement is inconsistently reported, making actual risk management and redressability difficult (2409.09041).
- System vs. Model Transparency: Documentation and evaluations often focus on models rather than deployed systems, neglecting composition, orchestration, safety mechanisms, and context-specific risks (2406.16746, 2405.15802).
These gaps are structurally resistant to simple voluntary disclosure and often require regulatory pressure or community-driven standards to overcome.
7. Methodological Developments and Future Directions
Ongoing work suggests several enhancements and new frontiers for FMTI:
- Expansion Beyond Flagship Models: Broadening to cover a spectrum of AI systems, including smaller providers, sector-specific systems, and non-English/multimodal models (2409.03307).
- Model vs. System-Level Assessment: Distinguishing between transparency in isolated models vs. fully deployed AI systems, accounting for all layers—data, code, weights, interfaces, infrastructure, and safety controls (2405.15802).
- Multidimensional and Machine-Readable Indices: Movement towards machine-readable, componentized reporting (e.g., “Leaderboard Bill of Materials” in leaderboard infrastructure (2407.04065)) and automated documentation frameworks.
- Integration of Stakeholder Perspectives: Tailoring indicators and reporting to diverse audiences, including domain experts, regulators, deployers, and affected communities.
- Empirical and Theoretical Validation: Incorporating interpretable, theory-grounded assessment of model generalization, expressiveness, and ethical risk, as surveyed in emerging work (2410.11444).
- Policy Impact and Scientific Governance: Employing indexes as operational “public goods” and infrastructure for robust regulatory science and democratic oversight (2506.23123).
| Aspect | Description/Status |
|---|---|
| Domains | Upstream, Model, Downstream, Governance |
| Indicators (v1.1) | 100 (scored 0 / 0.5 / 1) |
| Average Transparency | 37/100 (v1.0) → 58/100 (v1.1) |
| Most Opaque Areas | Data composition/licensing, labor, model details, downstream impact |
| Impact | Benchmarks transparency, catalyzes reporting/improvement, informs policy |
| Policy Alignment | Overlaps with EU AI Act, US Executive Order, G7 Code of Conduct, etc. |
| Notable Improvements | Post-v1.0: new transparency reports and increased indicator coverage |
The Foundation Model Transparency Index serves as both a scientific instrument and a catalyst for change—defining, measuring, and promoting transparency across critical dimensions of foundation model development and deployment. Its adoption in both policy and practice represents a shift towards evidence-based AI governance and more accountable, robust, and socially beneficial foundation model ecosystems.