Foundation Model Transparency Index
- The Foundation Model Transparency Index (FMTI) is a standardized framework that quantifies transparency across the entire lifecycle of AI foundation models using 100 indicators.
- It evaluates aspects from data sourcing and model development to governance, enabling side-by-side benchmarking for accountability and regulatory review.
- Empirical findings show that the FMTI spurs improvements in disclosure practices, informing policy interventions and strengthening societal accountability.
The Foundation Model Transparency Index (FMTI) is a standardized, indicator-based framework designed to measure, benchmark, and catalyze transparency within the rapidly advancing ecosystem of foundation models. Developed in response to societal, regulatory, and scientific demands for greater accountability in artificial intelligence, the FMTI transforms transparency from an abstract ideal into a quantifiable, actionable, and policy-ready property. It enables systematic evaluation of how openly foundation model developers disclose information about the entire supply chain and lifecycle of their AI systems, thereby supporting evidence-based AI governance and improved societal outcomes.
1. Origin, Rationale, and Scope
Initiated by the Center for Research on Foundation Models at Stanford, the FMTI was designed to address growing concerns over the opacity of foundation models and their developmental pipelines. Foundation models—large-scale, general-purpose AI models underpinning a wide array of applications—are often constructed using multi-terabyte, heterogeneous data, enormous compute resources, and distributed labor. The lack of visibility into these processes creates risks at both technical and societal levels, including the undocumented propagation of harms, imbalanced market power, and insufficient means for external audit or regulatory oversight (2310.12941, 2506.23123).
FMTI employs a composite, multi-dimensional approach, evaluating not just the end artifacts (models themselves), but also upstream resources (data, compute, labor), model characteristics and documentation, and downstream use, deployment, and impact. Its explicit aim is to provide a standardized, public measure enabling side-by-side comparison, cross-temporal tracking, and policy intervention.
2. Architecture: Indicators, Domains, and Scoring
The index comprises 100 fine-grained transparency indicators, structured hierarchically by domain and subdomain. The taxonomy reflects the end-to-end AI supply chain and associated organizational practices (an illustrative code sketch of this structure follows the table):
| Domain | Example Indicators |
|---|---|
| Upstream Resources | Data sources, selection criteria, compute hardware, energy |
| Model Details | Model architecture, size, training process, documentation |
| Downstream/Deployment | Usage policies, distribution channels, impact, feedback |
| Governance | Safety audits, recourse mechanisms, evaluation procedures |
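As an illustration, the hierarchical taxonomy above can be represented as a simple nested data structure. The domain and example-indicator names below follow the table; the Python structure itself is an illustrative assumption, not an official FMTI schema.

```python
# Illustrative (non-official) representation of the FMTI indicator taxonomy.
# Domain and example-indicator names follow the table above; the schema is assumed.
FMTI_TAXONOMY = {
    "Upstream Resources": ["data sources", "selection criteria", "compute hardware", "energy"],
    "Model Details": ["model architecture", "size", "training process", "documentation"],
    "Downstream/Deployment": ["usage policies", "distribution channels", "impact", "feedback"],
    "Governance": ["safety audits", "recourse mechanisms", "evaluation procedures"],
}
```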
Each indicator is scored for a developer’s flagship model (e.g., GPT-4, PaLM 2, Llama 2):
- 0: No disclosure
- 0.5: Partial or vague disclosure
- 1: Complete, high-quality public disclosure
Overall and domain scores are computed via simple averaging or weighted sums. For a developer $d$, the overall score is

$$
T_d = \frac{1}{N} \sum_{i=1}^{N} s_{d,i},
$$

where $N$ is the number of indicators and $s_{d,i}$ is the score on indicator $i$ for developer $d$. The formula can be extended with per-indicator weights $w_i$, giving $T_d = \sum_{i=1}^{N} w_i\, s_{d,i} \big/ \sum_{i=1}^{N} w_i$.
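A minimal sketch of this aggregation, assuming per-indicator scores in {0, 0.5, 1} as described above; the function and variable names are illustrative rather than part of the FMTI specification.

```python
from typing import Mapping, Optional

def fmti_score(scores: Mapping[str, float],
               weights: Optional[Mapping[str, float]] = None) -> float:
    """Aggregate per-indicator scores (0, 0.5, or 1) into an overall score.

    Without weights this is the simple average T_d = (1/N) * sum_i s_{d,i};
    with weights it is the normalized weighted sum.
    """
    if not scores:
        raise ValueError("at least one indicator score is required")
    if weights is None:
        return sum(scores.values()) / len(scores)
    total_weight = sum(weights[i] for i in scores)
    return sum(weights[i] * s for i, s in scores.items()) / total_weight

# Example: three indicators for a hypothetical developer.
example = {"data sources": 1.0, "compute hardware": 0.5, "safety audits": 0.0}
overall = fmti_score(example)  # (1.0 + 0.5 + 0.0) / 3 = 0.5
```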
3. Empirical Findings and Evolution (v1.0 to v1.1)
The inaugural FMTI in 2023 (v1.0) assessed 10 major foundation model developers, revealing that transparency was low and uneven: no developer scored above 65 out of 100, with a median of 37. Systemic opacity was especially acute in areas such as upstream data, compute usage, and downstream societal impact (2407.12929).
The v1.1 iteration six months later documented substantial improvement: average transparency increased to 58/100, with many of the new disclosures spurred by the index itself and the competitive and peer pressure it generated. Developers published new transparency reports that directly reference FMTI indicators. However, persistent opacity remained in sensitive areas: data composition, licensing, data labor practices, model architecture, and actual real-world impact (2407.12929).
Notably, the process of repeated, indicator-level measurement enabled both longitudinal tracking (temporal analysis of progress) and targeted intervention—regulators and advocacy groups can point to explicit gaps to direct future policy or transparency demands.
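For instance, indicator-level results from two index releases can be diffed to show exactly where a developer improved and which gaps persist; the scores and indicator names below are hypothetical, chosen only to illustrate the mechanics of longitudinal tracking.

```python
# Hypothetical indicator-level scores for one developer across two FMTI releases.
v1_0 = {"data sources": 0.0, "model architecture": 0.5, "usage policies": 1.0}
v1_1 = {"data sources": 0.5, "model architecture": 0.5, "usage policies": 1.0}

# Longitudinal diff: which indicators improved, and which remain gaps.
improved = {k: (v1_0[k], v1_1[k]) for k in v1_0 if v1_1[k] > v1_0[k]}
persistent_gaps = sorted(k for k, score in v1_1.items() if score < 1.0)

print("improved:", improved)                 # {'data sources': (0.0, 0.5)}
print("persistent gaps:", persistent_gaps)   # ['data sources', 'model architecture']
```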
4. Regulatory and Policy Interface
FMTI’s structure aligns or overlaps with transparency requirements in emerging global AI policy frameworks, including the EU AI Act, US Executive Order on AI Safety, G7 Hiroshima Code of Conduct, and the proposed US Foundation Model Transparency Act. The majority of these policies address only subsets of FMTI indicators; for example, the EU AI Act overlaps with 30/100 indicators, while most others overlap with fewer than 15 (2402.16268).
Reporting against the FMTI can reduce compliance costs by pre-aligning developer documentation with policy-mandated requirements, enhancing cross-jurisdictional comparability, and lowering the barrier for regulatory review and market entry. Policymakers are encouraged to benchmark minimum transparency requirements using FMTI-style indexes, promote standardized disclosure templates, and make regulatory approvals or procurement contingent on sufficient FMTI-aligned reports (2407.12929, 2402.16268).
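One way to operationalize this alignment is a crosswalk from FMTI indicators to the policy instruments that demand comparable disclosures, from which per-policy overlap counts can be derived. The mapping below is a toy illustration under that assumption, not the published crosswalk from (2402.16268).

```python
from collections import Counter

# Toy crosswalk: FMTI indicator -> policy instruments with a comparable requirement.
# The specific assignments are illustrative, not the published analysis.
crosswalk = {
    "data sources": ["EU AI Act"],
    "compute hardware": ["EU AI Act", "US Executive Order"],
    "energy": ["EU AI Act"],
    "usage policies": ["G7 Code of Conduct"],
    "safety audits": ["EU AI Act", "US Executive Order"],
}

# Count how many FMTI indicators each policy instrument covers.
overlap = Counter(policy for policies in crosswalk.values() for policy in policies)
print(overlap.most_common())
# [('EU AI Act', 4), ('US Executive Order', 2), ('G7 Code of Conduct', 1)]
```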
5. Integration With Transparency Mechanisms and Industry Practice
The FMTI incentivizes and systematizes the adoption of a broad suite of transparency artifacts and tools:
- Data Documentation: Data statements, datasheets for datasets, and data membership-testing artifacts (“Data Portraits” (2303.03919)) facilitate traceability and auditability of training corpora.
- Executable and Machine-Readable Documentation: Incorporation of code, scripts, and data lineage for reproducible processing steps.
- Transparency Reports: Comprehensive, indicator-linked reports are increasingly published by developers, often mapping directly to FMTI domains (a minimal machine-readable sketch follows this list).
- Safety, Governance, and Use Policies: Standardization and disclosure of acceptable use policies, safety measures (e.g., PRISM frameworks for modular, independent safety in open-source models (2406.10415)), and feedback or recourse mechanisms.
- Benchmarking and Evaluation: Integration with model evaluation leaderboards and holistic evaluation frameworks (e.g., HELM).
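A minimal sketch of what an indicator-linked, machine-readable report entry might look like, assuming a simple JSON layout; the field names, developer, model, and URL are hypothetical, and no standardized schema is implied.

```python
import json

# Hypothetical machine-readable transparency-report entry, keyed to FMTI-style
# domains and indicators; the schema and values are illustrative assumptions.
report_entry = {
    "developer": "ExampleAI",
    "flagship_model": "example-model-1",
    "domain": "Upstream Resources",
    "indicator": "data sources",
    "score_claimed": 1.0,
    "evidence": {
        "disclosure_url": "https://example.com/transparency/data-sources",
        "last_updated": "2024-05-01",
    },
}

print(json.dumps(report_entry, indent=2))
```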
Empirical findings indicate that structured benchmarking through FMTI encourages the release of accompanying transparency reports, catalyzes competition on transparency, and enhances the overall baseline of information accessible to researchers, civil society, and regulators (2506.23123).
6. Persistent Challenges and Areas of Opacity
Despite progress, the FMTI consistently reveals enduring deficits:
- Upstream Data and Labor: Dataset composition, copyright/licensing, data labor wages, and curation procedures are rarely fully disclosed due to commercial sensitivity, privacy/IP concerns, and lack of standardization.
- Model Architecture and Training: Detailed architectures and training regimens are often omitted.
- Downstream Impact and Feedback: Information about real-world usage, societal risks, harm mitigation, and user/community recourse is especially rare (2310.12941, 2409.03307).
- Enforcement of Use Policies: Acceptable use policies (AUPs) are heterogeneous, and enforcement is inconsistently reported, making actual risk management and redressability difficult (2409.09041).
- System vs. Model Transparency: Documentation and evaluations often focus on models rather than deployed systems, neglecting composition, orchestration, safety mechanisms, and context-specific risks (2406.16746, 2405.15802).
These gaps are structurally resistant to simple voluntary disclosure and often require regulatory pressure or community-driven standards to overcome.
7. Methodological Developments and Future Directions
Ongoing work suggests several enhancements and new frontiers for FMTI:
- Expansion Beyond Flagship Models: Broadening to cover a spectrum of AI systems, including smaller providers, sector-specific systems, and non-English/multimodal models (2409.03307).
- Model vs. System-Level Assessment: Distinguishing between transparency in isolated models vs. fully deployed AI systems, accounting for all layers—data, code, weights, interfaces, infrastructure, and safety controls (2405.15802).
- Multidimensional and Machine-Readable Indices: Movement towards machine-readable, componentized reporting (e.g., “Leaderboard Bill of Materials” in leaderboard infrastructure (2407.04065)) and automated documentation frameworks.
- Integration of Stakeholder Perspectives: Tailoring indicators and reporting to diverse audiences, including domain experts, regulators, deployers, and affected communities.
- Empirical and Theoretical Validation: Incorporating interpretable, theory-grounded assessment of model generalization, expressiveness, and ethical risk, as surveyed in emerging work (2410.11444).
- Policy Impact and Scientific Governance: Employing indexes as operational “public goods” and infrastructure for robust regulatory science and democratic oversight (2506.23123).
| Aspect | Description/Status |
|---|---|
| Domains | Upstream, Model, Downstream, Governance |
| Indicators (v1.1) | 100 (scored 0 / 0.5 / 1) |
| Average Transparency | 37/100 (v1.0) → 58/100 (v1.1) |
| Most Opaque Areas | Data composition/licensing, labor, model details, downstream impact |
| Impact | Benchmarks transparency, catalyzes reporting/improvement, informs policy |
| Policy Alignment | Overlaps with EU AI Act, US Executive Order, G7 Code of Conduct, etc. |
| Notable Improvements | Post-v1.0: new transparency reports and increased indicator coverage |
The Foundation Model Transparency Index serves as both a scientific instrument and a catalyst for change—defining, measuring, and promoting transparency across critical dimensions of foundation model development and deployment. Its adoption in both policy and practice represents a shift towards evidence-based AI governance and more accountable, robust, and socially beneficial foundation model ecosystems.