Model Cards and Datasheets
- Model cards and datasheets are structured documentation tools that capture provenance, limitations, and evaluation metrics of machine learning models and datasets.
- They standardize key information including assumptions, performance metrics, and ethical considerations to enhance reproducibility and informed decision-making.
- Emerging automated and regulatory approaches aim to streamline these documentation practices and to support compliant, sustainable, and transparent AI deployment.
Model cards and datasheets are structured documentation tools developed to increase transparency, accountability, and responsible practice in ML and AI. Inspired by the analogy with industrial component datasheets and nutrition labels, these frameworks encode provenance, limitations, evaluation, and context for datasets and models, facilitating reproducibility, ethical reflection, and informed decision-making by diverse stakeholders.
1. Principles and Motivations
Model cards (for ML models) and datasheets (for datasets) emerge from the recognition that undocumented or under-documented artifacts in ML pipelines can propagate and even amplify biases, risks, and misapplications, especially in high-stakes domains such as justice or finance (Gebru et al., 2018, Mitchell et al., 2018). The documentation deficit in ML has led to the following:
- Lack of accountability: without archival records on how datasets/models were constructed, validating and tracing sources of bias or error is challenging.
- Barrier to reproducibility: unclear or absent provenance, composition, and pre-processing details stymie attempts to reproduce results or fairly benchmark models.
- Mismatched assumptions and harmful deployment: the intended uses and limitations of datasets and models are not reliably communicated to downstream consumers.
To address these issues, model cards and datasheets are designed to:
- Ensure systematic disclosure of assumptions, risks, and technical properties;
- Shift the cultural norm from ad hoc reporting to standardized, community-guided processes;
- Support regulatory, ethical, and safety requirements by introducing documentation checkpoints throughout the AI lifecycle (Brajovic et al., 2023).
2. Structure and Content
Datasheets for Datasets
Datasheets (Gebru et al., 2018) are organized by lifecycle stages and typically include:
- Motivation: Purpose, creators, funding sources, and assumptions.
- Composition: Description of instances, labels, errors, and relationships (e.g., full set vs. curated sample).
- Collection Process: Temporal, procedural, and ethical dimensions of data procurement.
- Preprocessing/Cleaning/Labeling: Preprocessing steps, retention of raw data, and tooling.
- Uses: Prior, intended, and contraindicated uses, with explicit risk statements.
- Distribution: Access, licensing, and restrictions.
- Maintenance: Plans for updating, retention of sensitive data, and stewardship roles.
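Although datasheets are usually authored as prose documents, the same sections can be captured in a structured, machine-readable form. Below is a minimal illustrative sketch in Python; the section names follow Gebru et al. (2018), but the encoding itself and all field values are hypothetical, not prescribed by the paper:

```python
import json

# Minimal machine-readable datasheet skeleton. Section names follow
# Gebru et al. (2018); all concrete values are hypothetical placeholders.
datasheet = {
    "motivation": {
        "purpose": "Benchmark sentiment classifiers on product reviews.",
        "creators": ["Example Lab"],
        "funding": "Internal research grant",
    },
    "composition": {
        "num_instances": 50_000,
        "label_set": ["positive", "negative"],
        "is_curated_sample": True,  # a sample of a larger population, not the full set
    },
    "collection_process": {"timeframe": "2023-01 to 2023-06", "consent": "terms-of-service"},
    "preprocessing": {"steps": ["deduplication", "language filtering"], "raw_data_retained": True},
    "uses": {"intended": ["benchmarking"], "contraindicated": ["individual profiling"]},
    "distribution": {"license": "CC-BY-4.0", "access": "public"},
    "maintenance": {"maintainer": "data-team@example.org", "update_cadence": "annual"},
}

print(json.dumps(datasheet, indent=2))
```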
Model Cards
Model cards for model reporting (Mitchell et al., 2018) include:
- Model Details: Developer, date, version, training parameters.
- Intended Use: Targeted application domains, user classes, and out-of-scope usage warnings.
- Factors: Contextual/environmental factors (demographics, instrumentation) influencing performance.
- Metrics: Confusion-matrix-derived rates (FPR, FNR, FDR, etc.) alongside threshold-agnostic measures (e.g., AUC, KL divergence).
- Evaluation Data: Details and rationale of datasets used for evaluation.
- Training Data: Dataset description, group representation, bias reporting.
- Quantitative Analyses: Performance disaggregated across unitary and intersectional groups (see the sketch after this list).
- Ethical Considerations: Sensitive attributes, risk of misuse.
- Caveats and Recommendations: Limitations, ethical risks, and areas requiring further research.
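To make the Quantitative Analyses section concrete, the sketch below computes disaggregated confusion-matrix rates per subgroup, the kind of per-group reporting Mitchell et al. call for. The function and toy data are illustrative, not taken from the paper:

```python
import numpy as np

def disaggregated_rates(y_true, y_pred, groups):
    """Per-group false-positive and false-negative rates, suitable for a
    model card's Quantitative Analyses section."""
    y_true, y_pred, groups = map(np.asarray, (y_true, y_pred, groups))
    report = {}
    for g in np.unique(groups):
        mask = groups == g
        t, p = y_true[mask], y_pred[mask]
        fp = np.sum((p == 1) & (t == 0))
        fn = np.sum((p == 0) & (t == 1))
        negatives, positives = np.sum(t == 0), np.sum(t == 1)
        report[str(g)] = {
            "FPR": float(fp / negatives) if negatives else float("nan"),
            "FNR": float(fn / positives) if positives else float("nan"),
            "n": int(mask.sum()),
        }
    return report

# Toy example: rates reported separately for two demographic groups.
print(disaggregated_rates(
    y_true=[1, 0, 1, 0, 1, 0, 1, 0],
    y_pred=[1, 0, 0, 1, 1, 0, 1, 1],
    groups=["a", "a", "a", "a", "b", "b", "b", "b"],
))
```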
Both datasheets and model cards are often presented in human-readable form, although machine-readable schemas and ontologically structured representations are now emerging (Amith et al., 2023, Čop et al., 19 May 2025).
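In practice, mainstream tooling already blends the two forms: the huggingface_hub Python library, for example, represents a model card as structured YAML front matter plus free-text sections. A minimal sketch follows; the repository name and section text are placeholders, and the exact API should be checked against current library documentation:

```python
from huggingface_hub import ModelCard, ModelCardData

# Structured metadata, rendered as YAML front matter (the machine-readable part).
card_data = ModelCardData(language="en", license="apache-2.0", tags=["text-classification"])

# Free-text sections mirror Mitchell et al.'s headings (the human-readable part).
content = f"""---
{card_data.to_yaml()}
---

# Model Card: example-sentiment-model

## Intended Use
Binary sentiment classification of English product reviews;
not intended for clinical or legal decision-making.

## Caveats and Recommendations
Performance on non-English or code-switched text is untested.
"""

card = ModelCard(content)
card.save("README.md")  # or card.push_to_hub("org/repo") with credentials
```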
3. Extended and Related Documentation Approaches
A burgeoning ecosystem of related artifacts refines, generalizes, or systematizes documentation:
- Data Cards: Modular, user-centric summaries with layered detail (telescopic, periscopic, microscopic) for capturing rationale, evolution, and operational impact (Pushkarna et al., 2022).
- Data Readiness Reports: Focused on data quality, readiness, and transformation lineage, serving governance and auditability (Afzal et al., 2020).
- Network Cards: Optimized for summarizing network datasets, emphasizing topological statistics and the data construction process (Bagrow et al., 2022).
- DAG Cards: Documentation for the entire ML pipeline (directed acyclic graph of workflow steps), automatically generated from code and run metadata (Tagliabue et al., 2021).
- Method Cards: Prescriptive guidance for ML development processes, offering actionable, method-level instructions (beyond descriptive records) (Adkins et al., 2022).
- Care Labels: Concise, at-a-glance ratings (A–D scale) that summarize static (theoretical) and dynamic (empirical) properties for end-user comprehension (Morik et al., 2021).
- Ontology-based and Machine-Readable Schemas: Compute-friendly representations (OWL2, JSON Schema, KG ontologies) supporting automated analysis, FAIR compliance, and registry integration (Amith et al., 2023, Čop et al., 19 May 2025); a small RDF sketch follows this list.
- Regulatory Cards: Unified documentation bundles covering use-case, data, model, and operational requirements to support compliance with legal standards (e.g., the EU AI Act) (Brajovic et al., 2023).
- Sustainability Extensions: Domain-specific languages and additional sections to quantify energy consumption, carbon footprint, and water use—expanding model cards for Green AI applications (Jouneaux et al., 25 Jul 2025).
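As referenced above, ontology-based representations encode card fields as linked, queryable triples. The fragment below uses rdflib for illustration; the namespace and property names are assumptions, not the published ontology of Amith et al.:

```python
from rdflib import Graph, Literal, Namespace, RDF

# Illustrative only: a tiny RDF encoding of model-card fields.
MC = Namespace("http://example.org/modelcard#")
g = Graph()
g.bind("mc", MC)

model = MC["sentiment-model-v1"]
g.add((model, RDF.type, MC.ModelCard))
g.add((model, MC.intendedUse, Literal("Sentiment analysis of English reviews")))
g.add((model, MC.license, Literal("apache-2.0")))
g.add((model, MC.evaluationMetric, Literal("AUC=0.91")))

# Serialized triples can be queried, linked, and validated by standard tooling.
print(g.serialize(format="turtle"))
```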
4. Implementation, Adoption, and Automation
Model cards and datasheets have seen broad experimental and industrial adoption, spanning academia and large corporations (e.g., Microsoft, Google, IBM) (Gebru et al., 2018). Card templates for the Hugging Face Hub and the GEM benchmark in NLP illustrate iterative, stakeholder-driven design, with community revisions hosted in open repositories (McMillan-Major et al., 2021). Automated documentation generation is increasingly feasible:
- Automated Generation: Pipelines like CardGen decompose card authoring into per-section sub-questions and employ LLMs with two-step retrieval over original sources (papers, code, READMEs) to answer each field, achieving higher objectivity and completeness than many human-authored cards (Liu et al., 10 May 2024); a schematic sketch appears below.
- NER/RE-Based Extraction: Frameworks such as AutoLLM-Card extract entity, license, and application triples using dependency parsing and knowledge graphs, supporting large-scale documentation (Tian et al., 25 Sep 2024).
- Ontology-Based Authoring: Model card elements are mapped to ontology classes, with reasoning engines (OWL API, FaCT++) creating computable, queryable, and linked records essential for FAIR compliance (Amith et al., 2023).
Automated approaches are validated with human and automatic metrics (e.g., ROUGE-L, BERTScore), and ablation studies show advantages for structured, chain-of-thought prompting in retrieval and answer generation (Liu et al., 10 May 2024).
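The two-step retrieve-then-answer structure of such pipelines can be sketched schematically: each card field becomes a sub-question, evidence is retrieved for it, and an LLM answers from that evidence. Everything below is illustrative; the naive keyword retriever and the call_llm stub stand in for CardGen's actual retrieval and generation components:

```python
# Schematic sketch of a CardGen-style pipeline (Liu et al., 2024), not the
# authors' code: one sub-question per card field, answered from retrieved text.
CARD_FIELDS = {
    "intended_use": "What tasks and users is this model intended for?",
    "training_data": "What data was the model trained on?",
    "limitations": "What are the known limitations and failure modes?",
}

def retrieve(question: str, chunks: list[str], k: int = 2) -> list[str]:
    # Step 1: rank source chunks by keyword overlap with the sub-question.
    q_words = set(question.lower().split())
    ranked = sorted(chunks, key=lambda c: -len(q_words & set(c.lower().split())))
    return ranked[:k]

def call_llm(prompt: str) -> str:
    # Stub standing in for any LLM API call.
    return f"[LLM answer for prompt of {len(prompt)} chars]"

def generate_card(source_chunks: list[str]) -> dict[str, str]:
    card = {}
    for field, question in CARD_FIELDS.items():
        evidence = retrieve(question, source_chunks)   # step 1: retrieve
        prompt = f"Answer using only this evidence:\n{evidence}\nQ: {question}"
        card[field] = call_llm(prompt)                 # step 2: generate
    return card

print(generate_card(["Trained on 2M English reviews.", "Intended for sentiment analysis."]))
```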
5. Evaluation, Impact, and Current Practices
Empirical assessments reveal both strengths and persistent deficits:
- Section Informativeness: In an analysis of 32,111 Hugging Face model cards, training sections are the most complete (74.3%), while environmental-impact (2.0%), evaluation (15.4%), and limitations (17.4%) sections are consistently underreported (Liang et al., 7 Feb 2024).
- Ethics and Transparency Gaps: Only 0.3% of manually studied model cards document ethical considerations or caveats, and dataset curators, annotators, or source details are rarely listed in dataset cards (Oreamuno et al., 2023).
- Community Impact and Downloads: Intervention studies show that adding detailed model cards to popular Hugging Face models correlates with up to a 29.0% increase in downloads, indicating practical value for discoverability and user trust (Liang et al., 7 Feb 2024).
- Standardization Pressures: Regulatory cards built to comply with EU law and various ISO standards promote consistent documentation for trustworthy AI and third-party auditing (Brajovic et al., 2023).
- Sustainability Reporting: DSLs for sustainability model cards formalize reporting of energy, water, and emissions, enabling downstream machine parsing for model selection, benchmarking, and compliance with service-level agreements (Jouneaux et al., 25 Jul 2025); a minimal sketch of such fields appears below.
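As an illustration of the kind of fields such a DSL captures, the snippet below derives a carbon estimate from measured energy and a regional grid-intensity factor; all field names and numbers are hypothetical and do not reflect the syntax of Jouneaux et al.:

```python
# Hypothetical sustainability fields for a model card. Carbon is derived
# from measured training energy and a region-dependent emission factor.
training_energy_kwh = 1_250.0            # e.g., logged via on-node power meters
grid_intensity_kgco2_per_kwh = 0.35      # regional grid emission factor

sustainability_section = {
    "energy_consumption_kwh": training_energy_kwh,
    "carbon_footprint_kgco2e": round(training_energy_kwh * grid_intensity_kgco2_per_kwh, 1),
    "water_use_liters": 900.0,           # datacenter cooling estimate
}
print(sustainability_section)
```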
6. Challenges, Limitations, and Future Outlook
Despite growing adoption, major challenges remain:
- Overhead and Dynamic Datasets: Completing detailed cards involves significant work for creators, and living documents must evolve with dynamic datasets and updating models (Gebru et al., 2018).
- Fragmentation and Versioning: Multiple template versions, manual fill-in, and lack of automation impede up-to-date and canonical documentation (Pushkarna et al., 2022).
- Limited Semantic Interoperability: Human-readable cards are increasingly complemented by machine-readable schemas and knowledge graphs, yet community convergence on standards remains nascent (Čop et al., 19 May 2025).
- Verification and Trust: Without trustworthy attestations, malicious actors could misrepresent metrics or provenance. Hardware-assisted solutions (Laminator) and cryptographic attestations are emerging to ensure verifiable property cards (Duddu et al., 25 Jun 2024); a minimal signing sketch follows this list.
- Green AI and Sustainability: Integrating environmental metrics (energy, carbon, water) and supporting computable, standardized, and actionable sustainability benchmarks into cards is an open research area (Jouneaux et al., 25 Jul 2025).
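A minimal sketch of the attestation idea mentioned above: the producer signs a digest binding the card text to the model weights, so any later edit to either is detectable. This is illustrative only; systems like Laminator additionally anchor such attestations in trusted hardware:

```python
import hashlib
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey

# Bind the card text to the model weights in a single digest.
card_text = b"# Model Card\nintended_use: sentiment analysis\nAUC: 0.91\n"
weights_bytes = b"\x00\x01\x02"  # placeholder for serialized weights
digest = hashlib.sha256(card_text + hashlib.sha256(weights_bytes).digest()).digest()

# The producer signs the digest; a consumer verifies with the public key.
producer_key = Ed25519PrivateKey.generate()
signature = producer_key.sign(digest)
producer_key.public_key().verify(signature, digest)  # raises InvalidSignature if tampered
print("model card attestation verified")
```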
Future directions include automatic and semi-automatic card generation, continuous integration with MLOps and ML pipeline tools, deeper regulatory and ethical linkage, standardized assessment of provenance and performance, and widespread adoption of machine-readable, interoperable documentation supporting responsible, auditable, and sustainable AI development.