Foundation Model Transparency Reports (2402.16268v1)

Published 26 Feb 2024 in cs.LG, cs.AI, and cs.CY

Abstract: Foundation models are critical digital technologies with sweeping societal impact that necessitates transparency. To codify how foundation model developers should provide transparency about the development and deployment of their models, we propose Foundation Model Transparency Reports, drawing upon the transparency reporting practices in social media. While external documentation of societal harms prompted social media transparency reports, our objective is to institutionalize transparency reporting for foundation models while the industry is still nascent. To design our reports, we identify 6 design principles given the successes and shortcomings of social media transparency reporting. To further schematize our reports, we draw upon the 100 transparency indicators from the Foundation Model Transparency Index. Given these indicators, we measure the extent to which they overlap with the transparency requirements included in six prominent government policies (e.g., the EU AI Act, the US Executive Order on Safe, Secure, and Trustworthy AI). Well-designed transparency reports could reduce compliance costs, in part due to overlapping regulatory requirements across different jurisdictions. We encourage foundation model developers to regularly publish transparency reports, building upon recommendations from the G7 and the White House.

Proposing Foundation Model Transparency Reports: A Structured Approach to Voluntary Transparency in AI Development

Introduction

The field of AI has seen an unprecedented surge of interest in and development of foundation models, which now affect many aspects of society. Despite their transformative potential, persistent opacity across the foundation model ecosystem has raised substantial concerns. To address this problem directly, the paper proposes Foundation Model Transparency Reports as a structured method for developers to provide comprehensive and coherent transparency about their models.

Reflections on Social Media Transparency Reports

Drawing parallels to social media, where transparency reporting has become a central mechanism for addressing societal harms, the paper analyzes the trajectory of these reports. The analysis identifies the forces behind their emergence and evolution, highlighting the role of societal and regulatory pressure in driving greater transparency. It also shows how, despite their benefits, such reports have struggled with standardization, completeness, and the precision of disclosed information, raising doubts about their effectiveness in genuinely fostering trust and accountability.

Design Principles for Foundation Model Transparency Reports

Drawing on both the successes and shortcomings of social media transparency initiatives, the paper identifies six design principles for Foundation Model Transparency Reports. These principles call for a structured, standardized, and methodologically clear reporting schema that is independently specified and comprehensively covers the upstream resources, model properties, and downstream impacts of foundation models. The proposed design emphasizes centralization, contextualization, and clarity, aiming for a holistic depiction of the foundation model ecosystem; a minimal sketch of such a schema follows.
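
To make this structure concrete, here is a minimal Python sketch of what such a report schema could look like, with indicators grouped into upstream, model, and downstream domains. The class and field names (Indicator, TransparencyReport, coverage) and the example indicators are illustrative assumptions, not the paper's exact schema or its 100 indicators.

```python
from dataclasses import dataclass, field

@dataclass
class Indicator:
    # A single transparency indicator, e.g. "data sources" or "usage policy".
    name: str
    domain: str            # one of: "upstream", "model", "downstream"
    disclosed: bool        # whether the developer reports this indicator
    methodology: str = ""  # how the disclosed information was produced

@dataclass
class TransparencyReport:
    developer: str
    model: str
    indicators: list[Indicator] = field(default_factory=list)

    def coverage(self, domain: str) -> float:
        # Fraction of a domain's indicators that the developer discloses.
        in_domain = [i for i in self.indicators if i.domain == domain]
        if not in_domain:
            return 0.0
        return sum(i.disclosed for i in in_domain) / len(in_domain)

# Hypothetical example: two upstream indicators, one disclosed.
report = TransparencyReport(
    developer="ExampleAI",
    model="example-model-v1",
    indicators=[
        Indicator("data sources", "upstream", True, "documented in a datasheet"),
        Indicator("compute usage", "upstream", False),
        Indicator("evaluation results", "model", True, "public benchmark suite"),
    ],
)
print(f"Upstream coverage: {report.coverage('upstream'):.0%}")  # prints 50%
```

Grouping indicators by domain mirrors the paper's emphasis on covering the full supply chain of a foundation model rather than its properties alone.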

Aligning with Government Policies and Enhancing Compliance

The paper then measures how the proposed transparency indicators align with existing and forthcoming government policies across jurisdictions, revealing a considerable gap between current regulatory requirements and the level of detail the proposed reports would provide. By offering a schema that can reduce compliance costs and improve regulatory alignment, especially where requirements overlap across jurisdictions, the paper positions Foundation Model Transparency Reports as a strategic tool for navigating the complex regulatory landscape governing AI development and deployment; a sketch of this overlap analysis appears below.
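
The overlap measurement can be illustrated as a simple set intersection between report indicators and each policy's transparency requirements. The indicator and requirement names below are invented placeholders assumed for illustration; the paper's actual analysis maps the Foundation Model Transparency Index's 100 indicators against six specific policies.

```python
# Indicators covered by the transparency report (placeholder names).
indicators = {
    "data sources",
    "compute usage",
    "model evaluations",
    "usage policy",
    "incident reporting",
}

# Transparency requirements in each policy (placeholder mappings).
policy_requirements = {
    "EU AI Act": {"data sources", "model evaluations", "usage policy"},
    "US Executive Order": {"compute usage", "model evaluations"},
}

# A report disclosing a shared requirement satisfies it for every policy
# at once, which is where the compliance-cost savings come from.
for policy, required in policy_requirements.items():
    covered = required & indicators
    print(f"{policy}: {len(covered)}/{len(required)} requirements covered")
```

Because jurisdictions impose overlapping requirements, a single well-designed report can address multiple policies simultaneously, which is the source of the compliance-cost reduction the paper highlights.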

A Call for Robust Transparency Norms and Industry Standards

This research not only underscores the immediate need for enhanced transparency within the foundation model ecosystem but also advocates for robust industry standards and norms that go beyond mere compliance. Through a critical examination of existing practices and a forward-looking approach to transparency reporting, it sets the stage for significant shifts in how foundation models are developed, deployed, and scrutinized in the public domain.

Concluding Remarks

In summary, the paper positions Foundation Model Transparency Reports as a pivotal mechanism for institutionalizing transparency within the nascent foundation model industry. By drawing on historical precedents, existing practices, and a comprehensive understanding of the landscape, it charts a path toward a more transparent, accountable, and socially responsive AI future. The proposed framework promises to mitigate risks associated with foundation models and could foster a culture of openness and trust, laying the groundwork for future developments in generative AI.

The research concludes with a call to action, urging foundation model developers to adopt transparency reporting proactively and to align with broader societal values and regulatory expectations, ensuring that the advancement of AI technologies does not come at the cost of transparency, accountability, or societal well-being.

Authors (8)
  1. Rishi Bommasani
  2. Kevin Klyman
  3. Shayne Longpre
  4. Betty Xiong
  5. Sayash Kapoor
  6. Nestor Maslej
  7. Arvind Narayanan
  8. Percy Liang