Safety cases for frontier AI (2410.21572v1)

Published 28 Oct 2024 in cs.CY

Abstract: As frontier AI systems become more capable, it becomes more important that developers can explain why their systems are sufficiently safe. One way to do so is via safety cases: reports that make a structured argument, supported by evidence, that a system is safe enough in a given operational context. Safety cases are already common in other safety-critical industries such as aviation and nuclear power. In this paper, we explain why they may also be a useful tool in frontier AI governance, both in industry self-regulation and government regulation. We then discuss the practicalities of safety cases, outlining how to produce a frontier AI safety case and discussing what still needs to happen before safety cases can substantially inform decisions.

Authors (5)
  1. Marie Davidsen Buhl (5 papers)
  2. Gaurav Sett (2 papers)
  3. Leonie Koessler (6 papers)
  4. Jonas Schuett (20 papers)
  5. Markus Anderljung (29 papers)
Citations (3)

Summary

An Expert Overview of "Safety Cases for Frontier AI"

The paper "Safety Cases for Frontier AI," co-authored by researchers from the Centre for the Governance of AI, develops a structured approach for assessing the safety of frontier AI systems through safety cases. Safety cases are established practices in safety-critical industries like nuclear power and aviation, comprising a structured argument supported by evidence that an AI system is adequately safe for its intended operational context. This paper makes the case for adapting these practices to the AI domain and explores their utility in both self-regulation by developers and formal government regulation.

The authors delineate the four critical components of a safety case: objectives, arguments, evidence, and operational scope. For frontier AI, this involves setting safety objectives, composing logical arguments supported by sufficient evidence, and defining the scope in which these claims remain valid. Together, these components aim to ensure comprehensive understanding and coverage of the potential risks posed by deploying frontier AI systems, which are defined as highly capable general-purpose AI systems with the potential to perform a wide variety of tasks at advanced levels.
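To make the relationship between these four components concrete, here is a minimal, illustrative sketch of how they might be represented as a data structure. It is not taken from the paper: the names SafetyCase, Claim, and Evidence, and the example "inability" fragment, are hypothetical and chosen only to mirror the prose; real safety cases are typically expressed in structured notations such as Goal Structuring Notation rather than code.

```python
# Illustrative sketch (assumptions, not the paper's method): objectives,
# arguments, evidence, and operational scope of a safety case as dataclasses.
from dataclasses import dataclass, field


@dataclass
class Evidence:
    description: str   # e.g. results of a dangerous-capability evaluation
    source: str        # e.g. "internal eval suite", "third-party audit"


@dataclass
class Claim:
    statement: str                       # sub-claim supporting the top-level objective
    argument: str                        # why the evidence supports this claim
    evidence: list[Evidence] = field(default_factory=list)


@dataclass
class SafetyCase:
    objective: str     # top-level safety objective the case argues for
    scope: str         # operational context in which the argument holds
    claims: list[Claim] = field(default_factory=list)

    def summary(self) -> str:
        lines = [f"Objective: {self.objective}", f"Scope: {self.scope}"]
        for claim in self.claims:
            lines.append(
                f"- Claim: {claim.statement} "
                f"({len(claim.evidence)} piece(s) of evidence)"
            )
        return "\n".join(lines)


# Hypothetical usage: a fragment of a cyber "inability" argument.
case = SafetyCase(
    objective="The system does not meaningfully uplift cyberoffense capabilities",
    scope="API deployment with current safeguards, until the next model update",
    claims=[
        Claim(
            statement="The model cannot autonomously exploit known vulnerabilities",
            argument="Capability evaluations fall well below the agreed risk threshold",
            evidence=[Evidence("Cyber-range evaluation results", "internal eval suite")],
        )
    ],
)
print(case.summary())
```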

From a pragmatic perspective, safety cases serve developers in several capacities: informing critical development and deployment decisions, supporting continuous risk management, and fostering trust with stakeholders through transparent safety assurances. For regulators, safety cases offer a flexible tool that accommodates the rapid evolution and uncertainty inherent in frontier AI systems. Rather than prescribing specific safety practices, safety cases enable a systematic evaluation of whether a system meets safety objectives, allowing for adaptability as AI capabilities and risks evolve.

However, the authors also identify several challenges. For self-regulation, developers must establish internal processes for producing and reviewing safety cases, potentially incorporating third-party reviews to mitigate biases. For regulation, challenges include developing an ecosystem to handle third-party involvement, setting clear regulatory expectations, and building governmental capacities to review safety cases effectively.

The paper underlines that while rudimentary safety cases can be produced with existing knowledge and frameworks, the frontier nature of these systems implies that future models may require novel methodologies and breakthrough safety techniques. Significant investment in safety research is therefore needed, with a focus on improving existing evaluation methodologies and establishing best practices for a rapidly changing landscape.

These findings are accompanied by practical recommendations for both AI developers and governmental bodies. Developers are encouraged to integrate safety cases into their development and deployment cycles, while governments should consider policies that incentivize such practices and support the growth of third-party ecosystems for safety verification. In regulatory contexts, safety cases promise more robust and adaptive governance of AI technologies than rigid compliance mechanisms.

In summary, this paper contributes a critical discourse on how structured risk assessments can be adapted to govern frontier AI. By aligning the intricacies of the technology with regulatory frameworks, safety cases offer a promising path toward ensuring that advances in AI systems do not outpace our understanding and management of their risks. Looking ahead, adopting and refining safety case methodologies will likely be pivotal to workable governance frameworks, enabling regulatory models robust enough to address frontier AI's potential societal impacts as its capabilities continue to evolve.