
AI Risk Management Should Incorporate Both Safety and Security (2405.19524v1)

Published 29 May 2024 in cs.CR and cs.AI

Abstract: The exposure of security vulnerabilities in safety-aligned LLMs, e.g., susceptibility to adversarial attacks, has shed light on the intricate interplay between AI safety and AI security. Although the two disciplines now come together under the overarching goal of AI risk management, they have historically evolved separately, giving rise to differing perspectives. Therefore, in this paper, we advocate that stakeholders in AI risk management should be aware of the nuances, synergies, and interplay between safety and security, and unambiguously take into account the perspectives of both disciplines in order to devise mostly effective and holistic risk mitigation approaches. Unfortunately, this vision is often obfuscated, as the definitions of the basic concepts of "safety" and "security" themselves are often inconsistent and lack consensus across communities. With AI risk management being increasingly cross-disciplinary, this issue is particularly salient. In light of this conceptual challenge, we introduce a unified reference framework to clarify the differences and interplay between AI safety and AI security, aiming to facilitate a shared understanding and effective collaboration across communities.

Authors (25)
  1. Xiangyu Qi (21 papers)
  2. Yangsibo Huang (40 papers)
  3. Yi Zeng (153 papers)
  4. Edoardo Debenedetti (16 papers)
  5. Jonas Geiping (73 papers)
  6. Luxi He (9 papers)
  7. Kaixuan Huang (70 papers)
  8. Udari Madhushani (15 papers)
  9. Vikash Sehwag (33 papers)
  10. Weijia Shi (55 papers)
  11. Boyi Wei (10 papers)
  12. Tinghao Xie (10 papers)
  13. Danqi Chen (84 papers)
  14. Pin-Yu Chen (311 papers)
  15. Jeffrey Ding (5 papers)
  16. Ruoxi Jia (88 papers)
  17. Jiaqi Ma (83 papers)
  18. Arvind Narayanan (48 papers)
  19. Mengdi Wang (199 papers)
  20. Chaowei Xiao (110 papers)
Citations (6)

Summary

AI Risk Management: The Dual Considerations of Safety and Security

The paper "AI Risk Management Should Incorporate Both Safety and Security" emphasizes the necessity of an integrated approach to AI risk management that concurrently addresses both AI safety and security. Historically treated as discrete disciplines, AI safety and security have traditionally been pursued with different objectives and methodologies. While safety aims to prevent harm caused by AI systems to the external environment, security is centered around protecting AI systems themselves from external threats. This paper sets forth a framework to unify these perspectives under the umbrella of AI risk management.

Ongoing advances in LLMs have raised alarms about the risks accompanying their deployment. As these models take on increasingly complex tasks, the repercussions of potential misuse grow, demanding more sophisticated risk management. Governments are responding with standards and policies that make the integrity of AI technologies a priority. The paper observes, however, that the definitions of safety and security often lack consensus, which hinders comprehensive risk management strategies.

A notable contribution of the paper is the proposed reference framework that elucidates the distinctions and interactions between AI safety and security. It does so across several dimensions:

  • Objectives of Protection: The paper delineates that safety pertains to minimizing harm emanating from AI systems, whereas security focuses on shielding these systems from hostile interference. It provides examples where security failures lead to unintended safety breaches, highlighting the intertwined nature of the two domains.
  • Threat Models: Distinctions are drawn between adversarial and non-adversarial threat scenarios. Security typically deals with adversarial threats, whereas safety often concerns accidental or unintended harm. The paper argues for expanding this view, recognizing that adversarial threats also implicate safety concerns, especially as increasingly capable AI systems raise the stakes.
  • Problem Framing: While safety relies on probabilistic risk assessments over relatively stable scenarios, security must contend with adaptive adversaries and therefore plan for worst-case outcomes; this contrast is made concrete in the sketch after this list.
  • Governance and Liability: The paper underscores that safety and security have historically been managed by different governance structures, as seen in sectors like nuclear energy and aviation. A similar asymmetry runs through AI governance and funding, and the paper advocates a balanced perspective that values safety and security considerations equally.
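
To make the contrast in problem framing concrete, consider the following minimal Python sketch. It is illustrative only and not from the paper: the scenario names, probabilities, and impact scores are assumptions. A safety-style assessment aggregates likelihood-weighted harm across hazards, while a security-style assessment assumes an adversary who deliberately triggers the most damaging attack, so only the worst case matters.

```python
# Illustrative sketch (not from the paper): contrasting safety-style
# probabilistic risk assessment with security-style worst-case analysis.
# All scenario names and numbers are assumed for demonstration.
from dataclasses import dataclass

@dataclass
class ThreatScenario:
    name: str
    probability: float   # estimated likelihood (meaningful for accidents, not attacks)
    impact: float        # harm if the scenario occurs, on an arbitrary 0-10 scale
    adversarial: bool    # True if an attacker can deliberately trigger it

scenarios = [
    ThreatScenario("sensor failure",         0.020, 6.0, adversarial=False),
    ThreatScenario("distribution shift",     0.100, 4.0, adversarial=False),
    ThreatScenario("jailbreak prompt",       0.010, 8.0, adversarial=True),
    ThreatScenario("model-weight tampering", 0.001, 9.5, adversarial=True),
]

# Safety framing: expected harm, weighting each hazard by its estimated likelihood.
expected_risk = sum(s.probability * s.impact for s in scenarios)

# Security framing: a capable adversary picks the most damaging attack, so
# likelihood estimates are set aside and only the worst case counts.
worst_case_risk = max(s.impact for s in scenarios if s.adversarial)

print(f"expected (safety) risk:     {expected_risk:.3f}")
print(f"worst-case (security) risk: {worst_case_risk:.1f}")
```

The point of the toy numbers: a low-probability attack like weight tampering barely moves the expected-harm total, yet it dominates the worst-case view, which is why the paper argues that neither framing alone suffices.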

From an academic standpoint, the paper traces the historical divergence in research focus between the AI safety and AI security communities. It lists representative problems tackled by each community and calls for broader integration. This side-by-side scrutiny helps identify gaps and synergies that would otherwise remain invisible in isolated studies.

The implications of this research are substantial. Practically, the integrated framework could help organizations and policymakers develop standardized protocols that address both safety and security. The risk of a blind spot in one dimension caused by neglect of the other grows more urgent as AI systems become ever more pervasive.
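
As one way such a standardized protocol might record risks, here is a hypothetical sketch of an integrated risk-register entry that captures both disciplines' perspectives on the same incident class; the record type, field names, and example values are assumptions for illustration, not from the paper or any standard.

```python
# Hypothetical integrated risk-register entry (illustrative; field names
# and values are assumptions, not taken from the paper or any standard).
from dataclasses import dataclass, field

@dataclass
class RiskEntry:
    description: str
    safety_impact: str       # harm the system could cause externally
    security_exposure: str   # how an attacker could trigger or worsen it
    mitigations: list[str] = field(default_factory=list)

entry = RiskEntry(
    description="Safety-aligned LLM bypassed via adversarial prompting",
    safety_impact="model emits harmful instructions to end users",
    security_exposure="attacker crafts jailbreak inputs at inference time",
    mitigations=["refusal training (safety)", "input filtering (security)"],
)
print(entry)
```

Keeping both perspectives in a single record, rather than in separate safety and security registers, is one way to operationalize the cross-disciplinary view the paper advocates.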

In conclusion, the paper argues that a nuanced understanding of both AI safety and AI security is critical for advancing AI risk management practices. As AI technologies evolve and play a larger role across sectors, bridging the two perspectives will not only strengthen the robustness of AI systems but also better position stakeholders to manage the complexities of future AI development, supporting more secure and reliable deployment of AI technology.