- The paper identifies four core challenge areas—robustness, monitoring, alignment, and systemic safety—and maps potential research directions for each.
- It examines how current ML systems fail under extreme conditions and adversarial attacks, calling for new benchmarks and stronger anomaly detection.
- The study advocates integrating improved calibration, value learning, and ML-driven cybersecurity to build resilient and human-aligned systems.
Unsolved Problems in ML Safety: A Survey
Introduction
The paper "Unsolved Problems in ML Safety" by Hendrycks, Carlini, Schulman, and Steinhardt provides an intricate roadmap for addressing the pressing safety issues that arise with the proliferation of increasingly capable and widely deployed ML systems. It identifies four primary research challenges to enhance the safety of ML systems: robustness, monitoring, alignment, and systemic safety. Each of these areas is elaborated with specific problems and potential research directions, highlighting the intersection of theoretical and practical approaches in ensuring the reliability and safety of ML systems. Let's delve into the core aspects of each identified problem area.
Robustness
Black Swan and Tail Risk Robustness
The paper emphasizes the necessity for ML systems to withstand unexpected and extreme events, exemplified by the 2010 Flash Crash, which was driven in part by automated trading systems. The authors argue that current ML systems, especially those employed in high-stakes environments like autonomous vehicles, fail to handle "unknown unknowns," leading to catastrophic failures in long-tail scenarios. Proposed research directions include developing benchmarks and simulated environments to stress-test ML systems against extreme distributional shifts, which would aid in identifying and strengthening weak points in current frameworks.
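A minimal sketch of what such a stress test might look like, assuming a trained classifier is available as a callable returning class scores; the Gaussian-noise shift and severity levels here are illustrative stand-ins for the richer corruptions a real benchmark would use:

```python
import numpy as np

def gaussian_noise_shift(x, severity=0.2, rng=None):
    """Simulate a simple distributional shift by adding Gaussian noise
    to inputs scaled to [0, 1]."""
    rng = rng or np.random.default_rng(0)
    return np.clip(x + rng.normal(0.0, severity, size=x.shape), 0.0, 1.0)

def accuracy(model, x, y):
    """Fraction of correct predictions; `model` is any callable returning
    class scores of shape (n_samples, n_classes)."""
    return float(np.mean(np.argmax(model(x), axis=1) == y))

def stress_test(model, x_clean, y, severities=(0.1, 0.2, 0.4)):
    """Report how accuracy degrades as the shift grows more severe."""
    report = {"clean": accuracy(model, x_clean, y)}
    for s in severities:
        report[f"noise={s}"] = accuracy(model, gaussian_noise_shift(x_clean, s), y)
    return report
```

Comparing the clean entry against the shifted ones gives a rough picture of how gracefully a model degrades outside its training distribution.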
Adversarial Robustness
Adversarial robustness remains a critical issue wherein adversaries craft inputs that deceive ML systems into errors. Despite significant advancements, defenses struggle to keep pace with evolving attack strategies. The authors identify a need to expand beyond the traditional ℓp adversarial robustness setting to broader definitions, considering realistic, perceptible attacks and black-box scenarios. They advocate for research into adversarially robust representations and novel training techniques to create more resilient models.
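To make the ℓp setting concrete, here is a short sketch of the well-known fast gradient sign method (FGSM), one of the simplest ℓ∞-bounded attacks; the model and inputs are assumed to be a standard PyTorch classifier on inputs scaled to [0, 1]:

```python
import torch
import torch.nn.functional as F

def fgsm_attack(model, x, y, epsilon=0.03):
    """Fast Gradient Sign Method: perturb inputs in the direction that
    maximally increases the loss, within an L-infinity ball of radius epsilon."""
    x_adv = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x_adv), y)
    loss.backward()
    # One signed gradient step, then clip back to the valid input range.
    x_adv = x_adv + epsilon * x_adv.grad.sign()
    return x_adv.clamp(0.0, 1.0).detach()
```

Attacks of this kind are what ℓp defenses are evaluated against; the paper's point is that real-world adversaries are not limited to such small, imperceptible perturbations.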
Monitoring
Identifying Hazards and Malicious Use with Anomaly Detection
Deploying ML systems in high-risk areas necessitates robust anomaly detection to safeguard against both operational failures and malicious misuse. Currently available methods fall short of achieving both high recall and low false-alarm rates. To address these gaps, the authors suggest advancements in representation learning for anomaly detection, particularly for large-scale images and real-world applications like malware detection and biosafety.
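One common baseline for flagging anomalous inputs is to score them by the model's maximum softmax probability, treating low confidence as evidence that an input is out of distribution. A minimal sketch, assuming class logits are already available; the threshold is a hypothetical value that would in practice be tuned to the desired false-alarm rate:

```python
import numpy as np

def softmax(logits):
    z = logits - logits.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def anomaly_scores(logits):
    """Maximum-softmax-probability baseline: lower confidence on the
    predicted class is treated as evidence the input is anomalous."""
    return 1.0 - softmax(logits).max(axis=1)

def flag_anomalies(logits, threshold=0.5):
    """Boolean mask of inputs whose anomaly score exceeds the threshold."""
    return anomaly_scores(logits) > threshold
```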
Representative Model Outputs
Calibration is essential for human monitors to correctly judge when to trust ML systems. As models often exhibit overconfidence, research into better calibration methods, especially under distributional shifts, is proposed. This would make model outputs more faithfully representative of their actual reliability, aiding human operators in making informed decisions.
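A standard way to quantify miscalibration is the expected calibration error (ECE), which compares average confidence to actual accuracy within confidence bins. A minimal sketch, assuming per-example confidences, predicted labels, and true labels are available:

```python
import numpy as np

def expected_calibration_error(confidences, predictions, labels, n_bins=10):
    """Expected Calibration Error: average |accuracy - confidence| across
    equal-width confidence bins, weighted by the fraction of samples per bin."""
    bins = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(bins[:-1], bins[1:]):
        in_bin = (confidences > lo) & (confidences <= hi)
        if in_bin.any():
            acc = np.mean(predictions[in_bin] == labels[in_bin])
            conf = np.mean(confidences[in_bin])
            ece += in_bin.mean() * abs(acc - conf)
    return ece
```

An overconfident model shows average confidence well above accuracy in the upper bins, which is exactly the failure mode that misleads human monitors.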
Alignment
Aligning ML systems with human values is challenging because those values are hard to specify, the proxies used in their place are hard to optimize safely, and optimizing imperfect proxies can cause unintended consequences.
Value Learning
Specifying human values like wellbeing or fairness in computable terms is complex. Value learning is proposed as a means to translate these abstract human values into operational objectives for ML systems. This includes interactive environments to learn from stakeholder feedback and learning cosmopolitan goals that span diverse human values.
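One widely used approach to value learning is to fit a reward model from pairwise human preferences rather than hand-specifying an objective. A minimal sketch of a Bradley-Terry-style preference loss in PyTorch; the feature representation of outcomes and the tiny network architecture are illustrative assumptions, not the paper's method:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RewardModel(nn.Module):
    """Tiny reward model mapping a feature vector describing an outcome
    to a scalar score; the feature representation is assumed given."""
    def __init__(self, n_features):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(n_features, 64), nn.ReLU(), nn.Linear(64, 1))

    def forward(self, x):
        return self.net(x).squeeze(-1)

def preference_loss(reward_model, preferred, rejected):
    """Bradley-Terry objective: the preferred outcome should score
    higher than the rejected one."""
    return -F.logsigmoid(reward_model(preferred) - reward_model(rejected)).mean()
```

Training on stakeholder comparisons like these is one concrete way the abstract goal of "learning human values" becomes an operational objective.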
Proxy Gaming and Value Clarification
The phenomenon of optimizers gaming poorly specified proxies is particularly concerning. The authors recommend a focus on anomaly detection and adversarial robustness to mitigate this. Furthermore, they emphasize the need for systems that can engage in philosophical reasoning to continuously refine these proxies in alignment with evolving human values.
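A toy numeric illustration of proxy gaming, purely for intuition: the measurable proxy keeps rewarding larger actions, while the hypothetical true objective peaks and then degrades, so a naive optimizer pushed to the edge of its action range ends up in a region the designers never intended:

```python
import numpy as np

def true_objective(x):
    """Hypothetical 'real' goal, which saturates and then degrades."""
    return x - 0.1 * x ** 2

def proxy_objective(x):
    """Imperfect measurable proxy that keeps rewarding larger x."""
    return x

actions = np.linspace(0.0, 20.0, 201)
best_for_proxy = actions[np.argmax(proxy_objective(actions))]
best_for_truth = actions[np.argmax(true_objective(actions))]

print(f"proxy-optimal action: {best_for_proxy:.1f} "
      f"(true value {true_objective(best_for_proxy):.1f})")
print(f"truly optimal action: {best_for_truth:.1f} "
      f"(true value {true_objective(best_for_truth):.1f})")
```

Here the proxy-optimal action achieves a sharply negative true value, which is why the authors pair proxy design with anomaly detection and adversarial robustness as monitoring safeguards.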
Systemic Safety
ML for Cybersecurity
Cybersecurity is identified as a crucial factor given that ML systems are often embedded within broader software ecosystems that are vulnerable to cyberattacks. The paper calls for research applying ML to defensive cybersecurity techniques, encompassing intrusion detection, vulnerability analysis, and automated patching.
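As a small illustration of ML applied to defensive cybersecurity, here is a sketch of anomaly-based intrusion detection over network-flow features using scikit-learn's IsolationForest; the feature set and traffic statistics are hypothetical placeholders:

```python
import numpy as np
from sklearn.ensemble import IsolationForest

# Hypothetical network-flow features: [bytes_sent, duration_s, n_connections].
rng = np.random.default_rng(0)
normal_traffic = rng.normal(loc=[500, 2.0, 3], scale=[100, 0.5, 1], size=(1000, 3))

detector = IsolationForest(contamination=0.01, random_state=0)
detector.fit(normal_traffic)

# predict() returns -1 for flows the model considers anomalous, 1 otherwise.
suspicious_flow = np.array([[50_000, 0.1, 200]])
print(detector.predict(suspicious_flow))
```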
Improved Epistemics and Decision Making
Given the critical role of decision-making institutions, improving their capabilities through ML-enhanced forecasting and advisory systems is suggested. This entails developing tools that aggregate vast amounts of data for accurate predictions and that surface crucial considerations in governance and command contexts, mitigating risks from poor decision-making.
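A minimal sketch of the forecasting side, assuming several forecasters provide probabilities for the same binary event: combine them into one estimate and score the result with the Brier score (a standard proper scoring rule); the example numbers are made up:

```python
import numpy as np

def aggregate_forecasts(probabilities, weights=None):
    """Combine several forecasters' probabilities for the same event
    into one estimate via a (weighted) average."""
    return float(np.average(np.asarray(probabilities, dtype=float), weights=weights))

def brier_score(forecast, outcome):
    """Brier score for a binary event: lower is better, 0 is perfect."""
    return (forecast - outcome) ** 2

# Three forecasters assess the same event; the event occurs (outcome = 1).
combined = aggregate_forecasts([0.6, 0.7, 0.9])
print(combined, brier_score(combined, outcome=1))
```

ML-enhanced versions of this pipeline would weight forecasters by track record and extract the forecasts themselves from large data sources, which is the research direction the authors point to.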
Conclusion
The paper concludes that ensuring the safety of ML systems requires a multi-faceted approach involving robustness, monitoring, alignment, and systemic safety. Each of these areas interrelates, such that progress in one domain can bolster the others. For example, advancements in anomaly detection can help with robustness to unexpected events and enhance the detection of gaming in objective proxies. This interwoven strategy, akin to a Swiss cheese model of safety, provides multilayered protection, aiding in the creation of ML systems that are not only accurate and efficient but also safe and trustworthy for deployment in critical applications.