Understanding and Avoiding AI Failures: A Practical Guide (2104.12582v4)
Heather M. Williams, Roman V. Yampolskiy
Published 22 Apr 2021 in cs.CY and cs.AI
Abstract: As AI technologies increase in capability and ubiquity, AI accidents are becoming more common. Drawing on normal accident theory, high reliability theory, and open systems theory, we create a framework for understanding the risks associated with AI applications. We also use AI safety principles to quantify the unique risks posed by increased intelligence and human-like qualities in AI. Together, these two fields give a more complete picture of the risks of contemporary AI. By focusing on the system properties surrounding accidents rather than seeking a single root cause, we identify where attention should be paid to safety in current-generation AI systems.
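To make the abstract's notion of "quantifying" risk concrete, the sketch below scores a system along the two families of factors the abstract names: normal-accident-theory properties (coupling, interactive complexity) and AI-specific properties (intelligence, human-likeness). This is a minimal illustrative toy, not the authors' method; the 0-to-1 scales, equal weights, and combination rule are all assumptions made for demonstration.

```python
# Hypothetical risk-profile sketch. The dimensions come from the abstract;
# the scales, weights, and formula are illustrative assumptions only.
from dataclasses import dataclass


@dataclass
class AISystemProfile:
    coupling: float        # 0-1: how tightly subsystem failures propagate
    complexity: float      # 0-1: degree of opaque, interactive complexity
    intelligence: float    # 0-1: capability/autonomy of the AI component
    human_likeness: float  # 0-1: anthropomorphic qualities that invite misplaced trust


def composite_risk(p: AISystemProfile) -> float:
    """Toy composite: accident-theory risk grows with coupling x complexity,
    AI-specific risk with intelligence x human-likeness (assumed equal weights)."""
    accident_risk = p.coupling * p.complexity
    ai_specific_risk = p.intelligence * p.human_likeness
    return 0.5 * accident_risk + 0.5 * ai_specific_risk


if __name__ == "__main__":
    thermostat = AISystemProfile(coupling=0.2, complexity=0.1,
                                 intelligence=0.05, human_likeness=0.0)
    chatbot = AISystemProfile(coupling=0.4, complexity=0.7,
                              intelligence=0.6, human_likeness=0.9)
    print(f"thermostat risk: {composite_risk(thermostat):.2f}")
    print(f"chatbot risk:    {composite_risk(chatbot):.2f}")
```

On this toy scale a simple, loosely coupled controller scores far lower than a complex, human-like conversational agent, mirroring the abstract's claim that intelligence and human-like qualities add risks beyond those captured by classical accident theory.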