
Understanding and Avoiding AI Failures: A Practical Guide (2104.12582v4)

Published 22 Apr 2021 in cs.CY and cs.AI

Abstract: As AI technologies increase in capability and ubiquity, AI accidents are becoming more common. Based on normal accident theory, high reliability theory, and open systems theory, we create a framework for understanding the risks associated with AI applications. In addition, we use AI safety principles to quantify the unique risks of increased intelligence and human-like qualities in AI. Together, these two fields give a more complete picture of the risks of contemporary AI. By focusing on system properties near accidents instead of seeking a single root cause, we identify where attention should be paid to safety in current-generation AI systems.
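
As a rough illustration only, the sketch below scores a hypothetical system along the two families of risk properties the abstract names: accident-theory factors (interactive complexity, tight coupling) and AI-safety factors (intelligence, human-like qualities). The dimension names, rating scale, and aggregation rule are assumptions for illustration, not the paper's actual framework.

```python
# Hypothetical sketch (not the authors' implementation): rating an AI system
# on accident-theory properties and AI-safety-specific properties, then
# combining both lenses into a single attention-worthiness score.
from dataclasses import dataclass


@dataclass
class SystemRiskProfile:
    """Ratings in [0, 1]; the dimension names are illustrative assumptions."""
    interactive_complexity: float  # potential for unexpected component interactions
    tight_coupling: float          # little slack or time to intervene between steps
    intelligence: float            # capability to pursue goals in novel ways
    human_likeness: float          # degree of anthropomorphic / human-surrogate role

    def accident_theory_risk(self) -> float:
        # Normal-accident-theory view: risk concentrates where a system is
        # both interactively complex and tightly coupled.
        return self.interactive_complexity * self.tight_coupling

    def ai_safety_risk(self) -> float:
        # AI-safety view: risk grows with capability and human-like qualities.
        return max(self.intelligence, self.human_likeness)

    def combined_risk(self) -> float:
        # Combine both lenses for a more complete picture of where safety
        # attention is needed; max() aggregation is an arbitrary choice here.
        return max(self.accident_theory_risk(), self.ai_safety_risk())


if __name__ == "__main__":
    chatbot = SystemRiskProfile(
        interactive_complexity=0.6,
        tight_coupling=0.3,
        intelligence=0.7,
        human_likeness=0.9,
    )
    print(f"combined risk score: {chatbot.combined_risk():.2f}")
```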
