
Truthful AI: Developing and governing AI that does not lie (2110.06674v1)

Published 13 Oct 2021 in cs.CY, cs.AI, and cs.CL

Abstract: In many contexts, lying -- the use of verbal falsehoods to deceive -- is harmful. While lying has traditionally been a human affair, AI systems that make sophisticated verbal statements are becoming increasingly prevalent. This raises the question of how we should limit the harm caused by AI "lies" (i.e. falsehoods that are actively selected for). Human truthfulness is governed by social norms and by laws (against defamation, perjury, and fraud). Differences between AI and humans present an opportunity to have more precise standards of truthfulness for AI, and to have these standards rise over time. This could provide significant benefits to public epistemics and the economy, and mitigate risks of worst-case AI futures. Establishing norms or laws of AI truthfulness will require significant work to: (1) identify clear truthfulness standards; (2) create institutions that can judge adherence to those standards; and (3) develop AI systems that are robustly truthful. Our initial proposals for these areas include: (1) a standard of avoiding "negligent falsehoods" (a generalisation of lies that is easier to assess); (2) institutions to evaluate AI systems before and after real-world deployment; and (3) explicitly training AI systems to be truthful via curated datasets and human interaction. A concerning possibility is that evaluation mechanisms for eventual truthfulness standards could be captured by political interests, leading to harmful censorship and propaganda. Avoiding this might take careful attention. And since the scale of AI speech acts might grow dramatically over the coming decades, early truthfulness standards might be particularly important because of the precedents they set.

Analyzing the Development and Governance of Truthful AI Systems

The paper "Truthful AI: Developing and Governing AI that does not lie," authored by researchers from the University of Oxford and OpenAI, presents a comprehensive examination of the need, challenges, and methodologies for developing AI systems focused on truthfulness. The paper explores the implications of AI truthfulness, proposing standards that could govern the behavior of AI systems to avoid generating falsehoods. This discussion is increasingly relevant as AI systems, like GPT-3, gain linguistic competence and influence.

Overview of AI Truthfulness Needs

AI systems that generate language and interact with humans or other systems present new challenges for truthfulness. While these systems can mimic human conversation, they may not inherently prioritize truth, which could lead to the dissemination of misleading or harmful information. The paper outlines the mechanisms by which humans regulate truthfulness (laws, social norms, and market forces) and why these may not apply seamlessly to AI. AI systems, after all, lack intent and moral culpability, prompting the need for new standards.

Designing Truthful AI Systems

The paper proposes several methodologies to encourage AI truthfulness:

  • Language-Model Adjustments: Training AI systems on curated datasets that emphasize factual accuracy and filter out unreliable sources could be foundational. Additionally, integrating retrieval mechanisms that draw on trusted sources could mitigate the propagation of false information.
  • Reinforcement Learning: Tailoring reinforcement learning to emphasize truthfulness, with feedback centered on truth evaluations rather than engagement metrics, could foster more honest AI behavior (see the sketch after this list).
  • Transparency and Explainability: Making systems' decision-making processes interpretable, possibly by leveraging adversarial training and transparency tools, could support more trustworthy outcomes. Transparent AI could, in turn, yield more robust truthfulness and closer alignment with human values.
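
To make the reinforcement-learning idea concrete, the sketch below shows how a reward signal could be centered on truth evaluations (here framed around the paper's "negligent falsehood" standard) rather than engagement. It is a minimal illustration under assumed interfaces; the `Evaluation` dataclass and `truthfulness_reward` function are hypothetical names, not an implementation described in the paper.

```python
# Hypothetical sketch: shaping an RL fine-tuning reward around truthfulness
# evaluations instead of engagement metrics. Names are illustrative only.
from dataclasses import dataclass


@dataclass
class Evaluation:
    is_negligent_falsehood: bool  # the paper's proposed standard
    confidence: float             # evaluator confidence in [0, 1]


def truthfulness_reward(evaluation: Evaluation, engagement: float) -> float:
    """Penalise negligent falsehoods heavily; deliberately ignore engagement
    so it cannot dominate the training signal."""
    if evaluation.is_negligent_falsehood:
        return -1.0 * evaluation.confidence
    # Small positive reward for statements judged truthful.
    return 0.1 * evaluation.confidence


# A confidently flagged falsehood is punished even if it would be engaging.
print(truthfulness_reward(Evaluation(True, 0.9), engagement=5.0))   # -0.9
print(truthfulness_reward(Evaluation(False, 0.8), engagement=5.0))  # ~0.08
```

The design choice worth noting is that engagement is accepted as an input but never enters the reward, mirroring the paper's point that feedback should be anchored in truth evaluations rather than in what users find compelling.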

Governance Structures

The paper contemplates the governance structures that could enforce AI truthfulness standards. These include industry-led regulations, certification bodies, and legal frameworks. Such structures would need to assess AI systems' truthfulness pre- and post-deployment, potentially using automated and human evaluations.
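
As one way to picture the pre-deployment assessment mentioned above, the following sketch audits a model's statements with an automated check and escalates low-confidence cases to human reviewers. All component names (`model`, `auto_check`, `human_review`) are stand-ins for pieces the paper leaves unspecified.

```python
# Hypothetical pre-deployment truthfulness audit: automated checks first,
# human review for low-confidence cases. Interfaces are assumptions.
from typing import Callable, List


def audit(model: Callable[[str], str],
          prompts: List[str],
          auto_check: Callable[[str], float],
          human_review: Callable[[str], bool],
          threshold: float = 0.9) -> dict:
    """Return a summary report of flagged negligent falsehoods."""
    flagged, reviewed = [], 0
    for prompt in prompts:
        statement = model(prompt)
        score = auto_check(statement)  # estimated probability the claim is true
        if score < threshold:
            reviewed += 1
            if not human_review(statement):
                flagged.append((prompt, statement))
    return {
        "total": len(prompts),
        "human_reviewed": reviewed,
        "negligent_falsehoods": len(flagged),
        "pass": len(flagged) == 0,
    }


# Toy usage with stub components (illustration only):
report = audit(
    model=lambda p: f"Answer to: {p}",
    prompts=["Is the Earth round?"],
    auto_check=lambda s: 0.95,   # pretend the claim checks out
    human_review=lambda s: True,
)
print(report)
```

A similar loop run on post-deployment samples would correspond to the ongoing evaluation the paper envisions for certification bodies.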

Potential Challenges and Considerations

Developing truthful AI standards comes with its own set of challenges. A primary concern is the potential misuse or political capture of truth-determining bodies, which could lead to biased or censored AI systems. Safeguards would be crucial to preserve the independence of such bodies and to ensure truthfulness standards scale across domains.

The cost of compliance and the practicality of enforcing these truthfulness standards are also considerable hurdles. However, the potential for AI to mislead on a massive scale justifies the need for foundational truthfulness principles.

Implications for Future AI Development

Ultimately, the paper suggests that fostering truthful AI systems produces societal benefits, including increased trust in and reliability of AI technologies. This trust could drive economic growth through improved decision-making, reduced deception, and enhanced cooperation. Connecting truthfulness work with transparency and AI alignment research could further secure AI's alignment with human interests, reducing existential risks.

The paper underscores the urgency and importance of exploring these domains, reasoning that optimal and scalable solutions may have far-reaching impacts as AI continues to evolve, potentially influencing a majority of human communication. The research not only addresses practical implementation but also lays the groundwork for philosophical and ethical considerations in the design of advanced AI systems that are genuinely aligned with human values.

Authors (8)
  1. Owain Evans (28 papers)
  2. Owen Cotton-Barratt (2 papers)
  3. Lukas Finnveden (4 papers)
  4. Adam Bales (2 papers)
  5. Avital Balwit (3 papers)
  6. Peter Wills (6 papers)
  7. Luca Righetti (1 paper)
  8. William Saunders (9 papers)
Citations (103)