Deception and Manipulation in Generative AI (2401.11335v1)

Published 20 Jan 2024 in cs.CY

Abstract: LLMs now possess human-level linguistic abilities in many contexts. This raises the concern that they can be used to deceive and manipulate on unprecedented scales, for instance spreading political misinformation on social media. In future, agentic AI systems might also deceive and manipulate humans for their own ends. In this paper, first, I argue that AI-generated content should be subject to stricter standards against deception and manipulation than we ordinarily apply to humans. Second, I offer new characterizations of AI deception and manipulation meant to support such standards, according to which a statement is deceptive (manipulative) if it leads human addressees away from the beliefs (choices) they would endorse under "semi-ideal" conditions. Third, I propose two measures to guard against AI deception and manipulation, inspired by this characterization: "extreme transparency" requirements for AI-generated content and defensive systems that, among other things, annotate AI-generated statements with contextualizing information. Finally, I consider to what extent these measures can protect against deceptive behavior in future, agentic AIs, and argue that non-agentic defensive systems can provide an important layer of defense even against more powerful agentic systems.
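
To make the two proposed safeguards more concrete, below is a minimal, purely illustrative Python sketch of how a provenance label ("extreme transparency") and a defensive annotation step might be combined before AI-generated text reaches a human reader. The Statement class, the retrieve_context helper, and the disclosure wording are hypothetical stand-ins introduced here for illustration; they are not specified in the paper.

# Illustrative sketch only, not the paper's implementation. A real defensive
# system would query fact-checking or retrieval services in retrieve_context().
from dataclasses import dataclass, field


@dataclass
class Statement:
    text: str
    ai_generated: bool                      # provenance flag ("extreme transparency")
    annotations: list[str] = field(default_factory=list)


def retrieve_context(text: str) -> list[str]:
    """Hypothetical stand-in for a step that gathers contextualizing
    information (background facts, counter-evidence) for a claim."""
    return ["No independent source currently corroborates this claim."]


def annotate(statement: Statement) -> Statement:
    """Defensive-system sketch: disclose AI provenance and attach
    contextualizing notes before the statement is shown to a reader."""
    if statement.ai_generated:
        statement.annotations.append("Disclosure: this statement was AI-generated.")
        statement.annotations.extend(retrieve_context(statement.text))
    return statement


if __name__ == "__main__":
    s = annotate(Statement("Candidate X was convicted of fraud in 2019.", ai_generated=True))
    print(s.text)
    for note in s.annotations:
        print(" -", note)

The design choice illustrated here matches the paper's closing point: the annotator is itself non-agentic, so it can serve as a defensive layer even against more capable agentic systems.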

Authors (1)
  1. Christian Tarsney