Resistance Against Manipulative AI: key factors and possible actions (2404.14230v2)

Published 22 Apr 2024 in cs.HC

Abstract: If AI is the new electricity, what should we do to keep ourselves from getting electrocuted? In this work, we explore factors related to the potential of LLMs to manipulate human decisions. We describe the results of two experiments designed to determine which characteristics of humans are associated with their susceptibility to LLM manipulation, and which characteristics of LLMs are associated with their potential for manipulativeness. We explore human factors by conducting user studies in which participants answer general knowledge questions using LLM-generated hints, and LLM factors by provoking LLMs to create manipulative statements. We then analyze the models' obedience, the persuasion strategies they use, and their choice of vocabulary. Based on these experiments, we discuss two actions that can protect us from LLM manipulation. In the long term, we put AI literacy at the forefront, arguing that educating society would minimize the risk of manipulation and its consequences. We also propose an ad hoc solution: a classifier that detects manipulation by LLMs, a Manipulation Fuse.
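
The abstract describes the Manipulation Fuse only at a high level, as a classifier that flags manipulative LLM output. The sketch below is a minimal illustration of that idea under one possible reading, not the authors' implementation: it assumes a supervised binary text-classification setup, and the example texts, labels, and the `check` helper are all hypothetical placeholders.

```python
# Illustrative sketch of a "Manipulation Fuse": a binary classifier that
# flags an LLM statement as manipulative (1) or not (0). The paper does
# not publish its implementation; this uses a simple TF-IDF + logistic
# regression baseline, and the training data below is a placeholder.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical labeled examples; a real fuse would be trained on a corpus
# of manipulative vs. honest LLM statements.
texts = [
    "Everyone already agrees, so you should pick option A now.",  # social proof + urgency
    "Option A is consistent with the published census data.",     # neutral, verifiable claim
    "Only a fool would doubt this answer.",                        # shaming pressure
    "I am not certain; please verify this against the source.",   # neutral, hedged
]
labels = [1, 0, 1, 0]  # 1 = manipulative, 0 = not

fuse = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
fuse.fit(texts, labels)

def check(statement: str) -> bool:
    """Return True if the statement is flagged as manipulative."""
    return bool(fuse.predict([statement])[0])

print(check("Trust me, everyone knows this is the right answer."))
```

In practice such a fuse would sit between the LLM and the user, screening each generated statement before display; the bag-of-words baseline here merely stands in for whatever model the authors trained.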
