Assessing LLMs' Ability to Detect Persuasive Arguments
Introduction
The advent of LLMs has opened new possibilities for creating and disseminating customized persuasive content, raising significant concerns about the mass production of misinformation and propaganda. In response to these concerns, this paper investigates LLMs' capabilities in recognizing convincing arguments, predicting stance shifts from demographic and belief attributes, and gauging the appeal of arguments to distinct individuals. Extending a dataset from Durmus and Cardie (2018), the research addresses these dimensions through three principal questions aimed at understanding how well LLMs can reason about persuasive content and what this implies for the generation of targeted misinformation.
Methodology
The paper builds on an enriched dataset from debate.org in which 833 political debates are annotated with propositions, arguments, votes, and voter demographics and beliefs. It evaluates the performance of four LLMs (GPT-3.5, GPT-4, Llama-2, and Mistral 7B) on tasks designed to measure the models' ability to discern argument quality (RQ1), predict pre-debate stances from demographics and beliefs (RQ2), and anticipate post-debate stance shifts (RQ3). The analysis compares the LLMs against human benchmarks and explores whether combining predictions from different models allows them to surpass human performance.
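To make the RQ1 task concrete, the sketch below shows one way to pose the pairwise "which argument is more convincing?" judgment to a chat model. It assumes an OpenAI-style Python client; the prompt wording, model name, and answer parsing are illustrative assumptions, not the paper's actual evaluation protocol.

```python
# Minimal sketch of the RQ1 pairwise judgment task (illustrative only,
# not the paper's exact prompt or evaluation harness).
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment


def more_convincing(proposition: str, argument_a: str, argument_b: str) -> str:
    """Ask the model which of two arguments on a proposition is more convincing.

    Returns "A" or "B". The prompt format and label parsing are hypothetical;
    a real evaluation would also randomize argument order to control for
    position bias.
    """
    prompt = (
        f"Proposition: {proposition}\n\n"
        f"Argument A: {argument_a}\n\n"
        f"Argument B: {argument_b}\n\n"
        "Which argument is more convincing? Answer with exactly one letter: A or B."
    )
    response = client.chat.completions.create(
        model="gpt-4",  # placeholder; swap in whichever model is under test
        messages=[{"role": "user", "content": prompt}],
        temperature=0,  # deterministic judgments for evaluation
    )
    answer = response.choices[0].message.content.strip().upper()
    return "A" if answer.startswith("A") else "B"
```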
Findings
- Argument Quality Recognition (RQ1): GPT-4 clearly outperformed the other models at identifying the more convincing argument, reaching accuracy comparable to that of individual human judges in the dataset.
- Stance Prediction (RQ2 and RQ3): On the tasks of predicting stances before and after exposure to the debates, LLMs performed on par with humans. In particular, a 'stacked' model that combined the predictions of the different LLMs showed a notable improvement, suggesting the value of multi-model approaches for boosting prediction accuracy (see the stacking sketch after this list).
- Comparison with Supervised Learning Models: Despite the LLMs' strong showing, supervised machine learning models such as XGBoost still predicted stances from demographic and belief indicators more accurately, underscoring that LLMs still have room to improve as predictors (a comparable baseline sketch follows the stacking example below).
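The following sketch illustrates the stacking idea behind the multi-model result: per-LLM stance predictions become the input features of a simple meta-classifier. The data here is synthetic and the model accuracies are arbitrary placeholders; the paper's exact stacking setup may differ.

```python
# Illustrative stacking of stance predictions from several LLMs.
# All data below is synthetic; each column of llm_preds stands for one
# model's predicted stance (0 = Con, 1 = Pro), y holds the true stances.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n_voters = 500
y = rng.integers(0, 2, size=n_voters)  # true stances (synthetic)

# Simulate four models with different, arbitrary accuracies.
llm_preds = np.column_stack(
    [np.where(rng.random(n_voters) < acc, y, 1 - y) for acc in (0.75, 0.65, 0.60, 0.60)]
)

X_train, X_test, y_train, y_test = train_test_split(
    llm_preds, y, test_size=0.3, random_state=0
)

# The meta-classifier learns how much weight to give each model's vote.
stacker = LogisticRegression().fit(X_train, y_train)
print("stacked accuracy:", accuracy_score(y_test, stacker.predict(X_test)))
```

The design choice is deliberately simple: because the base features are already discrete predictions, a linear meta-classifier is usually enough to learn per-model trust weights.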
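For comparison, a supervised baseline of the kind mentioned above can be trained directly on voter attributes. The sketch below fits an XGBoost classifier on a hypothetical feature table; the file name, column names, and encoding are assumptions for illustration, not the paper's feature set.

```python
# Illustrative XGBoost baseline predicting pre-debate stance from
# demographic and belief features (hypothetical columns, not the paper's).
import pandas as pd
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier

# "voters.csv" is a placeholder: one row per voter, with categorical
# attributes and a binary pre-debate stance label (0 = Con, 1 = Pro).
voters = pd.read_csv("voters.csv")
X = pd.get_dummies(
    voters[["political_ideology", "religious_ideology", "education", "gender"]]
)
y = voters["pre_debate_stance"]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

model = XGBClassifier(n_estimators=200, max_depth=4, learning_rate=0.1)
model.fit(X_train, y_train)
print("XGBoost accuracy:", accuracy_score(y_test, model.predict(X_test)))
```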
Implications
This paper elucidates the nuanced capabilities of LLMs in processing and evaluating persuasive content, showing that these models can replicate human-like performance in specific contexts. The findings underscore the realistic potential for LLMs to contribute to the generation of targeted misinformation, especially as their predictive accuracies improve through techniques like model stacking. However, the comparison with traditional supervised learning models highlights the current limitations of LLMs in fully understanding the complexities of human beliefs and demographic influences on persuasion.
Future Directions
Given the dynamic nature of LLM development, continuous evaluation of their capabilities in recognizing and generating persuasive content is imperative. Future research should explore more granular demographic and psychographic variables, potentially offering richer insights into the models' abilities to tailor persuasive content more effectively. Moreover, expanding the scope of analysis to include non-English language debates could provide a more global perspective on LLMs' roles in international misinformation campaigns.
Conclusion
While LLMs demonstrate promising abilities in detecting convincing arguments and predicting stance shifts, their performance on these tasks currently sits at roughly human level. The potential for these models to enable sophisticated, targeted misinformation campaigns necessitates ongoing scrutiny and rigorous evaluation of both their capabilities and limitations. As LLMs continue to evolve, their role in shaping public discourse and influence strategies will warrant closer examination and proactive management.