Evaluation of LLMs for Election Disinformation Operations
The paper introduces DisElect, a dataset designed to evaluate whether LLMs comply with, and how effectively they execute, prompts for generating election disinformation content. It investigates the capability of LLMs to produce content that could be used in election disinformation operations, addressing both the technical and social dimensions of such misuse. The work is motivated by the observation that LLMs are increasingly accessible and could be leveraged for malicious purposes, such as undermining democratic institutions.
Key Findings
The research consists of two primary components:
- DisElect Evaluation Dataset: This dataset measures LLM compliance with prompts requesting election disinformation. It covers multiple stages of a disinformation operation, including news article generation, social media account and content creation, and reply generation, across two use cases: hyperlocalized voting disinformation and fictitious claims about MPs (a minimal sketch of a compliance harness over such prompts follows this list).
- Human Experiments: These experiments measure the perceived authenticity of LLM-generated disinformation content. Participants were tasked with distinguishing between human-written and AI-generated content across different stages of the disinformation pipeline.
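To make the compliance evaluation concrete, the following Python sketch shows one plausible way such a harness could be wired up. The record fields, the `generate` wrapper, and the `is_refusal` classifier are illustrative assumptions, not the paper's actual implementation or the DisElect schema.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical record structure for a DisElect-style prompt; field names are
# illustrative assumptions, not the actual dataset schema.
@dataclass
class EvalPrompt:
    prompt_id: str
    use_case: str    # e.g. "hyperlocal_voting" or "fictitious_mp_claims"
    stage: str       # e.g. "news_article", "account_profile", "post", "reply"
    is_benign: bool  # benign control prompt vs. disinformation prompt
    text: str

def evaluate_compliance(prompts: list[EvalPrompt],
                        generate: Callable[[str], str],
                        is_refusal: Callable[[str], bool]) -> dict[str, float]:
    """Run each prompt through a model and compute refusal rates per pipeline stage.

    `generate` wraps the model under test; `is_refusal` flags refusal responses
    (e.g. keyword heuristics or an LLM judge).
    """
    outcomes: dict[str, list[bool]] = {}
    for p in prompts:
        response = generate(p.text)
        outcomes.setdefault(p.stage, []).append(is_refusal(response))
    return {stage: sum(flags) / len(flags) for stage, flags in outcomes.items()}
```

How refusals are detected is itself a design choice (keyword heuristics versus an LLM judge), and measured refusal rates can shift depending on that choice.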
LLM Compliance with Disinformation Prompts
Results on the DisElect dataset indicate that LLM refusal rates for generating disinformation content are generally low. Only newer models, such as Llama 2, Gemma, and Gemini 1.0 Pro, exhibited notable refusal rates. These models also tended to refuse right-wing framings and even benign election-related prompts, suggesting over-sensitive refusal mechanisms that could impede non-malicious uses of LLMs.
- Refusal Rates: The paper finds that models which refuse disinformation prompts also refuse benign election prompts at similar rates, indicating a significant challenge in aligning LLMs to distinguish malicious from benign use cases.
- Factors Influencing Refusal: Right-wing framings faced higher refusal rates, and refusal varied with the specific MP targeted, with higher rates typically observed for female MPs and Labour MPs (see the grouping sketch after this list).
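As an illustration of how such breakdowns could be computed, the sketch below groups refusal outcomes by prompt framing or by attributes of the targeted MP. The records and values are invented for the example and are not results from the paper.

```python
from collections import defaultdict

# Invented refusal records for illustration only: (model, prompt framing,
# targeted MP's party, refused). These are not results from the paper.
records = [
    ("gemma",          "right-wing", "Labour",       True),
    ("gemma",          "left-wing",  "Conservative", False),
    ("gemini-1.0-pro", "right-wing", "Labour",       True),
    ("gemini-1.0-pro", "right-wing", "Conservative", False),
]

def refusal_rate_by(rows: list[tuple], key_index: int) -> dict[str, float]:
    """Group refusal outcomes by one attribute and return the refusal rate per group."""
    groups: dict[str, list[bool]] = defaultdict(list)
    for row in rows:
        groups[row[key_index]].append(row[-1])
    return {group: sum(flags) / len(flags) for group, flags in groups.items()}

print(refusal_rate_by(records, key_index=1))  # refusal rate by prompt framing
print(refusal_rate_by(records, key_index=2))  # refusal rate by targeted MP's party
```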
Human Perception of AI-Generated Content
The human experiments assessed the "humanness" of LLM-generated disinformation content. The paper finds that most models released after 2022 produced content that participants judged to be human-written more than 50% of the time. Notably, the Llama 3 and Gemini models scored highest, often exceeding the humanness scores of genuinely human-written content.
- Experiment Results: In the two experiments on MP disinformation, participants struggled to distinguish AI-generated content from human-written content, and several models achieved high perceived authenticity. By contrast, content for the hyperlocalized voting use case was more readily identified as AI-generated, yielding lower humanness scores.
- Pipeline Stage and Humanness: Humanness varied across stages of the content pipeline: social media reactions and replies scored higher than the account creation and news article generation stages (a sketch of one way to compute such a score follows this list).
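The "humanness" measure can be thought of as the share of participant judgements that labelled a piece of content as human-written. The sketch below shows one plausible way to compute and compare it; the judgement data are invented for illustration, and the scoring function is an assumption rather than the paper's exact methodology.

```python
def humanness_score(judgements: list[bool]) -> float:
    """Share of participant judgements that rated a piece of content as human-written."""
    return sum(judgements) / len(judgements)

# Invented judgements for illustration (True = "written by a human");
# not data from the paper's experiments.
model_judgements = [True, True, False, True, True, False, True]
human_baseline   = [True, False, True, True, False, True, False]

# A model whose score exceeds the human-written baseline is, in this sense,
# perceived as "more human than human" by the participants.
print(humanness_score(model_judgements))  # ~0.71
print(humanness_score(human_baseline))    # ~0.57
```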
Implications and Future Directions
The paper underscores how LLMs' capability to generate realistic disinformation content could be exploited in information operations. The findings suggest several avenues for future research:
- Red-Teaming and Prompt Engineering: The experiments used straightforward prompts without adversarial red-teaming or elaborate prompt engineering; future work could apply more sophisticated techniques to better understand the limits and capabilities of LLMs in disinformation contexts.
- Multimodal Disinformation: Future work could explore generative AI's ability to produce content beyond text, such as audio and video deepfakes, to understand the full scope of AI-driven disinformation.
- Socio-Technological Measures: Developing systemic measures to mitigate AI-generated disinformation involves not only technological fixes but also increasing AI literacy among the public to recognize such content.
- Model Transparency and Monitoring: Open-source and proprietary model providers need mechanisms to identify and prohibit misuse effectively without significantly impacting the utility of models for benign applications.
Ethical Considerations
The paper also addresses the ethical aspects of its research. Participants were clearly informed that the content was fictional, and their anonymity was preserved. The paper further highlights the need to balance AI utility against potential for harm, calling for both technical and societal measures in comprehensive AI safety evaluations.
Conclusion
This paper provides a nuanced examination of the capabilities and limitations of LLMs in the context of election disinformation. It highlights critical factors influencing the compliance and effectiveness of LLMs while offering insights into potential future research directions and the broader implications for AI regulation and media literacy. The DisElect dataset and the accompanying findings offer valuable tools and information for ongoing and future efforts to mitigate the risks of AI-generated disinformation.