Evaluation of LLMs for Election Disinformation Operations
The paper introduces DisElect, a dataset designed to evaluate whether LLMs comply with, and how effectively they execute, prompts for generating election disinformation content. It investigates the capability of LLMs to produce content that could be used in election disinformation operations, addressing both the technical and social dimensions of such misuse. The work is motivated by the observation that LLMs are increasingly accessible and could be leveraged for malicious purposes, such as undermining democratic institutions.
Key Findings
The research consists of two primary components:
- DisElect Evaluation Dataset: This dataset measures LLM compliance with prompts requesting election disinformation. It covers multiple stages of a disinformation operation, including news article generation, social media account and content creation, and reply generation, across two use cases: hyperlocalized voting disinformation and fictitious claims about MPs (a minimal sketch of a compliance harness over such prompts follows this list).
- Human Experiments: These experiments measure the perceived authenticity of LLM-generated disinformation content. Participants were tasked with distinguishing between human-written and AI-generated content across different stages of the disinformation pipeline.
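To make the compliance evaluation concrete, the following Python sketch shows one plausible way such a harness could be wired up. The record fields, the `generate` wrapper, and the `is_refusal` classifier are illustrative assumptions, not the paper's actual implementation or the DisElect schema.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical record structure for a DisElect-style prompt; field names are
# illustrative assumptions, not the actual dataset schema.
@dataclass
class EvalPrompt:
    prompt_id: str
    use_case: str    # e.g. "hyperlocal_voting" or "fictitious_mp_claims"
    stage: str       # e.g. "news_article", "account_profile", "post", "reply"
    is_benign: bool  # benign control prompt vs. disinformation prompt
    text: str

def evaluate_compliance(prompts: list[EvalPrompt],
                        generate: Callable[[str], str],
                        is_refusal: Callable[[str], bool]) -> dict[str, float]:
    """Run each prompt through a model and compute refusal rates per pipeline stage.

    `generate` wraps the model under test; `is_refusal` flags refusal responses
    (e.g. keyword heuristics or an LLM judge).
    """
    outcomes: dict[str, list[bool]] = {}
    for p in prompts:
        response = generate(p.text)
        outcomes.setdefault(p.stage, []).append(is_refusal(response))
    return {stage: sum(flags) / len(flags) for stage, flags in outcomes.items()}
```

How refusals are detected is itself a design choice (keyword heuristics versus an LLM judge), and measured refusal rates can shift depending on that choice.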
LLM Compliance with Disinformation Prompts
Results on the DisElect dataset indicate that LLM refusal rates for generating disinformation content are generally low. Only newer models, such as Llama 2, Gemma, and Gemini 1.0 Pro, exhibited notable refusal rates. These models also tended to refuse right-wing framings and even benign election-related prompts, suggesting over-sensitive refusal mechanisms that could impede non-malicious uses of LLMs.
- Refusal Rates: The paper finds that models which refuse disinformation prompts also refuse benign election prompts at similar rates, indicating a significant challenge in aligning LLMs to distinguish malicious from benign use cases.
- Factors Influencing Refusal: Right-wing framings faced higher refusal rates, and refusal varied with the specific MP targeted, with higher rates typically observed for female MPs and Labour MPs (see the grouping sketch after this list).
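As an illustration of how such breakdowns could be computed, the sketch below groups refusal outcomes by prompt framing or by attributes of the targeted MP. The records and values are invented for the example and are not results from the paper.

```python
from collections import defaultdict

# Invented refusal records for illustration only: (model, prompt framing,
# targeted MP's party, refused). These are not results from the paper.
records = [
    ("gemma",          "right-wing", "Labour",       True),
    ("gemma",          "left-wing",  "Conservative", False),
    ("gemini-1.0-pro", "right-wing", "Labour",       True),
    ("gemini-1.0-pro", "right-wing", "Conservative", False),
]

def refusal_rate_by(rows: list[tuple], key_index: int) -> dict[str, float]:
    """Group refusal outcomes by one attribute and return the refusal rate per group."""
    groups: dict[str, list[bool]] = defaultdict(list)
    for row in rows:
        groups[row[key_index]].append(row[-1])
    return {group: sum(flags) / len(flags) for group, flags in groups.items()}

print(refusal_rate_by(records, key_index=1))  # refusal rate by prompt framing
print(refusal_rate_by(records, key_index=2))  # refusal rate by targeted MP's party
```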
Human Perception of AI-Generated Content
The human experiments assessed the "humanness" of LLM-generated disinformation content. The paper finds that most models released after 2022 produced content that participants judged to be human-written more than 50% of the time. Notably, the Llama 3 and Gemini models scored highest, often exceeding the humanness scores of genuinely human-written content.
- Experiment Results: In the two experiments on MP disinformation, participants struggled to distinguish AI-generated content from human-written content, and several models achieved high perceived authenticity. By contrast, content for the hyperlocalized voting use case was more readily identified as AI-generated, yielding lower humanness scores.
- Pipeline Stage and Humanness: Humanness varied across stages of the content pipeline: social media reactions and replies scored higher than the account creation and news article generation stages (a sketch of one way to compute such a score follows this list).
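The "humanness" measure can be thought of as the share of participant judgements that labelled a piece of content as human-written. The sketch below shows one plausible way to compute and compare it; the judgement data are invented for illustration, and the scoring function is an assumption rather than the paper's exact methodology.

```python
def humanness_score(judgements: list[bool]) -> float:
    """Share of participant judgements that rated a piece of content as human-written."""
    return sum(judgements) / len(judgements)

# Invented judgements for illustration (True = "written by a human");
# not data from the paper's experiments.
model_judgements = [True, True, False, True, True, False, True]
human_baseline   = [True, False, True, True, False, True, False]

# A model whose score exceeds the human-written baseline is, in this sense,
# perceived as "more human than human" by the participants.
print(humanness_score(model_judgements))  # ~0.71
print(humanness_score(human_baseline))    # ~0.57
```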
Implications and Future Directions
The paper underscores how LLMs' capability to generate realistic disinformation content could be exploited in information operations. The findings suggest several avenues for future research:
- Red-Teaming and Prompt Engineering: The experiments used straightforward prompts without adversarial red-teaming or elaborate prompt engineering; future work could apply more sophisticated techniques to better understand the limits and capabilities of LLMs in disinformation contexts.
- Multimodal Disinformation: Future work could explore generative AI's ability to produce content beyond text, such as audio and video deepfakes, to understand the full scope of AI-driven disinformation.
- Socio-Technological Measures: Developing systemic measures to mitigate AI-generated disinformation involves not only technological fixes but also increasing AI literacy among the public to recognize such content.
- Model Transparency and Monitoring: Open-source and proprietary model providers need mechanisms to identify and prohibit misuse effectively without significantly impacting the utility of models for benign applications.
Ethical Considerations
The paper also addresses the ethical aspects of its research. Participants were clearly informed that the content was fictional, and their anonymity was preserved. The paper further highlights the need to balance AI utility against potential for harm, calling for both technical and societal measures in comprehensive AI safety evaluations.
Conclusion
This paper provides a nuanced examination of the capabilities and limitations of LLMs in the context of election disinformation. It highlights critical factors influencing the compliance and effectiveness of LLMs while offering insights into potential future research directions and the broader implications for AI regulation and media literacy. The DisElect dataset and the accompanying findings offer valuable tools and information for ongoing and future efforts to mitigate the risks of AI-generated disinformation.