
Unveiling Online Conspiracy Theorists: a Text-Based Approach and Characterization (2405.12566v1)

Published 21 May 2024 in cs.SI and cs.CY

Abstract: In today's digital landscape, the proliferation of conspiracy theories within the disinformation ecosystem of online platforms represents a growing concern. This paper delves into the complexities of this phenomenon. We conducted a comprehensive analysis of two distinct X (formerly known as Twitter) datasets: one comprising users with conspiracy theorizing patterns and another made of users lacking such tendencies and thus serving as a control group. The distinguishing factors between these two groups are explored across three dimensions: emotions, idioms, and linguistic features. Our findings reveal marked differences in the lexicon and language adopted by conspiracy theorists with respect to other users. We developed a machine learning classifier capable of identifying users who propagate conspiracy theories based on a rich set of 871 features. The results demonstrate high accuracy, with an average F1 score of 0.88. Moreover, this paper unveils the most discriminating characteristics that define conspiracy theory propagators.

Authors (4)
  1. Alessandra Recordare (1 paper)
  2. Guglielmo Cola (3 papers)
  3. Tiziano Fagni (9 papers)
  4. Maurizio Tesconi (31 papers)
Citations (1)

Summary

Profiling Online Conspiracy Theorists: A Text-Based Approach

Goals of the Study

This research aims to profile online users who propagate conspiracy theories by analyzing the text of their tweets. The authors worked with two groups: conspiracy theorists and a control group that doesn't exhibit such behavior. The central questions they sought to answer were:

  1. Can users who propagate conspiracy theories be identified from their text alone?
  2. Which features differentiate such users from generic users?

Methodology in a Nutshell

The researchers used data from X (formerly Twitter) and created two datasets: one of conspiracy theorists and another of non-conspiracy theorists. They employed a variety of text-based features grouped into three categories:

  • Emotions: Utilized zero-shot learning to detect eight basic emotions in tweets.
  • Idioms: Compiled a list of idioms commonly used by conspiracy theorists.
  • Linguistic Features: Analyzed lexical, syntactical, semantic, structural, and subject-specific characteristics.
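As a rough illustration of how such category-based features might feed a classifier (this is not the authors' actual pipeline, and all names below are hypothetical), per-user feature dictionaries from each category could be flattened into a single fixed-order vector:

```python
# Hypothetical sketch: assemble per-category feature dicts into one
# fixed-order numeric vector (the paper's real pipeline uses 871 features).

def assemble_features(emotion_feats, idiom_feats, linguistic_feats):
    """Concatenate category feature dicts into a single ordered vector."""
    vector = []
    for category in (emotion_feats, idiom_feats, linguistic_feats):
        # Sort keys so every user gets the same feature ordering.
        for key in sorted(category):
            vector.append(float(category[key]))
    return vector

user_vector = assemble_features(
    {"joy": 0.10, "disgust": 0.25},                   # emotion scores
    {"trust no one": 2, "follow the money": 0},       # idiom counts
    {"unique_words": 120, "avg_sentence_len": 14.5},  # linguistic stats
)
print(len(user_vector))  # 6 features in this toy example
```

Fixing the key order is what makes the vectors comparable across users, which any downstream classifier requires.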

Results Summary

The paper found that conspiracy theorists and control users exhibit distinct writing styles that can be told apart with high accuracy. The best-performing model achieved an F1 score of 0.87.

Emotional Features

Emotional content was one dimension the paper explored. The emotion analysis showed that conspiracy theorists tend to express more disgust and sadness, while generic users lean towards joy and anger. Although emotional features were less discriminative than the other categories, they still added value to the classification.
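A minimal sketch of how per-tweet emotion scores could be aggregated into user-level features (the actual zero-shot model and feature definitions are not given here; the emotion subset and function names are illustrative):

```python
# Illustrative only: average per-tweet emotion distributions into one
# user-level profile. Real scores would come from a zero-shot classifier.

EMOTIONS = ["joy", "sadness", "anger", "disgust"]  # subset for brevity

def user_emotion_profile(tweet_scores):
    """tweet_scores: list of dicts mapping emotion -> score in [0, 1]."""
    if not tweet_scores:
        return {e: 0.0 for e in EMOTIONS}
    return {
        e: sum(t.get(e, 0.0) for t in tweet_scores) / len(tweet_scores)
        for e in EMOTIONS
    }

profile = user_emotion_profile([
    {"joy": 0.1, "disgust": 0.7},
    {"joy": 0.3, "sadness": 0.5},
])
print(profile["disgust"])  # 0.35
```

Averaging over all of a user's tweets smooths out single-tweet noise, which matters because the paper profiles users, not individual messages.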

Idioms of Conspiracy Theorists

The researchers also compiled idioms characteristic of conspiracy discourse. The phrases they identified included:

  • "Trust no one"
  • "The truth is out there"
  • "Follow the money"

Usage of these idioms differed significantly between the two user groups, offering another layer of signal for classification.
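Turning such idioms into count features could look like the sketch below. The idiom list here is just the three examples quoted above (the paper's full lexicon is larger), and the matching strategy is an assumption:

```python
import re

# Example idioms from the paper's list; the full lexicon is larger.
IDIOMS = ["trust no one", "the truth is out there", "follow the money"]

# Word-boundary regexes so phrases don't match inside other words.
PATTERNS = {p: re.compile(r"\b" + re.escape(p) + r"\b", re.IGNORECASE)
            for p in IDIOMS}

def idiom_counts(tweets):
    """Count occurrences of each idiom across a user's tweets."""
    counts = {p: 0 for p in IDIOMS}
    for text in tweets:
        for phrase, pattern in PATTERNS.items():
            counts[phrase] += len(pattern.findall(text))
    return counts

counts = idiom_counts([
    "Follow the money and trust no one.",
    "They say the truth is out there...",
])
print(counts["trust no one"])  # 1
```

Case-insensitive, word-boundary matching keeps the counts robust to capitalization and trailing punctuation in real tweets.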

Linguistic Features

The paper's strongest results came from linguistic features. These included:

  • Lexical Features: Such as the count of unique words and punctuation usage.
  • Syntactical Features: Like the number of coordinating and subordinating clauses.
  • Semantic Features: Including the use of pronouns and named entities.
  • Structural Features: Capturing sentence length and other textual structuring elements.
  • Subject-specific Features: Focusing on readability indices, indicating the complexity of the language used.
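To make these categories concrete, here is a toy version of a few such features, not the paper's definitions: unique-word and punctuation counts (lexical), average sentence length (structural), and the Flesch reading ease score as one example readability index. The syllable counter is a deliberately naive vowel-group heuristic:

```python
import re
import string

def naive_syllables(word):
    """Very rough syllable estimate: count vowel groups (min 1)."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def text_features(text):
    """Toy versions of a few lexical/structural/readability features."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(naive_syllables(w) for w in words)
    n_s, n_w = max(1, len(sentences)), max(1, len(words))
    return {
        "unique_words": len({w.lower() for w in words}),   # lexical
        "punctuation": sum(c in string.punctuation for c in text),
        "avg_sentence_len": n_w / n_s,                     # structural
        # Flesch reading ease: higher score = easier to read.
        "flesch": 206.835 - 1.015 * (n_w / n_s) - 84.6 * (syllables / n_w),
    }

feats = text_features("Trust no one. The truth is out there!")
print(feats["unique_words"])  # 8
```

Production feature extractors would use a proper tokenizer and syllable dictionary; the point is only that each feature reduces to a simple, comparable number per user.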

Strong Numerical Results

The standout performance came from the Light Gradient Boosting Machine (LGBM) classifier:

  • F1 Score: 0.87
  • Precision and Recall: Both high, indicating a reliable model
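For reference, this is how the reported metrics relate to each other. The snippet below is not the paper's evaluation code (training an actual LightGBM model would require the `lightgbm` library); it just computes precision, recall, and F1 from binary predictions:

```python
def precision_recall_f1(y_true, y_pred, positive=1):
    """Compute precision, recall, and F1 for the positive class."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

# 1 = conspiracy theorist, 0 = control user (toy labels).
p, r, f1 = precision_recall_f1([1, 1, 0, 0, 1], [1, 0, 0, 1, 1])
print(round(f1, 2))  # tp=2, fp=1, fn=1 -> precision = recall = f1 = 0.67
```

Because F1 is the harmonic mean of precision and recall, a 0.87 score implies both components are high, consistent with the paper's claim of a reliable model.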

Implications

Practical Implications

  1. Detection Tools: This research could inform the development of tools to identify conspiracy theorists online, helping to curb the spread of disinformation.
  2. Content Moderation: Social media platforms can leverage these findings to fine-tune their content moderation strategies.

Theoretical Implications

  1. Understanding Psycholinguistics: The paper adds depth to our understanding of how language and emotional content can reveal underlying beliefs.
  2. Robust Classifiers: Highlighting the efficacy of text-based classifiers builds a foundation for similar future research.

Speculating on the Future

Given the findings, future research could delve into:

  • Platform Generalization: Testing if these features hold for other social media platforms like Facebook or Reddit.
  • Temporal Dynamics: Examining how these features evolve over time with changing social and political climates.
  • Broader Feature Sets: Incorporating more diverse linguistic and contextual features to boost accuracy further.

This paper highlights the stylistic markers that differentiate conspiracy theorists from other users online. Although it relies on text alone, the findings show promise for developing effective tools to identify and mitigate the spread of conspiracy theories, and the work lays a solid foundation for future research into understanding and managing online disinformation.