Extracting Participation in Collective Action from Social Media (2501.07368v1)

Published 13 Jan 2025 in cs.SI, cs.CY, and physics.soc-ph

Abstract: Social media play a key role in mobilizing collective action, holding the potential for studying the pathways that lead individuals to actively engage in addressing global challenges. However, quantitative research in this area has been limited by the absence of granular and large-scale ground truth about the level of participation in collective action among individual social media users. To address this limitation, we present a novel suite of text classifiers designed to identify expressions of participation in collective action from social media posts, in a topic-agnostic fashion. Grounded in the theoretical framework of social movement mobilization, our classification captures participation and categorizes it into four levels: recognizing collective issues, engaging in calls-to-action, expressing intention of action, and reporting active involvement. We constructed a labeled training dataset of Reddit comments through crowdsourcing, which we used to train BERT classifiers and fine-tune Llama3 models. Our findings show that smaller LLMs can reliably detect expressions of participation (weighted F1=0.71), and rival larger models in capturing nuanced levels of participation. By applying our methodology to Reddit, we illustrate its effectiveness as a robust tool for characterizing online communities in innovative ways compared to topic modeling, stance detection, and keyword-based methods. Our framework contributes to Computational Social Science research by providing a new source of reliable annotations useful for investigating the social dynamics of collective action.

Summary

The paper introduces a novel NLP methodology to detect four distinct levels of participation in collective action from social media data.
Evaluation shows that smaller models detect expressions of participation reliably (weighted F1 = 0.71) and rival larger LLMs at distinguishing levels of participation, at a lower computational cost.
This method reveals participation signals in online communities that are often missed by traditional topic or stance detection techniques.

The paper "Extracting Participation in Collective Action from Social Media," by Arianna Pera and Luca Maria Aiello, addresses a gap in the study of collective action: the lack of granular, large-scale ground truth about how much individual social media users participate. It uses NLP techniques to detect varying levels of participation from social media posts. The work is grounded in the theoretical framework of social movement mobilization and contributes to Computational Social Science a methodology for classifying online discourse in a topic-agnostic fashion.

Research Framework and Methodology

The authors present a theoretical foundation based on collective action and mobilization theories, categorizing participation into four levels: recognizing collective issues, engaging in calls-to-action, expressing intention to act, and reporting active involvement. This categorization acknowledges the nuances of participation beyond mere engagement metrics, offering a fine-grained understanding of social media dynamics in collective action contexts.
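The four levels can be thought of as a small label schema. The sketch below is illustrative only: the member names, the extra `NONE` class for text that expresses no participation, and the integer ordering are assumptions made for this example, not identifiers taken from the paper.

```python
from enum import IntEnum


class ParticipationLevel(IntEnum):
    """Hypothetical label schema for the four levels described above."""

    NONE = 0                # assumption: text that expresses no participation
    RECOGNIZING_ISSUE = 1   # recognizing collective issues
    CALL_TO_ACTION = 2      # engaging in calls-to-action
    INTENTION = 3           # expressing intention to act
    ACTIVE_INVOLVEMENT = 4  # reporting active involvement


def is_participation(level: ParticipationLevel) -> bool:
    """Binary view: does the text express any level of participation?"""
    return level is not ParticipationLevel.NONE


if __name__ == "__main__":
    for level in ParticipationLevel:
        print(level.name, is_participation(level))
```

Keeping a binary view alongside the fine-grained labels mirrors the two tasks mentioned in the abstract: detecting participation at all, and distinguishing its levels.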

To operationalize this framework, the authors developed a suite of text classifiers, employing both BERT-based models and fine-tuned Llama3 models. A data corpus was constructed using Reddit comments, specifically curated to focus on activism and rights-oriented subreddits. These comments were manually annotated through crowdsourcing, and data augmentation techniques were applied to mitigate the sparsity of higher participation levels in the dataset.
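The summary does not say which augmentation techniques were used. As a stand-in that illustrates the goal of countering sparse classes, the sketch below applies plain random oversampling, which is a simpler and different technique from text augmentation. The function name `oversample` and the `(text, label)` input format are assumptions for this example.

```python
import random
from collections import defaultdict


def oversample(examples, seed=0):
    """Duplicate minority-class examples at random until every class
    is as large as the largest one.

    `examples` is a list of (text, label) pairs. This is random
    oversampling, not the augmentation procedure used in the paper.
    """
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for text, label in examples:
        by_label[label].append((text, label))
    if not by_label:
        return []
    target = max(len(items) for items in by_label.values())
    balanced = []
    for items in by_label.values():
        balanced.extend(items)
        # Top up smaller classes with random duplicates.
        balanced.extend(rng.choice(items) for _ in range(target - len(items)))
    rng.shuffle(balanced)
    return balanced
```

Any such rebalancing would normally be applied to the training split only, so that evaluation still reflects the natural class distribution.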

Model Evaluation and Findings

The paper evaluates four classification pipelines: a BERT-based classifier, zero-shot learning with Llama3, supervised fine-tuning, and Direct Preference Optimization (DPO). According to the abstract, smaller models can reliably detect expressions of participation, reaching a weighted F1 score of 0.71.
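For readers unfamiliar with the metric, weighted F1 is the average of the per-class F1 scores, each weighted by the number of true examples of that class. The following is a minimal pure-Python sketch of that standard definition; the function name and the toy labels in the usage example are made up for illustration and are not data from the paper.

```python
from collections import Counter


def weighted_f1(y_true, y_pred):
    """Support-weighted average of per-class F1 scores."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    support = Counter(y_true)  # number of true examples per class
    total = len(y_true)
    score = 0.0
    for label, n in support.items():
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
        fn = n - tp
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0
        score += (n / total) * f1
    return score


if __name__ == "__main__":
    # Toy labels, invented for this example.
    y_true = ["none", "none", "call", "intent"]
    y_pred = ["none", "call", "call", "intent"]
    print(weighted_f1(y_true, y_pred))
```

Because each class is weighted by its support, frequent classes dominate the score, which is worth keeping in mind when the higher participation levels are rare.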

A notable finding is that smaller models such as BERT are competitive with LLMs, offering comparable performance at significantly lower computational cost. This matters in settings where computing resources are limited.

Moreover, the application of these classifiers to Reddit data revealed that traditional topic modeling and stance detection inadequately characterize the participation nuances that this novel methodology can capture. The method surfaces participation signals in communities that may not be identified using keyword-based approaches, especially in discussions related to climate action.

Implications and Future Directions

The paper has both theoretical and practical implications. It introduces a methodology for analyzing social media discourse that exposes aspects of online community dynamics that topic, stance, and keyword-based methods tend to miss. The resulting annotations can be used to study the trajectories of social mobilization, supporting both academic research and real-world applications such as campaign targeting.

Additionally, the work invites further exploration into the integration of NLP models with social science research, especially in the study of emergent online behavior patterns and mobilization efforts. Future research could extend this framework to other platforms and contexts, addressing limitations related to subreddit-focused analyses and exploring cross-platform dynamics.

In conclusion, Pera and Aiello's work adds a valuable tool to the repertoire of Computational Social Science, enhancing our understanding of how collective action is expressed and engaged with in digital environments. This approach not only refines the analysis of online discourse but also sets the stage for more nuanced studies of socio-political movements and their online manifestations.
