- The paper introduces an SVM-based model that combines word and character n-grams, sentiment-lexicon features, and topic models to detect instances of cyberbullying.
- It leverages finely annotated English and Dutch corpora of ASKfm posts, achieving F1-scores of 64% (English) and 61% (Dutch) and outperforming keyword-based baselines.
- The research offers actionable insights for content moderation and points toward finer-grained classification of nuanced cyberbullying behaviors.
Automatic Detection of Cyberbullying in Social Media Text
The paper "Automatic Detection of Cyberbullying in Social Media Text," authored by Cynthia Van Hee et al., presents a systematic approach to identifying cyberbullying incidents in social media platforms through machine learning techniques. This research addresses a critical need for automated systems capable of monitoring and flagging harmful content on vast and dynamic digital landscapes, which are increasingly frequented by young users susceptible to online exploitation.
The paper centers on developing a model that distinguishes posts associated with cyberbullying, encompassing texts initiated by bullies as well as responses from victims and bystanders. The researchers constructed finely annotated English and Dutch corpora from ASKfm posts, which required a comprehensive labeling strategy covering the various facets of cyberbullying. This methodological choice supports an inclusive classification framework that recognizes a broad spectrum of cyberbullying manifestations beyond the conventional focus on aggressor behavior alone.
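To make the annotation setup concrete, the sketch below models a single labeled post. It is illustrative only: the field names are hypothetical rather than the authors' actual schema, and only the participant roles mentioned above (bully, victim, bystander) are represented, not the paper's full fine-grained label set.

```python
from dataclasses import dataclass
from typing import Optional

# A minimal, hypothetical record for one annotated ASKfm post.
# Field names are illustrative, not the authors' schema; the role
# labels follow the participant types described in the text above.
@dataclass
class AnnotatedPost:
    text: str                   # the post's text content
    is_cyberbullying: bool      # binary label used for classification
    role: Optional[str] = None  # "bully", "victim", or "bystander" when applicable

corpus = [
    AnnotatedPost("you're pathetic, nobody likes you", True, "bully"),
    AnnotatedPost("please just leave me alone", True, "victim"),
    AnnotatedPost("what's your favourite song?", False),
]
```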
Central to the methodology is the application of linear Support Vector Machines (SVMs), optimized using a diverse feature set including word and character n-grams, sentiment lexicons, and topic models. The classifier demonstrated notable effectiveness, achieving F1-scores of 64% for English and 61% for Dutch, significantly surpassing baseline systems reliant on mere keyword detection or unigram representations. These metrics underscore the efficacy of rich linguistic and semantic features in enhancing the automated identification of cyberbullying.
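As a minimal sketch of this kind of feature combination, the scikit-learn pipeline below unions word and character TF-IDF n-grams and feeds them to a linear SVM. It is not the authors' implementation: the toy data, n-gram ranges, and C value are placeholders, and the paper's sentiment-lexicon and topic-model features are omitted.

```python
from sklearn.pipeline import Pipeline, FeatureUnion
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC
from sklearn.model_selection import cross_val_score

# Toy stand-in data; the paper's actual corpora are annotated ASKfm posts.
texts = [
    "you're pathetic and everyone hates you",
    "go away, nobody wants you here",
    "you are such a loser lol",
    "stop talking, you embarrass yourself",
    "what's your favourite song?",
    "thanks for the follow!",
    "anyone going to the match tonight?",
    "happy birthday, have a great day",
]
labels = [1, 1, 1, 1, 0, 0, 0, 0]  # 1 = cyberbullying-related

# Word and character n-gram features feeding a linear SVM. The paper's
# sentiment-lexicon and topic-model features would plug in as further
# branches of the FeatureUnion.
model = Pipeline([
    ("features", FeatureUnion([
        ("word_ngrams", TfidfVectorizer(analyzer="word", ngram_range=(1, 3))),
        ("char_ngrams", TfidfVectorizer(analyzer="char_wb", ngram_range=(3, 5))),
    ])),
    ("svm", LinearSVC(C=1.0)),
])

# F1 is the metric reported in the paper; cv=2 only because the toy set is tiny.
scores = cross_val_score(model, texts, labels, cv=2, scoring="f1")
print(f"mean F1 on toy data: {scores.mean():.2%}")
```

One appeal of this design is modularity: each feature family is an independent transformer, so lexicon or topic-model features can be added or ablated without touching the rest of the pipeline.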
The implications of this research are multifaceted. Practically, it provides a mechanism that could greatly assist moderators in preemptively managing toxic content, thereby mitigating the adverse impacts on vulnerable users. Theoretically, it contributes to a nuanced understanding of the linguistic predictors of cyberbullying, highlighting the complexity in distinguishing potentially harmful discourse from benign interactions, particularly when the former employs implicit or nuanced language.
The results suggest promising directions for future work. One is extending the system to recognize finer-grained categories such as racial slurs or veiled threats, which could further refine its precision. Another is dynamic adaptation to evolving language, especially the slang and emotive expressions prevalent in youth communication, which remains an open challenge requiring innovative approaches.
In conclusion, the paper establishes a foundation for more sophisticated, context-aware cyberbullying detectors that leverage machine learning's versatility in handling the intricacies of human language. It paves the way for technological interventions that preserve the vitality of online spaces while safeguarding user well-being through proactive, informed moderation.