Predicting the Type and Target of Offensive Posts in Social Media
Authors: Marcos Zampieri, Shervin Malmasi, Preslav Nakov, Sara Rosenthal, Noura Farra, Ritesh Kumar
The paper "Predicting the Type and Target of Offensive Posts in Social Media" addresses the complex issue of recognizing various forms of offensive content in online interactions, with a specific focus on social media platforms such as Twitter. This research overcomes the limitations of previous work that primarily focused on specific kinds of offensive language (e.g., hate speech, cyberbullying) by introducing a multi-faceted approach to offensive content detection. The authors propose a hierarchical model for classifying offensive posts, identifying both the type and target of the offense, thus providing a more comprehensive framework.
Hierarchical Annotation Schema
The authors introduce the Offensive Language Identification Dataset (OLID), annotated using a detailed three-level hierarchical schema:
- Level A: Offensive Language Detection
  - NOT (Not Offensive): Posts devoid of any offensive language or profanity.
  - OFF (Offensive): Posts containing unacceptable language, whether targeted or untargeted.
- Level B: Categorization of Offensive Language
  - TIN (Targeted Insult): Posts containing specific threats or insults directed at an individual, group, or entity.
  - UNT (Untargeted): Posts with general profanity or swearing without a specific target.
- Level C: Offensive Language Target Identification
  - IND (Individual): Posts targeting specific individuals.
  - GRP (Group): Posts aimed at groups based on characteristics such as ethnicity, gender, or religious beliefs.
  - OTH (Other): Posts targeting entities other than individuals or groups, such as organizations or events.
This annotation framework allows for detailed categorization and differentiation of offensive content, providing significant practical utility for social media platforms in moderating and managing content.
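The sketch below illustrates one way to represent these three levels in code and to check that Level B and Level C labels are only assigned when the parent level permits them. It is a minimal illustration of the schema's structure, not the authors' annotation tooling, and the example tweet is invented.

```python
# A minimal sketch of OLID's three-level label hierarchy and a consistency check:
# Levels B and C apply only when the parent level marks the post as offensive
# and targeted, respectively.
from dataclasses import dataclass
from typing import Optional

LEVEL_A = {"NOT", "OFF"}          # Level A: offensive language detection
LEVEL_B = {"TIN", "UNT"}          # Level B: categorization of offensive language
LEVEL_C = {"IND", "GRP", "OTH"}   # Level C: target identification

@dataclass
class OlidExample:
    tweet: str
    subtask_a: str                    # NOT or OFF
    subtask_b: Optional[str] = None   # TIN or UNT, only when subtask_a == "OFF"
    subtask_c: Optional[str] = None   # IND/GRP/OTH, only when subtask_b == "TIN"

    def is_consistent(self) -> bool:
        """Check that lower levels are labelled only when the parent level allows it."""
        if self.subtask_a not in LEVEL_A:
            return False
        if self.subtask_a == "NOT":
            return self.subtask_b is None and self.subtask_c is None
        if self.subtask_b not in LEVEL_B:
            return False
        if self.subtask_b == "UNT":
            return self.subtask_c is None
        return self.subtask_c in LEVEL_C

# An invented example: an offensive post insulting an individual.
post = OlidExample(tweet="@USER you are a disgrace",
                   subtask_a="OFF", subtask_b="TIN", subtask_c="IND")
print(post.is_consistent())  # True
```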
Data Collection and Annotation
The dataset was compiled using the Twitter API, searching for keywords and phrases that frequently occur in offensive messages. The keyword set included both political and non-political terms, since tweets retrieved with political keywords proved more likely to contain offensive language. The annotation was crowdsourced through the Figure Eight platform, with annotator selection and agreement requirements used to ensure high-quality labels.
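A hedged sketch of keyword-based collection is shown below. It assumes Tweepy v4 and a valid API bearer token; the keywords are illustrative placeholders rather than the authors' exact search terms, and the API version differs from the one used at the time of the paper.

```python
# A sketch of keyword-based tweet collection in the spirit of the paper's
# data collection step. Tweepy v4 and a valid bearer token are assumed.
import tweepy

BEARER_TOKEN = "YOUR_BEARER_TOKEN"  # placeholder credential
client = tweepy.Client(bearer_token=BEARER_TOKEN)

# Illustrative keywords only; the authors searched for expressions that
# frequently occur in offensive tweets, including political terms.
keywords = ['"she is"', '"he is"', '"you are"']

collected = []
for kw in keywords:
    # Keep English tweets and exclude retweets, as candidates for annotation.
    response = client.search_recent_tweets(
        query=f"{kw} lang:en -is:retweet", max_results=100
    )
    for tweet in response.data or []:
        collected.append(tweet.text)

print(f"Collected {len(collected)} candidate tweets for annotation.")
```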
Key statistics include:
- Training Set Size: 13,240 tweets
- Test Set Size: 860 tweets
- Distribution of Offensive Content: roughly one third offensive to two thirds non-offensive (see the verification sketch below)
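Assuming the released training file is named olid-training-v1.0.tsv and uses the columns id, tweet, subtask_a, subtask_b, and subtask_c (with NULL where a level does not apply), the figures above can be checked with a few lines of pandas:

```python
# Verify the reported size and class distribution of the OLID training set.
# The file name and column layout are assumptions about the released data.
import pandas as pd

train = pd.read_csv("olid-training-v1.0.tsv", sep="\t")

print(len(train))                                        # expected: 13,240 tweets
print(train["subtask_a"].value_counts(normalize=True))   # share of OFF vs. NOT
```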
Experimental Evaluation
The performance of several machine learning models, including a linear SVM, a BiLSTM, and a CNN, was evaluated on the OLID dataset (a minimal baseline sketch follows the findings below). The notable findings are:
- Offensive Language Detection (Level A):
  - The CNN achieved the highest macro-F1 score (0.80), outperforming the BiLSTM and SVM models.
- Categorization of Offensive Language (Level B):
  - The CNN again performed best, with a macro-F1 score of 0.69, particularly excelling at identifying targeted insults (TIN).
- Offensive Language Target Identification (Level C):
  - Despite challenges posed by the heterogeneous nature of the OTH category, the CNN and BiLSTM models performed comparably, with macro-F1 scores around 0.47, indicating moderate success.
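The following is a minimal Level A baseline sketch, not the authors' exact configuration: word-unigram TF-IDF features with a linear SVM, scored with macro-averaged F1. It reuses the assumed training TSV from the sketch above and holds out part of the training data in place of the official test set.

```python
# A minimal Level A (OFF vs. NOT) baseline: TF-IDF + linear SVM, macro-F1.
# The file name and column names are assumptions about the released data.
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

data = pd.read_csv("olid-training-v1.0.tsv", sep="\t")

# Hold out 10% of the training data as a stand-in for the official test set.
X_train, X_dev, y_train, y_dev = train_test_split(
    data["tweet"], data["subtask_a"], test_size=0.1,
    random_state=0, stratify=data["subtask_a"]
)

baseline = make_pipeline(TfidfVectorizer(), LinearSVC())
baseline.fit(X_train, y_train)
predictions = baseline.predict(X_dev)

# Macro-F1 weights the OFF and NOT classes equally, which matters given the
# roughly 1:2 class imbalance reported above.
print(f1_score(y_dev, predictions, average="macro"))
```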
Implications and Future Directions
The hierarchical approach delineated in this research provides a robust framework for handling offensive language detection at multiple levels of granularity. Practically, OLID's schema and the associated machine learning baselines can enhance the moderation capabilities of social media platforms, enabling more nuanced and effective handling of offensive content.
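As an illustration of how such a multi-level system could be deployed, the sketch below chains the three levels at inference time: a post only reaches the Level B and Level C classifiers when the previous level predicts OFF and TIN, respectively. The classifiers clf_a, clf_b, and clf_c are assumed to be already-trained models (for instance, the baseline pipeline above, fit separately on each subtask's labels); this is a sketch of the cascaded design, not the authors' implementation.

```python
# A sketch of cascaded inference over the three OLID levels.
# clf_a, clf_b, and clf_c are assumed pre-trained classifiers exposing
# a scikit-learn-style predict() method.
def classify_post(text, clf_a, clf_b, clf_c):
    """Return hierarchical OLID-style labels for a single post."""
    labels = {"subtask_a": clf_a.predict([text])[0],
              "subtask_b": None,
              "subtask_c": None}
    if labels["subtask_a"] == "OFF":            # Level B applies only to offensive posts
        labels["subtask_b"] = clf_b.predict([text])[0]
        if labels["subtask_b"] == "TIN":        # Level C applies only to targeted insults
            labels["subtask_c"] = clf_c.predict([text])[0]
    return labels
```

Calling classify_post("@USER you are a disgrace", clf_a, clf_b, clf_c) would then return a dictionary such as {"subtask_a": "OFF", "subtask_b": "TIN", "subtask_c": "IND"}, mirroring the annotation hierarchy described earlier.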
Future research should further explore cross-corpus comparisons with other datasets on related tasks such as aggression and hate speech identification. Expanding OLID to include other languages while adhering to the structured hierarchical annotation can pave the way for more generalizable and internationally applicable models. The work opens avenues for refining offensive content detection mechanisms, contributing to the broader goal of maintaining healthier online discourse.