
Sensitive Content Classification in Social Media: A Holistic Resource and Evaluation

Published 29 Nov 2024 in cs.CL (arXiv:2411.19832v3)

Abstract: The detection of sensitive content in large datasets is crucial for ensuring that shared and analysed data is free from harmful material. However, current moderation tools, such as external APIs, suffer from limitations in customisation, accuracy across diverse sensitive categories, and privacy concerns. Additionally, existing datasets and open-source models focus predominantly on toxic language, leaving gaps in detecting other sensitive categories such as substance abuse or self-harm. In this paper, we put forward a unified dataset tailored for social media content moderation across six sensitive categories: conflictual language, profanity, sexually explicit material, drug-related content, self-harm, and spam. By collecting and annotating data with consistent retrieval strategies and guidelines, we address the shortcomings of previous focalised research. Our analysis demonstrates that fine-tuning LLMs on this novel dataset yields significant improvements in detection performance compared to open off-the-shelf models such as LLaMA, and even proprietary OpenAI models, which underperform by 10-15% overall. This limitation is even more pronounced for popular moderation APIs, which cannot be easily tailored to specific sensitive content categories.

Summary

  • The paper introduces a comprehensive X-Sensitive dataset that spans six key sensitive content categories for enhanced social media moderation.
  • It demonstrates that fine-tuned language models outperform conventional off-the-shelf models by 10-15%.
  • The work emphasizes consistent annotation practices and open science by providing a publicly available resource for future research.

Overview of "Sensitive Content Classification in Social Media: A Holistic Resource and Evaluation"

The paper by Antypas et al. addresses a critical need in social media research: the effective identification of sensitive content across diverse categories. The authors propose a unified dataset, X-Sensitive, which covers six primary types of sensitive content: conflictual language, profanity, sexually explicit material, drug-related content, self-harm, and spam. This novel dataset provides a comprehensive framework for detecting sensitive content, moving beyond the predominant focus on toxic language seen in prior studies.

Key Contributions

  1. Holistic Dataset Approach: The X-Sensitive dataset stands out by addressing the insufficiencies in existing models and datasets, which often lack customization abilities, vary in accuracy across categories, and pose privacy concerns. Unlike prior limited-resource datasets, X-Sensitive provides extensive, annotated data that spans multiple sensitive categories, thus filling a significant gap in content moderation resources.
  2. Improved Detection Performance: When evaluating various models, the paper reports that LLMs fine-tuned on this dataset achieve significant performance improvements, with an overall 10-15% gain over standard off-the-shelf models, including proprietary ones such as those from OpenAI. This underscores the importance of bespoke training on specialized datasets.
  3. Annotation Consistency and Quality: The dataset was curated using consistent data collection and re-annotation methodologies to ensure high-quality annotations across categories. Notably, annotation disparities across annotator demographics highlight how nuanced the perception and recognition of sensitive content can be.
  4. A Publicly Available Resource: A commendable aspect of this work is the researchers' commitment to open science. Both the dataset and the top-performing models trained on it have been made available on HuggingFace, providing a valuable asset for ongoing research and application in social media content moderation.
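
Concretely, the multi-label framing behind such a dataset can be sketched as a per-category sigmoid head whose outputs are thresholded independently, so a single post may trigger several categories at once. The short label strings and the 0.5 threshold below are illustrative assumptions, not the paper's exact label scheme:

```python
import math

# The six X-Sensitive categories described in the paper; the short label
# strings and the threshold are illustrative assumptions.
CATEGORIES = [
    "conflictual", "profanity", "sexually explicit",
    "drugs", "self-harm", "spam",
]

def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

def decode_multilabel(logits, threshold=0.5):
    """Map one raw logit per category to the set of predicted labels.

    In a multi-label setup each category fires independently, and a post
    counts as 'sensitive' if any category fires.
    """
    return [c for c, z in zip(CATEGORIES, logits) if sigmoid(z) >= threshold]

# Strong profanity signal, everything else below threshold:
print(decode_multilabel([-2.0, 3.1, -1.5, -3.0, -4.0, -0.2]))
# → ['profanity']
```

Thresholding each category independently, rather than picking a single best class, is what lets one post be flagged simultaneously as, say, profanity and spam.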

Implications for Future Research and Development

The creation of the X-Sensitive dataset and its associated models has multiple implications:

  • Enhanced Content Moderation: By providing a comprehensive dataset and demonstrating the effectiveness of fine-tuned models, this research lays a foundation for more robust and nuanced content moderation tools capable of addressing a broader spectrum of sensitive content. This can be particularly beneficial for social media platforms aiming to foster safer online environments while respecting user privacy.
  • Benchmark for Model Evaluation: X-Sensitive establishes a benchmark that can aid researchers in evaluating and developing sophisticated LLMs with increased precision in sensitive content detection, extending beyond the heavily studied toxic language domain.
  • Ethics and Annotation Biases: The study's insights into annotation biases, stemming from demographic differences, suggest a need for further exploration into how these biases can be systematically accounted for and mitigated in AI-driven moderation tools.
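
To make the benchmarking point concrete, a standard way to score multi-label moderation models is macro-averaged F1 over per-category binary predictions, so that rare categories such as self-harm weigh as much as common ones. This is a minimal sketch assuming indicator vectors per post; it is not the paper's evaluation code:

```python
def f1(tp: int, fp: int, fn: int) -> float:
    """F1 for one label; defined as 0 when the label never fires."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

def macro_f1(y_true, y_pred):
    """Macro-averaged F1 over binary label columns (one column per category)."""
    n_labels = len(y_true[0])
    scores = []
    for j in range(n_labels):
        tp = sum(t[j] and p[j] for t, p in zip(y_true, y_pred))
        fp = sum((not t[j]) and p[j] for t, p in zip(y_true, y_pred))
        fn = sum(t[j] and not p[j] for t, p in zip(y_true, y_pred))
        scores.append(f1(tp, fp, fn))
    return sum(scores) / n_labels

# Three posts, two categories: the model misses one positive in category 2.
gold = [[1, 0], [0, 1], [1, 1]]
pred = [[1, 0], [0, 0], [1, 1]]
print(round(macro_f1(gold, pred), 3))  # → 0.833
```

Averaging per-label F1 scores (rather than pooling all decisions into one micro-average) keeps a model from looking good by excelling only on the most frequent category.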

Conclusion

In summary, the research paper "Sensitive Content Classification in Social Media: A Holistic Resource and Evaluation" represents a methodical and significant contribution to the field of natural language processing, particularly in the context of content moderation. Its focus on under-represented sensitive content categories, along with the provision of a robust dataset and evaluation framework, opens pathways for improved detection methods and highlights the importance of tailoring models to specific use cases. While the research presents a strong foundation, it also encourages future exploration into expanding these methodologies across different languages and platforms, addressing ongoing challenges in privacy, demographic biases, and model robustness.
