Exploring Hate Speech Detection in Multimodal Publications (1910.03814v1)

Published 9 Oct 2019 in cs.CV and cs.CL

Abstract: In this work we target the problem of hate speech detection in multimodal publications formed by a text and an image. We gather and annotate a large scale dataset from Twitter, MMHS150K, and propose different models that jointly analyze textual and visual information for hate speech detection, comparing them with unimodal detection. We provide quantitative and qualitative results and analyze the challenges of the proposed task. We find that, even though images are useful for the hate speech detection task, current multimodal models cannot outperform models analyzing only text. We discuss why and open the field and the dataset for further research.

Authors (4)
  1. Raul Gomez (16 papers)
  2. Jaume Gibert (8 papers)
  3. Lluis Gomez (42 papers)
  4. Dimosthenis Karatzas (80 papers)
Citations (205)

Summary

  • The paper introduces the MMHS150K dataset with 150K tweets to benchmark multimodal hate speech detection.
  • The paper evaluates several models that merge text and image data, finding that multimodal approaches do not significantly outperform text-only methods.
  • The paper highlights challenges including limited labeled examples and annotation subjectivity, which hinder effective multimodal detection.

Exploring Hate Speech Detection in Multimodal Publications

The paper "Exploring Hate Speech Detection in Multimodal Publications" investigates the task of detecting hate speech in publications that include both text and images. The research highlights the challenge commonly faced in social media spaces like Twitter, where hate speech is proliferated, often using a combination of short text and images to obscure hateful content. This paper is particularly significant due to the increasing multimodal nature of user-generated content on platforms such as Twitter and Facebook, where the detection of hate speech via text alone may not be sufficient.

Key Contributions and Findings

The authors present several key contributions through this research:

  1. Introduction of the MMHS150K Dataset: A significant part of this work is the creation of the MMHS150K dataset, which comprises 150,000 tweets featuring both text and images. This dataset is notable for being one of the largest annotated collections aimed at multimodal hate speech detection, thus providing a valuable resource for advancing detection models in multimodal scenarios.
  2. Comparative Analysis of Detection Models: The paper evaluates several multimodal models that merge visual and textual information and compares them against unimodal models that rely solely on text. These include the Feature Concatenation Model (FCM), the Spatial Concatenation Model (SCM), and the Textual Kernels Model (TKM); a minimal sketch of the feature-concatenation idea is given after this list. The experimental results reveal that none of these multimodal approaches significantly outperforms the text-only models.
  3. Challenges in Multimodal Hate Speech Detection: Through their experimental analysis, the authors identify several challenges. The complexity and diversity of multimodal relations, together with the scarcity of labeled examples in which the hate is genuinely multimodal, make it difficult for these models to learn effective patterns for hate speech detection. The subjectivity inherent in labeling hate speech adds a further layer of complexity to the task.
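
To make the comparison concrete, the feature-concatenation idea can be sketched as a small PyTorch module: encode the tweet text, take an image feature vector, concatenate the two, and classify the fused representation. This is a hedged illustration rather than the authors' exact FCM; the layer sizes, the LSTM text encoder, the assumption of precomputed image features, and the `FeatureConcatenationModel` name are all assumptions made for the example.

```python
import torch
import torch.nn as nn

class FeatureConcatenationModel(nn.Module):
    """Sketch of a feature-concatenation baseline for multimodal hate speech
    classification. Dimensions and sub-networks are illustrative assumptions,
    not the paper's exact architecture."""

    def __init__(self, vocab_size=30000, text_dim=150, image_dim=2048, hidden_dim=1024):
        super().__init__()
        # Text branch: embed tweet tokens and encode them with an LSTM.
        self.embedding = nn.Embedding(vocab_size, 100, padding_idx=0)
        self.lstm = nn.LSTM(100, text_dim, batch_first=True)
        # The image branch is assumed to supply precomputed CNN features
        # (e.g. pooled features from a pretrained backbone) of size image_dim.
        # Classifier head over the concatenated text + image features.
        self.classifier = nn.Sequential(
            nn.Linear(text_dim + image_dim, hidden_dim),
            nn.ReLU(),
            nn.Dropout(0.5),
            nn.Linear(hidden_dim, 2),  # two classes: hate / not hate
        )

    def forward(self, token_ids, image_features):
        embedded = self.embedding(token_ids)        # (B, T, 100)
        _, (h_n, _) = self.lstm(embedded)           # h_n: (1, B, text_dim)
        text_features = h_n.squeeze(0)              # (B, text_dim)
        fused = torch.cat([text_features, image_features], dim=1)
        return self.classifier(fused)               # (B, 2) logits


# Usage with random tensors standing in for a batch of tweets and images.
model = FeatureConcatenationModel()
tokens = torch.randint(1, 30000, (8, 20))   # 8 tweets, 20 token ids each
img_feats = torch.randn(8, 2048)            # precomputed image features
logits = model(tokens, img_feats)
print(logits.shape)                         # torch.Size([8, 2])
```

Roughly speaking, the spatial-concatenation and textual-kernel variants differ in where the fusion happens (e.g. combining text with spatial feature maps rather than a single pooled vector), while the text-only baselines they are compared against simply omit the image branch.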

Implications and Future Research

The work provides valuable insights into the nuances of multimodal hate speech detection and its current limitations. Although images offer contextual cues that text alone does not capture, the multimodal models evaluated in this paper do not yield better results than text-only models. This points to a crucial area for future work: designing models that can effectively capture and learn from the added complexity of multimodal content.

The implications of these findings matter for both theoretical exploration and practical application. Enhanced models could improve the ability to identify hate speech across platforms, which is crucial for fostering safer online communication environments. The paper opens several avenues for future research:

  • Development of more sophisticated models: To capture subtle and complex interactions between text and images that constitute hate speech.
  • Refinement of datasets: Given the small subset of genuinely multimodal hate speech examples, there is an opportunity to expand datasets so that they represent the full spectrum of multimodal interactions.
  • Advancements in annotation strategies: Particularly to address and mitigate the subjectivity in hate speech labeling.

This paper lays a compelling foundation for further efforts to enhance hate speech detection by integrating textual and visual cues, improving safety and communication on digital social platforms.