MeMo Dataset: Multimodal Misogyny Benchmark
- MeMo Dataset is a benchmark designed for multimodal analysis of misogynistic content, combining expert and crowd annotations with manual text transcriptions.
- It comprises 800 carefully selected memes balanced between misogynistic and non-misogynistic instances, offering detailed binary labels and confidence scores.
- The dataset supports multimodal modeling, e.g., fusion architectures combining CNN image encoders with transformer text encoders, evaluated via standard splits or k-fold cross-validation.
The MeMo Dataset, introduced in "Benchmark dataset of memes with text transcriptions for automatic detection of multi-modal misogynistic content" (Gasparini et al., 2021), is a rigorously constructed benchmark designed for research into the automatic multi-modal detection of misogyny in web memes. It combines expertly selected and crowdsourced binary labels with full manual transcriptions of meme text, facilitating both visual and textual approaches to cybersexism and technology-facilitated violence analysis. Below, the dataset's design, annotation protocols, available modalities, and recommended usage patterns are detailed.
1. Dataset Composition and Annotation Schema
The MeMo dataset consists of a curated collection of 800 memes, precisely balanced between 400 misogynistic and 400 non-misogynistic instances. Selection was performed by three domain experts; only memes labeled unanimously were admitted, so primary class assignments and associated attributes carry full expert agreement. For each misogynistic meme, binary labels of "aggressiveness" and "irony" were assigned independently by experts (suffix "DE") and by crowd annotators (suffix "CS").
The annotation structure per meme includes:
- memeID
- manual text transcription
- binary expert labels: misogynisticDE, aggressiveDE, ironicDE
- binary crowd labels: misogynisticCS, aggressiveCS, ironicCS
- agreement fractions: confidence_M_CS, confidence_A_CS, confidence_I_CS
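The per-meme structure above can be mirrored as a small record type. A minimal sketch, assuming field names matching the CSV columns listed (the example values are hypothetical, not drawn from the dataset):

```python
from dataclasses import dataclass

@dataclass
class MemeRecord:
    # Field names follow the annotation schema described above.
    memeID: str
    text: str                 # manual transcription of overlaid text
    misogynisticDE: bool      # expert labels
    aggressiveDE: bool
    ironicDE: bool
    misogynisticCS: bool      # crowd labels
    aggressiveCS: bool
    ironicCS: bool
    confidence_M_CS: float    # crowd agreement fraction, in {1/3, 2/3, 3/3}
    confidence_A_CS: float
    confidence_I_CS: float

# Hypothetical example record for illustration only.
example = MemeRecord(
    memeID="0001", text="example overlay text",
    misogynisticDE=True, aggressiveDE=False, ironicDE=True,
    misogynisticCS=True, aggressiveCS=False, ironicCS=True,
    confidence_M_CS=3/3, confidence_A_CS=2/3, confidence_I_CS=2/3,
)
print(example.memeID, example.confidence_M_CS)
```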
Crowdsourcing involved 60 diverse workers aged 20-50, with each meme rated independently by three annotators; label confidence is defined as the fraction of crowd annotators agreeing on a given binary decision. No formal Cohen's κ or Krippendorff's α values are reported; however, the standard formula for Cohen's κ is referenced:

κ = (P_o − P_e) / (1 − P_e)

where P_o is the observed agreement and P_e the expected chance agreement.
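Although the article reports no κ values, the referenced formula is straightforward to compute between any two raters' binary labels. A minimal sketch (the input labels below are illustrative, not from the dataset):

```python
def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters over binary labels:
    kappa = (P_o - P_e) / (1 - P_e)."""
    n = len(rater_a)
    # Observed agreement: fraction of items where the raters coincide.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement from each rater's marginal label frequencies.
    p_a1 = sum(rater_a) / n
    p_b1 = sum(rater_b) / n
    p_e = p_a1 * p_b1 + (1 - p_a1) * (1 - p_b1)
    return (p_o - p_e) / (1 - p_e)

# Illustrative only: agreement between two hypothetical raters.
print(cohens_kappa([1, 1, 0, 0], [1, 0, 0, 0]))  # → 0.5
```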
2. Source Acquisition and Selection Protocols
Memes were collected over October–November 2018 from Facebook, Twitter, Instagram, Reddit, and meme-dedicated sites. Misogynistic memes were sourced using targeted keyword searches (#girl, #women, #feminist, etc.), selected forum threads (e.g., MGTOW), and domains with documented sexist activity. Non-misogynistic memes were drawn in parallel from identical sources and keywords to avoid trivial negative sampling.
Expert raters reviewed all downloaded memes, assigning judgments for misogyny, irony, and aggressiveness. Only cases of unanimous expert agreement for primary misogyny were retained, enforcing a perfectly balanced dataset with robust class definition.
Crowdsourced annotation used the Figure Eight (Appen) platform with strict per-contributor limits (max 40 memes, 90 minutes) and randomized ordering to avoid bias. Annotators answered a hierarchical set of questions per meme:
- "In your opinion, is this meme misogynistic?"
- (Conditional) "Is it ironic?"
- (Conditional) "Is it aggressive?"

No operational definitions of the terms were provided, so the labels capture each annotator's native perception.
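The hierarchical flow above (follow-up questions only for memes judged misogynistic) can be sketched as a small routine; `ask` here is a hypothetical stand-in for the annotation interface, not part of any released tooling:

```python
def annotate(ask):
    """Hierarchical question flow: irony and aggressiveness are asked
    only when the meme is first judged misogynistic. `ask` is a callable
    returning True/False for each question string."""
    labels = {"misogynistic": ask("In your opinion, is this meme misogynistic?")}
    if labels["misogynistic"]:
        labels["ironic"] = ask("Is it ironic?")
        labels["aggressive"] = ask("Is it aggressive?")
    else:
        labels["ironic"] = labels["aggressive"] = None  # not asked
    return labels
```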
3. Modalities and Data Format
For each meme:
- JPEG image resized to a max dimension of 640 px (zero-padded naming convention, e.g., 0001.jpg–0800.jpg).
- Manual transcription of any overlaid text, with no OCR or automated recognition employed.
The annotation CSV contains memeID, full text, six binary labels, three confidence fractions, and is indexed by unique meme identifier.
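Pairing annotation rows with their image files is then a matter of joining on the zero-padded memeID. A minimal loading sketch; the file name `annotations.csv` and the flat directory layout are assumptions, so adjust to the actual repository contents:

```python
import csv
from pathlib import Path

def load_memo(root):
    """Read the annotation CSV and attach the path of each meme's JPEG,
    assuming zero-padded file names (0001.jpg ... 0800.jpg) alongside a
    CSV named 'annotations.csv' (hypothetical name, not an official path)."""
    root = Path(root)
    rows = []
    with open(root / "annotations.csv", newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            row["image_path"] = root / f"{row['memeID']}.jpg"
            rows.append(row)
    return rows
```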
4. Dataset Availability and Access Procedures
The MeMo dataset is hosted at https://github.com/MIND-Lab/MEME. Access is password-protected: users must agree to a copyright notice to obtain the data package (images and CSV annotations), per the repository instructions. No formal open-source license is declared in the associated Data in Brief article.
5. Evaluation Protocols and Modeling Practices
No experimental baselines are furnished in the Data in Brief article. For modeling, related work (cf. ACIIW 2019) proposes fusion architectures that combine CNN image encoders with transformer-based text embeddings via early or late fusion. Standard train/validation/test splits (e.g., 70%/10%/20%) are recommended, but k-fold cross-validation (typically k = 5 or k = 10) can be used to maximize data utility given the dataset's modest size.
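Because the dataset is exactly balanced, stratified folds preserve the 50/50 class ratio in every split. A minimal pure-Python sketch (k = 5 is a common choice, not mandated by the dataset):

```python
import random

def stratified_kfold(labels, k=5, seed=42):
    """Stratified k-fold: indices of each class are shuffled and dealt
    round-robin into k folds, preserving the class balance per fold."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    by_class = {}
    for idx, y in enumerate(labels):
        by_class.setdefault(y, []).append(idx)
    for idxs in by_class.values():
        rng.shuffle(idxs)
        for i, idx in enumerate(idxs):
            folds[i % k].append(idx)
    return folds

# MeMo's label distribution: 400 misogynistic, 400 non-misogynistic.
labels = [1] * 400 + [0] * 400
folds = stratified_kfold(labels, k=5)
```

With 800 items and k = 5, each fold holds 160 memes (80 per class), so every test fold mirrors the full dataset's balance.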
Recommended metrics include accuracy, precision, recall, and F1-score for misogyny detection:

precision = TP / (TP + FP),  recall = TP / (TP + FN),  F1 = 2 · precision · recall / (precision + recall)
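These metrics for the positive (misogynistic) class can be sketched in a few lines of pure Python; the label vectors below are illustrative only:

```python
def prf1(y_true, y_pred):
    """Precision, recall, and F1 for the positive class:
    F1 = 2 * precision * recall / (precision + recall)."""
    tp = sum(t == p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# Illustrative example: 2 true positives, 1 false positive, 1 false negative.
p, r, f1 = prf1([1, 1, 1, 0, 0], [1, 1, 0, 1, 0])
```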
6. Intended Research Uses and Limitations
Applications encompass multimodal hate and misogyny detection on social media, pretraining/fine-tuning transformer architectures (e.g., VisualBERT, CLIP), crowdsourcing validation benchmarking, and perceptual studies on expert-public labeling gaps.
Limitations explicitly noted include:
- Dataset size (800 memes) limits suitability for complex deep models without augmentation.
- Crowd annotation agreement may be as low as 1/3 per item, compared to expert unanimity.
- Manual transcriptions are subject to annotator error (punctuation and casing).
- Absence of fine-grained misogyny subtype distinctions (e.g., body shaming, objectification).
Table: MeMo Dataset Structure
| Field | Description | Type |
|---|---|---|
| memeID | Unique zero-padded image identifier | String |
| text | Manual transcription of meme overlay text | String |
| misogynisticDE | Expert label: misogynistic content | Boolean |
| aggressiveDE | Expert label: aggressiveness (if misogynistic) | Boolean |
| ironicDE | Expert label: irony (if misogynistic) | Boolean |
| misogynisticCS | Crowd label: misogynistic content | Boolean |
| aggressiveCS | Crowd label: aggressiveness (if misogynistic) | Boolean |
| ironicCS | Crowd label: irony (if misogynistic) | Boolean |
| confidence_M_CS | Fraction of crowd agreement: misogyny | {1/3, 2/3, 3/3} |
| confidence_A_CS | Fraction of crowd agreement: aggressiveness | {1/3, 2/3, 3/3} |
| confidence_I_CS | Fraction of crowd agreement: irony | {1/3, 2/3, 3/3} |
All field definitions strictly follow those established in the official annotation CSV.
7. Context and Significance
MeMo is a foundational resource enabling rigorous evaluation and development of multimodal misogyny detection systems. Its stringent expert selection protocol ensures label certainty for core tasks, supporting social and computational studies where both perception and technical cues play critical roles. By offering per-item crowd agreement and manual transcriptions, MeMo facilitates nuanced analyses of labeling reliability and modality fusion. Limitations on granularity and dataset size, as noted, should be considered when designing high-capacity models or exploring fine-grained hate typologies.
MeMo thus provides a platform for systematic exploration of multimodal cues in misogynistic content identification and for benchmarking approaches that incorporate both visual and textual signal fusion (Gasparini et al., 2021).