Implicit Hate Corpus (IHC)
- Implicit Hate Corpus (IHC) is a public resource for studying covert hate speech expressed in indirect, coded, and multimodal forms.
- IHC employs a detailed taxonomy and robust annotation protocols to categorize hate speech into six specific implicit forms.
- IHC underpins state-of-the-art benchmarks and supports research in both unimodal and multimodal implicit hate detection.
The Implicit Hate Corpus (IHC) is a suite of public, English-language data resources for studying implicit, latent, or covert hate speech. Its particular focus is the identification and categorization of hate that is encoded in indirect, coded, humorous, or multimodal forms. IHC has evolved through multiple iterations, each extending its coverage, granularity, and annotation protocols. It now underpins state-of-the-art benchmarks in implicit hate detection, supports research on target span identification, and provides both unimodal and multimodal resources for experimentation and model development. The following exposition synthesizes details from the principal IHC publications, covering taxonomy, dataset construction, annotation protocols, methodological advances, key benchmarks, and their implications for the broader study of online hate (ElSherief et al., 2021; Botelho et al., 2021; Jafari et al., 2024).
1. Theoretical Foundations and Taxonomy
IHC formally distinguishes implicit hate from explicit hate by the absence of overt slurs or plainly derogatory markers and the prevalence of indirect, coded, or allusive speech acts. ElSherief et al. (2021) define six categories, each mapping onto a theoretically significant form of implicit hate used in social media discourse:
- White Grievance: Indirect complaints that paint majority groups as victims or suggest loss of privileged status (e.g., “Soon white people will not be a majority anywhere. You believe.”).
- Incitement to Violence: Implicit rhetorical devices that praise violence or extremist in-group solidarity absent explicit threats (e.g., “White revolution is the only solution.”).
- Inferiority Language: Metaphors or descriptors that dehumanize or toxify the target group (e.g., “Immigrants are disease.”).
- Stereotypical Association: Assigning negative traits or behaviors to a group obliquely, often through circumlocution or stereotype.
- Dismissal of Victimhood: Implying harm or injustice is fabricated or exaggerated by the target group.
- Mocking Language: Use of humor, sarcasm, or creative linguistic form to imply derogation without overt statement.
This taxonomy underpins IHC’s single-label scheme, in which each post receives one category even though the categories are not mutually exclusive in principle.
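The six categories and the single-label constraint can be sketched as a small data structure. This is a minimal illustration only: the category names are taken from the list above, while `ImplicitHateCategory`, `LabeledPost`, and `parse_category` are hypothetical names introduced here, and the label strings are not claimed to match those used in the released corpus files.

```python
from dataclasses import dataclass
from enum import Enum


class ImplicitHateCategory(Enum):
    """The six implicit-hate categories as named in this section."""
    WHITE_GRIEVANCE = "White Grievance"
    INCITEMENT_TO_VIOLENCE = "Incitement to Violence"
    INFERIORITY_LANGUAGE = "Inferiority Language"
    STEREOTYPICAL_ASSOCIATION = "Stereotypical Association"
    DISMISSAL_OF_VICTIMHOOD = "Dismissal of Victimhood"
    MOCKING_LANGUAGE = "Mocking Language"


@dataclass(frozen=True)
class LabeledPost:
    """Hypothetical record: one post carrying exactly one category (single-label)."""
    text: str
    category: ImplicitHateCategory


def parse_category(name: str) -> ImplicitHateCategory:
    """Map a category name (case-insensitive) to the enum; raise on unknown names."""
    normalized = name.strip().lower()
    for category in ImplicitHateCategory:
        if category.value.lower() == normalized:
            return category
    raise ValueError(f"Unknown implicit-hate category: {name!r}")


# Placeholder text; the label string format is an assumption for illustration.
post = LabeledPost(
    text="<post text>",
    category=parse_category("incitement to violence"),
)
```

Typing `category` as a single enum value, rather than a set of values, mirrors the single-label scheme: a post that could plausibly fit two categories still has to be assigned one.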
2. Corpus Construction and Annotation Methods
The original IHC (ElSherief et al., 2021) drew on 4.7 million English-language tweets posted between January 2015 and December 2017, filtered using seed accounts associated with the SPLC’s list of eight U.S. hate-group ideologies. Manual