Social Bias Frames Explained
- Social Bias Frames are a formalism that captures implicit stereotypes and discriminatory attitudes through structured, multi-dimensional annotations.
- The framework employs hierarchical, tuple-based schemas to differentiate biases by offensiveness, intent, and group targeting in language.
- It underpins datasets like SBIC and CDial-Bias, providing computational benchmarks and enhanced methods for bias detection in social media and dialogues.
A social bias frame is a structured formalism designed to systematically capture, annotate, and infer the pragmatic ways in which language projects stereotypes, discriminatory attitudes, and power differentials onto marginalized groups. In natural language, social bias is rarely explicit; rather, bias is embedded in implicatures, implicit attitudes, and context-dependent patterns that reinforce harmful stereotypes. Social bias frames provide annotation schemas, dataset constructions, and computational benchmarks for reasoning about these phenomena in open-domain dialogue and social media, enabling precise analysis, measurement, and mitigation in machine learning applications (Zhou et al., 2022; Sap et al., 2019).
1. Formalism of Social Bias Frames
Social bias frames (SBF) and their extension to dialogues via the Dial-Bias Frame formalize social bias annotation via multidimensional, structured tuples.
In the SBF paradigm (Sap et al., 2019), given a post, annotation produces a seven-slot frame (lewd, offensive, intent, group, targeted group, implied statement, in-group), where each component encodes:
- Lewd: the post contains lewd or sexual content.
- Offensive: the post could be offensive to anyone.
- Intent: the author’s perceived intent to offend.
- Group: the post targets a group rather than an individual.
- Targeted group: free-text group label, e.g., “women,” “Black folks.”
- Implied statement: short stereotype implication, e.g., “women are less qualified.”
- In-group: the annotator believes the speaker belongs to the targeted group.
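As a concrete, non-normative sketch, the seven SBF slots can be represented as a small data structure; the field names below are illustrative assumptions, not the paper's notation:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class SBFFrame:
    lewd: str                  # "yes"/"no": contains lewd/sexual content
    offensive: str             # "yes"/"maybe"/"no": offensive to anyone
    intent: str                # author's perceived intent to offend
    group_targeted: str        # targets a group rather than an individual
    targeted_group: Optional[str] = None     # free text, e.g. "women"
    implied_statement: Optional[str] = None  # e.g. "women are less qualified"
    in_group: Optional[str] = None           # speaker belongs to targeted group

# Example instantiation mirroring the slot descriptions above.
frame = SBFFrame(
    lewd="no", offensive="yes", intent="yes", group_targeted="yes",
    targeted_group="women", implied_statement="women are less qualified",
)
```

The free-text slots default to `None` because they are only filled when the post is judged to target a group, matching the hierarchical annotation flow.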
For open-domain dialogues, the Dial-Bias Frame (Zhou et al., 2022) extends this to a two-turn dialogue (context, response), producing a four-slot tuple with:
- Context-Sensitivity: 0 = context-independent (CI), 1 = context-sensitive (CS).
- Data-Type: 0 = irrelevant, 1 = bias-discussing, 2 = bias-expressing.
- Targeted Groups: the social groups targeted; multi-valued.
- Implied Attitude: 0 = irrelevant, 1 = anti-bias, 2 = neutral, 3 = biased.
Annotation proceeds hierarchically, from context-sensitivity to bias type, group, and attitude.
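A minimal sketch of the Dial-Bias tuple using the integer codes listed above; the class and enum names are illustrative assumptions:

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import List

class ContextSensitivity(IntEnum):
    CONTEXT_INDEPENDENT = 0  # CI
    CONTEXT_SENSITIVE = 1    # CS

class DataType(IntEnum):
    IRRELEVANT = 0
    BIAS_DISCUSSING = 1
    BIAS_EXPRESSING = 2

class ImpliedAttitude(IntEnum):
    IRRELEVANT = 0
    ANTI_BIAS = 1
    NEUTRAL = 2
    BIASED = 3

@dataclass
class DialBiasFrame:
    context_sensitivity: ContextSensitivity
    data_type: DataType
    targeted_groups: List[str]  # the only multi-valued dimension
    implied_attitude: ImpliedAttitude
```

Using `IntEnum` keeps the annotation codes (0–3) directly comparable to the integer labels while giving them readable names.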
2. Annotation Schema and Sequential Labeling
Both SBF and Dial-Bias frameworks deploy multi-step, hierarchical annotation processes:
- For SBF, annotation proceeds from coarse-grained categorical judgments (lewdness, offensiveness, intent) to group implication, with free-text slots for the targeted group and the implied stereotype, culminating in an optional in-group assessment.
- Dial-Bias annotation proceeds in four steps for each context–response pair:
  1. Judge context sensitivity (CI vs. CS).
  2. Classify the data type (irrelevant, bias-discussing, or bias-expressing); annotation halts if the pair is irrelevant.
  3. Enumerate all targeted groups (multi-label).
  4. Assign the implied attitude (anti-bias, neutral, or biased).
This schema accommodates multi-label group assignment but enforces single-label for other dimensions, yielding a tractable and thorough hierarchical structure.
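The hierarchical order, including the halt on irrelevant pairs, can be sketched as a small pipeline; `judge` is a hypothetical callable standing in for a human annotator or classifier, and only the control flow mirrors the schema:

```python
def annotate_dialogue(context, response, judge):
    """Hierarchical Dial-Bias annotation order; `judge` answers one
    question at a time (human or model)."""
    cs = judge("context_sensitivity", context, response)     # step 1: CI/CS
    data_type = judge("data_type", context, response)        # step 2
    if data_type == 0:                                       # irrelevant: halt
        return {"context_sensitivity": cs, "data_type": 0,
                "targeted_groups": [], "implied_attitude": 0}
    groups = judge("targeted_groups", context, response)     # step 3: multi-label
    attitude = judge("implied_attitude", context, response)  # step 4: single label
    return {"context_sensitivity": cs, "data_type": data_type,
            "targeted_groups": groups, "implied_attitude": attitude}

# Scripted stand-in for an annotator on one bias-expressing pair.
answers = {"context_sensitivity": 1, "data_type": 2,
           "targeted_groups": ["women"], "implied_attitude": 3}
result = annotate_dialogue("ctx", "resp", lambda q, c, r: answers[q])
```

Halting early on irrelevant pairs is what keeps the multi-step schema tractable: the more expensive group and attitude judgments are only made when needed.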
3. Distinctions From Prior Frameworks
Traditional datasets (e.g., StereoSet, CrowS-Pairs) focus on single sentences, providing binary or ternary labels with little contextual nuance. The Social Bias Frames approach (Sap et al., 2019) introduces explicit roles for bias source, target, and inference statement, capturing pragmatic and social meaning. Dial-Bias further advances the annotation structure by:
- Segmenting dialogue for context sensitivity (CI/CS).
- Distinguishing bias discussion from bias expression, isolating conversational dynamics overlooked in prior work.
- Implementing a trichotomy for attitude (anti-bias, neutral, biased) vs. prior binary schemes.
- Leveraging free-text for fine-grained group identity.
Sun et al. (DiaSafety) partition offensive, sexual, and biased content but lack the integrated, four-dimensional frame and sequential annotation schema (Zhou et al., 2022).
4. Datasets and Benchmarks
Two major annotated corpora operationalize these frameworks:
| Name | Language | Entries | Distinct Groups | Dimensions |
|---|---|---|---|---|
| SBIC | English | 44,671 | 1,414 | 7 (SBF tuple) |
| CDial-Bias | Chinese | 28,000 | 171 | 4 (Dial-Bias tuple) |
SBIC (Social Bias Inference Corpus) (Sap et al., 2019) crowdsources annotations across offensive jokes, microaggressions, Twitter abuse, and hate speech communities, yielding 147,139 inference tuples with 32,028 unique stereotype implications.
CDial-Bias (Zhou et al., 2022), collected from Zhihu, focuses on dialogue pairs, with 17,000 bias-related pairs (≈52% bias-discussing), fine-grained group and attitude tags, enabling context-dependent bias detection benchmarks.
Performance metrics in both works reflect multi-head classification and generation tasks, with F₁ scores (offensiveness ≈80%, intent ≈79%, lewd ≈81%) and generation metrics (target group BLEU ≈74, ROUGE-L ≈65).
Auxiliary labels and joint training (e.g., Mixture-of-Experts, multi-task objectives) improve classifier performance in bias detection, and context-sensitive examples pose a greater challenge (CS cases score ≈15 F₁ points lower than CI cases in CDial-Bias).
5. Computational Modeling Approaches
Social bias frame inference is cast as a generative modeling problem. Baseline approaches (Sap et al., 2019):
- Utilize Transformer-based architectures (GPT/GPT-2) without separate classifiers; slot values are represented as special vocabulary tokens.
- Input consists of raw text concatenated with all target categorical slots and free-text fields.
- Optimization via next-token cross-entropy over the linearized annotation sequence.
- Inference by greedy decoding or multinomial sampling, with constrained post-hoc correction for slot dependencies.
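The linearization step can be sketched as follows; the bracketed special tokens are illustrative placeholders, not the actual vocabulary used by Sap et al. (2019):

```python
def linearize_frame(post, frame):
    """Concatenate a post with its slot values into one training string
    for next-token cross-entropy training of a GPT-style model."""
    parts = [post,
             "[lewd]", frame["lewd"],
             "[off]", frame["offensive"],
             "[int]", frame["intent"],
             "[grp]", frame["group_targeted"]]
    if frame.get("targeted_group"):  # free-text slots only when a group is named
        parts += ["[group]", frame["targeted_group"],
                  "[stmt]", frame["implied_statement"]]
    return " ".join(parts)

seq = linearize_frame("some post", {
    "lewd": "no", "offensive": "yes", "intent": "yes",
    "group_targeted": "yes", "targeted_group": "women",
    "implied_statement": "women are less qualified"})
```

Because the categorical and free-text slots all live in one token sequence, a single language model can be trained on classification and generation jointly, which is the point of the SBF generative framing.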
For utterance-level bias detection (Zhou et al., 2022), fine-grained classifiers predict the implied attitude while optionally leveraging context-sensitivity, data-type, and topic labels, demonstrating measurable improvements with auxiliary supervision.
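One common way to wire in such auxiliary supervision is a weighted joint objective; the head names and the 0.5 weight below are assumptions for illustration, not values from the papers:

```python
def joint_loss(head_losses, main="implied_attitude", aux_weight=0.5):
    """Multi-task objective: main attitude loss plus a down-weighted
    sum of auxiliary head losses (context-sensitivity, data-type, topic)."""
    aux = sum(v for k, v in head_losses.items() if k != main)
    return head_losses[main] + aux_weight * aux

total = joint_loss({"implied_attitude": 1.0,
                    "context_sensitivity": 0.4,
                    "data_type": 0.6})
```

Down-weighting the auxiliary terms keeps the attitude prediction as the primary task while still letting the shared encoder benefit from the extra labels.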
6. Illustrative Frame Instantiations
Social bias frames explicitly model annotator inferences. Consider:
- SBIC Example: Post: “I hate fat bitches.” Frame: (lewd = yes, offensive