
Social Bias Frames Explained

Updated 26 January 2026
  • Social Bias Frames are a formalism that captures implicit stereotypes and discriminatory attitudes through structured, multi-dimensional annotations.
  • The framework employs hierarchical, tuple-based schemas to differentiate biases by offensiveness, intent, and group targeting in language.
  • It underpins datasets like SBIC and CDial-Bias, providing computational benchmarks and enhanced methods for bias detection in social media and dialogues.

A social bias frame is a structured formalism designed to systematically capture, annotate, and infer the pragmatic ways in which language projects stereotypes, discriminatory attitudes, and power differentials onto marginalized groups. In natural language, social bias is rarely explicit; rather, bias is embedded in implicatures, implicit attitudes, and context-dependent patterns that reinforce harmful stereotypes. Social bias frames provide annotation schemas, dataset constructions, and computational benchmarks for reasoning about these phenomena in open-domain dialogue and social media, enabling precise analysis, measurement, and mitigation in machine learning applications (Zhou et al., 2022, Sap et al., 2019).

1. Formalism of Social Bias Frames

Social bias frames (SBF), and their extension to dialogues via the Dial-Bias Frame, formalize social bias annotation as multidimensional, structured tuples.

In the SBF paradigm (Sap et al., 2019), given a post $p$, annotation produces:

$$F(p) = (L_{\mathrm{lewd}}, L_{\mathrm{off}}, L_{\mathrm{int}}, L_{\mathrm{grp}}, G, S, L_{\mathrm{ing}})$$

where each component encodes:

  • $L_{\mathrm{lewd}} \in \{\text{yes}, \text{maybe}, \text{no}\}$: contains lewd/sexual content.
  • $L_{\mathrm{off}} \in \{\text{yes}, \text{maybe}, \text{no}\}$: offensive to anyone.
  • $L_{\mathrm{int}} \in \{\text{yes}, \text{probably}, \text{probably not}, \text{no}\}$: author’s perceived intent to offend.
  • $L_{\mathrm{grp}} \in \{\text{yes}, \text{no}\}$: targets a group rather than an individual.
  • $G$: free-text group label, e.g., “women,” “Black folks.”
  • $S$: short stereotype implication, e.g., “women are less qualified.”
  • $L_{\mathrm{ing}} \in \{\text{yes}, \text{maybe}, \text{no}\}$: annotator believes the speaker belongs to the targeted group.
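Concretely, the tuple $F(p)$ can be represented as a small data structure. The sketch below is illustrative only (the class and field names are not from Sap et al.), with a check encoding the dependency that the free-text slots $G$ and $S$ presuppose a targeted group:

```python
from dataclasses import dataclass
from typing import Optional

# Allowed label sets for the categorical slots (per the bullet list above).
TERNARY = ("yes", "maybe", "no")
INTENT = ("yes", "probably", "probably not", "no")

@dataclass
class SocialBiasFrame:
    """Illustrative container for one SBF annotation tuple F(p)."""
    lewd: str             # L_lewd: contains lewd/sexual content
    offensive: str        # L_off: offensive to anyone
    intent: str           # L_int: author's perceived intent to offend
    group_targeted: str   # L_grp: targets a group rather than an individual
    group: Optional[str] = None       # G: free-text group label
    stereotype: Optional[str] = None  # S: implied stereotype
    in_group: Optional[str] = None    # L_ing: speaker belongs to the group

    def __post_init__(self):
        assert self.lewd in TERNARY and self.offensive in TERNARY
        assert self.intent in INTENT
        assert self.group_targeted in ("yes", "no")
        # Slot dependency: free-text fields only apply when a group is targeted.
        if self.group_targeted == "no":
            assert self.group is None and self.stereotype is None

frame = SocialBiasFrame(
    lewd="no", offensive="yes", intent="yes",
    group_targeted="yes", group="women",
    stereotype="women are less qualified", in_group="no",
)
```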

For open-domain dialogues, the Dial-Bias Frame (Zhou et al., 2022) extends this to a two-turn dialogue $(c, r)$:

$$F(c, r) = \left\langle y_{\mathrm{ctx}},\, y_{\mathrm{dt}},\, y_{\mathrm{group}},\, y_{\mathrm{bias}} \right\rangle$$

with

  • $y_{\mathrm{ctx}} \in \{0,1\}$: Context-Sensitivity (0 = context-independent, 1 = context-sensitive).
  • $y_{\mathrm{dt}} \in \{0,1,2\}$: Data-Type (0 = irrelevant, 1 = bias-discussing, 2 = bias-expressing).
  • $y_{\mathrm{group}} \subseteq G$: targeted social groups, multi-valued.
  • $y_{\mathrm{bias}} \in \{0,1,2,3\}$: Implied Attitude (0 = irrelevant, 1 = anti-bias, 2 = neutral, 3 = biased).

Annotation proceeds hierarchically, from context-sensitivity to bias type, group, and attitude.
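A minimal sketch of the Dial-Bias tuple, with the hierarchical constraint that an irrelevant response ($y_{\mathrm{dt}} = 0$) carries no group or attitude labels. The encodings follow the bullet list above, but the class itself is an illustrative assumption, not code from the paper:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class DialBiasFrame:
    """Illustrative container for one Dial-Bias tuple <y_ctx, y_dt, y_group, y_bias>."""
    y_ctx: int            # 0 = context-independent, 1 = context-sensitive
    y_dt: int             # 0 = irrelevant, 1 = bias-discussing, 2 = bias-expressing
    y_group: frozenset    # targeted social groups (multi-valued)
    y_bias: int           # 0 = irrelevant, 1 = anti-bias, 2 = neutral, 3 = biased

    def is_consistent(self) -> bool:
        # Hierarchy: an irrelevant response has no groups and attitude 0.
        if self.y_dt == 0:
            return not self.y_group and self.y_bias == 0
        return self.y_bias in (1, 2, 3)

f = DialBiasFrame(y_ctx=1, y_dt=2, y_group=frozenset({"women"}), y_bias=3)
assert f.is_consistent()
```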

2. Annotation Schema and Sequential Labeling

Both SBF and Dial-Bias frameworks deploy multi-step, hierarchical annotation processes:

  • For SBF, annotation proceeds from coarse-grained categoricals—lewdness, offensiveness, intent—to group implication, with free-text slots for $G$ (targeted group) and $S$ (implied stereotype), culminating in optional “in-group” assessment.
  • Dial-Bias annotation (for each response $r$ in context $c$) proceeds in four steps:
    1. Label context-sensitivity ($y_{\mathrm{ctx}}$).
    2. Label data-type ($y_{\mathrm{dt}}$); annotation halts if the response is irrelevant.
    3. Enumerate all targeted groups ($y_{\mathrm{group}}$).
    4. Assign implied attitude ($y_{\mathrm{bias}}$).

This schema accommodates multi-label group assignment but enforces single-label for other dimensions, yielding a tractable and thorough hierarchical structure.
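The sequential labeling flow can be sketched as a gated pipeline. The `Annotator` interface and the keyword-based stub below are hypothetical stand-ins for illustration, not the crowdsourcing setup used for CDial-Bias:

```python
def annotate(context: str, response: str, annotator) -> dict:
    """Hierarchical Dial-Bias annotation: each step gates the next."""
    frame = {"y_ctx": annotator.context_sensitivity(context, response)}  # step 1
    frame["y_dt"] = annotator.data_type(context, response)               # step 2
    if frame["y_dt"] == 0:
        # Irrelevant response: halt early, no group or attitude labels.
        frame["y_group"], frame["y_bias"] = set(), 0
        return frame
    frame["y_group"] = annotator.targeted_groups(context, response)      # step 3 (multi-label)
    frame["y_bias"] = annotator.implied_attitude(context, response)      # step 4 (single-label)
    return frame

class KeywordStub:
    """Trivial rule-based stand-in annotator for demonstration only."""
    def context_sensitivity(self, c, r): return 0
    def data_type(self, c, r): return 2 if "women" in r else 0
    def targeted_groups(self, c, r): return {"women"}
    def implied_attitude(self, c, r): return 3

frame = annotate("some context", "women are less qualified", KeywordStub())
```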

3. Distinctions From Prior Frameworks

Traditional datasets (e.g., StereoSet, CrowS-Pairs) focus on single sentences, providing binary or ternary labels with little contextual nuance. The Social Bias Frames approach (Sap et al., 2019) introduces explicit roles for bias source, target, and inference statement, capturing pragmatic and social meaning. Dial-Bias further advances the annotation structure by:

  • Segmenting dialogue by context sensitivity: context-independent (CI) vs. context-sensitive (CS).
  • Distinguishing bias discussion from bias expression, isolating conversational dynamics overlooked in prior work.
  • Implementing a trichotomy for attitude (anti-bias, neutral, biased) vs. prior binary schemes.
  • Leveraging free-text for fine-grained group identity.

Sun et al. (DIA Safety) partition offensive, sexual, and biased content but lack the integrated, four-dimensional frame and sequential annotation schema (Zhou et al., 2022).

4. Datasets and Benchmarks

Two major annotated corpora operationalize these frameworks:

Name        Language  Entries  Distinct Groups  Dimensions
SBIC        English   44,671   1,414            7 (SBF tuple)
CDial-Bias  Chinese   28,000   171              4 (Dial-Bias tuple)

SBIC (Social Bias Inference Corpus) (Sap et al., 2019) crowdsources annotations across offensive jokes, microaggressions, Twitter abuse, and hate speech communities, yielding 147,139 inference tuples with 32,028 unique stereotype implications.

CDial-Bias (Zhou et al., 2022), collected from Zhihu, focuses on dialogue pairs, with 17,000 bias-related pairs (≈52% bias-discussing), fine-grained group and attitude tags, enabling context-dependent bias detection benchmarks.

Performance metrics in both works reflect multi-head classification and generation tasks, with F₁ scores (offensiveness ≈80%, intent ≈79%, lewd ≈81%) and generation metrics (target group BLEU ≈74, ROUGE-L ≈65).

Auxiliary labels and joint training (e.g., Mixture-of-Experts, multi-task objectives) improve classifier performance in bias detection, and context-sensitive examples pose a greater challenge (CS cases score ≈15 F₁ points lower than CI in CDial-Bias).

5. Computational Modeling Approaches

Social bias frame inference is cast as a generative modeling problem. Baseline approaches (Sap et al., 2019):

  • Utilize Transformer-based architectures (GPT/GPT-2) without separate classifiers; slot values are represented as special vocabulary tokens.
  • Input consists of raw text concatenated with all target categorical slots and free-text fields.
  • Optimization via next-token cross-entropy over the linearized annotation sequence.
  • Inference by greedy decoding or multinomial sampling, with constrained post-hoc correction for slot dependencies.
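A hedged sketch of the linearization step: the post and its frame slots are concatenated into one sequence for next-token training. The special-token names below are invented for illustration and do not match the exact vocabulary items used by Sap et al.:

```python
# Hypothetical separator tokens for each SBF slot (illustrative names only).
SEP = {"lewd": "[lewd]", "off": "[off]", "int": "[int]",
       "grp": "[grp]", "G": "[group]", "S": "[stereotype]", "ing": "[ing]"}

def linearize(post: str, frame: dict) -> str:
    """Concatenate a post with its slot values into one training sequence."""
    parts = [post]
    for key, tok in SEP.items():
        if key in frame:
            parts.append(f"{tok} {frame[key]}")
    return " ".join(parts)

seq = linearize("some offensive post",
                {"lewd": "no", "off": "yes", "int": "yes",
                 "grp": "yes", "G": "women", "S": "women are less qualified"})
```

Training then applies next-token cross-entropy over `seq`; at inference, decoding proceeds left to right and slot values are read back off the generated separators.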

For utterance-level bias detection (Zhou et al., 2022), fine-grained classifiers predict $y_{\mathrm{bias}}$ while optionally leveraging $y_{\mathrm{ctx}}$, $y_{\mathrm{dt}}$, and topic labels, demonstrating measurable improvements with auxiliary supervision.
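As a toy illustration of auxiliary supervision, the snippet below sums cross-entropy losses from three heads (bias, context-sensitivity, data-type) over a shared representation. Shapes, random weights, and labels are purely illustrative, not the architecture from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)
h = rng.normal(size=(8, 16))                # shared encoder outputs (batch=8, dim=16)
W = {"bias": rng.normal(size=(16, 4)),      # y_bias head: 4 classes
     "ctx": rng.normal(size=(16, 2)),       # y_ctx head: 2 classes (auxiliary)
     "dt": rng.normal(size=(16, 3))}        # y_dt head: 3 classes (auxiliary)
y = {"bias": rng.integers(0, 4, size=8),
     "ctx": rng.integers(0, 2, size=8),
     "dt": rng.integers(0, 3, size=8)}

def xent(logits, labels):
    """Mean cross-entropy from raw logits (numerically stable log-softmax)."""
    z = logits - logits.max(axis=1, keepdims=True)
    logp = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    return -logp[np.arange(len(labels)), labels].mean()

# Joint multi-task objective: the auxiliary heads shape the shared representation.
loss = sum(xent(h @ W[k], y[k]) for k in W)
```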

6. Illustrative Frame Instantiations

Social bias frames explicitly model annotator inferences. Consider:

  • SBIC Example: Post: “I hate fat bitches.” An annotation consistent with the schema above: (lewd = yes, offensive = yes, intent = yes, group = yes, $G$ = “overweight women,” $S$ = “overweight women deserve hatred,” in-group = no).