Social Bias Frames Explained
- Social Bias Frames are a formalism that captures implicit stereotypes and discriminatory attitudes through structured, multi-dimensional annotations.
- The framework employs hierarchical, tuple-based schemas to differentiate biases by offensiveness, intent, and group targeting in language.
- It underpins datasets like SBIC and CDial-Bias, providing computational benchmarks and enhanced methods for bias detection in social media and dialogues.
A social bias frame is a structured formalism designed to systematically capture, annotate, and infer the pragmatic ways in which language projects stereotypes, discriminatory attitudes, and power differentials onto marginalized groups. In natural language, social bias is rarely explicit; rather, bias is embedded in implicatures, implicit attitudes, and context-dependent patterns that reinforce harmful stereotypes. Social bias frames provide annotation schemas, dataset constructions, and computational benchmarks for reasoning about these phenomena in open-domain dialogue and social media, enabling precise analysis, measurement, and mitigation in machine learning applications (Zhou et al., 2022; Sap et al., 2019).
1. Formalism of Social Bias Frames
Social bias frames (SBF) and their extension to dialogues via the Dial-Bias Frame formalize social bias annotation via multidimensional, structured tuples.
In the SBF paradigm (Sap et al., 2019), given a post, annotation produces a seven-slot frame (lewd, offensive, intent, group, targeted group, implied statement, in-group), where each component encodes:
- Lewd: the post contains lewd or sexual content.
- Offensive: the post could be offensive to anyone.
- Intent: the author’s perceived intent to offend.
- Group: the post targets a group rather than an individual.
- Targeted group: free-text group label, e.g., “women,” “Black folks.”
- Implied statement: short stereotype implication, e.g., “women are less qualified.”
- In-group: the annotator believes the speaker belongs to the targeted group.
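As a concrete, non-normative sketch, the seven SBF slots can be represented as a small data structure; the field names below are illustrative assumptions, not the paper's notation:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class SBFFrame:
    lewd: str                  # "yes"/"no": contains lewd/sexual content
    offensive: str             # "yes"/"maybe"/"no": offensive to anyone
    intent: str                # author's perceived intent to offend
    group_targeted: str        # targets a group rather than an individual
    targeted_group: Optional[str] = None     # free text, e.g. "women"
    implied_statement: Optional[str] = None  # e.g. "women are less qualified"
    in_group: Optional[str] = None           # speaker belongs to targeted group

# Example instantiation mirroring the slot descriptions above.
frame = SBFFrame(
    lewd="no", offensive="yes", intent="yes", group_targeted="yes",
    targeted_group="women", implied_statement="women are less qualified",
)
```

The free-text slots default to `None` because they are only filled when the post is judged to target a group, matching the hierarchical annotation flow.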
For open-domain dialogues, the Dial-Bias Frame (Zhou et al., 2022) extends this to a two-turn dialogue (context, response), producing a four-slot tuple with:
- Context-Sensitivity: 0 = context-independent (CI), 1 = context-sensitive (CS).
- Data-Type: 0 = irrelevant, 1 = bias-discussing, 2 = bias-expressing.
- Targeted Groups: the social groups targeted; multi-valued.
- Implied Attitude: 0 = irrelevant, 1 = anti-bias, 2 = neutral, 3 = biased.
Annotation proceeds hierarchically, from context-sensitivity to bias type, group, and attitude.
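A minimal sketch of the Dial-Bias tuple using the integer codes listed above; the class and enum names are illustrative assumptions:

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import List

class ContextSensitivity(IntEnum):
    CONTEXT_INDEPENDENT = 0  # CI
    CONTEXT_SENSITIVE = 1    # CS

class DataType(IntEnum):
    IRRELEVANT = 0
    BIAS_DISCUSSING = 1
    BIAS_EXPRESSING = 2

class ImpliedAttitude(IntEnum):
    IRRELEVANT = 0
    ANTI_BIAS = 1
    NEUTRAL = 2
    BIASED = 3

@dataclass
class DialBiasFrame:
    context_sensitivity: ContextSensitivity
    data_type: DataType
    targeted_groups: List[str]  # the only multi-valued dimension
    implied_attitude: ImpliedAttitude
```

Using `IntEnum` keeps the annotation codes (0–3) directly comparable to the integer labels while giving them readable names.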
2. Annotation Schema and Sequential Labeling
Both SBF and Dial-Bias frameworks deploy multi-step, hierarchical annotation processes:
- For SBF, annotation proceeds from coarse-grained categorical judgments (lewdness, offensiveness, intent) to group implication, with free-text slots for the targeted group and the implied stereotype, culminating in an optional in-group assessment.
- Dial-Bias annotation proceeds in four steps for each context–response pair:
  1. Judge context sensitivity (CI vs. CS).
  2. Classify the data type (irrelevant, bias-discussing, or bias-expressing); annotation halts if the pair is irrelevant.
  3. Enumerate all targeted groups (multi-label).
  4. Assign the implied attitude (anti-bias, neutral, or biased).
This schema accommodates multi-label group assignment but enforces single-label for other dimensions, yielding a tractable and thorough hierarchical structure.
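The hierarchical order, including the halt on irrelevant pairs, can be sketched as a small pipeline; `judge` is a hypothetical callable standing in for a human annotator or classifier, and only the control flow mirrors the schema:

```python
def annotate_dialogue(context, response, judge):
    """Hierarchical Dial-Bias annotation order; `judge` answers one
    question at a time (human or model)."""
    cs = judge("context_sensitivity", context, response)     # step 1: CI/CS
    data_type = judge("data_type", context, response)        # step 2
    if data_type == 0:                                       # irrelevant: halt
        return {"context_sensitivity": cs, "data_type": 0,
                "targeted_groups": [], "implied_attitude": 0}
    groups = judge("targeted_groups", context, response)     # step 3: multi-label
    attitude = judge("implied_attitude", context, response)  # step 4: single label
    return {"context_sensitivity": cs, "data_type": data_type,
            "targeted_groups": groups, "implied_attitude": attitude}

# Scripted stand-in for an annotator on one bias-expressing pair.
answers = {"context_sensitivity": 1, "data_type": 2,
           "targeted_groups": ["women"], "implied_attitude": 3}
result = annotate_dialogue("ctx", "resp", lambda q, c, r: answers[q])
```

Halting early on irrelevant pairs is what keeps the multi-step schema tractable: the more expensive group and attitude judgments are only made when needed.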
3. Distinctions From Prior Frameworks
Traditional datasets (e.g., StereoSet, CrowS-Pairs) focus on single sentences, providing binary or ternary labels with little contextual nuance. The Social Bias Frames approach (Sap et al., 2019) introduces explicit roles for bias source, target, and inference statement, capturing pragmatic and social meaning. Dial-Bias further advances the annotation structure by:
- Segmenting dialogue for context sensitivity (CI/CS).
- Distinguishing bias discussion from bias expression, isolating conversational dynamics overlooked in prior work.
- Implementing a trichotomy for attitude (anti-bias, neutral, biased) vs. prior binary schemes.
- Leveraging free-text for fine-grained group identity.
Sun et al. (DiaSafety) partition offensive, sexual, and biased content but lack the integrated, four-dimensional frame and sequential annotation schema (Zhou et al., 2022).
4. Datasets and Benchmarks
Two major annotated corpora operationalize these frameworks:
| Name | Language | Entries | Distinct Groups | Dimensions |
|---|---|---|---|---|
| SBIC | English | 44,671 | 1,414 | 7 (SBF tuple) |
| CDial-Bias | Chinese | 28,000 | 171 | 4 (Dial-Bias tuple) |
SBIC (Social Bias Inference Corpus) (Sap et al., 2019) crowdsources annotations across offensive jokes, microaggressions, Twitter abuse, and hate speech communities, yielding 147,139 inference tuples with 32,028 unique stereotype implications.
CDial-Bias (Zhou et al., 2022), collected from Zhihu, focuses on dialogue pairs, with 17,000 bias-related pairs (≈52% bias-discussing), fine-grained group and attitude tags, enabling context-dependent bias detection benchmarks.
Performance metrics in both works reflect multi-head classification and generation tasks, with F₁ scores (offensiveness ≈80%, intent ≈79%, lewd ≈81%) and generation metrics (target group BLEU ≈74, ROUGE-L ≈65).
Auxiliary labels and joint training (e.g., Mixture-of-Experts, multi-task objectives) improve classifier performance in bias detection, and context-sensitive examples pose a greater challenge (CS cases score ≈15 F₁ points lower than CI cases in CDial-Bias).
5. Computational Modeling Approaches
Social bias frame inference is cast as a generative modeling problem. Baseline approaches (Sap et al., 2019):
- Utilize Transformer-based architectures (GPT/GPT-2) without separate classifiers; slot values are represented as special vocabulary tokens.
- Input consists of raw text concatenated with all target categorical slots and free-text fields.
- Optimization via next-token cross-entropy over the linearized annotation sequence.
- Inference by greedy decoding or multinomial sampling, with constrained post-hoc correction for slot dependencies.
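The linearization step can be sketched as follows; the bracketed special tokens are illustrative placeholders, not the actual vocabulary used by Sap et al. (2019):

```python
def linearize_frame(post, frame):
    """Concatenate a post with its slot values into one training string
    for next-token cross-entropy training of a GPT-style model."""
    parts = [post,
             "[lewd]", frame["lewd"],
             "[off]", frame["offensive"],
             "[int]", frame["intent"],
             "[grp]", frame["group_targeted"]]
    if frame.get("targeted_group"):  # free-text slots only when a group is named
        parts += ["[group]", frame["targeted_group"],
                  "[stmt]", frame["implied_statement"]]
    return " ".join(parts)

seq = linearize_frame("some post", {
    "lewd": "no", "offensive": "yes", "intent": "yes",
    "group_targeted": "yes", "targeted_group": "women",
    "implied_statement": "women are less qualified"})
```

Because the categorical and free-text slots all live in one token sequence, a single language model can be trained on classification and generation jointly, which is the point of the SBF generative framing.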
For utterance-level bias detection (Zhou et al., 2022), fine-grained classifiers predict the implied attitude while optionally leveraging context-sensitivity, data-type, and topic labels, demonstrating measurable improvements with auxiliary supervision.
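One common way to wire in such auxiliary supervision is a weighted joint objective; the head names and the 0.5 weight below are assumptions for illustration, not values from the papers:

```python
def joint_loss(head_losses, main="implied_attitude", aux_weight=0.5):
    """Multi-task objective: main attitude loss plus a down-weighted
    sum of auxiliary head losses (context-sensitivity, data-type, topic)."""
    aux = sum(v for k, v in head_losses.items() if k != main)
    return head_losses[main] + aux_weight * aux

total = joint_loss({"implied_attitude": 1.0,
                    "context_sensitivity": 0.4,
                    "data_type": 0.6})
```

Down-weighting the auxiliary terms keeps the attitude prediction as the primary task while still letting the shared encoder benefit from the extra labels.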
6. Illustrative Frame Instantiations
Social bias frames explicitly model annotator inferences. Consider:
- SBIC Example: Post: “I hate fat bitches.” Frame: (lewd = yes, offensive