SBIC: Social Bias Inference Corpus
- Social Bias Inference Corpus (SBIC) is a dataset that employs structured, multi-dimensional Social Bias Frames to capture nuanced language bias and implicit stereotypes.
- It utilizes a detailed annotation scheme that records offensiveness, intent, and group targeting through both categorical labels and free-text fields.
- SBIC supports bias research by enabling the training and evaluation of neural models for social bias detection and the mitigation of biased outputs in dialogue systems.
A Social Bias Frame is a formal, multifaceted structure devised to capture, decompose, and systematically annotate the complex pragmatic implications of language that perpetuates, discusses, or subverts social bias. Developed to address limitations in prior work, Social Bias Frames support fine-grained annotation and analysis through several orthogonal yet interrelated dimensions. Core applications include detailed corpus annotation for bias research, training and evaluating neural models for social bias detection, and providing foundations for constructing safer, normatively aware open-domain dialog and generation systems (Sap et al., 2019; Zhou et al., 2022).
1. Formal Structure of Social Bias Frames
A Social Bias Frame (SBF) specifies a multi-dimensional annotation scheme for any text instance (typically, a sentence or dialog turn), with each dimension encoding a distinct aspect of bias or implied social meaning. The prototypical SBF is a tuple applied to a document or utterance *x*:

SBF(*x*) = (*lewd*, *off*, *int*, *grp*, *G*, *I*, *ing*)

where:
- *lewd*: presence of lewd or sexual content
- *off*: judged offensiveness
- *int*: perceived intent to offend
- *grp*: whether a social/demographic group is invoked
- *G*: free-text group label(s) (e.g., "women", "Black folks")
- *I*: free-text stereotype or proposition being implied
- *ing*: whether the speaker is an in-group member
This design enables simultaneous categorical slot filling and open-text inference, essential for linguistic phenomena where bias is communicated through implicature, connotation, or stereotyped propositions (Sap et al., 2019).
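The tuple structure above can be sketched as a simple record type. This is an illustrative data model only: the field names (`lewd`, `off`, `int_`, `grp`, `groups`, `implications`, `ing`) mirror the dimensions described in the text, not the column names of the released SBIC files.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class SocialBiasFrame:
    """One frame annotation for a post or utterance (illustrative schema)."""
    text: str                 # the annotated post or dialog turn
    lewd: str                 # "yes"/"maybe"/"no": lewd or sexual content
    off: str                  # judged offensiveness
    int_: str                 # perceived intent to offend ("int" is a builtin name)
    grp: str                  # whether a social/demographic group is invoked
    groups: List[str] = field(default_factory=list)        # free-text group labels (G)
    implications: List[str] = field(default_factory=list)  # free-text stereotypes (I)
    ing: str = "no"           # speaker in-group status

# Example instance mirroring the stereotype example used later in the text.
frame = SocialBiasFrame(
    text="example post",
    lewd="no", off="yes", int_="yes", grp="yes",
    groups=["women"],
    implications=["women are less qualified"],
)
```

The categorical slots hold the raw annotator labels rather than booleans, so the "yes/maybe/no" distinctions survive until a downstream binarization step.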
2. Sequential Annotation Workflow and Granularity
SBF annotation schemas unfold in a sequential, slot-filling process:
- Annotators first label the text for *lewd*, *off*, and *int* (sexual content, offensiveness, intent).
- If group targeting (*grp*) is detected, annotators provide group label(s) (*G*).
- For each group in *G*, annotators articulate the stereotyped implication *I* (e.g., "women are less qualified").
- Finally, *ing* records in-group status.
Categorical variables can be binarized for simplified analysis (e.g., "yes/maybe" mapped to positive, "no" to negative), but the free-text fields *G* and *I* remain central for interpretability and broader coverage.
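The binarization rule described above is a one-line mapping; a minimal sketch, assuming the labels are exported as the strings "yes", "maybe", and "no":

```python
def binarize(label: str) -> int:
    """Collapse a categorical SBF slot value to a binary label.

    "yes" and "maybe" count as positive, "no" as negative, following
    the simplification described in the text. The exact label strings
    are an assumption about the annotation export format.
    """
    label = label.strip().lower()
    if label in {"yes", "maybe"}:
        return 1
    if label == "no":
        return 0
    raise ValueError(f"unexpected label: {label!r}")
```

Raising on unknown strings (rather than silently defaulting) surfaces annotation-export mismatches early.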
This strategy is also evident in the Dial-Bias Frame for dialogues (Zhou et al., 2022), which incorporates the tuple (*c*, *t*, *G*, *a*), where *c* is context sensitivity, *t* is the data-type (irrelevant, bias-discussing, bias-expressing), *G* is the set of targeted groups, and *a* is the implied attitude.
| Dimension | Social Bias Frame (Sap et al., 2019) | Dial-Bias Frame (Zhou et al., 2022) |
|---|---|---|
| Contextualization | No explicit context | Explicit context sensitivity (*c*) |
| Data-type | Not modeled | Discriminates discussing vs. expressing bias (*t*) |
| Target group | *G* (free text) | *G* (free text/multi-set) |
| Attitude/implicature | *I* (free-text implication) | *a* (categorical: anti-bias/neutral/biased) |
| Offensiveness | *off* (categorical) | Not modeled separately |
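The hierarchical, sequential character of the Dial-Bias annotation can be expressed as a validation rule: group and attitude slots are only required once a turn is typed as bias-related. The class and field names below are illustrative, not the released CDial-Bias schema.

```python
from dataclasses import dataclass, field
from typing import List, Optional

DATA_TYPES = {"irrelevant", "bias-discussing", "bias-expressing"}
ATTITUDES = {"anti-bias", "neutral", "biased"}

@dataclass
class DialBiasFrame:
    """One Dial-Bias annotation (c, t, G, a) for a question-response pair."""
    context: str
    response: str
    context_sensitive: bool          # c: does the judgment depend on context?
    data_type: str                   # t: irrelevant / bias-discussing / bias-expressing
    groups: List[str] = field(default_factory=list)   # G: targeted groups
    attitude: Optional[str] = None   # a: only filled for bias-related turns

    def validate(self) -> None:
        if self.data_type not in DATA_TYPES:
            raise ValueError(f"unknown data-type: {self.data_type!r}")
        # Later slots are only filled for bias-related turns,
        # mirroring the sequential slot-filling workflow.
        if self.data_type != "irrelevant":
            if not self.groups:
                raise ValueError("bias-related turns require targeted groups")
            if self.attitude not in ATTITUDES:
                raise ValueError("bias-related turns require an attitude label")
```

Encoding the dependency between *t* and the later slots as a check (rather than leaving all fields optional) mirrors how the sequential workflow reduces annotator confounds.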
3. Application to Corpus Construction and Benchmarking
The Social Bias Inference Corpus (SBIC) (Sap et al., 2019) operationalizes the SBF by annotating 44,671 English-language social media posts, resulting in approximately 147,139 frame instances mapping posts to group-target implication tuples. The design accommodates:
- Fine-grained group target identification (1,414 unique labels)
- Extraction of 32,028 unique implicit stereotypes
- Precise breakdowns: 44.8% of posts offensive, 50.9% group-targeted, 43.4% with perceived intent
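Breakdowns of this kind are simple proportions over the binarized categorical slots. A minimal sketch, assuming each frame is a dict with "off", "grp", and "int" keys holding "yes"/"maybe"/"no" labels (the key names are assumptions, not the official SBIC columns):

```python
from typing import Dict, List

def breakdown(frames: List[Dict[str, str]]) -> Dict[str, float]:
    """Share of posts judged offensive, group-targeted, and intentionally
    offensive, treating "yes"/"maybe" as positive per the binarization rule."""
    n = len(frames)
    pos = lambda v: v in {"yes", "maybe"}
    return {
        "offensive": sum(pos(f["off"]) for f in frames) / n,
        "group_targeted": sum(pos(f["grp"]) for f in frames) / n,
        "intentional": sum(pos(f["int"]) for f in frames) / n,
    }

# Tiny synthetic example: one positive post, one negative post.
stats = breakdown([
    {"off": "yes", "grp": "yes", "int": "maybe"},
    {"off": "no",  "grp": "no",  "int": "no"},
])
```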
For dialog systems, the Dial-Bias Frame underpins the CDial-Bias dataset (Zhou et al., 2022), comprising ~28,000 Chinese question-response pairs focused on race, gender, region, and occupation. This dataset includes over 17,000 bias-related dialogues, distinguished by whether bias is discussed (52%) or expressed, and 171 distinct targeted groups.
4. Modeling and Evaluation Paradigms
Frame recovery is cast as a conditional sequence generation problem. In (Sap et al., 2019), SBFs are linearized and generated auto-regressively by transformer models (e.g., GPT-2), with output tokens corresponding to categorical slot values and free-text fields. Loss is computed as token-level cross-entropy over linearized targets, skipping irrelevant slots.
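The linearization step can be sketched as serializing the frame's slots into one flat target string for the language model. The bracketed separator tokens below are illustrative placeholders, not the exact special tokens used by Sap et al. (2019); empty slots are dropped so the loss skips irrelevant fields.

```python
def linearize(frame: dict) -> str:
    """Serialize a frame into a flat generation target (illustrative format).

    Slots are emitted in a fixed order; missing or empty slots are
    skipped, mirroring the "skipping irrelevant slots" behavior in the
    training objective described above.
    """
    order = ["lewd", "off", "int", "grp", "group", "implication", "ing"]
    parts = []
    for slot in order:
        value = frame.get(slot)
        if value:                         # skip empty/irrelevant slots
            parts.append(f"[{slot}] {value}")
    return " ".join(parts)

target = linearize({
    "off": "yes", "int": "yes", "grp": "yes",
    "group": "women", "implication": "women are less qualified",
})
# yields a string like "[off] yes [int] yes [grp] yes [group] women ..."
```

A transformer decoder is then trained with token-level cross-entropy against such targets, generating categorical values and free-text fields in a single autoregressive pass.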
Performance for categorical slots (offensive, intent, lewd, group-targeted) using transformer models is strong, with in-group identification being notably harder. Free-text match for group (*G*) and implication (*I*) is evaluated with BLEU-2 and ROUGE-L. Generating target groups is relatively easy (BLEU 74), while implication recovery is more challenging (BLEU 50, ROUGE-L 42).
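For reference, sentence-level BLEU-2 is the geometric mean of clipped unigram and bigram precision, scaled by a brevity penalty. A simplified, unsmoothed sketch of the metric (production evaluation would typically use a library implementation):

```python
import math
from collections import Counter

def bleu2(candidate: str, reference: str) -> float:
    """Unsmoothed sentence-level BLEU-2 against a single reference."""
    def ngrams(tokens, n):
        return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

    cand, ref = candidate.split(), reference.split()
    precisions = []
    for n in (1, 2):
        c, r = ngrams(cand, n), ngrams(ref, n)
        overlap = sum(min(count, r[g]) for g, count in c.items())  # clipped counts
        precisions.append(overlap / max(sum(c.values()), 1))
    if 0.0 in precisions:        # no smoothing: any zero precision zeroes the score
        return 0.0
    # Brevity penalty discourages overly short candidates.
    bp = 1.0 if len(cand) >= len(ref) else math.exp(1 - len(ref) / len(cand))
    return bp * math.exp(sum(math.log(p) for p in precisions) / 2)
```

Short free-text fields like group labels tend to score higher under such n-gram metrics than longer implication sentences, which is consistent with the gap between group and implication recovery reported above.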
The CDial-Bias dataset leverages fine-grained, multi-task classification strategies. Models informed by auxiliary frame labels achieve measurable improvements (1% F1), particularly for context-sensitive cases, which are approximately 15 points harder than context-independent examples (Zhou et al., 2022).
5. Comparative Analysis with Related Frameworks
Earlier bias detection frameworks such as StereoSet and CrowS-Pairs operate at the sentence level with coarse triplet labels (stereotypical / anti-stereotypical / unrelated) and lack contextual or dialog-aware structure. DiaSafety uses dichotomous classifiers for offensive and biased content but omits multi-step structured annotation (Zhou et al., 2022).
Social Bias Frames (Sap et al., 2019) expand on standard pipeline annotation by incorporating not only slot-based offense and group targeting, but also free-text explanations, shifting emphasis from purely surface-level features to pragmatic, commonsense reasoning.
The Dial-Bias Frame further innovates by:
- Systematically decomposing conversational phenomena into context sensitivity, bias data-type (discussing vs. expressing), targeted group(s), and attitude.
- Employing hierarchical, sequential annotation to reduce confounds and clarify annotator decisions.
- Providing a trichotomous attitude dimension (explicit anti-bias, neutral, biased), in contrast to binary schemes.
These advances enable richer, normatively grounded annotation and more granular model evaluation.
6. Significance, Limitations, and Research Trajectories
Social Bias Frames facilitate detailed empirical study of linguistic bias, stereotype propagation, and marginalized group representation at scale. Their architecture supports not only evaluation, but also the development of norm-aware and mitigative interventions in dialog models.
However, challenges remain. Generative models, even when guided by SBFs, perform poorly on fine-grained free-text implication recovery, pointing to limitations in current pragmatic inference capabilities. In-group speaker identification also proves highly imbalanced and difficult. Further, while SBFs provide structured insight, operationalizing interventions or automated moderation on the basis of these labels requires robust advances in context tracking, discourse modeling, and ethical specification (Sap et al., 2019, Zhou et al., 2022).
Ongoing research is expected to refine frame designs for multilingual, multimodal, and real-time applications, and to further anchor annotation and modeling in practical safety requirements for deployed systems.