In-Group & Out-Group Annotator Dynamics
- In-group and out-group annotator dynamics are systematic differences in labeling driven by social identity, resulting in in-group favoritism and out-group derogation.
- Quantitative frameworks such as empathy gap metrics and cohesion indices provide measurable evidence of bias in annotation tasks.
- Mitigation strategies—diverse recruitment, belief-based labeling, and optimized feedback loops—enhance data quality and fairness in AI systems.
In-group and out-group annotator dynamics refer to the systematic ways in which annotator social identity—demographic, political, cultural, or organizational—modulates subjective judgment and labeling in data annotation tasks. These dynamics reflect both classic psychological phenomena (e.g., in-group favoritism, out-group derogation) and emergent behaviors unique to sociotechnical and machine learning workflows, influencing both the reliability and representativeness of labeled datasets for training, evaluation, and deployment of AI systems.
1. Definitions and Empirical Manifestations
In-group annotators are those whose salient social or demographic identity attributes align with the referent of the item to be labeled, or with the group identity primed in the annotation context. Out-group annotators lack this alignment for a given item or context. Practically, annotators are assigned to in-group or out-group status based on explicit demographic matching (e.g., race, political party, gender) relative to the annotation target, organizational proximity (internal vs. external), or induced persona during task setup (Fleisig et al., 2023, Rosenthal et al., 13 Oct 2025, Hou et al., 2 Mar 2025, Dong et al., 16 Feb 2024).
Empirical studies consistently demonstrate:
- Higher cohesion, reliability, and internal agreement within in-groups versus across groups, often measured via IRR/α, negentropy, or plurality size (Pandita et al., 15 Aug 2024).
- Systematic intergroup bias: in-group annotators rate own-group content more positively, assign higher empathy or credibility, or apply more lenient standards, while out-group annotators are more critical, discount emotional intensity, or more readily label content as offensive or uncivil (Hou et al., 2 Mar 2025, Nishi, 2023, Dong et al., 16 Feb 2024).
- "Inbetweeners": individuals or items occupying the boundary or center of an ideological/socio-demographic continuum are frequently excluded from both groups, treated as out-group by either side, or assigned ambiguous/discordant labels (Yang et al., 2019).
2. Quantitative Frameworks and Metrics
Annotation studies employ precise mathematical definitions to quantify in-group and out-group dynamics.
Empathy gap metrics: In LLM-based studies, empathy gaps are captured via z-normalized matrices of predicted scores (e.g., emotion intensity), with block-diagonal (in-group) entries contrasted against off-diagonal (out-group) entries. The empathy-gap score

$$\Delta \;=\; \underset{g = g'}{\operatorname{mean}}\; z_{gg'} \;-\; \underset{g \neq g'}{\operatorname{mean}}\; z_{gg'},$$

where $z_{gg'}$ is the z-normalized predicted intensity assigned by group-$g$ raters to group-$g'$ content, measures the systematic elevation of in-group over out-group predictions. Significance is established via permutation testing and paired comparisons (Hou et al., 2 Mar 2025).
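A minimal sketch of this computation, assuming `z` is a hypothetical G×G matrix of z-normalized mean predicted intensities (rows: rater group, columns: target group); this is not the authors' released code, and the column-permutation null is one simple instantiation of the permutation test:

```python
import numpy as np

def empathy_gap(z):
    """Mean of block-diagonal (in-group) entries minus the mean of
    off-diagonal (out-group) entries of a z-normalized score matrix."""
    in_group = np.eye(z.shape[0], dtype=bool)
    return z[in_group].mean() - z[~in_group].mean()

def permutation_pvalue(z, n_perm=10_000, seed=0):
    """Null: no alignment between rater group (rows) and target group
    (columns); shuffling columns breaks in-group/out-group structure."""
    rng = np.random.default_rng(seed)
    observed = empathy_gap(z)
    exceed = sum(
        empathy_gap(z[:, rng.permutation(z.shape[1])]) >= observed
        for _ in range(n_perm)
    )
    return (exceed + 1) / (n_perm + 1)

# Toy 3-group matrix: in-group cells run ~0.7 SD hotter than out-group cells.
# (Real studies aggregate over many items before forming this matrix.)
z = np.array([[ 0.6, -0.2, -0.3],
              [-0.1,  0.5, -0.2],
              [-0.3, -0.1,  0.4]])
print(empathy_gap(z), permutation_pvalue(z))
```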
Cohesion metrics: In-group reliability (IRR), cross-group reliability (XRR), and the Group Association Index (GAI = XRR/IRR) use Krippendorff's α and entropy-based quantities, enabling formal comparison of internal agreement and cross-group alignment. High IRR relative to XRR (GAI below 1) indicates factional cohesion exceeds cross-group agreement (Pandita et al., 15 Aug 2024).
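A simplified sketch of these quantities under an interval (squared-difference) metric; the estimators in the cited papers differ in detail, and treating GAI as the ratio XRR/IRR here is an assumption for illustration:

```python
import numpy as np
from itertools import combinations, product

def alpha_interval(units):
    """Krippendorff-style alpha with a squared (interval) distance.
    units: per-item lists of ratings from a single annotator group."""
    observed = [(a - b) ** 2 for u in units for a, b in combinations(u, 2)]
    pooled = [r for u in units for r in u]
    expected = [(a - b) ** 2 for a, b in combinations(pooled, 2)]
    return 1 - np.mean(observed) / np.mean(expected)

def xrr_interval(units_a, units_b):
    """Cross-group reliability: disagreement over cross-group rating
    pairs on the same item (a simplified xRR in the spirit of Wong et al.)."""
    cross = [(a - b) ** 2
             for ua, ub in zip(units_a, units_b)
             for a, b in product(ua, ub)]
    pooled = [r for u in units_a + units_b for r in u]
    expected = [(a - b) ** 2 for a, b in combinations(pooled, 2)]
    return 1 - np.mean(cross) / np.mean(expected)

# Toy data: two factions each rate the same four items on a 1-5 scale.
group_a = [[4, 5], [4, 4], [2, 2], [5, 4]]
group_b = [[2, 2], [3, 2], [4, 5], [2, 3]]
irr = 0.5 * (alpha_interval(group_a) + alpha_interval(group_b))
xrr = xrr_interval(group_a, group_b)  # low/negative: groups talk past each other
print(irr, xrr, xrr / irr)            # GAI = XRR/IRR < 1: factional cohesion dominates
```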
Group bias estimation: GroupAnno, a probabilistic graphical framework, assigns group-conditional priors over annotator sensitivity/specificity, learning demographic-group corrections via extended EM to jointly optimize label inference and annotator reliability modeling (Liu et al., 2021).
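A compressed sketch of this idea, with binary labels and annotator parameters collapsed to the group level (GroupAnno itself additionally learns individual deviations from group priors; this is an illustrative simplification, not the released implementation):

```python
import numpy as np

def em_group_labels(obs, groups, n_iter=50):
    """EM label inference with group-conditional sensitivity/specificity.
    obs: (n_items, n_annotators) float array in {0, 1, np.nan}.
    groups: (n_annotators,) int array mapping annotator -> group.
    Returns posterior P(y_i = 1) and per-group (sensitivity, specificity)."""
    n_items, n_ann = obs.shape
    n_groups = groups.max() + 1
    mask = ~np.isnan(obs)
    o = np.nan_to_num(obs)
    p = np.nanmean(obs, axis=1)          # init posterior with the mean vote
    sens = np.full(n_groups, 0.7)
    spec = np.full(n_groups, 0.7)
    for _ in range(n_iter):
        # M-step: re-estimate group-level reliability parameters.
        for g in range(n_groups):
            cols = groups == g
            m, og = mask[:, cols], o[:, cols]
            sens[g] = (p[:, None] * og * m).sum() / max((p[:, None] * m).sum(), 1e-9)
            spec[g] = ((1 - p)[:, None] * (1 - og) * m).sum() / max(((1 - p)[:, None] * m).sum(), 1e-9)
        sens = np.clip(sens, 1e-3, 1 - 1e-3)
        spec = np.clip(spec, 1e-3, 1 - 1e-3)
        prior = np.clip(p.mean(), 1e-3, 1 - 1e-3)
        # E-step: posterior over true labels given group reliabilities.
        ll1 = np.full(n_items, np.log(prior))
        ll0 = np.full(n_items, np.log(1 - prior))
        for a in range(n_ann):
            g, m = groups[a], mask[:, a]
            oa = o[m, a]
            ll1[m] += np.where(oa == 1, np.log(sens[g]), np.log(1 - sens[g]))
            ll0[m] += np.where(oa == 0, np.log(spec[g]), np.log(1 - spec[g]))
        p = 1 / (1 + np.exp(ll0 - ll1))
    return p, sens, spec

# Toy data: 4 annotators in groups (0, 0, 1, 1); group 1 over-labels positives.
obs = np.array([[1, 1, 1, 1], [0, 0, 1, 0], [1, 1, 1, 1], [1, 0, 1, 1],
                [0, 0, 1, 1], [0, 0, 0, 1], [1, 1, 1, 1], [0, 0, 1, 0]], dtype=float)
p, sens, spec = em_group_labels(obs, np.array([0, 0, 1, 1]))
```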
Bias reduction via belief/vicarious labels: Bias magnitude is tracked as the difference between group means under direct judgments versus elicited beliefs, with belief elicitation routinely reducing observed bias by >85% in political labeling contexts (Jakobsen et al., 21 Oct 2024, Pandita et al., 15 Aug 2024).
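The bookkeeping itself is trivial; the numbers below are fabricated toy values chosen only to mirror the reported magnitudes (judgment bias ≈0.14 shrinking to ≈0.02 under belief elicitation):

```python
import numpy as np

def group_bias(scores_a, scores_b):
    """Bias magnitude: absolute difference between group mean labels."""
    return abs(np.mean(scores_a) - np.mean(scores_b))

# Hypothetical per-item incivility labels in [0, 1] from two partisan groups.
judgment_dem = [0.62, 0.55, 0.70, 0.48]   # "How uncivil is this?"
judgment_rep = [0.47, 0.41, 0.55, 0.35]
belief_dem   = [0.53, 0.49, 0.61, 0.42]   # "How would others label this?"
belief_rep   = [0.51, 0.47, 0.58, 0.40]

b_judg = group_bias(judgment_dem, judgment_rep)   # ~0.14
b_belief = group_bias(belief_dem, belief_rep)     # ~0.02
print(f"bias reduction: {1 - b_belief / b_judg:.0%}")  # ~84%
```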
3. Mechanisms and Psychological Models
In-group/out-group effects in annotation are understood as the result of:
- Motivated social identity processing: Annotators selectively interpret, rate, and endorse items to favor their in-group and police or discount the out-group. In annotation, this manifests as higher incivility or offensiveness scores for out-group-originated comments and stricter restriction support (Nishi, 2023).
- Perspective-taking and meta-reasoning: Eliciting beliefs about how "others" (often, out-group members) would label forces annotators toward a meta-perspective, moderating idiosyncratic or partisan extremity. This mechanism reduces both mean bias and variance in labels, yielding more representative and consensus-anchored targets (Jakobsen et al., 21 Oct 2024).
- Structural polarization: Opinion dynamics models (e.g., DeGroot with opposition) formalize persistent group polarization as a function of network structure and negative ties, showing that bi-polarization, divergence, or consensus depend on signed graph balance properties (Eger, 2013). In annotation, this predicts the persistence of labeling disagreement and the impossibility of global unbiased aggregation where strong group antagonism is present.
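A toy signed-graph simulation of this prediction (a DeGroot-style update with negative cross-ties; Eger's model differs in its exact formalization):

```python
import numpy as np

def degroot_signed(W, x0, n_steps=200):
    """Iterate x(t+1) = W x(t) on a signed influence matrix W, where
    negative entries model repulsion from antagonistic (out-group) ties."""
    x = np.asarray(x0, dtype=float)
    for _ in range(n_steps):
        x = np.clip(W @ x, -1, 1)   # keep opinions in a bounded range
    return x

# Two internally cohesive factions joined only by negative ties: the signed
# graph is structurally balanced, so bi-polarization persists, not consensus.
W = np.array([[ 0.6,  0.3, -0.1,  0.0],
              [ 0.3,  0.6,  0.0, -0.1],
              [-0.1,  0.0,  0.6,  0.3],
              [ 0.0, -0.1,  0.3,  0.6]])
x0 = [0.2, 0.1, -0.1, -0.2]
print(degroot_signed(W, x0))  # ~[0.15, 0.15, -0.15, -0.15]: stable opposing camps
```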
4. Practical Methodologies for Diagnosis and Mitigation
Annotation pipelines have developed a variety of strategies to recognize, measure, and attenuate in-group/out-group effects.
- Demographically informed modeling: Annotator group bias is captured either via direct demographic matching (groupings by race, language, age, political affiliation) or by combining structured demographic and behavioral survey features in individual-level rating models (Fleisig et al., 2023, Liu et al., 2021); a minimal sketch follows this list.
- Bias-aware aggregation: GroupAnno and similar models treat group means as informative priors on annotator behavior, allowing label inference to de-weight or correct for over- or underrepresentation of group-specific bias (Liu et al., 2021).
- Belief/vicarious annotation protocols: Eliciting beliefs about other groups' judgments or vicarious annotations increases both in-group and cross-group cohesion and erodes factional barriers, especially when intermediated via cohesive groups (e.g., Independents as cross-group proxies) (Jakobsen et al., 21 Oct 2024, Pandita et al., 15 Aug 2024).
- Feedback structure optimization: In complex annotation tasks, "in-group" (internal) annotators with direct, rapid feedback loops to project leaders yield higher-quality and more coherent data, while "out-group" (external) annotators increase throughput and annotator diversity at the cost of shallower engagement and reduced passage diversity (Rosenthal et al., 13 Oct 2025).
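As referenced above, a minimal sketch of a demographically informed rating model, using simulated data and a plain logistic regression (names and numbers are illustrative assumptions, not the cited papers' setups):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
n = 400
item_feats = rng.normal(size=(n, 4))           # stand-in for text features
is_group_a = rng.integers(0, 2, size=(n, 1))   # annotator demographic indicator
# Simulated ratings: group A applies a systematically stricter standard.
logits = item_feats[:, [0]] + 0.8 * is_group_a + rng.normal(scale=0.5, size=(n, 1))
ratings = (logits > 0.4).astype(int).ravel()

# Individual-level model: item features plus annotator identity.
X = np.hstack([item_feats, is_group_a])
model = LogisticRegression().fit(X, ratings)

# Flipping the demographic indicator on a fixed item exposes the
# group-conditional shift in the predicted label.
item = item_feats[:1]
p_b = model.predict_proba(np.hstack([item, [[0]]]))[0, 1]
p_a = model.predict_proba(np.hstack([item, [[1]]]))[0, 1]
print(p_b, p_a)
```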
| Approach | Bias Diagnosis | Mitigation Impact |
|---|---|---|
| GroupAnno probabilistic EM | Sensitivity/specificity by demographic split | Improves truth inference (F1); de-biases labels |
| Belief/vicarious annotation | Within- vs. cross-group mean difference and variance | Reduces bias, stabilizes variance |
| Feedback loop optimization | Qualitative metrics, edit count, diversity | Balances internal quality/depth against external throughput/diversity |
5. Empirical Patterns and Case Studies
Substantial evidence confirms that:
- Systematic group-level differences are robust: in the Wikipedia Detox tasks, native speakers label ~5% more items as "toxic" than non-natives, and over-30 annotators label 2.5% more than under-30 (Liu et al., 2021).
- Political annotation scenarios (e.g., incivility, argument identification) exhibit clear partisan bias in labeling, with judgment bias ≈0.14–0.15 between Democrats and Republicans, attenuated to ≈0.01–0.02 via belief elicitation (Jakobsen et al., 21 Oct 2024, Nishi, 2023).
- LLMs mirror human biases, predicting higher empathy or emotion intensity for in-group experiences, and exhibiting sharper negative correction when prompted with out-group value sets (out-group bias magnitude 2–5× in-group positivity) (Hou et al., 2 Mar 2025, Dong et al., 16 Feb 2024).
- "Inbetweeners" or ideological/identity-center items are repeatedly treated as out-group, and data sparsity emerges in middle-range cases, increasing variance and uncertainty (Yang et al., 2019, Pandita et al., 15 Aug 2024).
- Filtering for annotator quality with tools like CrowdTruth identifies low-quality annotators, whose removal yields higher in-group and cross-group cohesion, reducing apparent but spurious disagreement (Pandita et al., 15 Aug 2024).
6. Design Recommendations and Open Limitations
Best practices derived from annotator dynamics research include:
- Diverse, balanced recruitment: Stratify annotator pools by key demographic axes, and intentionally include representatives from marginalized or small groups to avoid representational bias and suppress dominance of majority views (Hou et al., 2 Mar 2025, Pandita et al., 15 Aug 2024).
- Task design for consensus/explanation: Leverage perspective-taking/anchoring rounds, vicarious annotation, and explicit calibration to bridge group-specific interpretations and highlight middle ground (Jakobsen et al., 21 Oct 2024, Yang et al., 2019).
- Analytical monitoring: Apply per-group IRR, XRR, GAI, and robust permutation testing to map disagreement and identify problematic splits or sources of instability (Pandita et al., 15 Aug 2024).
- Judicious feedback loop engineering: For tasks where depth, faithfulness, and quality are paramount, exploit closely coupled internal feedback, and deploy external annotation where coverage and diversity are critical; combine phases with targeted review pipelines (Rosenthal et al., 13 Oct 2025).
Limitations persist in reliably identifying all axes of disagreement a priori ("unknown unknowns"), and extending protocols to unseen or highly intersectional populations remains an open challenge (Jakobsen et al., 21 Oct 2024, Fleisig et al., 2023). Persistent out-group negativity and failure of wisdom-of-crowds phenomena under strong negative ties are intrinsic risks dictated by underlying social and network structure (Eger, 2013).
7. Theoretical and Sociotechnical Implications
The theoretical foundations of in-group and out-group annotator dynamics integrate cognitive category boundary formation, DeGroot-like opinion dynamics models with antagonism, and systematic bias estimation in statistical label aggregation. The core insight is that intergroup bias is not reducible to noise: it emerges predictably from social identity, group structure, and network topology, and must be directly modeled and engineered for at all stages of dataset creation and model training.
The operational paradigm is a shift from treating disagreement as a nuisance or error, to a signal about underlying representational divides—requiring nuanced aggregation methods, perspective-taking interventions, and transparent cohort-specific analysis to ensure fairness, validity, and societal alignment in AI systems (Liu et al., 2021, Pandita et al., 15 Aug 2024, Yang et al., 2019, Hou et al., 2 Mar 2025).