Confidence-Aware Fine-Grained Debate Framework
- The paper presents the CFD framework that leverages multi-agent debates to refine sharenting risk assessments with consensus-driven explanations and confidence scores.
- It employs structured annotation protocols and advanced ensemble methods to resolve inter-annotator disagreements and improve labeling reliability.
- CFD enrichment boosts macro F1 by up to 10.1% over baseline models, demonstrating its practical impact on automated risk assessments for child privacy.
The Online Safety Facebook Sharenting Risk Dataset is an expertly annotated corpus of public Facebook posts concerning parental sharing (sharenting) of children’s information online. The resource is designed to support the development and benchmarking of automated online safety models, particularly those that predict or analyze the risk associated with disclosing sensitive information about minors. The dataset provides detailed multi-label annotations covering behavioral sharenting categories and single-label risk assessments. It is further augmented via an advanced LLM ensemble method, the Confidence-Aware Fine-Grained Debate (CFD) framework, which enriches the dataset with consensus-driven explanations and confidence signals relevant for robust automated risk assessment (Mao et al., 6 Dec 2025).
1. Dataset Construction and Annotation
The dataset consists of 1,901 public Facebook posts gathered from parenting and family-support groups and pages. Posts were included only if they contained sharenting content, defined as any disclosure involving a child; non-English posts, commercial spam, advertisements, and unrelated news items were excluded. All personal identifiers (such as usernames and profile URLs) were stripped to protect privacy.
Annotation utilized domain experts in child protection. Each post received:
- Sharenting Behaviour Labels (multi-label, 5 categories):
- Personal Data: Posts containing GDPR-protected attributes about the child (e.g., name, age, school, images, videos).
- Physical/Mental/Emotional Health: Any mention of a child’s health, including diagnosed/undiagnosed conditions, therapeutic or medical referrals, behavioral concerns, or mental well-being.
- Intervention Services: Explicit reference to engagement with social services, educational needs authorities (e.g., child and adolescent mental health services, foster care), or formal interventions.
- Disruptive Home Life: Disclosures involving domestic violence, neglect, custody disputes, or co-parenting conflicts.
- Other/None: Posts irrelevant to the above (such as general queries, toy preferences).
- Sharenting Risk Label (single-label, 4 classes):
- A (High risk): Explicit disclosure of GDPR data or sensitive health/intervention indicators.
- B (Moderate risk): Speculative or partial disclosure, or non-specific but serious information.
- C (Low risk): Disclosure with minimal identifiable information (e.g., city only, minor behavioral notes).
- D (No risk): No sharenting content.
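As an illustration of the risk taxonomy above, the following minimal sketch implements a "most severe disclosure wins" assignment. The severity ordering (A > B > C > D) follows the taxonomy; the helper name and interface are hypothetical, not from the paper.

```python
# Hypothetical sketch: pick the single risk label for a post from the set of
# risk classes its individual disclosures suggest, taking the most severe one.
SEVERITY = {"A": 3, "B": 2, "C": 1, "D": 0}  # high -> none, per the taxonomy

def most_severe(candidate_risks: list[str]) -> str:
    """Return the highest-severity risk class among candidates, or 'D' if none."""
    if not candidate_risks:
        return "D"
    return max(candidate_risks, key=lambda r: SEVERITY[r])
```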
All posts were labeled by three independent experts, with conflicts resolved through adjudication by discussion or majority vote. The exact Fleiss’ κ is not reported, but the structured process and formal adjudication suggest inter-annotator agreement in the substantial range (κ ≈ 0.7–0.8) (Mao et al., 6 Dec 2025).
Dataset statistics are summarized below:
| Statistic | Value |
|---|---|
| Total posts | 1,901 |
| Training/debate split | 1,520 posts |
| Held-out test set | 381 posts |
| Risk-level distribution (approx) | 25% High, 30% Moderate, 20% Low, 25% None |
2. Annotation Schema and Guidelines
Annotation protocols were guided by GDPR and online safety best practices. Each annotation instance for a post includes up to five binary sharenting-behavior indicators and a mutually exclusive risk category. Behavior label definitions were designed for regulatory and practical relevance.
The high-level process can be summarized:
- Screen post content for explicit and implicit sharenting indicators.
- Assign multi-label annotations based on the five defined behavioral categories.
- Assign risk based on the most severe disclosure present, per the provided risk taxonomy.
- Curate annotation guidelines to ensure reproducibility and consistency among experts.
- Remove any personal identifiers from dataset records prior to distribution (Mao et al., 6 Dec 2025).
3. CFD Data Enrichment Mechanism
The dataset is enriched using the Confidence-Aware Fine-Grained Debate (CFD) framework, which systematically employs multiple LLM agents (e.g., Qwen2.5, Mistral3) as annotators. The CFD process is as follows:
- Initial Labeling: Following a Categorical Chain-of-Thought prompt, each agent labels all five behavior categories (yes/no), provides stepwise explanations, and rates its confidence (scale 1–10) both per category and overall.
- Consensus Check: If all agents concur on all categories, labels are accepted. If not, a structured debate is triggered.
- Debate: Agents view one another’s stepwise explanations and confidence scores, then may keep or change their label, justify, and self-rate anew.
- Finalization: Unanimous post-debate labels are accepted; otherwise, majority-vote or recourse to a “judge LLM” finalizes the annotation.
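The four-stage control flow above can be sketched as follows. The agent objects, their method names, and the judge interface are illustrative stand-ins for the paper's agents, not its actual API.

```python
# Hypothetical sketch of the CFD control flow: initial labeling, consensus
# check, structured debate, then majority vote or judge-LLM finalization.
from collections import Counter

def cfd_annotate(post, agents, judge, max_rounds=2):
    # Initial labeling: each agent returns a dict with label, explanation, confidence
    votes = [a.initial_label(post) for a in agents]
    for _ in range(max_rounds):
        # Consensus check: accept unanimous labels immediately
        if len(set(v["label"] for v in votes)) == 1:
            return votes[0]["label"]
        # Debate: each agent sees the others' explanations and confidences,
        # then keeps or changes its label and re-rates itself
        votes = [a.debate(post, votes) for a in agents]
    # Finalization: majority vote, with a judge LLM breaking exact ties
    tally = Counter(v["label"] for v in votes).most_common()
    if len(tally) == 1 or tally[0][1] > tally[1][1]:
        return tally[0][0]
    return judge.decide(post, votes)
```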
CFD also includes sampling-based confidence estimation: perturbed agent reasoning traces are compared via Natural Language Inference (NLI), and the resulting agreement and confidence signals are averaged and linearly scaled to [1,10]. The full debate transcript (all stepwise explanations with fine-grained confidence) forms a key enriched feature for downstream modeling (Mao et al., 6 Dec 2025).
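The confidence fusion just described can be sketched as below. The equal weighting of the two signals is an assumption for illustration; only the averaging and the linear rescaling to [1,10] are stated in the source.

```python
def scaled_confidence(nli_agreement: float, self_conf: float) -> float:
    """Hypothetical sketch: combine an NLI-based agreement rate (in [0, 1])
    over perturbed reasoning samples with a normalised self-rated confidence
    (1-10 mapped to [0, 1]), then linearly rescale to the [1, 10] range.
    Equal weighting is assumed; the exact combination rule is not given here."""
    combined = 0.5 * nli_agreement + 0.5 * (self_conf - 1) / 9
    return 1 + 9 * combined
```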
4. Downstream Task Design and Evaluation
The principal downstream use case is automated sharenting risk classification. The paper’s protocol uses Llama 3 70B with a zero-shot, chain-of-thought (CoT) style prompt. Five input-feature variants are compared:
- Baseline: Post content only
- Self-consistency (majority labels and reasoning traces)
- CFD team-predicted labels
- CFD debate transcripts (multi-agent dialogue)
- Ground-truth expert-annotated labels
The performance is evaluated on the 381-post held-out set with macro-averaged F1 and accuracy over the risk classes (A/B/C/D). Key quantitative results:
| Enrichment Variant | Macro F1 (mean ± std) |
|---|---|
| Baseline (post only) | 0.60 ± 0.01 |
| + ground-truth labels | 0.63 ± 0.01 |
| + self-consistency majority | 0.64 ± 0.02 |
| + self-consistency reasoning | 0.69 ± 0.01 |
| + CFD team-predicted labels | 0.64 ± 0.02 |
| + CFD debate transcripts | 0.66 ± 0.01 |
The CFD debate transcript enrichment yields a 10.1% relative macro F1 improvement over the post-only baseline. Self-consistency using reasoning traces achieves the highest F1 overall, but among the CFD-based enrichment methods, debate transcripts are most effective when expert annotation is unavailable (Mao et al., 6 Dec 2025).
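The relative-improvement figure can be checked against the table values (the rounded table entries give ≈10%; the paper's 10.1% presumably comes from unrounded scores):

```python
def relative_gain(variant_f1: float, baseline_f1: float) -> float:
    """Relative macro-F1 improvement over the post-only baseline."""
    return (variant_f1 - baseline_f1) / baseline_f1

# (0.66 - 0.60) / 0.60 ~= 0.10, i.e. ~10% relative improvement
gain = relative_gain(0.66, 0.60)
```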
5. Analysis, Context, and Significance
Performance improvements are consistently observed with the use of CFD enrichment, even where LLM-predicted behavior labels are noisy. Debate transcripts offer richer evidence structures than flat labels: multi-agent chains of argument and counter-argument surface fine-grained cues critical for risk assessment. For example, content referencing specialist therapeutic intervention (“CAMHS”) or educational support (“SENCO”) may be missed by single-label LLM approaches but appropriately escalated by multi-agent debate to high-risk status.
A plausible implication is that debate-based enrichment is particularly advantageous for hierarchical or class-imbalanced safety tasks with moderate to high inter-annotator disagreement. In this application, ~28.6% of posts triggered initial agent disagreement, highlighting the complexity and interpretive nuance of sharenting risk detection (Mao et al., 6 Dec 2025).
6. Connections to Visual Privacy Risk Modeling
The development of the Facebook sharenting risk dataset is closely informed by the methodology of visual privacy risk modeling, notably the VISPR dataset and framework (Orekondy et al., 2017). VISPR’s pipeline comprises:
- Curation of a multi-label privacy attribute dataset (22,167 images with 68 privacy attributes)
- User survey for fine-grained privacy preferences by image attribute (k-means clustered into “privacy profiles”)
- Max-based risk score computation
- Training both attribute predictors (CNNs) and personalized risk regressors (AP-PR, PR-CNN)
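The max-based risk score in the VISPR pipeline above can be sketched as follows: an image's (or, in the text setting, a post's) risk is the maximum user-profile sensitivity over its predicted privacy attributes. The attribute names and weights are illustrative, not from the dataset.

```python
# Hypothetical sketch of VISPR-style max-based risk scoring: the risk of an
# item is the largest sensitivity weight among its predicted attributes.
def max_risk(predicted_attrs: set[str], profile: dict[str, float]) -> float:
    """Profile maps privacy attributes to user-specific sensitivity weights;
    unknown attributes contribute zero risk. Returns 0.0 for no attributes."""
    return max((profile.get(a, 0.0) for a in predicted_attrs), default=0.0)
```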
Several aspects translate to the text modality and sharenting risk setting:
- Attribute taxonomy extension to behavioral risk factors in text;
- User preference elicitation replaced by expert-driven annotation and (potentially) parental behavioral surveys;
- Enriched feature design (e.g., rich debate transcripts) analogous to visual risk explanation;
- Risk scoring and classifier evaluation mapped to text-based posts and enriched annotation sets.
This framework provides a roadmap for systematically constructing regulatory-compliant, explainable datasets and models for child online safety on Facebook and other platforms (Orekondy et al., 2017).
7. Regulatory Compliance and Dataset Extension
Annotation and risk assessment align with GDPR (“age < 13”) and, by extension, COPPA requirements, with child-centric data collection and behavioral label definitions reflecting both EU and US privacy doctrine for minors. Age-specific annotation flags (e.g., “child < 6 yrs”, “adolescent 6–13 yrs”) and location fields may be incorporated to inform risk thresholds or classifier calibration by region and age bracket (Orekondy et al., 2017).
A plausible implication is that this dataset structure and enrichment pipeline can be extended further to account for:
- Multi-modal posts (image/text/video fusion)
- Automated extraction of metadata (e.g., GPS tags, face recognition meta, comment-named entities)
- Tailored risk modeling for localized legal and cultural norms around sharenting and child privacy
Such expansion could facilitate parent-facing risk advisors, browser extensions, or moderation pipelines capable of robust, regulation-aligned intervention on sharenting content before it is posted publicly (Orekondy et al., 2017, Mao et al., 6 Dec 2025).