
Facebook Sharenting Risk Dataset

Updated 12 December 2025
  • Facebook sharenting risk dataset is an expert-annotated resource capturing parental disclosures and associated privacy risks on Facebook.
  • It employs multi-label annotations and a CFD framework with LLM-based debate transcripts to classify sharenting behaviours and GDPR-sensitive information.
  • Downstream classifiers integrating CFD debate transcripts show significant performance gains, enhancing empirical risk evaluation in online child protection.

The Online Safety Facebook Sharenting Risk Dataset is an expert-annotated corpus and experimental testbed designed to support automated detection and risk evaluation of parental disclosures about children (“sharenting”) in Facebook posts. The dataset targets the intersection of online child privacy, regulatory compliance (notably GDPR), and the operationalization of risk in natural language supervision tasks. In tandem with recent LLM-based enrichment methodologies, it represents a key resource for both empirical child-protection research and the study of confidence-aware data annotation and classification (Mao et al., 6 Dec 2025).

1. Dataset Composition and Annotation Schema

The dataset comprises 1,901 public posts sourced from Facebook parenting and family-support forums. All posts involve disclosure about a child, explicitly filtering out spam, advertisements, news, and non-English content. Personal identifiers such as usernames and profile URLs are removed to preserve privacy. Annotation is expert-driven: three domain professionals, each with a child-protection background, independently label every post for both sharenting behaviour and risk level, with disputes resolved via structured discussion or majority vote. Inter-annotator agreement is informally estimated as “substantial” (κ ≈ 0.7–0.8).

Sharenting annotations are multi-label for “behaviours” and single-label for “risk”:

| Label Type | Categories |
| --- | --- |
| Sharenting Behaviour | Personal Data (GDPR-protected info), Health (physical/mental), Intervention Services, Disruptive Home Life, Other/None |
| Sharenting Risk Level | A (High), B (Moderate), C (Low), D (None) |
  • Personal Data covers child identifiers (name, age, address, school, media).
  • Health involves physical/mental/emotional disclosures, including therapy.
  • Intervention Services signals engagement with social care or special education.
  • Disruptive Home Life logs domestic violence or custody issues.
  • Other/None serves as a catch-all.

Risk is stratified: A (high) is triggered by explicit disclosures (name, therapy, images); B (moderate) by speculative or partial signals; C (low) by innocuous or non-GDPR references; D (none) by posts with no sharenting.
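This stratification can be sketched as a simple rule, purely for illustration; the cue sets below (EXPLICIT, SPECULATIVE) are hypothetical placeholders, since the dataset relies on expert judgement rather than a deterministic mapping:

```python
# Hypothetical rule-of-thumb mapping from disclosure cues to risk levels,
# illustrating the A/B/C/D stratification described above. The real dataset
# uses expert annotation, not this rule.
EXPLICIT = {"name", "therapy", "image"}                 # assumed high-risk cues
SPECULATIVE = {"possible_health", "partial_location"}   # assumed moderate cues

def risk_level(disclosures: set[str], has_sharenting: bool) -> str:
    """Return A (High), B (Moderate), C (Low), or D (None)."""
    if not has_sharenting:
        return "D"                       # no sharenting at all
    if disclosures & EXPLICIT:
        return "A"                       # explicit disclosure (name, therapy, images)
    if disclosures & SPECULATIVE:
        return "B"                       # speculative or partial signals
    return "C"                           # innocuous / non-GDPR references
```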

Label distribution on the test set: High (~25%), Moderate (~30%), Low (~20%), None (~25%) (Mao et al., 6 Dec 2025).

2. Confidence-Aware Fine-Grained Debate (CFD) Framework

The CFD framework is an LLM-based data enrichment and consensus protocol specifically engineered to address noisy or ambiguous annotations in real-world NLP data (Mao et al., 6 Dec 2025). Multiple LLM “agents” (e.g., Qwen2.5, Mistral3) act as annotators, initially generating binary labels for each sharenting category along with step-by-step reasoning and confidence (on a 1–10 scale). If there is disagreement, a structured debate phase is triggered: each agent is shown their own and others’ rationales, and may revise its prediction and confidence. Label consensus is reached via unanimity or, failing that, majority vote (possibly adjudicated by a “judge” LLM).

Fine-grained confidence scoring leverages both self-verbalized and sampling-based techniques: the latter perturbs input or LLM seeds and uses NLI models to compute the entailment and agreement between sub-reasoning steps. Explanation-level and answer-level confidences are aggregated and linearly scaled.
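One plausible aggregation consistent with this description, assuming a 1–10 self-verbalised answer confidence and NLI entailment probabilities in [0, 1] for the sub-reasoning steps; the paper's exact weighting is not reproduced here:

```python
# Illustrative combination of answer-level and explanation-level confidence,
# linearly scaled to [0, 1]. The weighting (w_answer) is an assumption.
def aggregate_confidence(answer_conf: float,
                         step_entailments: list[float],
                         w_answer: float = 0.5) -> float:
    """answer_conf: self-verbalised score on a 1-10 scale.
    step_entailments: NLI entailment probabilities in [0, 1] between
    sub-reasoning steps under input/seed perturbation (sampling-based)."""
    answer_part = (answer_conf - 1) / 9                      # scale 1-10 -> [0, 1]
    explanation_part = sum(step_entailments) / len(step_entailments)
    return w_answer * answer_part + (1 - w_answer) * explanation_part
```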

The full debate transcripts—including individual rationales, back-and-forth argumentation, and associated confidences—are preserved as dataset features, allowing downstream models to consume not only categorical outcomes but the supporting discourse.

CFD Consensus Logic (abbreviated pseudocode)

    for each post x:
      for each agent a:
        (A0_a, R0_a, Conf0_a) ← Cat-CoT_prompt(x)
      if unanimous({A0_a}):
        final_labels ← that unanimous set
      else:
        for each agent a:
          provide {A0_others, R0_others, Conf0_others}
          (A1_a, R1_a, Conf1_a) ← StructuredDebate(a, x)
        if unanimous({A1_a}):
          final_labels ← that unanimous set
        else:
          final_labels ← majority_vote({A1_a}) or JudgeLLM({A1_a, R1_a, Conf1_a})
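The same consensus logic can be expressed as runnable Python with the LLM calls stubbed out; cat_cot, structured_debate, and judge_llm are placeholder callables, not the actual prompting code:

```python
from collections import Counter

# Sketch of the CFD consensus loop. Each agent call returns a
# (label, rationale, confidence) tuple; the LLM interactions are stubbed.
def cfd_consensus(post, agents, cat_cot, structured_debate, judge_llm):
    round0 = {a: cat_cot(a, post) for a in agents}       # initial CoT round
    labels0 = [out[0] for out in round0.values()]
    if len(set(labels0)) == 1:                           # unanimous first round
        return labels0[0]
    # Debate round: each agent sees the others' answers/rationales/confidences.
    round1 = {a: structured_debate(a, post, round0) for a in agents}
    labels1 = [out[0] for out in round1.values()]
    if len(set(labels1)) == 1:                           # unanimous after debate
        return labels1[0]
    (top, n), *rest = Counter(labels1).most_common(2)
    if not rest or n > rest[0][1]:                       # strict majority wins
        return top
    return judge_llm(post, round1)                       # tie -> judge LLM
```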

3. Downstream Classification Pipeline and Enrichment

A primary application is Facebook sharenting risk classification. The zero-shot risk classifier utilizes Llama 3 70B under a chain-of-thought prompt paradigm, ingesting the raw post text and optionally the enriched features (label sets, rationales, debate transcripts). Several feature enrichment modalities are evaluated:

| Input/Enrichment Variant | Macro F1 (mean ± std) |
| --- | --- |
| Post only (baseline) | 0.60 ± 0.01 |
| + Ground-truth labels | 0.63 ± 0.01 |
| + Self-consistency majority labels | 0.64 ± 0.02 |
| + Self-consistency reasoning | 0.69 ± 0.01 |
| + CFD team-predicted labels | 0.64 ± 0.02 |
| + CFD debate transcripts | 0.66 ± 0.01 |

The most robust improvement over the baseline is observed when injecting CFD debate transcripts directly into the classifier (relative gain +10.1%), while self-consistency reasoning traces slightly outperform debates on this split (Mao et al., 6 Dec 2025).

4. Annotation Signal, Dataset Features, and Evaluation Metrics

Posts are split 1,520/381 for enrichment and held-out evaluation. For each post, the final annotation comprises (i) the multi-label sharenting behaviours, (ii) the single risk class, and (iii) the CFD-enriched debate transcript with fine-grained confidence. Annotation disagreement is inherently higher than in image-based privacy datasets such as VISPR, given the subtleties of language and context in text-based sharenting disclosures.

For classifier evaluation, metrics follow standard practice:

  • Precision, Recall, and F1 computed per risk class ℓ ∈ {A, B, C, D}
  • Macro-F1: arithmetic mean of the per-class F1 scores
  • Accuracy: correct predictions divided by the total number of posts N
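These metrics can be computed directly; a minimal macro-F1 implementation over the four risk classes:

```python
# Per-class precision/recall/F1 and macro-F1, matching the metric
# definitions above (classes A-D, one risk label per post).
def macro_f1(y_true, y_pred, classes=("A", "B", "C", "D")):
    f1s = []
    for c in classes:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1s.append(2 * prec * rec / (prec + rec) if prec + rec else 0.0)
    return sum(f1s) / len(f1s)       # arithmetic mean over classes
```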

Debate transcripts enable models to surface subtle cues and counter-arguments, especially beneficial when behaviour/risk mapping is not lexically explicit. For example, references to services like CAMHS or SENCO, non-obvious markers of intervention services, are better detected when the classifier is fed the underlying LLM debate.

5. Context: Relation to VISPR and Broader Online Safety Taxonomies

VISPR (Orekondy et al., 2017) serves as a key antecedent, providing conceptual and methodological scaffolding for privacy risk annotation, albeit focused on images. It introduces a taxonomy of 68 privacy attributes (identity, documents, relations, location, etc.), multi-label image annotations, and user-profile-driven risk scoring (R(u, I) = max_a [y_a · u_a]). To adapt the VISPR methodology for Facebook sharenting, one would design a taxonomy emphasizing:

  • Child-specific identifiers (face, name, school, age)
  • Social context (family, friends, home life)
  • Location leakage (address, event, GPS)
  • Visual and textual signals (captions, tag metadata)
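The VISPR risk score R(u, I) = max_a [y_a · u_a] is straightforward to compute, with y the predicted attribute presences for an item and u the user's sensitivity preferences:

```python
# VISPR-style personalised risk score: the item's risk for user u is the
# maximum over attributes of predicted presence (y_a) times user
# sensitivity (u_a). Attribute names here are illustrative.
def risk_score(y: dict[str, float], u: dict[str, float]) -> float:
    return max((y.get(a, 0.0) * u.get(a, 0.0) for a in set(y) | set(u)),
               default=0.0)
```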

Traditional VISPR pipelines center on multi-label CNNs and personalized risk via profile clustering, whereas text-based sharenting risk assessment pivots toward LLM-driven annotation and classification; nevertheless, both deploy expert-derived label taxonomies, multi-stage adjudication, and integration of user-defined privacy/risk preferences.

The explicit linkage to GDPR, COPPA, and “age-flag” variables reflects the distinctive regulatory landscape of child privacy. Incorporating such regulatory compliance as part of the attribute/taxonomy design is essential both for legal utility and for increasing the ecological validity of automated sharenting detection pipelines (Orekondy et al., 2017, Mao et al., 6 Dec 2025).

6. Analysis, Impact, and Research Directions

CFD-enriched annotation, and in particular debate transcripts, systematically outperform non-enriched and single-agent label sets on downstream sharenting risk classification. This is attributable to (1) the exposure of fine-grained, context-sensitive rationales and (2) the simulation of human annotator disagreement and reconciliation. Notably, even noisy LLM-generated labels can approximate or exceed the utility of “oracle” expert labels for downstream performance.

Representative failures of the baseline (e.g., misclassifying intervention disclosures as low risk) are corrected by the injection of enriched multi-agent rationales. A plausible implication is that future annotation and risk-detection systems will increasingly exploit such multi-agent, debate-driven signals rather than solely “ground-truth” expert supervision, particularly in domains with high ambiguity or where annotation costs are prohibitive.

Tasks with relatively few behaviour classes and moderate human disagreement (5 classes; ≈28.6% initial disagreement in this context) show the greatest benefit from full transcript enrichment, consistent with the need for models to infer implicature and indirect risk through discourse-level evidence aggregation.

7. Comparison, Limitations, and Potential Adaptations

Relative to image-based privacy datasets (e.g., VISPR), the Facebook sharenting risk data is text-native, targets child privacy, and incorporates multi-agent simulated annotation as a core feature. Current limitations include dataset scale (1,901 posts), possible language/cultural bias (English, public Facebook groups), and the absence of explicit regional/age stratification within the core dataset (though recommended for future child-focused risk annotation (Orekondy et al., 2017)). The annotation protocol is not formally quantified for statistical agreement beyond descriptive estimates, suggesting room for expanded reliability analysis and multi-site validation.

Adaptations may include:

  • Curating child-centric, synthetic, or more culturally/linguistically diverse data
  • Augmenting the taxonomy for emergent sharenting risks (e.g., AI re-identification, deepfake images)
  • Incorporating regulatory constraints as dynamic label modifiers (e.g., age, jurisdiction)
  • Deploying parent-participant user studies to ground risk taxonomies in lived parental preferences

This dataset and associated methodologies represent a significant advance in empirical online safety annotation, bridging human expert reasoning, LLM-simulated deliberation, and regulatory-aware risk classification (Orekondy et al., 2017, Mao et al., 6 Dec 2025).
