CBCP EEG Dataset for Childhood Empathy
- The CBCP dataset is an EEG-based collection of multi-channel recordings from 57 children, used to quantify empathy responses.
- It employs a standardized film viewing and post-assessment protocol to generate binary empathy labels, ensuring reproducible model evaluation.
- The dataset supports advanced deep learning through multi-view fusion of cognitive and emotional EEG signals within the BEAM framework.
The CBCP dataset is an EEG-based collection used for empirical validation in early-childhood empathy research, designed to support objective, quantitative assessment of young children's empathetic responses. Within the BEAM (Brainwave Empathy Assessment Model) framework (Xie et al., 8 Sep 2025), it provides high-resolution, multi-channel EEG recordings paired with validated behavioral empathy assessments, supporting deep learning approaches in developmental neuroscience and affective computing.
1. Dataset Composition and Acquisition Protocol
The CBCP dataset consists of EEG signals from 57 typically developing children, aged 4 to 6 years (mean 4.91 ± 1.07 years). During data acquisition, each child took part in a standardized 6-minute viewing session of the Pixar short film “Partly Cloudy.” Immediately after the film, each child completed a post-test assessment that quantified behavioral empathy as a willingness-to-help rating in a negative scenario.
EEG data were collected with multi-channel equipment, yielding multi-view time series for each subject. Labels were generated by median-splitting the post-test empathy scores into two classes, high and low empathy. The data were split at the subject level into 70% for training, 20% for validation, and 10% for testing, and the split was repeated across five random seeds to ensure statistical robustness.
Attribute | Value | Description |
---|---|---|
N | 57 | Number of children (subjects) |
Age | 4.91 ± 1.07 years | Mean ± SD |
Data Type | Multi-channel EEG (multi-view) | Spatio-temporal signals per subject |
Session | 6 min, “Partly Cloudy” + post-assessment | Standardized empathy elicitation protocol |
Labels | Binary (high/low empathy via median split) | Derived from behavioral willingness-to-help rating |
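Splitting by subject keeps each child's recordings within a single partition, which prevents leakage between training and test data. Repeating the split across random seeds reduces the dependence of the reported metrics on any one partition and supports generalizable model evaluation.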
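The labeling and splitting protocol described above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the function names, the synthetic scores, the rule that scores equal to the median fall in the low class, and the rounding of partition sizes are all assumptions made for the example.

```python
import numpy as np

def median_split_labels(scores):
    """Return 1 (high empathy) for scores above the median, else 0 (low).

    Assumption: ties with the median are assigned to the low class.
    """
    scores = np.asarray(scores, dtype=float)
    return (scores > np.median(scores)).astype(int)

def subject_level_split(n_subjects, seed, frac_train=0.7, frac_val=0.2):
    """Shuffle subject indices and cut them into disjoint train/val/test sets."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_subjects)
    n_train = int(round(frac_train * n_subjects))
    n_val = int(round(frac_val * n_subjects))
    return (order[:n_train],
            order[n_train:n_train + n_val],
            order[n_train + n_val:])

# Synthetic post-test scores for 57 children (illustrative only, not real data).
scores = np.random.default_rng(0).integers(1, 6, size=57)
labels = median_split_labels(scores)

# One split per random seed, five seeds in total.
splits = [subject_level_split(57, seed) for seed in range(5)]
train, val, test = splits[0]
print(len(train), len(val), len(test))  # 40 11 6
```

With 57 subjects, a 10% test partition holds only about six children, which is one reason repeating the split across seeds matters.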
2. Experimental Design and Evaluation Criteria
Each EEG sample from the CBCP dataset is paired with a behaviorally grounded empathy label. The design provides both cognitive (Theory-of-Mind, ToM) and emotional (EM) EEG views, enabling multi-view analysis with neural architectures. The evaluation protocol used in BEAM relies on stratified experiments, with metrics reported as mean ± standard deviation across repetitions.
Three classification metrics are systematically employed:
- Accuracy: Proportion of correctly classified subjects.
- Specificity: True negative rate (correct classification of low empathy).
- Sensitivity: True positive rate (correct classification of high empathy).
Performance statistics for BEAM (Proposed Method) were: accuracy 0.647 ± 0.008, specificity 0.651 ± 0.009, sensitivity 0.646 ± 0.009. These results establish a quantitative benchmark for future models trained and evaluated on CBCP.
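For reference, the three metrics and the mean ± standard deviation summary can be computed as in the sketch below. The function names and the toy predictions are hypothetical, and the use of the population standard deviation (NumPy's default) is an assumption, since the source does not state which estimator was used.

```python
import numpy as np

def classification_metrics(y_true, y_pred):
    """Accuracy, specificity, and sensitivity with high empathy coded as 1."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    tp = int(np.sum((y_true == 1) & (y_pred == 1)))
    tn = int(np.sum((y_true == 0) & (y_pred == 0)))
    fp = int(np.sum((y_true == 0) & (y_pred == 1)))
    fn = int(np.sum((y_true == 1) & (y_pred == 0)))
    return {
        "accuracy": (tp + tn) / len(y_true),
        # A rate is undefined (NaN) when its class is absent from the test set.
        "specificity": tn / (tn + fp) if (tn + fp) else float("nan"),
        "sensitivity": tp / (tp + fn) if (tp + fn) else float("nan"),
    }

def summarize(runs):
    """Mean and standard deviation of each metric across repeated runs."""
    return {
        key: (float(np.mean([r[key] for r in runs])),
              float(np.std([r[key] for r in runs])))
        for key in runs[0]
    }

# Toy example: one run on six test subjects (illustrative values only).
run = classification_metrics([1, 1, 1, 0, 0, 0], [1, 1, 0, 0, 0, 1])
summary = summarize([run, run])
print(run)
print(summary)
```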
3. Feature Representation and Multi-View Structure
The CBCP dataset’s EEG records are formatted as $X \in \mathbb{R}^{C \times T}$, where $C$ is the number of EEG channels and $T$ is the temporal window length. The multi-view paradigm is explicit:
- $Z_{\mathrm{ToM}}$: Encodes the cognitive Theory-of-Mind features.
- $Z_{\mathrm{EM}}$: Encodes the emotional features.
The dataset structure allows for extraction and fusion of these distinct representations.
Each $Z_v$ (for $v \in \{\mathrm{ToM}, \mathrm{EM}\}$) is decomposed into common and separate latent components, in line with the feature fusion method in BEAM. This structure supports both joint and differential analysis of cognitive versus emotional EEG signal patterns.
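To make the common/separate decomposition concrete, the sketch below splits each view's latent vector into two halves and compares the parts across views with cosine similarity. The half-and-half layout, the latent dimension, the random latents, and the simple fusion by averaging and concatenation are all assumptions chosen for illustration; the source does not specify how BEAM performs these steps.

```python
import numpy as np

def split_common_separate(z):
    """Split latents into (common, separate) halves.

    Hypothetical layout: first half is the common part, second half the
    separate part.
    """
    half = z.shape[-1] // 2
    return z[..., :half], z[..., half:]

def cosine_similarity(a, b, eps=1e-8):
    """Row-wise cosine similarity between two arrays of vectors."""
    num = np.sum(a * b, axis=-1)
    den = np.linalg.norm(a, axis=-1) * np.linalg.norm(b, axis=-1) + eps
    return num / den

rng = np.random.default_rng(0)
n_subjects, latent_dim = 4, 8  # illustrative sizes
z_tom = rng.standard_normal((n_subjects, latent_dim))  # stand-in for the ToM view
z_em = rng.standard_normal((n_subjects, latent_dim))   # stand-in for the EM view

tom_common, tom_separate = split_common_separate(z_tom)
em_common, em_separate = split_common_separate(z_em)

# Per-subject similarity of the common parts and of the separate parts.
sim_common = cosine_similarity(tom_common, em_common)
sim_separate = cosine_similarity(tom_separate, em_separate)

# One possible fusion: average the common parts, keep both separate parts.
fused = np.concatenate(
    [(tom_common + em_common) / 2, tom_separate, em_separate], axis=-1
)
print(sim_common.shape, sim_separate.shape, fused.shape)
```

A fusion objective built on these quantities would typically push the common parts of the two views together and keep the separate parts apart, although the exact loss used in BEAM is not reproduced here.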
4. Usage in Deep Learning Empathy Models
CBCP’s design supports models requiring objective, high-dimensional neural time series coupled with reliable outcome labels. It enables training deep frameworks such as BEAM, which utilizes:
- LaBraM-based Encoder: Transformer model for spatio-temporal EEG patch embedding.
- Feature Fusion: Latent decomposition with a similarity-based fusion loss built from two terms: one quantifies the cosine similarity of the common parts across views, and the other quantifies the cosine similarity of the separate parts.
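- Contrastive Learning: The InfoNCE loss ($\mathcal{L}_{\mathrm{InfoNCE}}$) is employed to enforce discriminative latent representations for empathy classification. In its standard form:
$$\mathcal{L}_{\mathrm{InfoNCE}} = -\frac{1}{N} \sum_{i=1}^{N} \log \frac{\exp\left(\mathrm{sim}(z_i, z_i^{+}) / \tau\right)}{\sum_{j=1}^{N} \exp\left(\mathrm{sim}(z_i, z_j^{+}) / \tau\right)}$$
where $N$ is the batch size, $\tau$ is the temperature parameter, $\mathrm{sim}(\cdot, \cdot)$ is a similarity function such as cosine similarity, and $z_i$ and $z_i^{+}$ are the latent representations for the anchor and positive samples, respectively.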
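The contrastive objective can be illustrated with a small NumPy implementation of the standard InfoNCE loss, in which each anchor's positive is contrasted against the other positives in the batch. This is a generic sketch rather than BEAM's training code: the use of cosine similarity, the temperature value, and the random latents are assumptions.

```python
import numpy as np

def info_nce(anchors, positives, temperature=0.1):
    """Standard InfoNCE loss over a batch of (anchor, positive) pairs.

    Row i of `positives` is the positive for row i of `anchors`; all other
    rows of `positives` act as negatives for that anchor.
    """
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)
    logits = a @ p.T / temperature                       # (N, N) cosine / tau
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_prob)))

rng = np.random.default_rng(0)
anchors = rng.standard_normal((8, 16))
positives = anchors + 0.05 * rng.standard_normal((8, 16))  # near-duplicates

loss_matched = info_nce(anchors, positives)
# Misaligned pairs: each anchor is paired with another sample's positive.
loss_shuffled = info_nce(anchors, np.roll(positives, 1, axis=0))
print(loss_matched, loss_shuffled)
```

The loss is small when each anchor is most similar to its own positive and grows when the pairing is broken, which is the behavior that encourages discriminative latent representations.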
CBCP thus provides sufficient sample diversity, granularity, and label reliability for data-hungry deep learning models targeting neurodevelopmental empathy assessment.
5. Benchmarking, Performance Results, and Model Comparison
The CBCP dataset is the empirical testbed for evaluating methods in childhood empathy prediction. Compared with state-of-the-art alternatives (BIOT, ST-Transformer, and SVM-asymmetry), BEAM scored highest on all three metrics:
Model | Accuracy | Specificity | Sensitivity |
---|---|---|---|
BEAM (Proposed) | 0.647 ± 0.008 | 0.651 ± 0.009 | 0.646 ± 0.009 |
BIOT | < 0.647 | < 0.651 | < 0.646 |
ST-Transformer | < 0.647 | < 0.651 | < 0.646 |
The low standard deviation of BEAM’s metrics (below 0.01 on each metric across repetitions) suggests robustness to variations in subject-level splitting. It also indicates that the CBCP dataset supports reliable benchmarking of temporal deep learning models in neurobehavioral assessment scenarios.
6. Scientific Impact and Plausible Implications
CBCP’s design and application facilitate objective, scalable assessment of empathy in early childhood, addressing limitations of self-report and observer-only methodologies. Its combination of multi-view EEG with advanced neural models enables nuanced evaluation of both cognitive and emotional empathy processes.
A plausible implication is that CBCP, if extended to larger and more heterogeneous samples, could underpin the development of predictive neurobehavioral diagnostics or personalized intervention strategies for prosocial development in children. The robust feature extraction and multi-view fusion paradigms it supports are instrumental for future studies aiming to disentangle the neurophysiological substrates of empathy and related affective traits.
7. Limitations and Directions for Future Research
CBCP, in its current instantiation, includes 57 subjects with binary empathy labels derived from a specific post-test paradigm. While effective for benchmarking, any expansion in sample size, age range, or time/event segmentation may further enhance model generalizability. The dataset’s utility for transfer learning, domain adaptation, or finer-grained behavioral stratification remains an area for future exploration. Moreover, its exclusive focus on EEG during passive film viewing may constrain some aspects of ecological validity, suggesting the benefit of supplementing CBCP with data from interactive or real-world empathy elicitation protocols.