What your brain activity says about you: A review of neuropsychiatric disorders identified in resting-state and sleep EEG data

Published 6 Oct 2025 in cs.NE, cs.CR, cs.CY, and q-bio.NC | (2510.04984v1)

Abstract: Electroencephalogram monitoring devices and online data repositories hold large amounts of data from individuals participating in research and medical studies without direct reference to personal identifiers. This paper explores what types of personal and health information have been detected and classified within task-free EEG data. Additionally, we investigate key characteristics of the collected resting-state and sleep data, in order to determine the privacy risks involved with openly available EEG data. We used Google Scholar, Web of Science and searched relevant journals to find studies which classified or detected the presence of various disorders and personal information in resting state and sleep EEG. Only English full-text peer-reviewed journal articles or conference papers about classifying the presence of medical disorders between individuals were included. A quality analysis carried out by 3 reviewers determined general paper quality based on specified evaluation criteria. In resting state EEG, various disorders including Autism Spectrum Disorder, Parkinson's disease, and alcohol use disorder have been classified with high classification accuracy, often requiring only 5 mins of data or less. Sleep EEG tends to hold classifiable information about sleep disorders such as sleep apnea, insomnia, and REM sleep disorder, but usually involves longer recordings or data from multiple sleep stages. Many classification methods are still developing but even today, access to a person's EEG can reveal sensitive personal health information. With an increasing ability of machine learning methods to re-identify individuals from their EEG data, this review demonstrates the importance of anonymization, and the development of improved tools for protecting the privacy of study participants and medical EEG users.

Summary

The paper provides a systematic comparison of machine learning algorithms using task-free EEG data to classify neuropsychiatric and sleep disorders.
It highlights high classification accuracies achieved with short resting-state recordings and extended sleep EEG sessions, detailing methodological frameworks.
The study addresses significant privacy risks in EEG data sharing and underscores the need for robust anonymization techniques.

Review of Neuropsychiatric Disorder Identification in Resting-State and Sleep EEG Data

Overview

This paper presents a comprehensive review of machine learning approaches for identifying neuropsychiatric and sleep disorders using task-free EEG data, specifically focusing on resting-state and sleep EEG. The review systematically evaluates the types of personal health information that can be extracted, compares methodological characteristics between resting-state and sleep EEG studies, and assesses the privacy risks associated with open EEG data sharing. The analysis is grounded in a rigorous literature search and quality assessment, providing a detailed synthesis of classification performance, data requirements, and methodological limitations.

Methodological Framework

The review employs a two-phase literature search strategy, utilizing Google Scholar and Web of Science, supplemented by citation tracking. Inclusion criteria are stringent: only peer-reviewed studies in English that classify medical disorders using resting-state or sleep EEG are considered. The final corpus comprises 43 studies, with 26 using resting-state EEG and 17 using sleep EEG or PSG. Quality assessment is performed by three independent reviewers using a custom tool, focusing on validity, machine learning protocol rigor, and reproducibility.

Key extracted study characteristics include:

Sample size
Number of EEG channels
Signal duration per subject
Classifier type and feature set
Classification accuracy and related metrics

Classification Performance and Data Characteristics

Resting-State EEG

Resting-state EEG studies typically use larger samples than sleep EEG studies (mean ≈ 229, median ≈ 161 after outlier removal) and more channels (mean ≈ 34.3). The signal durations required for classification are short (mean ≈ 4.7 min, minimum 30 s). Disorders classified include ASD, ADHD, Parkinson's disease, MDD, anxiety disorders, and addictive disorders, among others. Classification accuracies are generally high, with a mean of 85.7%, and several studies report >90% accuracy for specific disorders (e.g., alcohol use disorder, Parkinson's disease, MDD).

Feature extraction methods are diverse, including spectral power, functional connectivity, fractal dimension, and graph-theoretic measures. Classifiers range from SVM, random forest, and logistic regression to deep learning architectures (CNN, LSTM). Notably, high accuracy is often achieved with relatively short recordings and moderate channel counts, indicating strong discriminative information in resting-state EEG.
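As an illustration of the simplest feature family mentioned above, the following minimal sketch computes relative spectral power per frequency band for one single-channel epoch. The band edges, the function name band_powers, and the synthetic test signal are assumptions made for this example; the reviewed studies use their own band definitions and typically a library routine (such as a Welch estimate) rather than a naive DFT.

```python
import cmath
import math

# Conventional EEG band edges in Hz (assumed here; not fixed by the review).
BANDS = {"delta": (1, 4), "theta": (4, 8), "alpha": (8, 13), "beta": (13, 30)}

def band_powers(signal, fs):
    """Return relative power per band for one single-channel epoch."""
    n = len(signal)
    mean = sum(signal) / n
    centered = [x - mean for x in signal]  # remove the DC offset
    powers = {name: 0.0 for name in BANDS}
    # Naive DFT over positive frequencies; adequate for short epochs only.
    for k in range(1, n // 2):
        freq = k * fs / n
        coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(centered))
        for name, (lo, hi) in BANDS.items():
            if lo <= freq < hi:
                powers[name] += abs(coeff) ** 2
    total = sum(powers.values()) or 1.0
    return {name: p / total for name, p in powers.items()}

# Synthetic 2 s epoch at 64 Hz: a strong 10 Hz rhythm plus a weak 20 Hz one.
fs = 64
epoch = [math.sin(2 * math.pi * 10 * t / fs)
         + 0.2 * math.sin(2 * math.pi * 20 * t / fs)
         for t in range(2 * fs)]
features = band_powers(epoch, fs)
print(max(features, key=features.get))  # band with the largest share of power
```

A vector like features, computed per channel and concatenated, is the kind of input an SVM or random forest would receive in the simpler pipelines.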

Sleep EEG

Sleep EEG studies use fewer subjects (mean ≈ 40) and channels (mean ≈ 3.4), but require substantially longer recordings (mean ≈ 5.3 hours, up to 9 hours). Disorders classified include insomnia, sleep apnea, REM sleep behavior disorder, bruxism, and TBI. Mean classification accuracy is 90.3%, slightly higher than the 85.7% mean for resting-state EEG, with several studies reporting >95% accuracy for insomnia and sleep apnea.

Sleep staging is critical, with most studies leveraging data from multiple sleep phases. Feature sets include spectral, statistical, and complexity measures, with classifiers such as SVM, decision trees, and deep neural networks. The need for long-duration recordings and sleep staging complicates data collection and analysis, but high accuracy is achievable even with single-channel EEG in some cases.
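To put the two data regimes side by side, the rough calculation below multiplies the mean recording duration by the mean channel count reported above for each modality. This is only a back-of-the-envelope comparison: it uses products of means (not per-study values) and ignores sampling rate, which the summary does not report.

```python
# Mean values taken from the two subsections above.
rest_minutes, rest_channels = 4.7, 34.3
sleep_hours, sleep_channels = 5.3, 3.4

rest_channel_minutes = rest_minutes * rest_channels
sleep_channel_minutes = sleep_hours * 60 * sleep_channels
ratio = sleep_channel_minutes / rest_channel_minutes

print(f"resting-state: {rest_channel_minutes:.0f} channel-minutes")  # 161
print(f"sleep:         {sleep_channel_minutes:.0f} channel-minutes")  # 1081
print(f"ratio:         {ratio:.1f}x")                                 # 6.7x
```

Under these assumptions, a typical sleep study records several times more signal per subject despite using roughly a tenth of the channels.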

Privacy Risks and Data Protection

The review highlights significant privacy concerns associated with open EEG data sharing. Machine learning methods can re-identify individuals from resting-state and sleep EEG, even in the absence of direct identifiers. Moreover, sensitive health information, including psychiatric diagnoses and biometric traits, can be inferred from task-free EEG. The risk is compounded by cross-dataset de-anonymization: if a person's recording in any one dataset is linked to personal identifiers, matching EEG signatures could expose their records in other datasets.
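The cross-dataset risk can be illustrated with a toy linkage sketch: each anonymous record is matched to the most similar feature vector in a dataset that carries identities. All names, record IDs, and feature values below are hypothetical, and cosine similarity over four features stands in for the far richer models used in actual re-identification studies.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def link_records(identified, anonymous):
    """Map each anonymous record ID to the most similar identified person."""
    return {
        anon_id: max(identified, key=lambda name: cosine(identified[name], vec))
        for anon_id, vec in anonymous.items()
    }

# Hypothetical per-subject features (e.g. relative band powers) from a
# dataset that includes identities...
identified = {
    "alice": [0.10, 0.20, 0.55, 0.15],
    "bob":   [0.30, 0.35, 0.20, 0.15],
    "carol": [0.15, 0.10, 0.30, 0.45],
}
# ...and from a second, "anonymized" dataset recorded on another day.
anonymous = {
    "sub-01": [0.28, 0.36, 0.22, 0.14],
    "sub-02": [0.12, 0.19, 0.53, 0.16],
    "sub-03": [0.16, 0.12, 0.28, 0.44],
}
matches = link_records(identified, anonymous)
print(matches)
```

Once such a link is made, any diagnosis attached to the anonymous record becomes attributable to a named person, which is why removing direct identifiers alone does not protect participants.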

Current anonymization practices (removal of names, ages, etc.) are insufficient given the biometric and health information embedded in EEG signals. The review discusses advanced privacy-preserving techniques, such as feature selection, data masking, and generative adversarial networks for signal anonymization. However, these methods are still under development and not widely adopted.

Methodological Limitations and Reporting Quality

A substantial proportion of reviewed studies exhibit moderate or low methodological quality, particularly among sleep EEG studies. Common deficiencies include:

Inadequate reporting of signal duration and data splitting protocols
Lack of transparency in feature extraction and classifier training
Potential for data leakage due to improper train/test splits (subject overlap)
Limited consideration of real-world prevalence and comorbidity

These issues undermine replicability and may inflate reported classification performance. The review emphasizes the need for rigorous reporting standards, including explicit documentation of epoch length, data partitioning, and demographic characteristics.
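The subject-overlap problem noted above arises when epochs from one person land in both the training and the test set, so the classifier can succeed by recognizing the person rather than the disorder. A minimal sketch of a subject-wise split follows; the function name, the tuple layout (subject_id, features, label), and the toy data are assumptions for illustration.

```python
import random

def subject_wise_split(epochs, test_fraction=0.25, seed=0):
    """Split (subject_id, features, label) epochs so no subject is in both sets."""
    subjects = sorted({subject for subject, _, _ in epochs})
    rng = random.Random(seed)  # seeded for a reproducible partition
    rng.shuffle(subjects)
    n_test = max(1, round(len(subjects) * test_fraction))
    test_subjects = set(subjects[:n_test])
    train = [e for e in epochs if e[0] not in test_subjects]
    test = [e for e in epochs if e[0] in test_subjects]
    return train, test

# Toy data: 8 subjects with 5 epochs each; features and labels are placeholders.
epochs = [(f"s{i}", [float(i), float(j)], i % 2)
          for i in range(8) for j in range(5)]
train, test = subject_wise_split(epochs)

train_ids = {e[0] for e in train}
test_ids = {e[0] for e in test}
print(len(train), len(test), train_ids & test_ids)
```

Libraries such as scikit-learn provide group-aware splitters (for example GroupKFold) that serve the same purpose; reporting which scheme was used is part of the documentation the review calls for.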

Practical and Theoretical Implications

The findings demonstrate that task-free EEG contains rich information for both biometric identification and neuropsychiatric disorder classification. High classification accuracy is achievable with short resting-state recordings and with few sleep EEG channels (in some cases a single channel), suggesting practical utility for clinical screening and brain-computer interface (BCI) applications. However, the privacy risks are substantial, necessitating robust anonymization and data protection protocols.

From a theoretical perspective, the review underscores the discriminative power of spontaneous brain activity and the feasibility of disorder classification without task-based paradigms. The results also highlight the need for further research into generalization across heterogeneous populations, comorbid conditions, and real-world data distributions.
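Why real-world data distributions matter can be seen from a short worked calculation of positive predictive value (PPV), the probability that a positive classification is correct. The sensitivity, specificity, and prevalence figures below are hypothetical and chosen only to contrast a balanced study sample with a low-prevalence screening setting.

```python
def positive_predictive_value(sensitivity, specificity, prevalence):
    """P(disorder | positive result) via Bayes' rule."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)

# Hypothetical classifier with 90% sensitivity and 90% specificity.
balanced = positive_predictive_value(0.9, 0.9, 0.5)    # balanced study sample
screening = positive_predictive_value(0.9, 0.9, 0.01)  # 1% population prevalence

print(f"PPV at 50% prevalence: {balanced:.2f}")   # 0.90
print(f"PPV at 1% prevalence:  {screening:.2f}")  # 0.08
```

Under these assumed numbers, the same classifier that looks 90% reliable on a balanced sample would be wrong about most of its positive calls in a general population, which is one reason accuracy figures from case-control studies do not transfer directly to screening.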

Future Directions

Privacy-preserving EEG analysis: Development and standardization of signal anonymization techniques, including adversarial learning and feature obfuscation.
Robust machine learning protocols: Adoption of best practices for data splitting, cross-validation, and reporting to mitigate data leakage and enhance replicability.
Generalization studies: Evaluation of classifier performance on large, diverse, and comorbid populations to assess real-world applicability.
Regulatory frameworks: Extension of data protection legislation to address biometric and health information embedded in EEG, beyond traditional anonymization.

Conclusion

This review provides a detailed synthesis of machine learning approaches for neuropsychiatric disorder identification in resting-state and sleep EEG. While classification performance is high, especially for certain disorders, methodological limitations and privacy risks remain significant. The embedded biometric and health information in EEG signals necessitates advanced anonymization strategies and rigorous reporting standards. Future research should focus on privacy-preserving analysis, robust machine learning protocols, and generalization to real-world clinical settings.