Bonn University EEG Dataset Overview
- The Bonn University EEG dataset is a publicly available collection of 500 single-channel EEG segments recorded under controlled normal and epileptic conditions.
- The dataset underpins diverse methodologies including deep learning, pyramidal 1D-CNN, and geometric as well as spectral-phase analyses with high classification accuracies.
- Its structured partitioning into five sets (A–E) enables flexible experiment configurations for binary, three-class, and five-class classification tasks.
The Bonn University EEG dataset is a canonical, publicly available corpus for the study of electrophysiological brain signals under normal and epileptic conditions. Originally assembled and introduced by Andrzejak et al. (2001), this dataset has become a standard for benchmarking algorithms in epileptic seizure detection, neurodynamical modeling, and advanced machine learning pipelines. Each record comprises a single-channel electroencephalogram (EEG) segment of 23.6 seconds duration, sampled at 173.61 Hz and recorded under precisely specified experimental and clinical conditions. The dataset’s robust internal stratification and detailed signal statistics have underpinned advances in deep learning, geometric signal analysis, and clinical translational research.
1. Dataset Composition and Recording Protocol
The Bonn University EEG database consists of 500 non-overlapping EEG segments, each precisely 23.6 s (4097 or 4096 samples depending on export format), systematically partitioned into five sets (A–E), each containing 100 signals. The full specification is:
| Set | Anatomical/Clinical Context | Electrode Type | State |
|---|---|---|---|
| A | Healthy volunteers, eyes open | Scalp (10–20) | Baseline, eyes open |
| B | Healthy volunteers, eyes closed | Scalp (10–20) | Baseline, eyes closed |
| C | Epilepsy patients, interictal, outside focus | Depth (hippocampal) | Non-seizure, outside focus |
| D | Epilepsy patients, interictal, inside focus | Depth (hippocampal) | Non-seizure, within focus |
| E | Epilepsy patients, ictal (seizure) | Depth (epileptogenic) | During seizure |
All recordings are monopolar or common reference. Segments are individually selected to be approximately stationary, minimizing movement and physiological artifacts at the acquisition stage. Sets A and B capture surface EEG from five healthy individuals under two vigilance states, while C–E record depth EEG from surgical epilepsy patients during interictal and ictal episodes (Lu et al., 2019, Akbari et al., 2019, Ullah et al., 2018, Nath et al., 18 Aug 2025, Kumar et al., 24 Dec 2025).
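As a concrete illustration, the following minimal sketch loads all five sets into a single array. It assumes the commonly distributed layout of five folders (Z, O, N, F, S, corresponding to sets A–E), each containing 100 plain-text files with one amplitude value per line; the folder and file naming here is hypothetical and should be matched to the actual download.

```python
import numpy as np
from pathlib import Path

# Assumed local layout: data/Z/Z001.txt ... data/S/S100.txt, each file holding
# ~4097 integer samples (one value per line). Adjust names/case to your copy.
SETS = {"Z": 0, "O": 1, "N": 2, "F": 3, "S": 4}  # sets A-E <-> labels 0-4

def load_bonn(root="data"):
    """Return (signals, labels): signals is (n_segments, n_samples), labels in 0..4."""
    signals, labels = [], []
    for name, label in SETS.items():
        for txt in sorted(Path(root, name).glob("*.txt")):
            signals.append(np.loadtxt(txt))   # one amplitude value per line
            labels.append(label)
    n = min(len(s) for s in signals)          # guard against 4096/4097 export differences
    return np.stack([s[:n] for s in signals]), np.array(labels)
```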
2. Signal Specifications and Preprocessing
All EEG segments are 4097 (sometimes 4096) samples in length (23.6 s at 173.61 Hz). The acquisition protocol includes a band-pass filter (0.53–40 Hz) implemented at the hardware or acquisition level. No additional artifact removal (such as ICA/PCA or manual rejection) is performed on the raw set; protocol details specify z-score normalization prior to network input, i.e., $\tilde{x}[n] = (x[n] - \mu)/\sigma$,
where $\mu$ and $\sigma$ are the segment mean and standard deviation, respectively (Lu et al., 2019, Kumar et al., 24 Dec 2025, Nath et al., 18 Aug 2025, Akbari et al., 2019). In some derivatives (e.g., the UCI-repackaged Epileptic Seizure Recognition Dataset) each record is split into one-second chunks of 178 samples (Gupta et al., 2021).
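A minimal sketch of the per-segment z-score normalization described above (NumPy; the function name and epsilon guard are illustrative):

```python
import numpy as np

def zscore_segment(x, eps=1e-8):
    """Normalize one EEG segment to zero mean and unit variance."""
    mu = x.mean()
    sigma = x.std()
    return (x - mu) / (sigma + eps)   # eps guards against a (near-)constant segment

# Applied segment-wise to an array of shape (n_segments, n_samples):
# X_norm = np.apply_along_axis(zscore_segment, 1, X)
```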
3. Data Organization, Experiment Splits, and Class Pooling
The segmentation protocol provides several analytic possibilities:
- Three-class grouping (e.g., healthy: A+B, unhealthy: C+D, seizure: E) (Lu et al., 2019, Ullah et al., 2018)
- Two-class groupings (seizure-free vs seizure, or healthy vs seizure, e.g., Z vs S or O vs S, where Z/O denote the eyes-open/eyes-closed healthy sets and S the seizure set) (Akbari et al., 2019, Kumar et al., 24 Dec 2025)
- Five-class classification (A–E, equivalently Z, O, N, F, S) as per the UCI expansion (Gupta et al., 2021)
- Binary groupings for specific detection challenges (e.g., AB vs E, AB vs CD, etc. (Ullah et al., 2018))
Common cross-validation schemes include 5-fold or 10-fold CV, stratified by original class or subject. A typical split is 60/20/20 for training/validation/test, applied within each subset to maintain class balance (Lu et al., 2019).
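For example, the three-class pooling (A+B vs C+D vs E) with a stratified 60/20/20 split can be set up as sketched below (scikit-learn; the pooling map and ratios follow the description above, the rest is an illustrative sketch rather than any specific paper's code):

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Pool the five original sets (0..4 = A..E) into three classes:
# healthy (A+B) -> 0, interictal (C+D) -> 1, seizure (E) -> 2.
POOL = {0: 0, 1: 0, 2: 1, 3: 1, 4: 2}

def three_class_split(X, y_sets, seed=0):
    y = np.array([POOL[s] for s in y_sets])
    # 60/20/20 stratified split: hold out 40%, then halve it into val/test.
    X_tr, X_tmp, y_tr, y_tmp = train_test_split(
        X, y, test_size=0.4, stratify=y, random_state=seed)
    X_val, X_te, y_val, y_te = train_test_split(
        X_tmp, y_tmp, test_size=0.5, stratify=y_tmp, random_state=seed)
    return (X_tr, y_tr), (X_val, y_val), (X_te, y_te)
```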
4. Feature Engineering and Classification Pipelines
The dataset’s broad adoption reflects methodological diversity:
- End-to-end Deep Learning: Residual CNNs (two residual blocks with Conv1D, batch-norm, LReLU, max-pool, dropout) operate directly on cropped and normalized raw segments (e.g., 3800-sample crops), with no handcrafted features (Lu et al., 2019).
- Pyramidal 1D-CNN: Sliding windows (length 512 samples, stride 64 or 128) augment the data, with majority voting at test time over sequential blocks (Ullah et al., 2018); ternary and binary tasks achieve up to 99.1% record-level accuracy (see the sliding-window sketch after this list).
- Topological/Geometric Featurization: Takens’ embedding (delay τ=1, dimension m=3) converts short segments to point clouds, supporting persistent Betti number and graph spectrum feature series as input to shallow CNNs (Bischof et al., 2021; see the embedding sketch after this list). Here, raw time series outperform geometric features under downsampling, with accuracy dropping from ~84% (raw) to ~55% (Betti) as resolution decreases.
- Spectral-Phase Analysis: Welch’s method isolates gamma activity (48–52 Hz, 74–78 Hz), and phase-space reconstruction quantifies dynamical differences between normal and ictal states; SVM classification using Laplacian-pooled image features yields ~94–95% accuracy (Nath et al., 18 Aug 2025).
- Universum Learning: IU-GEPSVM, a Universum-enhanced generalized eigenvalue SVM, integrates interictal (N) data as universum constraints, achieving up to 85% and 80% accuracy in O_vs_S and Z_vs_S tasks, respectively (Kumar et al., 24 Dec 2025).
- UCI Downsampled Variant: Segments split into 178-sample (≈1 s) vectors used in 1D-CNNs with residual connections, yielding 99.9% specificity/99.5% sensitivity (binary), or 81.4% specificity/81.0% sensitivity (five-class) (Gupta et al., 2021).
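The sliding-window augmentation and record-level majority vote used in the pyramidal 1D-CNN pipeline can be sketched as follows. Window length and stride are taken from the description above; `window_classifier` is a hypothetical stand-in for the trained per-window model.

```python
import numpy as np

def sliding_windows(x, win=512, stride=64):
    """Cut one EEG segment into overlapping windows of length `win`."""
    starts = range(0, len(x) - win + 1, stride)
    return np.stack([x[s:s + win] for s in starts])

def predict_record(x, window_classifier, win=512, stride=64):
    """Classify every window, then majority-vote to a single record-level label."""
    wins = sliding_windows(x, win, stride)
    votes = window_classifier(wins)            # expected: (n_windows,) integer labels
    return np.bincount(votes).argmax()
```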
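Likewise, the delay-embedding step behind the topological featurization reduces to stacking lagged copies of the signal into a point cloud; a minimal sketch with τ=1 and m=3 as above (the downstream persistent Betti-number and graph-spectrum features are not shown):

```python
import numpy as np

def takens_embedding(x, dim=3, tau=1):
    """Return an (n_points, dim) delay-embedded point cloud from a 1D signal."""
    n = len(x) - (dim - 1) * tau               # number of embedded points
    return np.stack([x[i * tau: i * tau + n] for i in range(dim)], axis=1)
```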
5. Signal Band Structure and Derived Features
Custom frequency bands are extracted via empirical wavelet transforms (EWT) with Meyer-type filters targeting the δ, θ, α, β, and γ bands (boundaries at 0, 4, 8, 16, 30, 60 Hz). Phase space is reconstructed via 2D embeddings, and the 95% confidence ellipse area is used as a succinct rhythm-wise feature in KNN classification, producing 98% accuracy with paired rhythm features (Akbari et al., 2019).
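One common way to compute the 95% confidence ellipse area of a 2D phase-space embedding uses the covariance of the embedded points; the sketch below assumes an approximately Gaussian point cloud and a unit-lag embedding, and the exact estimator in Akbari et al. may differ.

```python
import numpy as np
from scipy.stats import chi2

def ellipse_area_95(points_2d):
    """Area of the 95% confidence ellipse of a 2D point cloud (n_points, 2)."""
    cov = np.cov(points_2d, rowvar=False)   # 2x2 sample covariance
    s = chi2.ppf(0.95, df=2)                # ~5.991: chi-square scaling for 95% coverage
    return np.pi * s * np.sqrt(np.linalg.det(cov))

# Example: unit-lag 2D embedding of one band-limited rhythm (lag choice is assumed):
# pts = np.column_stack([band_signal[:-1], band_signal[1:]])
# area = ellipse_area_95(pts)
```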
Fourier component reconstruction after Welch’s transform isolates high-frequency regimes; healthy EEGs show prominent gamma peaks, which are substantially attenuated during seizures. Phase-plane inspections reveal bistability in healthy and irregular collapse in ictal states, supporting robust machine learning separation of classes (Nath et al., 18 Aug 2025).
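A minimal sketch of the Welch-based band isolation described above (SciPy; the sampling rate and gamma sub-bands follow the dataset description, while `nperseg` and the averaging choice are illustrative):

```python
import numpy as np
from scipy.signal import welch

FS = 173.61  # Bonn sampling rate (Hz)

def band_power(x, lo, hi, fs=FS):
    """Average Welch PSD of one EEG segment within the [lo, hi] Hz band."""
    f, pxx = welch(x, fs=fs, nperseg=1024)
    mask = (f >= lo) & (f <= hi)
    return pxx[mask].mean()

# Gamma sub-bands used in the spectral-phase analysis above:
# g1 = band_power(segment, 48, 52); g2 = band_power(segment, 74, 78)
```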
6. Limitations, Use Cases, and Impact
Limitations of the dataset include the small subject pool (five healthy, five patients), single-channel topology, and lack of artifact channel or continuous pre-ictal data. Despite these, it remains the reference for seizure detection benchmarking and algorithmic development (Nath et al., 18 Aug 2025). Notable applications include:
- Benchmarking for novel deep architectures and feature learning strategies (Lu et al., 2019, Ullah et al., 2018, Gupta et al., 2021)
- Comparative studies of handcrafted vs learned vs geometric features (Bischof et al., 2021, Akbari et al., 2019)
- Testing universum-based or domain-adaptive classifiers (Kumar et al., 24 Dec 2025)
- Analytical exploration of brain state dynamical transitions (Nath et al., 18 Aug 2025)
A plausible implication is that robust, dataset-specific artifact removal and more heterogeneous, high-density multi-channel recordings may further advance the translational capacity demonstrated in published work.
7. Access, Replication, and Future Directions
The Bonn EEG dataset is publicly accessible for non-commercial purposes. Download links and further documentation are available through the University of Bonn and institutional partners (Nath et al., 18 Aug 2025). All major published methods provide sufficient detail for precise reproduction, including specification of train/validation/test splits, feature transforms, cross-validation, and classifier hyperparameters. Current research recommends expanding to longer, multi-channel, and pre/post-ictal recordings; exploring higher sampling rates; and developing portable, real-time analysis pipelines for clinical application (Nath et al., 18 Aug 2025). Transfer learning, advanced data augmentation, connectivity measures, and integration with patient meta-data constitute active areas for methodological innovation.