Exercise-ECGID Dataset for ECG Biometrics

Updated 27 October 2025

Exercise-ECGID is a specialized collection of ECG recordings capturing both rest and post-exercise states for biometric identification benchmarking.
It supports rigorous analysis through standardized acquisition methods, advanced signal processing (e.g., QRS detection, STFT), and deep learning techniques.
Benchmark studies reveal significant cross-state performance gaps, with improved accuracy achieved via personalized augmentation and domain adaptation strategies.

The Exercise-ECGID dataset is a specialized collection designed for evaluating biometric identification systems that use electrocardiogram (ECG) signals recorded in both rest and post-exercise physiological states. It enables investigation of intra-subject and cross-state variability in ECG biometrics and benchmarking of algorithms against dynamic physiological stressors. The dataset serves research on robust ECG-based authentication and on how physiological stress alters cardiac electrical signals.

1. Dataset Design and Structure

The Exercise-ECGID dataset was collected at South China University of Technology and consists of paired ECG recordings from 45 healthy subjects (33 males, 12 females, ages 18–22) (Wang et al., 2019). Each subject underwent two distinct recording protocols: a rest condition (approximately 5 minutes, ~70 bpm) and a post-exercise condition (150 seconds, heart rates between 90–150 bpm). ECGs were acquired on lead II at 300 Hz using a wearable wrist setup, ensuring consistency across sessions and subjects.
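For a rough sense of scale, the durations and sampling rate quoted above imply the per-subject sizes below. This is a back-of-the-envelope sketch: it assumes exactly 300 s of rest (the source says approximately 5 minutes) and, because heart rate falls during recovery, it only brackets the post-exercise beat count at the two ends of the 90–150 bpm range. The helper names are illustrative.

```python
# Back-of-the-envelope sizes for one Exercise-ECGID subject, using the
# durations and rates quoted in the text (assumed constant for simplicity).
FS_HZ = 300          # lead II sampling rate
REST_S = 5 * 60      # ~5 min rest recording (assumed exactly 300 s)
EXERCISE_S = 150     # post-exercise recording


def n_samples(duration_s: float, fs_hz: int = FS_HZ) -> int:
    """Number of samples in a recording of the given duration."""
    return int(round(duration_s * fs_hz))


def approx_beats(duration_s: float, bpm: float) -> int:
    """Approximate beat count if heart rate stayed constant at `bpm`."""
    return int(round(duration_s * bpm / 60))


rest_samples = n_samples(REST_S)
exercise_samples = n_samples(EXERCISE_S)
rest_beats = approx_beats(REST_S, 70)
# Heart rate decays during recovery, so only a lower/upper bracket is given.
exercise_beats = (approx_beats(EXERCISE_S, 90), approx_beats(EXERCISE_S, 150))
print(rest_samples, exercise_samples, rest_beats, exercise_beats)
```

This prints `90000 45000 350 (225, 375)`: the post-exercise recording holds half as many samples as the rest recording but can contain a comparable number of beats.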

Data acquisition protocols accounted for noise and signal integrity through standardized equipment and preprocessing. Signals are made available either as raw time series or after standard denoising and segmentation, facilitating downstream feature extraction and model evaluation.

2. Methodological Benchmarks and Signal Processing

ECGID methodologies applied to the Exercise-ECGID dataset span diverse paradigms (Wang et al., 2019; Zheng et al., 20 Oct 2025):

Time-domain analysis: Classical QRS detection with the Pan-Tompkins algorithm, whose derivative stage has the transfer function $H(z) = \frac{1}{8}\left(2 + z^{-1} - z^{-3} - 2z^{-4}\right)$, produces beat-synchronous segmentation.
Frequency and time–frequency analysis: Features are derived from transforms such as the short-time Fourier transform, $\text{STFT}(t, \omega) = \int x(\tau)\, w(\tau - t)\, e^{-j\omega\tau}\, d\tau$ with analysis window $w$, and the continuous wavelet transform (using the Daubechies 5 wavelet), enabling extraction of localized and scale-dependent morphological markers.
Autocorrelation features: The normalized autocorrelation $R_{xx}[m] = \sum_n x[n]\,x[n+m] \big/ \sum_n x[n]^2$ quantifies subject-specific periodicity and rhythm.
Deep learning: Architectures ranging from LSTM networks to advanced multi-scale convolutional branches and self-attention modules (e.g., CrossStateECG (Zheng et al., 20 Oct 2025)) are applied to normalized and windowed ECG segments.
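Two of the feature paths listed above can be sketched in a few lines of pure Python. The sketch assumes the commonly cited causal form of the Pan-Tompkins derivative, $y[n] = \frac{1}{8}(2x[n] + x[n-1] - x[n-3] - 2x[n-4])$ with the sampling period taken as one sample, and computes the autocorrelation after removing the mean (an assumption; the source does not say whether the mean is removed). Function names are hypothetical, and production pipelines would use vectorized libraries instead.

```python
from typing import List, Sequence


def pan_tompkins_derivative(x: Sequence[float]) -> List[float]:
    """Causal derivative stage: y[n] = (2x[n] + x[n-1] - x[n-3] - 2x[n-4]) / 8.

    Samples before the start of the signal are treated as zero, so the
    first four outputs are a start-up transient.
    """
    def at(i: int) -> float:
        return float(x[i]) if i >= 0 else 0.0

    return [(2 * at(n) + at(n - 1) - at(n - 3) - 2 * at(n - 4)) / 8.0
            for n in range(len(x))]


def normalized_autocorrelation(x: Sequence[float], max_lag: int) -> List[float]:
    """R_xx[m] / R_xx[0] for m = 0..max_lag, computed on the mean-removed signal."""
    mu = sum(x) / len(x)
    xc = [v - mu for v in x]
    r0 = sum(v * v for v in xc)
    if r0 == 0:
        raise ValueError("constant signal: normalized autocorrelation undefined")
    return [sum(xc[n] * xc[n + m] for n in range(len(xc) - m)) / r0
            for m in range(max_lag + 1)]
```

On a unit ramp the derivative settles at $10/8 = 1.25$ after the transient, which is the filter's gain on a slope of one sample per sample. For a beat train, the lags at which the normalized autocorrelation peaks track the RR interval, which is why the vector of lag values can serve as a rhythm descriptor.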

Preprocessing pipelines typically apply bandpass filtering (0.5–40 Hz, Butterworth), baseline drift correction (0.5 Hz high-pass), z-score normalization, and enhanced QRS detection based on the Hamilton-Tompkins algorithm. Segmentation is adaptive and favors R-peak-centered windows of 6 s for rest recordings and 4 s for exercise recordings.
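The normalization and windowing steps can be sketched as follows, assuming filtering has already been applied and that R-peak sample indices come from an upstream detector such as Hamilton-Tompkins. At 300 Hz the 6 s and 4 s windows are 1800 and 1200 samples. Normalizing each whole recording (rather than each window) and skipping peaks too close to the recording edges (rather than padding) are assumptions; the function names are hypothetical.

```python
from statistics import mean, pstdev
from typing import List, Sequence

FS_HZ = 300                                   # Exercise-ECGID sampling rate
WINDOW_S = {"rest": 6.0, "exercise": 4.0}     # window lengths quoted in the text


def zscore(x: Sequence[float]) -> List[float]:
    """Zero-mean, unit-variance normalization of one (filtered) recording."""
    mu, sd = mean(x), pstdev(x)
    if sd == 0:
        raise ValueError("cannot z-score a constant signal")
    return [(v - mu) / sd for v in x]


def r_centered_windows(x: Sequence[float], r_peaks: Sequence[int],
                       state: str, fs: int = FS_HZ) -> List[List[float]]:
    """Cut one fixed-length window per R peak, centered on that peak.

    Peaks whose window would run past either end of the recording are
    skipped rather than padded.
    """
    win = int(round(WINDOW_S[state] * fs))    # 1800 (rest) or 1200 (exercise)
    half = win // 2
    windows = []
    for r in r_peaks:
        start = r - half
        if start >= 0 and start + win <= len(x):
            windows.append(list(x[start:start + win]))
    return windows
```

Because the windows span several beats, neighboring windows overlap heavily; whether that overlap is acceptable depends on how train and test splits are drawn.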

3. Cross-State Biometric Identification Performance

A distinguishing characteristic of the Exercise-ECGID dataset is its benchmarking utility for cross-state (rest-exercise) ECG biometric identification. The studies summarized below report a pronounced performance gap when training and test data come from different physiological states:

Traditional methods: QRS-segment and beat-based identification approaches yield 95–98% accuracy in rest-rest scenarios but collapse to 2–18% accuracy for rest-exercise recognition, indicating a failure to generalize across post-exertion morphological changes (Wang et al., 2019).
Feature selection: KL-divergence-based selection improves robustness, raising rest-exercise accuracy to 61.4%, which still falls well short of same-state accuracy (Wang et al., 2019).
Deep learning advances: CrossStateECG achieves 92.50% identification accuracy in Rest-to-Exercise and 94.72% in Exercise-to-Rest scenarios, with nearly perfect performance in same-state and mixed-state conditions (99.94% Rest2Rest, 97.85% Mix2Mix) (Zheng et al., 20 Oct 2025). Ablation studies confirm the necessity of multi-scale convolution and attention mechanisms for discriminative feature learning under varied physiological stress.
Adaptive authentication: A weighted combination of global, personal, and local thresholds sets the accept/reject decision boundary for dynamic biometric verification (Zheng et al., 20 Oct 2025).
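The source names KL-divergence feature selection and weighted thresholds without giving their exact formulation, so the following is only a sketch under stated assumptions. It assumes features are ranked by the KL divergence between their rest and exercise value histograms (lowest divergence taken as most state-invariant), and that the global, personal, and local thresholds are combined convexly with hypothetical weights `(0.4, 0.4, 0.2)`. None of the function names or weights come from Wang et al. (2019) or Zheng et al. (2025).

```python
import math
from typing import Dict, List, Sequence, Tuple


def histogram(values: Sequence[float], lo: float, hi: float, bins: int,
              eps: float = 1e-6) -> List[float]:
    """Normalized histogram with additive smoothing so KL stays finite."""
    counts = [eps] * bins
    width = (hi - lo) / bins
    for v in values:
        i = min(max(int((v - lo) / width), 0), bins - 1)
        counts[i] += 1
    total = sum(counts)
    return [c / total for c in counts]


def kl_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """D_KL(P || Q) = sum_i p_i * log(p_i / q_i), in nats."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def select_stable_features(rest: Dict[str, Sequence[float]],
                           exercise: Dict[str, Sequence[float]],
                           k: int, bins: int = 10) -> List[str]:
    """Return the k features whose rest and exercise distributions diverge least."""
    scores: Dict[str, float] = {}
    for name in rest:
        pooled = list(rest[name]) + list(exercise[name])
        lo, hi = min(pooled), max(pooled)
        if hi == lo:
            hi = lo + 1.0  # constant feature: any positive bin width works
        p = histogram(rest[name], lo, hi, bins)
        q = histogram(exercise[name], lo, hi, bins)
        scores[name] = kl_divergence(p, q)
    return sorted(scores, key=lambda name: scores[name])[:k]


def combined_threshold(t_global: float, t_personal: float, t_local: float,
                       weights: Tuple[float, float, float] = (0.4, 0.4, 0.2)
                       ) -> float:
    """Hypothetical convex combination of global, personal and local thresholds."""
    wg, wp, wl = weights
    if not math.isclose(wg + wp + wl, 1.0):
        raise ValueError("weights must sum to 1")
    return wg * t_global + wp * t_personal + wl * t_local


def accept(similarity: float, threshold: float) -> bool:
    """Accept the claimed identity if the similarity score clears the threshold."""
    return similarity >= threshold
```

Binning rest and exercise values on a shared range keeps the two histograms comparable, and the smoothing term prevents an infinite divergence when a bin is empty in only one state.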

Method	Rest–Rest (%)	Rest–Exercise (%)	Exercise–Exercise (%)
Classical QRS/SVM	95–98	2–18	70–83
KL-based Selection	96	61.4	–
LSTM Deep Learning	95–97	12	–
CrossStateECG	99.94	92.50	99.86

4. Advanced Architectures and Augmentation Strategies

Recent work addresses physiological variability through both data-level and model-level innovations:

Personalized augmentation: DE-PADA combines ECG-specific segmentation (PQRS and ST intervals) with individualized T-wave simulation, guided by heart rate–dependent linear fits, to generate synthetic post-exercise ECGs for robust model training (Saleh et al., 7 Feb 2025). The augmented data covers the T-wave peak range $(T_{\text{peak}}^{\min}[k],\, T_{\text{peak}}^{\max}[k])$ for each subject $k$.
Domain adaptation: Auxiliary subjects' exercise data are incorporated during training to learn condition-invariant features, then removed for evaluation, enhancing adaptation to unseen physiological states.
Multi-expert CNNs: Dual-expert designs process temporally stable (PQRS) and variable (ST) intervals independently, capturing both invariant and dynamic biometric signatures (Saleh et al., 7 Feb 2025).
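DE-PADA is described here only at a high level, so the sketch below is an illustration of the idea rather than the published method. It assumes a per-subject ordinary least-squares line relating heart rate to T-wave peak amplitude, evaluates that line over an assumed post-exercise band of 90–150 bpm (the dataset's reported range) to obtain $(T_{\text{peak}}^{\min}[k], T_{\text{peak}}^{\max}[k])$, and synthesizes an ST interval by rescaling it so its T peak matches the fit at a randomly drawn heart rate. All function names are hypothetical.

```python
import random
from typing import List, Sequence, Tuple


def fit_line(hr: Sequence[float], t_peak: Sequence[float]) -> Tuple[float, float]:
    """Least-squares fit T_peak ~= a * HR + b for a single subject."""
    n = len(hr)
    if n < 2 or n != len(t_peak):
        raise ValueError("need >= 2 paired (HR, T_peak) observations")
    mx, my = sum(hr) / n, sum(t_peak) / n
    sxx = sum((h - mx) ** 2 for h in hr)
    if sxx == 0:
        raise ValueError("heart rates must not all be equal")
    sxy = sum((h - mx) * (t - my) for h, t in zip(hr, t_peak))
    a = sxy / sxx
    return a, my - a * mx


def t_peak_range(a: float, b: float, hr_lo: float = 90.0,
                 hr_hi: float = 150.0) -> Tuple[float, float]:
    """(T_peak_min[k], T_peak_max[k]) implied by the fit over [hr_lo, hr_hi]."""
    lo, hi = a * hr_lo + b, a * hr_hi + b
    return min(lo, hi), max(lo, hi)


def synthesize_st(st_segment: Sequence[float], a: float, b: float,
                  rng: random.Random, hr_lo: float = 90.0,
                  hr_hi: float = 150.0) -> List[float]:
    """Rescale one ST interval so its T peak matches the fit at a random HR."""
    old_peak = max(st_segment, key=abs)
    if old_peak == 0:
        raise ValueError("ST segment has no T-wave peak to rescale")
    new_peak = a * rng.uniform(hr_lo, hr_hi) + b
    return [v * new_peak / old_peak for v in st_segment]
```

Uniform amplitude scaling leaves T-wave timing unchanged; a more faithful simulation would also shorten the interval as heart rate rises, which this sketch deliberately omits.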

Architecture	Augmentation	Domain Adaptation	Key Metric (Exercise)
Standard CNN	None	No	54.4–77.4%
Conventional Augmentation	Heart-rate generic	No	66.6–81.1%
DE-PADA	Personalized T-wave	Yes	68.9–86.4%

5. Signal Trends and Physiological Insights

Functional data analysis on exercise ECG signals reveals statistically significant and physiologically meaningful trends (Cammarota et al., 2016):

Opposing R and T wave responses: In early recovery, the population mean R wave amplitude shows a localized dip while the mean T wave amplitude shows a localized bump.
Statistical validation: Pointwise confidence bands $\bar{Y}(t) \pm z_{1-\alpha/2}\,\hat{\sigma}(t)/\sqrt{n}$, where $n$ is the number of curves, together with zero-crossings of the estimated derivative, confirm that these features are not artifacts.
Physiological implications: The R amplitude change is associated with diastolic filling (reduced ventricular volume after exercise) and the T amplitude change with systolic adaptation; both effects are consistent with the Frank–Starling mechanism.
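The band formula and the derivative check above can be sketched directly, assuming every subject's amplitude trend has already been registered to a common time grid (one list per subject, so $n$ is the number of subjects). The bands are pointwise, as in the formula, not simultaneous; the derivative is a simple forward difference, and only strict sign changes are reported. Function names are hypothetical, and this is not the functional-data pipeline of Cammarota et al. (2016).

```python
from statistics import NormalDist, mean, stdev
from typing import List, Sequence, Tuple


def pointwise_bands(curves: Sequence[Sequence[float]], alpha: float = 0.05
                    ) -> Tuple[List[float], List[float], List[float]]:
    """Mean curve and Ybar(t) -/+ z_{1-alpha/2} * sigma_hat(t) / sqrt(n)."""
    n = len(curves)
    if n < 2:
        raise ValueError("need at least two curves to estimate sigma_hat")
    z = NormalDist().inv_cdf(1 - alpha / 2)   # ~1.96 for alpha = 0.05
    ybar, lower, upper = [], [], []
    for column in zip(*curves):               # values of all curves at one t
        m, s = mean(column), stdev(column)
        half = z * s / n ** 0.5
        ybar.append(m)
        lower.append(m - half)
        upper.append(m + half)
    return ybar, lower, upper


def derivative_zero_crossings(y: Sequence[float], dt: float = 1.0) -> List[int]:
    """Indices where the forward-difference derivative changes sign,
    i.e. candidate local minima (dips) or maxima (bumps) of y."""
    d = [(y[i + 1] - y[i]) / dt for i in range(len(y) - 1)]
    return [i + 1 for i in range(len(d) - 1) if d[i] * d[i + 1] < 0]
```

A dip in the mean R amplitude shows up as a crossing from negative to positive slope; checking that the band at that index lies below the bands at the surrounding baseline points is one simple way to argue it is not noise.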

6. Model Generalization, Large-Scale Datasets, and Multimodality

Integration with large-scale and multimodal datasets is a growing theme:

Generalization: OpenECG demonstrates that self-supervised methods (BYOL, MAE) can generalize feature representations across diverse datasets, suggesting that specialized Exercise-ECGID data could complement broad clinical benchmarks for robust model training (Wan et al., 2 Mar 2025).
Multimodal alignment: Datasets such as MEETI synchronize raw signals, synthetic images, extracted quantitative parameters, and LLM-generated textual interpretations, facilitating transformer-based multimodal learning and explainable AI in ECG analysis (Zhang et al., 21 Jul 2025).
Synthetic augmentation: Open-source frameworks can generate synthetic ECG images with detailed annotations for digitization, lead detection, and segmentation tasks (Rahimi et al., 26 May 2025); such tools could extend the Exercise-ECGID paradigm to image-based biometrics.

7. Applications and Future Directions

The Exercise-ECGID dataset underpins several domains:

Biometric authentication: Robust identification in consumer, legal, and clinical settings under dynamic physiological conditions.
Clinical monitoring: Early detection of exercise-induced myocardial abnormalities or ventricular dysfunctions via noninvasive ECG trend analysis.
Algorithmic research: Benchmarking for deep learning architectures, evaluation of augmentation and adaptation methods, and scalable model training with public and multimodal datasets.
Bioengineering and sensor development: Validation of wearable or ambulatory ECG systems for authentication and personalized health monitoring.

Continued work focuses on refining feature extraction (via PCA, attention, and derivative analysis), improving augmentation realism, optimizing cross-state generalization, and integrating multimodal evidence streams for holistic biometric and diagnostic systems.