LubDubDecoder: Cardiac Monitoring via Hearables
- LubDubDecoder is a system that transforms everyday hearables into clinical-grade cardiac monitors using advanced signal processing and deep learning.
- It employs a two-branch convolutional autoencoder with temporal self-attention, achieving high within-user correlations (~0.95 SCG, ~0.94 GCG) and roughly 0.88 to 0.91 across users and devices.
- Its robust design effectively handles motion artifacts and device variability, enabling continuous cardiac diagnostics in clinical and consumer settings.
LubDubDecoder is a system for fine-grained monitoring of micro-mechanical cardiac vibrations, enabling everyday hearables—such as in-ear earbuds, over-ear headphones, and bone conduction devices—to function as clinical-grade cardiac monitoring instruments. The system repurposes the built-in transducer, typically a speaker, in hearables to capture the canonical “lub-dub” heart sounds generated by valve activity. By leveraging the shared temporal and spectral characteristics of these sounds with underlying mechanical vibrations, LubDubDecoder reconstructs high-fidelity seismocardiography (SCG) and gyrocardiography (GCG) waveforms, extracting precise timing of micro-cardiac events. The approach obviates the need for specialized electrodes or chest-mounted sensors, robustly generalizes across device types, and maintains high accuracy in various use settings, including repeated remounting and music playback.
1. System Architecture and Workflow
LubDubDecoder comprises several distinct processing modules to enable cardiac monitoring via hearables. The workflow begins with acoustic capture using the device’s native transducer. For in-ear earbuds, the occlusion effect enhances low-frequency sensitivity for cardiac sounds; for over-ear headphones equipped only with speakers, the system exploits acoustic reciprocity, allowing the speaker to function as a passive microphone for acquiring heart sounds.
During the initial calibration phase, the user concurrently wears the hearable and places a smartphone equipped with an inertial measurement unit (IMU) on their chest to obtain reference SCG and GCG signals. Paired data collection provides the basis for learning the mapping between ear-based cardiac sounds and mechanical waveforms.
The pipeline includes:
- Motion artifact removal, which uses Mel-frequency cepstral coefficients (MFCC) and a classifier (ROC AUC of about 0.994) to excise segments corrupted by artifacts.
- Signal conditioning and segmentation, where audio is resampled to 500 Hz and filtered (4th-order Butterworth, 5–45 Hz), then partitioned into 800-ms windows corresponding to cardiac cycles.
- Deep encoder–decoder reconstruction, featuring a two-branch convolutional autoencoder (dilated 1D convolutions for local detail, large receptive fields for long-range dependency) augmented by temporal self-attention, residual connections, and pooling layers.
- Fiducial point labeling for critical events—mitral valve closure, isovolumetric contraction, aortic opening, maximal acceleration, rapid ejection—anchored via peak detection on the aortic opening with heuristics over a 200-ms span.
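The conditioning and segmentation step in the list above can be sketched in Python with SciPy. The 500 Hz rate, the 4th-order 5 to 45 Hz Butterworth band-pass, and the 800-ms window come from the text. Zero-phase filtering via `sosfiltfilt`, the 200-ms offset before the anchor peak, the prominence threshold, and the function names `condition` and `segment` are assumptions made for illustration, not details confirmed by the source.

```python
from math import gcd

import numpy as np
from scipy.signal import butter, find_peaks, resample_poly, sosfiltfilt

FS = 500              # target sampling rate in Hz (from the text)
WIN = int(0.8 * FS)   # 800-ms window -> 400 samples
PRE = int(0.2 * FS)   # assumed: each window starts 200 ms before its anchor


def condition(audio, fs_in):
    """Resample to 500 Hz, then band-pass 5-45 Hz."""
    g = gcd(FS, int(fs_in))
    x = resample_poly(audio, FS // g, int(fs_in) // g)
    # Order argument 4 as stated in the text; zero-phase use is an assumption.
    sos = butter(4, [5, 45], btype="bandpass", fs=FS, output="sos")
    return sosfiltfilt(sos, x)


def segment(x, min_gap_s=0.4, prominence=None):
    """Cut fixed 800-ms windows around prominent anchor peaks."""
    if prominence is None:
        prominence = 0.5 * np.max(np.abs(x))  # hypothetical threshold
    peaks, _ = find_peaks(x, distance=int(min_gap_s * FS), prominence=prominence)
    # Keep only windows that fit entirely inside the recording.
    cycles = [x[p - PRE:p - PRE + WIN] for p in peaks
              if p - PRE >= 0 and p - PRE + WIN <= len(x)]
    return np.array(cycles), peaks
```

With these assumed offsets, each window spans 200 ms before to 600 ms after the anchor peak, so a 400-ms region centered on the anchor lies entirely inside the window.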
The system maintains data quality by computing a signal-to-noise ratio (SNR) for each cardiac cycle: the power in a 400-ms signal region centered at the S1 peak is compared with the power in a noise region.
2. Signal Processing and Deep Learning Methodology
Heart sounds acquired at the ear are resampled and subjected to narrowband filtering to isolate cardiac frequencies. Segmentation utilizes anchor peaks (S1 for acoustic signals, aortic opening for mechanical signals) detected by algorithms assessing local prominence. SNR per cardiac cycle is calculated using the power formula $P = \frac{1}{N}\sum_{n=1}^{N} x[n]^2$ for a region of $N$ samples, with SNR computed as $\mathrm{SNR} = 10\log_{10}\left(P_{\text{signal}} / P_{\text{noise}}\right)$.
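As a hedged illustration of the per-cycle SNR check, the sketch below computes mean power in a 400-ms region centered on the S1 peak and compares it with the power of the rest of the window. Treating the remainder of the window as the noise region, expressing the ratio in decibels, and the names `mean_power` and `cycle_snr_db` are assumptions; the source does not specify them.

```python
import numpy as np

FS = 500  # sampling rate in Hz after resampling (from the text)


def mean_power(x):
    """Mean squared amplitude of a segment."""
    return float(np.mean(np.square(x)))


def cycle_snr_db(cycle, s1_index, half_width_s=0.2):
    """SNR in dB for one cardiac cycle.

    Signal region: 400 ms centered at the S1 peak (from the text).
    Noise region: every other sample in the window (an assumption).
    Assumes the noise region is non-empty and has nonzero power.
    """
    half = int(half_width_s * FS)
    lo = max(s1_index - half, 0)
    hi = min(s1_index + half, len(cycle))
    mask = np.zeros(len(cycle), dtype=bool)
    mask[lo:hi] = True
    return 10.0 * np.log10(mean_power(cycle[mask]) / mean_power(cycle[~mask]))
```

Cycles whose SNR falls below a chosen threshold could then be discarded before reconstruction; the threshold itself is not given in the text.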
The reconstruction model is a two-branch convolutional autoencoder:
- The first branch leverages dilated convolutions for short- and medium-term temporal context.
- The second branch uses large receptive fields for protracted dependencies.
- Combined outputs feed into temporal self-attention, with skip connections and pooling for stable gradient propagation.
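The two-branch structure described above can be sketched in PyTorch as follows. This is an illustrative model only: the layer counts, kernel sizes, channel widths, pooling factor, number of attention heads, and the choice to emit SCG and GCG as two output channels are all assumptions, and the class name `TwoBranchAutoencoder` is hypothetical.

```python
import torch
import torch.nn as nn


class TwoBranchAutoencoder(nn.Module):
    """Illustrative two-branch 1D convolutional autoencoder with self-attention."""

    def __init__(self, channels: int = 32, heads: int = 4):
        super().__init__()
        # Branch A: small kernels with growing dilation (short/medium context).
        self.dilated = nn.Sequential(
            nn.Conv1d(1, channels, kernel_size=3, padding=1, dilation=1), nn.ReLU(),
            nn.Conv1d(channels, channels, kernel_size=3, padding=2, dilation=2), nn.ReLU(),
            nn.Conv1d(channels, channels, kernel_size=3, padding=4, dilation=4), nn.ReLU(),
        )
        # Branch B: wide kernels for a large receptive field.
        self.wide = nn.Sequential(
            nn.Conv1d(1, channels, kernel_size=31, padding=15), nn.ReLU(),
            nn.Conv1d(channels, channels, kernel_size=31, padding=15), nn.ReLU(),
        )
        self.pool = nn.AvgPool1d(kernel_size=4)  # e.g. 400 -> 100 time steps
        self.attn = nn.MultiheadAttention(2 * channels, heads, batch_first=True)
        self.norm = nn.LayerNorm(2 * channels)
        self.decoder = nn.Sequential(
            nn.Upsample(scale_factor=4, mode="linear", align_corners=False),
            nn.Conv1d(2 * channels, channels, kernel_size=5, padding=2), nn.ReLU(),
            nn.Conv1d(channels, 2, kernel_size=5, padding=2),  # assumed: SCG, GCG
        )

    def forward(self, x):
        # x: (batch, 1, time); time must be divisible by the pooling factor.
        feats = torch.cat([self.dilated(x), self.wide(x)], dim=1)  # (B, 2C, T)
        z = self.pool(feats).transpose(1, 2)                       # (B, T/4, 2C)
        attended, _ = self.attn(z, z, z)                           # temporal self-attention
        z = self.norm(z + attended)                                # residual connection
        return self.decoder(z.transpose(1, 2))                     # (B, 2, T)
```

For an 800-ms window at 500 Hz, the input has 400 samples, so a batch of shape `(B, 1, 400)` yields an output of shape `(B, 2, 400)`.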
For device generalization, frequency-domain equalization is incorporated:
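- Mean cardiac cycle signals from the reference ($\bar{x}_{\text{ref}}$) and target ($\bar{x}_{\text{tgt}}$) devices are Fourier-transformed: $X_{\text{ref}}(f) = \mathcal{F}\{\bar{x}_{\text{ref}}\}$, $X_{\text{tgt}}(f) = \mathcal{F}\{\bar{x}_{\text{tgt}}\}$.
- Mapping function: $H(f) = X_{\text{ref}}(f) / X_{\text{tgt}}(f)$.
- For each new cycle $x$, the signal is adjusted: $\hat{x} = \mathcal{F}^{-1}\{H(f)\,\mathcal{F}\{x\}\}$, then normalized by its norm.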
This normalization compensates for hardware variability, permitting “zero-effort” cross-device adaptation.
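A minimal NumPy sketch of this frequency-domain equalization is given below. It assumes the mapping is the ratio of the reference spectrum to the target spectrum, adds a small regularization term `eps` to avoid division by near-zero bins, and normalizes by the L2 norm; the exact form of the mapping and the choice of norm are assumptions, and `mapping_function` and `equalize` are hypothetical names.

```python
import numpy as np


def mapping_function(mean_ref, mean_tgt, eps=1e-12):
    """Per-frequency mapping from target-device to reference-device spectra.

    Equivalent to X_ref / X_tgt, written with a regularised denominator.
    Both mean cycles must have the same length.
    """
    x_ref = np.fft.rfft(mean_ref)
    x_tgt = np.fft.rfft(mean_tgt)
    return x_ref * np.conj(x_tgt) / (np.abs(x_tgt) ** 2 + eps)


def equalize(cycle, h):
    """Map one target-device cycle into the reference-device domain."""
    adjusted = np.fft.irfft(h * np.fft.rfft(cycle), n=len(cycle))
    return adjusted / np.linalg.norm(adjusted)  # L2 norm: an assumption
```

By construction, applying the mapping to the target device's own mean cycle returns the reference mean cycle up to scale, which is a convenient sanity check before using it on new cycles.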
3. Quantitative Performance and Validation
The system was evaluated in an IRB-approved study with 18 users. Key performance metrics include:
| Scenario | SCG Correlation | GCG Correlation | Fiducial Timing Error |
|---|---|---|---|
| Within-user | ~0.95 (±0.04) | ~0.94 (±0.04) | Median 0–2 ms; 95th percentile 4–20 ms |
| Cross-user | ~0.88 (±0.07) | ~0.89 (±0.05) | Similar timing range |
| Cross-device (zero-effort) | ~0.91 (±0.04) | -- | -- |
The motion artifact removal module achieved 97.7% accuracy. During remounting or music playback—where interference might be expected—reconstruction quality was preserved (correlations remained ~0.89–0.95). This suggests strong operational robustness.
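For readers who want to reproduce metrics of this kind, the sketch below shows one way to compute a Pearson correlation between a reconstructed and a reference waveform, and the median and 95th-percentile timing error of fiducial points. It assumes fiducial points are given as sample indices at 500 Hz; the function names are hypothetical and the source's exact evaluation procedure is not specified.

```python
import numpy as np

FS = 500  # sampling rate in Hz (from the text)


def pearson(reconstructed, reference):
    """Pearson correlation between two equal-length waveforms."""
    return float(np.corrcoef(reconstructed, reference)[0, 1])


def timing_error_ms(pred_idx, ref_idx):
    """Median and 95th-percentile absolute timing error in milliseconds."""
    err = np.abs(np.asarray(pred_idx) - np.asarray(ref_idx)) * 1000.0 / FS
    return float(np.median(err)), float(np.percentile(err, 95))
```

At 500 Hz one sample corresponds to 2 ms, so a one-sample disagreement in a fiducial point already contributes a 2 ms error.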
4. Adaptation to User and Device Variability
LubDubDecoder includes mechanisms for adapting both to diverse user physiology and device-specific characteristics:
- User adaptation is enabled by a brief calibration phase, requiring only five cardiac cycles (approx. 4 seconds) to tune the model for new heart-to-ear transmission profiles.
- Device adaptation leverages frequency-domain equalization for cross-device normalization, supporting “zero-effort” conversion between hearable hardware without additional calibration from the user.
Performance tests indicate only marginal degradation after remounting, and reliable operation persists in realistic dynamic environments, e.g., during music playback.
5. Clinical and Consumer Applications
Potential applications of LubDubDecoder span clinical, consumer, and research domains:
- Continuous ambulatory cardiac monitoring in non-clinical environments, with the capability to detect micro-cardiac valvular events and precisely measure relevant timings—critical for chronic disease tracking, arrhythmia monitoring, and acute episode identification.
- Integration into consumer electronics, including hearables and hearing aids, providing accessible health insights through unobtrusive biosensing, thus enhancing fitness tracking, proactive health management, and emergency detection schemes.
- Athletic and research use, for studying cardiac responses to exercise and investigating phenomena such as the “white-coat effect” in naturalistic settings. The system’s capacity for detailed event timing offers a new avenue for clinical-grade cardiac analytics without conventional medical instrumentation.
A plausible implication is that widespread deployment in consumer devices could democratize access to fine-grained cardiac diagnostics, allowing large-scale population studies and real-time health interventions.
6. Comparative Significance and Research Context
LubDubDecoder shows that generic wearable transducers, combined with signal processing and deep learning, can support mobile cardiac monitoring. Its reconstructions correlate closely with reference waveforms from chest-mounted sensors, with Pearson correlations of about 0.95 for SCG and 0.94 for GCG in within-user tests and about 0.88 to 0.91 across users and devices. This positions LubDubDecoder as a bridge between consumer wearables and clinical biosensing, with non-invasive operation and high adaptability. The system's integration of machine learning, acoustic physics, and biomedically anchored timing analytics offers a comprehensive approach to next-generation cardiac monitoring.