CardioPHON: Integrated Cardiac Sensing

Updated 11 November 2025

CardioPHON is a suite of integrated hardware and algorithmic solutions for real-time phonocardiography and multi-modal cardiac sensing, enabling reliable screening and continuous monitoring.
The systems combine dedicated sensor front-ends with adaptive noise cancellation and self-supervised machine learning for enhanced signal clarity and diagnostic accuracy.
Applications span clinical point-of-care, wearable health monitoring, and telemedicine, offering efficient, portable, and cost-effective platforms for cardiac evaluation.

CardioPHON systems encompass a spectrum of integrated hardware and algorithmic solutions for phonocardiographic (PCG) and multi-modal cardiac sensing, targeted at reliable screening and continuous monitoring of cardiac function in diverse environments. These platforms combine dedicated front-end sensing hardware, real-time noise-robust processing, and recent advances in machine learning, notably self-supervised learning and on-device inference, to support portable auscultation, arrhythmia discrimination, blood pressure estimation, and heart rate monitoring. Systems under this designation include professional-grade dual-modality (ECG+PCG) devices, consumer smartphone-based screening with neural models, and miniaturized wearable and patch systems optimized for energy efficiency and clinical integration.

1. Hardware Architectures for CardioPHON Systems

CardioPHON devices feature a range of hardware from medical-grade multi-modal acquisition systems to low-cost consumer and wearable configurations.

Professional Dual-modality Acquisition (Herath et al., 27 Oct 2025): Three-lead dry Ag/AgCl ECG electrodes drive a 24-bit ADS1294 ΔΣ analog front end (maximum 500 Hz sampling). PCG is acquired via two CMM-4030DT MEMS microphones acoustically coupled to a Littmann stethoscope diaphragm, supplemented by a quad-mic noise reference array and a custom 3D-printed seal for ambient noise attenuation. Data flows to an ARM Cortex-M MCU running real-time pipelines, with Bluetooth Low Energy (BLE) for host visualization.
Blood Pressure Estimation Configuration (Esmaili et al., 2019): A minimal chest-worn electret microphone with diaphragm collects PCG (20–240 Hz), a reflective PPG sensor is attached to the finger (0.5–20 Hz), and a Force Sensing Resistor (FSR) beneath a BP cuff accurately timestamps reference measurements. All channels are digitized at 1 kHz via custom acquisition hardware.
Fetal PCG Wearables (Müller et al., 14 Jul 2025): Wide-band analog front ends and digital 4th-order Butterworth bandpass filtering (15–55 Hz) are used, sampled at ~333 Hz. Onboard MCUs such as Cortex-M4/M7 facilitate real-time implementation.
Wearable and IoMT Platforms (Ibrahim et al., 21 Oct 2025, Tailor et al., 2020): The sensor patch integrates a differential ECG front end and a chest-mounted MEMS PCG microphone, powered by an NXP NHS52S04 (Cortex-M33), with data downsampled for edge inference or streamed via BLE. Piezoelectric contact microphones and ultra-low-power analog filtering (20–200 Hz) support continuous wearable operation on MCUs such as the STM32L4 series.
Smartphone-based Approaches (Vu et al., 4 Dec 2024): Modern MEMS-equipped smartphones (Android/iOS) serve as acquisition platforms, with no need for external stethoscope attachments. Users press the phone’s mic port directly to the auscultation site (10–20 s, 16 kHz PCM), enabling broad accessibility.

These architectures are chosen to optimize one or more design criteria: noise robustness (quad-mic arrays, custom diaphragm seals), portability (MEMS, chest-strap, or smartphone-based designs), power and compute efficiency (MCU or embedded NPU deployment), or clinical compatibility (24-bit ADCs, medical front ends).

2. Real-Time Signal Processing and Noise Cancellation

Robustness to acoustic interference, motion artifact, and environmental noise is a central design axis.

Adaptive Noise Cancellation (Herath et al., 27 Oct 2025): Burst-adaptive normalized least-mean-squares (NLMS) algorithms operate in real time. For a reference signal $r(n)$, whose most recent samples form the tap vector $\mathbf{r}(n)$, and a primary signal $x(n)$, the error is $e(n) = x(n) - \mathbf{w}^T(n)\mathbf{r}(n)$, with weight updates

$\mathbf{w}(n+1) = \mathbf{w}(n) + \frac{\mu_{\mathrm{eff}}(n)}{\varepsilon + \mathbf{r}^T(n)\mathbf{r}(n)} e(n)\mathbf{r}(n).$

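Here $\mu_{\mathrm{eff}}(n)$ is the effective step size and $\varepsilon$ is a small constant that prevents division by zero. Bursts are detected when input energy exceeds a dynamic threshold; $\mu_{\mathrm{eff}}(n)$ is then scaled adaptively for fast convergence. ECG denoising employs a cascade of elliptic IIR filters: high-pass (0.5 Hz), low-pass (150 Hz), and 50 Hz notch.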
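
The NLMS update above can be sketched in a few lines of plain Python. This is a minimal illustration, not the cited implementation: the burst rule (instantaneous reference energy exceeding `burst_factor` times a running average), the step-size boost `burst_gain`, and all default parameter values are assumptions chosen for the example.

```python
def burst_adaptive_nlms(primary, reference, taps=8, mu=0.5, eps=1e-6,
                        burst_factor=4.0, burst_gain=2.0, alpha=0.99):
    """Return e(n) = x(n) - w^T(n) r(n), i.e. the noise-cancelled output.

    burst_factor, burst_gain and alpha are hypothetical tuning knobs; the
    source only states that bursts are flagged against a dynamic threshold.
    """
    w = [0.0] * taps      # adaptive filter weights w(n)
    buf = [0.0] * taps    # tap vector r(n), newest reference sample first
    avg_energy = 0.0      # running average of r^T r (the dynamic threshold base)
    out = []
    for x, r in zip(primary, reference):
        buf = [r] + buf[:-1]
        energy = sum(v * v for v in buf)              # r^T(n) r(n)
        # Assumed burst rule: energy well above its running average.
        is_burst = avg_energy > 0.0 and energy > burst_factor * avg_energy
        # Boost the step size during bursts, capped at 1.0 for stability.
        mu_eff = min(mu * burst_gain, 1.0) if is_burst else mu
        y = sum(wi * ri for wi, ri in zip(w, buf))    # noise estimate w^T r
        e = x - y
        gain = mu_eff / (eps + energy)
        w = [wi + gain * e * ri for wi, ri in zip(w, buf)]
        avg_energy = alpha * avg_energy + (1.0 - alpha) * energy
        out.append(e)
    return out
```

Calling `burst_adaptive_nlms(pcg_samples, noise_mic_samples)` on two equal-length sample lists returns the cleaned signal; with several reference microphones, one such filter per reference channel would be a natural extension.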

Preprocessing in Blood Pressure and Fetal Workflows (Esmaili et al., 2019, Müller et al., 14 Jul 2025): IIR bandpass filtering isolates S1/S2 heart sounds (20–240 Hz for adults, 15–55 Hz for fetal). Envelopes are extracted via Hilbert or homomorphic transforms, optionally including wavelet and Teager energy operators to enhance transient detection.
On-device Segmentation and Spectral Features (Tailor et al., 2020): Windowed DFTs (16-sample Hann window, 500 Hz input) and random forest-based state emission models feed a 4-state hidden semi-Markov model (HSMM) for cardiac phase segmentation, enabling online state decoding at a compute cost below 750 ms per second of audio on 80 MHz Cortex-M4 platforms.
Smartphone Pathways (Vu et al., 4 Dec 2024): End-to-end raw waveform processing is emphasized; no bandpass or standard denoising is performed. The neural model's trainable FIR front-end handles all spectral filtering.
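
The envelope-extraction step mentioned above for the blood pressure and fetal workflows can be illustrated with the discrete Teager-Kaiser energy operator, $\psi[n] = x^2[n] - x[n-1]\,x[n+1]$, followed by smoothing. The sketch below assumes the input has already been bandpass filtered; the edge padding and the moving-average smoother are illustrative choices, not details taken from the cited papers.

```python
def teager_energy(x):
    """Discrete Teager-Kaiser energy: psi[n] = x[n]^2 - x[n-1] * x[n+1]."""
    if len(x) < 3:
        return [0.0] * len(x)
    psi = [x[n] * x[n] - x[n - 1] * x[n + 1] for n in range(1, len(x) - 1)]
    # Repeat the first and last values so the output matches the input length.
    return [psi[0]] + psi + [psi[-1]]


def moving_average(x, width):
    """Centered moving average that turns spiky energy into a smooth envelope."""
    half = width // 2
    out = []
    for n in range(len(x)):
        lo, hi = max(0, n - half), min(len(x), n + half + 1)
        out.append(sum(x[lo:hi]) / (hi - lo))
    return out


def teager_envelope(x, smooth_width=15):
    """Smoothed Teager energy; smooth_width is a hypothetical default."""
    return moving_average(teager_energy(x), smooth_width)
```

For a pure tone $A\sin(\omega n)$ the operator returns the constant $A^2\sin^2(\omega)$, which is why it responds sharply to short, high-amplitude transients such as S1 and S2.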

3. Algorithmic Frameworks: Quality Assessment, Classification, and Downstream Tasks

CardioPHON leverages diverse algorithmic stacks tailored for resource constraints and task specificity.

Quality Assessment Pipeline (Despotovic et al., 6 Nov 2025): A Voting Classifier ensemble (SVM, RF, GB) operates on a subset of 416 time/spectral/HMM-based features, with mutual information-based feature selection. The classifier achieves 0.933 accuracy, 0.951 precision, 0.955 F1, and AUROC of 0.983, removing ~25% of recordings as low quality in large-scale screening tasks.
Self-supervised and Multimodal Heart Sound Classification (Despotovic et al., 6 Nov 2025): The backbone is a BYOL-A encoder (96×64 log-Mel), pretrained via non-contrastive self-supervised learning (SSL) on ∼13k PCG clips. Downstream, a two-layer FC head (256 neurons per layer with dropout) is attached for binary (normal/abnormal) classification. Socio-demographic variables (gender, age group, BMI, pregnancy) are one-hot encoded and concatenated at the feature level. SSL and multimodal pipelines outperform zero-shot and purely audio models (cost=11,107, F1=0.612 for audio+demographic). Audio-only fine-tuned models hold the first rank among unimodal approaches.
Fetal Heart Sound and Rate Detection (Müller et al., 14 Jul 2025): Multiple approaches are covered:
- Heuristic peak detection using Teager energy or RMS envelope, with beat-length windows and adaptive thresholds.
- HSMM with logistic regression emission for four-state sequence labeling, yielding S1 F1-score 97.4%, S2 F1-score 91.3%.
- U-net/CNN segmentation augmented by HSMM post-processing.
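
The first approach in the list above, heuristic peak detection with beat-length windows and adaptive thresholds, might look like the following sketch. The specific rules (a refractory gap `min_interval_s`, a threshold reset to a fraction `k` of the last accepted peak, and exponential threshold decay) are assumptions made for illustration; the cited work may use different heuristics.

```python
def detect_beats(envelope, fs, min_interval_s=0.25, k=0.5, decay=0.995):
    """Return sample indices of envelope peaks accepted as beats.

    min_interval_s, k and decay are hypothetical parameters.
    """
    min_gap = int(min_interval_s * fs)   # refractory window in samples
    threshold = 0.0
    last_peak = -min_gap                 # allow a peak right at the start
    peaks = []
    for n in range(1, len(envelope) - 1):
        v = envelope[n]
        is_local_max = envelope[n - 1] < v >= envelope[n + 1]
        if is_local_max and v > threshold and n - last_peak >= min_gap:
            peaks.append(n)
            last_peak = n
            threshold = k * v            # adapt to the latest accepted peak
        else:
            threshold *= decay           # relax so weaker beats are not missed
    return peaks


def heart_rate_bpm(peaks, fs):
    """Mean heart rate from beat indices, or None if fewer than two beats."""
    if len(peaks) < 2:
        return None
    mean_interval_s = (peaks[-1] - peaks[0]) / (len(peaks) - 1) / fs
    return 60.0 / mean_interval_s
```

The envelope passed in could be an RMS or Teager energy envelope sampled at `fs` (about 333 Hz in the fetal setup described earlier).
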
Deep Learning on Wearables and Phones (Ibrahim et al., 21 Oct 2025, Vu et al., 4 Dec 2024, Latif et al., 2018): Models range from tiny-CNNs with MobileNetV2-style bottlenecks (CNN-128: 15.9k params, 3.72M FLOPs, F1 0.9707, AUC 0.9695) for edge deployment, to raw waveform interpretable CNNs (IConNet: 154k params, F1 92.05%) suitable for smartphones. RNN-based approaches (BLSTM, BiGRU) achieve >97% accuracy, with LSTM yielding peak sensitivity.
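
The feature-level fusion described above for the multimodal classifier (one-hot socio-demographic variables concatenated with the audio embedding) can be sketched as follows. The category vocabularies are hypothetical: the source names the variables (gender, age group, BMI, pregnancy) but not their levels, and the embedding here is a placeholder list rather than a real BYOL-A output.

```python
# Hypothetical category levels, for illustration only.
CATEGORIES = {
    "gender": ["female", "male"],
    "age_group": ["neonate", "infant", "child", "adolescent", "adult"],
    "bmi": ["under", "normal", "over"],
    "pregnant": ["no", "yes"],
}


def one_hot(value, vocabulary):
    """Return a one-hot vector for value; raises ValueError if unknown."""
    vec = [0.0] * len(vocabulary)
    vec[vocabulary.index(value)] = 1.0
    return vec


def fuse(audio_embedding, demographics):
    """Concatenate the audio embedding with one-hot demographic features."""
    fused = list(audio_embedding)
    for name, vocabulary in CATEGORIES.items():
        fused.extend(one_hot(demographics[name], vocabulary))
    return fused
```

The fused vector would then be passed to the classification head; with the vocabularies assumed here it adds 12 inputs to the embedding dimension.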

4. Quantitative Results, Evaluation Metrics, and Benchmarking

Evaluation adheres to challenge and dataset-specific standards, emphasizing both task and system-level performance.

Noise Suppression (Herath et al., 27 Oct 2025): ΔSNR improvements of +37.01 dB (PCG) and +30.32 dB (ECG) on real hospital recordings. Burst-adaptive NLMS yields +1.9 dB additional ΔSNR under strong noise bursts vs. standard NLMS.
Heart Sound and Fetal Applications (Müller et al., 14 Jul 2025):
- S1 detection: PPV 97.6%, F1 97.4%, MAE 12.2±8.0 ms.
- S2 detection: PPV 91.4%, F1 91.3%, MAE 17.3±12.2 ms.
- Fetal heart rate: mean MSE 0.644 bpm².
Blood Pressure Estimation (Esmaili et al., 2019): For systolic blood pressure (SBP), MAE 7.47 mmHg, STD 11.08 mmHg, correlation r=0.84; for diastolic blood pressure (DBP), MAE 3.56 mmHg, STD 4.53 mmHg, r=0.86. PCG-PPG pulse transit time (PTT) models match or exceed ECG-PPG pulse arrival time (PAT) approaches.
Classification Benchmarks (Despotovic et al., 6 Nov 2025, Ibrahim et al., 21 Oct 2025, Vu et al., 4 Dec 2024, Latif et al., 2018):
- CardioPHON (audio+dem): cost=11,107, Acc=0.625, F1=0.612, AUROC=0.693.
- Tiny-CNN (ECG+PCG): Acc=0.9705, F1=0.9707, AUC=0.9695 (three orders of magnitude smaller and more efficient than prior SOTA).
- IConNet (raw PCG): unweighted accuracy 87.48%, F1 92.05%.
- RNN (PCG): BLSTM and BiGRU reach >97% accuracy/specificity.
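
The blood pressure results above are reported as MAE, STD, and correlation. A minimal sketch of how such figures can be computed from paired reference and estimated values is shown below. It assumes that STD denotes the sample standard deviation of the signed error and that r is the Pearson correlation; the source does not spell out either definition.

```python
import math


def error_metrics(reference, predicted):
    """Return (MAE, STD of signed error, Pearson r) for paired measurements."""
    n = len(reference)
    errors = [p - r for p, r in zip(predicted, reference)]
    mae = sum(abs(e) for e in errors) / n
    mean_err = sum(errors) / n
    # Sample standard deviation (n - 1 denominator) of the signed error.
    std = math.sqrt(sum((e - mean_err) ** 2 for e in errors) / (n - 1))
    mean_ref = sum(reference) / n
    mean_pred = sum(predicted) / n
    cov = sum((r - mean_ref) * (p - mean_pred)
              for r, p in zip(reference, predicted))
    var_ref = sum((r - mean_ref) ** 2 for r in reference)
    var_pred = sum((p - mean_pred) ** 2 for p in predicted)
    pearson_r = cov / math.sqrt(var_ref * var_pred)
    return mae, std, pearson_r
```

For example, `error_metrics([110, 120, 130, 140], [112, 118, 133, 139])` gives an MAE of 2.0 mmHg for those four made-up readings.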

5. Implementation Constraints, Efficiency, and Clinical Integration

Deployment scenarios dictate energy, latency, and algorithm choice.

Embedded and Wearable Efficiency (Ibrahim et al., 21 Oct 2025, Tailor et al., 2020): On-patch inference with CNN-128 on NXP MCX N947 yields 0.092 mJ per decision (NPU), 20.36 ms latency, and 26.5 KB memory. NPU is ~3.9× faster and ~6.9× more energy-efficient than CPU. NPU-based inference can be ~53% more energy-efficient than low-rate continuous BLE streaming. STM32L4-based HSMM segmentation fits in <20 kB RAM with end-to-end latencies (<1 s) suitable for true real-time feedback.
Smartphone Integration (Vu et al., 4 Dec 2024): FP32 model occupies ~493 kB; 8-bit quantization reduces to 125 kB, supporting sub-20 ms inference per 4 s window on mobile CPUs. Standard software stacks (TFLite, PyTorch Mobile, Core ML) can run these models with minimal RAM overhead.
Clinical and Telemedicine Scalability (Herath et al., 27 Oct 2025, Despotovic et al., 6 Nov 2025): BLE links allow wireless transmission of denoised data for real-time visualization, remote review, or cloud AI pipelines, facilitating both point-of-care and telecardiology deployment.
Limitations and Open Challenges: Full embedded ANC realization is still pending in some prototypes, and highly variable noise calls for further dynamic adaptation of filter lengths and control parameters. Blood pressure PTT approaches require user-specific calibration and periodic recalibration; motion artifacts and unconventional sensor placement can degrade accuracy across modalities. Some methods fall short of clinical-grade accuracy standards.
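
The on-patch inference figures quoted earlier in this section allow a quick back-of-envelope check. The sketch below derives the CPU energy and latency implied by the stated NPU numbers and ratios, and estimates daily inference energy for a given decision rate. The decision rate is an assumption, and the estimate covers inference only (no sensing, radio, or idle power), so it should be read as a rough lower bound rather than a battery-life prediction.

```python
# Figures quoted in the text for CNN-128 on the NPU.
NPU_ENERGY_MJ = 0.092    # energy per decision, millijoules
NPU_LATENCY_MS = 20.36   # latency per decision, milliseconds
SPEEDUP = 3.9            # NPU is ~3.9x faster than CPU
ENERGY_RATIO = 6.9       # NPU is ~6.9x more energy-efficient than CPU


def implied_cpu_figures():
    """Return (energy in mJ, latency in ms) implied for CPU-only inference."""
    return NPU_ENERGY_MJ * ENERGY_RATIO, NPU_LATENCY_MS * SPEEDUP


def daily_inference_energy_j(decisions_per_second, energy_mj=NPU_ENERGY_MJ):
    """Inference-only energy per day in joules for an assumed decision rate."""
    seconds_per_day = 86_400
    return energy_mj * decisions_per_second * seconds_per_day / 1000.0
```

Under these numbers the implied CPU cost is roughly 0.63 mJ and 79 ms per decision, and one decision per second (a hypothetical rate) costs about 7.9 J per day on the NPU.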

6. Applications and Clinical Impact

CardioPHON systems target multiple domains:

Point-of-care and Mobile Auscultation: Enabling reliable screening for murmurs, arrhythmias, and abnormal heart sounds in high-noise or resource-limited environments (e.g., pediatric wards, remote clinics).
Continuous Monitoring and Wearable Health: Form factors ranging from chest-straps to sensor patches support multi-hour deployment, with heart rate and rhythm estimation robust to ordinary ambient noise and moderate motion.
Blood Pressure and Fetal Monitoring: Portable, low-cost PCG+PPG/FSR platforms open the door to cuffless BP estimation and widespread fetal heart rate tracking with standardized, transparent evaluation metrics.
Data-Efficient and Transferable ML Models: Self-supervised pretraining, quality assessment, and open-source models lower the data requirements for developing specialized detectors, advancing generalizability to diverse populations and pathologies.

By combining robust hardware integration, noise-robust signal processing, and efficient machine learning, often with open-source toolkits and explicit real-world validation, CardioPHON systems aim at scalable, accessible, and clinically meaningful cardiac monitoring and screening in both traditional and rapidly evolving healthcare contexts.