Enhancing Missing Data Imputation of Non-stationary Signals with Harmonic Decomposition

Published 8 Sep 2023 in eess.SP and cs.MS | (2309.04630v1)

Abstract: Dealing with time series with missing values, including those afflicted by low quality or over-saturation, presents a significant signal processing challenge. The task of recovering these missing values, known as imputation, has led to the development of several algorithms. However, we have observed that the efficacy of these algorithms tends to diminish when the time series exhibit non-stationary oscillatory behavior. In this paper, we introduce a novel algorithm, coined Harmonic Level Interpolation (HaLI), which enhances the performance of existing imputation algorithms for oscillatory time series. After running any chosen imputation algorithm, HaLI leverages the harmonic decomposition based on the adaptive nonharmonic model of the initial imputation to improve the imputation accuracy for oscillatory time series. Experimental assessments conducted on synthetic and real signals consistently highlight that HaLI enhances the performance of existing imputation algorithms. The algorithm is made publicly available as a readily employable Matlab code for other researchers to use.

Summary

The paper introduces HaLI, which leverages harmonic decomposition to improve missing data imputation accuracy in non-stationary oscillatory signals.
It combines an initial imputation, obtained with any existing algorithm, with harmonic extraction and shape-preserving interpolation, significantly reducing mean absolute error on both synthetic and real physiological data.
The method offers scalable, modular enhancements that restore amplitude and phase synchrony, ensuring reliable clinical signal interpretation.

Harmonic Level Interpolation for Non-stationary Signal Imputation

Introduction

Imputation of missing data in non-stationary oscillatory time series is a critical challenge in numerous signal processing applications, especially in biomedicine, where data loss due to sensor disconnections, saturation, or artifacts is pervasive. Traditional imputation approaches typically exhibit degraded performance when the underlying signals are non-stationary and possess rich oscillatory dynamics. The paper "Enhancing Missing Data Imputation of Non-stationary Signals with Harmonic Decomposition" (2309.04630) introduces Harmonic Level Interpolation (HaLI), a novel method that systematically augments the accuracy of extant imputation algorithms for such non-stationary, oscillatory signals.

Non-trivial characteristics such as time-varying amplitude and instantaneous frequency, as well as variable oscillatory wave-shapes, are prevalent in biomedical signals, as showcased in Figure 1. The complexity and structure of missingness (e.g., x-missing, y-missing, and corruption) further complicate the extraction of clinically relevant features. The proposed framework not only addresses classical x-missing types but also leverages harmonic decomposition to exploit the latent regularity in amplitude and phase evolution, leading to more accurate recovery and improved synchrony with physiological phenomena.

Figure 1: Examples of missing data in the biomedical field: (top) PPG with sensor disconnection, (middle) oversaturated airflow signal, (bottom) low-quality airflow segment.

Model of Non-Stationary Oscillatory Signals

The core model underpinning HaLI is the adaptive nonharmonic model (ANHM), which generalizes classical adaptive harmonic models to enable harmonics with time-varying amplitude, phase, and even time-varying oscillatory patterns (i.e., wave-shape functions). Formally, a signal under this model is parameterized as a superposition of intrinsic mode type (IMT) functions, harmonics, a trend, and a random noise process. The flexible formulation accommodates both amplitude and frequency modulation, as well as gradual trends capturing non-oscillatory features relevant to physiological interpretation, such as mean arterial pressure.
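
To make the ingredients of such a model concrete, the following minimal sketch samples a toy signal built from harmonics with slowly varying amplitude and phase, a slow trend, and white Gaussian noise. The function name `synth_anhm`, every numerical constant, and the choice of exact integer multiples of the fundamental phase are illustrative assumptions, not the paper's definition or its test signals.

```python
import math
import random

def synth_anhm(n=2000, fs=100.0, n_harmonics=3, noise_std=0.05, seed=0):
    """Toy ANHM-like signal: harmonics + trend + noise (all constants are assumptions)."""
    rng = random.Random(seed)
    t = [i / fs for i in range(n)]
    noisy, clean = [], []
    for ti in t:
        # Fundamental phase in cycles; instantaneous frequency drifts around 1 Hz.
        phi1 = 1.0 * ti + 0.5 * math.sin(2 * math.pi * 0.05 * ti)
        # Slowly varying amplitude of the fundamental.
        b1 = 1.0 + 0.2 * math.cos(2 * math.pi * 0.03 * ti)
        x = 0.0
        for l in range(1, n_harmonics + 1):
            # Time-varying relative amplitude of harmonic l, so the wave-shape changes.
            a_l = (1.0 / l ** 2) * (1.0 + 0.1 * math.sin(2 * math.pi * 0.02 * l * ti))
            x += b1 * a_l * math.cos(2 * math.pi * l * phi1)
        trend = 0.3 * math.sin(2 * math.pi * 0.01 * ti)  # slow non-oscillatory part
        clean.append(x + trend)
        noisy.append(x + trend + rng.gauss(0.0, noise_std))
    return t, noisy, clean

t, x, clean = synth_anhm()
```

Because the relative amplitude of each harmonic changes over time, the shape of one oscillation at the start of the record differs from the shape at the end, which is the behavior a fixed wave-shape model cannot represent.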

The ANHM facilitates unique harmonic decomposition, contingent on regularity conditions for amplitude and phase, which ultimately allows for principled interpolation across missing intervals. When observed signals are discretized, the imputation task reduces to estimating missing entries guided by the structure elucidated by the ANHM.

Harmonic Level Interpolation (HaLI) Algorithm

The HaLI method is explicitly designed for time series with oscillatory and non-stationary structure. The imputation pipeline consists of three stages:

Initial Imputation: An existing method (e.g., Takens' Lag Map, DMD, EDMD, GPR, ARIMA, TBATS, data-driven TF analysis, LSW) fills the missing intervals, providing a complete surrogate signal for the subsequent time-frequency analysis.
Harmonic Decomposition & Trend Extraction: The imputed signal undergoes time-frequency analysis (STFT/de-shape STFT) to separate its constituent harmonics from the underlying trend. A model selection step estimates the harmonic degree using trigonometric regression criteria. The amplitude and phase of each harmonic are then extracted.
Harmonic-level Interpolation: Within the missing intervals, the amplitude and phase of each harmonic are interpolated with cubic splines or with the shape-preserving pchip scheme, exploiting the smoothness of these functions. The imputed signal is reconstructed by superposing the harmonics with interpolated parameters and adding a similarly interpolated trend.

Figure 2: Schematic of the proposed missing data imputation method showing the initial imputation, harmonic decomposition and trend extraction, and final harmonic-level interpolation steps.
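
The interpolation stage can be sketched as follows for a single harmonic, assuming its amplitude and phase have already been estimated outside the gap. This is a plain-Python approximation of shape-preserving cubic Hermite interpolation with Fritsch-Carlson slopes and simplified end slopes; it is not the paper's Matlab code, and `pchip_interp`, `fill_gap`, the knot spacing, and the test signal are hypothetical choices made for illustration.

```python
import math

def pchip_interp(x, y, xq):
    """Shape-preserving cubic Hermite interpolation (Fritsch-Carlson slopes).
    x must be strictly increasing; query points must lie inside [x[0], x[-1]]."""
    n = len(x)
    h = [x[i + 1] - x[i] for i in range(n - 1)]
    delta = [(y[i + 1] - y[i]) / h[i] for i in range(n - 1)]
    d = [0.0] * n
    d[0], d[-1] = delta[0], delta[-1]  # simplified one-sided end slopes (assumption)
    for i in range(1, n - 1):
        if delta[i - 1] * delta[i] > 0:  # same sign: weighted harmonic mean, else 0
            w1, w2 = 2 * h[i] + h[i - 1], h[i] + 2 * h[i - 1]
            d[i] = (w1 + w2) / (w1 / delta[i - 1] + w2 / delta[i])
    out = []
    for q in xq:
        i = 0
        while i < n - 2 and q > x[i + 1]:  # locate the interval containing q
            i += 1
        s = (q - x[i]) / h[i]
        out.append((1 + 2 * s) * (1 - s) ** 2 * y[i]
                   + s * (1 - s) ** 2 * h[i] * d[i]
                   + s * s * (3 - 2 * s) * y[i + 1]
                   + s * s * (s - 1) * h[i] * d[i + 1])
    return out

def fill_gap(t, values, gap_start, gap_end, n_knots=5, step=10):
    """Replace values[gap_start:gap_end] using knots taken on both sides of the gap."""
    left = [gap_start - 1 - k * step for k in range(n_knots)][::-1]
    right = [gap_end + k * step for k in range(n_knots)]
    knots = [i for i in left + right if 0 <= i < len(t)]
    filled = pchip_interp([t[i] for i in knots], [values[i] for i in knots],
                          t[gap_start:gap_end])
    return values[:gap_start] + filled + values[gap_end:]

# Toy harmonic with smooth amplitude and monotone phase (in cycles).
fs = 100.0
t = [i / fs for i in range(1000)]
amp = [1.0 + 0.2 * math.sin(2 * math.pi * 0.1 * ti) for ti in t]
phase = [ti + 0.05 * math.sin(2 * math.pi * 0.1 * ti) for ti in t]
gap_start, gap_end = 400, 500

amp_hat = fill_gap(t, amp, gap_start, gap_end)
phase_hat = fill_gap(t, phase, gap_start, gap_end)
# Rebuild the harmonic from the interpolated amplitude and phase.
harmonic_hat = [a * math.cos(2 * math.pi * p) for a, p in zip(amp_hat, phase_hat)]
```

Interpolating the smooth amplitude and phase, rather than the oscillating waveform itself, is what lets a one-second gap be bridged with a handful of knots. The shape-preserving slopes also keep the interpolated phase monotone, so the reconstructed harmonic never runs backwards in time.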

This pipeline addresses boundary artifacts in TF analysis via initial imputation and focuses interpolation on the most regular and semantically meaningful components of the signal—the amplitude and phase trajectories of the extracted harmonics.
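
As a rough illustration of what extracting a harmonic's amplitude and phase involves, the sketch below uses complex demodulation with a moving average. This is a deliberately simpler stand-in for the STFT-based decomposition described above, not the paper's procedure. It assumes the fundamental phase `phi1` (in cycles) is already known, whereas in practice it would have to be estimated, and the name `demodulate_harmonic` and the window length are hypothetical.

```python
import cmath
import math

def demodulate_harmonic(x, phi1, l, win):
    """Estimate amplitude and phase offset (in cycles) of harmonic l.
    x: samples, phi1: fundamental phase in cycles, win: moving-average length."""
    # Shift harmonic l down to zero frequency.
    z = [2.0 * xi * cmath.exp(-2j * math.pi * l * p) for xi, p in zip(x, phi1)]
    half = win // 2
    amp, offset = [], []
    for i in range(len(z)):
        lo, hi = max(0, i - half), min(len(z), i - half + win)
        mean = sum(z[lo:hi]) / (hi - lo)  # low-pass: average out the other components
        amp.append(abs(mean))
        offset.append(cmath.phase(mean) / (2 * math.pi))
    return amp, offset

# Toy check: 1 Hz fundamental plus a second harmonic, sampled at 100 Hz.
fs = 100.0
t = [i / fs for i in range(400)]
x = [math.cos(2 * math.pi * ti) + 0.5 * math.cos(2 * math.pi * 2 * ti) for ti in t]
amp1, _ = demodulate_harmonic(x, t, l=1, win=100)
amp2, _ = demodulate_harmonic(x, t, l=2, win=100)
```

Near the ends of the record the averaging window is truncated and the estimates degrade. The same kind of boundary effect appears at the edges of a gap in TF analysis, which is why the pipeline fills the gap with an initial imputation before decomposing.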

Numerical Results

Synthetic Signals

Extensive validation on synthetic signals generated under the ANHM demonstrates that HaLI achieves statistically significant reductions in mean absolute error (MAE) relative to state-of-the-art imputation algorithms, even at high missing-data rates (up to 20%) and under varying noise levels. In all cases, the shape-preserving pchip interpolation further improves over cubic splines, minimizing overshoots and artifacts in the reconstructed signal.
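
For reference, a minimal sketch of the error metric is given below. It evaluates the MAE only on the samples flagged as missing, which is an assumption about the evaluation protocol rather than a detail confirmed here; the function name `mae_on_gaps` is hypothetical.

```python
def mae_on_gaps(truth, imputed, missing):
    """Mean absolute error restricted to samples where missing[i] is True."""
    idx = [i for i, m in enumerate(missing) if m]
    if not idx:
        raise ValueError("no missing samples to evaluate")
    return sum(abs(truth[i] - imputed[i]) for i in idx) / len(idx)

truth = [0.0, 1.0, 2.0, 3.0]
imputed = [0.0, 1.5, 2.5, 3.0]
missing = [False, True, True, False]
score = mae_on_gaps(truth, imputed, missing)
```

Restricting the average to the missing samples keeps the score from being diluted by the observed samples, where every imputer reproduces the data exactly.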

Figure 3: Top: Synthetic noisy signal with 20% missing data and its STFT; Bottom: Imputed signal using TLM and resultant STFT—note the reduction in TF artifacts.

Figure 4: MAE distribution (boxplots) for initial imputation, and HaLI with spline and pchip interpolation across varying missing rates and noise levels; statistically significant improvements for HaLI over best initial imputation.

Ablation and comparative analyses indicate that Takens’ Lag Map (TLM) is the most effective initial imputation scheme for x-missing intervals, attributed to its strong template-matching mechanism leveraging delay-embedding theory. Dynamic model-based methods (GPR, DMD, EDMD) and ARIMA variants exhibit competitive but suboptimal performance, especially as missing-interval lengths increase.
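
The template-matching idea behind TLM can be illustrated with a heavily simplified, one-sided sketch. It takes the `m` samples preceding the gap as a delay vector, finds the fully observed window that matches it best in the least-squares sense, and copies the samples that followed that window. The actual TLM algorithm is more elaborate; the function `template_impute`, the single-gap restriction, and the parameter `m` are assumptions made for this example.

```python
import math

def template_impute(x, gap_start, gap_end, m=50):
    """Fill x[gap_start:gap_end] by copying what followed the best-matching template."""
    L = gap_end - gap_start
    if gap_start < m:
        raise ValueError("need at least m observed samples before the gap")
    template = x[gap_start - m:gap_start]  # delay vector just before the gap
    best_j, best_cost = None, float("inf")
    for j in range(len(x) - m - L + 1):
        # Skip candidate windows that overlap the gap itself.
        if j < gap_end and j + m + L > gap_start:
            continue
        cost = sum((x[j + k] - template[k]) ** 2 for k in range(m))
        if cost < best_cost:
            best_j, best_cost = j, cost
    if best_j is None:
        raise ValueError("no fully observed candidate window")
    filled = list(x)
    filled[gap_start:gap_end] = x[best_j + m:best_j + m + L]
    return filled

# Toy periodic signal (period of 100 samples) with one gap marked by NaN.
clean = [math.sin(2 * math.pi * i / 100) for i in range(1000)]
x = list(clean)
for i in range(500, 560):
    x[i] = float("nan")
filled = template_impute(x, 500, 560, m=50)
```

On a strictly periodic signal the copied segment is essentially exact. On non-stationary signals the copied template can differ in amplitude and phase from the true continuation, which is the mismatch the harmonic-level interpolation is meant to correct.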

Figure 5: Histogram of best-performing initial imputation method as a function of missingness rate for synthetic signals—TLM is clearly dominant at lower missing rates.

Real-world Physiological Signals

The method is further benchmarked on real physiological data (PPG, ABP, airflow, nasal pressure, thorax impedance, and accelerometry) from multiple open-access databases. Across nearly all cases, HaLI yields consistent improvement in normalized mean absolute error (NMAE) compared to initial imputation baselines.

Figure 6: NMAE for best initial imputation and HaLI variants (spline/pchip) across various physiological signals and missing data rates—robust improvement across modalities.

Visualizations illustrate that HaLI provides refined amplitude and phase matching, preventing common pathologies of naive imputation such as amplitude overshoot and phase mismatch—particularly crucial for clinical interpretability.

Figure 7: Comparison between initial and HaLI-imputed segments for real physiological signals—HaLI restores amplitude and phase synchrony inside missing intervals.

In most real signals, TLM outperforms competing methods for the initial imputation step, corroborating the observations on synthetic signals. The computational efficiency and scalability analysis further underscores the feasibility of the approach for large-scale physiological monitoring.

Figure 8: Frequency of initial imputation algorithm selected as best for each physiological signal shows consistent dominance of TLM.

Practical and Theoretical Implications

HaLI augments existing imputation pipelines, operating as a post-processing step that can follow template-matching, dynamical system identification, or model-based imputation. Working at the harmonic level provides considerable regularity, which is essential for propagating information through missing intervals in signals with both non-stationarity and time-varying morphology. This is particularly pertinent for clinical time series (e.g., cardiac and respiratory signals), where errors in amplitude or phase estimation can hinder downstream diagnosis or data fusion tasks.

The method's modularity permits future extension to multivariate and spatial domains. The current framework assumes univariate signals; however, the authors conjecture that joint imputation of multivariate signals and generalization to spatial and higher-dimensional settings are feasible by leveraging coupled harmonic models or manifold-based decompositions. Moreover, the flexible ANHM provides a bridge to nonparametric and Bayesian methodologies, suggesting opportunities for incorporating uncertainty quantification and finer-grained prior information in biomedical and physical sciences.

Conclusion

HaLI delivers a principled, robust method for missing data imputation in non-stationary, oscillatory time series, improving on the classical and model-based approaches it post-processes, particularly when combined with template-based initial imputation strategies like TLM. By harmonically decomposing the imputed signal and interpolating the smooth amplitude and phase evolution, HaLI yields high-fidelity reconstructions that remain consistent with the underlying physiology. The approach is made available to the community as open-source Matlab code, facilitating ready adoption and further research.

Future work should address extensions to mode-mixed signals, spiky oscillatory patterns, multivariate domains, and integration with uncertainty estimation methods for robust deployment in diverse, real-world data environments.
