Financial Time Series Analysis and Forecasting with HHT Feature Generation and Machine Learning
(2105.10871v1)
Published 23 May 2021 in q-fin.CP and q-fin.ST
Abstract: We present the method of complementary ensemble empirical mode decomposition (CEEMD) and Hilbert-Huang transform (HHT) for analyzing nonstationary financial time series. This noise-assisted approach decomposes any time series into a number of intrinsic mode functions, along with the corresponding instantaneous amplitudes and instantaneous frequencies. Different combinations of modes allow us to reconstruct the time series using components of different timescales. We then apply Hilbert spectral analysis to define and compute the associated instantaneous energy-frequency spectrum to illustrate the properties of various timescales embedded in the original time series. Using HHT, we generate a collection of new features and integrate them into machine learning models, such as regression tree ensemble, support vector machine (SVM), and long short-term memory (LSTM) neural network. Using empirical financial data, we compare several HHT-enhanced machine learning models in terms of forecasting performance.
Summary
The paper introduces a pipeline that uses CEEMD and HHT to generate adaptive, multiscale features which reduce forecasting errors relative to models trained on the raw series alone.
The paper demonstrates that integrating HHT features into ML models like LSTM enhances forecasting accuracy compared to traditional methods.
The paper addresses practical challenges such as the end effect and information leakage, providing robust solutions validated across multiple financial instruments.
HHT Feature Generation and Machine Learning for Financial Time Series Forecasting
Introduction and Motivation
This paper addresses the challenge of modeling and forecasting nonstationary, nonlinear financial time series by leveraging the Hilbert-Huang Transform (HHT) and its noise-assisted variant, Complementary Ensemble Empirical Mode Decomposition (CEEMD). Traditional spectral methods such as the Fourier and wavelet transforms rely on fixed, a priori bases and linearity assumptions (and, in the Fourier case, stationarity), and therefore lack adaptivity to the complex, multiscale, and noisy nature of financial data. The HHT framework, particularly when combined with CEEMD, provides a fully adaptive, data-driven decomposition that is robust to noise and capable of extracting interpretable features across multiple timescales.
The authors propose a pipeline in which CEEMD is used to decompose financial time series into intrinsic mode functions (IMFs), from which instantaneous amplitude and frequency features are extracted via the Hilbert transform. These HHT-derived features are then integrated into ML models—including regression tree ensembles (RTE), support vector machines (SVM), and long short-term memory (LSTM) networks—for forecasting tasks. The paper systematically evaluates the predictive utility of these features and addresses practical issues such as the end effect in EMD-based decompositions and information leakage in real-time forecasting.
CEEMD and HHT: Methodological Framework
CEEMD for Multiscale Decomposition
CEEMD extends empirical mode decomposition (EMD) by adding pairs of complementary (positive and negative) white noise realizations to the input signal, which mitigates mode mixing and improves robustness to the noise already present in the data. For a given time series $x(t)$, CEEMD produces a set of IMFs $c_j(t)$ and a residual $r_n(t)$ such that:
$$x(t) = \sum_{j=1}^{n} c_j(t) + r_n(t)$$
Each IMF captures oscillatory behavior at a specific timescale, with higher-order IMFs representing lower-frequency, longer-term trends.
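A minimal sketch of this decomposition step, assuming the open-source PyEMD package (`pip install EMD-signal`); its CEEMDAN routine is used here as a readily available noise-assisted stand-in for the paper's complementary-noise CEEMD:

```python
# Noise-assisted decomposition of a toy price series into IMFs plus a residual.
import numpy as np
from PyEMD import CEEMDAN

rng = np.random.default_rng(0)
x = np.cumsum(rng.normal(size=1000))      # toy "price" series (random walk)

decomposer = CEEMDAN(trials=100)          # number of noise realizations
imfs = decomposer(x)                      # array of IMFs, shape (n_imfs, len(x))
residue = x - imfs.sum(axis=0)            # whatever is left over is the slow trend

# Row imfs[j] oscillates on a progressively longer timescale as j increases,
# so summing all rows plus the residue reconstructs x(t) by construction.
print(imfs.shape, residue.shape)
```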
Hilbert Spectral Analysis
Applying the Hilbert transform $\mathcal{H}$ to each IMF yields the analytic signal $z_j(t) = c_j(t) + i\,\mathcal{H}[c_j](t) = a_j(t)\,e^{i\theta_j(t)}$, from which the instantaneous amplitude and instantaneous frequency are computed as
$$a_j(t) = \sqrt{c_j(t)^2 + \mathcal{H}[c_j](t)^2}, \qquad f_j(t) = \frac{1}{2\pi}\frac{d\theta_j(t)}{dt}.$$
This enables the construction of a time-frequency-energy representation (the Hilbert spectrum), providing a sparse, adaptive alternative to traditional spectral methods.
Figure 2: The collection of HHT features derived from the original time series x(t).
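These per-IMF features can be computed directly with SciPy's Hilbert transform. The sketch below assumes the `imfs` array from the decomposition step above and a sampling interval `dt` (1.0 for daily data):

```python
# Instantaneous amplitude and frequency of each IMF via the Hilbert transform.
import numpy as np
from scipy.signal import hilbert

def hht_features(imfs, dt=1.0):
    analytic = hilbert(imfs, axis=1)                  # c_j(t) + i * H[c_j](t)
    amplitude = np.abs(analytic)                      # a_j(t)
    phase = np.unwrap(np.angle(analytic), axis=1)     # theta_j(t)
    # f_j(t) = (1 / 2pi) * dtheta_j/dt, approximated by a finite difference
    frequency = np.gradient(phase, dt, axis=1) / (2.0 * np.pi)
    return analytic.real, analytic.imag, amplitude, frequency
```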
Feature Engineering and Filtering
The HHT framework enables the generation of a rich set of features: real and imaginary parts of IMFs, instantaneous amplitude, and frequency for each mode. These features can be selectively combined to construct low-pass or high-pass filtered versions of the original series, facilitating both denoising and the isolation of relevant timescale dynamics.
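A minimal sketch of such timescale filtering, assuming the `imfs` array from the decomposition above and an illustrative cutoff index `k`:

```python
# Adaptive high-pass / low-pass filtering via partial sums of IMFs:
# low-order IMFs carry the fastest oscillations, high-order IMFs the slow trend.
import numpy as np

def highpass(imfs, k):
    """Keep the k fastest modes (short-term, high-frequency component)."""
    return imfs[:k].sum(axis=0)

def lowpass(imfs, k, residue=None):
    """Drop the k fastest modes, keeping the smooth, denoised trend."""
    trend = imfs[k:].sum(axis=0)
    return trend if residue is None else trend + residue
```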
Machine Learning Integration
Training and Testing Protocols
The authors emphasize the importance of avoiding information leakage, particularly given the end effect in EMD-based methods, where decomposition near the series boundaries is less reliable. To address this, they propose a rolling-window, one-shot extrapolation protocol: at each prediction step, CEEMD and HHT are applied only to past data, and ML models are trained and evaluated in a strictly forward-looking manner.
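A sketch of this protocol follows; `decompose_and_featurize`, `fit`, and `predict_one_step` are hypothetical placeholders standing in for the CEEMD/HHT and ML components described above:

```python
# Rolling-window, one-shot extrapolation: at each step the decomposition and the
# model see only past data, so no future information leaks into the features.
import numpy as np

def rolling_forecast(x, window, fit, predict_one_step, decompose_and_featurize):
    predictions, targets = [], []
    for t in range(window, len(x) - 1):
        past = x[t - window:t + 1]                  # data available up to time t
        features = decompose_and_featurize(past)    # CEEMD + HHT on past data only
        model = fit(features[:-1], past[1:])        # supervised pairs within the window
        predictions.append(predict_one_step(model, features[-1]))
        targets.append(x[t + 1])                    # realized next value
    return np.array(predictions), np.array(targets)
```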
ML Models and Feature Sets
Three classes of ML models are considered:
Regression Tree Ensemble (RTE): Gradient-boosted decision trees for nonlinear regression.
Support Vector Machine (SVM): Kernel-based regression (a minimal baseline sketch for the RTE and SVM models follows this list).
LSTM Neural Network: Deep recurrent architecture for sequence modeling, with a custom structure incorporating HHT features.
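A minimal baseline sketch for the first two model classes, assuming scikit-learn and toy stand-in data in place of the HHT feature matrices:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.svm import SVR

# Toy stand-ins: rows are time steps, columns are flattened HHT features across IMFs.
rng = np.random.default_rng(0)
X_train, y_train = rng.normal(size=(500, 24)), rng.normal(size=500)
X_test = rng.normal(size=(100, 24))

rte = GradientBoostingRegressor(n_estimators=300, max_depth=3)  # regression tree ensemble
svm = SVR(kernel="rbf", C=1.0, epsilon=0.01)                    # kernel-based regression

rte_pred = rte.fit(X_train, y_train).predict(X_test)
svm_pred = svm.fit(X_train, y_train).predict(X_test)
```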
Figure 4: The structure of LSTM with HHT features.
The LSTM architecture consists of two stacked LSTM layers (100–200 units each), dropout regularization, and two fully connected layers with ReLU activations. The input at each time step is the concatenation of HHT features across all IMFs.
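A sketch of this architecture in Keras (the framework and the 64/32 dense-layer widths are assumptions; the paper specifies only the layer types and the 100–200 unit range for the LSTM layers):

```python
# Two stacked LSTM layers with dropout, two ReLU dense layers, and a linear
# one-step-ahead output. Each time step's input is the concatenation of HHT
# features across all IMFs (n_features), over a lookback window of n_steps.
import tensorflow as tf

def build_hht_lstm(n_steps, n_features, units=128, dropout=0.2):
    return tf.keras.Sequential([
        tf.keras.layers.Input(shape=(n_steps, n_features)),
        tf.keras.layers.LSTM(units, return_sequences=True),
        tf.keras.layers.Dropout(dropout),
        tf.keras.layers.LSTM(units),
        tf.keras.layers.Dropout(dropout),
        tf.keras.layers.Dense(64, activation="relu"),
        tf.keras.layers.Dense(32, activation="relu"),
        tf.keras.layers.Dense(1),                    # one-step-ahead forecast
    ])

model = build_hht_lstm(n_steps=30, n_features=24)
model.compile(optimizer="adam", loss="mse")
```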
Feature Selection and Empirical Results
The paper systematically evaluates the predictive performance of different feature subsets (assembled as in the sketch following this list):
Original time series only
IMFs only
Complex IMFs (real + imaginary parts)
Full HHT features (IMFs, Hilbert transforms, amplitude, frequency)
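A sketch of how these four subsets can be assembled from the HHT outputs computed earlier, where `imfs`, `hilbert` (the imaginary parts of the analytic signals), `amplitude`, and `frequency` each have shape `(n_imfs, T)`:

```python
import numpy as np

def feature_sets(x, imfs, hilbert, amplitude, frequency):
    # Rows of each matrix are time steps; columns are features across IMFs.
    return {
        "original":     x[:, None],                                    # raw series only
        "imfs":         imfs.T,                                        # real parts
        "complex_imfs": np.hstack([imfs.T, hilbert.T]),                # real + imaginary
        "full_hht":     np.hstack([imfs.T, hilbert.T, amplitude.T, frequency.T]),
    }
```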
Across multiple financial instruments (S&P 500, VIX, GLD, TNX), the following empirical findings are reported:
HHT features consistently reduce mean squared error (MSE) compared to using the original time series alone.
The combination of real and imaginary parts of IMFs (complex IMFs) often achieves similar or superior performance to the full HHT feature set, despite lower dimensionality.
High-frequency IMFs are more informative for short-term prediction, while low-frequency IMFs contribute less and may introduce redundancy.
LSTM models benefit most from HHT features, especially under the extrapolation protocol, due to their ability to adapt to nonstationary input distributions and incorporate the end effect correction factor.
End Effect and Extrapolation
The end effect, arising from boundary interpolation in EMD/CEEMD, leads to increased decomposition error near the series endpoints. The authors introduce an end effect factor λ(t) to quantify the position within the time window and incorporate it as an additional input feature for ML models. This correction is shown to improve out-of-sample forecasting accuracy, particularly in the rolling extrapolation setting.
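The paper's exact definition of λ(t) is not reproduced here; the following is a loudly hypothetical sketch of one simple normalized-position encoding in the same spirit, appended as an extra feature column:

```python
# Hypothetical stand-in for the end effect factor lambda(t): it rises from 0 at
# the start of the rolling window to 1 at the most recent (least reliable)
# boundary point, letting the model down-weight decomposition errors near the end.
import numpy as np

def end_effect_factor(window_length):
    return np.linspace(0.0, 1.0, window_length)

# features = np.column_stack([features, end_effect_factor(len(features))])
```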
Comparative Analysis and Theoretical Implications
The HHT/CEEMD approach is contrasted with Fourier and wavelet transforms:
| Method  | Nonstationarity | Nonlinearity | Basis    | Spectrum        |
|---------|-----------------|--------------|----------|-----------------|
| Fourier | No              | No           | Fixed    | Global, Dense   |
| Wavelet | Yes             | No           | Fixed    | Regional, Dense |
| HHT     | Yes             | Yes          | Adaptive | Local, Sparse   |
HHT's adaptivity and sparsity are particularly advantageous for financial time series, which are characterized by regime shifts, volatility clustering, and multiscale structure.
Practical Considerations and Limitations
Computational Cost: CEEMD is more computationally intensive than standard EMD, but the use of complementary noise pairs reduces the required ensemble size.
Feature Explosion: The number of HHT features scales with the number of IMFs; feature selection or regularization is necessary to avoid overfitting.
End Effect: While the proposed correction mitigates boundary errors, decomposition quality still degrades near the series endpoints, limiting the horizon for reliable extrapolation.
Model Selection: LSTM models are best suited for nonstationary, evolving distributions, but require careful tuning and are more resource-intensive than tree-based models.
Implications and Future Directions
The integration of HHT-derived features with modern ML models provides a principled framework for capturing the multiscale, nonlinear, and nonstationary dynamics of financial time series. The empirical results demonstrate that these features enhance predictive accuracy and model efficiency, particularly in deep learning architectures. The approach is generalizable to other domains characterized by similar data properties (e.g., geophysics, engineering).
Future research directions include:
Automated feature selection or dimensionality reduction for HHT features
Online or streaming implementations of CEEMD-HHT for real-time forecasting
Extension to multivariate and cross-asset modeling
Theoretical analysis of the statistical properties of HHT features in ML contexts
Conclusion
This work establishes a comprehensive methodology for financial time series analysis and forecasting by combining CEEMD-based HHT feature generation with advanced machine learning models. The adaptive, multiscale decomposition yields interpretable and predictive features that outperform traditional approaches, especially when integrated with LSTM networks and equipped with end effect correction. The findings have both practical and theoretical significance for time series modeling in finance and related fields.