FS-LSTM Ensemble: Robust Forecasting

Updated 26 May 2026

The paper demonstrates how ensemble LSTM architectures combine parameter perturbation, evolutionary feature selection, and stacking to reduce forecasting error.
FS-LSTM (Ensemble) is a framework that integrates multiple LSTM strategies for sequential prediction, providing robust generalization and explicit feature interpretability.
Empirical results show state-of-the-art performance in small-sample regression, air quality, and financial forecasting, outperforming traditional LSTM models.

The FS-LSTM (Ensemble) framework encompasses a range of ensemble learning strategies built around Long Short-Term Memory (LSTM) neural networks, designed for robust forecasting and representation learning in sequential data. Ensemble LSTM methodologies combine multiple LSTMs—either through parameter perturbation, stacking with other models, or multi-objective optimization—to improve generalization, reduce overfitting, and extract interpretable feature importances. These architectures have demonstrated state-of-the-art performance in domains such as small-sample sequence regression, time series forecasting with embedded feature selection, and financial prediction via hybrid stacking (Chen et al., 2020, Espinosa et al., 2023, Liu et al., 2024).

1. Core Architectural Paradigms

1.1 Ensemble Long Short-Term Memory (EnLSTM)

EnLSTM (Chen et al., 2020) fuses two principles: an Ensemble Neural Network (ENN) framework employing ensemble-randomized maximum likelihood (EnRML) updates without gradients, and a cascaded LSTM (C-LSTM) backbone for sequential prediction. The model creates an ensemble of $N_e$ perturbed parameter sets via model-parameter smoothing and runs parallel feedforward passes. Training involves a covariance-based update across the ensemble, synchronously adjusting entire LSTM parameter sets according to cross-covariances between model weights and predictions.

1.2 Multi-Objective Evolutionary Embedded Feature Selection

EFS-LSTM-MOEA (Espinosa et al., 2023) introduces a hybrid evolutionary approach that directly embeds binary feature masks within each LSTM’s parameter set. A mixed binary–real vector $(s, w)$ encodes both feature selection and LSTM weights. A multi-objective optimization (typically via NSGA-II) targets RMSE minimization on time-partitioned folds, generating a Pareto set of diverse, sparsified LSTMs. Final predictions use stacking-based ensemble aggregation (e.g., Random Forest meta-regressor), and feature importance is assessed via feature selection frequency across non-dominated models.

1.3 Stacking LSTM Ensembles with Heterogeneous Base Models

The LSTM+ANN ensemble (Liu et al., 2024) constructs parallel LSTM and fully connected ANN subnetworks, both operating on normalized, windowed financial time series data (OHLC, 30-day window). A meta-learner (linear regression) then integrates base network outputs. Notably, this approach does not apply explicit algorithmic feature selection; rather, all candidate features are used throughout, and ensemble gain arises purely from the diversity of underlying function approximators (recurrent vs. static feedforward architectures).

2. Mathematical Formulation of Ensemble and Optimization Strategies

2.1 EnRML Update Rule (ENN in EnLSTM)

Given ensemble members $\{m_j\}$ and their outputs $\{g_j\}$, the EnRML update for parameters $m_j$ at iteration $\ell$ is:

$m_j^{(\ell+1)} = m_j^{(\ell)} - C_{mg}^{(\ell)}\left(C_g^{(\ell)} + \sigma_d^2 I\right)^{-1}\left(g_j^{(\ell)} - d_{\text{obs},j}\right) - \lambda\left[m_j^{(\ell)} - \overline{m}^{(\ell)}\right]$

where $C_{mg}$ is the cross-covariance between parameters and outputs, $C_g$ is the output covariance, $\sigma_d^2 I$ is the observation-noise covariance, and $\lambda$ is a damping parameter. Model-parameter perturbation (kernel smoothing) follows each update to prevent ensemble collapse.
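The update rule above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `enrml_step`, the column-wise storage of ensemble members, and the toy linear forward model standing in for the LSTM are all hypothetical, and the kernel-smoothing perturbation step is omitted.

```python
import numpy as np

def enrml_step(M, G, D, sigma_d, lam):
    """One update following the EnRML formula in the text.

    M: (n_params, N_e) ensemble of parameter vectors m_j, one per column.
    G: (n_out, N_e) model outputs g_j for each member.
    D: (n_out, N_e) perturbed observations d_obs,j for each member.
    """
    n_e = M.shape[1]
    dM = M - M.mean(axis=1, keepdims=True)        # m_j minus ensemble mean
    dG = G - G.mean(axis=1, keepdims=True)
    C_mg = dM @ dG.T / (n_e - 1)                  # cross-covariance (n_params, n_out)
    C_g = dG @ dG.T / (n_e - 1)                   # output covariance (n_out, n_out)
    A = C_g + sigma_d**2 * np.eye(G.shape[0])
    innovation = np.linalg.solve(A, G - D)        # (C_g + sigma^2 I)^{-1} (g_j - d_obs,j)
    return M - C_mg @ innovation - lam * dM

# Toy usage: a linear map H stands in for the network (an assumption for illustration).
rng = np.random.default_rng(0)
H = rng.normal(size=(2, 3))
m_true = np.array([1.0, -2.0, 0.5])
d_obs = H @ m_true
sigma_d = 0.1
M = rng.normal(size=(3, 50))                      # N_e = 50 initial parameter draws
D = d_obs[:, None] + sigma_d * rng.normal(size=(2, 50))

misfit_before = np.mean((H @ M - d_obs[:, None]) ** 2)
for _ in range(5):
    M = enrml_step(M, H @ M, D, sigma_d, lam=0.1)
misfit_after = np.mean((H @ M - d_obs[:, None]) ** 2)
```

No gradient of the forward model is used anywhere: the parameter update is driven only by ensemble statistics, which is what allows the same step to be wrapped around a recurrent network.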

2.2 Embedded Feature Selection

Each individual in EFS-LSTM-MOEA is characterized by a pair $(s, w)$, with $s$ a binary feature mask and $w$ the flattened LSTM parameter vector. The multi-objective loss over $K$ time partitions $D_1, \dots, D_K$ is:

$\min_{(s, w)} \; \left(\mathrm{RMSE}_{D_1}(s, w), \dots, \mathrm{RMSE}_{D_K}(s, w)\right)$

Offspring are generated via crossover and mutation of both mask and weights, and NSGA-II is applied to maintain Pareto-optimal diversity.
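A rough sketch of how a mixed binary-real individual might be represented and varied is given below. The specific operators (uniform crossover, bit-flip mutation on the mask, Gaussian mutation on the weights), the masking by element-wise multiplication, and all function names and rates are assumptions made for illustration; the source only states that crossover and mutation act on both mask and weights.

```python
import random

def apply_mask(window, s):
    """Zero out unselected features. window: list of time steps, each a list of features."""
    return [[x * bit for x, bit in zip(step, s)] for step in window]

def crossover(parent_a, parent_b, rng):
    """Uniform crossover applied separately to the mask s and the weights w (assumed operator)."""
    (sa, wa), (sb, wb) = parent_a, parent_b
    s = [a if rng.random() < 0.5 else b for a, b in zip(sa, sb)]
    w = [a if rng.random() < 0.5 else b for a, b in zip(wa, wb)]
    return s, w

def mutate(individual, rng, p_flip=0.1, sigma=0.05):
    """Bit-flip mutation on the mask, Gaussian perturbation on the weights (assumed operators)."""
    s, w = individual
    s = [1 - bit if rng.random() < p_flip else bit for bit in s]
    w = [x + rng.gauss(0.0, sigma) for x in w]
    return s, w

rng = random.Random(0)
parent_a = ([1, 0, 1, 1], [0.2, -0.4, 0.1])   # (mask, flat weights), toy sizes
parent_b = ([0, 0, 1, 0], [0.5, 0.3, -0.7])
child = mutate(crossover(parent_a, parent_b, rng), rng)

# A mask of [1, 0, 1, 0] keeps features 0 and 2 and suppresses the others.
masked = apply_mask([[1.0, 2.0, 3.0, 4.0], [5.0, 6.0, 7.0, 8.0]], [1, 0, 1, 0])
```

Each child would then be evaluated on every time partition and passed to the non-dominated sorting step of NSGA-II, which is not shown here.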

2.3 Stacking Integration

For $M$ base models $f_1, \dots, f_M$, meta-level features $z = (f_1(x), \dots, f_M(x))$ are fed into a regressor such as

$\hat{y} = g(z) = g\left(f_1(x), \dots, f_M(x)\right)$

where $g$ is typically fitted via ridge regression or, as in the air quality application, a Random Forest.
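The stacking step can be illustrated with a linear meta-learner fitted by ordinary least squares. The sketch below uses synthetic arrays as stand-ins for base-model predictions (they are not outputs of real LSTM or ANN models), and the helper names `fit_linear_meta` and `predict_meta` are hypothetical.

```python
import numpy as np

def fit_linear_meta(Z, y):
    """Fit y ~ b0 + Z @ b by least squares; Z has one column per base model."""
    X = np.column_stack([np.ones(len(Z)), Z])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

def predict_meta(coef, Z):
    """Apply the fitted intercept and per-model weights to base predictions."""
    return coef[0] + Z @ coef[1:]

rng = np.random.default_rng(0)
y = rng.normal(size=200)
# Synthetic stand-ins for two base models' predictions: target plus independent noise.
z_a = y + 0.5 * rng.normal(size=200)
z_b = y + 0.5 * rng.normal(size=200)
Z = np.column_stack([z_a, z_b])

coef = fit_linear_meta(Z, y)
mse_stack = np.mean((predict_meta(coef, Z) - y) ** 2)
mse_best_base = min(np.mean((z_a - y) ** 2), np.mean((z_b - y) ** 2))
```

On the data used for fitting, the least-squares combination can never be worse than either base model, because copying a single base model is one of the candidate solutions. Held-out gains are not guaranteed in the same way, which is why meta-learners are usually fitted on out-of-fold base predictions.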

3. Training and Regularization Schemes

EnLSTM Pipeline (Chen et al., 2020)

Data Preparation: Windowed sequence construction via sliding window, leave-one-out cross-validation.
Ensemble Initialization: Draw $N_e$ parameter vectors from a fitted Gaussian over initial weights.
Feedforward and Update: Batch-wise feedforward for each ensemble member, target perturbation for observation realism, covariance computation, EnRML parameter update, kernel smoothing.
Convergence Criteria: Early stopping upon stagnation of held-out fold MSE.
Regularization: Dropout (rate 0.3), batch normalization, and perturbation to both model parameters and observations.

EFS-LSTM-MOEA Pipeline (Espinosa et al., 2023)

Population: Fixed population size, number of generations, and crossover/mutation rates for the NSGA-II search.
Partitioning: 5 contiguous, nonoverlapping time blocks as objectives.
Base Architecture: Single-layer LSTM; evolutionary search directly tunes all weights, biases, and binary mask.
Meta-Learning: Train regression ensemble (Random Forest) on outputs of all final Pareto-optimal models.

LSTM+ANN Stacking (Liu et al., 2024)

Architecture: Two-layer LSTM (100 hidden units each, dropout after first layer) and ANN (100 and 50 hidden units, ReLU activations).
Input/Target: 30-day, 4-feature window for both models; the target is the next-day close price.
Training: MSE loss, optimizer not specified (Adam is a common default), data normalization and imputation.
Ensemble: OLS linear regression meta-learner on outputs from base learners.
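The 30-day, 4-feature windowing with a next-day close target can be sketched as follows. The column order (open, high, low, close) and the function name `make_windows` are assumptions; normalization and imputation are left out for brevity.

```python
def make_windows(rows, window=30, target_col=3):
    """Build (window, target) pairs from daily rows ordered oldest first.

    rows: list of [open, high, low, close] per day (assumed column order).
    Returns X (list of `window` consecutive rows) and y (close on the following day).
    """
    X, y = [], []
    for t in range(len(rows) - window):
        X.append(rows[t:t + window])          # days t .. t+window-1
        y.append(rows[t + window][target_col])  # close on day t+window
    return X, y

# Toy series of 40 days with made-up prices.
rows = [[float(d), d + 1.0, d - 1.0, d + 0.5] for d in range(40)]
X, y = make_windows(rows)
```

With 40 days and a 30-day window this yields 10 samples, each of shape 30 by 4. Both base learners would consume the same windows, with the ANN typically receiving a flattened copy.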

4. Empirical Results and Comparative Performance

Small Sample Sequential Forecasting

On well-log datasets (6 wells, 4 inputs to 3 outputs), EnLSTM achieved an averaged per-well MSE of 0.43, outperforming C-LSTM (0.61), LSTM (0.76), and FCNN (1.38). This is reported as a 34% MSE reduction relative to C-LSTM; the rounded averages quoted here correspond to a reduction of about 30%, since $(0.61 - 0.43)/0.61 \approx 0.295$. On 14-well, 12-output datasets, convergence occurs within 3 epochs, yielding robust fits, especially on physical/mechanical logs (Chen et al., 2020).

Time Series with Embedded FS

On air-quality datasets (Italy NOₓ, Lorca NO₂), EFS-LSTM-MOEA produced lower test RMSE (0.1111 Italy; 0.1800 Lorca) and higher stability (overfitting ratios near 1: Italy 1.033, Lorca 0.992) compared to a conventional LSTM (RMSE 0.1826 and 0.2160; overfitting ratios 0.492 and 0.688) and CancelOut (Espinosa et al., 2023). Feature selection interpretability is quantified via the selection frequency of each feature.

Financial Time Series Stacking

On S&P 500 data, LSTM+ANN stacking delivered $R^2 = 0.5375$ and RMSE = 69.75, surpassing the base ANN ($R^2 = 0.4098$), LSTM ($R^2 = 0.2717$), CNN, and BiLSTM. The ensemble achieved over a 20% reduction in MSE compared to the best single model (ANN; 4,865.2 vs. 6,209.1) (Liu et al., 2024).

Model	$R^2$	MAE	MSE	RMSE
ANN	0.4098	46.92	6,209.1	78.80
LSTM	0.2717	42.39	7,662.1	87.53
CNN	0.1918	51.17	8,502.6	92.21
BiLSTM	-0.0784	46.40	11,345.6	106.52
LSTM+ANN (Ensemble)	0.5375	37.78	4,865.2	69.75
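For reference, the four metrics in the table can be computed as in the sketch below, which uses the standard definitions (the function name `regression_metrics` and the toy inputs are illustrative). It also shows why $R^2$ can be negative, as for BiLSTM above: this happens whenever the model's squared error exceeds that of predicting the mean of the targets.

```python
import math

def regression_metrics(y_true, y_pred):
    """Return R2, MAE, MSE, and RMSE using their standard definitions."""
    n = len(y_true)
    errors = [p - t for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    mse = sum(e * e for e in errors) / n
    mean_y = sum(y_true) / n
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)   # total sum of squares
    r2 = 1.0 - (mse * n) / ss_tot                     # 1 - SS_res / SS_tot
    return {"R2": r2, "MAE": mae, "MSE": mse, "RMSE": math.sqrt(mse)}

# Toy values, not taken from the S&P 500 experiments.
good = regression_metrics([1.0, 2.0, 3.0, 4.0], [1.5, 1.5, 3.5, 3.5])
bad = regression_metrics([1.0, 2.0, 3.0, 4.0], [4.0, 3.0, 2.0, 1.0])

# Consistency check on the reported ensemble row: RMSE is the square root of MSE.
rmse_from_table_mse = math.sqrt(4865.2)
```

The same square-root relation holds for every row of the table, so the MSE and RMSE columns carry the same ranking information.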

5. Interpretability, Generalization, and Tradeoffs

Interpretability via Feature Selection

EFS-LSTM-MOEA quantifies feature importance by selection frequency across Pareto-optimal models, that is, the fraction of non-dominated models whose mask selects a given feature. This yields explicit variable relevance estimates, an advantage in environmental and scientific forecasting (Espinosa et al., 2023).
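A small sketch of this computation is shown below, assuming each candidate is stored as a pair of a binary mask and a tuple of per-partition RMSE values. The population, its objective values, and the helper names are made up for illustration, and only two objectives are used to keep the example short.

```python
def dominates(a, b):
    """a dominates b if it is no worse on every objective and strictly better on at least one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def non_dominated(population):
    """population: list of (mask, objectives) pairs; returns the Pareto-optimal subset."""
    return [p for p in population
            if not any(dominates(q[1], p[1]) for q in population if q is not p)]

def selection_frequency(front):
    """Fraction of models in the front whose mask selects each feature."""
    n = len(front)
    n_features = len(front[0][0])
    return [sum(mask[i] for mask, _ in front) / n for i in range(n_features)]

# Illustrative population: (feature mask, RMSE on two time partitions).
population = [
    ([1, 0, 1], (0.10, 0.30)),
    ([1, 1, 0], (0.20, 0.20)),
    ([1, 0, 0], (0.30, 0.10)),
    ([0, 1, 1], (0.35, 0.35)),  # dominated by the second individual
]
front = non_dominated(population)
freq = selection_frequency(front)
```

Here the first feature is selected by every model on the front, so its frequency is 1, while the other two features each appear in one of the three surviving models.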

Generalization and Overfitting

Ensemble-based updates (EnRML, evolutionary ensembles, stacking) collectively increase generalization, especially in small-sample or high-noise contexts. EnLSTM avoids over-convergence through parameter and observation perturbations, whereas EFS-LSTM-MOEA leverages Pareto diversity and explicit regularization. Where overfitting ratios are reported (EFS-LSTM-MOEA), they are closer to 1 than those of single LSTM models.

Computational Considerations

Training cost for ensemble methods, especially EnLSTM (with ensemble-wide covariance operations) and evolutionary FS-LSTM ensembles (long evolutionary sweeps), is higher than single-model backpropagation. EnLSTM achieves rapid convergence (3–5 epochs), while EFS-LSTM-MOEA reports average runtimes of roughly 927 minutes per run, compared to under one minute for a conventional LSTM (0.59 min) (Chen et al., 2020, Espinosa et al., 2023).

Method	RMSE_test	Overfit Ratio	Runtime (min)
EFS-LSTM-MOEA	0.1111	1.033	926.6
Conventional LSTM	0.1826	0.492	0.59
CancelOut	0.1216	0.706	0.40
EAR-FS	0.2364	0.998	1.26

6. Strengths, Limitations, and Extensions

Strengths

Superior performance on small-data, noisy, or high-dimensional sequential domains (Chen et al., 2020, Espinosa et al., 2023).
Provides robust generalization and explicit model diversity through ensemble design.
Enables direct quantification of feature importance (EFS-LSTM-MOEA).
Converges rapidly with respect to epochs in some configurations.

Limitations

Increased training costs (covariance steps, evolutionary sweeps, stacking meta-fit).
Cascade designs (as in C-LSTM) can propagate upstream errors.
Hyperparameter tuning (ensemble size $N_e$, perturbation factors) is nontrivial.

Suggested Extensions

Replacement/augmentation of LSTM base with CNN or Transformer backbones within the EnLSTM framework.
Application to generic small-data sequential modeling tasks, including environmental, sensor, and industrial time series (Chen et al., 2020).
Sparse or low-rank approximations in covariance computations to reduce training overhead.

7. Usage Scenarios and Application Domains

Geophysical Data: Well-log regression, mechanical property inference from limited or incomplete measurements (Chen et al., 2020).
Environmental Forecasting: NOₓ/NO₂ multistep prediction and real-world air quality forecasting with embedded feature selection (Espinosa et al., 2023).
Financial Forecasting: Next-day S&P 500 regression via a heterogeneous ensemble leveraging both temporal memory (LSTM) and static nonlinear projections (ANN) (Liu et al., 2024).
A plausible implication is that ensemble LSTM techniques are particularly valuable where data are scarce, noise is significant, and model interpretability is desirable.

References:

"Ensemble long short-term memory (EnLSTM) network" (Chen et al., 2020)
"Embedded feature selection in LSTM networks with multi-objective evolutionary ensemble learning for time series forecasting" (Espinosa et al., 2023)
"Application of an ANN and LSTM-based Ensemble Model for Stock Market Prediction" (Liu et al., 2024)
