AttCLX Model: Hybrid Forecasting
- AttCLX is a hybrid forecasting model that integrates classical time-series techniques, deep learning, and XGBoost regression for robust stock price prediction.
- It employs ARIMA detrending and a multi-scale CNN with self-attention and BiLSTM decoding to capture local and long-range temporal patterns.
- The pretrain–fine-tune architecture achieves superior accuracy, outperforming standalone approaches in key metrics like RMSE, MAE, and R².
The Attention-based CNN-LSTM and XGBoost hybrid model (AttCLX) is a multi-stage machine learning architecture designed to predict stock prices by integrating classical time-series techniques, deep neural sequence models, and gradient-boosted tree ensembles in a pretrain–fine-tune pipeline. AttCLX preprocesses market data with ARIMA for detrending, encodes multi-scale and long-range features with a deep attentional CNN–BiLSTM sequence-to-sequence model, and leverages XGBoost as a final regressor for robust prediction. AttCLX achieves state-of-the-art single-step daily forecasting accuracy on empirical financial data, demonstrating superior performance to standalone classical and deep learning baselines (Shi et al., 2022).
1. Model Pipeline Overview
AttCLX is structured as a three-stage system:
- ARIMA detrending and residual computation: Removes linear trends from the raw price series and generates both differenced and residual series as additional features.
- Attentional CNN–BiLSTM sequence encoding: Utilizes a convolutional encoder with an integrated self-attention layer to extract multi-scale and global patterns, followed by a deep bidirectional LSTM decoder for long-range temporal dependencies.
- XGBoost regressor fine-tuning: Consumes the neural sequence representation (typically the final hidden state) alongside selected engineered features and outputs the final price prediction.
The interaction of these components is visualized as:
[ARIMA preprocessing → features (raw + diff + residual)]
↓
[Attentional CNN → multi-head self-attention → Bi-LSTM decoder]
↓
[XGBoost regressor → final prediction]
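The data flow through the three stages can be sketched as plain function composition. This is a minimal illustration only: `attclx_forward` and the three `toy_*` stand-ins are hypothetical names, and the stand-ins are deliberately trivial placeholders rather than the real ARIMA, neural, or XGBoost stages.

```python
from typing import Callable, List, Sequence

Features = List[List[float]]  # one feature row per time step


def attclx_forward(
    prices: Sequence[float],
    make_features: Callable[[Sequence[float]], Features],
    encode: Callable[[Features], List[float]],
    regress: Callable[[List[float]], float],
) -> float:
    """Chain the three stages: features -> sequence embedding -> prediction."""
    features = make_features(prices)  # stage 1: ARIMA-derived features
    embedding = encode(features)      # stage 2: attentional CNN-BiLSTM encoding
    return regress(embedding)         # stage 3: tree-ensemble regressor


# Toy stand-ins (NOT the real stages), used only to show the data flow.
def toy_features(prices: Sequence[float]) -> Features:
    return [[p, p - q] for q, p in zip(prices, prices[1:])]


def toy_encode(features: Features) -> List[float]:
    return features[-1]  # placeholder for the "final hidden state"


def toy_regress(embedding: List[float]) -> float:
    price, diff = embedding
    return price + diff  # naive momentum extrapolation


prediction = attclx_forward([1.0, 2.0, 4.0], toy_features, toy_encode, toy_regress)
```

Keeping each stage behind a callable makes it easy to swap one component out, which is how the ablation variants differ from the full model.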
2. Mathematical Specification of Model Components
2.1 ARIMA Preprocessing
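Given a price series $\{p_t\}$, AttCLX applies ARIMA(2,1,0). First, price differences are computed as $d_t = p_t - p_{t-1}$ and modeled via AR(2):
$$d_t = \phi_1 d_{t-1} + \phi_2 d_{t-2} + \varepsilon_t$$
The residuals,
$$r_t = d_t - (\hat{\phi}_1 d_{t-1} + \hat{\phi}_2 d_{t-2}),$$
are derived and, together with the differences $d_t$, concatenated with the six raw market features (open, high, low, close, volume, and amount), yielding an 8-dimensional feature vector at each time step.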
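The differencing and AR(2) residual steps can be illustrated with a small pure-Python sketch. It is an assumption-laden stand-in: the coefficients are fitted by ordinary least squares without an intercept, which is a simplification and not necessarily the estimator used by the authors, and the helper names (`difference`, `fit_ar2`, `ar2_residuals`) and the synthetic series are hypothetical.

```python
from typing import List, Sequence, Tuple


def difference(prices: Sequence[float]) -> List[float]:
    """First differences d_t = p_t - p_{t-1} (the d=1 step of ARIMA(2,1,0))."""
    return [b - a for a, b in zip(prices, prices[1:])]


def fit_ar2(d: Sequence[float]) -> Tuple[float, float]:
    """Least-squares fit of d_t ~ phi1*d_{t-1} + phi2*d_{t-2} (no intercept)."""
    s11 = s12 = s22 = b1 = b2 = 0.0
    for t in range(2, len(d)):
        x1, x2, y = d[t - 1], d[t - 2], d[t]
        s11 += x1 * x1
        s12 += x1 * x2
        s22 += x2 * x2
        b1 += x1 * y
        b2 += x2 * y
    det = s11 * s22 - s12 * s12  # determinant of the 2x2 normal equations
    if abs(det) < 1e-12:
        raise ValueError("lagged differences are collinear; AR(2) fit is undefined")
    return (b1 * s22 - b2 * s12) / det, (b2 * s11 - b1 * s12) / det


def ar2_residuals(d: Sequence[float], phi1: float, phi2: float) -> List[float]:
    """Residuals r_t = d_t - (phi1*d_{t-1} + phi2*d_{t-2}) for t >= 2."""
    return [d[t] - (phi1 * d[t - 1] + phi2 * d[t - 2]) for t in range(2, len(d))]


# Synthetic, noise-free example (not market data): differences follow an exact AR(2).
diffs = [1.0, 0.5]
for _ in range(10):
    diffs.append(0.5 * diffs[-1] - 0.3 * diffs[-2])
prices = [10.0]
for step in diffs:
    prices.append(prices[-1] + step)

d = difference(prices)
phi1, phi2 = fit_ar2(d)
residuals = ar2_residuals(d, phi1, phi2)
```

On this noise-free series the fit recovers the generating coefficients and the residuals are numerically zero; on real prices the residual series carries whatever the linear model fails to explain, which is what the neural stage then consumes.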
2.2 CNN Feature Extraction
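Let $X \in \mathbb{R}^{T \times F}$ be the input matrix over a look-back window of $T$ time steps and $F$ features. Multiple 1D convolutions along the time axis extract temporal motifs:
$$H^{(k)} = \mathrm{Conv1D}_k(X)$$
for each kernel size $k$, with the same number of filters per kernel size. Features from all scales are concatenated ($H = [H^{(k_1)}; H^{(k_2)}; \dots]$).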
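A multi-scale 1D convolution along the time axis can be sketched as follows. Several details are assumptions made for the illustration and are not taken from the source: zero "same" padding (so every scale keeps length $T$ and can be concatenated per time step), a ReLU activation, one filter per kernel size, and the helper names `conv1d_same` and `multi_scale_features`.

```python
from typing import List, Sequence

Matrix = List[List[float]]  # T rows (time steps) of F features


def conv1d_same(x: Matrix, kernel: Matrix, bias: float = 0.0) -> List[float]:
    """Slide a (k x F) kernel over time with zero padding; apply ReLU (assumed)."""
    steps, width = len(x), len(kernel)
    left = (width - 1) // 2  # padding on the left side of the window
    out = []
    for t in range(steps):
        acc = bias
        for j in range(width):
            s = t + j - left
            if 0 <= s < steps:  # positions outside the window count as zeros
                acc += sum(w * v for w, v in zip(kernel[j], x[s]))
        out.append(max(0.0, acc))
    return out


def multi_scale_features(x: Matrix, kernels: Sequence[Matrix]) -> Matrix:
    """Apply one filter per kernel size and concatenate channels per time step."""
    channels = [conv1d_same(x, kernel) for kernel in kernels]
    return [list(row) for row in zip(*channels)]


# Toy input with a single feature and two scales: a pass-through and a 3-step sum.
x = [[1.0], [2.0], [3.0], [4.0]]
kernels = [[[1.0]], [[1.0], [1.0], [1.0]]]
features = multi_scale_features(x, kernels)
```

The short kernel reproduces each time step while the wider one aggregates its neighbors, which is the sense in which different kernel sizes expose patterns at different time scales.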
2.3 Self-Attention
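The CNN feature output $H$ is mapped to query, key, and value projections:
$$Q = H W_Q, \quad K = H W_K, \quad V = H W_V$$
with learned projection matrices $W_Q$, $W_K$, $W_V$ and per-head key dimension $d_k$. For each time $t$, compute:
$$\alpha_{t,s} = \frac{\exp\left(q_t^\top k_s / \sqrt{d_k}\right)}{\sum_{s'} \exp\left(q_t^\top k_{s'} / \sqrt{d_k}\right)}$$
and attend:
$$c_t = \sum_{s} \alpha_{t,s} v_s$$
Four attention heads are used. Concatenated context vectors are passed to the decoder.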
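Scaled dot-product self-attention over the CNN features can be written out in a few lines of pure Python. This is a sketch of the standard formulation, assumed to match the model's attention layer; the function names are hypothetical, and the output projection that multi-head layers usually apply after concatenation is omitted for brevity.

```python
import math
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax over one row of attention scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def attention_head(h: Matrix, w_q: Matrix, w_k: Matrix, w_v: Matrix) -> Matrix:
    """One head: project to Q, K, V, then take softmax-weighted sums of values."""
    q, k, v = matmul(h, w_q), matmul(h, w_k), matmul(h, w_v)
    scale = math.sqrt(len(q[0]))  # sqrt(d_k)
    context = []
    for q_t in q:
        scores = [sum(a * b for a, b in zip(q_t, k_s)) / scale for k_s in k]
        weights = softmax(scores)
        context.append(
            [sum(w * v_s[j] for w, v_s in zip(weights, v)) for j in range(len(v[0]))]
        )
    return context


def multi_head(h: Matrix, heads: Sequence[Tuple[Matrix, Matrix, Matrix]]) -> Matrix:
    """Run each head and concatenate the per-time-step context vectors."""
    per_head = [attention_head(h, *weights) for weights in heads]
    return [sum((ctx[t] for ctx in per_head), []) for t in range(len(h))]


# Toy check: zero query weights give uniform attention, i.e. the mean of the values.
h = [[1.0, 0.0], [0.0, 1.0]]
zero = [[0.0, 0.0], [0.0, 0.0]]
identity = [[1.0, 0.0], [0.0, 1.0]]
single = attention_head(h, zero, zero, identity)
four_heads = multi_head(h, [(zero, zero, identity)] * 4)
```

Because every time step attends to every other one, the context vector at step $t$ can draw on information from anywhere in the look-back window, not just from the convolution's receptive field.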
2.4 BiLSTM Decoder
A stack of bidirectional LSTM layers (with a fixed hidden size per direction) processes the attended sequence $x_1, \dots, x_T$:
\begin{align*}
f_t &= \sigma(W_f [h_{t-1}, x_t] + b_f) \\
i_t &= \sigma(W_i [h_{t-1}, x_t] + b_i) \\
o_t &= \sigma(W_o [h_{t-1}, x_t] + b_o) \\
\hat{g}_t &= \tanh(W_g [h_{t-1}, x_t] + b_g) \\
C_t &= f_t \odot C_{t-1} + i_t \odot \hat{g}_t \\
h_t &= o_t \odot \tanh(C_t)
\end{align*}
Bidirectional outputs at all time steps are concatenated ($h_t = [\overrightarrow{h}_t; \overleftarrow{h}_t]$). Either the full decoded sequence or the final time-step embedding is used for downstream regression.
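The gate equations translate directly into code. The sketch below implements one LSTM step and a single bidirectional layer in pure Python; the parameter dictionary layout, the function names, and the toy weights are assumptions for illustration, and a practical implementation would use a deep-learning framework instead.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = List[float]


def _sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def _affine(weights: List[Vector], bias: Vector, v: Vector) -> Vector:
    return [sum(w * x for w, x in zip(row, v)) + b for row, b in zip(weights, bias)]


def lstm_step(p: Dict[str, list], h_prev: Vector, c_prev: Vector, x_t: Vector) -> Tuple[Vector, Vector]:
    """One step of the gate equations; z is the concatenation [h_{t-1}, x_t]."""
    z = h_prev + x_t
    f = [_sigmoid(a) for a in _affine(p["W_f"], p["b_f"], z)]   # forget gate
    i = [_sigmoid(a) for a in _affine(p["W_i"], p["b_i"], z)]   # input gate
    o = [_sigmoid(a) for a in _affine(p["W_o"], p["b_o"], z)]   # output gate
    g = [math.tanh(a) for a in _affine(p["W_g"], p["b_g"], z)]  # candidate state
    c = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c_prev, i, g)]
    h = [oj * math.tanh(cj) for oj, cj in zip(o, c)]
    return h, c


def run_lstm(p: Dict[str, list], xs: Sequence[Vector], hidden: int) -> List[Vector]:
    h, c = [0.0] * hidden, [0.0] * hidden
    outputs = []
    for x_t in xs:
        h, c = lstm_step(p, h, c, x_t)
        outputs.append(h)
    return outputs


def bilstm(fwd: Dict[str, list], bwd: Dict[str, list], xs: Sequence[Vector], hidden: int) -> List[Vector]:
    """Concatenate forward states with time-aligned backward states."""
    forward = run_lstm(fwd, xs, hidden)
    backward = run_lstm(bwd, list(xs)[::-1], hidden)[::-1]
    return [f_t + b_t for f_t, b_t in zip(forward, backward)]


# Toy parameters (hidden size 1, input size 1): zero weights, large candidate bias.
zero_w = [[0.0, 0.0]]
params = {"W_f": zero_w, "b_f": [0.0], "W_i": zero_w, "b_i": [0.0],
          "W_o": zero_w, "b_o": [0.0], "W_g": zero_w, "b_g": [100.0]}
outputs = bilstm(params, params, [[1.0], [2.0], [3.0]], hidden=1)
```

Each output row has twice the hidden size because it joins the forward state, which has seen the past, with the backward state, which has seen the rest of the window.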
2.5 XGBoost Regression
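The feature vector for XGBoost consists of the last Bi-LSTM hidden state, optionally augmented with ARIMA residuals and CNN summaries. XGBoost constructs an additive ensemble of regression trees, using squared error loss and regularization:
$$\mathcal{L} = \sum_{i} (y_i - \hat{y}_i)^2 + \sum_{k} \Omega(f_k), \qquad \Omega(f) = \gamma T_f + \tfrac{1}{2} \lambda \lVert w_f \rVert^2$$
where $T_f$ is the number of leaves of tree $f$ and $w_f$ its leaf weights, with learning rate $0.1$ and max depth $6$.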
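To make the additive ensemble and its penalty concrete, the sketch below evaluates a shrunken sum of trees and the regularized squared-error objective in pure Python. The function names, the toy trees, and the values chosen for `gamma` and `lam` are illustrative assumptions; this mirrors the usual form of the XGBoost objective rather than the library's internals.

```python
from typing import Callable, List, Sequence

Tree = Callable[[Sequence[float]], float]  # maps a feature vector to a leaf weight


def ensemble_predict(trees: Sequence[Tree], x: Sequence[float],
                     learning_rate: float = 0.1, base: float = 0.0) -> float:
    """Additive model: base score plus the shrunken sum of tree outputs."""
    return base + learning_rate * sum(tree(x) for tree in trees)


def regularized_objective(y_true: Sequence[float], y_pred: Sequence[float],
                          leaf_weights_per_tree: Sequence[Sequence[float]],
                          gamma: float, lam: float) -> float:
    """Squared-error loss plus gamma * (#leaves) + 0.5 * lam * ||w||^2 per tree."""
    loss = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    penalty = sum(
        gamma * len(weights) + 0.5 * lam * sum(w * w for w in weights)
        for weights in leaf_weights_per_tree
    )
    return loss + penalty


# Toy ensemble: one stump with leaves {+1, -1} and one constant tree with leaf {2}.
trees: List[Tree] = [lambda x: 1.0 if x[0] > 0 else -1.0, lambda x: 2.0]
prediction = ensemble_predict(trees, [1.0])
objective = regularized_objective([1.0, 0.0], [0.5, 0.5],
                                  [[1.0, -1.0], [2.0]], gamma=1.0, lam=2.0)
```

The penalty grows with both the number of leaves and the size of the leaf weights, so a tree is only worth adding when its reduction in squared error exceeds that cost.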
3. Training Regimens and Hyperparameters
- ARIMA: $(p, d, q) = (2, 1, 0)$, selected via ADF test and ACF/PACF analysis on the first-differenced series.
- CNN-Attention-BiLSTM: fixed look-back window over the 8 input features; multi-scale CNN kernels with the same number of filters per kernel; 4 attention heads; stacked BiLSTM layers with a fixed hidden size per direction; dropout $0.3$; batch size $32$; Adam optimizer, learning rate $0.01$; trained for up to $50$ epochs with early stopping (patience of $10$ epochs without improvement).
- XGBoost: learning rate $0.1$, max_depth $6$, with subsample and colsample_bytree ratios also set.
The sequence-to-sequence neural network is pretrained on the stock sequence, then the resulting representations are used to fit the XGBoost regressor.
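The stopping rule used during neural pretraining (at most 50 epochs, stop after 10 epochs without improvement) can be sketched independently of any framework. The function name `run_early_stopping` is hypothetical, the validation losses are simulated, and "improvement" is assumed to mean a strictly lower validation loss.

```python
from typing import Sequence, Tuple


def run_early_stopping(val_losses: Sequence[float], max_epochs: int = 50,
                       patience: int = 10) -> Tuple[int, float, int]:
    """Return (best_epoch, best_loss, epochs_run) for a stream of validation losses."""
    best_loss = float("inf")
    best_epoch = -1
    stale = 0        # consecutive epochs without improvement
    epochs_run = 0
    for epoch, loss in enumerate(val_losses[:max_epochs]):
        epochs_run = epoch + 1
        if loss < best_loss:
            best_loss, best_epoch, stale = loss, epoch, 0
        else:
            stale += 1
            if stale >= patience:
                break  # stop: no improvement for `patience` epochs
    return best_epoch, best_loss, epochs_run


# Simulated losses: improvement for two epochs, then a plateau.
simulated = [1.0, 0.9] + [0.95] * 20
best_epoch, best_loss, epochs_run = run_early_stopping(simulated)

# A run that keeps improving is cut off by the epoch cap instead.
improving = [1.0 / (n + 1) for n in range(60)]
capped = run_early_stopping(improving)
```

In a full pipeline the weights from `best_epoch` would be restored before extracting the sequence representations that the XGBoost stage is fitted on.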
4. Empirical Evaluation and Comparative Analysis
4.1 Component Ablations
Ablation experiments evaluated combinations of neural pretraining and XGBoost fine-tuning on daily closing price prediction for Bank of China (601988.SH, 2007–2022, with the data split at 2021-06-22):
| Pretraining | Fine-tuning | RMSE | MAE | R² |
|---|---|---|---|---|
| None | None | 0.02734 | 0.02368 | 0.7440 |
| None | XGBoost | 0.01755 | 0.01223 | 0.8241 |
| SL-LSTM | SL-LSTM | 0.02282 | 0.01960 | 0.7943 |
| ML-LSTM | ML-LSTM | 0.01720 | 0.01265 | 0.8235 |
| BiLSTM | BiLSTM | 0.01652 | 0.01201 | 0.8421 |
| BiLSTM | XGBoost | 0.01605 | 0.01187 | 0.8630 |
| CNN-BiLSTM | XGBoost | 0.01529 | 0.01145 | 0.8772 |
| ACNN-BiLSTM | XGBoost | 0.01424 | 0.01126 | 0.8834 |
Results indicate that the integration of AttCLX components progressively reduces prediction error; the attention-augmented CNN-BiLSTM sequence, when coupled with XGBoost, yields the highest accuracy.
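For reference, the reported metrics follow their standard definitions, which can be computed as below. The sample values are made up for the illustration and are not taken from the experiments; MAPE is returned as a fraction rather than a percentage, matching the scale of the tabulated values.

```python
import math
from typing import Sequence


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def mae(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def mape(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean absolute percentage error as a fraction; undefined if any target is 0."""
    return sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / len(y_true)


def r2(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Made-up targets and predictions, only to exercise the formulas.
y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.1, 1.9, 3.2, 3.8]
scores = {"RMSE": rmse(y_true, y_pred), "MAE": mae(y_true, y_pred),
          "MAPE": mape(y_true, y_pred), "R2": r2(y_true, y_pred)}
```

Lower is better for RMSE, MAE, and MAPE, while $R^2$ approaches 1 as predictions explain more of the variance in the targets.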
4.2 Comparison to State-of-the-Art
AttCLX was compared to classical and recent models:
| Model | RMSE | MAE | MAPE | R² |
|---|---|---|---|---|
| ARIMA | 0.02734 | 0.02368 | 0.02368 | 0.7440 |
| ARIMA-NN (’03) | 0.02608 | 0.02350 | 0.02350 | 0.7504 |
| LSTM-KF (’21) | 0.02381 | 0.02192 | 0.02192 | 0.7625 |
| Transformer-KF | 0.01924 | 0.01525 | 0.01525 | 0.8023 |
| TL-KF | 0.01656 | 0.01372 | 0.01372 | 0.8192 |
| AttCLX | 0.01424 | 0.01126 | 0.01126 | 0.8834 |
AttCLX outperforms all baselines in RMSE, MAE, MAPE, and R². Its RMSE (0.01424) is about 26% lower than that of Transformer-KF (0.01924) and about 40% lower than that of LSTM-KF (0.02381); its MAE (0.01126) is likewise about 26% below that of Transformer-KF (0.01525).
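The size of the gap to each baseline can be read off the table with a short calculation. The RMSE values below are copied from the comparison table; the helper name `relative_reduction` is hypothetical.

```python
# RMSE values as listed in the comparison table.
rmse_by_model = {
    "ARIMA": 0.02734,
    "ARIMA-NN": 0.02608,
    "LSTM-KF": 0.02381,
    "Transformer-KF": 0.01924,
    "TL-KF": 0.01656,
    "AttCLX": 0.01424,
}


def relative_reduction(baseline: float, model: float) -> float:
    """Fractional error reduction of `model` relative to `baseline`."""
    return (baseline - model) / baseline


reductions = {
    name: relative_reduction(value, rmse_by_model["AttCLX"])
    for name, value in rmse_by_model.items()
    if name != "AttCLX"
}

for name, fraction in reductions.items():
    print(f"AttCLX RMSE vs {name}: {fraction:.1%} lower")
```

Expressing the gaps as fractions of each baseline's error makes the comparison independent of the price scaling used in the experiments.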
5. Design Principles and Architectural Rationale
- Modular Detrending and Nonlinear Extraction: The separation of linear ARIMA preprocessing from nonlinear neural encoding enables the model to exploit both statistical and deep learning strengths.
- Multi-scale Temporal Pattern Modeling: CNNs with multi-scale kernels detect short, local, and longer time-lag features critical in stock sequences.
- Enhanced Context via Attention: Multi-head self-attention layers capture dependencies beyond the receptive field limits of convolutions and LSTMs, incorporating global temporal information.
- Long-Range Memory with Deep BiLSTM: BiLSTM layers capture both forward and backward dependencies, with enough depth to model complex time series phenomena.
- Flexible Nonlinear Ensemble via XGBoost: Ensemble regression trees adaptively model any residual nonlinearity or feature interactions not captured by the preceding neural extractors.
A plausible implication is that this hybridization, particularly the use of XGBoost on neural sequence encodings, allows AttCLX to correct systematic neural errors while leveraging the expressive power of tree ensembles.
6. Significance and Application Scope
AttCLX demonstrates practical effectiveness for high-variance, nonlinear, and non-stationary financial time series forecasting. The design is extensible to other time series regimes where both long-range nonlinear dependencies and complex engineered/regression features are relevant. Empirical results suggest robust generalization and error reduction in domains where classical and pure neural approaches are suboptimal (Shi et al., 2022). Source code is available at https://github.com/zshicode/Attention-CLX-stock-prediction.