Chaotic Oscillatory Transformer Network
- Chaotic Oscillatory Transformer Network (COTN) is a deep neural forecasting model that integrates a Transformer backbone with chaotic Lee Oscillator activations to manage extreme volatility.
- It employs innovative components like Max-over-Time pooling, lambda gating, and an Autoencoder Self-Regressive module to enhance anomaly detection and prediction accuracy.
- Experimental evaluations show up to 17% lower forecasting error than baselines such as Informer and GARCH, demonstrating its robustness in volatile environments.
The Chaotic Oscillatory Transformer Network (COTN) is a neural forecasting architecture designed for highly volatile, nonlinear time-series systems encountered in domains such as financial markets and electricity trading. It combines a Transformer backbone with a chaotic Lee Oscillator activation, Max-over-Time pooling, a lambda (λ) gating mechanism, and an Autoencoder Self-Regressive (ASR) module for anomaly isolation. COTN addresses the limitations of standard activation functions (e.g., ReLU, GELU) under extreme fluctuations, enabling accurate prediction and robust anomaly handling during abrupt systemic changes.
1. Architectural Framework
COTN architecture consists of multiple sequential processing stages, each tailored for stability and adaptivity in chaotic environments:
- Input Preprocessing: Missing timestamps are forward-filled; statistical outliers (e.g., returns beyond ±20%) are truncated; derived features include log-returns, moving averages, and volatilities (see the sketch after this list). The Autoencoder Self-Regressive (ASR) module precomputes anomaly scores on the raw input.
- Embedding Layer: Maps each time-step feature vector to a continuous latent representation.
- Encoder Blocks: Each Transformer encoder contains:
- Distilled Multi-Head Self-Attention (adapted from DAT) for efficient, sub-quadratic context aggregation via strided/pooling options.
- Add & Norm.
- Feed-forward sub-layer: Linear projection, Lee Oscillator Activation, Max-over-Time pooling, λ-Gating with GELU, Linear projection.
- Add & Norm.
- Decoder or Prediction Head: Predicts all steps of the forecast horizon in a single pass.
- Output Layer: Final linear projection to produce raw forecasted values.
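As a concrete illustration of the preprocessing stage, the sketch below forward-fills gaps, truncates ±20% returns, and derives log-returns, moving averages, and rolling volatility. Column names and window sizes are illustrative assumptions, not the paper's exact choices.

```python
import numpy as np
import pandas as pd

def preprocess(df: pd.DataFrame, price_col: str = "close") -> pd.DataFrame:
    # Forward-fill missing timestamps (assumes a DatetimeIndex with an inferable frequency).
    df = df.asfreq(df.index.inferred_freq).ffill()

    # Log-returns, with simple returns truncated at +/-20% to remove statistical outliers.
    ret = df[price_col].pct_change().clip(-0.20, 0.20)
    df["log_ret"] = np.log1p(ret)

    # Derived features: moving average and rolling volatility (24-step windows are assumed).
    df["ma_24"] = df[price_col].rolling(24).mean()
    df["vol_24"] = df["log_ret"].rolling(24).std()
    return df.dropna()
```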
A key novelty is the replacement of conventional nonlinearities in feed-forward layers with a hybrid Lee Oscillator + GELU activation pipeline, mediated by pooling and gating mechanisms to modulate responsiveness and stability.
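The block ordering can be summarized in a minimal PyTorch sketch. Here `attn` stands in for the distilled self-attention module and `lee_gelu_activation` for the λ-gated hybrid activation sketched in Section 3; both interfaces are assumptions, not the reference implementation.

```python
import torch.nn as nn

class COTNEncoderBlock(nn.Module):
    def __init__(self, d_model, d_ff, lam, attn):
        super().__init__()
        self.attn = attn                     # distilled multi-head self-attention (assumed module)
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)
        self.lin1 = nn.Linear(d_model, d_ff)
        self.lin2 = nn.Linear(d_ff, d_model)
        self.lam = lam                       # λ gating coefficient

    def forward(self, x):
        # Attention sub-layer with residual Add & Norm.
        x = self.norm1(x + self.attn(x))
        # Feed-forward sub-layer: Linear -> Lee/MoT/λ-gated activation -> Linear, then Add & Norm.
        h = lee_gelu_activation(self.lin1(x), self.lam)
        return self.norm2(x + self.lin2(h))
```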
2. Lee Oscillator Activation and Dynamics
The Lee Oscillator is a discrete-time dynamical system originally formulated with excitatory (E), inhibitory (I), input (Ω), and output (L) variables:
$$\begin{aligned} E(t+1) &= \sigma\big(e_1\,E(t) - e_2\,I(t) + S(t) - \xi_E\big) \\ I(t+1) &= \sigma\big(i_1\,E(t) - i_2\,I(t) - \xi_I\big) \\ \Omega(t+1) &= \sigma\big(S(t)\big) \\ L(t) &= \big[E(t) - I(t)\big]\, e^{-k S^2(t)} + \Omega(t) \end{aligned}$$
where $\sigma(z)$ is typically either the sigmoid or the hyperbolic tangent; $S(t)$ is the external stimulus; and $\xi_E$, $\xi_I$ are threshold parameters.
COTN employs the extended LORS variant, which incorporates retrograde signaling for additional dynamical richness.
The Lee Oscillator demonstrates a progressive chaotic growth regime, amplifying responsiveness to small input perturbations without causing gradient blow-up. It combines regions of smooth (tanh-like) and chaotic responses, outperforming traditional activations in capturing rapid, sub-cycle volatility, especially under extreme system shocks. Eight pre-tuned parameter sets (oscillator types T1–T8) govern these dynamics, with the optimal set validated per dataset.
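A direct transcription of the base recurrence above (not the LORS extension) might look as follows; the parameter values are placeholders rather than one of the tuned sets T1–T8, and tanh is assumed for σ.

```python
import numpy as np

def run_lee_oscillator(s, steps=100, e1=5.0, e2=5.0, i1=5.0, i2=5.0,
                       xi_E=0.0, xi_I=0.0, k=50.0):
    # Simulate the oscillator for a constant external stimulus s and
    # return the output trajectory L(1), ..., L(steps).
    sig = np.tanh
    E, I = 0.0, 0.0
    traj = []
    for _ in range(steps):
        # Simultaneous update of the excitatory and inhibitory states.
        E, I = sig(e1 * E - e2 * I + s - xi_E), sig(i1 * E - i2 * I - xi_I)
        Omega = sig(s)
        traj.append((E - I) * np.exp(-k * s * s) + Omega)
    return np.array(traj)
```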
3. Max-over-Time Pooling and Lambda-Gating
After the internal 100-step simulation of the Lee Oscillator per scalar input $u$, Max-over-Time (MoT) pooling selects the most salient response: $f_{\mathrm{Lee}}(u) = \max_{1 \le t \le 100} L(t)$.
For each batch and feature dimension, the optimal oscillator type ($T^{*}$) is selected by validation. The final activation is then computed as a convex fusion of the chaotic oscillator's peak and the GELU activation: $\mathrm{Act}(u) = \lambda\,\mathrm{GELU}(u) + (1 - \lambda)\, f_{\mathrm{Lee}}(u)$.
Here, $\lambda$ regulates the trade-off between smoothness (large $\lambda$) and sensitivity to chaotic patterns (small $\lambda$). MoT preserves the highest-magnitude, potentially rare, oscillatory state, while λ-gating allows explicit control over the chaos-predictability spectrum.
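Putting the two mechanisms together, a minimal sketch of the fused activation, reusing the `run_lee_oscillator` helper above; the element-wise loop mirrors the pseudocode in Section 5 and is not optimized.

```python
import torch
import torch.nn.functional as F

def lee_gelu_activation(U, lam, osc=run_lee_oscillator):
    # Element-wise hybrid activation: λ·GELU(u) + (1-λ)·max_t L(t; u).
    flat = U.flatten()
    f_lee = torch.tensor([osc(u.item()).max() for u in flat],   # Max-over-Time pooling
                         dtype=U.dtype, device=U.device)
    fused = lam * F.gelu(flat) + (1 - lam) * f_lee
    return fused.view_as(U)
```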
4. Autoencoder Self-Regressive (ASR) Module for Anomaly Handling
The ASR module serves dual functions: feature denoising and anomaly detection. It is structured as:
- Encoder: Maps a window of recent inputs $x_{t-w+1:t}$ to a latent state $z_t$.
- Decoder: Attempts to reconstruct the input window, yielding $\hat{x}_{t-w+1:t}$.
- Autoregressive Head: Predicts the next value $\hat{x}_{t+1}$.
Training minimizes the combined objective
$$\mathcal{L}_{\mathrm{ASR}} = \mathcal{L}_{\mathrm{rec}} + \beta\, \mathcal{L}_{\mathrm{ar}},$$
where $\mathcal{L}_{\mathrm{rec}} = \lVert x_{t-w+1:t} - \hat{x}_{t-w+1:t} \rVert_2^2$ penalizes reconstruction error, $\mathcal{L}_{\mathrm{ar}} = (x_{t+1} - \hat{x}_{t+1})^2$ penalizes autoregressive prediction error, and $\beta$ balances the two terms.
Anomaly points are identified where the pointwise reconstruction error exceeds a threshold $\tau$ (typically a high quantile of the empirical error distribution). During both training and inference, anomalous points are down-weighted or masked, preventing their propagation into the Lee Oscillator dynamics and ensuring prediction robustness.
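A compact sketch of the module and its combined objective, using the notation above; layer widths and the weighting coefficient β are assumptions.

```python
import torch
import torch.nn as nn

class ASR(nn.Module):
    def __init__(self, window, d_latent=32):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(window, 64), nn.ReLU(),
                                     nn.Linear(64, d_latent))
        self.decoder = nn.Sequential(nn.Linear(d_latent, 64), nn.ReLU(),
                                     nn.Linear(64, window))
        self.ar_head = nn.Linear(d_latent, 1)       # predicts x_{t+1}

    def forward(self, x):                           # x: [batch, window]
        z = self.encoder(x)
        return self.decoder(z), self.ar_head(z)

def asr_loss(x, x_next, model, beta=1.0):
    x_hat, x_next_hat = model(x)
    rec = torch.mean((x - x_hat) ** 2)                        # reconstruction term
    ar = torch.mean((x_next - x_next_hat.squeeze(-1)) ** 2)   # autoregressive term
    return rec + beta * ar
```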
5. Implementation, Training, and Stability
COTN’s implementation leverages the following components, with sample PyTorch-style pseudocode for the λ-gated feed-forward layer:
```python
import torch
import torch.nn.functional as F

def COTN_FeedForward(X, lam, T_star):
    # X: [batch, length, d_model]; lam: λ gating coefficient; T_star: validated oscillator type
    U = Linear1(X)                             # [batch, length, d_ff]
    U_flat = U.flatten()
    U_act = torch.empty_like(U_flat)
    for idx, u in enumerate(U_flat):
        LORS_traj = run_LORS(u, param=T_star)  # internal 100-step oscillator trajectory
        f_lee = max(LORS_traj)                 # Max-over-Time pooling (scalar)
        U_act[idx] = lam * F.gelu(u) + (1 - lam) * f_lee
    V = Linear2(U_act.view_as(U))
    return V
```
Training methodology:
- Initial warm-start phase using GELU activation for a number of warm-up epochs before enabling the Lee activation, reducing convergence time by ~40% (see the sketch after this list).
- Hyperparameter selection: typically λ = 0.5, batch size 64, learning rate 1e-4; oscillator type (T1–T8) validated per dataset.
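A schematic of the warm-start schedule; epoch counts, optimizer details, and the mutable `lam` attribute are assumptions for illustration.

```python
import torch
import torch.nn.functional as F

def train_cotn(model, loader, warm_epochs=10, total_epochs=50, lam=0.5):
    opt = torch.optim.Adam(model.parameters(), lr=1e-4)
    for epoch in range(total_epochs):
        # Warm-start: pure GELU (λ = 1) before enabling the Lee activation.
        model.lam = 1.0 if epoch < warm_epochs else lam
        for X, y in loader:
            opt.zero_grad()
            F.mse_loss(model(X), y).backward()
            opt.step()
```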
Stability: A theoretical contraction-mapping condition is satisfied if the λ-gated composite activation yields a total Lipschitz constant below one:
$$L_{\mathrm{total}} \le \lambda\, L_{\mathrm{GELU}} + (1 - \lambda)\, L_{\mathrm{Lee}} < 1$$
This property ensures fixed-point convergence within each residual feed-forward block.
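The condition can be probed empirically by estimating the maximum slope of the composite activation over a grid; this is a rough finite-difference check under the sketches above, not a proof.

```python
import numpy as np

def empirical_lipschitz(act, lo=-5.0, hi=5.0, n=10001):
    # Maximum finite-difference slope of a scalar activation over [lo, hi].
    x = np.linspace(lo, hi, n)
    y = np.array([act(v) for v in x])
    return np.max(np.abs(np.diff(y) / np.diff(x)))

# Probe the λ-gated composite for λ = 0.5 (tanh-based GELU approximation).
gelu = lambda v: 0.5 * v * (1 + np.tanh(np.sqrt(2 / np.pi) * (v + 0.044715 * v**3)))
composite = lambda v: 0.5 * gelu(v) + 0.5 * run_lee_oscillator(v).max()
print(empirical_lipschitz(composite))   # contraction requires this to stay below 1
```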
6. Experimental Evaluation
Extensive benchmarking was performed on both synthetic and real-world volatility datasets:
- Datasets: ETTh1 and ETTm2 (electricity transformer data; 8,640 and 69,120 samples), 1-minute A-share stock data (17,000+ samples).
- Preprocessing: Missing data imputation, Z-score outlier removal, ±20% return truncation, feature construction (OHLCV returns, moving averages, volatility bands).
- Baselines: Informer (deep-learning, Transformer-based), GARCH (statistical, volatility modeling).
- Performance Metrics: Mean Absolute Error (MAE), Mean Squared Error (MSE).
| Dataset | Informer MAE | GARCH MAE | COTN MAE | Informer MSE | GARCH MSE | COTN MSE |
|---|---|---|---|---|---|---|
| ETTh1 (24h) | 0.549 | 0.712 | 0.515 | 0.577 | 0.810 | 0.530 |
| ETTm2 (48h) | 0.614 | 0.830 | 0.571 | 0.689 | 0.995 | 0.635 |
| A-share (96m) | 1.567 | 1.892 | 1.427 | 3.608 | 4.212 | 3.394 |
COTN achieves up to 17% lower error than Informer and up to 40% lower error than GARCH, validating its effectiveness in capturing nonstationary, high-volatility dynamics.
7. Practical Considerations and Extensions
- Tuning Recommendations:
- Begin with GELU-only training; fine-tune with λ-gated Lee activation.
- Select the optimal oscillator type (T1–T8) using a hold-out validation set.
- Adjust λ according to volatility levels; extreme cases may benefit from a lower λ.
- Limitations:
- Increased computational demand and memory footprint due to internal 100-step oscillator simulation, partially alleviated by MoT pooling.
- Additional complexity stemming from oscillator type and λ as hyperparameters, necessitating careful validation.
- Future Directions:
- Learnable, data-driven λ rather than a fixed hyperparameter (see the sketch after this list).
- End-to-end trainable oscillator parameters or integration into neural ODE frameworks.
- Application to broader classes of volatile systems (e.g., climate, traffic, cyber-attacks) is plausible based on current robustness results.
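As one illustration of the first direction, λ could be reparameterized as a trainable scalar squashed into (0, 1); this is a speculative sketch, not part of the published model.

```python
import torch
import torch.nn as nn

class LearnableLambda(nn.Module):
    # λ = sigmoid(ρ) stays in (0, 1) and is learned jointly with the network.
    def __init__(self, init=0.5):
        super().__init__()
        self.rho = nn.Parameter(torch.logit(torch.tensor(init)))

    def forward(self):
        return torch.sigmoid(self.rho)
```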
The central innovation of COTN lies in the seamless fusion of chaos-theoretic dynamical activation, real-time anomaly isolation, and Transformer-based deep sequence modeling, offering a substantially more responsive and robust tool for time-series forecasting in complex, nonstationary contexts.