
Learnable Weighted-average Integration (LWI)

Updated 14 November 2025
  • Learnable Weighted-average Integration (LWI) is a neural module that aggregates forecasts from multiple historical periods using context-dependent, learnable weights.
  • It employs CNN-based feature extraction and dense layer transformations to generate adaptive weights for fusing period-specific predictions in a multi-period TSF framework.
  • Empirical studies demonstrate that using LWI reduces MSE by up to 3% compared to fixed-weight methods, improving both short-term and long-term financial forecasting accuracy.

Learnable Weighted-average Integration (LWI) is a neural module designed to aggregate forecasts from multiple historical periods, with learnable contextual weights for each period, within a transformer-based multi-period time series forecasting (TSF) architecture. The LWI mechanism was introduced in the context of the Multi-period Learning Framework (MLF), a dedicated approach for financial TSF where data windows of heterogeneous lengths are processed in parallel and their predictions are adaptively fused (Zhang et al., 7 Nov 2025).

1. Definition and Core Principle

Learnable Weighted-average Integration (LWI) is an architectural module that receives a collection of intermediate forecasts $\{\bar{X}_f^1, \dots, \bar{X}_f^S\}$ for $S$ input periods (differing in historical length or scale) and produces a unified output forecast $\hat{X}_f \in \mathbb{R}^{m \times c}$ via an adaptive, context-dependent weighted combination.

Unlike traditional fixed or manually-tuned weighting (e.g., a simple arithmetic mean or static attention), LWI’s weights $\text{Att}^s$ are computed dynamically by a parameterized subnetwork that extracts features from the longest historical input. This allows the model to condition the period weightings on the actual content of recent history, capturing which input periods are most predictive in a given context.

2. Mathematical Formulation

The LWI mechanism operates as follows:

  1. Context Feature Extraction

    • Compute features $\nu$ from the maximum-length input period $X_h^{S_{\max}}$ using a small convolutional neural network (CNN) with padding, batch normalization, and max-pooling:

    $$\nu = \mathrm{MaxPool}\big(\mathrm{BN}\big(\mathrm{Conv}(\mathrm{Pad}(X_h^{S_{\max}}))\big)\big)$$

  2. Attention Weight Generation

    • Generate a weight vector $\text{Att} \in \mathbb{R}^S$ with the following nonlinearity:

    $$\text{Att} = \sigma\big(\tanh(\Theta_1 \nu + b_1) \odot \tanh(\Theta_2 \nu + b_2)\big)$$

    where $\sigma$ is the sigmoid function, $\odot$ denotes elementwise multiplication, and $\Theta_1, \Theta_2, b_1, b_2$ are learnable parameters.

  3. Weighted Forecast Aggregation

    • Fuse forecasts from all periods using these weights:

    $$\hat{X}_f = \frac{1}{S} \sum_{s=1}^{S} \text{Att}^s \cdot \bar{X}_f^s$$

This approach ensures that the aggregation is both data-driven and differentiable end-to-end.
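
A minimal PyTorch sketch of these three steps is given below. The hyperparameters (kernel size, feature width, pooling factor) and layer shapes are illustrative assumptions, not the configuration reported in the paper.

```python
import torch
import torch.nn as nn


class LWI(nn.Module):
    """Learnable Weighted-average Integration over S period forecasts (sketch)."""

    def __init__(self, n_max: int, c: int, num_periods: int,
                 d_nu: int = 32, kernel_size: int = 3, pool: int = 4):
        super().__init__()
        # Step 1: context features from the longest input period.
        # Conv1d expects (batch, channels, length); padding=kernel_size//2
        # plays the role of Pad(.) and preserves the sequence length.
        self.feature = nn.Sequential(
            nn.Conv1d(c, d_nu, kernel_size, padding=kernel_size // 2),
            nn.BatchNorm1d(d_nu),
            nn.MaxPool1d(pool),
        )
        feat_dim = d_nu * (n_max // pool)
        # Step 2: two dense maps (Theta_1, Theta_2) whose tanh outputs are
        # multiplied elementwise and squashed by a sigmoid.
        self.theta1 = nn.Linear(feat_dim, num_periods)
        self.theta2 = nn.Linear(feat_dim, num_periods)

    def forward(self, x_h_max: torch.Tensor,
                forecasts: list[torch.Tensor]) -> torch.Tensor:
        # x_h_max: (batch, n_max, c), the longest historical window.
        # forecasts: list of S tensors, each (batch, m, c).
        nu = self.feature(x_h_max.transpose(1, 2)).flatten(1)  # (batch, feat_dim)
        att = torch.sigmoid(torch.tanh(self.theta1(nu)) *
                            torch.tanh(self.theta2(nu)))       # (batch, S)
        stacked = torch.stack(forecasts, dim=1)                # (batch, S, m, c)
        # Step 3: X_hat = (1/S) * sum_s Att^s * Xbar_f^s; the mean over the
        # period axis supplies the 1/S factor.
        return (att[:, :, None, None] * stacked).mean(dim=1)   # (batch, m, c)
```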

3. Role within Multi-period Time Series Frameworks

In the Multi-period Learning Framework (MLF), multiple historical time windows of varying lengths are processed in parallel to mitigate the information loss and bias that arise when relying on a single pre-selected window. Each period goes through patching, embedding, patch squeezing (to reduce redundancy and accelerate attention), and stacked transformer blocks with Inter-period Redundancy Filtering (IRF). At every block, period-specific intermediate forecasts are produced. After the last block, LWI integrates these per-period forecasts to form the model’s final output.
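
The data flow can be summarized schematically as follows; here `embed`, `squeeze`, `blocks`, and `head` are hypothetical stand-ins for the paper's patching/embedding, patch squeezing, IRF transformer blocks, and per-period forecast heads, so only the ordering of the stages is meant to be faithful.

```python
from typing import Callable

import torch


def mlf_forward(periods: list[torch.Tensor],
                embed: Callable, squeeze: Callable,
                blocks: Callable, head: Callable,
                lwi: Callable) -> torch.Tensor:
    """periods: S historical windows, ordered shortest to longest."""
    forecasts = []
    for x in periods:
        z = embed(x)                # patching + embedding
        z = squeeze(z)              # patch squeezing to cut redundancy
        z = blocks(z)               # stacked transformer blocks with IRF
        forecasts.append(head(z))   # period-specific forecast, (batch, m, c)
    # LWI fuses the per-period forecasts, conditioned on the longest window.
    return lwi(periods[-1], forecasts)
```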

By learning a context-sensitive weighting, LWI accounts for the fact that the predictive value of historical windows varies according to market regime, temporal locality, and signal-to-noise ratio. This is especially important for financial TSF, where short-term and long-term drivers can alternate in importance.

4. Implementation Details and Complexity

A standard LWI implementation requires three consecutive steps:

  • CNN feature extraction over $X_h^{S_{\max}}$. If $X_h^{S_{\max}} \in \mathbb{R}^{n \times c}$, this involves a convolution over time, padding to preserve sequence length, batch normalization, and max-pooling (typically yielding $\nu \in \mathbb{R}^{d_\nu}$, with $d_\nu$ generally much smaller than $n$).
  • Dense layer transformations ($\Theta_1$, $\Theta_2$) and nonlinearities (tanh, sigmoid, elementwise product) with output in $\mathbb{R}^S$.
  • Weighted sum and broadcast for each forecast $\bar{X}_f^s \in \mathbb{R}^{m \times c}$, combined via elementwise multiplication and reduction across periods.

The computational complexity is dominated by the feature extraction and the forecast weighting, $O(cm + D|\nu|)$, which is negligible compared to the cost of transformer self-attention ($O((SN/r)^2 D)$ in the paper's notation).
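
As a concrete shape walk-through, the sketch from Section 2 can be exercised as follows (batch size, window length, horizon, and channel count are arbitrary example values):

```python
# S = 3 periods, longest window n_max = 96 steps, c = 8 series, horizon m = 24.
lwi = LWI(n_max=96, c=8, num_periods=3)
x_h_max = torch.randn(32, 96, 8)                        # longest history
forecasts = [torch.randn(32, 24, 8) for _ in range(3)]  # per-period forecasts
x_hat = lwi(x_h_max, forecasts)
print(x_hat.shape)  # torch.Size([32, 24, 8])
```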

5. Empirical Performance and Ablation

Ablation studies in (Zhang et al., 7 Nov 2025) demonstrate that removing LWI and reverting to uniform or fixed weights consistently increases mean squared error (MSE) by 1–3% on a wide range of TSF tasks, indicating the necessity of learnable, context-aware fusion. LWI is synergistic with other MLF modules such as IRF (without which performance drops by 2–5%). The addition of LWI improves forecast accuracy both in short-term (e.g., 5–150 steps) and long-term (e.g., up to 2048 steps) financial prediction settings.

6. Comparison with Alternative Fusion Mechanisms

LWI differs from classic attention mechanisms (e.g., self-attention, cross-attention) in that it operates after all per-period sequences have been transformed and “forecasted” at the block level, focusing specifically on the fusion of period-level predictions. Fixed-weight averaging, ensemble-by-stacking, and naive period concatenation all lack the adaptive, data-dependent weighting that LWI provides.

Naive alternatives such as Patch-Concat, Patch-Ensemble, or FiLM-based fusion show higher MSE and less robustness to input period selection in comparative experiments. LWI’s context-adaptive weighting enables superior integration of heterogeneous temporal signals without manual period tuning or expensive hyperparameter sweeps.
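
For intuition, the difference from a uniform fixed-weight baseline reduces to whether the fusion weights depend on the input context; continuing the example above:

```python
# Uniform mean: fixed weights 1/S, identical for every sample and regime.
uniform = torch.stack(forecasts, dim=1).mean(dim=1)
# LWI: weights recomputed per sample from the longest historical window.
adaptive = lwi(x_h_max, forecasts)
```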

7. Practical Application and Deployment Considerations

MLF with LWI is deployed in production TSF pipelines for high-frequency financial sales forecasting (e.g., daily forecasting and inventory adjustment in Alipay’s Fund Inventory Management System). Empirical use cases confirm uplift in gross merchandise value (GMV) and reductions in operating error. LWI is compatible with large-batch inference, supports GPU acceleration, and requires no explicit supervision for the weights, which are learned through the end-to-end forecasting loss.

Limitations include the need for manual selection of the period count ($S$) and patching hyperparameters. Current designs do not implement dynamic period gating, so all periods are always fused, potentially incurring unnecessary computation when some periods are uninformative. Incorporating external signals into $\nu$ is suggested as a future enhancement, and cross-period attention at the forecast stage is another potential direction for improving on the additive LWI model.

All essential code and pretrained models for LWI within MLF are available at https://github.com/Meteor-Stars/MLF (Zhang et al., 7 Nov 2025).

References

  1. Zhang et al., 7 Nov 2025.
