
Patch-Based Dual-Branch CTNet

Updated 7 December 2025
  • The paper introduces a dual-branch network that decouples intra-channel temporal evolution and inter-variable correlations, achieving state-of-the-art forecasting accuracy.
  • It employs patch embedding, global inter-patch attention, and an adaptive frequency-domain correction to robustly handle non-stationary industrial data.
  • Ablation studies confirm that removing any core component, particularly the dual-branch structure, significantly degrades performance on various benchmarks.

The Patch-Based Dual-Branch Channel-Temporal Forecasting Network (D-CTNet) is a neural architecture designed for accurate multivariate time series (MTS) forecasting, with specific applicability to collaborative industrial systems facing inter-variable complexity and non-stationary distribution shifts. D-CTNet introduces a modular, patch-based dual-branch approach that jointly decouples and learns intra-channel temporal evolution and inter-variable correlations, enhanced by a global attention fusion mechanism and an adaptive frequency-domain stationarity correction to suppress environmental distributional drift. The architecture attains state-of-the-art performance across seven standard benchmarks, achieving superior forecasting accuracy and robustness relative to contemporary baselines (Wang et al., 30 Nov 2025).

1. Architecture and Data Flow

D-CTNet ingests input tensors $\mathbf{X}_{\rm in} \in \mathbb{R}^{B \times L \times C}$, where $B$ is batch size, $L$ is historical sequence length, and $C$ is the channel/variable dimension. Pre-normalization employs RevIN (Kim et al., 2021): $\mathbf{X}_{\rm norm} = \mathrm{RevIN}_{\rm norm}(\mathbf{X}_{\rm in}) \in \mathbb{R}^{B\times L\times C}$

Patch Embedding

Patches of length $P$ are extracted with stride $S$, yielding $N = \lfloor (L - P)/S \rfloor + 1$ discrete patches.

  • The input is reshaped into $\mathbb{R}^{B \times C \times N \times P}$, processed with a 1D convolution into latent dimension $D$, and augmented with learnable positional embeddings $\mathbf{E}_{\rm pos} \in \mathbb{R}^{N \times D}$: $\mathbf{X}_{\rm patch} = \mathrm{Conv1d}(\mathbf{X}_{\rm slice}) + \mathbf{E}_{\rm pos} \in \mathbb{R}^{B\times C\times N\times D}$
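The patch extraction and embedding step can be sketched in NumPy as follows. This is a minimal illustration, not the authors' implementation: the Conv1d projection is stood in for by a single linear map `W` over each patch, and all weights here are placeholders.

```python
import numpy as np

def patch_embed(x, P, S, W, E_pos):
    """Split each channel's series into patches and project to latent dim D.

    x:     (B, L, C) input after RevIN normalization
    P, S:  patch length and stride
    W:     (P, D) projection weights (stand-in for the paper's Conv1d)
    E_pos: (N, D) learnable positional embeddings
    """
    B, L, C = x.shape
    N = (L - P) // S + 1                      # number of patches
    # Gather patches per channel: (B, C, N, P)
    patches = np.stack([x[:, i * S:i * S + P, :].transpose(0, 2, 1)
                        for i in range(N)], axis=2)
    return patches @ W + E_pos                # (B, C, N, D)

B, L, C, P, S, D = 2, 96, 7, 16, 8, 32
N = (L - P) // S + 1                          # 11 patches for L=96, P=16, S=8
x = np.random.randn(B, L, C)
out = patch_embed(x, P, S, np.random.randn(P, D) * 0.02, np.zeros((N, D)))
print(out.shape)  # (2, 7, 11, 32)
```

The shape bookkeeping matches the formula above: with $L=96$, $P=16$, $S=8$, the patch count is $\lfloor 80/8 \rfloor + 1 = 11$.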

Dual-Branch Channel–Temporal Module

The central innovation is a dual-branch module whose two branches each operate on $\mathbf{X}_{\rm patch}$.

  • Linear Temporal Branch (over patches $N$ for each channel $c$ and latent dimension $d$): A learnable $(N \times N)$ weight $\mathbf{W}_{\rm time}$ mixes patches, followed by GELU activation and residual LayerNorm:

$$\tilde{\mathbf{H}}_{\rm time} = \mathrm{GELU}\left(\mathbf{X}_{\rm patch} \times_{\rm patch} \mathbf{W}_{\rm time}\right)$$

$$\mathbf{H}_{\rm time} = \mathrm{LayerNorm}\left(\tilde{\mathbf{H}}_{\rm time} + \mathbf{X}_{\rm patch}\right)$$

  • Channel Attention Branch (multi-head attention over channels $C$): Channels are treated as the "sequence" dimension; queries, keys, and values use $D \times D$ projections and standard softmax attention, followed by Dropout and LayerNorm:

$$Q = XW_Q,\quad K = XW_K,\quad V = XW_V$$

$$A = \mathrm{softmax}\left(\frac{QK^T}{\sqrt{d_k}}\right),\quad \mathrm{out} = AV$$

$$\mathbf{H}_{\rm channel} = \mathrm{LayerNorm}\left(\mathrm{Dropout}(\mathrm{out}) + \mathbf{X}_{\rm patch}\right)$$

The two branches are fused additively: $\mathbf{H}_{\rm fused} = \mathbf{H}_{\rm time} + \mathbf{H}_{\rm channel}$
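The dual-branch computation can be sketched in NumPy. This is an illustrative single-head version under stated assumptions (tanh-approximate GELU, no dropout, identity placeholder weights), not the paper's multi-head implementation:

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    return (x - x.mean(-1, keepdims=True)) / np.sqrt(x.var(-1, keepdims=True) + eps)

def gelu(x):
    # tanh approximation of GELU
    return 0.5 * x * (1 + np.tanh(np.sqrt(2 / np.pi) * (x + 0.044715 * x**3)))

def softmax(x):
    e = np.exp(x - x.max(-1, keepdims=True))
    return e / e.sum(-1, keepdims=True)

def dual_branch(x, W_time, W_q, W_k, W_v):
    """x: (B, C, N, D). Returns the additive fusion of both branches."""
    # Linear temporal branch: mix across the patch axis N per channel.
    h_t = np.einsum('bcnd,nm->bcmd', x, W_time)
    h_time = layer_norm(gelu(h_t) + x)
    # Channel attention branch (single head): channels C are the "sequence".
    xc = x.transpose(0, 2, 1, 3)              # (B, N, C, D)
    q, k, v = xc @ W_q, xc @ W_k, xc @ W_v
    a = softmax(q @ k.transpose(0, 1, 3, 2) / np.sqrt(x.shape[-1]))
    h_channel = layer_norm((a @ v).transpose(0, 2, 1, 3) + x)
    return h_time + h_channel                 # additive fusion

B, C, N, D = 2, 7, 11, 32
x = np.random.randn(B, C, N, D)
fused = dual_branch(x, np.eye(N), *(np.eye(D),) * 3)
print(fused.shape)  # (2, 7, 11, 32)
```

Note how the two branches attend over different axes of the same tensor ($N$ vs. $C$), which is the decoupling the paper describes.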

Global Inter-Patch Attention Fusion

Applying multi-head attention along the patch dimension $N$ extends the receptive field, supporting long-range dependencies: $$\mathbf{H}_{\rm global} = \mathrm{LayerNorm}\bigl(\mathrm{Dropout}(\mathrm{MHA}_{\rm patch}(\mathbf{H}_{\rm fused})) + \mathbf{H}_{\rm fused}\bigr)$$
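A minimal NumPy sketch of this fusion step, assuming a single attention head and omitting dropout (as at evaluation time); the projection weights here are identity placeholders, not the learned parameters:

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    return (x - x.mean(-1, keepdims=True)) / np.sqrt(x.var(-1, keepdims=True) + eps)

def softmax(x):
    e = np.exp(x - x.max(-1, keepdims=True))
    return e / e.sum(-1, keepdims=True)

def global_patch_attention(h, W_q, W_k, W_v):
    """Self-attention along the patch axis N of h: (B, C, N, D),
    wrapped in the residual + LayerNorm structure described above."""
    q, k, v = h @ W_q, h @ W_k, h @ W_v                              # (B, C, N, D)
    a = softmax(q @ k.transpose(0, 1, 3, 2) / np.sqrt(h.shape[-1]))  # (B, C, N, N)
    return layer_norm(a @ v + h)

B, C, N, D = 2, 7, 11, 32
h_fused = np.random.randn(B, C, N, D)
h_global = global_patch_attention(h_fused, *(np.eye(D),) * 3)
print(h_global.shape)  # (2, 7, 11, 32)
```

Because the $(N \times N)$ attention matrix connects every patch pair, each position can aggregate information across the full history window, unlike the purely local convolutional embedding.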

Forecast Head and Output Construction

  • Feature correction applies the adaptive frequency-domain correction (§2), then flattens to $B \times (CND)$ and projects linearly to $B \times (T\,C)$ for forecast horizon $T$.
  • Reshaping and inverse normalization (RevIN "denorm") yield the output $\hat{\mathbf{Y}} \in \mathbb{R}^{B \times T \times C}$.
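The head's shape transformations can be sketched as follows; this is a NumPy illustration with placeholder weights, and the zero projection matrix is used only to make the expected output checkable:

```python
import numpy as np

def forecast_head(h, W_out, mean, std):
    """h: (B, C, N, D) corrected features; W_out: (C*N*D, T*C).

    Flatten, project to the T-step horizon, reshape, and undo the RevIN
    scaling (mean/std are the statistics saved at normalization time).
    """
    B, C, N, D = h.shape
    T = W_out.shape[1] // C
    y = (h.reshape(B, -1) @ W_out).reshape(B, T, C)
    return y * std + mean                     # RevIN "denorm"

B, C, N, D, T = 2, 7, 11, 32, 96
h = np.random.randn(B, C, N, D)
y = forecast_head(h, np.zeros((C * N * D, T * C)),
                  mean=np.ones((B, 1, C)), std=np.ones((B, 1, C)))
print(y.shape)  # (2, 96, 7)
```

With a zero projection and unit statistics, the output is all ones, confirming that the denormalization step is applied after the linear head.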

2. Frequency-Domain Stationarity Correction

To address non-stationarity, crucial in industrial or environmental data under shifting regimes, the penultimate feature representations undergo frequency-domain alignment. This mechanism computes FFTs for both $\mathbf{H}_{\rm global}$ and the original patch inputs, calculates their power spectra, aligns autocorrelations, and scales features via an adaptive factor:

$$F_{\rm pred} = \mathrm{FFT}(H_{\rm global}),\quad F_{\rm input} = \mathrm{FFT}(X_{\rm patch})$$

$$S_{\rm pred} = F_{\rm pred} \odot F_{\rm pred}^*,\quad S_{\rm input} = F_{\rm input} \odot F_{\rm input}^*$$

$$\widehat{S}_{\rm pred} = \mathrm{clamp}(\mathrm{IFFT}(S_{\rm pred}), 0, \infty),\quad \widehat{S}_{\rm input} = \mathrm{clamp}(\mathrm{IFFT}(S_{\rm input}), 0, \infty)$$

$$\alpha = \sqrt{\frac{\sum \widehat{S}_{\rm pred}\, \widehat{S}_{\rm input}}{\sum \widehat{S}_{\rm input}^2 + \epsilon}}$$

$$H_{\rm final} = \alpha \odot H_{\rm global}$$

No auxiliary frequency-alignment loss is used; correction is in-line.
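A NumPy sketch of this correction under the assumption that the FFT runs along the patch axis and that $\alpha$ is reduced per (batch, channel) pair; the source does not pin down these axis choices, so treat them as illustrative:

```python
import numpy as np

def freq_correction(h_global, x_patch, eps=1e-8):
    """Scale features by an adaptive factor that aligns autocorrelations.

    h_global, x_patch: (B, C, N, D). The FFT is taken over the patch
    axis N; clamp(IFFT(power spectrum), 0, inf) is the (non-negative)
    autocorrelation by the Wiener-Khinchin relation.
    """
    F_pred = np.fft.fft(h_global, axis=2)
    F_input = np.fft.fft(x_patch, axis=2)
    S_pred = F_pred * np.conj(F_pred)         # power spectra
    S_input = F_input * np.conj(F_input)
    ac_pred = np.clip(np.fft.ifft(S_pred, axis=2).real, 0, None)
    ac_input = np.clip(np.fft.ifft(S_input, axis=2).real, 0, None)
    alpha = np.sqrt((ac_pred * ac_input).sum(axis=(2, 3), keepdims=True)
                    / ((ac_input ** 2).sum(axis=(2, 3), keepdims=True) + eps))
    return alpha * h_global

B, C, N, D = 2, 7, 11, 32
h = np.random.randn(B, C, N, D)
out = freq_correction(h, h)                   # identical spectra => alpha ~ 1
print(out.shape)  # (2, 7, 11, 32)
```

When the two spectra already match, $\alpha \approx 1$ and the features pass through unchanged, which is the desired behavior on stationary data.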

3. Mathematical Specification

All core operations are given with precise tensor semantics to support reimplementation.

  • Temporal Branch:

$$\tilde{H}_{\rm time} = \mathrm{GELU}(X \times_{\rm patch} W_{\rm time}),\quad H_{\rm time} = \mathrm{LayerNorm}(\tilde{H}_{\rm time} + X)$$

  • Channel Attention Branch:

$$Q = XW_Q,\ K = XW_K,\ V = XW_V,\quad A = \mathrm{softmax}\left(\frac{QK^T}{\sqrt{d_k}}\right),\quad \mathrm{out} = AV$$

$$H_{\rm channel} = \mathrm{LayerNorm}(\mathrm{Dropout}(\mathrm{out}) + X)$$

  • Global Inter-Patch Attention:

$$H_{\rm global} = \mathrm{LayerNorm}(\mathrm{Dropout}(\mathrm{MHA}_{\rm patch}(H_{\rm fused})) + H_{\rm fused})$$

  • Forecast Loss (MSE):

$$\mathcal{L} = \frac{1}{B} \sum_{b=1}^B \| \hat{Y}^{(b)} - Y^{(b)} \|_2^2$$

4. Optimization and Training Regime

Training employs the Adam optimizer with batch size 32, a learning rate of typically $10^{-3}$, and no weight decay, for 30–50 epochs (dataset dependent). Only the direct forecasting error (MSE) is optimized. The implementation supports chronological splits for standard MTS datasets.

5. Experimental Evaluation and Results

D-CTNet is validated on seven public MTS forecasting datasets: ETTm1, ETTm2, ETTh1, ETTh2, Exchange-Rate, Electricity, and Weather, using chronological train/val/test splits (6:2:2 or 7:1:2), with mean squared error (MSE) and mean absolute error (MAE) as evaluation metrics. Performance is averaged over forecast horizons $T \in \{96, 192, 336, 720\}$.
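A chronological split (no shuffling, so the test set always follows the training set in time) can be written in a few lines; the 7:1:2 ratios below are one of the two conventions cited above:

```python
import numpy as np

def chrono_split(series, ratios=(0.7, 0.1, 0.2)):
    """Chronological train/val/test split, as used for the ETT-style
    benchmarks: earlier samples train, later samples validate/test."""
    L = len(series)
    n_train = int(L * ratios[0])
    n_val = int(L * ratios[1])
    return (series[:n_train],
            series[n_train:n_train + n_val],
            series[n_train + n_val:])

x = np.arange(100)
tr, va, te = chrono_split(x)
print(len(tr), len(va), len(te))  # 70 10 20
```

Chronological splitting avoids look-ahead leakage, which random splits would introduce in forecasting evaluation.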

Dataset      D-CTNet (MSE)   Best Baseline (MSE)
ETTm1        0.398           PatchTST (0.402)
ETTm2        0.283           RLinear (0.286)
ETTh1        0.359           RLinear (0.446)
ETTh2        0.386           PatchTST (0.684)
Electricity  0.182           MSGNet (0.194)
Weather      0.247           MSGNet (0.249)
Exchange     0.347           PatchTST (0.367)

Full results, including per-horizon errors and MAE, are tabulated in the source.

6. Component Analysis and Ablative Insights

Comprehensive ablation studies demonstrate the impact of each architectural component:

  • Removal of the dual-branch structure (DBCT) leads to the greatest performance drop (e.g., MSE increasing from 0.340 to 0.401 on ETTh1).
  • Exclusion of the Global Patch Attention Fusion (GPAF) degrades performance (MSE to 0.386 on ETTh1).
  • Omission of Frequency-Domain Stationarity Correction (FSC) results in moderate accuracy loss (MSE to 0.361 on ETTh1).

The dual-branch decoupling of temporal and channel-wise dependencies emerges as the most critical contributor to accuracy, with global inter-patch fusion supporting robust long-horizon extrapolation (notably at $T=720$), and frequency-domain correction particularly benefiting non-stationary datasets such as Exchange-Rate.

7. Context and Significance

D-CTNet systematically addresses key challenges in multivariate forecasting for collaborative industrial applications: (1) decoupling intra-variable dynamics and inter-variable correlations, (2) extending long-range dependency modeling, and (3) improving model robustness to non-stationary environmental shifts. The architecture is directly comparable to Transformer-based and linear patch-wise models, incorporating innovations such as dual-branch parallelism and spectrum alignment. Its empirical superiority across diverse standard MTS benchmarks, together with detailed architectural and training specifications, makes it a reproducible framework and a candidate for practical deployment in forecasting-driven digital twin and industrial monitoring scenarios (Wang et al., 30 Nov 2025).
