
GAF-FusionNet: Multimodal ECG Analysis

Updated 25 February 2026
  • The paper presents a dual-branch network that combines 1D temporal signals and GAF-transformed spatial features through a dual-layer split-attention mechanism.
  • Temporal dynamics are captured with 1D CNN and bidirectional LSTM, while spatial patterns are extracted using a 2D CNN on GAF images.
  • Ablation studies confirm that the integrated attention and dual-modal fusion significantly boost ECG classification accuracy on multiple benchmark datasets.

GAF-FusionNet is a multimodal deep learning framework for electrocardiogram (ECG) analysis that integrates both temporal and spatial information by jointly processing the raw 1D ECG signal and a Gramian Angular Field (GAF) image encoding of the same segment. The network architecture relies on a dual-branch system: one branch applies 1D convolutional and recurrent neural layers to the raw signal, while the other processes a GAF-transformed image with a 2D CNN. Feature fusion is accomplished via a dual-layer cross-channel split-attention mechanism, enabling adaptive inter- and intra-modal integration for improved classification performance across diverse ECG datasets (Qin et al., 2024).

1. Background and Motivation

Conventional deep learning approaches to ECG classification are typically unimodal, relying either on a raw time-series representation or on simple feature-fusion strategies. Many do not fully exploit the complementary temporal and spatial features distributed across different representations of the same ECG segment. GAF-FusionNet addresses these challenges by encoding each ECG window as both a normalized time series and its GAF image. This dual representation allows the model to leverage fine-grained temporal patterns (such as the shape of QRS complexes) and global dynamic features (such as rhythm regularity), with the GAF providing a 2D structure that makes temporal dependencies more accessible to standard CNN architectures (Qin et al., 2024).

2. Gramian Angular Field (GAF) Transformation

The GAF transformation converts a 1D time series X = {x_1, x_2, …, x_n} into a 2D image in three stages:

  1. Rescaling: Each value is normalized to [-1, 1]:

x̃_j = (x_j − min(X)) / (max(X) − min(X)) × 2 − 1

  2. Angular Encoding: The normalized value is mapped to an angle:

φ_j = arccos(x̃_j)

  3. Gramian Matrix Computation:

    • Gramian Summation Field (used in GAF-FusionNet):

    G_S[i, j] = cos(φ_i + φ_j)

    • Gramian Difference Field (alternative, not used in the current implementation):

    G_D[i, j] = sin(φ_i − φ_j)

The G_S matrix serves as the 2D image input, preserving global and local temporal structure for the spatial-branch CNN (Qin et al., 2024).
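The three stages above can be sketched directly in NumPy. This is a minimal illustration of the transformation, not code from the paper; the function name and the `method` switch between the summation and difference fields are our own.

```python
import numpy as np

def gramian_angular_field(x, method="summation"):
    """Convert a 1D series into a GAF image (illustrative sketch)."""
    x = np.asarray(x, dtype=float)
    # 1. Rescale to [-1, 1]
    x_tilde = (x - x.min()) / (x.max() - x.min()) * 2.0 - 1.0
    # Guard against floating-point overshoot before arccos
    x_tilde = np.clip(x_tilde, -1.0, 1.0)
    # 2. Angular encoding
    phi = np.arccos(x_tilde)
    # 3. Gramian matrix via broadcasting over all (i, j) pairs
    if method == "summation":          # GASF, used by GAF-FusionNet
        return np.cos(phi[:, None] + phi[None, :])
    return np.sin(phi[:, None] - phi[None, :])  # GADF (alternative)
```

Note that the GASF is symmetric and its diagonal equals cos(2φ_i) = 2x̃_i² − 1, so the original (rescaled) series is recoverable from the image's main diagonal.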

3. Network Architecture and Feature Fusion

GAF-FusionNet consists of parallel temporal and spatial branches followed by dual-layer cross-channel split attention for feature fusion.

Temporal Branch:

  • Input: raw ECG segment S ∈ ℝ^(w×1).
  • Processing: several 1D convolutional layers with ReLU nonlinearities, followed by a bidirectional LSTM and global average pooling, producing a feature vector F_t.

Spatial Branch:

  • Input: GAF image G_S ∈ ℝ^(w×w).
  • Processing: a 2D CNN (ResNet-34 or a custom backbone), ending with global average pooling to produce a feature vector F_s.

Dual-Layer Split-Attention Fusion:

  • Layer 1: Intra-modality self-attention refines F_t and F_s independently, reweighting feature channels within each branch.

  • Layer 2: Cross-modality attention lets each branch attend to the other's refined features, capturing inter-modal dependencies.

  • Fused outputs: the attention-weighted temporal and spatial features are combined into a joint representation.

  • Classification: the fused representation is concatenated and passed to an MLP classifier with softmax for the final prediction.

This design enables nuanced modeling of both within-modality dependencies and inter-modality interactions, advancing beyond naive concatenation or averaging (Qin et al., 2024).
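The two fusion layers can be illustrated with a small NumPy sketch. The paper's exact attention formulation is not reproduced here; this sketch assumes a simple learned-projection-plus-softmax channel gating, and the weight matrices `W1`, `W2` and function names are illustrative placeholders.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def intra_attention(f, W):
    # Layer 1: self-attention within one modality (channel reweighting)
    return f * softmax(W @ f)

def cross_attention(f_a, f_b, W):
    # Layer 2: gate modality a by attention derived from modality b
    return f_a * softmax(W @ f_b)

rng = np.random.default_rng(0)
d = 8  # illustrative feature dimension
F_t, F_s = rng.standard_normal(d), rng.standard_normal(d)   # branch outputs
W1, W2 = rng.standard_normal((d, d)), rng.standard_normal((d, d))

# Layer 1: intra-modality refinement of each branch
F_t1, F_s1 = intra_attention(F_t, W1), intra_attention(F_s, W1)
# Layer 2: cross-modality interaction, then concatenation for the MLP head
F_fused = np.concatenate([cross_attention(F_t1, F_s1, W2),
                          cross_attention(F_s1, F_t1, W2)])
```

The concatenated `F_fused` vector would then feed the MLP classifier; in the actual model the attention weights are learned end-to-end rather than drawn at random.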

4. Training Methodology and Datasets

The network is trained end-to-end on standard ECG benchmarks (ECG200, ECG5000, MIT-BIH Arrhythmia Database) with the following protocol:

  • Preprocessing: Butterworth bandpass filtering, z-score normalization, segmentation into windows.
  • Optimizer: Adam with cosine-annealing learning rate scheduling, batch size of 64.
  • Spatial CNN Backbone: ResNet-34 (pretrained on ImageNet).
  • Regularization and Augmentation: Early stopping based on validation loss; data augmentation arises from window overlap.
  • Loss Function: Cross-entropy, with categorical labels.
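The preprocessing steps (bandpass filtering, z-score normalization, overlapping windowing) can be sketched with SciPy. The cutoff frequencies, filter order, and window/stride sizes below are illustrative defaults, not values reported in the paper; the sampling rate of 360 Hz matches the MIT-BIH recordings.

```python
import numpy as np
from scipy.signal import butter, filtfilt

def preprocess(sig, fs=360, low=0.5, high=40.0, win=256, step=128):
    """Filter, normalize, and window an ECG record (illustrative sketch)."""
    # Butterworth bandpass (cutoffs are assumptions, typical for ECG)
    b, a = butter(4, [low / (fs / 2), high / (fs / 2)], btype="band")
    sig = filtfilt(b, a, sig)
    # Z-score normalization
    sig = (sig - sig.mean()) / (sig.std() + 1e-8)
    # Overlapping windows; the overlap provides implicit augmentation
    starts = range(0, len(sig) - win + 1, step)
    return np.stack([sig[s:s + win] for s in starts])
```

Each returned window would then be fed to the temporal branch directly and to the spatial branch after the GAF transformation.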

GAF-FusionNet demonstrates robust generalization across small to large datasets, consistently outperforming state-of-the-art methods in classification accuracy (ECG200: 94.5%, ECG5000: 96.9%, MIT-BIH: 99.6%) (Qin et al., 2024).

5. Empirical Performance and Ablation Analysis

Comparative evaluation demonstrates that GAF-FusionNet outperforms leading models such as LSTM-FCN, Informer, Attention-CNN, and Multi-Scale CNN on all standard ECG datasets. Ablation studies reveal:

  • Removing dual-layer attention reduces MIT-BIH accuracy by 1.8%.
  • Removing the cross-channel module reduces accuracy by 1.5%.
  • Using only the time series or only the GAF branch reduces accuracy by 2.6% and 2.1%, respectively.

Method           ECG200 Acc.  ECG5000 Acc.  MIT-BIH Acc.
DNN (raw)        88.5 %       93.2 %        95.7 %
LSTM-FCN         91.0 %       94.1 %        96.3 %
Informer         91.5 %       94.8 %        97.1 %
Attention-CNN    92.0 %       95.3 %        97.5 %
Multi-Scale CNN  92.5 %       95.7 %        97.8 %
GAF-FusionNet    94.5 %       96.9 %        99.6 %

Performance gains trace directly to the hybrid exploitation of GAF-based spatial information, temporal encoding, and adaptive attention-based feature fusion (Qin et al., 2024).

6. Limitations and Future Directions

Despite superior benchmark performance, GAF-FusionNet currently faces two primary limitations:

  • Generality to Clinical Settings: Experiments are limited to public benchmarks. Extending the model to real-world clinical ECG datasets—characterized by broader noise/artifact distributions—remains an open challenge.
  • Computational Overhead: The spatial branch (a 2D CNN on w × w GAF images) and the dual-layer attention module introduce additional computational cost, particularly in large-scale deployment.

Future research directions include validation on prospective clinical and wearable-device ECG streams, exploration of lightweight attention and pruning techniques, and interpretability tools such as Grad-CAM and attention visualizations to facilitate clinical adoption (Qin et al., 2024).
