ECG-aBcDe: Universal ECG Encoding

Updated 19 December 2025
  • ECG-aBcDe is a universal ECG encoding paradigm that converts continuous signals into a symbolic language by alternating lower-case voltage tokens with upper-case interval tokens for explicit time-scale representation.
  • It ensures architecture-agnostic integration with LLMs through bidirectional mapping, allowing clinical attention heatmaps to be extracted and enhancing interpretability.
  • Empirical evaluations demonstrate substantial BLEU-4 gains (2.8×–3.9×) and <1% amplitude distortion, confirming the method’s robustness and precise waveform reconstruction.

ECG-aBcDe is a universal electrocardiogram (ECG) encoding paradigm that transcribes continuous ECG signals into a symbolic language interpretable by any LLM. The goal is to overcome historic limitations of model dependence, loss of time-scale information, and poor interpretability in ECG-to-LLM pipelines. ECG-aBcDe achieves architecture-agnostic integration of ECG analysis with LLMs, bidirectional mapping between waveforms and symbolic representations, and explicit encoding of both voltage and timing information inherent to clinical ECG interpretation. Performance metrics indicate substantial gains over prior methods—specifically BLEU-4 improvements of 2.8×–3.9× in in-dataset and cross-dataset scenarios—without requiring any LLM architecture modification (Xia et al., 16 Sep 2025).

1. Motivation and Problem Scope

Prior ECG-to-LLM conversion methods suffer from three main deficiencies: (a) model dependence, wherein ECG encoders are tailored to a single text encoder and must be retrained for new LLMs (as with MIM or contrastive objectives); (b) inability to learn critical time-scale information, since Transformers struggle to internalize explicit interval measurements (e.g., QRS width, RR interval); and (c) weak interpretability, due to the black-box nature of neural encoders and lack of back-projection from token-level attention to waveform segments. ECG-aBcDe directly responds to all three:

  • It encodes ECGs as sequences of symbolic tokens—alternating lower-case (voltage) and upper-case (interval)—creating a universal “ECG language” that any LLM can process after instruction tuning.
  • Time-scale data is mapped directly to distinct tokens, circumventing the Transformer’s limitations in positional encoding and “counting” tasks.
  • The mapping remains strictly bidirectional, allowing clinical attention heatmaps to be extracted by back-projecting token attention weights from the LLM to ECG segments, thereby enhancing interpretability (Xia et al., 16 Sep 2025).

2. Universal ECG Language: Vocabulary and Grammar

The ECG-aBcDe language leverages two distinct 26-character alphabets for representing each ECG lead:

  • Voltage alphabet $\mathcal{A}_L = \{a, b, \dots, z\}$ encodes quantized amplitude at waveform key-points.
  • Interval alphabet $\mathcal{A}_U = \{A, B, \dots, Z\}$ represents quantized durations between consecutive key-points.

A typical encoded ECG sequence alternates:

x_{\text{lang}} = v_0\, i_1\, v_1\, i_2\, v_2\, \dots\, i_K\, v_K

where $v_j \in \mathcal{A}_L$ and $i_j \in \mathcal{A}_U$.

Partitioning and mapping procedures are strictly defined:

  • Voltage cut-points $\{V_j\}$ divide the centralized amplitude range into 25 bins, referencing sample percentiles.
  • Interval cut-points $\{T_m\}$ partition all observed span lengths similarly.
  • The encoder mappings are:

f_V(x_i) = \mathcal{A}_L[\max\{j : x_i \leq V_j\}], \qquad f_T(\Delta_i) = \mathcal{A}_U[\max\{m : \Delta_i \leq T_m\}]

  • Decoding employs:

g_V(a_\ell) = V_{\mathrm{index}(a_\ell)}, \qquad g_T(A_m) = T_m

with conventions to avoid unbounded values ($V_{25}$ and $T_{25}$ are set to double their respective maximums).
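
The following minimal Python sketch shows how percentile-based cut-points and the $f_V$/$g_V$ pair could be realized; the stand-in data, the exact bin boundaries, and the top-bin handling are illustrative assumptions, not the authors' implementation ($f_T$/$g_T$ are analogous, applied to durations).

import numpy as np
import string

rng = np.random.default_rng(0)
key_point_voltages = rng.normal(size=5000)                             # stand-in amplitude samples
V = np.percentile(key_point_voltages, np.arange(1, 26) * 100 / 26)     # 25 cut-points
V[-1] = 2 * V[-1] if V[-1] > 0 else V[-1] / 2                          # keep the top bin bounded

def f_V(x):
    # Encode a voltage as a lowercase letter via its percentile bin.
    return string.ascii_lowercase[int(np.searchsorted(V, x))]

def g_V(a):
    # Decode a lowercase letter back to a representative cut-point voltage.
    return V[min(string.ascii_lowercase.index(a), len(V) - 1)]

print(f_V(0.3), round(g_V(f_V(0.3)), 3))                               # letter and its decoded value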

3. Explicit Encoding of Time-Scale Information

Rather than traditional positional encoding, ECG-aBcDe utilizes upper-case interval tokens to make timing explicit within the representation. For each pair of adjacent waveform key-points at indices $i_j$ and $i_{j+1}$ (with sampling rate $F$), the duration is quantized as:

\Delta_j = \frac{i_{j+1} - i_j}{F}

f_T(\Delta_j) = \mathcal{A}_U[\max\{m : \Delta_j \leq T_m\}]

This construction directly addresses the established deficiency of Transformers in “counting” problems by representing all time intervals as first-class sequence elements, not simple position-dependent features (Xia et al., 16 Sep 2025).
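
As a concrete illustration (with hypothetical bin boundaries), at $F = 250$ Hz two adjacent key-points 100 samples apart give $\Delta_j = 100/250 = 0.4$ s; if the learned cut-points place 0.4 s in the twelfth interval bin, the pair is separated in the sequence by the single uppercase token "L", so the model reads the duration directly rather than inferring it from token positions.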

4. Bidirectional Convertibility: Algorithms and Reconstruction

Both encoding (ECG → ECG language) and decoding (ECG language → ECG waveform) algorithms are explicitly documented:

Encoding:

Input: x ∈ ℝ^{F·T}  (one lead, sampling rate F)
id = L1_Trend_Filter(x)                  # key-point indices
x_key = x[id]                            # voltages at key-points
x_lang = []
for j in range(len(id) - 1):
    x_lang.append(f_V(x_key[j]))         # lowercase voltage token
    interval = (id[j+1] - id[j]) / F     # duration in seconds (see Section 3)
    x_lang.append(f_T(interval))         # uppercase interval token
x_lang.append(f_V(x_key[-1]))            # final lowercase token v_K
return x_lang

Decoding (reconstruction):

Input: x_lang (length 2K+1), sampling rate F
m = 0; x[0] = g_V(x_lang[0])
for j in range(1, K + 1):
    interval = g_T(x_lang[2*j - 1])      # decoded duration in seconds
    n = m + round(interval * F)          # next key-point index
    x[n] = g_V(x_lang[2*j])              # decoded voltage
    # linear interpolation between key-points m and n
    for k in range(m + 1, n):
        x[k] = x[m] + (k - m) / (n - m) * (x[n] - x[m])
    m = n
return x

Empirical evaluation demonstrates <1% amplitude distortion in reconstructed signals, and visual morphology is preserved in peak-to-peak alignment, supporting the information-theoretic sufficiency of the representation (Xia et al., 16 Sep 2025).
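
A self-contained round-trip sketch of the two procedures above is given below; the naive local-extrema detector (standing in for the L1 trend filter), the percentile cut-points, and the toy signal are illustrative assumptions, so the printed distortion figure only indicates how such a check could be run, not the paper's reported number.

import numpy as np
import string

F = 250                                                 # sampling rate (Hz)
t = np.arange(0, 10, 1 / F)
x = np.sin(2 * np.pi * 1.3 * t) + 0.2 * np.sin(2 * np.pi * 7 * t)   # toy signal

# 1) Key-points: local extrema plus both endpoints (stand-in for L1_Trend_Filter).
s = np.sign(np.diff(x))
idx = np.where(s[:-1] != s[1:])[0] + 1
idx = np.unique(np.concatenate(([0], idx, [len(x) - 1])))

# 2) Percentile cut-points for voltages and durations (seconds).
lo, up = string.ascii_lowercase, string.ascii_uppercase
q = np.arange(1, 26) * 100 / 26
V = np.percentile(x[idx], q)
T = np.percentile(np.diff(idx) / F, q)
f_V = lambda v: lo[int(np.searchsorted(V, v))]
f_T = lambda d: up[int(np.searchsorted(T, d))]
g_V = lambda a: V[min(lo.index(a), len(V) - 1)]
g_T = lambda A: T[min(up.index(A), len(T) - 1)]

# 3) Encode: alternating lowercase/uppercase tokens, ending on a voltage token.
x_lang = []
for j in range(len(idx) - 1):
    x_lang.append(f_V(x[idx[j]]))
    x_lang.append(f_T((idx[j + 1] - idx[j]) / F))
x_lang.append(f_V(x[idx[-1]]))

# 4) Decode: rebuild key-point positions and values, interpolate between them.
pos, vals, m = [0], [g_V(x_lang[0])], 0
for j in range(1, (len(x_lang) - 1) // 2 + 1):
    m += max(1, int(round(g_T(x_lang[2 * j - 1]) * F)))
    pos.append(m)
    vals.append(g_V(x_lang[2 * j]))
x_hat = np.interp(np.arange(pos[-1] + 1), pos, vals)

# 5) Amplitude distortion relative to the original peak-to-peak range.
n = min(len(x), len(x_hat))
err = np.max(np.abs(x[:n] - x_hat[:n])) / np.ptp(x)
print(f"{len(x_lang)} tokens, max amplitude error ~ {100 * err:.2f}% of peak-to-peak")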

5. Dataset Construction and Model Training

A large hybrid dataset is constructed by pairing ECG language encodings with natural-language QA prompts auto-generated for each ECG sample:

  • Training sources: PTB-XL and MIMIC-IV, each with ~20,000 ten-second, 12-lead ECGs.
  • For each ECG, seven question types are generated via ChatGPT, including single-choice, verification, and comparison tasks.
  • Training format includes the question type, QA prompt, ECG language sequence (bounded by <es>…<ed> tokens), and the answer (an illustrative record follows this list).
  • The total dataset scales to approximately 153 000 samples per source.
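
An illustrative record in this format might look like the following; the field names, question text, ECG-language excerpt, and answer are placeholders for illustration only, not the dataset's literal layout.

sample = {
    "question_type": "verification",
    "prompt": "Is the QRS complex in lead II widened?",
    "ecg_language": "<es> mCpBnDqA... <ed>",    # elided, hypothetical ECG-language sequence
    "answer": "No.",
}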

Preprocessing steps encompass lead reordering, notch/bandpass filters, wavelet denoising, and resampling to 250 Hz.

Fine-tuning follows a "construct once, use anywhere" protocol: pretrained LLMs are instruction-tuned using LoRA adapters, with the base text encoders frozen. Hyperparameters include LoRA rank $r=16$, $\alpha=32$, the AdamW optimizer, batch size 2, and a masked autoregressive cross-entropy loss:

L_{\mathrm{SFT}}(\theta) = -\frac{1}{N}\sum_{i=1}^{N} \sum_{t=|P^{(i)}|+1}^{|S^{(i)}|} \log P\left(s_t^{(i)} \mid s_{<t}^{(i)}; \theta\right)

The prompt/answer format generalizes across all examined LLM architectures (Llama 3.2-1B-Instruct, Gemma3, Qwen2.5) (Xia et al., 16 Sep 2025).
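
The fine-tuning step can be sketched with Hugging Face Transformers and PEFT as below; the model name, handling of the <es>/<ed> special tokens, and the prompt layout follow the description above, but the training-loop details are assumptions rather than the authors' released code. Masking prompt positions with -100 is what realizes the summation bound $t = |P^{(i)}|+1$ in $L_{\mathrm{SFT}}$.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

model_name = "meta-llama/Llama-3.2-1B-Instruct"           # any supported base LLM
tok = AutoTokenizer.from_pretrained(model_name)
tok.add_special_tokens({"additional_special_tokens": ["<es>", "<ed>"]})

model = AutoModelForCausalLM.from_pretrained(model_name)
model.resize_token_embeddings(len(tok))                   # account for the new special tokens
model = get_peft_model(model, LoraConfig(r=16, lora_alpha=32, task_type="CAUSAL_LM"))

def build_example(prompt, ecg_lang, answer):
    # Tokenize one QA sample; prompt positions get label -100 so the
    # cross-entropy loss is computed on answer tokens only (masked L_SFT).
    prompt_ids = tok(f"{prompt}\n<es>{ecg_lang}<ed>\n", add_special_tokens=False).input_ids
    answer_ids = tok(answer + tok.eos_token, add_special_tokens=False).input_ids
    input_ids = prompt_ids + answer_ids
    labels = [-100] * len(prompt_ids) + answer_ids
    return {"input_ids": torch.tensor([input_ids]), "labels": torch.tensor([labels])}

batch = build_example("Is the QRS complex widened?", "mCpBnD...", "No.")
loss = model(**batch).loss                                # an AdamW update would follow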

6. Interpretability via Attention Heatmaps

ECG-aBcDe provides full interpretability by leveraging bidirectional mapping for attention visualization:

  • Decoder-layer attention weights for each token are extracted.
  • Lowercase tokens are mapped to specific ECG key-points; uppercase tokens to intervals between key-points.
  • Attention is rendered as colored markers (key-points) and lines (segments) atop the reconstructed ECG waveform.

Empirical examples show that attention peaks correspond to clinically salient features—e.g., for SVT diagnosis, R-peak letters (such as "s,t,u,v") concentrate the highest attention weights, mirroring clinical prioritization of RR interval analysis (Xia et al., 16 Sep 2025).
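
A minimal sketch of this back-projection is given below, assuming a Hugging Face causal LM whose tokenizer keeps each ECG-language character as a single token; the pooling choices (last decoder layer, head-averaged, attention taken from the final position) and the plotting style are illustrative assumptions, not the paper's exact recipe.

import numpy as np
import torch
import matplotlib.pyplot as plt

def plot_attention_heatmap(model, tok_ids_prompt, tok_ids_ecg, keypoint_idx, x_rec):
    # keypoint_idx[k] is the sample index in x_rec of the k-th lowercase token's key-point.
    input_ids = torch.tensor([tok_ids_prompt + tok_ids_ecg])
    with torch.no_grad():
        out = model(input_ids, output_attentions=True)
    att = out.attentions[-1][0].mean(dim=0)                # last layer, averaged over heads
    scores = att[-1, len(tok_ids_prompt):].cpu().numpy()   # attention from final position onto ECG tokens

    plt.plot(x_rec, lw=0.8, color="black")
    # Lowercase (voltage) tokens -> markers at key-points, sized by attention weight.
    for i, s in zip(keypoint_idx, scores[0::2]):
        plt.scatter(i, x_rec[i], s=300 * float(s), color="red", alpha=0.6)
    # Uppercase (interval) tokens -> line segments between adjacent key-points.
    for (a, b), s in zip(zip(keypoint_idx[:-1], keypoint_idx[1:]), scores[1::2]):
        plt.plot([a, b], [x_rec[a], x_rec[b]], color="orange", alpha=min(1.0, 3 * float(s)))
    plt.show()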

7. Experimental Results and Comparative Performance

ECG-aBcDe demonstrates strong performance across metrics and scenarios:

Scenario | BLEU-4 | ROUGE-L | METEOR
PTB-XL → PTB-XL | 42.58 | 50.55 | 29.32
PTB-XL → MIMIC-IV | 30.76 | 38.57 | 22.56
Gemma3-1B (PTB-XL) | 44.14 | 52.61 | 30.33
Llama 3.2-1B (PTB-XL) | 42.58 | 50.55 | 29.32
Qwen2.5-0.5B (PTB-XL) | 34.14 | 47.93 | 24.63

Compared to prior methods (D-BETA, ECGBERT, MERL, ECG-Byte), ECG-aBcDe achieves 2.8× (in-distribution) and 3.9× (cross-dataset) BLEU-4 improvements. These results indicate high transferability, robustness to dataset shift, and suitability for universal instruction-tuning across LLMs (Xia et al., 16 Sep 2025). Ablation studies confirm that naive scaling of fine-tuning data does not enable Transformer models to accurately count long-range intervals, reinforcing the essential role of explicit time-token encodings.

8. Context and Implications

ECG-aBcDe establishes a paradigm for integrating continuous physiological signal analysis with the universal reasoning frameworks of LLMs. Its universal symbolic encoding, explicit time-scale representation, and bidirectional mapping overcome historic deficiencies in interpretability and transferability. As validated by substantial BLEU-4 improvements and successful cross-model adaptation, ECG-aBcDe provides a theoretically sufficient and practically scalable bridge between high-throughput ECG data and generalizable, interpretable LLM analysis (Xia et al., 16 Sep 2025).

A plausible implication is that this approach can inform analogous strategies in other time-series biomedical domains where architectural agnosticism and interpretability are required. This suggests further research into universal signal-to-language encodings and bidirectional mappings for additional physiological modalities.
