
Brain-Informed Graph Transformer Readout

Updated 22 May 2026
  • The paper introduces BIGTR, a modular graph transformer readout that integrates spatio-temporal fMRI data for neurodegenerative disease classification.
  • It employs unique token augmentation with type and node identifiers to infuse explicit spatio-temporal context without traditional position embeddings.
  • Reported results show gains over standard global pooling, with AUCs of 87.14% to 94.60% and accuracies of 84.62% to 89.23% across three diagnostic tasks.

Brain-Informed Graph Transformer Readout (BIGTR) is a modular component within the Brain Tokenized Graph Transformer (Brain TokenGT) framework, developed for longitudinal brain functional connectome (FC) embedding. Its primary purpose is to provide interpretable, graph-level readout for multitype tokenized embeddings representing brain regions, spatial edges, and temporal edges derived from longitudinal fMRI data, specifically aimed at applications such as neurodegenerative disease diagnosis and prognosis (Dong et al., 2023).

1. Role within the Brain TokenGT Pipeline

BIGTR operates as a downstream module, ingesting embeddings produced by the Graph Invariant and Variant Embedding (GIVE) module. GIVE generates the following tokenized vectors:

  • Node embeddings $x_v$ for each brain region $v$ at each time point $t$.
  • Spatial-edge embeddings $x_{(u,v)}$ for edges within each FC at each time point.
  • Temporal-edge embeddings $x_{(v,v')}$ that connect a node $v$ at time $t$ with itself at time $t+1$.

BIGTR's function is to process this heterogeneous collection of tokens, infuse them with spatio-temporal context using uniquely designed identifier embeddings, project them into a unified embedding space, introduce an explicit [graph] token, and read out a graph-level representation via a Transformer encoder. The final representation is used for classification tasks such as MCI vs. Control, MCI conversion prediction, and amyloid status discrimination (Dong et al., 2023).
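
As a rough illustration of the three token families listed above, the sketch below enumerates them for a toy longitudinal graph. The function name `enumerate_tokens`, the `(region, time)` tuple encoding, and the choice of a fully connected, undirected FC at each time point are illustrative assumptions, not details taken from Dong et al. (2023).

```python
from itertools import combinations

def enumerate_tokens(n_regions, n_timepoints):
    """List node, spatial-edge, and temporal-edge tokens for a toy longitudinal graph.

    Assumption: every FC is treated as fully connected and undirected.
    """
    # One node token per (region, time point) pair.
    nodes = [(v, t) for t in range(n_timepoints) for v in range(n_regions)]
    # Spatial edges connect two different regions at the same time point.
    spatial_edges = [((u, t), (v, t))
                     for t in range(n_timepoints)
                     for u, v in combinations(range(n_regions), 2)]
    # Temporal edges connect a region at time t with itself at time t + 1.
    temporal_edges = [((v, t), (v, t + 1))
                      for t in range(n_timepoints - 1)
                      for v in range(n_regions)]
    return nodes, spatial_edges, temporal_edges

nodes, spatial_edges, temporal_edges = enumerate_tokens(n_regions=3, n_timepoints=2)
print(len(nodes), len(spatial_edges), len(temporal_edges))  # prints: 6 6 3
```

With 3 regions and 2 time points, this toy setup yields 6 node tokens, 6 spatial-edge tokens, and 3 temporal-edge tokens, all of which would be passed to the readout together.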

2. Token Construction and Augmentation

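Each input token is represented as a concatenation of its original embedding, a type identifier, and node identifiers. Let $h$ denote the dimensionality of the GIVE embeddings. The augmentation mechanism is as follows:

  • Type Identifiers: A learnable matrix $P \in \mathbb{R}^{3 \times d_p}$ contains three distinct row vectors: $p_{\mathrm{node}}$ (for node tokens), $p_{\mathrm{sp}}$ (for spatial-edge tokens), and $p_{\mathrm{tmp}}$ (for temporal-edge tokens).
  • Node Identifiers: A fixed matrix $R$, with rows $r_v \in \mathbb{R}^{d_r}$, contains one-hot or orthonormal vectors, one for each brain region-time pair.

Token augmentations:

  • For node token $x_v$, form $[x_v, p_{\mathrm{node}}, r_v, r_v]$.
  • For spatial-edge token $x_{(u,v)}$, form $[x_{(u,v)}, p_{\mathrm{sp}}, r_u, r_v]$.
  • For temporal-edge token $x_{(v,v')}$ (linking $v$ at time $t$ with $v'$, the same region at time $t+1$), form $[x_{(v,v')}, p_{\mathrm{tmp}}, r_v, r_{v'}]$.

All such augmented tokens are stacked into $X \in \mathbb{R}^{m \times (h + d_p + 2 d_r)}$, where $m$ is the number of tokens. A learnable projection $W \in \mathbb{R}^{(h + d_p + 2 d_r) \times d}$ is applied: $Z = XW$. A learnable "[graph]" token $z_{[\mathrm{graph}]} \in \mathbb{R}^{d}$ is prepended to form the Transformer input $Z^{(0)} = [z_{[\mathrm{graph}]}; Z] \in \mathbb{R}^{(m+1) \times d}$ (Dong et al., 2023, Eqs. (4), (5)).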
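
The augmentation, projection, and [graph]-token steps described in this section can be sketched in NumPy as follows. This is a minimal sketch under stated assumptions rather than the authors' implementation: the sizes are arbitrary, the names `P`, `R`, `W`, and `augment` are illustrative, random arrays stand in for learned parameters, and node tokens are assumed to carry their own identifier twice so that every augmented token has the same width.

```python
import numpy as np

rng = np.random.default_rng(0)
h, d_p, d = 8, 4, 16            # illustrative sizes, not the paper's defaults
n_nodes = 6                     # region-time pairs, e.g. 3 regions x 2 time points
d_r = n_nodes                   # one-hot identifiers need one dimension per node

P = rng.normal(size=(3, d_p))   # type identifiers (learnable in practice)
R = np.eye(n_nodes, d_r)        # fixed one-hot node identifiers

NODE, SPATIAL, TEMPORAL = 0, 1, 2

def augment(x, token_type, u, v):
    """Concatenate [embedding, type identifier, identifier of u, identifier of v]."""
    return np.concatenate([x, P[token_type], R[u], R[v]])

tokens = [
    augment(rng.normal(size=h), NODE, 0, 0),      # node token: own identifier twice (assumption)
    augment(rng.normal(size=h), SPATIAL, 0, 1),   # spatial edge between nodes 0 and 1
    augment(rng.normal(size=h), TEMPORAL, 0, 3),  # temporal edge: node 0 to the same region at t + 1
]
X = np.stack(tokens)                              # (m, h + d_p + 2 * d_r)
W = rng.normal(size=(X.shape[1], d)) * 0.1        # projection (learnable in practice)
Z = X @ W                                         # (m, d)
graph_token = rng.normal(size=(1, d))             # [graph] token (learnable in practice)
Z0 = np.concatenate([graph_token, Z], axis=0)     # (m + 1, d), Transformer input
print(X.shape, Z0.shape)                          # prints: (3, 24) (4, 16)
```

Placing the [graph] token at index 0 is a convention of this sketch; it makes the later readout a simple row lookup.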

3. Transformer Encoder and Self-Attention Readout

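The Transformer encoder processes the input $Z^{(0)}$ through $L$ layers, each with $H$ attention heads. For each layer $l$ and head $k$:

$$A^{(l,k)} = \mathrm{softmax}\!\left(\frac{Q^{(l,k)} \big(K^{(l,k)}\big)^{\top}}{\sqrt{d/H}}\right), \qquad \mathrm{head}^{(l,k)} = A^{(l,k)} V^{(l,k)},$$

where $Q^{(l,k)} = Z^{(l-1)} W_Q^{(l,k)}$, $K^{(l,k)} = Z^{(l-1)} W_K^{(l,k)}$, $V^{(l,k)} = Z^{(l-1)} W_V^{(l,k)}$, and $d$ is the hidden size.

Outputs across heads are concatenated and linearly transformed with $W_O^{(l)}$, giving $\mathrm{MHA}(Z^{(l-1)}) = [\mathrm{head}^{(l,1)}, \ldots, \mathrm{head}^{(l,H)}]\, W_O^{(l)}$. Each layer applies Add & Norm, a position-wise feed-forward network (FFN), and another Add & Norm:

$$\tilde{Z}^{(l)} = \mathrm{LayerNorm}\big(Z^{(l-1)} + \mathrm{MHA}(Z^{(l-1)})\big), \qquad Z^{(l)} = \mathrm{LayerNorm}\big(\tilde{Z}^{(l)} + \mathrm{FFN}(\tilde{Z}^{(l)})\big)$$

After $L$ layers, the hidden state corresponding to the [graph] token, $z_{[\mathrm{graph}]}^{(L)}$ (the row of $Z^{(L)}$ at the [graph] position), is adopted as the graph-level embedding.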
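
To make the layer structure concrete, here is a small NumPy sketch of a post-norm encoder layer with multi-head self-attention, followed by the [graph]-token readout. It is an illustration under assumptions, not the paper's code: biases and the learnable LayerNorm gain and shift are omitted, the FFN uses ReLU (the source does not name the activation), dropout is left out, the parameter names are hypothetical, and the [graph] token is assumed to sit in row 0.

```python
import numpy as np

def softmax(a, axis=-1):
    a = a - a.max(axis=axis, keepdims=True)   # subtract the max for numerical stability
    e = np.exp(a)
    return e / e.sum(axis=axis, keepdims=True)

def layer_norm(z, eps=1e-5):
    # Normalise each token vector; learnable gain and shift omitted for brevity.
    mu = z.mean(axis=-1, keepdims=True)
    var = z.var(axis=-1, keepdims=True)
    return (z - mu) / np.sqrt(var + eps)

def encoder_layer(Z, params, n_heads):
    """One post-norm encoder layer; returns new hidden states and per-head attention."""
    n_tokens, d = Z.shape
    d_head = d // n_heads

    def split(M):
        # (n_tokens, d) -> (n_heads, n_tokens, d_head)
        return M.reshape(n_tokens, n_heads, d_head).transpose(1, 0, 2)

    Q, K, V = (split(Z @ params[name]) for name in ("W_q", "W_k", "W_v"))
    A = softmax(Q @ K.transpose(0, 2, 1) / np.sqrt(d_head))   # (n_heads, n_tokens, n_tokens)
    heads = (A @ V).transpose(1, 0, 2).reshape(n_tokens, d)    # concatenate heads
    Z = layer_norm(Z + heads @ params["W_o"])                  # Add & Norm
    ffn = np.maximum(Z @ params["W_1"], 0.0) @ params["W_2"]   # position-wise FFN (ReLU assumed)
    return layer_norm(Z + ffn), A                              # Add & Norm

rng = np.random.default_rng(0)
d, n_heads, d_ff, n_layers, n_tokens = 16, 4, 32, 2, 5        # illustrative sizes
shapes = {"W_q": (d, d), "W_k": (d, d), "W_v": (d, d), "W_o": (d, d),
          "W_1": (d, d_ff), "W_2": (d_ff, d)}

Z = rng.normal(size=(n_tokens, d))   # row 0 plays the role of the [graph] token
attn = []
for _ in range(n_layers):
    params = {name: rng.normal(size=shape) * 0.1 for name, shape in shapes.items()}
    Z, A = encoder_layer(Z, params, n_heads)
    attn.append(A)

graph_embedding = Z[0]               # hidden state of the [graph] token after the last layer
print(graph_embedding.shape, attn[0].shape)   # prints: (16,) (4, 5, 5)
```

Returning the attention matrices alongside the hidden states keeps them available for later inspection without a second forward pass.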

4. Readout Function and Classification

The final graph-level embedding $z_{[\mathrm{graph}]}^{(L)}$ undergoes a linear transformation and a sigmoid to produce the predicted probability for the class of interest:

$$\hat{y} = \sigma\big(w^{\top} z_{[\mathrm{graph}]}^{(L)} + b\big)$$

The model is trained using binary cross-entropy loss for downstream tasks (Dong et al., 2023).
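
The readout and loss can be sketched as below. The function names `predict_proba` and `binary_cross_entropy`, the batch of random embeddings, and the zero-initialised weights are assumptions made for illustration; the source specifies only a linear map, a sigmoid, and binary cross-entropy.

```python
import numpy as np

def predict_proba(z_graph, w, b):
    """Sigmoid of a linear map applied to graph-level embeddings of shape (batch, d)."""
    logits = z_graph @ w + b
    return 1.0 / (1.0 + np.exp(-logits))

def binary_cross_entropy(p, y, eps=1e-12):
    """Mean binary cross-entropy; probabilities are clipped to avoid log(0)."""
    p = np.clip(p, eps, 1.0 - eps)
    return float(np.mean(-(y * np.log(p) + (1.0 - y) * np.log(1.0 - p))))

rng = np.random.default_rng(0)
z_graph = rng.normal(size=(4, 16))      # four toy graph-level embeddings
y = np.array([1.0, 0.0, 1.0, 0.0])      # toy binary labels
w, b = np.zeros(16), 0.0                # zero-initialised classifier (assumption)

p = predict_proba(z_graph, w, b)
loss = binary_cross_entropy(p, y)
print(p, loss)
```

With zero weights every prediction is 0.5, so the loss equals $\ln 2 \approx 0.693$ regardless of the labels, which is a convenient sanity check before training.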

5. Spatio-Temporal Context Encoding

Unlike standard Transformer architectures, BIGTR does not employ sinusoidal or learned position embeddings. Instead, spatio-temporal context is encoded directly through the non-trainable node identifiers, which specify both the brain region and the time point of each token. The type identifier, the corresponding row of $P$, conveys token category, so the model can disambiguate node, spatial-edge, and temporal-edge tokens (Dong et al., 2023).
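
A minimal sketch of how fixed node identifiers can stand in for position embeddings is given below. The indexing scheme `node_index`, the toy sizes, and the use of a QR decomposition to obtain an orthonormal alternative to one-hot vectors are assumptions for illustration; the source states only that the identifiers are non-trainable and one-hot or orthonormal.

```python
import numpy as np

n_regions, n_timepoints = 3, 2
n_nodes = n_regions * n_timepoints

def node_index(region, t):
    """Map a (region, time point) pair to a row of the identifier matrix."""
    return t * n_regions + region

# Option 1: one-hot identifiers, one row per region-time pair.
one_hot_ids = np.eye(n_nodes)

# Option 2: orthonormal identifiers from the QR decomposition of a random square matrix.
rng = np.random.default_rng(0)
q, _ = np.linalg.qr(rng.normal(size=(n_nodes, n_nodes)))
orthonormal_ids = q   # q is orthogonal, so its rows are orthonormal

# Identifiers of the same node have inner product 1, different nodes have 0.
a = orthonormal_ids[node_index(region=0, t=0)]
b = orthonormal_ids[node_index(region=0, t=1)]
print(round(float(a @ a), 6), round(float(a @ b), 6))
```

Because the rows are mutually orthogonal, inner products between identifiers reveal whether two tokens share a node, which is the structural signal that position embeddings would otherwise have to supply.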

6. Interpretability Features

BIGTR supports interpretability through two avenues:

  • Token-level attention visualization: the attention weights associated with the [graph] token can be extracted directly. For each head $k$ in layer $l$, the row of the attention matrix $A^{(l,k)}$ corresponding to the [graph] token indicates which input tokens most strongly inform the graph-level representation. Averaging over heads and layers enables inspection of the ROIs and connections prioritized by the model on each instance.
  • Because tokens are augmented with explicit region, time, and type identifiers, attention diagnostics directly map to neurobiologically interpretable entities. This has enabled identification of salient regions such as parahippocampal edges in MCI conversion cases (Dong et al., 2023).
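
The attention-based inspection described above might look like the following sketch. The function `graph_token_attention`, the token labels, and the random attention matrices are hypothetical stand-ins; the [graph] token is assumed to occupy index 0, and real attention maps would come from the trained encoder.

```python
import numpy as np

def graph_token_attention(attn_per_layer):
    """Average the [graph]-token attention row over heads and layers.

    attn_per_layer: list of arrays shaped (n_heads, n_tokens, n_tokens),
    with the [graph] token assumed to sit at index 0.
    """
    rows = np.stack([A[:, 0, :] for A in attn_per_layer])   # (n_layers, n_heads, n_tokens)
    return rows.mean(axis=(0, 1))                            # (n_tokens,)

# Toy example: two layers, two heads, the [graph] token plus three labelled tokens.
labels = ["[graph]",
          "node: region 0 @ t0",
          "spatial edge: region 0 - region 1 @ t0",
          "temporal edge: region 0 @ t0 -> t1"]
rng = np.random.default_rng(0)
raw = rng.random(size=(2, 2, 4, 4))
attn = [layer / layer.sum(axis=-1, keepdims=True) for layer in raw]   # rows sum to 1

scores = graph_token_attention(attn)
ranking = sorted(zip(labels[1:], scores[1:]), key=lambda pair: -pair[1])
for name, score in ranking:
    print(f"{score:.3f}  {name}")
```

Since each token's identifiers name its region, time point, and type, the ranked labels can be read directly as brain regions and connections.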

7. Empirical Performance and Hyperparameterization

In five-fold cross-validation (repeated five times), the full GIVE + BIGTR pipeline demonstrated superior performance compared to standard global pooling and ablated variants on longitudinal fMRI datasets:

| Task | AUC (%) | Accuracy (%) |
|---|---|---|
| MCI vs. Control (ADNI HC vs. MCI) | 90.48 (±4.99) | 84.62 (±8.43) |
| MCI Conversion Prediction (OASIS-3) | 87.14 (±7.16) | 89.23 (±7.84) |
| Amyloid + vs. – Classification (OASIS-3) | 94.60 (±4.96) | 87.11 (±7.88) |

These results indicate a substantial gain attributed to the richer token-level readout enabled by BIGTR. The default configuration fixes the number of Transformer layers $L$, the number of attention heads $H$, the post-projection hidden size $d$, the type-ID dimension $d_p$, and the node-ID dimension $d_r$, together with a feed-forward inner dimension of 512 and dropout of 0.1 on both attention and FFN modules (Dong et al., 2023).
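
The evaluation protocol of five-fold cross-validation repeated five times can be sketched as an index generator. The function `repeated_kfold`, the seed, and the plain random shuffling are assumptions; the source does not say whether splits were stratified or grouped by subject.

```python
import numpy as np

def repeated_kfold(n_samples, n_splits=5, n_repeats=5, seed=0):
    """Yield (train_idx, test_idx) pairs for repeated k-fold cross-validation."""
    rng = np.random.default_rng(seed)
    for _ in range(n_repeats):
        perm = rng.permutation(n_samples)          # fresh shuffle for each repeat
        folds = np.array_split(perm, n_splits)
        for i in range(n_splits):
            test_idx = folds[i]
            train_idx = np.concatenate([folds[j] for j in range(n_splits) if j != i])
            yield train_idx, test_idx

splits = list(repeated_kfold(n_samples=20))
print(len(splits))   # prints: 25
```

Each of the 25 train/test pairs would produce one AUC and one accuracy value, which can then be summarised as a mean and a spread.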
