
Brain TokenGT: Graph Transformer for FC Analysis

Updated 22 May 2026
  • Brain TokenGT is an end-to-end framework that models the spatio-temporal evolution of brain functional connectomes for neurodegenerative diagnosis.
  • It employs invariant node embeddings and variant edge embeddings to capture stable regional features and dynamic connectivity changes over time.
  • Experimental results indicate significantly improved prediction of MCI and Alzheimer’s progression, with enhanced interpretability through HyperDrop edge selection.

Brain Tokenized Graph Transformer (Brain TokenGT) is an end-to-end interpretable framework designed to embed and analyze longitudinal brain functional connectome (FC) data for neurodegenerative disease diagnosis and prognosis. Diverging from traditional snapshot-based graph neural network (GNN) methods, Brain TokenGT directly models the spatio-temporal trajectories of FC evolution observed in longitudinal resting-state fMRI, enabling predictions that reflect disease progression at a fine-grained temporal and network level (Dong et al., 2023).

1. Motivation and Problem Formulation

In network-based studies of neurodegeneration, brain functional connectomes are represented as weighted graphs. Nodes correspond to predefined cortical and subcortical regions of interest (ROIs), and edges encode pairwise correlations between regional resting-state fMRI time series, typically Pearson correlations. Conventional GNNs operate on a single FC "snapshot" and therefore disregard the temporal evolution of connectivity that characterizes neurodegenerative progression (e.g., the Alzheimer’s disease continuum from amyloid-positive but cognitively normal, to mild cognitive impairment (MCI), to dementia).
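Here is a minimal sketch of the per-visit FC construction described above. It assumes the preprocessed regional time series are already in a NumPy array with one column per ROI. Zeroing the diagonal is this sketch's own assumption, not a documented Brain TokenGT choice.

```python
import numpy as np

def functional_connectome(ts: np.ndarray, zero_diagonal: bool = True) -> np.ndarray:
    """Pearson-correlation FC for one visit.

    ts: (n_timepoints, n_rois) regional BOLD time series, one column per ROI.
    Returns a symmetric (n_rois, n_rois) correlation matrix.
    """
    # np.corrcoef treats rows as variables, so pass ROIs as rows
    A = np.corrcoef(ts.T)
    if zero_diagonal:
        # Self-correlations are trivially 1; removing them is an assumption of this sketch
        np.fill_diagonal(A, 0.0)
    return A

# Hypothetical input: 140 volumes for the 90 AAL ROIs used in the experiments
rng = np.random.default_rng(0)
A = functional_connectome(rng.standard_normal((140, 90)))

# Toy check: ROI 1 is a scaled, shifted copy of ROI 0; ROI 2 is its negation
x = rng.standard_normal(100)
A_toy = functional_connectome(np.column_stack([x, 2 * x + 1, -x]))
```

Stacking one such matrix per visit gives the sequence of adjacency matrices $A_1, \dots, A_T$ that the model consumes.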

Brain TokenGT addresses the need to capture both spatial topology and temporal trajectory by embedding longitudinal sequences

$$\mathcal{G} = \{G_{1}, G_{2}, \dots, G_{T}\}, \quad G_{t}=(V,E,A_{t}),\; A_{t}\in\mathbb{R}^{M\times M}$$

with fixed node set $V = \{v_{1},\ldots,v_{M}\}$ and time-varying adjacency matrices $A_t$. The objective is to map each subject’s trajectory to a prediction $\hat y_{s}$ (diagnosis or conversion risk), leveraging temporal patterns in FC evolution that are inaccessible to snapshot pooling or per-visit classifiers.

2. Graph Invariant and Variant Embedding (GIVE)

GIVE consists of two intertwined streams:

  • Invariant Node Embedding (INE): Models the evolution of each ROI’s features while preserving node identity across time.
  • Variant Edge Embedding (VEE): Explicitly encodes spatio-temporal changes in edge connectivity, both within and between visits.

2.1 Invariant Node Embedding (INE)

Each node $v_i$ is characterized over time by its dynamic $K$-hop neighborhood subgraph. The GCN parameters for node $v_i$ evolve across visits: a GRU updates the weights of the $l$-th GCN layer from those of the previous visit,

$$W^{l}_{i,t} = \mathrm{GRU}\big(H^{l}_{i,t},\, W^{l}_{i,t-1}\big), \qquad H^{l+1}_{i,t} = \mathrm{GCN}\big(A_{i,t},\, H^{l}_{i,t},\, W^{l}_{i,t}\big),$$

where $A_{i,t}$ is the adjacency of the $K$-hop subgraph of $v_i$ at visit $t$. After $L$ layers, the output $H^{L}_{i,t}$ serves as the spatio-temporal embedding of $v_i$ at visit $t$. This embedding preserves node identity across visits while capturing evolving features.
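The recurrence above can be sketched in NumPy in the spirit of EvolveGCN-H. In that scheme, the GRU's hidden state is the GCN weight matrix itself, and its input is a top-$k$ summary of the current node features. This is an illustrative sketch, not the authors' code. The exact EvolveGCN variant, the summarization step, the absolute-value normalization of correlation weights, the ReLU, and all sizes are assumptions. The input is a single node's $K$-hop subgraph, taken as given.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def normalize_adj(A):
    """D^-1/2 (|A| + I) D^-1/2; absolute weights keep degrees positive (assumption)."""
    A_hat = np.abs(A) + np.eye(A.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(A_hat.sum(axis=1))
    return A_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def summarize(H, k, p):
    """Top-k summary of node features: (n, d_in) -> (d_in, k); requires n >= k."""
    scores = H @ p / np.linalg.norm(p)
    idx = np.argsort(scores)[-k:]
    return (H[idx] * np.tanh(scores[idx])[:, None]).T

class MatrixGRU:
    """GRU whose hidden state is a (d_in, d_out) GCN weight matrix."""
    def __init__(self, d_in, d_out):
        s = 1.0 / np.sqrt(d_in)
        self.U = {g: rng.uniform(-s, s, (d_in, d_in)) for g in "zrh"}
        self.V = {g: rng.uniform(-s, s, (d_in, d_in)) for g in "zrh"}
        self.b = {g: np.zeros((d_in, d_out)) for g in "zrh"}

    def __call__(self, X, W_prev):
        z = sigmoid(self.U["z"] @ X + self.V["z"] @ W_prev + self.b["z"])   # update gate
        r = sigmoid(self.U["r"] @ X + self.V["r"] @ W_prev + self.b["r"])   # reset gate
        W_cand = np.tanh(self.U["h"] @ X + self.V["h"] @ (r * W_prev) + self.b["h"])
        return (1.0 - z) * W_prev + z * W_cand

def evolving_gcn_layer(A_seq, H_seq, W0, gru, p):
    """One layer over T visits: W_t = GRU(summary(H_t), W_{t-1}); out_t = ReLU(norm(A_t) H_t W_t)."""
    W, outputs = W0, []
    for A_t, H_t in zip(A_seq, H_seq):
        W = gru(summarize(H_t, W.shape[1], p), W)
        outputs.append(np.maximum(normalize_adj(A_t) @ H_t @ W, 0.0))
    return outputs

# Hypothetical sizes: T = 3 visits, a 12-node K-hop subgraph, 8 -> 4 features
T, n, d_in, d_out = 3, 12, 8, 4
A_seq = []
for _ in range(T):
    R = rng.uniform(-1, 1, (n, n))
    A_t = (R + R.T) / 2            # symmetric, correlation-like weights
    np.fill_diagonal(A_t, 0.0)
    A_seq.append(A_t)
H_seq = [rng.standard_normal((n, d_in)) for _ in range(T)]   # placeholder node features
gru = MatrixGRU(d_in, d_out)
outs = evolving_gcn_layer(A_seq, H_seq, rng.standard_normal((d_in, d_out)), gru,
                          rng.standard_normal(d_in))
```

Stacking $L$ such layers, each with its own GRU, chains the outputs of one layer into the inputs of the next.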

2.2 Variant Edge Embedding (VEE)

A "giant graph" is assembled whose vertex set contains one copy $v_{i,t}$ of every node $v_i$ at every visit $t$ (that is, $M \times T$ vertices). It has two classes of edges:

  • Spatial edges (within-visit): connect $v_{i,t}$ and $v_{j,t}$, weighted by the functional connectivity $(A_t)_{ij}$.
  • Temporal edges (between-visit): connect the same ROI at consecutive visits, $v_{i,t}$ and $v_{i,t+1}$.
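The sketch below assembles the giant graph from a stack of per-visit adjacency matrices. It assumes the copy of ROI $i$ at visit $t$ is indexed as `t * M + i`, that temporal edges link only consecutive visits, and that temporal edges are unweighted. The optional `threshold` for sparsifying spatial edges is hypothetical.

```python
import numpy as np

def build_giant_graph(A_seq: np.ndarray, threshold: float = 0.0):
    """Edge lists of the giant graph for A_seq of shape (T, M, M).

    The copy of ROI i at visit t gets vertex index t * M + i.
    Returns (spatial_edges, spatial_weights, temporal_edges).
    """
    T, M, _ = A_seq.shape
    rows, cols = np.triu_indices(M, k=1)          # each undirected ROI pair once
    spatial, weights = [], []
    for t in range(T):
        w = A_seq[t, rows, cols]
        keep = np.abs(w) > threshold              # hypothetical sparsification
        spatial.append(np.column_stack([t * M + rows[keep], t * M + cols[keep]]))
        weights.append(w[keep])
    # Temporal edges: the same ROI at visits t and t + 1
    temporal = [np.column_stack([t * M + np.arange(M), (t + 1) * M + np.arange(M)])
                for t in range(T - 1)]
    temporal_edges = np.vstack(temporal) if temporal else np.empty((0, 2), dtype=int)
    return np.vstack(spatial), np.concatenate(weights), temporal_edges

A_seq = np.full((3, 4, 4), 0.5)   # toy input: 3 visits, 4 ROIs, all pairs connected
S, w, Tm = build_giant_graph(A_seq)
```

With 3 visits and 4 ROIs this yields 3 × 6 = 18 spatial edges and 2 × 4 = 8 temporal edges.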

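A dual hypergraph transformation (DHT) swaps the roles of nodes and edges. Every edge of the giant graph becomes a node of the dual hypergraph, and every original node becomes a hyperedge joining the edges incident to it. This allows edge embeddings to be learned by hypergraph convolution, which in its standard form reads

$$Z^{(l+1)} = \sigma\!\left(D_v^{-1/2}\, \mathbf{B}\, W_e\, D_e^{-1}\, \mathbf{B}^{\top} D_v^{-1/2}\, Z^{(l)}\, \Theta^{(l)}\right).$$

Here $\mathbf{B}$ is the incidence matrix of the dual hypergraph (the transpose of the giant graph's node-by-edge incidence matrix). $D_v$ and $D_e$ are its node and hyperedge degree matrices, $\Theta^{(l)}$ is a learnable weight matrix, and $W_e$ is a learnable diagonal hyperedge-weight matrix. After the final layer, each dual node carries the embedding of the corresponding edge of the giant graph.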
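The DHT and one hypergraph-convolution step can be sketched as follows. The sketch assumes an unweighted incidence matrix, positive hyperedge weights, and ReLU as the nonlinearity σ. The propagation rule is the standard hypergraph convolution, and the exact variant used in Brain TokenGT may differ.

```python
import numpy as np

def incidence_matrix(n_nodes, edges):
    """(n_nodes, n_edges) matrix with a 1 where a node is an endpoint of an edge."""
    B = np.zeros((n_nodes, len(edges)))
    for e, (u, v) in enumerate(edges):
        B[u, e] = B[v, e] = 1.0
    return B

def dual_hypergraph_incidence(B):
    """DHT: edges become dual nodes and nodes become hyperedges, so the dual incidence is B^T."""
    return B.T

def hypergraph_conv(Z, B_dual, hyperedge_w, Theta):
    """sigma(Dv^-1/2 B W_e De^-1 B^T Dv^-1/2 Z Theta) on the dual hypergraph.

    Z: (n_dual_nodes, d_in) edge features; B_dual: (n_dual_nodes, n_hyperedges);
    hyperedge_w: (n_hyperedges,) positive diagonal of W_e; Theta: (d_in, d_out).
    """
    Dv = B_dual @ hyperedge_w                   # weighted dual-node degrees
    De = np.maximum(B_dual.sum(axis=0), 1.0)    # hyperedge degrees (guard empty hyperedges)
    dv = 1.0 / np.sqrt(Dv)
    left = dv[:, None] * B_dual * (hyperedge_w / De)[None, :]   # Dv^-1/2 B W_e De^-1
    P = (left @ B_dual.T) * dv[None, :]                         # ... B^T Dv^-1/2
    return np.maximum(P @ Z @ Theta, 0.0)                       # sigma = ReLU (assumption)

# Toy graph: a triangle plus a pendant edge; each edge starts with its FC weight as feature
edges = [(0, 1), (1, 2), (0, 2), (2, 3)]
weights = np.array([[0.8], [0.3], [0.5], [0.9]])
B_dual = dual_hypergraph_incidence(incidence_matrix(4, edges))
rng = np.random.default_rng(0)
Z1 = hypergraph_conv(weights, B_dual, np.ones(4), rng.standard_normal((1, 8)))
```

Because every dual node is an edge, it always belongs to exactly two hyperedges (its endpoints), so $D_v$ is strictly positive whenever the hyperedge weights are.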

For interpretability, HyperDrop scores each edge with an additional hypergraph-convolution layer, ranks the scores, and keeps only the top-$k$ edges of each type (spatial and temporal). This helps identify diagnostically critical connections.
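Here is a minimal sketch of the per-type top-$k$ selection. It assumes the edge scores have already been produced (in Brain TokenGT, by the extra hypergraph-convolution layer). `k` is a hypothetical per-type budget.

```python
import numpy as np

def hyperdrop_select(scores, edge_type, k):
    """Indices of the k highest-scoring edges within each edge type, in ascending order."""
    keep = []
    for ty in np.unique(edge_type):
        idx = np.flatnonzero(edge_type == ty)
        ranked = idx[np.argsort(scores[idx])[::-1]]   # this type's edges, best first
        keep.append(ranked[:k])
    return np.sort(np.concatenate(keep))

scores = np.array([0.9, 0.1, 0.5, 0.7, 0.2, 0.8])
edge_type = np.array([0, 0, 0, 1, 1, 1])   # 0 = spatial, 1 = temporal
kept = hyperdrop_select(scores, edge_type, k=2)
```

Ranking within each type keeps both edge classes represented. This matters because spatial edges far outnumber temporal ones: a 90-ROI atlas gives 4,005 ROI pairs per visit but only 90 temporal links per visit transition.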

2.3 Embedding Tokenization

Each output from GIVE is regarded as a distinct token:

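  • Node tokens: the INE embeddings $H^{L}_{i,t}$, one per node–visit pair
  • Spatial edge tokens: the VEE embeddings of within-visit edges
  • Temporal edge tokens: the VEE embeddings of between-visit edges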

Tokens are concatenated with type and node identifiers for subsequent transformer-based integration.

3. Brain Informed Graph Transformer Readout (BIGTR)

BIGTR processes GIVE’s tokenized embeddings using a Transformer encoder, augmented by auxiliary identifiers to distinguish token type and absolute position.

3.1 Type and Node Identifiers

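Trainable type identifiers $E^{\text{node}}$, $E^{\text{sp}}$, and $E^{\text{tp}}$ designate whether a token is a node, a spatial edge, or a temporal edge. Node identifiers $P_{i,t}$ are fixed orthonormal vectors, one unique to each node–visit pair $(v_i, t)$.

Each token vector concatenates an embedding with the node identifiers of its endpoints and its type identifier. A node token repeats its own identifier, so every token has the same layout:

$$\mathbf{x}^{\text{node}}_{i,t} = \big[\,H^{L}_{i,t},\; P_{i,t},\; P_{i,t},\; E^{\text{node}}\,\big]$$

$$\mathbf{x}^{\text{sp}}_{ij,t} = \big[\,e^{\text{sp}}_{ij,t},\; P_{i,t},\; P_{j,t},\; E^{\text{sp}}\,\big]$$

$$\mathbf{x}^{\text{tp}}_{i,t} = \big[\,e^{\text{tp}}_{i,t},\; P_{i,t},\; P_{i,t+1},\; E^{\text{tp}}\,\big]$$

Here $e^{\text{sp}}_{ij,t}$ is the VEE embedding of the spatial edge between $v_i$ and $v_j$ at visit $t$, and $e^{\text{tp}}_{i,t}$ is the VEE embedding of the temporal edge linking $v_i$ at visits $t$ and $t+1$.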

A learned projection maps these vectors to a common hidden dimension, and a special "graph token" precedes all others to enable global aggregation.
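The token layout described above can be sketched as follows. The sketch assumes that node and edge embeddings share one dimension and that the graph token and projection are given. Node identifiers are random orthonormal vectors obtained by QR decomposition, which requires the identifier dimension to be at least the number of node–visit pairs. All names and sizes are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def orthonormal_identifiers(n, d):
    """n fixed orthonormal vectors of length d (d >= n), via QR of a Gaussian matrix."""
    Q, _ = np.linalg.qr(rng.standard_normal((d, n)))
    return Q.T                                    # (n, d), rows orthonormal

def make_tokens(emb, u, v, type_id, P):
    """Rows [embedding, P_u, P_v, type identifier]; node tokens pass u == v."""
    return np.hstack([emb, P[u], P[v], np.tile(type_id, (len(u), 1))])

def build_token_sequence(node_emb, sp_edges, sp_emb, tp_edges, tp_emb,
                         P, E_type, W_in, graph_token):
    nodes = np.arange(node_emb.shape[0])
    X = np.vstack([
        make_tokens(node_emb, nodes, nodes, E_type[0], P),
        make_tokens(sp_emb, sp_edges[:, 0], sp_edges[:, 1], E_type[1], P),
        make_tokens(tp_emb, tp_edges[:, 0], tp_edges[:, 1], E_type[2], P),
    ])
    return np.vstack([graph_token[None, :], X @ W_in])   # graph token first, then projected tokens

# Hypothetical sizes: M = 4 ROIs, T = 2 visits -> 8 node-visit pairs
M, T, d, d_p, d_e, d_model = 4, 2, 6, 8, 3, 16
P = orthonormal_identifiers(M * T, d_p)
E_type = rng.standard_normal((3, d_e))           # node / spatial / temporal (trainable in practice)
sp_edges = np.array([[0, 1], [4, 5]])            # within-visit pairs
tp_edges = np.array([[0, 4], [1, 5]])            # ROI i at visit 0 -> visit 1
tokens = build_token_sequence(
    rng.standard_normal((M * T, d)), sp_edges, rng.standard_normal((2, d)),
    tp_edges, rng.standard_normal((2, d)), P, E_type,
    rng.standard_normal((d + 2 * d_p + d_e, d_model)), rng.standard_normal(d_model))
```

The result has 1 + 8 + 2 + 2 = 13 tokens: the graph token, 8 node tokens, and 2 tokens of each edge type.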

3.2 Transformer Architecture and Attention

A stack of Transformer encoder layers (4 layers with 8 attention heads in the reported experiments) processes the token sequence with multi-head self-attention. Queries, keys, and values are computed via learned projections, and scaled dot-product attention is applied per head. The head outputs are concatenated and projected. The result then passes through a position-wise feedforward sub-network, with residual connections and layer normalization around each stage.

No separate positional encodings are required, because the node identifiers, unique to each node–visit pair, already inject both structural and temporal locality.
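Here is a minimal NumPy sketch of the encoder stack. It assumes a post-norm layout, no bias terms, and LayerNorm without learnable gain and bias. Whether Brain TokenGT uses pre-norm or post-norm is not stated here. The layer and head counts follow the reported 4 layers and 8 heads, while the model width and feedforward size are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)          # numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def layer_norm(X, eps=1e-5):
    mu, var = X.mean(axis=-1, keepdims=True), X.var(axis=-1, keepdims=True)
    return (X - mu) / np.sqrt(var + eps)

def multi_head_self_attention(X, p, n_heads):
    n, d = X.shape
    dh = d // n_heads

    def split(Y):                                    # (n, d) -> (heads, n, dh)
        return Y.reshape(n, n_heads, dh).transpose(1, 0, 2)

    Q, K, V = split(X @ p["Wq"]), split(X @ p["Wk"]), split(X @ p["Wv"])
    attn = softmax(Q @ K.transpose(0, 2, 1) / np.sqrt(dh))           # (heads, n, n)
    heads = (attn @ V).transpose(1, 0, 2).reshape(n, d)              # concatenate heads
    return heads @ p["Wo"], attn

def encoder_layer(X, p, n_heads):
    A, _ = multi_head_self_attention(X, p, n_heads)
    X = layer_norm(X + A)                                            # residual + norm
    F = np.maximum(X @ p["W1"], 0.0) @ p["W2"]                       # feedforward
    return layer_norm(X + F)

def init_layer(d, d_ff):
    s = 1.0 / np.sqrt(d)
    p = {k: rng.normal(0, s, (d, d)) for k in ("Wq", "Wk", "Wv", "Wo")}
    p["W1"], p["W2"] = rng.normal(0, s, (d, d_ff)), rng.normal(0, 1 / np.sqrt(d_ff), (d_ff, d))
    return p

n_tokens, d_model, n_heads, n_layers = 13, 32, 8, 4
X = rng.standard_normal((n_tokens, d_model))
layers = [init_layer(d_model, 64) for _ in range(n_layers)]
for p in layers:
    X = encoder_layer(X, p, n_heads)
_, attn = multi_head_self_attention(X, layers[0], n_heads)
```

No positional term is added to `X`; in this layout, position information would come only from the node identifiers already concatenated into each token.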

3.3 Final Prediction

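The representation of the special graph token after the final Transformer layer, $\mathbf{z}_s$, serves as the global trajectory embedding. It is decoded by a linear layer followed by a sigmoid (or a softmax for multi-class tasks) to predict disease state:

$$\hat y_s = \sigma\big(\mathbf{w}^{\top}\mathbf{z}_s + b\big)$$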
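The readout reduces to a linear map on the graph token. This sketch assumes the graph token sits at row 0 of the final-layer output. It uses a numerically stable sigmoid so that large-magnitude logits do not overflow. The weights `w` and `b` are hypothetical.

```python
import numpy as np

def stable_sigmoid(x):
    """Sigmoid that never exponentiates a large positive number."""
    x = np.asarray(x, dtype=float)
    e = np.exp(-np.abs(x))
    return np.where(x >= 0, 1.0 / (1.0 + e), e / (1.0 + e))

def predict(Z, w, b):
    """Z: (n_tokens, d) final Transformer outputs, graph token first."""
    z_graph = Z[0]                       # global trajectory embedding
    return stable_sigmoid(z_graph @ w + b)

Z = np.zeros((13, 32))
p_zero = predict(Z, np.ones(32), 0.0)   # zero logit -> probability 0.5
```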

4. Training Procedures and Interpretability

The framework is trained end-to-end with a binary cross-entropy loss over the $N$ labeled subjects:

$$\mathcal{L} = -\frac{1}{N}\sum_{s=1}^{N}\Big[y_s \log \hat y_s + (1-y_s)\log\big(1-\hat y_s\big)\Big]$$

HyperDrop provides built-in edge-level interpretability by dropping negligible edges and retaining those that contribute most to discrimination. No auxiliary regularization is used beyond Transformer dropout and weight decay.
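In practice, this loss is often computed from logits rather than probabilities so that $\log 0$ never occurs. The sketch below uses the identity $-[y\log\sigma(x) + (1-y)\log(1-\sigma(x))] = \log(1+e^{x}) - yx$. The source does not say whether Brain TokenGT uses this formulation; it is shown here as a common numerical choice.

```python
import numpy as np

def bce_with_logits(logits, y):
    """Mean binary cross-entropy computed from logits: log(1 + e^x) - y * x."""
    logits = np.asarray(logits, dtype=float)
    y = np.asarray(y, dtype=float)
    return float(np.mean(np.logaddexp(0.0, logits) - y * logits))

logits = np.array([2.0, -1.0, 0.5, -3.0])
labels = np.array([1, 0, 1, 1])
loss = bce_with_logits(logits, labels)

# Same value from the probability form of the loss
p = 1.0 / (1.0 + np.exp(-logits))
loss_direct = -np.mean(labels * np.log(p) + (1 - labels) * np.log(1 - p))
```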

5. Experimental Setup and Quantitative Results

Experiments employed two longitudinal resting-state fMRI datasets:

| Dataset | Task | Groups | Subjects per group | Visits per subject |
|---|---|---|---|---|
| ADNI | HC vs MCI | HC, MCI | 65, 60 | 2–3 |
| OASIS-3 | MCI non-converter vs converter | Non-converter, Converter | 31, 29 | 2–3 |
| OASIS-3 | Amyloid+ CN vs Amyloid– CN | Amyloid+, Amyloid– | 41, 50 | 2–3 |

All FCs were built on the 90 ROIs of the AAL atlas after standard preprocessing, with Pearson correlations forming the adjacency matrices. GIVE used 2 evolving-GCN layers and 2 hypergraph-convolution layers. BIGTR used a 4-layer, 8-head Transformer encoder with trainable type identifiers. Models were optimized with Adam with weight decay and evaluated with five-fold cross-validation.
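The source states only that five-fold cross-validation was used. The sketch below shows one common choice: folds stratified by label, so that small groups such as the 65 HC and 60 MCI subjects in ADNI are spread evenly. Stratification and the seed are assumptions. Because each sample is a whole subject trajectory, the folds are automatically subject-disjoint.

```python
import numpy as np

def stratified_kfold(labels, n_splits=5, seed=0):
    """Yield (train_idx, test_idx), dealing each class round-robin across folds."""
    labels = np.asarray(labels)
    rng = np.random.default_rng(seed)
    fold_of = np.empty(len(labels), dtype=int)
    for c in np.unique(labels):
        idx = rng.permutation(np.flatnonzero(labels == c))
        fold_of[idx] = np.arange(len(idx)) % n_splits
    for k in range(n_splits):
        yield np.flatnonzero(fold_of != k), np.flatnonzero(fold_of == k)

labels = np.array([0] * 65 + [1] * 60)      # ADNI HC vs MCI group sizes
folds = list(stratified_kfold(labels))
```

Each test fold then holds 13 HC and 12 MCI subjects.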

Brain TokenGT substantially outperformed the baselines:

  • HC vs MCI: shallow models reached AUC ≈ 60%, single-snapshot GNNs ≈ 68%, and dynamic deep models ≈ 82%, whereas Brain TokenGT reached AUC = 90.48% ± 4.99 and accuracy = 84.62% ± 8.43 (p < 0.05 versus all baselines).
  • MCI conversion and amyloid classification: AUCs of 87.14% and 94.60%, respectively.
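The AUC values above can be read as the probability that a randomly chosen positive subject receives a higher score than a randomly chosen negative one. Below is a small sketch of that rank-based definition, with ties counted as one half. The pairwise comparison it uses is adequate for cohorts of this size.

```python
import numpy as np

def roc_auc(y_true, scores):
    """P(score of a random positive > score of a random negative), ties counted as 1/2."""
    y_true = np.asarray(y_true, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    pos, neg = scores[y_true], scores[~y_true]
    diff = pos[:, None] - neg[None, :]          # all positive-negative pairs
    return (np.sum(diff > 0) + 0.5 * np.sum(diff == 0)) / (len(pos) * len(neg))

auc = roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
```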

6. Interpretability and Clinical Relevance

HyperDrop's edge scores localized the most discriminative FC connections and temporal changes. For AD progression, the top-ranked edges involved parahippocampal, temporal, and orbitofrontal ROIs, consistent with the network disruptions reported in early Alzheimer’s disease. For amyloid classification, superior frontal ROIs were most prominent.

Clinically, Brain TokenGT may help stratify at-risk individuals at predementia stages, including amyloid-positive cognitively normal and MCI populations. Its edge-level attribution of network changes aids interpretation, and it is suited to small longitudinal clinical datasets.

7. Limitations and Prospective Directions

Brain TokenGT currently assumes uniformly spaced visits and does not explicitly model variation in inter-visit intervals or temporal uncertainty. Future versions could incorporate continuous-time positional encodings or uncertainty modeling. HyperDrop's interpretability is binary (an edge is either retained or dropped), which motivates time-specific and node-level saliency measures. Application to larger, more heterogeneous longitudinal datasets is a plausible next step. The framework's feasibility on cohorts with few visits per subject also supports its integration into real-world clinical workflows (Dong et al., 2023).
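To illustrate the continuous-time direction mentioned above (a hypothetical extension, not part of Brain TokenGT), visit times in months could be mapped to sinusoidal features so that irregular gaps produce distinct encodings. The encoding dimension and the period range are assumptions.

```python
import numpy as np

def continuous_time_encoding(t_months, d=16, max_period=120.0):
    """Sinusoidal features of visit times; periods span 1 to max_period months."""
    t = np.asarray(t_months, dtype=float)[:, None]       # (T, 1)
    half = d // 2
    periods = max_period ** (np.arange(half) / max(half - 1, 1))
    angles = 2.0 * np.pi * t / periods[None, :]           # (T, half)
    return np.concatenate([np.sin(angles), np.cos(angles)], axis=1)

enc_a = continuous_time_encoding([0, 6, 24])    # irregular spacing
enc_b = continuous_time_encoding([0, 12, 24])   # uniform spacing
```

Visits at the same elapsed time get identical encodings, while a visit at month 6 and one at month 12 are distinguishable, which an index-based visit position cannot express.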
