Brain TokenGT: Graph Transformer for FC Analysis
- Brain TokenGT is an end-to-end framework that models the spatio-temporal evolution of brain functional connectomes for neurodegenerative diagnosis.
- It employs invariant node embeddings and variant edge embeddings to capture stable regional features and dynamic connectivity changes over time.
- Experimental results indicate significantly improved prediction of MCI and Alzheimer’s progression, with enhanced interpretability through HyperDrop edge selection.
Brain Tokenized Graph Transformer (Brain TokenGT) is an end-to-end interpretable framework designed to embed and analyze longitudinal brain functional connectome (FC) data for neurodegenerative disease diagnosis and prognosis. Diverging from traditional snapshot-based graph neural network (GNN) methods, Brain TokenGT directly models the spatio-temporal trajectories of FC evolution observed in longitudinal resting-state fMRI, enabling predictions that reflect disease progression at a fine-grained temporal and network level (Dong et al., 2023).
1. Motivation and Problem Formulation
In network-based studies of neurodegeneration, brain functional connectomes are represented as weighted graphs: nodes correspond to predefined cortical and subcortical regions of interest (ROIs), and edges encode pairwise correlations between regional resting-state fMRI time series, typically computed as Pearson correlations. Conventional GNNs operate on a single FC "snapshot," disregarding the temporal evolution of connectivity that typifies neurodegenerative progression (e.g., the course of Alzheimer’s disease from amyloid-positive but cognitively normal, to mild cognitive impairment (MCI), to dementia).
Brain TokenGT addresses the need to capture both spatial topology and temporal trajectory by embedding longitudinal sequences $\{G_t = (V, A_t)\}_{t=1}^{T}$ with fixed node set $V$ and time-varying adjacency matrices $A_t$. The objective is to map each subject’s trajectory to a prediction (diagnosis, conversion risk), leveraging temporal patterns in FC evolution that are inaccessible to snapshot pooling or per-visit classifiers.
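The trajectory described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' preprocessing pipeline: the function names, the random time series, and the choice to zero the diagonal are all assumptions made for the example.

```python
import numpy as np

def functional_connectome(ts: np.ndarray) -> np.ndarray:
    """Pearson-correlation FC from one visit's ROI time series.

    ts has shape (timepoints, n_rois); the result is (n_rois, n_rois).
    """
    fc = np.corrcoef(ts, rowvar=False)   # columns are ROIs
    np.fill_diagonal(fc, 0.0)            # assumption: drop self-connections
    return fc

def longitudinal_trajectory(visits):
    """Stack per-visit FCs into an array of shape (T, n_rois, n_rois)."""
    return np.stack([functional_connectome(ts) for ts in visits])

rng = np.random.default_rng(0)
# Hypothetical subject: 3 visits, 120 volumes each, 90 ROIs.
visits = [rng.standard_normal((120, 90)) for _ in range(3)]
A = longitudinal_trajectory(visits)      # A[t] plays the role of one visit's adjacency matrix
```

The node set is the same at every visit, so only the adjacency slice `A[t]` changes over time.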
2. Graph Invariant and Variant Embedding (GIVE)
GIVE consists of two intertwined streams:
- Invariant Node Embedding (INE): Models the evolution of each ROI’s features while preserving node identity across time.
- Variant Edge Embedding (VEE): Explicitly encodes spatio-temporal changes in edge connectivity, both within and between visits.
2.1 Invariant Node Embedding (INE)
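Each node $v$ is characterized over time by its dynamic $K$-hop neighborhood subgraphs. The evolution of GCN parameters for node $v$ is modeled as a recurrent GRU updating the $l$-th GCN layer’s weights from one visit to the next: $W_t^{(l)} = \mathrm{GRU}\big(H_t^{(l)}, W_{t-1}^{(l)}\big)$, where $H_t^{(l)}$ denotes the layer-$l$ node features at visit $t$. After $L$ layers, the output constitutes a spatio-temporal embedding $x_v$ of node $v$, preserving identity invariance while capturing evolving features.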
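A rough sketch of a GRU-evolved GCN layer follows. It assumes an EvolveGCN-H-style update, in which node features are summarized to the shape of the weight matrix and fed to a matrix GRU whose hidden state is the weight matrix itself. The summarization rule, the omission of biases, and all names and sizes are illustrative assumptions rather than details confirmed by the source.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def normalize_adj(A):
    """Symmetric GCN normalization of |A| with self-loops added."""
    A = np.abs(A) + np.eye(A.shape[0])
    d = A.sum(axis=1)
    return A / np.sqrt(np.outer(d, d))

def summarize(H, p, k):
    """Keep the k highest-scoring rows of H so the GRU input matches W's shape."""
    y = H @ p / np.linalg.norm(p)
    idx = np.argsort(-y)[:k]
    return (H[idx] * np.tanh(y[idx])[:, None]).T          # (d_in, k)

def gru_cell(X, W_prev, P):
    """Matrix GRU without biases: W_prev is the hidden state, X the input."""
    z = sigmoid(P["Wz"] @ X + P["Uz"] @ W_prev)           # update gate
    r = sigmoid(P["Wr"] @ X + P["Ur"] @ W_prev)           # reset gate
    cand = np.tanh(P["Wh"] @ X + P["Uh"] @ (r * W_prev))  # candidate weights
    return (1.0 - z) * W_prev + z * cand

def evolving_gcn_layer(adjs, feats, W0, P, p):
    """Apply one GCN layer at each visit while the GRU evolves its weights."""
    W, outputs = W0, []
    for A_t, H_t in zip(adjs, feats):
        W = gru_cell(summarize(H_t, p, W.shape[1]), W, P)  # W_t from W_{t-1}
        outputs.append(np.maximum(normalize_adj(A_t) @ H_t @ W, 0.0))  # ReLU
    return outputs

rng = np.random.default_rng(0)
n, d_in, d_out, T = 6, 4, 3, 3            # hypothetical subgraph and layer sizes
adjs = []
for _ in range(T):
    R = rng.uniform(-1, 1, (n, n))
    adjs.append((R + R.T) / 2)            # symmetric stand-in for an FC sub-block
feats = [rng.standard_normal((n, d_in)) for _ in range(T)]
P = {name: 0.1 * rng.standard_normal((d_in, d_in))
     for name in ("Wz", "Uz", "Wr", "Ur", "Wh", "Uh")}
outs = evolving_gcn_layer(adjs, feats, rng.standard_normal((d_in, d_out)),
                          P, rng.standard_normal(d_in))
```

Because the recurrence acts on the layer weights rather than on per-node hidden states, the same node keeps the same row position at every visit, which is one way to read the "invariant" in INE.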
2.2 Variant Edge Embedding (VEE)
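A "giant graph" $\mathcal{G} = (\mathcal{V}, \mathcal{E})$ is assembled with vertex set $\mathcal{V} = \{v_i^{t}\}$ for all nodes $i$ and time points $t$, supporting two classes of edges:
- Spatial edges ($\mathcal{E}_S$): connect $v_i^{t}$ and $v_j^{t}$ with weights $A_t(i, j)$ (within-visit).
- Temporal edges ($\mathcal{E}_T$): connect $v_i^{t}$ and $v_j^{t'}$ with $t \neq t'$ (between-visit).
A dual hypergraph transformation (DHT) swaps nodes and edges, enabling edge-centric embedding via hypergraph convolution: $E^{(l+1)} = D^{-1} M^{*} W B^{-1} M^{*\top} E^{(l)} \Theta^{(l)}$, where $E^{(l)}$ holds the layer-$l$ edge features, $M^{*}$ is the incidence matrix of the dual hypergraph, $D$ and $B$ are its dual-node and hyperedge degree matrices, $\Theta^{(l)}$ is a learnable projection, and $W$ is a learnable diagonal hyperedge-weight matrix. After $L$ layers, each dual-node corresponds to an edge embedding $x_e$.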
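The edge-centric step can be illustrated as follows. This sketch assumes the standard hypergraph-convolution form (degree-normalized propagation through the incidence matrix, with no nonlinearity) and takes the dual incidence matrix to be the transpose of the giant graph's node-by-edge incidence matrix. The toy graph, the choice of temporal edges (same ROI at consecutive visits), and all names are hypothetical.

```python
import numpy as np

def incidence_matrix(n_nodes, edges):
    """Node-by-edge incidence matrix M: M[v, e] = 1 if edge e touches node v."""
    M = np.zeros((n_nodes, len(edges)))
    for e, (u, v) in enumerate(edges):
        M[u, e] = 1.0
        M[v, e] = 1.0
    return M

def hypergraph_conv(E, M_dual, w, Theta):
    """One hypergraph convolution on the dual hypergraph.

    E:      (n_edges, d) features of the original edges (the dual nodes)
    M_dual: (n_edges, n_nodes) dual incidence matrix, i.e. M transposed
    w:      (n_nodes,) diagonal of the hyperedge-weight matrix
    Theta:  (d, d_out) projection (learned in practice)
    """
    D_inv = 1.0 / (M_dual @ w)             # inverse weighted dual-node degrees
    B_inv = 1.0 / M_dual.sum(axis=0)       # inverse hyperedge degrees
    msg = (w * B_inv)[:, None] * (M_dual.T @ E)   # aggregate edges per hyperedge
    return (D_inv[:, None] * (M_dual @ msg)) @ Theta

# Toy giant graph: 3 ROIs observed at 2 visits, node index = t * n_rois + i.
n_rois, n_visits = 3, 2
node = lambda i, t: t * n_rois + i
spatial = [(node(i, t), node(j, t))
           for t in range(n_visits)
           for i in range(n_rois) for j in range(i + 1, n_rois)]
temporal = [(node(i, 0), node(i, 1)) for i in range(n_rois)]   # assumed linking rule
edges = spatial + temporal

M = incidence_matrix(n_rois * n_visits, edges)                 # shape (6, 9)
rng = np.random.default_rng(0)
E0 = rng.standard_normal((len(edges), 1))                      # stand-in edge features
E1 = hypergraph_conv(E0, M.T, np.ones(M.shape[0]), rng.standard_normal((1, 4)))
```

After the transformation every original edge is a row of `E1`, so spatial and temporal connections receive embeddings of their own instead of being used only as message-passing weights.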
For interpretability, HyperDrop selects the most salient edges by ranking edge scores after an additional hypergraph-convolution layer and retaining the top-$k$ edges of each type, facilitating the identification of diagnostically critical connections.
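The selection step alone can be sketched as below. The scores are given by hand here; in the framework they would come from the extra hypergraph-convolution layer. The function name and the example values are hypothetical.

```python
import numpy as np

def top_k_by_type(scores, edge_types, k):
    """Return indices of the k highest-scoring edges within each edge type."""
    keep = {}
    for t in np.unique(edge_types):
        idx = np.flatnonzero(edge_types == t)            # edges of this type
        order = np.argsort(-scores[idx], kind="stable")  # descending score
        keep[str(t)] = idx[order[:k]]
    return keep

scores = np.array([0.9, 0.1, 0.5, 0.7, 0.3, 0.8])
edge_types = np.array(["spatial", "spatial", "spatial",
                       "temporal", "temporal", "temporal"])
kept = top_k_by_type(scores, edge_types, k=2)
# kept["spatial"]  -> array([0, 2])
# kept["temporal"] -> array([5, 3])
```

Ranking within each type keeps spatial and temporal edges from competing with one another for the same slots.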
2.3 Embedding Tokenization
Each output from GIVE is regarded as a distinct token:
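- Node tokens: $\{x_v\}$, the INE embeddings of all nodes
- Spatial edge tokens: $\{x_e : e \in \mathcal{E}_S\}$, the VEE embeddings of within-visit edges
- Temporal edge tokens: $\{x_e : e \in \mathcal{E}_T\}$, the VEE embeddings of between-visit edges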
Tokens are concatenated with type and node identifiers for subsequent transformer-based integration.
3. Brain Informed Graph Transformer Readout (BIGTR)
BIGTR processes GIVE’s tokenized embeddings using a Transformer encoder, augmented by auxiliary identifiers to distinguish token type and absolute position.
3.1 Type and Node Identifiers
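Trainable Type Identifiers ($\tau_V$, $\tau_S$, $\tau_T$) designate whether a token is a node, spatial edge, or temporal edge. Node Identifiers ($p_v$) are fixed, orthonormal positional vectors unique to each node–time tuple.
Token vectors are constructed as:
- Node $v$: $[x_v, p_v, p_v, \tau_V]$
- Spatial edge $e = (u, v)$: $[x_e, p_u, p_v, \tau_S]$
- Temporal edge $e = (u, v)$: $[x_e, p_u, p_v, \tau_T]$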
A learned projection maps these vectors to a common hidden dimension, and a special "graph token" precedes all others to enable global aggregation.
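A sketch of the token assembly follows, assuming a TokenGT-style layout in which each token concatenates its embedding, two node identifiers (repeated for a node, the two endpoints for an edge), and a type identifier. Identifier sizes, the QR-based construction of orthonormal vectors, and the random stand-ins for learned parameters are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n_rois, n_visits = 4, 2
n_nodes = n_rois * n_visits               # one giant-graph node per (ROI, visit)
d, d_type, d_model = 8, 3, 16             # hypothetical sizes

# Fixed orthonormal node identifiers: rows of a square orthogonal matrix.
P, _ = np.linalg.qr(rng.standard_normal((n_nodes, n_nodes)))

# Type identifiers would be trained; random values stand in for them here.
type_id = {k: rng.standard_normal(d_type) for k in ("node", "spatial", "temporal")}

def node_token(x_v, v):
    return np.concatenate([x_v, P[v], P[v], type_id["node"]])

def edge_token(x_e, u, v, kind):
    return np.concatenate([x_e, P[u], P[v], type_id[kind]])

node_emb = rng.standard_normal((n_nodes, d))        # stand-in for INE outputs
edges = [(0, 1, "spatial"), (4, 5, "spatial"), (0, 4, "temporal")]
edge_emb = rng.standard_normal((len(edges), d))     # stand-in for VEE outputs

tokens = np.stack(
    [node_token(node_emb[v], v) for v in range(n_nodes)]
    + [edge_token(edge_emb[i], u, v, kind) for i, (u, v, kind) in enumerate(edges)]
)
W_proj = rng.standard_normal((tokens.shape[1], d_model))   # learned in practice
graph_token = rng.standard_normal((1, d_model))            # learned in practice
sequence = np.vstack([graph_token, tokens @ W_proj])       # graph token comes first
```

With orthonormal identifiers, the dot product between two tokens' identifier parts is nonzero only when they share a node, which lets attention recover which edges are incident to which nodes.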
3.2 Transformer Architecture and Attention
A stack of $L$ Transformer encoder layers processes the token sequence using multi-head self-attention ($H$ heads). Queries, keys, and values are computed via learned projections, and scaled dot-product attention is applied per head. Head outputs are combined, normalized, and passed through feedforward sub-networks.
No separate position encodings are required: the node identifiers $p_v$ inject both structural and temporal locality.
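For reference, standard multi-head scaled dot-product self-attention over a token sequence can be written as below. This is the generic Transformer operation rather than code from the paper; residual connections, layer normalization, and the feedforward sub-network are left out, and the sizes are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)   # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_self_attention(X, Wq, Wk, Wv, Wo, n_heads):
    """X: (n_tokens, d_model); every weight matrix is (d_model, d_model)."""
    n, d_model = X.shape
    d_head = d_model // n_heads

    def split(M):
        # (n, d_model) -> (n_heads, n, d_head)
        return M.reshape(n, n_heads, d_head).transpose(1, 0, 2)

    Q, K, V = split(X @ Wq), split(X @ Wk), split(X @ Wv)
    attn = softmax(Q @ K.transpose(0, 2, 1) / np.sqrt(d_head))   # (n_heads, n, n)
    heads = attn @ V                                             # (n_heads, n, d_head)
    concat = heads.transpose(1, 0, 2).reshape(n, d_model)        # merge heads
    return concat @ Wo, attn

rng = np.random.default_rng(0)
n_tokens, d_model, n_heads = 12, 16, 8
X = rng.standard_normal((n_tokens, d_model))
Wq, Wk, Wv, Wo = (0.1 * rng.standard_normal((d_model, d_model)) for _ in range(4))
out, attn = multi_head_self_attention(X, Wq, Wk, Wv, Wo, n_heads)
```

Row 0 of each attention map belongs to the graph token when it is placed first, so that row shows how strongly the global summary attends to each node and edge token.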
3.3 Final Prediction
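The representation of the special graph token after the final Transformer layer serves as the global trajectory embedding, linearly decoded via sigmoid or softmax for disease state prediction: $\hat{y} = \sigma\big(w^{\top} z_{\mathrm{graph}} + b\big)$, where $z_{\mathrm{graph}}$ is the final graph-token representation and $\sigma$ is the sigmoid (replaced by a softmax in the multi-class case).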
4. Training Procedures and Interpretability
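The framework is trained end-to-end with a binary cross-entropy loss over labeled subjects: $\mathcal{L} = -\frac{1}{M} \sum_{i=1}^{M} \big[ y_i \log \hat{y}_i + (1 - y_i) \log (1 - \hat{y}_i) \big]$, where $M$ is the number of labeled subjects, $y_i$ the label, and $\hat{y}_i$ the predicted probability. HyperDrop provides built-in interpretability at the edge level by dropping negligible edges and retaining those contributing most to discrimination. No auxiliary regularization beyond Transformer dropout and weight decay is utilized.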
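The readout and loss can be checked numerically with a small example. The linear readout parameters and the graph-token embeddings below are hypothetical placeholders, and averaging over subjects is an assumption about the reduction.

```python
import numpy as np

def predict_proba(z_graph, w, b):
    """Sigmoid readout of graph-token embeddings z_graph, shape (n_subjects, d)."""
    return 1.0 / (1.0 + np.exp(-(z_graph @ w + b)))

def binary_cross_entropy(y_true, y_prob, eps=1e-12):
    """Mean binary cross-entropy; probabilities are clipped to avoid log(0)."""
    y_prob = np.clip(y_prob, eps, 1.0 - eps)
    return float(-np.mean(y_true * np.log(y_prob)
                          + (1.0 - y_true) * np.log(1.0 - y_prob)))

z_graph = np.zeros((4, 16))                              # placeholder embeddings
y_true = np.array([0.0, 1.0, 1.0, 0.0])
y_prob = predict_proba(z_graph, w=np.zeros(16), b=0.0)   # all 0.5 for zero weights
loss = binary_cross_entropy(y_true, y_prob)              # equals ln 2 in this case
```

An untrained, uninformative readout therefore starts at a loss of about 0.693, a convenient reference when monitoring training.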
5. Experimental Setup and Quantitative Results
Experiments employed two longitudinal resting-state fMRI datasets:
| Dataset | Task | Groups | N (per group) | Visits/Subject |
|---|---|---|---|---|
| ADNI | HC vs MCI | HC, MCI | 65, 60 | 2–3 |
| OASIS-3 | MCI non-converter vs converter | Non-conv., Conv. | 31, 29 | 2–3 |
| OASIS-3 | Amyloid+ CN vs Amyloid– CN | Amyloid+, Amyloid– | 41, 50 | 2–3 |
All FCs used 90 AAL atlas ROIs; Pearson correlations formed the adjacency matrices. Standard preprocessing pipelines were applied. GIVE used 2 evolving-GCN layers and 2 hypergraph-convolution layers; BIGTR used a 4-layer, 8-head Transformer with trainable type identifiers. Models were trained with the Adam optimizer and weight decay, and evaluated with five-fold cross-validation.
Brain TokenGT substantially outperformed benchmarks:
- HC vs MCI: Shallow models AUC ≈ 60%, single-snapshot GNNs ≈ 68%, dynamic deep models ≈ 82%, Brain TokenGT: AUC = 90.48% ± 4.99, Accuracy = 84.62% ± 8.43 (p < 0.05 vs all)
- MCI Conversion, Amyloid Classification: AUCs of 87.14% and 94.60%, respectively
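To make the reporting format concrete, the sketch below computes a per-fold AUC and summarizes it as mean ± standard deviation. It assumes stratified folds and that the ± values denote variation across folds; the source states neither. The scores are synthetic, so the printed number is not comparable to the reported results.

```python
import numpy as np

def roc_auc(y_true, y_score):
    """AUC as the probability that a positive outranks a negative (ties count 0.5)."""
    y_true, y_score = np.asarray(y_true), np.asarray(y_score)
    pos, neg = y_score[y_true == 1], y_score[y_true == 0]
    diff = pos[:, None] - neg[None, :]
    wins = (diff > 0).sum() + 0.5 * (diff == 0).sum()
    return float(wins / (len(pos) * len(neg)))

def stratified_folds(y, n_folds, rng):
    """Assign each sample a fold index so class proportions are roughly preserved."""
    folds = np.empty(len(y), dtype=int)
    for c in np.unique(y):
        idx = rng.permutation(np.flatnonzero(y == c))
        folds[idx] = np.arange(len(idx)) % n_folds
    return folds

rng = np.random.default_rng(0)
y = np.array([0] * 65 + [1] * 60)             # HC vs MCI group sizes
scores = y + rng.standard_normal(len(y))      # synthetic stand-in for model outputs
folds = stratified_folds(y, 5, rng)
aucs = np.array([roc_auc(y[folds == f], scores[folds == f]) for f in range(5)])
print(f"AUC = {100 * aucs.mean():.2f}% ± {100 * aucs.std(ddof=1):.2f}")
```

In an actual run the scores for each fold would come from a model trained on the other four folds. With only about 25 test subjects per fold, spreads of several percentage points are to be expected.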
6. Interpretability and Clinical Relevance
HyperDrop yielded edge scores that localize the most discriminative FC connections and temporal changes. The top-ranked discriminative edges for AD progression implicated parahippocampal, temporal, and orbitofrontal ROIs, aligning with established patterns of early network breakdown in Alzheimer’s disease; superior frontal ROIs were prominent for amyloid classification.
Clinically, Brain TokenGT has utility in stratifying at-risk individuals at predementia stages, including both amyloid-positive cognitively normal and MCI populations, with high interpretability in network change attribution and suitability for small-scale longitudinal clinical datasets.
7. Limitations and Prospective Directions
Brain TokenGT currently assumes uniformly spaced visits and does not explicitly model inter-visit interval variation or temporal uncertainty. Prospective enhancements could incorporate continuous-time positional encodings or uncertainty modeling. The interpretability provided by HyperDrop is binary (retain/drop edge-level significance), suggesting further development toward time-specific and node-level saliency metrics. Application to larger, more heterogeneous longitudinal datasets is a plausible next step, and integration with real-world clinical workflow is supported by feasibility on limited-visit cohorts (Dong et al., 2023).