Brain-Informed Graph Transformer Readout
- The paper introduces BIGTR, a modular graph transformer readout that integrates spatio-temporal fMRI data for neurodegenerative disease classification.
- It employs unique token augmentation with type and node identifiers to infuse explicit spatio-temporal context without traditional position embeddings.
- Empirical results show gains over standard pooling methods, with AUCs of 87.1 to 94.6% and accuracies of 84.6 to 89.2% across three diagnostic tasks.
Brain-Informed Graph Transformer Readout (BIGTR) is a modular component within the Brain Tokenized Graph Transformer (Brain TokenGT) framework, developed for longitudinal brain functional connectome (FC) embedding. Its primary purpose is to provide interpretable, graph-level readout for multitype tokenized embeddings representing brain regions, spatial edges, and temporal edges derived from longitudinal fMRI data, specifically aimed at applications such as neurodegenerative disease diagnosis and prognosis (Dong et al., 2023).
1. Role within the Brain TokenGT Pipeline
BIGTR operates as a downstream module, ingesting embeddings produced by the Graph Invariant and Variant Embedding (GIVE) module. GIVE generates the following tokenized vectors:
- Node embeddings for each brain region at each time point.
- Spatial-edge embeddings for edges within each FC at each time point.
- Temporal-edge embeddings that connect a node at time $t$ with itself at time $t+1$.
BIGTR's function is to process this heterogeneous collection of tokens, infuse them with spatio-temporal context using uniquely designed identifier embeddings, project them into a unified embedding space, introduce an explicit [graph] token, and read out a graph-level representation via a Transformer encoder. The final representation is used for classification tasks such as MCI vs. Control, MCI conversion prediction, and amyloid status discrimination (Dong et al., 2023).
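To make the heterogeneous token set concrete, the following minimal sketch enumerates the three token kinds for a toy longitudinal graph. The `Token` class, the `enumerate_tokens` helper, and the convention that temporal edges link consecutive time points are illustrative assumptions, not part of the published implementation.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Token:
    kind: str              # "node", "spatial", or "temporal"
    src: Tuple[int, int]   # (region, time) of the first endpoint
    dst: Tuple[int, int]   # (region, time) of the second endpoint


def enumerate_tokens(n_regions: int,
                     spatial_edges: List[List[Tuple[int, int]]]) -> List[Token]:
    """spatial_edges[t] lists the (i, j) region pairs kept in the FC at time t."""
    n_times = len(spatial_edges)
    tokens = []
    # One node token per (region, time) pair; both endpoints are the node itself.
    for t in range(n_times):
        for i in range(n_regions):
            tokens.append(Token("node", (i, t), (i, t)))
    # One spatial-edge token per retained FC edge at each time point.
    for t, edges in enumerate(spatial_edges):
        for i, j in edges:
            tokens.append(Token("spatial", (i, t), (j, t)))
    # One temporal-edge token per region between consecutive time points (assumed).
    for t in range(n_times - 1):
        for i in range(n_regions):
            tokens.append(Token("temporal", (i, t), (i, t + 1)))
    return tokens


tokens = enumerate_tokens(3, [[(0, 1)], [(0, 1), (1, 2)]])
counts = {k: sum(tok.kind == k for tok in tokens)
          for k in ("node", "spatial", "temporal")}
print(counts)
```

Under these assumptions, the token count grows as the number of regions times the number of time points, plus the number of retained spatial edges, plus one temporal edge per region per consecutive pair of scans.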
2. Token Construction and Augmentation
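Each input token is represented as a concatenation of its original embedding, a type identifier, and node identifiers. Let $d$ denote the dimensionality of the GIVE embeddings, $N$ the number of brain regions, and $T$ the number of time points. The augmentation mechanism is as follows:
- Type Identifiers: A learnable matrix $E \in \mathbb{R}^{3 \times d_e}$ contains three distinct vectors: $E_V$ (for node tokens), $E_S$ (for spatial-edge tokens), and $E_T$ (for temporal-edge tokens).
- Node Identifiers: A fixed matrix $P \in \mathbb{R}^{NT \times d_p}$ contains one-hot or orthonormal vectors $P_{i,t}$, one for each brain region-time pair $(i, t)$.
Token augmentations:
- For node token $h_{i,t}$, form $[h_{i,t}, P_{i,t}, P_{i,t}, E_V]$.
- For spatial-edge token $e_{ij,t}$, form $[e_{ij,t}, P_{i,t}, P_{j,t}, E_S]$.
- For temporal-edge token $e_{i,t \to t+1}$ (linking $(i, t)$ and $(i, t+1)$), form $[e_{i,t \to t+1}, P_{i,t}, P_{i,t+1}, E_T]$.
All such augmented tokens are stacked into $X \in \mathbb{R}^{M \times (d + 2d_p + d_e)}$, where $M$ is the total number of tokens. A learnable projection $W_{\text{in}} \in \mathbb{R}^{(d + 2d_p + d_e) \times d'}$ is applied: $X' = X W_{\text{in}}$. A learnable "[graph]" token $x_{[\text{graph}]} \in \mathbb{R}^{d'}$ is prepended to form the Transformer input $Z^{(0)} = [x_{[\text{graph}]}; X'] \in \mathbb{R}^{(M+1) \times d'}$ [(Dong et al., 2023), Eqs. (4), (5)].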
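The augmentation, projection, and [graph]-token steps can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions: the toy sizes, the one-hot choice for node identifiers, the concatenation order (embedding, two node identifiers, type identifier), and the names `pid` and `augment` are all hypothetical, and random arrays stand in for the GIVE embeddings and the learnable parameters.

```python
import numpy as np

rng = np.random.default_rng(0)
N, T = 4, 3                              # regions and time points (toy sizes)
d, d_p, d_e, d_model = 8, N * T, 2, 16   # hypothetical dimensions

P = np.eye(N * T)               # fixed one-hot node identifiers, one row per (region, time)
E = rng.normal(size=(3, d_e))   # stand-in for learnable type identifiers: node, spatial, temporal


def pid(i, t):
    """Node identifier of region i at time t."""
    return P[t * N + i]


def augment(x, src, dst, kind):
    """Concatenate an embedding with its two node identifiers and its type identifier."""
    return np.concatenate([x, pid(*src), pid(*dst), E[kind]])


rows = []
for t in range(T):                       # node tokens: both identifiers are the node itself
    for i in range(N):
        rows.append(augment(rng.normal(size=d), (i, t), (i, t), 0))
for t in range(T):                       # one toy spatial edge (0, 1) per time point
    rows.append(augment(rng.normal(size=d), (0, t), (1, t), 1))
for t in range(T - 1):                   # temporal edges between consecutive time points
    for i in range(N):
        rows.append(augment(rng.normal(size=d), (i, t), (i, t + 1), 2))

X = np.stack(rows)                                   # (M, d + 2*d_p + d_e)
W_in = rng.normal(size=(X.shape[1], d_model)) * 0.1  # stand-in for the learnable projection
graph_token = rng.normal(size=(1, d_model))          # stand-in for the learnable [graph] token
Z0 = np.concatenate([graph_token, X @ W_in], axis=0) # (M + 1, d_model)
print(X.shape, Z0.shape)
```

Because the projection is learned, the order in which the parts are concatenated does not change what the model can represent; it only fixes which rows of the projection matrix act on which part.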
3. Transformer Encoder and Self-Attention Readout
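The Transformer encoder processes the input $Z^{(0)}$ through $L$ layers and $H$ attention heads. For each layer $l$ and head $h$:

$$A^{(l,h)} = \mathrm{softmax}\!\left(\frac{Q^{(l,h)} {K^{(l,h)}}^{\top}}{\sqrt{d_k}}\right), \qquad \mathrm{head}^{(l,h)} = A^{(l,h)} V^{(l,h)},$$

where $Q^{(l,h)} = Z^{(l-1)} W_Q^{(l,h)}$, $K^{(l,h)} = Z^{(l-1)} W_K^{(l,h)}$, $V^{(l,h)} = Z^{(l-1)} W_V^{(l,h)}$, and $d_k$ is the per-head dimension.

Outputs across heads are concatenated and linearly transformed with $W_O^{(l)}$ to give the multi-head attention output $\mathrm{MHA}(Z^{(l-1)})$. Each layer applies Add & Norm, a position-wise feed-forward network (FFN), and another Add & Norm:

$$\tilde{Z}^{(l)} = \mathrm{LayerNorm}\!\left(Z^{(l-1)} + \mathrm{MHA}(Z^{(l-1)})\right), \qquad Z^{(l)} = \mathrm{LayerNorm}\!\left(\tilde{Z}^{(l)} + \mathrm{FFN}(\tilde{Z}^{(l)})\right)$$

After $L$ layers, the hidden state corresponding to the [graph] token, $z_{[\text{graph}]}^{(L)}$ (the first row of $Z^{(L)}$), is adopted as the graph-level embedding.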
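The encoder and [graph]-token readout described above can be sketched as a post-norm Transformer layer in NumPy. This is an illustrative sketch, not the authors' implementation: the ReLU activation, the omission of biases and of the LayerNorm gain and shift, the toy sizes, and the helper names `encoder_layer` and `init_params` are assumptions, and random matrices stand in for trained weights.

```python
import numpy as np


def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)   # subtract the max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def layer_norm(x, eps=1e-5):
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)


def encoder_layer(Z, params, n_heads):
    """One post-norm layer: self-attention, Add & Norm, FFN, Add & Norm."""
    n, d_model = Z.shape
    d_k = d_model // n_heads

    def split(W):
        # Project, then split into heads: (n_heads, n, d_k).
        return (Z @ W).reshape(n, n_heads, d_k).transpose(1, 0, 2)

    Q, K, V = split(params["W_q"]), split(params["W_k"]), split(params["W_v"])
    A = softmax(Q @ K.transpose(0, 2, 1) / np.sqrt(d_k))       # (n_heads, n, n)
    heads = A @ V                                               # (n_heads, n, d_k)
    concat = heads.transpose(1, 0, 2).reshape(n, d_model)       # concatenate heads
    Z = layer_norm(Z + concat @ params["W_o"])                  # Add & Norm
    ffn = np.maximum(Z @ params["W_1"], 0.0) @ params["W_2"]    # FFN with ReLU (assumed)
    return layer_norm(Z + ffn), A                               # Add & Norm


def init_params(rng, d_model, d_ff):
    scale = 1.0 / np.sqrt(d_model)
    shapes = {"W_q": (d_model, d_model), "W_k": (d_model, d_model),
              "W_v": (d_model, d_model), "W_o": (d_model, d_model),
              "W_1": (d_model, d_ff), "W_2": (d_ff, d_model)}
    return {name: rng.normal(size=shape) * scale for name, shape in shapes.items()}


rng = np.random.default_rng(0)
d_model, d_ff, n_heads, n_layers = 16, 32, 4, 2   # toy sizes, not the paper's defaults
Z = rng.normal(size=(24, d_model))                # row 0 plays the role of the [graph] token
attn = []
for _ in range(n_layers):
    Z, A = encoder_layer(Z, init_params(rng, d_model, d_ff), n_heads)
    attn.append(A)
z_graph = Z[0]                                    # graph-level embedding
print(z_graph.shape, attn[0].shape)
```

Returning the attention matrices alongside the hidden states keeps them available for later inspection of the [graph] row.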
4. Readout Function and Classification
The final graph-level embedding $z_{[\text{graph}]}^{(L)}$ undergoes a linear transformation and a sigmoid to produce the predicted probability for the class of interest:

$$\hat{y} = \sigma\!\left(w^{\top} z_{[\text{graph}]}^{(L)} + b\right)$$
The model is trained using binary cross-entropy loss for downstream tasks (Dong et al., 2023).
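As a small worked example of this readout and loss, the sketch below applies a linear layer and a sigmoid to two toy graph-level embeddings and evaluates the binary cross-entropy. The weights, embeddings, labels, and function names are made up for illustration only.

```python
import numpy as np


def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))


def predict_proba(z_graph, w, b):
    """Linear readout followed by a sigmoid; z_graph has shape (batch, d_model)."""
    return sigmoid(z_graph @ w + b)


def bce(y_true, p, eps=1e-12):
    """Mean binary cross-entropy, clipped to avoid log(0)."""
    p = np.clip(p, eps, 1.0 - eps)
    return float(-np.mean(y_true * np.log(p) + (1.0 - y_true) * np.log(1.0 - p)))


z = np.array([[0.5, -1.0, 2.0],     # toy graph-level embeddings for two subjects
              [0.0, 0.0, 0.0]])
w = np.array([1.0, 0.5, -0.25])     # hypothetical readout weights
b = 0.0
y = np.array([0.0, 1.0])            # toy labels

p = predict_proba(z, w, b)
loss = bce(y, p)
print(p, loss)
```

The first subject has logit $0.5 - 0.5 - 0.5 = -0.5$, so its predicted probability falls below 0.5; the all-zero embedding gives a logit of 0 and a probability of exactly 0.5.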
5. Spatio-Temporal Context Encoding
Unlike standard Transformer architectures, BIGTR does not employ sinusoidal or learned position embeddings. Instead, spatio-temporal context is encoded directly through the non-trainable node identifier matrix $P$, which provides explicit information on both the brain region and the time point. The type identifier matrix $E$ conveys token category, so the model can disambiguate between node, spatial-edge, and temporal-edge tokens (Dong et al., 2023).
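One way to see why orthonormal node identifiers can carry connectivity information is that the dot product between the identifier parts of two tokens is nonzero only when they share a region-time endpoint. The sketch below illustrates this with random orthonormal identifiers obtained from a QR decomposition; the sizes and the helper names `pid` and `ident` are illustrative assumptions, and the sketch does not claim to reproduce how the paper constructs its identifiers.

```python
import numpy as np

rng = np.random.default_rng(0)
N, T = 4, 3

# Random orthonormal identifiers: the Q factor of a square matrix is orthogonal,
# so its rows are mutually orthonormal.
P, _ = np.linalg.qr(rng.normal(size=(N * T, N * T)))


def pid(i, t):
    """Identifier of region i at time t."""
    return P[t * N + i]


def ident(src, dst):
    """Identifier part of a token: the two endpoint identifiers concatenated."""
    return np.concatenate([pid(*src), pid(*dst)])


node_a = ident((0, 1), (0, 1))       # node token for region 0 at time 1
edge_ab = ident((0, 1), (2, 1))      # spatial edge that starts at that node
edge_cd = ident((1, 1), (3, 1))      # spatial edge that does not touch it
node_a_prev = ident((0, 0), (0, 0))  # same region at a different time point

print(node_a @ edge_ab, node_a @ edge_cd, node_a @ node_a_prev)
```

The first product is 1 (a shared first endpoint) and the other two are 0 up to floating-point error, so an attention head has enough information to tell incident tokens apart without any sequential position signal.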
6. Interpretability Features
BIGTR supports interpretability through two avenues:
- Token-level attention visualization is obtained by extracting the attention weights associated with the [graph] token. Specifically, for each layer $l$ and head $h$, the row corresponding to the [graph] token in the attention matrix $A^{(l,h)}$ indicates which input tokens most strongly inform the graph-level representation. Averaging over heads and layers enables inspection of the ROIs and connections prioritized by the model for each instance.
- Because tokens are augmented with explicit region, time, and type identifiers, attention diagnostics map directly to neurobiologically interpretable entities. This has enabled identification of salient features such as parahippocampal edges in MCI conversion cases (Dong et al., 2023).
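The attention-based inspection described above can be sketched as follows: take the [graph] row of every attention matrix, average it over layers and heads, and rank the input tokens. The function names, the token labels, and the hand-built attention matrices are hypothetical; in practice the matrices would come from the trained encoder.

```python
import numpy as np


def graph_token_attention(attn_per_layer):
    """Average attention from the [graph] token (index 0) to every input token.

    attn_per_layer: list of arrays with shape (n_heads, n_tokens + 1, n_tokens + 1).
    """
    stacked = np.stack(attn_per_layer)     # (n_layers, n_heads, n + 1, n + 1)
    graph_rows = stacked[:, :, 0, 1:]      # [graph] row, dropping its self-attention column
    return graph_rows.mean(axis=(0, 1))    # (n_tokens,)


def top_tokens(scores, labels, k=3):
    """Return the k highest-scoring tokens with their scores, best first."""
    order = np.argsort(scores)[::-1][:k]
    return [(labels[i], float(scores[i])) for i in order]


# Toy attention: uniform everywhere, except that [graph] attends mostly to one edge token.
n_layers, n_heads, n_tokens = 2, 2, 4
A = np.full((n_heads, n_tokens + 1, n_tokens + 1), 1.0 / (n_tokens + 1))
A[:, 0, :] = [0.1, 0.1, 0.1, 0.6, 0.1]
attn_per_layer = [A.copy() for _ in range(n_layers)]
labels = ["node 0 @ t=0", "node 1 @ t=0", "edge 0-1 @ t=0", "edge 0 @ t=0->1"]

scores = graph_token_attention(attn_per_layer)
ranking = top_tokens(scores, labels, k=2)
print(ranking)
```

Because every token carries its region, time, and type, each ranked entry can be read directly as a brain region or connection at a given scan. Plain averaging is the simplest aggregation; it ignores how attention composes across layers.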
7. Empirical Performance and Hyperparameterization
In five-fold cross-validation (repeated five times), the full GIVE + BIGTR pipeline demonstrated superior performance compared to standard global pooling and ablated variants on longitudinal fMRI datasets:
| Task | AUC (%) | Accuracy (%) |
|---|---|---|
| MCI vs. Control (ADNI HC vs. MCI) | 90.48 (±4.99) | 84.62 (±8.43) |
| MCI Conversion Prediction (OASIS-3) | 87.14 (±7.16) | 89.23 (±7.84) |
| Amyloid + vs. – Classification (OASIS-3) | 94.60 (±4.96) | 87.11 (±7.88) |
Dong et al. (2023) attribute the gain over pooling baselines to the richer token-level readout enabled by BIGTR. The configurable hyperparameters are the number of Transformer layers, the number of attention heads, the post-projection hidden size, the type-ID dimension, and the node-ID dimension; the reported defaults also include a feed-forward inner dimension of 512 and a dropout rate of 0.1 on both the attention and FFN modules (Dong et al., 2023).
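The evaluation protocol of five-fold cross-validation repeated five times can be sketched with plain NumPy index splitting. This is a generic illustration: whether the original study stratified folds by label, and how the ± values in the table were computed, is not stated here, so the unstratified shuffling, the `summarize` helper, and the sample count of 23 are assumptions.

```python
import numpy as np


def repeated_kfold(n_samples, n_splits=5, n_repeats=5, seed=0):
    """Yield (repeat, fold, train_idx, test_idx) for repeated, unstratified k-fold CV."""
    rng = np.random.default_rng(seed)
    for rep in range(n_repeats):
        perm = rng.permutation(n_samples)        # fresh shuffle for each repeat
        folds = np.array_split(perm, n_splits)   # near-equal folds
        for k in range(n_splits):
            test_idx = folds[k]
            train_idx = np.concatenate([folds[j] for j in range(n_splits) if j != k])
            yield rep, k, train_idx, test_idx


def summarize(scores):
    """Mean and standard deviation over all folds and repeats."""
    scores = np.asarray(scores, dtype=float)
    return float(scores.mean()), float(scores.std())


splits = list(repeated_kfold(n_samples=23))
print(len(splits), len(splits[0][2]), len(splits[0][3]))
```

Each of the 25 splits would train one model and contribute one AUC and one accuracy value, which `summarize` then reduces to a mean and a spread.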