ASDFormer: Transformer for ASD Diagnosis

Updated 3 July 2026

ASDFormer is a transformer-based neuroimaging classifier that tokenizes ROI connectivity profiles for autism spectrum disorder diagnosis.
It utilizes a Mixture of Pooling-Classifier Experts decoder with sparse attention to highlight key brain regions driving ASD disruptions.
Empirical evaluation on the ABIDE dataset shows the best AUROC (81.17%), accuracy (74.60%), and sensitivity (82.55%) among the tested baselines, alongside built-in interpretability.

ASDFormer is a transformer-based neuroimaging classifier for robust autism spectrum disorder (ASD) diagnosis and biomarker discovery from resting-state fMRI connectome data. Specifically designed to resolve both diagnostic and interpretability challenges in functional connectome analysis, ASDFormer leverages attention-based modeling at the region-of-interest (ROI) level and introduces a Mixture of Pooling-Classifier Experts (MoE) decoding mechanism that pinpoints brain regions and network interactions most involved in autism-related disruptions (Izadi et al., 19 Aug 2025).

1. Functional Connectivity Representation and Tokenization

ASDFormer operates on second-order representations of brain function: for each subject, the resting-state fMRI time series are parcellated using the Craddock 200 atlas, yielding precisely $N=200$ ROIs grouped into 8 canonical functional communities. Pairwise Pearson correlations between all ROI time series produce symmetric $N \times N$ functional connectivity (FC) matrices per subject.
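For reference, the Pearson correlation between two ROI time series can be written as below. The symbols are introduced here for illustration and do not appear in the source: $x_i(t)$ is the signal of ROI $i$ at time point $t$, $T$ is the number of time points, and $\bar{x}_i$ is the temporal mean of ROI $i$.

$$\mathrm{FC}_{ij} = \frac{\sum_{t=1}^{T} \bigl(x_i(t) - \bar{x}_i\bigr)\bigl(x_j(t) - \bar{x}_j\bigr)}{\sqrt{\sum_{t=1}^{T} \bigl(x_i(t) - \bar{x}_i\bigr)^2}\,\sqrt{\sum_{t=1}^{T} \bigl(x_j(t) - \bar{x}_j\bigr)^2}}$$

By construction $\mathrm{FC}_{ij} = \mathrm{FC}_{ji}$ and $\mathrm{FC}_{ii} = 1$, which is why the resulting matrix is symmetric with a unit diagonal.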

Each ROI is then represented as its full connectivity profile (i.e., a single row or column of the FC matrix), providing an $N$-dimensional feature vector per ROI. These are collectively treated as a sequence of $N$ tokens per subject, embedding both local and global connectivity characteristics for each ROI. This tokenization strategy is critical for transformer-based modeling, as it exposes the model to patterns at both the micro-connectivity scale (individual ROI relationships) and the macro-connectivity scale (network/global dependencies).
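The tokenization described above can be sketched in a few lines of NumPy. This is a minimal illustration on synthetic data, not the authors' pipeline: the function name `connectivity_tokens`, the `(T, N)` input layout, and the NaN handling for constant signals are all assumptions made for the example.

```python
import numpy as np

def connectivity_tokens(timeseries: np.ndarray) -> np.ndarray:
    """Return an (N, N) array whose row i is the connectivity profile of ROI i.

    timeseries: array of shape (T, N), i.e. T time points for N ROIs
    (an assumed layout, chosen for this sketch).
    """
    # np.corrcoef treats rows as variables, so transpose to (N, T).
    fc = np.corrcoef(timeseries.T)
    # A constant (zero-variance) ROI signal yields NaN; map those to 0 here.
    return np.nan_to_num(fc, nan=0.0)

rng = np.random.default_rng(0)
ts = rng.standard_normal((120, 200))   # synthetic stand-in for one subject
tokens = connectivity_tokens(ts)       # 200 tokens, each 200-dimensional
```

Because the FC matrix is symmetric, reading tokens from rows or from columns gives the same sequence.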

2. Transformer Encoder Architecture

The central encoder is a standard transformer stack, adapted to the connectome domain. Each token (ROI profile) is projected by a shared multilayer perceptron (MLP) and layer-normalized, producing $N$ latent $d$-dimensional embeddings per subject.

Successive self-attention and feedforward layers contextualize each ROI embedding by integrating patterns across all connected regions, exploiting the transformer's capacity for modeling non-local interactions. Standard multi-head attention, GELU activation, and residual normalization are employed, with all hyperparameters (dimension, number of heads, etc.) tuned for the connectomic scale.
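To make the contextualization step concrete, the following sketch implements single-head scaled dot-product self-attention over ROI embeddings in NumPy. It is a simplified stand-in for one attention head only: the multi-head split, feedforward sublayer, residual paths, and normalization are omitted, and the embedding size of 16 and the random weights are illustrative assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the row maximum for numerical stability.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(h, w_q, w_k, w_v):
    """Single-head scaled dot-product attention over N ROI embeddings."""
    q, k, v = h @ w_q, h @ w_k, h @ w_v
    scores = q @ k.T / np.sqrt(k.shape[-1])   # (N, N) pairwise ROI affinities
    attn = softmax(scores, axis=-1)           # each row sums to 1
    return attn @ v, attn

rng = np.random.default_rng(0)
n_roi, d = 200, 16                            # d = 16 is illustrative only
h = rng.standard_normal((n_roi, d))
w_q, w_k, w_v = (rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(3))
out, attn = self_attention(h, w_q, w_k, w_v)
```

Row $i$ of `attn` shows how strongly ROI $i$ draws on every other ROI when its embedding is updated.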

The output is a contextualized set of $N$ ROI embeddings that encapsulate both the individual connectome structure and the mutual influences between brain regions, aligning with the hypothesis that ASD-linked pathology is distributed and network-based rather than reducible to isolated loci.

3. Mixture of Pooling-Classifier Experts (MoE) Decoder

A principal methodological contribution of ASDFormer is the Mixture of Experts decoder. Instead of relying on a single readout head for classification, ASDFormer introduces $E$ parallel "experts," each comprising:

A sparse attention-based pooling operator, which learns ROI-wise attention scores and selects the top-$k$ most informative ROIs post-encoding;
An expert-specific classifier MLP, which consumes the pooled ROI representation to generate expert-specific logits for ASD vs. control.

For a given subject, only the top-$k$ ROIs per expert are pooled, with attention scores normalized over the selected subset. The mixture is combined by a gating MLP network, which softmaxes over features reduced from all ROI embeddings to produce per-expert weights ($g_1, \dots, g_E$). The final output is a weighted sum of the expert predictions:

$$\hat{y} = \sum_{e=1}^{E} g_e \, \hat{y}_e$$

where $\hat{y}_e$ denotes the prediction of expert $e$. This architecture ensures that each expert specializes in different subpopulations or diagnostic presentations, with interpretability enforced by sparse selection of diagnosis-driving ROIs. The balance among experts is encouraged by a coefficient-of-variation regularization term, ensuring all experts contribute meaningfully.
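A minimal NumPy sketch of the decoding path described above follows. Several simplifications are assumptions of this example rather than details from the source: each expert's scorer and classifier are single linear maps instead of MLPs, the gate reduces ROI embeddings by a plain mean, and the balance term uses the squared coefficient of variation of summed gate weights, a form common in sparse mixture-of-experts work. All function and variable names are hypothetical.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def sparse_topk_pool(h, score_vec, k):
    """Pool the k highest-scoring ROI embeddings with renormalized attention."""
    scores = h @ score_vec                  # (N,) one score per ROI
    top = np.argsort(scores)[-k:]           # indices of the k largest scores
    weights = softmax(scores[top])          # normalized over the selected subset only
    return weights @ h[top], top            # (d,) pooled vector, selected ROI indices

def moe_decode(h, score_vecs, clf_weights, gate_weights, k):
    """Weighted sum of E expert logits; returns logits, gate weights, ROI picks."""
    logits, selected = [], []
    for s, w in zip(score_vecs, clf_weights):
        pooled, top = sparse_topk_pool(h, s, k)
        logits.append(pooled @ w)           # linear stand-in for the expert MLP
        selected.append(top)
    g = softmax(h.mean(axis=0) @ gate_weights)   # (E,) gate on mean-reduced features
    return g @ np.stack(logits), g, selected

def cv_squared(g_batch):
    """Squared coefficient of variation of per-expert gate mass over a batch."""
    importance = g_batch.sum(axis=0)
    return importance.var() / (importance.mean() ** 2 + 1e-10)

rng = np.random.default_rng(0)
N, d, E, k = 200, 16, 4, 10                 # illustrative sizes
h = rng.standard_normal((N, d))
score_vecs = rng.standard_normal((E, d))
clf_weights = rng.standard_normal((E, d, 2))
gate_weights = rng.standard_normal((d, E))
y_hat, g, selected = moe_decode(h, score_vecs, clf_weights, gate_weights, k)
```

The `selected` list is what makes the decoder inspectable: it records, per expert, exactly which ROIs contributed to that expert's logits. The balance term is zero when every expert receives the same total gate mass and grows as the gate concentrates on a few experts.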

4. Implementation: Data Pipeline, Training, and Evaluation

ASDFormer is evaluated on the ABIDE rs-fMRI dataset (1009 subjects, 51.14% ASD, 17 sites), preprocessed via the CPAC pipeline (bandpass 0.01–0.1 Hz, no global signal regression), with Craddock 200 atlas parcellation. Model development employs stratified sampling: 70% train, 10% validation, 20% test. The transformer is implemented in PyTorch with typical settings of 8 heads, 200-d embedding, Adam optimizer, and batch size of 64.
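The 70/10/20 split can be illustrated with a small standard-library sketch. The source does not say which variable the sampling is stratified on, so stratifying by diagnosis label is an assumption here, as is the helper name `stratified_split`. The class counts (516 ASD, 493 control) are derived from the stated 1009 subjects and 51.14% ASD rate.

```python
import random
from collections import defaultdict

def stratified_split(labels, fractions=(0.7, 0.1, 0.2), seed=0):
    """Split sample indices into train/val/test, preserving class proportions."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, y in enumerate(labels):
        by_class[y].append(idx)
    splits = ([], [], [])
    for indices in by_class.values():
        rng.shuffle(indices)
        n_train = round(fractions[0] * len(indices))
        n_val = round(fractions[1] * len(indices))
        splits[0].extend(indices[:n_train])
        splits[1].extend(indices[n_train:n_train + n_val])
        splits[2].extend(indices[n_train + n_val:])   # remainder goes to test
    return splits

labels = [1] * 516 + [0] * 493   # 1 = ASD, 0 = control; 516/1009 is about 51.14%
train, val, test = stratified_split(labels)
```

With these counts the split yields 706 training, 101 validation, and 202 test subjects, each with close to the overall ASD proportion.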

Training minimizes cross-entropy classification loss plus expert balance regularization, with early stopping on validation AUROC. Five independent runs are used for statistical robustness. Main metrics are AUROC, accuracy, sensitivity, and specificity.
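For readers unfamiliar with the four metrics, the sketch below computes them from scratch on a toy example. The function names and the 0.5 decision threshold are choices made for this illustration; the AUROC is computed as the fraction of (positive, negative) pairs ranked correctly, with ties counted as one half.

```python
def confusion_metrics(y_true, y_pred):
    """Accuracy, sensitivity (recall on ASD), and specificity (recall on controls)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }

def auroc(y_true, scores):
    """Probability that a random positive outscores a random negative."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

y_true = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2]      # toy predicted ASD probabilities
y_pred = [int(s >= 0.5) for s in scores]     # threshold of 0.5 is an assumption
metrics = confusion_metrics(y_true, y_pred)
auc = auroc(y_true, scores)
```

In this toy case, sensitivity and specificity are both 2/3 while the AUROC is 8/9, which shows that the threshold-free AUROC can differ markedly from metrics tied to one operating point.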

5. Diagnostic Results and Empirical Performance

ASDFormer outperforms the tested baselines on three of the four reported metrics:

Model	AUROC (%)	Accuracy (%)	Sensitivity (%)	Specificity (%)
FBNETGNN	72.64 ± 8.91	65.60 ± 9.72	62.19 ± 12.10	67.53 ± 13.86
BrainNetCNN	76.45 ± 4.54	68.90 ± 5.48	66.45 ± 13.83	70.99 ± 7.26
BrainNetTF	77.58 ± 6.40	68.00 ± 5.05	77.95 ± 13.62	58.89 ± 20.12
Com-BrainTF	78.77 ± 3.89	69.60 ± 4.07	74.50 ± 8.73	65.76 ± 10.16
CNN-FC	72.91 ± 5.66	66.99 ± 7.68	71.27 ± 7.23	62.33 ± 10.16
ASDFormer	81.17 ± 5.00	74.60 ± 4.83	82.55 ± 10.19	66.09 ± 4.74

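ASDFormer surpasses the alternatives in AUROC, accuracy, and especially sensitivity, supporting its claim of state-of-the-art ASD connectomic classification (Izadi et al., 19 Aug 2025). Its specificity (66.09) is lower than that of BrainNetCNN (70.99) and FBNETGNN (67.53), although it shows the smallest spread across runs.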

6. Interpretability and Biomarker Discovery

Interpretability is explicit in ASDFormer. Each expert's diagnosis can be directly explained by the subset of ROIs it selects via sparse attention pooling. The transformer’s attention matrices provide additional information on how ROI features influence each other in the network context. Gating weights reveal subject-specific expert specializations: for example, some experts' predictions are heavily favored for ASD cases, others for typical controls.

Key findings include:

Repeated implication of the default mode network (DMN), sensorimotor network (SMN), and limbic systems in ASD predictions.
Recurring identification of intra-network (e.g., DMN–DMN) and inter-network (SMN–FPN, SMN–DMN, DMN–FPN, DMN–Limbic) connectivity disruptions.
The interpretability mechanism allows visualization of the precise ROIs and connectivity patterns that drive individual classifications, and supports alignment with established ASD network pathophysiology—especially DMN organization and sensorimotor/affective network interplay.
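One simple way to turn per-expert ROI selections into network-level findings like those listed above is to count how often each functional community appears among the selected ROIs. The sketch below is a hypothetical post-processing step, not the authors' analysis: the ROI-to-community mapping and the selections are invented toy values.

```python
from collections import Counter

def community_selection_counts(selected_rois, roi_to_community):
    """Count how often each functional community appears among selected ROIs."""
    counts = Counter()
    for rois in selected_rois:
        counts.update(roi_to_community[r] for r in rois)
    return counts

# Hypothetical mapping from ROI index to community label (toy values).
roi_to_community = {0: "DMN", 1: "DMN", 2: "SMN", 3: "FPN", 4: "Limbic"}
# Hypothetical top-k selections from three experts (or three subjects).
selected = [[0, 2], [1, 2], [0, 4]]
counts = community_selection_counts(selected, roi_to_community)
```

Here the DMN is selected three times, the SMN twice, and the limbic system once, so a ranking of `counts` would surface the DMN first.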

7. Comparison with Related Models

ASDFormer differs fundamentally from related transformer models such as METAFormer (Mahler et al., 2023) and BrainTWT (Piriyasatit et al., 16 Mar 2025).

METAFormer operates as an ensemble of BERT-style encoders on vectorized multi-atlas connectomes, with self-supervised masked-value pretraining but with no explicit MoE decoding or token-level interpretability. It reports higher accuracy (0.837) and AUC (0.832) on ABIDE I than ASDFormer's 74.60% and 81.17%, but does not deploy a transformer-based sparse-pooling decoder.
BrainTWT focuses on dynamic connectome evolution, modeling temporal random walks with transformer encoders and self-supervised loss, targeting dynamic biomarker identification. Its best AUC is 0.7527, lower than ASDFormer's 81.17%.

ASDFormer's uniqueness is in its joint modeling of global connectome context (via transformer self-attention), fine-grained ROI selection (via MoE decoding), and direct integration of interpretability into the classification process, together with competitive or superior diagnostic accuracy compared to prior alternatives (Izadi et al., 19 Aug 2025). Limitations include reliance on Pearson-correlation FC matrices, restriction to ABIDE data, and possible sensitivity to the configuration of the expert mixture (e.g., choice of $k$ and number of experts).

References

ASDFormer: "ASDFormer: A Transformer with Mixtures of Pooling-Classifier Experts for Robust Autism Diagnosis and Biomarker Discovery" (Izadi et al., 19 Aug 2025)
METAFormer: "Pretraining is All You Need: A Multi-Atlas Enhanced Transformer Framework for Autism Spectrum Disorder Classification" (Mahler et al., 2023)
BrainTWT: "ASD Classification on Dynamic Brain Connectome using Temporal Random Walk with Transformer-based Dynamic Network Embedding" (Piriyasatit et al., 16 Mar 2025)