
MATCH-AD: Adaptive Transport Clustering for AD

Updated 26 December 2025
  • The paper introduces MATCH-AD, which fuses deep representation learning, graph-based label propagation, and optimal transport clustering to extract disease-relevant signals from heterogeneous neuroimaging data.
  • It achieves robust quantification of Alzheimer’s progression with strong theoretical guarantees and near-perfect diagnostic reliability under limited label availability.
  • Extensive evaluations demonstrate that MATCH-AD significantly outperforms baseline methods in accuracy and Cohen's kappa, enhancing clinical interpretability and decision-making.

Multi-view Adaptive Transport Clustering for Heterogeneous Alzheimer’s Disease (MATCH-AD) is a semi-supervised learning framework designed to address diagnostic challenges in Alzheimer’s disease (AD) using heterogeneous neuroimaging datasets with limited ground truth annotations. By integrating deep representation learning, graph-based label propagation, and optimal transport clustering, MATCH-AD enables the extraction of disease-relevant structure, label-efficient classification, and explicit quantification of disease progression, with strong theoretical guarantees and empirical superiority over existing methods (Moayedikia et al., 19 Dec 2025).

1. Overview of MATCH-AD Framework

MATCH-AD is constructed as a unified, alternating-minimization pipeline comprising three tightly coupled modules:

  • Deep Representation Learning: Heterogeneous input data—including structural MRI features ($X_{MRI}\in\mathbb{R}^{n\times190}$), CSF biomarkers ($X_{CSF}\in\mathbb{R}^{n\times6}$), and clinical/demographic variables ($X_{demo}\in\mathbb{R}^{n\times23}$)—are preprocessed by kNN imputation ($k=5$) and robust scaling, then concatenated into a matrix $X\in\mathbb{R}^{n\times219}$. An autoencoder $f_\theta, g_\phi$ with encoder–decoder architecture learns a latent representation $Z=f_\theta(X)\in\mathbb{R}^{n\times32}$ designed to preserve disease-relevant manifold structure.
  • Graph-based Label Propagation: A kNN graph is built over $Z$, constructing an affinity matrix $W$ from Gaussian kernels and normalizing to a similarity matrix $S=D^{-1/2} W D^{-1/2}$. This graph is used to propagate sparse clinical labels ($\sim$29% labeled) to the majority of the cohort.
  • Optimal Transport Clustering: Using the propagated multi-class labels, samples are partitioned into disease stages. Entropically regularized Wasserstein distances are computed between empirical distributions at adjoining stages, providing a continuous, metric-based quantification of disease progression using the Sinkhorn algorithm.

All modules contribute to a joint training objective, with alternating updates for representations, label distributions, and transport plans.

2. Mathematical Foundation

2.1 Input Views and Shared Embedding

  • Let $X^{(1)} = X_{MRI}$, $X^{(2)} = X_{CSF}$, $X^{(3)} = X_{demo}$; $X = [X^{(1)}, X^{(2)}, X^{(3)}] \in \mathbb{R}^{n\times219}$
  • Encoder: $f_\theta: \mathbb{R}^{219} \to \mathbb{R}^{32}$
  • Decoder: $g_\phi: \mathbb{R}^{32} \to \mathbb{R}^{219}$
  • Latent representation: $Z = f_\theta(X)$
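
The paper's autoencoder is a deep network; as a minimal stand-in, a linear autoencoder (truncated SVD/PCA, the optimal linear encoder–decoder under squared reconstruction loss) illustrates the 219 → 32 bottleneck. All shapes and data here are illustrative, not the paper's implementation:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, latent = 200, 219, 32          # illustrative cohort size; 219-dim fused features

X = rng.standard_normal((n, d))      # stands in for the preprocessed, concatenated X

# A linear autoencoder's optimum is given by truncated SVD:
# encoder f(X) = X_c V_k, decoder g(Z) = Z V_k^T.
Xc = X - X.mean(axis=0)
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
Vk = Vt[:latent].T                   # (219, 32) projection

Z = Xc @ Vk                          # latent representation, shape (n, 32)
X_hat = Z @ Vk.T                     # reconstruction back in feature space
recon_err = np.mean((Xc - X_hat) ** 2)
```

A nonlinear encoder would replace the projection with learned layers, but the bottleneck-and-reconstruct structure is the same.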

2.2 Graph Construction

  • Pairwise Euclidean distance matrix in latent space: $D_{ij} = \|z_i - z_j\|_2$
  • For the kNN graph, affinity weights:

$$w_{ij} = \begin{cases} \exp\left(-D_{ij}^2/(2\sigma_i\sigma_j)\right), & j\in\mathcal{N}_k(i)\ \text{or}\ i\in\mathcal{N}_k(j) \\ 0, & \text{otherwise} \end{cases}$$

with $\sigma_i$ a local bandwidth set from the mean distance from $i$ to its $k$ nearest neighbors.
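
A small numpy sketch of this construction (the bandwidth choice and the value of $k$ are illustrative defaults, not prescribed by the paper):

```python
import numpy as np

def knn_affinity(Z, k=15):
    """Gaussian kNN affinity W and symmetric normalization S = D^{-1/2} W D^{-1/2}."""
    n = Z.shape[0]
    D = np.linalg.norm(Z[:, None, :] - Z[None, :, :], axis=-1)  # pairwise distances
    nn = np.argsort(D, axis=1)[:, 1:k + 1]          # k nearest neighbours, self excluded
    sigma = D[np.arange(n)[:, None], nn].mean(axis=1)  # local bandwidth sigma_i
    W = np.zeros((n, n))
    for i in range(n):
        for j in nn[i]:                              # edge if j in N_k(i) or i in N_k(j)
            W[i, j] = W[j, i] = np.exp(-D[i, j] ** 2 / (2 * sigma[i] * sigma[j]))
    d = W.sum(axis=1)
    S = W / np.sqrt(np.outer(d, d))                  # symmetric normalization
    return W, S

Z = np.random.default_rng(1).standard_normal((60, 32))   # toy latent codes
W, S = knn_affinity(Z, k=10)
```

The symmetric normalization keeps the spectrum of $S$ in $[-1, 1]$, which is what makes the propagation iteration below a contraction for $\alpha < 1$.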

2.3 Label Propagation

  • Label matrix $Y \in \mathbb{R}^{n\times c}$: labeled rows one-hot, unlabeled rows uniform ($1/c$)
  • Propagation update:

$$F^{(t+1)} = \alpha S F^{(t)} + (1-\alpha) Y,\quad \alpha\in[0,1)$$

This admits a closed-form solution:

$$F^* = (1-\alpha)(I-\alpha S)^{-1}Y$$
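
A quick numpy check, on a toy similarity matrix, that the fixed-point iteration converges to this closed form (sizes, $\alpha$, and the labeled fraction are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
n, c, alpha = 50, 4, 0.3

# Toy symmetric normalized similarity with spectral radius <= 1.
A = rng.random((n, n)); A = (A + A.T) / 2
d = A.sum(axis=1)
S = A / np.sqrt(np.outer(d, d))

# Prior labels Y: first 15 rows one-hot ("labeled"), the rest uniform 1/c.
Y = np.full((n, c), 1.0 / c)
Y[:15] = np.eye(c)[rng.integers(0, c, 15)]

# Closed form: F* = (1 - alpha) (I - alpha S)^{-1} Y
F_star = (1 - alpha) * np.linalg.solve(np.eye(n) - alpha * S, Y)

# Fixed-point iteration F <- alpha*S*F + (1 - alpha)*Y converges to F*.
F = Y.copy()
for _ in range(200):
    F = alpha * S @ F + (1 - alpha) * Y
```

In practice the iterative form is preferred at scale, since it avoids the $O(n^3)$ matrix inverse.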

2.4 Optimal Transport

  • Successive disease stages $s_i$, $s_{i+1}$ induce empirical distributions over $Z$; the entropically regularized 2-Wasserstein problem is:

$$W_2(\mu_{s_i}, \mu_{s_{i+1}}) = \min_{T\ge0,\ T\mathbf{1}=\mathbf{a},\ T^T\mathbf{1}=\mathbf{b}} \langle T, C \rangle - \lambda H(T)$$

where $C_{jk} = \|z_j^{(i)} - z_k^{(i+1)}\|_2^2$ and $H(T)$ is the entropy regularizer.
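
A minimal Sinkhorn implementation of the regularized problem above (the cost rescaling, the value of $\lambda$, and the iteration count are numerical-stability choices for this sketch, not taken from the paper):

```python
import numpy as np

def sinkhorn(a, b, C, lam=0.05, iters=2000):
    """Entropic OT plan for min <T,C> - lam*H(T) with marginals a, b (Sinkhorn-Knopp)."""
    K = np.exp(-C / lam)                      # Gibbs kernel
    u = np.ones_like(a)
    for _ in range(iters):
        v = b / (K.T @ u)                     # scale columns toward marginal b
        u = a / (K @ v)                       # scale rows toward marginal a
    T = u[:, None] * K * v[None, :]
    return T, float(np.sum(T * C))            # transport plan and cost <T, C>

rng = np.random.default_rng(0)
Zi = rng.standard_normal((30, 32))            # stand-in latent codes, stage s_i
Zj = rng.standard_normal((40, 32)) + 0.5      # stand-in latent codes, stage s_{i+1}
C = ((Zi[:, None, :] - Zj[None, :, :]) ** 2).sum(-1)  # squared Euclidean cost
C = C / C.max()                               # rescale cost to avoid exp underflow
a = np.full(30, 1 / 30); b = np.full(40, 1 / 40)
T, cost = sinkhorn(a, b, C)
```

The returned cost gives the continuous stage-to-stage "distance" that MATCH-AD uses to quantify progression.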

3. Training Objectives and Optimization

MATCH-AD is optimized by minimizing a composite loss:

$$\mathcal{L}_{total} = \mathcal{L}_{AE} + \beta_1 \mathcal{L}_{prop} + \beta_2 \mathcal{L}_{OT} + \beta_3 \mathcal{L}_{smooth}$$

  • Autoencoder Loss $\mathcal{L}_{AE}$:

$$\mathcal{L}_{AE} = \frac{1}{n}\sum_{i=1}^n \|x_i - \hat{x}_i\|_2^2 + \lambda_1 \left(\|\theta\|_2^2+\|\phi\|_2^2\right) + \lambda_2\,\mathrm{KL}\left(z_i \,\|\, \mathcal{N}(0,I)\right)$$

  • Propagation Loss $\mathcal{L}_{prop}$:

$$\mathcal{L}_{prop} = \|F - \alpha S F - (1-\alpha)Y\|_F^2$$

  • Optimal Transport Loss $\mathcal{L}_{OT}$: sum of Wasserstein distances between successive stages
  • Smoothness Loss $\mathcal{L}_{smooth}$: promotes local consistency of soft label distributions
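
A sketch of how these terms combine into $\mathcal{L}_{total}$; the $\beta$ weights, the omission of the weight-decay/KL regularizers, and the specific smoothness form $\sum_{ij} S_{ij}\|F_i - F_j\|^2$ are assumptions of this sketch, not specifications from the paper:

```python
import numpy as np

def composite_loss(X, X_hat, F, S, Y, ot_cost, alpha=0.3, b1=1.0, b2=0.1, b3=0.1):
    """L_total = L_AE + b1*L_prop + b2*L_OT + b3*L_smooth (regularizers omitted)."""
    L_ae = np.mean(np.sum((X - X_hat) ** 2, axis=1))              # reconstruction
    L_prop = np.linalg.norm(F - alpha * S @ F - (1 - alpha) * Y, 'fro') ** 2
    diff = F[:, None, :] - F[None, :, :]                          # pairwise label diffs
    L_smooth = np.sum(S * np.sum(diff ** 2, axis=-1))             # graph smoothness (assumed form)
    return L_ae + b1 * L_prop + b2 * ot_cost + b3 * L_smooth

# Toy inputs wired together.
rng = np.random.default_rng(0)
n, d, c = 20, 219, 4
X = rng.standard_normal((n, d)); X_hat = X + 0.1 * rng.standard_normal((n, d))
A = rng.random((n, n)); A = (A + A.T) / 2
S = A / np.sqrt(np.outer(A.sum(1), A.sum(1)))
Y = np.full((n, c), 1 / c); F = Y + 0.01 * rng.standard_normal((n, c))
total = composite_loss(X, X_hat, F, S, Y, ot_cost=0.5)
```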

Algorithmic optimization proceeds by pretraining the autoencoder, followed by alternating updates of labels, transport plans, and representations until convergence.

4. Theoretical Analysis

MATCH-AD is accompanied by several theoretical guarantees:

  • Convergence of Label Propagation: The label propagation update converges geometrically for $\alpha\in[0,1)$, with

$$\|F^{(t)}-F^*\|_F \le \alpha^t\|F^{(0)}-F^*\|_F$$

  • Label Consistency Bound: Under appropriate manifold assumptions, the expected label error is bounded:

$$1-\mathbb{E}[\mathbf{1}\{\hat{y}_i\neq y_i\}] \ge 1 - O\left(\frac{m\delta^2}{k}+\alpha^T\right)$$

where $m$ is the intrinsic dimension and $\delta$ the geodesic error.

  • Stability of Wasserstein Distance: Empirical Wasserstein distances are stable to sampling, with

$$|W_2(\mu_n,\nu_n) - W_2(\mu,\nu)| \le O\left(n^{-1/(2m)}\right)$$

  • Global Convergence: Alternating minimization is guaranteed to reach a stationary point (under Lipschitz and boundedness assumptions).
  • Sample Complexity: To achieve $\epsilon$-accurate label propagation with probability $1-\delta$ requires

$$|\mathcal{L}| = O\left(\frac{c}{\epsilon^2}\log\left(\frac{n}{\delta}\right)\,\mathrm{poly}(k,1/\alpha)\right)$$

5. Experimental Evaluation

5.1 Data and Preprocessing

  • Cohort: $n=4{,}968$ subjects from the National Alzheimer’s Coordinating Center; 219 total features per subject after integration
  • Semi-supervised regime: 29.1% labeled for training/testing, 70.9% unlabeled
  • Class distribution (labeled subset): Normal (63.1%), Impaired-not-MCI (3.7%), MCI (19.9%), Dementia (13.3%)
  • Preprocessing: features with >50% missing values excluded; remaining missing entries imputed by kNN ($k=5$)
  • Train/test split: 80/20 stratified on labeled data

5.2 Performance under Label Scarcity

| Labeled Fraction | Accuracy (%) | Cohen’s $\kappa$ |
| --- | --- | --- |
| 5% ($\approx$70) | 59.1 ± 1.8 | 0.159 |
| 30% | 72.6 ± 0.6 | 0.430 |
| 50% | 81.1 ± 0.7 | 0.614 |
| 80% | 91.9 ± 0.3 | 0.842 |
| 100% ($n=289$ test set) | 98.4 | 0.970 |
  • MATCH-AD retains clinically meaningful κ (> 0.4) with only 30% labels, reaching “almost perfect agreement” (κ > 0.8) at 80% (Moayedikia et al., 19 Dec 2025).

5.3 Baseline Comparisons and Ablations

  • Best baseline (SelfTraining_SVM): 71.3% accuracy, κ=0.329
  • MATCH-AD: 98.4% accuracy, κ=0.970, F1=0.976
  • SelfTraining_SVM underperforms on minority classes, while MATCH-AD maintains F1 > 0.86 across all classes (“Normal”: 98.9%, “Impaired-not-MCI”: 90.9%, “MCI”: 86.2%, “Dementia”: 97.4%)
  • Removing the autoencoder collapses κ to zero or below; removing optimal transport impairs modeling of progression but only weakly affects classification performance

5.4 Hyperparameter Sensitivity

  • Propagation $\alpha$: peak performance at $\alpha\approx 0.2$–$0.3$
  • Neighborhood size $k$: plateau of robust performance for $k\in[10,20]$ (default: $k=15$)
  • Outer iterations: $T_{outer}\approx 10$–$20$ for convergence

6. Clinical Relevance and Interpretability

  • Accuracy alone understates performance under class imbalance; Cohen’s κ provides a stricter criterion
  • MATCH-AD illustrates resource allocation tradeoffs: moderate agreement (κ > 0.4) attainable with only ~30% labeled subjects, substantial agreement (κ > 0.6) at ~50–60%, and almost perfect agreement (κ > 0.8) at ~80%
  • Minority/early-stage classes (Impaired-not-MCI) benefit most from the integrated representation+propagation approach
  • The explicit transport-based progression quantifies state transition distances, potentially offering interpretable “disease trajectories” in clinical settings

7. Summary and Availability

MATCH-AD introduces a mathematically principled, scalable, and label-efficient framework for mining disease structure in heterogeneous neuroimaging datasets. It achieves nearly perfect diagnostic reliability (κ up to 0.97), robust performance under extreme label scarcity (as little as 5% labeled), and continuous quantification of progression via Wasserstein distances, with strong theoretical guarantees at every stage (Moayedikia et al., 19 Dec 2025). All code and data splits are provided at https://github.com/amoayedikia/brain-network.git.
