
MATCH-AD: Adaptive Transport Clustering for AD

Updated 26 December 2025
  • The paper introduces MATCH-AD, which fuses deep representation learning, graph-based label propagation, and optimal transport clustering to extract disease-relevant signals from heterogeneous neuroimaging data.
  • It achieves robust quantification of Alzheimer’s progression with strong theoretical guarantees and near-perfect diagnostic reliability under limited label availability.
  • Extensive evaluations demonstrate that MATCH-AD significantly outperforms baseline methods in accuracy and Cohen's kappa, enhancing clinical interpretability and decision-making.

Multi-view Adaptive Transport Clustering for Heterogeneous Alzheimer’s Disease (MATCH-AD) is a semi-supervised learning framework designed to address diagnostic challenges in Alzheimer’s disease (AD) using heterogeneous neuroimaging datasets with limited ground truth annotations. By integrating deep representation learning, graph-based label propagation, and optimal transport clustering, MATCH-AD enables the extraction of disease-relevant structure, label-efficient classification, and explicit quantification of disease progression, with strong theoretical guarantees and empirical superiority over existing methods (Moayedikia et al., 19 Dec 2025).

1. Overview of MATCH-AD Framework

MATCH-AD is constructed as a unified, alternating-minimization pipeline comprising three tightly coupled modules:

  • Deep Representation Learning: Heterogeneous input data—including structural MRI features ($X_{MRI}\in\mathbb{R}^{n\times190}$), CSF biomarkers ($X_{CSF}\in\mathbb{R}^{n\times6}$), and clinical/demographic variables ($X_{demo}\in\mathbb{R}^{n\times23}$)—are preprocessed by kNN imputation ($k=5$) and robust scaling, then concatenated into a matrix $X\in\mathbb{R}^{n\times219}$. An autoencoder $f_\theta, g_\phi$ with encoder–decoder architecture learns a latent representation $Z=f_\theta(X)\in\mathbb{R}^{n\times32}$ designed to preserve disease-relevant manifold structure.
  • Graph-based Label Propagation: A kNN graph is built over $Z$, constructing an affinity matrix $W$ from Gaussian kernels and normalizing to a similarity matrix $S=D^{-1/2} W D^{-1/2}$. This graph is used to propagate sparse clinical labels ($\sim$29% labeled) to the majority of the cohort.
  • Optimal Transport Clustering: Using the propagated multi-class labels, samples are partitioned into disease stages. Entropically regularized Wasserstein distances are computed between empirical distributions at adjoining stages, providing a continuous, metric-based quantification of disease progression using the Sinkhorn algorithm.

All modules contribute to a joint training objective, with alternating updates for representations, label distributions, and transport plans.

2. Mathematical Foundation

2.1 Input Views and Shared Embedding

  • Let $X^{(1)} = X_{MRI}$, $X^{(2)} = X_{CSF}$, $X^{(3)} = X_{demo}$; $X = [X^{(1)}, X^{(2)}, X^{(3)}] \in \mathbb{R}^{n\times219}$
  • Encoder: $f_\theta: \mathbb{R}^{219} \to \mathbb{R}^{32}$
  • Decoder: $g_\phi: \mathbb{R}^{32} \to \mathbb{R}^{219}$
  • Latent representation: $Z = f_\theta(X)$
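
The paper's autoencoder is a deep network; as a minimal stand-in, a linear autoencoder (truncated SVD/PCA, the optimal linear encoder–decoder under squared reconstruction loss) illustrates the 219 → 32 bottleneck. All shapes and data here are illustrative, not the paper's implementation:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, latent = 200, 219, 32          # illustrative cohort size; 219-dim fused features

X = rng.standard_normal((n, d))      # stands in for the preprocessed, concatenated X

# A linear autoencoder's optimum is given by truncated SVD:
# encoder f(X) = X_c V_k, decoder g(Z) = Z V_k^T.
Xc = X - X.mean(axis=0)
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
Vk = Vt[:latent].T                   # (219, 32) projection

Z = Xc @ Vk                          # latent representation, shape (n, 32)
X_hat = Z @ Vk.T                     # reconstruction back in feature space
recon_err = np.mean((Xc - X_hat) ** 2)
```

A nonlinear encoder would replace the projection with learned layers, but the bottleneck-and-reconstruct structure is the same.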

2.2 Graph Construction

  • Pairwise Euclidean distance matrix in latent space: $D_{ij} = \|z_i - z_j\|_2$
  • For the kNN graph, affinity weights:

$$w_{ij} = \begin{cases} \exp\left(-D_{ij}^2/(2\sigma_i\sigma_j)\right), & j\in\mathcal{N}_k(i)\ \text{or}\ i\in\mathcal{N}_k(j) \\ 0, & \text{otherwise} \end{cases}$$

with $\sigma_i$ a local bandwidth set from the mean distance from $i$ to its $k$ nearest neighbors.
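
A small numpy sketch of this construction (the bandwidth choice and the value of $k$ are illustrative defaults, not prescribed by the paper):

```python
import numpy as np

def knn_affinity(Z, k=15):
    """Gaussian kNN affinity W and symmetric normalization S = D^{-1/2} W D^{-1/2}."""
    n = Z.shape[0]
    D = np.linalg.norm(Z[:, None, :] - Z[None, :, :], axis=-1)  # pairwise distances
    nn = np.argsort(D, axis=1)[:, 1:k + 1]          # k nearest neighbours, self excluded
    sigma = D[np.arange(n)[:, None], nn].mean(axis=1)  # local bandwidth sigma_i
    W = np.zeros((n, n))
    for i in range(n):
        for j in nn[i]:                              # edge if j in N_k(i) or i in N_k(j)
            W[i, j] = W[j, i] = np.exp(-D[i, j] ** 2 / (2 * sigma[i] * sigma[j]))
    d = W.sum(axis=1)
    S = W / np.sqrt(np.outer(d, d))                  # symmetric normalization
    return W, S

Z = np.random.default_rng(1).standard_normal((60, 32))   # toy latent codes
W, S = knn_affinity(Z, k=10)
```

The symmetric normalization keeps the spectrum of $S$ in $[-1, 1]$, which is what makes the propagation iteration below a contraction for $\alpha < 1$.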

2.3 Label Propagation

  • Label matrix $Y \in \mathbb{R}^{n\times c}$: labeled rows one-hot, unlabeled rows uniform ($1/c$)
  • Propagation update:

$$F^{(t+1)} = \alpha S F^{(t)} + (1-\alpha) Y,\quad \alpha\in[0,1)$$

This admits a closed-form solution:

$$F^* = (1-\alpha)(I-\alpha S)^{-1}Y$$
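
A quick numpy check, on a toy similarity matrix, that the fixed-point iteration converges to this closed form (sizes, $\alpha$, and the labeled fraction are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
n, c, alpha = 50, 4, 0.3

# Toy symmetric normalized similarity with spectral radius <= 1.
A = rng.random((n, n)); A = (A + A.T) / 2
d = A.sum(axis=1)
S = A / np.sqrt(np.outer(d, d))

# Prior labels Y: first 15 rows one-hot ("labeled"), the rest uniform 1/c.
Y = np.full((n, c), 1.0 / c)
Y[:15] = np.eye(c)[rng.integers(0, c, 15)]

# Closed form: F* = (1 - alpha) (I - alpha S)^{-1} Y
F_star = (1 - alpha) * np.linalg.solve(np.eye(n) - alpha * S, Y)

# Fixed-point iteration F <- alpha*S*F + (1 - alpha)*Y converges to F*.
F = Y.copy()
for _ in range(200):
    F = alpha * S @ F + (1 - alpha) * Y
```

In practice the iterative form is preferred at scale, since it avoids the $O(n^3)$ matrix inverse.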

2.4 Optimal Transport

  • Successive disease stages $s_i$, $s_{i+1}$ induce empirical distributions over $Z$; the entropically regularized 2-Wasserstein problem is:

$$W_2(\mu_{s_i}, \mu_{s_{i+1}}) = \min_{T\ge0,\ T\mathbf{1}=\mathbf{a},\ T^T\mathbf{1}=\mathbf{b}} \langle T, C \rangle - \lambda H(T)$$

where $C_{jk} = \|z_j^{(i)} - z_k^{(i+1)}\|_2^2$ and $H(T)$ is the entropy regularizer.
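
A minimal Sinkhorn implementation of the regularized problem above (the cost rescaling, the value of $\lambda$, and the iteration count are numerical-stability choices for this sketch, not taken from the paper):

```python
import numpy as np

def sinkhorn(a, b, C, lam=0.05, iters=2000):
    """Entropic OT plan for min <T,C> - lam*H(T) with marginals a, b (Sinkhorn-Knopp)."""
    K = np.exp(-C / lam)                      # Gibbs kernel
    u = np.ones_like(a)
    for _ in range(iters):
        v = b / (K.T @ u)                     # scale columns toward marginal b
        u = a / (K @ v)                       # scale rows toward marginal a
    T = u[:, None] * K * v[None, :]
    return T, float(np.sum(T * C))            # transport plan and cost <T, C>

rng = np.random.default_rng(0)
Zi = rng.standard_normal((30, 32))            # stand-in latent codes, stage s_i
Zj = rng.standard_normal((40, 32)) + 0.5      # stand-in latent codes, stage s_{i+1}
C = ((Zi[:, None, :] - Zj[None, :, :]) ** 2).sum(-1)  # squared Euclidean cost
C = C / C.max()                               # rescale cost to avoid exp underflow
a = np.full(30, 1 / 30); b = np.full(40, 1 / 40)
T, cost = sinkhorn(a, b, C)
```

The returned cost gives the continuous stage-to-stage "distance" that MATCH-AD uses to quantify progression.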

3. Training Objectives and Optimization

MATCH-AD is optimized by minimizing a composite loss:

$$\mathcal{L}_{total} = \mathcal{L}_{AE} + \beta_1 \mathcal{L}_{prop} + \beta_2 \mathcal{L}_{OT} + \beta_3 \mathcal{L}_{smooth}$$

  • Autoencoder Loss $\mathcal{L}_{AE}$:

$$\mathcal{L}_{AE} = \frac{1}{n}\sum_{i=1}^n \|x_i - \hat{x}_i\|_2^2 + \lambda_1 \left(\|\theta\|_2^2+\|\phi\|_2^2\right) + \lambda_2\,\mathrm{KL}\left(z_i \,\|\, \mathcal{N}(0,I)\right)$$

  • Propagation Loss $\mathcal{L}_{prop}$:

$$\mathcal{L}_{prop} = \|F - \alpha S F - (1-\alpha)Y\|_F^2$$

  • Optimal Transport Loss $\mathcal{L}_{OT}$: sum of Wasserstein distances between successive stages
  • Smoothness Loss $\mathcal{L}_{smooth}$: promotes local consistency of soft label distributions
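
A sketch of how these terms combine into $\mathcal{L}_{total}$; the $\beta$ weights, the omission of the weight-decay/KL regularizers, and the specific smoothness form $\sum_{ij} S_{ij}\|F_i - F_j\|^2$ are assumptions of this sketch, not specifications from the paper:

```python
import numpy as np

def composite_loss(X, X_hat, F, S, Y, ot_cost, alpha=0.3, b1=1.0, b2=0.1, b3=0.1):
    """L_total = L_AE + b1*L_prop + b2*L_OT + b3*L_smooth (regularizers omitted)."""
    L_ae = np.mean(np.sum((X - X_hat) ** 2, axis=1))              # reconstruction
    L_prop = np.linalg.norm(F - alpha * S @ F - (1 - alpha) * Y, 'fro') ** 2
    diff = F[:, None, :] - F[None, :, :]                          # pairwise label diffs
    L_smooth = np.sum(S * np.sum(diff ** 2, axis=-1))             # graph smoothness (assumed form)
    return L_ae + b1 * L_prop + b2 * ot_cost + b3 * L_smooth

# Toy inputs wired together.
rng = np.random.default_rng(0)
n, d, c = 20, 219, 4
X = rng.standard_normal((n, d)); X_hat = X + 0.1 * rng.standard_normal((n, d))
A = rng.random((n, n)); A = (A + A.T) / 2
S = A / np.sqrt(np.outer(A.sum(1), A.sum(1)))
Y = np.full((n, c), 1 / c); F = Y + 0.01 * rng.standard_normal((n, c))
total = composite_loss(X, X_hat, F, S, Y, ot_cost=0.5)
```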

Algorithmic optimization proceeds by pretraining the autoencoder, followed by alternating updates of labels, transport plans, and representations until convergence.

4. Theoretical Analysis

MATCH-AD is accompanied by several theoretical guarantees:

  • Convergence of Label Propagation: The label propagation update converges geometrically for $\alpha\in[0,1)$, with

$$\|F^{(t)}-F^*\|_F \le \alpha^t\|F^{(0)}-F^*\|_F$$

  • Label Consistency Bound: Under appropriate manifold assumptions, the expected label error is bounded:

$$1-\mathbb{E}[\mathbf{1}\{\hat{y}_i\neq y_i\}] \ge 1 - O\left(\frac{m\delta^2}{k}+\alpha^T\right)$$

where $m$ is the intrinsic dimension and $\delta$ the geodesic error.

  • Stability of Wasserstein Distance: Empirical Wasserstein distances are stable to sampling, with

$$|W_2(\mu_n,\nu_n) - W_2(\mu,\nu)| \le O\left(n^{-1/(2m)}\right)$$

  • Global Convergence: Alternating minimization is guaranteed to reach a stationary point (under Lipschitz and boundedness assumptions).
  • Sample Complexity: To achieve $\epsilon$-accurate label propagation with probability $1-\delta$ requires

$$|\mathcal{L}| = O\left(\frac{c}{\epsilon^2}\log\left(\frac{n}{\delta}\right)\,\mathrm{poly}(k,1/\alpha)\right)$$

5. Experimental Evaluation

5.1 Data and Preprocessing

  • Cohort: $n=4{,}968$ subjects from the National Alzheimer’s Coordinating Center; 219 total features per subject after integration
  • Semi-supervised regime: 29.1% labeled for training/testing, 70.9% unlabeled
  • Class distribution (labeled subset): Normal (63.1%), Impaired-not-MCI (3.7%), MCI (19.9%), Dementia (13.3%)
  • Preprocessing: features with >50% missing values excluded; remaining missing entries imputed by kNN ($k=5$)
  • Train/test split: 80/20 stratified on labeled data

5.2 Performance under Label Scarcity

| Labeled Fraction | Accuracy (%) | Cohen’s $\kappa$ |
| --- | --- | --- |
| 5% ($\approx$70) | 59.1 ± 1.8 | 0.159 |
| 30% | 72.6 ± 0.6 | 0.430 |
| 50% | 81.1 ± 0.7 | 0.614 |
| 80% | 91.9 ± 0.3 | 0.842 |
| 100% ($n=289$ test set) | 98.4 | 0.970 |
  • MATCH-AD retains clinically meaningful κ (> 0.4) with only 30% labels, reaching “almost perfect agreement” (κ > 0.8) at 80% (Moayedikia et al., 19 Dec 2025).

5.3 Baseline Comparisons and Ablations

  • Best baseline (SelfTraining_SVM): 71.3% accuracy, κ=0.329
  • MATCH-AD: 98.4% accuracy, κ=0.970, F1=0.976
  • SelfTraining_SVM underperforms on minority classes, while MATCH-AD maintains F1 > 0.86 across all classes (“Normal”: 98.9%, “Impaired-not-MCI”: 90.9%, “MCI”: 86.2%, “Dementia”: 97.4%)
  • Removing the autoencoder collapses κ to zero or below; removing optimal transport impairs modeling of progression but only weakly affects classification performance

5.4 Hyperparameter Sensitivity

  • Propagation $\alpha$: peak performance at $\alpha\approx 0.2$–$0.3$
  • Neighborhood size $k$: plateau of robust performance for $k\in[10,20]$ (default: $k=15$)
  • Outer iterations: $T_{outer}\approx 10$–$20$ for convergence

6. Clinical Relevance and Interpretability

  • Accuracy alone understates performance under class imbalance; Cohen’s κ provides a stricter criterion
  • MATCH-AD illustrates resource allocation tradeoffs: moderate agreement (κ > 0.4) attainable with only ~30% labeled subjects, substantial agreement (κ > 0.6) at ~50–60%, and almost perfect agreement (κ > 0.8) at ~80%
  • Minority/early-stage classes (Impaired-not-MCI) benefit most from the integrated representation+propagation approach
  • The explicit transport-based progression quantifies state transition distances, potentially offering interpretable “disease trajectories” in clinical settings

7. Summary and Availability

MATCH-AD introduces a mathematically principled, scalable, and label-efficient framework for mining disease structure in heterogeneous neuroimaging datasets. It achieves nearly perfect diagnostic reliability (κ up to 0.97), robust performance under extreme label scarcity (as little as 5% labeled), and continuous quantification of progression via Wasserstein distances, with strong theoretical guarantees at every stage (Moayedikia et al., 19 Dec 2025). All code and data splits are provided at https://github.com/amoayedikia/brain-network.git.
