MATCH-AD: Adaptive Transport Clustering for AD

Updated 26 December 2025
  • The paper introduces MATCH-AD, which fuses deep representation learning, graph-based label propagation, and optimal transport clustering to extract disease-relevant signals from heterogeneous neuroimaging data.
  • It achieves robust quantification of Alzheimer’s progression with strong theoretical guarantees and near-perfect diagnostic reliability under limited label availability.
  • Extensive evaluations demonstrate that MATCH-AD significantly outperforms baseline methods in accuracy and Cohen's kappa, enhancing clinical interpretability and decision-making.

Multi-view Adaptive Transport Clustering for Heterogeneous Alzheimer’s Disease (MATCH-AD) is a semi-supervised learning framework designed to address diagnostic challenges in Alzheimer’s disease (AD) using heterogeneous neuroimaging datasets with limited ground truth annotations. By integrating deep representation learning, graph-based label propagation, and optimal transport clustering, MATCH-AD enables the extraction of disease-relevant structure, label-efficient classification, and explicit quantification of disease progression, with strong theoretical guarantees and empirical superiority over existing methods (Moayedikia et al., 19 Dec 2025).

1. Overview of MATCH-AD Framework

MATCH-AD is constructed as a unified, alternating-minimization pipeline comprising three tightly coupled modules:

  • Deep Representation Learning: Heterogeneous input data, comprising structural MRI features ($X_{MRI}\in\mathbb{R}^{n\times190}$), CSF biomarkers ($X_{CSF}\in\mathbb{R}^{n\times6}$), and clinical/demographic variables ($X_{demo}\in\mathbb{R}^{n\times23}$), are preprocessed by kNN imputation (k=5) and robust scaling, then concatenated into a matrix $X\in\mathbb{R}^{n\times219}$. An autoencoder with encoder $f_\theta$ and decoder $g_\phi$ learns a latent representation $Z=f_\theta(X)\in\mathbb{R}^{n\times32}$ designed to preserve disease-relevant manifold structure.
  • Graph-based Label Propagation: A kNN graph is built over $Z$, with an affinity matrix $W$ computed from Gaussian kernels and normalized to a similarity matrix $S=D^{-1/2} W D^{-1/2}$, where $D$ is the diagonal degree matrix of $W$. This graph is used to propagate the sparse clinical labels (about 29% of subjects are labeled) to the unlabeled majority of the cohort.
  • Optimal Transport Clustering: Using the propagated multi-class labels, samples are partitioned into disease stages. Entropically regularized Wasserstein distances between the empirical distributions of adjacent stages are computed with the Sinkhorn algorithm, providing a continuous, metric-based quantification of disease progression.

All modules contribute to a joint training objective, with alternating updates for representations, label distributions, and transport plans.

2. Mathematical Foundation

2.1 Input Views and Shared Embedding

  • Let $X_{MRI}\in\mathbb{R}^{n\times190}$, $X_{CSF}\in\mathbb{R}^{n\times6}$, $X_{demo}\in\mathbb{R}^{n\times23}$; $X=[X_{MRI},X_{CSF},X_{demo}]\in\mathbb{R}^{n\times219}$
  • Encoder: $f_\theta:\mathbb{R}^{219}\to\mathbb{R}^{32}$
  • Decoder: $g_\phi:\mathbb{R}^{32}\to\mathbb{R}^{219}$
  • Latent representation: $Z=f_\theta(X)\in\mathbb{R}^{n\times32}$

2.2 Graph Construction

  • Pairwise Euclidean distance matrix in latent space: $d_{ij}=\|z_i-z_j\|_2$
  • For kNN graph, affinity weights:

$$W_{ij}=\begin{cases}\exp\left(-\dfrac{d_{ij}^2}{2\sigma^2}\right) & \text{if } j\in\mathcal{N}_k(i)\\ 0 & \text{otherwise}\end{cases}$$

with $\mathcal{N}_k(i)$ the set of $k$ nearest neighbors of $z_i$ and $\sigma$ the mean distance to $z_i$’s $k$-th neighbor.
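
As an illustration of this construction, the following NumPy sketch builds a symmetric kNN affinity matrix with a Gaussian kernel and applies the normalization $S=D^{-1/2}WD^{-1/2}$. The function names, the single global bandwidth (mean distance to the $k$-th neighbor over all points), the symmetrization by elementwise maximum, and the random stand-in for the latent codes are assumptions made for the example, not details confirmed by the paper.

```python
import numpy as np

def knn_gaussian_affinity(Z, k=5):
    """Symmetric kNN affinity matrix with a Gaussian kernel (illustrative)."""
    n = Z.shape[0]
    # Pairwise Euclidean distances in latent space.
    diff = Z[:, None, :] - Z[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    # Sort each row; column 0 is the point itself (distance 0).
    order = np.argsort(dist, axis=1)
    neighbors = order[:, 1:k + 1]
    # Assumed bandwidth: mean distance to the k-th neighbor over all points.
    sigma = dist[np.arange(n), order[:, k]].mean()
    W = np.zeros((n, n))
    rows = np.repeat(np.arange(n), k)
    cols = neighbors.ravel()
    W[rows, cols] = np.exp(-dist[rows, cols] ** 2 / (2.0 * sigma ** 2))
    # Symmetrize so the graph is undirected (an assumption of this sketch).
    return np.maximum(W, W.T)

def normalize_affinity(W):
    """Return S = D^{-1/2} W D^{-1/2}, with D the diagonal degree matrix."""
    d_inv_sqrt = 1.0 / np.sqrt(W.sum(axis=1))
    return W * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

rng = np.random.default_rng(0)
Z = rng.normal(size=(50, 32))  # random stand-in for 32-dimensional latent codes
S = normalize_affinity(knn_gaussian_affinity(Z, k=5))
```

Because $W$ is symmetric and nonnegative, the eigenvalues of $S$ lie in $[-1, 1]$, which is the property the propagation step relies on for convergence.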

2.3 Label Propagation

  • Label matrix $Y\in\mathbb{R}^{n\times C}$, with $C$ the number of diagnostic classes: labeled rows one-hot, unlabeled rows uniform ($1/C$)
  • Propagation update, with propagation weight $\alpha\in(0,1)$:

$$F^{(t+1)}=\alpha S F^{(t)}+(1-\alpha)Y$$

This admits a closed-form solution:

$$F^{*}=(1-\alpha)(I-\alpha S)^{-1}Y$$
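
To make the relation between the iterative update and its closed form concrete, here is a minimal NumPy sketch on a toy graph of two three-node clusters joined by one weak edge. It assumes the standard update $F \leftarrow \alpha S F + (1-\alpha) Y$; the graph, the value $\alpha = 0.9$, and the function names are hypothetical choices for the example rather than settings taken from the paper.

```python
import numpy as np

def propagate_labels(S, Y, alpha=0.9, n_iter=200):
    """Iterate F <- alpha * S @ F + (1 - alpha) * Y from F = Y."""
    F = Y.copy()
    for _ in range(n_iter):
        F = alpha * S @ F + (1.0 - alpha) * Y
    return F

def propagate_closed_form(S, Y, alpha=0.9):
    """Fixed point F* = (1 - alpha) * (I - alpha * S)^{-1} Y."""
    n = S.shape[0]
    return (1.0 - alpha) * np.linalg.solve(np.eye(n) - alpha * S, Y)

# Toy graph: nodes 0-2 and 3-5 form two cliques, weakly linked via edge 2-3.
W = np.array([
    [0.0, 1.0, 1.0, 0.0, 0.0, 0.0],
    [1.0, 0.0, 1.0, 0.0, 0.0, 0.0],
    [1.0, 1.0, 0.0, 0.1, 0.0, 0.0],
    [0.0, 0.0, 0.1, 0.0, 1.0, 1.0],
    [0.0, 0.0, 0.0, 1.0, 0.0, 1.0],
    [0.0, 0.0, 0.0, 1.0, 1.0, 0.0],
])
d = W.sum(axis=1)
S = W / np.sqrt(np.outer(d, d))  # S = D^{-1/2} W D^{-1/2}

# Node 0 is labeled class 0, node 5 is labeled class 1, the rest are uniform.
Y = np.full((6, 2), 0.5)
Y[0] = [1.0, 0.0]
Y[5] = [0.0, 1.0]

F_iter = propagate_labels(S, Y)
F_closed = propagate_closed_form(S, Y)
predicted = F_iter.argmax(axis=1)
```

In this toy case the two labeled nodes are enough to assign every node to the class of its own cluster, and the iterate agrees with the closed-form solution to numerical precision.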

2.4 Optimal Transport

  • Successive disease stages $c$, $c+1$ induce empirical distributions $\mu_c$, $\mu_{c+1}$ over the latent space; the entropically regularized 2-Wasserstein distance, with $z_i$ drawn from stage $c$, $z_j$ from stage $c+1$, and regularization strength $\varepsilon>0$, is:

$$W_2^2(\mu_c,\mu_{c+1})=\min_{\pi\in\Pi(\mu_c,\mu_{c+1})}\sum_{i,j}\pi_{ij}\,\|z_i-z_j\|_2^2-\varepsilon H(\pi)$$

where $\Pi(\mu_c,\mu_{c+1})$ is the set of couplings with marginals $\mu_c$ and $\mu_{c+1}$, and $H(\pi)=-\sum_{i,j}\pi_{ij}\log\pi_{ij}$ is the entropy regularizer.
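
The Sinkhorn iteration behind this kind of entropic transport problem can be sketched in a few lines of NumPy. The example below is a generic textbook version, not the authors' implementation: uniform sample weights, a regularization strength scaled by the largest cost (`eps_rel`), and the three synthetic 2-D point clouds standing in for successive stages are all assumptions. It reports the transport cost $\langle P, C\rangle$ of the regularized plan.

```python
import numpy as np

def sinkhorn_cost(X_a, X_b, eps_rel=0.1, n_iter=1000):
    """Entropic OT between two uniformly weighted point clouds (sketch).

    Returns the transport plan P and the cost <P, C> under a squared
    Euclidean ground cost. eps_rel scales epsilon by the largest cost.
    """
    a = np.full(len(X_a), 1.0 / len(X_a))
    b = np.full(len(X_b), 1.0 / len(X_b))
    C = ((X_a[:, None, :] - X_b[None, :, :]) ** 2).sum(axis=-1)
    K = np.exp(-C / (eps_rel * C.max()))  # Gibbs kernel
    u = np.ones_like(a)
    for _ in range(n_iter):
        v = b / (K.T @ u)  # rescale to match column marginals
        u = a / (K @ v)    # rescale to match row marginals
    P = u[:, None] * K * v[None, :]
    return P, float((P * C).sum())

rng = np.random.default_rng(0)
# Synthetic stand-ins for the latent codes of three successive stages.
stage_0 = rng.normal(loc=[0.0, 0.0], size=(40, 2))
stage_1 = rng.normal(loc=[1.0, 0.0], size=(40, 2))
stage_2 = rng.normal(loc=[4.0, 0.0], size=(40, 2))

P01, cost_01 = sinkhorn_cost(stage_0, stage_1)
P02, cost_02 = sinkhorn_cost(stage_0, stage_2)
```

On this synthetic data the cost from `stage_0` to the distant `stage_2` exceeds the cost to the nearby `stage_1`, which is the sense in which transport distances order stages along a progression axis.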

3. Training Objectives and Optimization

MATCH-AD is optimized by minimizing a composite loss that combines four terms:

  • Autoencoder Loss $\mathcal{L}_{AE}$: reconstruction error between the input $X$ and its reconstruction $g_\phi(f_\theta(X))$
  • Propagation Loss $\mathcal{L}_{prop}$: the objective associated with the label propagation step (Section 2.3)
  • Optimal Transport Loss $\mathcal{L}_{OT}$: sum of Wasserstein distances between successive disease stages
  • Smoothness Loss $\mathcal{L}_{smooth}$: promotes local consistency of soft label distributions

Algorithmic optimization proceeds by pretraining the autoencoder, followed by alternating updates of labels, transport plans, and representations until convergence.

4. Theoretical Analysis

MATCH-AD is accompanied by several theoretical guarantees:

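  • Convergence of Label Propagation: The label propagation update converges geometrically for $\alpha\in(0,1)$, with

$$\|F^{(t)}-F^{*}\|_F\le\alpha^{t}\,\|F^{(0)}-F^{*}\|_F$$

where $F^{(t)}$ is the soft label matrix after $t$ iterations and $F^{*}$ its limit.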

  • Label Consistency Bound: Under appropriate manifold assumptions, the expected label error is bounded by a quantity that depends on the intrinsic dimension of the data manifold and on the geodesic error.

  • Stability of Wasserstein Distance: Empirical Wasserstein distances are stable to sampling.

  • Global Convergence: Alternating minimization is guaranteed to reach a stationary point (under Lipschitz and boundedness assumptions).
  • Sample Complexity: The analysis gives the number of samples required for label propagation to reach a target accuracy with a target probability.
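
The geometric rate in the first guarantee can be checked with a short derivation. It assumes the propagation update has the standard form $F^{(t+1)}=\alpha S F^{(t)}+(1-\alpha)Y$ with $S=D^{-1/2}WD^{-1/2}$ built from a symmetric, nonnegative $W$; the notation is an assumption of this sketch and the paper's own proof may differ.

```latex
\begin{align*}
F^{*} &= \alpha S F^{*} + (1-\alpha) Y
  && \text{(fixed point of the update)} \\
F^{(t+1)} - F^{*} &= \alpha S \bigl(F^{(t)} - F^{*}\bigr)
  && \text{(subtract the fixed-point equation)} \\
\bigl\|F^{(t+1)} - F^{*}\bigr\|_F
  &\le \alpha \,\|S\|_2 \,\bigl\|F^{(t)} - F^{*}\bigr\|_F
  \le \alpha \,\bigl\|F^{(t)} - F^{*}\bigr\|_F
  && \text{(eigenvalues of } S \text{ lie in } [-1,1]\text{)} \\
\bigl\|F^{(t)} - F^{*}\bigr\|_F
  &\le \alpha^{t} \,\bigl\|F^{(0)} - F^{*}\bigr\|_F
  && \text{(apply the bound } t \text{ times)}
\end{align*}
```

Since $\alpha<1$ and the spectrum of $S$ is contained in $[-1,1]$, the matrix $I-\alpha S$ is invertible, so the fixed point $F^{*}=(1-\alpha)(I-\alpha S)^{-1}Y$ exists and is unique.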

5. Experimental Evaluation

5.1 Data and Preprocessing

  • Cohort: $n$ subjects from the National Alzheimer’s Coordinating Center; 219 total features per subject after integration
  • Semi-supervised regime: 29.1% labeled for training/testing, 70.9% unlabeled
  • Class distribution (labeled subset): Normal (63.1%), Impaired-not-MCI (3.7%), MCI (19.9%), Dementia (13.3%)
  • Preprocessing: >50% missing: excluded; remaining missing values imputed by kNN (k=5)
  • Train/test split: 80/20 stratified on labeled data

5.2 Performance under Label Scarcity

| Labeled Fraction | Accuracy (%) | Cohen’s κ |
|---|---|---|
| 5% (≈70) | 59.1 ± 1.8 | 0.159 |
| 30% | 72.6 ± 0.6 | 0.430 |
| 50% | 81.1 ± 0.7 | 0.614 |
| 80% | 91.9 ± 0.3 | 0.842 |
| 100% (test set) | 98.4 | 0.970 |

  • MATCH-AD retains clinically meaningful κ (above 0.4) with only 30% labels, reaching “almost perfect agreement” (κ above 0.8) at 80% (Moayedikia et al., 19 Dec 2025).

5.3 Baseline Comparisons and Ablations

  • Best baseline (SelfTraining_SVM): 71.3% accuracy, κ=0.329
  • MATCH-AD: 98.4% accuracy, κ=0.970, F1=0.976
  • SelfTraining_SVM underperforms on minority classes, while MATCH-AD maintains F1 above 0.86 across all classes (“Normal”: 98.9%, “Impaired-not-MCI”: 90.9%, “MCI”: 86.2%, “Dementia”: 97.4%)
  • Removal of the autoencoder collapses κ to zero/negative; removal of optimal transport impairs modeling of progression but weakly affects classification performance

5.4 Hyperparameter Sensitivity

  • Propagation weight $\alpha$: peak performance at […]–[…]
  • Neighborhood size $k$: plateau of robust performance for […] (default: […])
  • Outer iterations: […]–20 for convergence

6. Clinical Relevance and Interpretability

  • Accuracy alone can overstate performance under class imbalance; Cohen’s κ, which corrects for chance agreement, provides a stricter criterion
  • MATCH-AD illustrates resource allocation tradeoffs: moderate agreement (κ above 0.4) attainable with only ~30% labeled subjects, substantial (κ above 0.6) at ~50–60%, and almost perfect (κ above 0.8) at ~80%
  • Minority/early-stage classes (Impaired-not-MCI) benefit most from the integrated representation+propagation approach
  • The explicit transport-based progression quantifies state transition distances, potentially offering interpretable “disease trajectories” in clinical settings
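
The difference between accuracy and Cohen’s κ under class imbalance can be seen in a small worked example. The sketch below uses the standard definition $\kappa=(p_o-p_e)/(1-p_e)$, where $p_o$ is the observed agreement and $p_e$ the agreement expected by chance; the label vectors are synthetic and are not taken from the paper's data.

```python
from collections import Counter

def accuracy(y_true, y_pred):
    """Fraction of predictions that match the reference labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def cohen_kappa(y_true, y_pred):
    """Cohen's kappa: (p_o - p_e) / (1 - p_e)."""
    n = len(y_true)
    p_o = accuracy(y_true, y_pred)  # observed agreement
    true_counts, pred_counts = Counter(y_true), Counter(y_pred)
    # Agreement expected by chance from the two marginal label distributions.
    p_e = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)
    return (p_o - p_e) / (1.0 - p_e)

# Synthetic, imbalanced two-class example (not data from the paper).
y_true = ["Normal"] * 90 + ["Dementia"] * 10
# Predictor 1: always outputs the majority class.
majority = ["Normal"] * 100
# Predictor 2: 85/90 Normal and 8/10 Dementia cases classified correctly.
informed = ["Normal"] * 85 + ["Dementia"] * 5 + ["Dementia"] * 8 + ["Normal"] * 2

print(accuracy(y_true, majority), cohen_kappa(y_true, majority))
print(accuracy(y_true, informed), cohen_kappa(y_true, informed))
```

The majority-class predictor reaches 90% accuracy with κ = 0, while the second predictor's 93% accuracy corresponds to κ ≈ 0.66, so a three-point gap in accuracy hides the difference between no skill and substantial agreement.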

7. Summary and Availability

MATCH-AD introduces a mathematically principled, scalable, and label-efficient framework for mining disease structure in heterogeneous neuroimaging datasets. It achieves nearly perfect diagnostic reliability on the full labeled set (κ = 0.970), retains moderate agreement with 30% of labels (κ = 0.430, falling to 0.159 at 5%), and provides continuous quantification of progression via Wasserstein distances, supported by theoretical guarantees for label propagation, transport stability, and alternating minimization (Moayedikia et al., 19 Dec 2025). All code and data splits are provided at https://github.com/amoayedikia/brain-network.git.
