3D CNNs in PPMI Neuroimaging
- 3D CNNs applied to PPMI data use three-dimensional convolutional operations to analyze multi-modal neuroimaging in Parkinson’s research.
- It leverages spatial filtering and deep architectures like 3D ResNet-18 and 3D AlexNet to detect subtle neurodegenerative changes with high diagnostic accuracy.
- The approach integrates robust preprocessing, interpretable models, and multimodal data fusion to facilitate clinical translation and biomarker discovery.
Three-dimensional convolutional neural networks (3D CNNs) have emerged as a critical methodology for analyzing volumetric medical imaging data in the context of the Parkinson's Progression Markers Initiative (PPMI). The PPMI dataset comprises multi-modal neuroimaging scans, particularly MRI and functional SPECT data, collected to advance the understanding and diagnosis of Parkinson’s disease (PD). 3D CNNs are uniquely capable of exploiting spatial context in three dimensions, enabling the detection and explanation of subtle, distributed neurodegenerative changes. The following sections review the history, methodological details, comparative performance, adaptation challenges, and clinical implications of 3D CNN-based analysis on PPMI data.
1. Methodological Foundations of 3D CNNs in PPMI
3D CNNs extend the convolutional operation beyond two-dimensional images to volumetric data, where kernels filter across height, width, and depth (or slices). For a 3D input $X$ and kernel $W$, the convolution at voxel $(i, j, k)$ is defined as:

$$Y(i, j, k) = \sigma\Big( \sum_{m} \sum_{n} \sum_{p} W(m, n, p)\, X(i + m,\, j + n,\, k + p) + b \Big)$$

where $\sigma$ is a nonlinearity (e.g., ReLU or SELU) and $b$ is a bias term (Martinez-Murcia et al., 2023). This paradigm allows 3D CNNs to capture spatial relations across brain slices, a crucial property for tasks such as voxel-wise classification, region segmentation, and disease staging in PD cohorts.
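The 3D convolution defined above can be sketched directly with NumPy. This is a naive, loop-based valid convolution followed by a ReLU nonlinearity, intended only to make the voxel-wise sum explicit (real pipelines would use an optimized framework layer); the function name and shapes are illustrative.

```python
import numpy as np

def conv3d_relu(x, w, b=0.0):
    """Naive valid 3D convolution of volume x with kernel w, plus bias b,
    followed by a ReLU nonlinearity -- a direct sketch of the equation above.

    x: (D, H, W) input volume; w: (kd, kh, kw) kernel; b: scalar bias.
    """
    kd, kh, kw = w.shape
    D, H, W = x.shape
    out = np.zeros((D - kd + 1, H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            for k in range(out.shape[2]):
                # Voxel-wise sum over the kernel's height, width, and depth.
                out[i, j, k] = np.sum(x[i:i + kd, j:j + kh, k:k + kw] * w) + b
    return np.maximum(out, 0.0)  # ReLU
```

A 2x2x2 all-ones kernel on an all-ones volume yields 8 at every output voxel, which illustrates how each response aggregates spatial context across slices.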
Architectural innovations within this framework include the use of deep backbones (e.g., 3D ResNet-18 (Huang et al., 31 Oct 2024), 3D AlexNet (Martinez-Murcia et al., 2023)), fusion with attention modules for slice/patch weighting, and conversion of pre-trained 2D models for volumetric input via "inflation" or mixed convolution strategies (Huang et al., 31 Oct 2024).
2. Preprocessing and Data Harmonization
Preprocessing for 3D CNNs on PPMI data focuses on spatial and intensity normalization, resampling to uniform voxel dimensions, and artifact removal. For MRI, spatial normalization (e.g., affine and non-linear registration to MNI space) ensures anatomical consistency (Sahu et al., 2023), while intensity normalization (such as dividing by the mean of the top 3% voxel intensities or by local mean—integral normalization) significantly impacts classification accuracy for SPECT/fMRI modalities (Martinez-Murcia et al., 2023).
Interestingly, research indicates that deep 3D CNNs (e.g., 3D AlexNet) partially compensate for residual spatial misalignments through learned invariance, thus mitigating the computational demands traditionally associated with dense spatial normalization (Martinez-Murcia et al., 2023). In contrast, failure to normalize intensities can obscure disease-relevant signals, especially in nuclear imaging.
For network input, volumetric scans are commonly resampled to a fixed voxel grid (Patel et al., 24 Jul 2024), with missing or low-information slices interpolated or excluded (Huang et al., 31 Oct 2024).
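The "top 3% intensity" normalization described above can be sketched in a few lines: the volume is divided by the mean of its brightest voxels, making intensity scales comparable across scans. The function name and the exact fraction are illustrative.

```python
import numpy as np

def normalize_top_percent(vol, top_frac=0.03):
    """Divide a volume by the mean intensity of its brightest top_frac
    voxels -- a sketch of the 'top 3%' intensity normalization used for
    SPECT/fMRI preprocessing described above."""
    flat = np.sort(vol.ravel())
    n_top = max(1, int(round(top_frac * flat.size)))
    ref = flat[-n_top:].mean()  # mean of the brightest voxels
    return vol / ref
```

After normalization, the brightest voxels sit near 1.0 regardless of the scanner's raw intensity scale, which is what makes cross-scan comparison meaningful.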
3. Network Architectures and Training Strategies
A variety of 3D CNN architectures are employed in the literature:
- Standard 3D CNNs: E.g., 3D ResNet-18, 3D AlexNet, and variants with fixed convolutional depths and kernel sizes (Huang et al., 31 Oct 2024, Martinez-Murcia et al., 2023).
- Hybrid architectures: E.g., MC3 (first layers use 3D convolutions, later layers switch to 2D), R(2+1)D (factorizes 3D kernels into consecutive 2D spatial and 1D temporal convolutions) (Huang et al., 31 Oct 2024).
- Attention modules: Slice-relation mechanisms (self-attention, multi-head attention) are attached to weigh contributions of different slices, improving multiclass stage prediction performance (Huang et al., 31 Oct 2024).
- Prototype-based interpretable networks (e.g., PIPNet3D): Incorporate a 3D CNN backbone and a linear decision layer that combines unsupervised learned prototypical local regions, yielding interpretable predictions and facilitating clinical validation of discovered features (Santi et al., 27 Mar 2024).
- Ordinal classification: Instead of nominal softmax, some networks leverage ordinal encodings and Error Correcting Output Code (ECOC) strategies to more faithfully model disease staging (Barbero-Gómez et al., 2021).
- ConvKAN: Introduces learnable B-spline activation functions within convolutional layers; 3D implementations led to state-of-the-art performance on PPMI MRI for early PD detection (Patel et al., 24 Jul 2024).
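The ordinal-classification idea in the list above can be illustrated with cumulative ("thermometer") target encodings, where stage $c$ is represented by $c$ ones followed by zeros. This is a common simple scheme for ordinal targets, sketched here as an assumption; the cited paper uses ECOC matrices, which generalize this construction.

```python
import numpy as np

def ordinal_encode(labels, n_classes):
    """Cumulative ('thermometer') targets for ordinal classification:
    class c maps to [1]*c + [0]*(n_classes - 1 - c). A simple sketch of
    ordinal encodings; the ECOC strategy in the literature generalizes it."""
    labels = np.asarray(labels)
    thresholds = np.arange(n_classes - 1)
    return (labels[:, None] > thresholds).astype(float)

def ordinal_decode(probs):
    """Predicted stage = number of cumulative probabilities above 0.5."""
    return (np.asarray(probs) > 0.5).sum(axis=1)
```

Unlike nominal softmax, a misprediction between adjacent stages flips only one target bit, so the loss naturally penalizes distant-stage errors more heavily.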
Transfer learning is widespread for 3D CNNs, with weights initialized from action recognition video datasets like Kinetics-400, although the domain gap with medical images imposes performance ceilings (Huang et al., 31 Oct 2024, Patel et al., 24 Jul 2024). Fine-tuning typically affects only higher or domain-adaptive layers.
The models are trained with cross-entropy or specialized ordinal regression losses, and imbalanced class distributions are managed using synthetic oversampling (ADASYN, OGO-SP-beta), class weighting, or targeted data augmentation strategies (Barbero-Gómez et al., 2021, Sahu et al., 2023).
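One common way to implement the class weighting mentioned above is the "balanced" heuristic, where each class weight is inversely proportional to its frequency. This is a generic sketch, not the specific weighting of the cited papers (which also use ADASYN and OGO-SP-beta oversampling).

```python
import numpy as np

def inverse_frequency_weights(labels, n_classes):
    """Per-class loss weights via the 'balanced' heuristic
    n_samples / (n_classes * class_count), so rare PD stages
    contribute more to the training loss."""
    counts = np.bincount(labels, minlength=n_classes).astype(float)
    counts[counts == 0] = 1.0  # guard against empty classes
    return counts.sum() / (n_classes * counts)
```

These weights would typically be passed to a weighted cross-entropy loss, so that a rare stage with a quarter of the samples of a common stage receives roughly four times the per-sample gradient weight.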
4. Comparative Performance and Evaluation Metrics
3D CNNs on PPMI and comparable datasets achieve high classification and staging performance when appropriately designed and trained:
- 3D CNNs (e.g., 3D AlexNet) attain diagnostic accuracy up to 94.1% and ROC-AUC up to 0.984 on PPMI SPECT (Martinez-Murcia et al., 2023).
- 3D ConvKAN surpasses standard CNNs and even GCNs (when volumetric supervoxel graphs are used), achieving AUROC values of up to 1.00 for early-stage PD (Patel et al., 24 Jul 2024).
- Attention-augmented 3D/2D models outperform their non-attention counterparts, especially for multiclass PD stage classification (Huang et al., 31 Oct 2024).
Typical metrics include accuracy, balanced accuracy, macro F1-score, Cohen’s kappa, mean absolute error, and ROC-AUC (Patel et al., 24 Jul 2024, Huang et al., 31 Oct 2024). In ordinal classification contexts, ordinal metrics such as weighted kappa and Kendall’s tau are further employed (Barbero-Gómez et al., 2021).
Performance, however, can be sensitive to the quality of pretraining (medical vs. nonmedical datasets), size and balance of training data, and congruence of intensity distributions across scans and cohorts.
5. Interpretability and Evaluation of Learned Features
Interpretability advances include saliency map visualization (gradient-based or guided backpropagation), class activation mapping (Grad-CAM adapted to 3D), and perturbation-based importance analysis (Kan et al., 2020, Martinez-Murcia et al., 2023). PIPNet3D extends this direction with unsupervised identification of volumetric prototypes, which are evaluated using anatomically grounded entropy and localization-consistency metrics (Santi et al., 27 Mar 2024).
These interpretable methods reveal that 3D CNNs not only match radiologically established biomarkers (e.g., focus on striatum in SPECT imaging for PD diagnosis (Martinez-Murcia et al., 2023)) but also facilitate hypothesis generation regarding novel regions of interest. Robust interpretability is especially crucial for clinical trust and regulatory acceptance.
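The perturbation-based importance analysis mentioned above can be sketched model-agnostically: slide a zeroed-out cube over the volume and record how much the model's score drops. Here `predict` is any callable mapping a volume to a scalar score, standing in for a trained 3D CNN; patch size and stride are illustrative.

```python
import numpy as np

def occlusion_importance(vol, predict, patch=4, stride=4):
    """Perturbation-based importance map: occlude each cube of the volume
    with zeros and record the drop in the model's score. Regions whose
    occlusion reduces the score most (e.g., the striatum in PD SPECT)
    receive the highest importance."""
    base = predict(vol)
    D, H, W = vol.shape
    imp = np.zeros(vol.shape, dtype=float)
    for i in range(0, D - patch + 1, stride):
        for j in range(0, H - patch + 1, stride):
            for k in range(0, W - patch + 1, stride):
                occluded = vol.copy()
                occluded[i:i + patch, j:j + patch, k:k + patch] = 0.0
                imp[i:i + patch, j:j + patch, k:k + patch] = base - predict(occluded)
    return imp
```

Because it treats the network as a black box, this analysis applies equally to standard, attention-augmented, and prototype-based 3D CNNs.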
6. Challenges, Limitations, and Generalization
Several challenges are characteristic of deploying 3D CNNs on PPMI:
- Domain Gap in Pretraining: Pretrained weights from video datasets like Kinetics-400 may not optimally transfer to neuroimaging modalities, contributing to the observed superiority of some 2D models pretrained on ImageNet for certain tasks (Huang et al., 31 Oct 2024).
- Computational Burden: 3D CNNs are resource intensive, necessitating efficient memory management (e.g., bottleneck layers, compression in DVNet (2002.01568)), and balanced model depth (Barbero-Gómez et al., 2021).
- Dataset Scale and Heterogeneity: Relatively small and imbalanced labeled datasets (especially for rare PD stages) limit model performance and necessitate robust data augmentation. Models are variably sensitive to missing slices, image artifacts, and inter-site differences (Barbero-Gómez et al., 2021, Huang et al., 31 Oct 2024).
- Cross-Dataset Generalization: Models showing high performance on PPMI can generalize poorly to external datasets acquired with different imaging protocols, unless explicit domain adaptation or cotraining is performed (Patel et al., 24 Jul 2024, Huang et al., 31 Oct 2024).
- Interpretability versus Complexity: Increased model complexity via attention or advanced architectures may improve average performance but sometimes results in greater metric variability across folds or test cohorts (Huang et al., 31 Oct 2024).
7. Clinical Implications and Future Directions
3D CNN analysis of PPMI data is directly relevant for the following clinical and research applications:
- Diagnosis and Staging: Automated, robust, and reproducible classification of PD versus control, as well as fine-grained prediction of PD stage/severity using SPECT or structural MRI (Huang et al., 31 Oct 2024, Martinez-Murcia et al., 2023).
- Biomarker Discovery: Unsupervised discovery of imaging prototypes or regions linked to disease allows hypothesis generation and refinement of neuroanatomical models of PD (Santi et al., 27 Mar 2024).
- Integration of Multimodal Data: Combined analysis of MRI, DTI, and functional imaging, with potential for fusion using separate CNNs and decision-level weight optimization (Sahu et al., 2023).
- Segmentation and Quantification: Memory-efficient 3D segmentation models (e.g., DVNet) pave the way for automated extraction of anatomical volumes and microarchitecture features relevant to PD pathology and progression (2002.01568).
- Translation to Clinical Workflows: The combination of high diagnostic accuracy, interpretability, and computational efficiency in recent models facilitates integration into clinical decision support systems.
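The decision-level fusion mentioned above can be sketched as a weighted combination of per-modality class probabilities. This is a minimal illustration; the weights would in practice be optimized on validation data, and the function name is hypothetical.

```python
import numpy as np

def decision_level_fusion(prob_list, weights):
    """Combine per-modality class-probability vectors (e.g., outputs of
    separate MRI and DTI CNNs) with modality weights, then renormalize
    so the fused vector is again a probability distribution."""
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()  # normalize modality weights
    fused = sum(w * np.asarray(p) for w, p in zip(weights, prob_list))
    return fused / fused.sum(axis=-1, keepdims=True)
```

With equal weights, two modalities predicting [0.6, 0.4] and [0.2, 0.8] fuse to [0.4, 0.6]; tuning the weights lets a more reliable modality dominate the final decision.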
Ongoing directions include developing pretraining protocols with large-scale medical imaging datasets, advancing interpretable and prototype-based models, improving cross-institutional model generalization, and unifying multimodal imaging analysis pipelines.
Table: Comparative Summary of Selected 3D CNN Approaches on PPMI
| Paper/Model | Input Modality | Accuracy/AUROC (PPMI) | Notable Features / Methods |
|---|---|---|---|
| ALEXNET3D (Martinez-Murcia et al., 2023) | SPECT | 94.1% / 0.984 | Intensity normalization, saliency maps |
| 3D ConvKAN (Patel et al., 24 Jul 2024) | MRI | 1.00 (AUROC) | B-spline activations, high generalizability |
| 3D CNN (ResNet-18) (Huang et al., 31 Oct 2024) | SPECT | 0.65–0.81 | Attention layer, cotraining, Kinetics-400 pretrain |
| Ordinal 3D CNN (Barbero-Gómez et al., 2021) | SPECT | NA | ECOC coding, OGO-SP-beta augmentation |
| Multi-modal CNN (slice-wise) (Sahu et al., 2023) | MRI, DTI | 95.53% | Decision-level fusion, ADASYN oversampling |
| PIPNet3D (Santi et al., 27 Mar 2024) | MRI (Alzheimer demo) | Same as blackbox | Interpretable prototypes, spatial consistency |
| DVNet (2002.01568) | KESM (ex., extendable) | ~93.9% (3D segm.) | Dense encoder-decoder, feature compression |
*NA: Not specified exactly for PPMI in the data excerpt, see original for detailed metrics.
3D CNNs have been demonstrated as a powerful tool in the analysis of volumetric PPMI neuroimaging data, with substantial progress in robustness, interpretability, and clinical viability. The continued development and evaluation of 3D CNN methods, especially those incorporating interpretable and efficient design principles, are poised to transform research and diagnosis in Parkinson’s disease.