Convolutional Neural Networks for Classification of Alzheimer's Disease: Overview and Reproducible Evaluation (1904.07773v6)

Published 16 Apr 2019 in cs.LG, eess.IV, and stat.ML

Abstract: Over 30 papers have proposed to use convolutional neural network (CNN) for AD classification from anatomical MRI. However, the classification performance is difficult to compare across studies due to variations in components such as participant selection, image preprocessing or validation procedure. Moreover, these studies are hardly reproducible because their frameworks are not publicly accessible and because implementation details are lacking. Lastly, some of these papers may report a biased performance due to inadequate or unclear validation or model selection procedures. In the present work, we aim to address these limitations through three main contributions. First, we performed a systematic literature review and found that more than half of the surveyed papers may have suffered from data leakage. Our second contribution is the extension of our open-source framework for classification of AD using CNN and T1-weighted MRI. Finally, we used this framework to rigorously compare different CNN architectures. The data was split into training/validation/test sets at the very beginning and only the training/validation sets were used for model selection. To avoid any overfitting, the test sets were left untouched until the end of the peer-review process. Overall, the different 3D approaches (3D-subject, 3D-ROI, 3D-patch) achieved similar performances while that of the 2D slice approach was lower. Of note, the different CNN approaches did not perform better than a SVM with voxel-based features. The different approaches generalized well to similar populations but not to datasets with different inclusion criteria or demographical characteristics.

Authors (10)

Junhao Wen (22 papers)
Elina Thibeau-Sutre (6 papers)
Mauricio Diaz-Melo (1 paper)
Alexandre Routier (7 papers)
Simona Bottani (7 papers)
Didier Dormont (8 papers)
Stanley Durrleman (19 papers)
Ninon Burgos (18 papers)
Olivier Colliot (36 papers)
Jorge Samper-Gonzalez (2 papers)

Citations (465)

Summary

The paper introduces a reproducible, open-source evaluation framework designed to avoid the data leakage found in many CNN-based Alzheimer's classification studies.
It categorizes and compares several CNN approaches, finding that the 3D approaches perform similarly to one another while the 2D slice approach performs worse.
Experimental results indicate that the CNNs did not outperform an SVM with voxel-based features, underscoring the need for rigorous methodology.

Convolutional Neural Networks for Alzheimer's Disease Classification: Technical Evaluation and Reproducibility

The paper "Convolutional Neural Networks for Classification of Alzheimer's Disease: Overview and Reproducible Evaluation" reviews existing methods for Alzheimer's disease (AD) classification that apply convolutional neural networks (CNNs) to T1-weighted magnetic resonance imaging (MRI). Its significance lies in addressing the reproducibility problems and potential biases, chiefly data leakage, that affect many existing works, and in providing a framework for unbiased evaluation and comparison of CNN-based approaches.

Analysis of Existing Methods

Reported performance of CNNs for AD classification varies substantially across studies and is difficult to compare. The inconsistencies stem mainly from differences in participant selection, image preprocessing, model architecture, and validation procedure. The paper reviews over 30 articles that use CNNs for AD classification and groups them by input type: 2D slice-level, 3D patch-level, ROI-based, and 3D subject-level approaches. It finds that more than half of the surveyed papers may have suffered from data leakage, and so may report biased performance, which points to a lack of methodological rigor, particularly in data splitting and validation.

Proposed Framework

In response, the authors present a reproducible open-source framework for evaluating CNN methods on three public datasets: ADNI, AIBL, and OASIS. The framework includes tools to convert neuroimaging data into the BIDS format and a modular pipeline covering image preprocessing, CNN architectures, model training, and evaluation. To reduce the risk of data leakage, the data were split into training, validation, and test sets at the very beginning, and only the training and validation sets were used for model selection. The test sets were left untouched until the end of the peer-review process.
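The separation described above can be illustrated with a minimal sketch. It assumes one common source of leakage, namely several scans of the same participant ending up in different sets, and avoids it by splitting on participants rather than on scans. The function names, the 70/15/15 fractions, and the toy cohort are hypothetical and are not taken from the authors' framework.

```python
import random
from collections import defaultdict


def split_by_subject(scans, fractions=(0.7, 0.15, 0.15), seed=0):
    """Assign every scan of a subject to exactly one of train/val/test.

    `scans` is a list of (subject_id, scan_id) pairs. The names and the
    default fractions are illustrative assumptions, not the paper's values.
    """
    by_subject = defaultdict(list)
    for subject_id, scan_id in scans:
        by_subject[subject_id].append(scan_id)

    # Shuffle subjects, not scans, so repeated visits stay together.
    subjects = sorted(by_subject)
    random.Random(seed).shuffle(subjects)

    n_train = round(fractions[0] * len(subjects))
    n_val = round(fractions[1] * len(subjects))
    groups = {
        "train": subjects[:n_train],
        "val": subjects[n_train:n_train + n_val],
        "test": subjects[n_train + n_val:],
    }
    return {
        name: [(s, scan) for s in members for scan in by_subject[s]]
        for name, members in groups.items()
    }


def shared_subjects(split_a, split_b):
    """Return subject IDs present in both splits (non-empty means leakage)."""
    return {s for s, _ in split_a} & {s for s, _ in split_b}


# Toy cohort: 20 subjects with 3 visits each.
scans = [(f"sub-{i:02d}", f"ses-{j}") for i in range(20) for j in range(3)]
splits = split_by_subject(scans)
```

Calling `shared_subjects` on any two of the resulting sets should return an empty set. Fixing the seed and performing the split once, before any model is trained, mirrors the idea that the test set is defined up front and not revisited during model selection.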

Experimental Results and Comparative Analysis

Using this framework, the authors evaluated several CNN approaches. The 3D approaches (3D subject-level, 3D ROI-based, and 3D patch-level) achieved similar performance to one another, while the 2D slice-level approach performed worse, plausibly because it cannot exploit spatial relationships between slices. The 3D variants do differ in computational cost and operational complexity.
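To make the four input types concrete, the sketch below computes which parts of a single 3D volume each approach would feed to a network. It works on index ranges only, so it needs no imaging library. The volume shape, patch size, stride, and ROI centre are hypothetical values chosen for illustration, not the settings used in the paper.

```python
from itertools import product


def slice_count(shape, axis):
    """Number of 2D slices obtained by cutting a 3D volume along `axis`."""
    return shape[axis]


def patch_origins(shape, patch_size, stride):
    """Corner coordinates of every cubic patch that fits inside the volume."""
    ranges = [range(0, dim - patch_size + 1, stride) for dim in shape]
    return list(product(*ranges))


def crop_bounds(center, size, shape):
    """(start, stop) per axis for an ROI box of `size` centred on `center`.

    The box is shifted where necessary so that it stays inside the volume.
    """
    bounds = []
    for c, s, dim in zip(center, size, shape):
        start = max(0, min(c - s // 2, dim - s))
        bounds.append((start, start + s))
    return bounds


# Hypothetical volume and sizes, chosen only for illustration.
shape = (160, 200, 160)

n_slices = slice_count(shape, axis=0)                      # 2D slice-level
origins = patch_origins(shape, patch_size=50, stride=50)   # 3D patch-level
roi = crop_bounds(center=(80, 100, 60), size=(50, 50, 50), shape=shape)  # 3D ROI
subject_input = [(0, dim) for dim in shape]                # 3D subject-level
```

With these toy numbers, one volume yields 160 slice inputs, 36 non-overlapping patch inputs, one ROI crop, or a single whole-volume input. This shows why the approaches differ in the number of training samples per subject and in memory cost per sample.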

Notably, the CNN approaches did not perform better than an SVM trained on voxel-based features, which suggests that, with the datasets currently available, CNNs do not offer a clear accuracy advantage over conventional machine learning. The models generalized well to populations similar to the training data, but not to datasets with different inclusion criteria or demographic characteristics. Separating stable from progressive mild cognitive impairment (sMCI versus pMCI) remained harder than separating AD from cognitively normal (CN) participants.

Implications and Future Directions

The paper argues that reproducibility and transparency are essential to progress in CNN-based AD classification. By addressing data leakage and standardizing preprocessing and validation, it lays a foundation for developing and evaluating models on more diverse patient populations.

Promising directions include applying the framework to larger datasets, combining modalities (PET, MRI, and clinical evaluations), and testing newer architectures. As training data grows, semi-supervised or self-supervised learning may also improve model robustness and clinical applicability. Because the framework is public, the community can build on it to pursue these questions.
