An Exploration of 2D and 3D Deep Learning Techniques for Cardiac MR Image Segmentation (1709.04496v2)

Published 13 Sep 2017 in cs.CV

Abstract: Accurate segmentation of the heart is an important step towards evaluating cardiac function. In this paper, we present a fully automated framework for segmentation of the left (LV) and right (RV) ventricular cavities and the myocardium (Myo) on short-axis cardiac MR images. We investigate the suitability of various state-of-the-art 2D and 3D convolutional neural network architectures, as well as slight modifications thereof, for this task. Experiments were performed on the ACDC 2017 challenge training dataset comprising cardiac MR images of 100 patients, where manual reference segmentations were made available for end-diastolic (ED) and end-systolic (ES) frames. We find that processing the images in a slice-by-slice fashion using 2D networks is beneficial due to a relatively large slice thickness. However, the exact network architecture only plays a minor role. We report mean Dice coefficients of $0.950$ (LV), $0.893$ (RV), and $0.899$ (Myo), respectively, with an average evaluation time of 1.1 seconds per volume on a modern GPU.

Citations (238)

Summary

The paper demonstrates that a modified 2D U-Net achieves high segmentation accuracy with mean Dice scores of 0.950 for LV, 0.893 for RV, and 0.899 for Myo.
It evaluates four CNN architectures, comparing 2D and 3D approaches while analyzing the effect of preprocessing and various loss functions on performance.
The findings indicate that well-tuned 2D networks can match or outperform 3D models on data with large slice thickness, which can inform the design of future cardiac image analysis methods.
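The Dice coefficient quoted above measures the overlap between a predicted and a reference label map. The following is a minimal NumPy sketch of the metric for a single structure. The label ids (0 = background, 1 = RV, 2 = Myo, 3 = LV), the toy arrays, and the convention of returning 1.0 when both masks are empty are assumptions made for illustration, not details taken from the paper.

```python
import numpy as np

def dice_coefficient(pred, ref, label):
    """Dice overlap for one label between two integer label maps."""
    p = (pred == label)
    r = (ref == label)
    denom = p.sum() + r.sum()
    if denom == 0:
        # Assumption: a structure absent from both maps counts as perfect agreement.
        return 1.0
    return 2.0 * np.logical_and(p, r).sum() / denom

# Toy 4x4 label maps (assumed ids: 0 = background, 3 = LV).
ref = np.array([[0, 0, 0, 0],
                [0, 3, 3, 0],
                [0, 3, 3, 0],
                [0, 0, 0, 0]])
pred = np.array([[0, 0, 0, 0],
                 [0, 3, 3, 0],
                 [0, 3, 0, 0],
                 [0, 0, 0, 0]])

# The prediction misses one of four LV pixels: 2 * 3 / (3 + 4) = 6/7.
lv_dice = dice_coefficient(pred, ref, label=3)
print(f"LV Dice: {lv_dice:.3f}")
```

A value of 1 means perfect overlap and 0 means no overlap, so the reported 0.950 for the LV corresponds to near-complete agreement with the manual reference.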

An Exploration of 2D and 3D Deep Learning Techniques for Cardiac MR Image Segmentation

This paper undertakes a comprehensive exploration of the application of 2D and 3D convolutional neural networks (CNNs) for the automated segmentation of cardiac MR images. The primary focus is on distinguishing between the left ventricular (LV) cavity, right ventricular (RV) cavity, and the myocardium (Myo) using short-axis cardiac MR imaging data. The paper evaluates the performance of diverse network architectures and investigates the comparative utility of 2D versus 3D approaches, especially in light of the relatively low through-plane resolution that typifies many cardiac MR datasets.
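The slice-by-slice idea can be sketched in a few lines: a 2D predictor is applied independently to each short-axis slice and the results are stacked back into a volume. The sketch below is illustrative only. The `(H, W, Z)` axis order, the function names, and the `threshold_predictor` placeholder (which stands in for a trained 2D network) are all hypothetical.

```python
import numpy as np

def segment_volume_slicewise(volume, predict_slice):
    """Apply a 2D predictor to every slice of a (H, W, Z) volume."""
    slices = [predict_slice(volume[:, :, z]) for z in range(volume.shape[2])]
    return np.stack(slices, axis=2)

def threshold_predictor(slice_2d):
    # Hypothetical placeholder for a trained 2D network:
    # labels pixels above the slice mean as foreground.
    return (slice_2d > slice_2d.mean()).astype(np.int64)

# Synthetic volume with few slices, mimicking coarse through-plane sampling.
volume = np.random.default_rng(0).normal(size=(16, 16, 5))
labels = segment_volume_slicewise(volume, threshold_predictor)
print(labels.shape)
```

Because each slice is handled on its own, the predictor never has to interpolate across the coarse through-plane direction, which is the property the 2D approach exploits.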

Methodology

The authors explore four distinct network architectures: the fully convolutional network (FCN-8), the 2D U-Net, a modified version of the 2D U-Net with fewer feature maps in the upsampling path, and a 3D U-Net with alterations to preserve spatial information. The investigation covers the full pipeline, from pre-processing through training to post-processing. Pre-processing includes resampling all images to common resolutions suitable for the 2D and 3D networks, as well as intensity normalization. The paper also compares several cost functions, namely standard cross-entropy, weighted cross-entropy, and the Dice loss, with the Adam optimizer used to train the network weights.
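To make the three cost functions concrete, the sketch below implements generic pixel-wise cross-entropy, class-weighted cross-entropy, and a soft Dice loss in NumPy. These are common textbook formulations and not necessarily the exact variants used in the paper. The class weights, the label ids, and the random inputs are assumptions chosen purely for illustration.

```python
import numpy as np

def softmax(logits, axis=-1):
    """Numerically stable softmax over the class axis."""
    z = logits - logits.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def one_hot(labels, num_classes):
    """Integer label map (H, W) -> one-hot array (H, W, C)."""
    return np.eye(num_classes)[labels]

def cross_entropy(probs, labels, class_weights=None, eps=1e-7):
    """Mean pixel-wise cross-entropy, optionally weighted per class."""
    target = one_hot(labels, probs.shape[-1])
    per_pixel = -(target * np.log(probs + eps)).sum(axis=-1)
    if class_weights is not None:
        # Each pixel is scaled by the weight of its reference class.
        per_pixel = per_pixel * np.asarray(class_weights)[labels]
    return per_pixel.mean()

def soft_dice_loss(probs, labels, eps=1e-7):
    """One minus the mean per-class soft Dice score."""
    target = one_hot(labels, probs.shape[-1])
    axes = tuple(range(labels.ndim))  # sum over spatial axes, keep classes
    intersection = (probs * target).sum(axis=axes)
    denom = probs.sum(axis=axes) + target.sum(axis=axes)
    dice_per_class = (2.0 * intersection + eps) / (denom + eps)
    return 1.0 - dice_per_class.mean()

rng = np.random.default_rng(0)
labels = rng.integers(0, 4, size=(8, 8))   # assumed ids: background, RV, Myo, LV
probs = softmax(rng.normal(size=(8, 8, 4)))

ce = cross_entropy(probs, labels)
# Illustrative weights that down-weight the background class.
wce = cross_entropy(probs, labels, class_weights=[0.1, 0.3, 0.3, 0.3])
dice = soft_dice_loss(probs, labels)
print(ce, wce, dice)
```

Weighted cross-entropy and the Dice loss both counteract class imbalance, which matters here because background pixels typically far outnumber those of the cardiac structures.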

Key Findings

Network Architecture and Performance: The experimental results reveal that while the overall framework of the architecture plays a role, it is less crucial than other factors such as the choice of loss function and the use of batch normalization. Notably, the modified 2D U-Net marginally outperforms other architectures, achieving mean Dice coefficients of 0.950 for LV, 0.893 for RV, and 0.899 for Myo.
2D versus 3D Networks: Despite anticipated benefits, the 3D networks did not outperform their 2D counterparts. Possible reasons include the smaller number of training samples when each sample is a whole volume rather than a single slice, complications with convolutions at volume edges, and GPU memory constraints that necessitate downsampling.
Impact of Pre- and Post-Processing: The results underline the importance of how resolution is handled, both when resampling inputs during pre-processing and when resampling predictions in post-processing. Accuracy improved when the softmax output was resampled with linear interpolation, which shows how sensitive segmentation outcomes are to these numerical steps.
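The post-processing step can be sketched as follows: each class-probability channel of the softmax output is resampled with linear interpolation, and the label map is obtained by taking the argmax afterwards. This is a minimal NumPy illustration, not the authors' implementation. The function names, the corner-aligned sampling grid, and the toy two-class input are assumptions. A real pipeline would resample according to the physical pixel spacing of each image.

```python
import numpy as np

def resize_linear(image, out_shape):
    """Separable linear interpolation of a 2D array to out_shape."""
    in_h, in_w = image.shape
    out_h, out_w = out_shape
    # Assumption: corner-aligned sampling grid over the input indices.
    rows = np.linspace(0, in_h - 1, out_h)
    cols = np.linspace(0, in_w - 1, out_w)
    # Interpolate along the width for every input row: (in_h, out_w).
    tmp = np.stack([np.interp(cols, np.arange(in_w), row) for row in image])
    # Interpolate along the height for every column: (out_h, out_w).
    return np.stack([np.interp(rows, np.arange(in_h), col) for col in tmp.T], axis=1)

def resample_prediction(probs, out_shape):
    """Softmax output (H, W, C) -> label map (out_h, out_w)."""
    channels = [resize_linear(probs[..., c], out_shape)
                for c in range(probs.shape[-1])]
    # Take the argmax only after interpolating the probabilities.
    return np.stack(channels, axis=-1).argmax(axis=-1)

# Toy two-class softmax output: left half class 0, right half class 1.
probs = np.zeros((4, 4, 2))
probs[:, :2, 0] = 1.0
probs[:, 2:, 1] = 1.0

labels_up = resample_prediction(probs, (8, 8))
print(labels_up)
```

Interpolating probabilities rather than hard labels keeps sub-pixel information about where one class gives way to another, so the resampled boundary can fall between the original grid points.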

Implications and Future Research

The implications of this paper are twofold: practically, the findings suggest achievable improvements in cardiac image analysis workflows, with potential accuracy enhancements in pathological assessments and therapeutic planning. Theoretically, this research continues to inform the balance between 2D and 3D network applications in medical imaging contexts, highlighting constraints and optimization opportunities inherent in network design and data preprocessing techniques.

Future research could examine hybrid models that combine the strengths of 2D and 3D networks. Such work could explore more sophisticated data augmentation methods, cross-modality imaging integration, or better use of GPU memory to enable higher-resolution 3D inference. Further evaluation on diverse datasets would help establish how well these findings generalize and could address the specific challenges encountered at the apex and base of the heart. As computational infrastructure and algorithms continue to advance, cardiac image segmentation remains an active area with potential contributions to clinical diagnostics and interventional planning.
