
3D MRI brain tumor segmentation using autoencoder regularization (1810.11654v3)

Published 27 Oct 2018 in cs.CV and q-bio.NC

Abstract: Automated segmentation of brain tumors from 3D magnetic resonance images (MRIs) is necessary for the diagnosis, monitoring, and treatment planning of the disease. Manual delineation practices require anatomical knowledge, are expensive, time consuming and can be inaccurate due to human error. Here, we describe a semantic segmentation network for tumor subregion segmentation from 3D MRIs based on encoder-decoder architecture. Due to a limited training dataset size, a variational auto-encoder branch is added to reconstruct the input image itself in order to regularize the shared decoder and impose additional constraints on its layers. The current approach won 1st place in the BraTS 2018 challenge.

Citations (931)

Summary

  • The paper's main contribution is its novel integration of a VAE branch to regularize the encoder in a 3D CNN, significantly boosting segmentation accuracy.
  • The methodology employs a ResNet-based encoder with Group Normalization and a mirrored decoder with upsampling, effectively handling high-dimensional MRI data.
  • Results on the BraTS 2018 dataset showed impressive dice scores across tumor regions, underscoring the model's robust performance and clinical potential.

3D MRI Brain Tumor Segmentation Using Autoencoder Regularization

In the domain of medical imaging, the segmentation of brain tumors using 3D MRI scans is essential for accurate diagnosis, monitoring, and treatment planning. Manual delineation of these tumors is not only labor-intensive and time-consuming but also prone to human error. The paper by Andriy Myronenko introduces a sophisticated approach to automate this process using a deep learning model, which stands out due to its superior performance validated by its first-place win in the BraTS 2018 challenge.

Methodology

The proposed method hinges on an encoder-decoder convolutional neural network (CNN) architecture, enhanced by a variational autoencoder (VAE) branch. The encoder-decoder network is asymmetric; the encoder part, designed to extract deep image features, is significantly larger than the decoder, which reconstructs the segmentation mask. This design allows the model to efficiently manage the high-dimensional feature space inherent in 3D MRI scans.
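To make the dimension bookkeeping concrete, the sketch below traces how a 160×192×128 input crop (the crop size reported in the paper) shrinks through three stride-2 downsampling levels while the channel count doubles. The initial width of 32 filters also follows the paper, but the helper itself is purely illustrative:

```python
def encoder_shapes(input_shape=(160, 192, 128), channels=32, levels=3):
    """Trace (channels, depth, height, width) through stride-2 stages.

    Each encoder level halves every spatial dimension and doubles the
    channel count, ending at the low-resolution endpoint from which both
    the segmentation decoder and the VAE branch start.
    """
    d, h, w = input_shape
    c = channels
    shapes = [(c, d, h, w)]
    for _ in range(levels):
        c, d, h, w = c * 2, d // 2, h // 2, w // 2
        shapes.append((c, d, h, w))
    return shapes

# The endpoint is a 256-channel feature map at 1/8 spatial resolution.
print(encoder_shapes()[-1])
```

This is why the encoder dominates the parameter budget: most of the capacity sits in the wide, low-resolution stages at the bottom of this pyramid.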

Encoder and Decoder

The encoder employs ResNet blocks with Group Normalization, which mitigates the adverse effects of the very small batch sizes typical of 3D training. Progressive downsampling of the feature maps enlarges the receptive field, letting the network capture the broad spatial context needed for accurate segmentation. The decoder mirrors the encoder's structure but uses upsampling layers to restore the original spatial dimensions.
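Group Normalization is what makes training feasible at the batch size of 1 imposed by 3D memory constraints: it normalizes within channel groups of a single sample, so its statistics do not depend on batch size. A minimal, framework-free NumPy sketch of the operation (without the learnable scale/shift parameters a real layer would add):

```python
import numpy as np

def group_norm(x, num_groups=8, eps=1e-5):
    """Group Normalization over an (N, C, D, H, W) tensor.

    Channels are split into `num_groups` groups; each group is
    normalized over its channels and all spatial positions, per sample,
    so batch size never enters the statistics.
    """
    n, c, d, h, w = x.shape
    assert c % num_groups == 0, "channels must divide evenly into groups"
    g = x.reshape(n, num_groups, c // num_groups, d, h, w)
    mean = g.mean(axis=(2, 3, 4, 5), keepdims=True)
    var = g.var(axis=(2, 3, 4, 5), keepdims=True)
    g = (g - mean) / np.sqrt(var + eps)
    return g.reshape(n, c, d, h, w)
```

In contrast, Batch Normalization's running statistics become noisy and unreliable at batch size 1, which is why the paper avoids it.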

VAE Branch

An innovative aspect of this work is the integration of a VAE branch, designed to regularize the shared encoder by reconstructing the input image. This regularization is particularly important given the limited size of the training dataset. By forcing the encoder to retain rich information about the input image, the VAE branch encourages better-clustered feature representations, ultimately leading to better segmentation performance.
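The VAE branch reduces the encoder output to mean and log-variance vectors and draws its latent sample via the standard reparameterization trick, which keeps the sampling step differentiable during training. A generic NumPy sketch (the function name and shapes here are illustrative, not the paper's code):

```python
import numpy as np

def reparameterize(mu, log_var, rng):
    """Sample z = mu + sigma * eps with eps ~ N(0, I).

    In a real framework this keeps gradients flowing through mu and
    log_var, since the randomness is isolated in eps; the VAE decoder
    then reconstructs the input image from z.
    """
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * log_var) * eps
```

Because the sampled latent must suffice to rebuild the whole input volume, the shared encoder cannot collapse to features that only separate tumor from background.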

Training and Optimization

The loss function combines three terms: soft Dice loss for segmentation accuracy, L2 loss for VAE reconstruction fidelity, and KL divergence for VAE regularization. The careful balancing of these terms ensures that both segmentation accuracy and feature coherence are optimized. The model is trained using the Adam optimizer with a learning rate schedule that gradually decreases over the training epochs.
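The two non-trivial terms can be sketched compactly in NumPy. The soft Dice form below uses squared terms in the denominator, and the KL term is the closed-form divergence from N(μ, σ²) to N(0, 1) averaged over voxels, matching the paper's formulation; the 0.1 weights on the VAE terms are the paper's reported hyper-parameters, though the helper names here are our own:

```python
import numpy as np

def soft_dice_loss(pred, target, eps=1e-8):
    """Soft Dice loss: 1 - 2*sum(p*t) / (sum(p^2) + sum(t^2) + eps)."""
    inter = np.sum(pred * target)
    return 1.0 - 2.0 * inter / (np.sum(pred ** 2) + np.sum(target ** 2) + eps)

def kl_loss(mu, log_var, n_voxels):
    """Closed-form KL(N(mu, sigma^2) || N(0, 1)), averaged over N voxels:
    (1/N) * sum(mu^2 + sigma^2 - log(sigma^2) - 1)."""
    return np.sum(mu ** 2 + np.exp(log_var) - log_var - 1.0) / n_voxels

# Total loss per the paper: L = L_dice + 0.1 * L_L2 + 0.1 * L_KL,
# where L_L2 is the plain mean-squared reconstruction error.
```

A perfect segmentation drives the Dice term to zero, and a latent distribution matching the standard normal drives the KL term to zero, so both regularizers vanish exactly when their objectives are met.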

Results and Performance

On the BraTS 2018 validation dataset, the model achieved impressive dice scores: 0.8145 for the enhancing tumor (ET), 0.9042 for the whole tumor (WT), and 0.8596 for the tumor core (TC), with marginal improvements observed through test-time augmentation (TTA) and model ensembling. The results on the testing dataset demonstrated robust performance with dice scores of 0.7664 for ET, 0.8839 for WT, and 0.8154 for TC, corroborating the model's efficacy.
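The test-time augmentation mentioned above typically means averaging predictions over mirror flips of the input volume. A minimal sketch, where `predict_fn` stands in for the trained network (hypothetical here; any function mapping a volume to per-voxel scores works):

```python
import numpy as np

def tta_predict(volume, predict_fn, axes=(0, 1, 2)):
    """Average predictions over axis flips of a 3D volume.

    Each flipped input is predicted, un-flipped back to the original
    orientation, and averaged with the plain prediction.
    """
    preds = [predict_fn(volume)]
    for ax in axes:
        flipped = np.flip(volume, axis=ax)
        preds.append(np.flip(predict_fn(flipped), axis=ax))
    return np.mean(preds, axis=0)
```

Ensembling works the same way at a coarser grain: the averaging runs over independently trained models rather than over flipped inputs.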

Computational Efficiency

The model's computational requirements are substantial, necessitating high-end GPUs for training. Training takes approximately two days on a single NVIDIA Tesla V100 32GB GPU, dropping to around six hours on more advanced multi-GPU setups.

Implications and Future Work

This work has profound implications for the field of medical imaging. By automating the segmentation process, it not only enhances the precision of brain tumor analysis but also alleviates the clinical workload, thereby allowing radiologists to focus on more critical tasks. The integration of VAE for regularization opens avenues for further refinement of neural network architectures, particularly in scenarios with limited training data.

Looking forward, future research could explore more sophisticated data augmentation techniques, integration of additional imaging modalities, and leveraging transfer learning to further enhance model accuracy. There is also potential in refining post-processing techniques to fine-tune segmentation masks, potentially deploying graphical models like Conditional Random Fields (CRFs) more effectively.

Conclusion

The described methodology represents a significant advancement in the automated segmentation of brain tumors from 3D MRIs. By combining an encoder-decoder CNN with a VAE branch for regularization, the model sets a high benchmark, validated by its success in the BraTS 2018 challenge. This work exemplifies how deep learning can be harnessed to tackle complex problems in medical imaging, paving the way for more accurate and efficient diagnostic tools.