Overview of Cascaded V-Net using ROI Masks for Brain Tumor Segmentation
The paper "Cascaded V-Net using ROI masks for brain tumor segmentation" addresses brain tumor segmentation in MRI by proposing a cascaded deep learning architecture based on V-Net. ROI masks constrain the convolutional neural network (CNN) to learn only from relevant voxels, improving both segmentation precision and training efficiency.
Architecture and Methodology
The segmentation challenge is approached through a two-step cascade of V-Net architectures. The network consists of convolutional blocks that integrate residual connections, which have been reformulated for improved gradient flow. These insights are drawn from advancements in identity mappings in deep residual networks. The architecture employs ROI masks to focus the networks on pertinent voxels, thus optimizing training efficiency and addressing class imbalance that arises from the typically small size of tumor regions relative to the entire MRI volume.
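The core idea of restricting learning to relevant voxels can be illustrated with a small sketch. This is not the paper's code: `masked_voxel_loss` is a hypothetical helper, and a voxel-wise cross-entropy is assumed purely for illustration; the point is that the loss is averaged only over voxels selected by the ROI mask, so the abundant irrelevant background never contributes a gradient.

```python
import numpy as np

def masked_voxel_loss(probs, labels, roi_mask, eps=1e-7):
    """Binary cross-entropy averaged only over voxels inside the ROI mask.

    probs:    (D, H, W) predicted foreground probabilities
    labels:   (D, H, W) binary ground truth
    roi_mask: (D, H, W) binary mask (e.g. a brain mask) selecting relevant voxels
    """
    inside = roi_mask.astype(bool)
    p = np.clip(probs[inside], eps, 1 - eps)  # keep log() finite
    y = labels[inside]
    ce = -(y * np.log(p) + (1 - y) * np.log(1 - p))
    return ce.mean()
```

Because the mean runs over masked voxels only, the effective class balance seen by the network is the balance inside the ROI, not in the full volume.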
The cascading approach involves two distinct tasks:
Segmentation of Tumor Area: The first network, configured for binary classification, distinguishes tumor from non-tumor voxels. This segmentation relies on multi-modal MRI inputs and uses a brain mask to concentrate on brain tissue only.
Delineation of Tumor Sub-regions: The second network refines segmentation by categorizing tumor regions into edema, enhancing core, and non-enhancing core. This stage exploits the output of the first network as an ROI mask, reducing false positives by confining the focus to tumor vicinity.
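The two-stage pipeline described above can be sketched as follows. The trained V-Nets are represented by stand-in callables (`stage1`, `stage2`), since the actual networks are not reproduced here; the sketch shows only how stage 1's binary output becomes the ROI mask that confines stage 2.

```python
import numpy as np

def cascade_segment(volume, brain_mask, stage1, stage2, thresh=0.5):
    """Two-stage cascade: stage1 finds the whole tumor inside the brain mask;
    stage2 labels sub-regions only inside stage1's output (the ROI mask).

    stage1, stage2: stand-ins for the trained V-Nets -- callables mapping a
    masked volume to per-voxel tumor probabilities / sub-region labels.
    """
    # Stage 1: binary tumor vs. non-tumor, restricted to brain tissue
    tumor_prob = stage1(volume * brain_mask)
    tumor_mask = (tumor_prob > thresh) & brain_mask.astype(bool)

    # Stage 2: sub-region labels (e.g. edema, non-enhancing core,
    # enhancing core), restricted to the tumor ROI from stage 1
    sub_labels = stage2(volume * tumor_mask)

    # Voxels outside the ROI are forced to background, suppressing
    # false positives far from the tumor
    return np.where(tumor_mask, sub_labels, 0)
```

Confining stage 2 to the ROI is what reduces false positives: any sub-region prediction outside stage 1's tumor mask is discarded by construction.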
The authors employ dense training, feeding entire volumes in small batches rather than using the traditional patch-wise approach. This potentially reduces training time and lets the network exploit the consistent anatomical context of the whole brain.
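With dense training the loss is computed over the full volume at once. A common choice for V-Net-style dense training is the soft Dice loss, shown below as an illustrative sketch; whether this paper uses exactly this formulation is not stated in the summary. Dice's overlap ratio is largely insensitive to the huge background class, which is one way to cope with the imbalance between small tumors and the full MRI volume.

```python
import numpy as np

def soft_dice_loss(probs, labels, eps=1e-6):
    """Soft Dice loss over an entire volume.

    probs:  predicted foreground probabilities, any shape
    labels: binary ground truth, same shape
    eps:    smoothing term that keeps the ratio defined for empty masks
    """
    p = probs.ravel()
    y = labels.ravel()
    intersection = np.sum(p * y)
    dice = (2.0 * intersection + eps) / (np.sum(p) + np.sum(y) + eps)
    return 1.0 - dice
```

A perfect prediction gives a loss near 0, a completely disjoint one a loss near 1, regardless of how many background voxels surround the tumor.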
Results and Implications
The model's performance is evaluated on the BraTS 2017 dataset, showing competitive Dice scores, particularly for the whole tumor (WT) region, close to leading methods on the challenge leaderboard. High specificity indicates a strong ability to correctly identify background, reflecting the benefits of mask-based training. However, the model exhibits lower sensitivity for the enhancing tumor (ET) and tumor core (TC) regions, suggesting that these underrepresented classes may need stronger class weighting during training.
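The three metrics discussed here (Dice, sensitivity, specificity) are standard for BraTS-style evaluation and are straightforward to compute from a confusion matrix. The helper below is an illustrative sketch, not the challenge's official evaluation code.

```python
import numpy as np

def segmentation_metrics(pred, truth):
    """Dice, sensitivity, and specificity for one binary segmentation.

    pred, truth: arrays of the same shape, nonzero = foreground.
    """
    pred = pred.astype(bool)
    truth = truth.astype(bool)
    tp = np.sum(pred & truth)
    fp = np.sum(pred & ~truth)
    fn = np.sum(~pred & truth)
    tn = np.sum(~pred & ~truth)
    dice = 2 * tp / max(2 * tp + fp + fn, 1)
    sensitivity = tp / max(tp + fn, 1)  # fraction of tumor voxels detected
    specificity = tn / max(tn + fp, 1)  # fraction of background kept clean
    return dice, sensitivity, specificity
```

The pattern reported in the paper (high specificity, lower sensitivity for ET/TC) corresponds to few false positives on background but a non-trivial number of missed tumor voxels in the smaller sub-regions.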
Visual evaluations confirm the model's proficiency in segmenting complete tumor regions, though challenges remain in accurately delineating specific tumor sub-regions, especially in instances of complex tumor morphology or lower grade gliomas.
Future Directions
The proposed methodology highlights advancements in addressing class imbalance and leveraging dense volumetric information. Future work should focus on refining the weighting mechanisms during the learning phase to improve detection of less prevalent tumor sub-regions. Additionally, integration of improved ROI generation methods could further enhance segmentation quality and reduce false positive rates.
The paper contributes valuable insights into efficient volumetric medical image analysis and paves the way for more refined segmentation approaches in oncological imaging, emphasizing the importance of hierarchical architectural strategies and targeted training methods. Further exploration in adaptive mask strategies and multi-modal data fusion will likely drive enhancements in segmentation accuracy and clinical applicability.