
NeoUNet: Towards accurate colon polyp segmentation and neoplasm detection (2107.05023v1)

Published 11 Jul 2021 in eess.IV and cs.CV

Abstract: Automatic polyp segmentation has proven to be immensely helpful for endoscopy procedures, reducing the missing rate of adenoma detection for endoscopists while increasing efficiency. However, classifying a polyp as being neoplasm or not and segmenting it at the pixel level is still a challenging task for doctors to perform in a limited time. In this work, we propose a fine-grained formulation for the polyp segmentation problem. Our formulation aims to not only segment polyp regions, but also identify those at high risk of malignancy with high accuracy. In addition, we present a UNet-based neural network architecture called NeoUNet, along with a hybrid loss function to solve this problem. Experiments show highly competitive results for NeoUNet on our benchmark dataset compared to existing polyp segmentation models.

Citations (55)

Summary

  • The paper introduces NeoUNet, a UNet-based model enhanced with a HarDNet68 encoder, attention gates, and deep supervision to address the Polyp Segmentation and Neoplasm Detection (PSND) problem.
  • It employs a hybrid loss function combining multi-class and binary segmentation losses to effectively handle unknown labels and class imbalance.
  • Experimental results show that NeoUNet outperforms baselines in Dice and IoU metrics while providing real-time performance for clinical colonoscopy applications.

This paper introduces NeoUNet, a deep neural network architecture designed for a fine-grained task combining colon polyp segmentation with neoplasm detection. Traditional polyp segmentation focuses on identifying polyps, but determining whether a polyp is neoplastic (precancerous) or non-neoplastic is crucial for guiding clinical management. This task is challenging, even for experienced endoscopists, especially under time pressure.

The authors propose the "Polyp Segmentation and Neoplasm Detection" (PSND) problem, which extends standard binary polyp segmentation to a multi-class pixel-wise labeling problem. For each pixel in an input colonoscopy image, the model should output one of four labels:

  • 0: Background
  • 1: Non-neoplastic polyp
  • 2: Neoplastic polyp
  • 3: Polyp with unknown neoplasticity

The presence of "unknown" labels is a unique challenge: during training, these pixels should contribute to the segmentation objective but not to the classification objective.
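As a minimal sketch of this labeling scheme (class indices as listed above; the array values are illustrative), the binary polyp mask and the set of pixels eligible for classification can be derived directly from the four-class map:

```python
import numpy as np

# Hypothetical 4-class ground-truth map using the labels above:
# 0 = background, 1 = non-neoplastic, 2 = neoplastic, 3 = unknown
multi = np.array([
    [0, 0, 1],
    [2, 3, 0],
])

# Binary polyp-vs-background mask: any polyp class counts as foreground.
binary = (multi > 0).astype(np.uint8)

# Pixels that may contribute to the classification objective
# ("unknown" pixels are excluded).
known = (multi != 3)
```

Note that the "unknown" pixel still appears as foreground in `binary`, which is exactly why such pixels can help train segmentation without supervising classification.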

To address the PSND problem, the paper proposes NeoUNet, a UNet-based architecture with several key features aimed at improving performance and efficiency:

  1. HarDNet68 Encoder: The encoder uses the HarDNet68 architecture [chao2019hardnet], which is known for its efficient memory traffic and good performance, inspired by DenseNet principles but with sparser connections. Using a pre-trained HarDNet on ImageNet allows leveraging transfer learning.
  2. Attention Gates: Additive attention gates [oktay2018attention] are incorporated into the skip connections between the encoder and decoder. These gates help filter and focus the information passed from the encoder, allowing the model to emphasize relevant spatial regions and suppress irrelevant ones in finer feature maps.
  3. Deep Supervision: The decoder structure includes output layers at multiple scales. During training, segmentation masks are predicted at different resolutions and upsampled to the original image size. Losses are computed for each scale and summed, which helps stabilize training and improves convergence.
  4. Hybrid Loss Function: A novel loss function combines a multi-class segmentation loss and a binary segmentation loss. The total loss $\mathcal{L}$ is a weighted sum: $\mathcal{L} = w_c \mathcal{L}_c + w_s \mathcal{L}_s$, where $w_c$ and $w_s$ are weights ($0.75$ and $0.25$ in the paper).
    • $\mathcal{L}_c$: Multi-class loss (for classes 0, 1, 2) calculated using the average of Binary Cross Entropy (BCE) and Focal Tversky Loss [abraham2019novel]. Focal Tversky is a variant of Tversky Loss [salehi2017tversky] that incorporates Focal Loss [lin2017focal] concepts to focus on hard examples and handle class imbalance. The authors set parameters $\alpha=0.3$, $\beta=0.7$ (prioritizing recall) and $\gamma=4/3$ (focusing on hard examples). This loss is the primary driver for classification.
    • $\mathcal{L}_s$: Binary segmentation loss (for polyp vs. background) calculated using the average of BCE and standard Tversky Loss. This loss ensures accurate overall polyp segmentation and allows training on images where only binary segmentation labels are available (e.g., "unknown" polyps).
  5. Handling "Unknown" Labels: Pixels labeled as "unknown" in the ground truth contribute only to the binary segmentation loss $\mathcal{L}_s$ and are ignored by the multi-class loss $\mathcal{L}_c$. This enables the model to use data with unknown classification for segmentation training while forcing it to predict either neoplastic or non-neoplastic for ambiguous cases based on learned features.
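The hybrid loss and the "unknown"-label masking can be sketched in NumPy as follows. This is a simplified illustration, not the authors' implementation: the function names are ours, the one-vs-rest per-class treatment and averaging details are assumptions, and the Focal Tversky form follows [abraham2019novel] with the paper's $\alpha$, $\beta$, $\gamma$, and loss weights.

```python
import numpy as np

def tversky_loss(pred, target, alpha=0.3, beta=0.7, eps=1e-6):
    """Tversky loss for soft predictions and {0,1} targets (flattened arrays)."""
    tp = np.sum(pred * target)
    fp = np.sum(pred * (1 - target))
    fn = np.sum((1 - pred) * target)
    tversky_index = (tp + eps) / (tp + alpha * fp + beta * fn + eps)
    return 1.0 - tversky_index

def focal_tversky_loss(pred, target, alpha=0.3, beta=0.7, gamma=4 / 3):
    # Raising the Tversky loss to 1/gamma emphasizes hard examples.
    return tversky_loss(pred, target, alpha, beta) ** (1.0 / gamma)

def bce(pred, target, eps=1e-6):
    """Binary cross-entropy on clipped probabilities."""
    pred = np.clip(pred, eps, 1 - eps)
    return -np.mean(target * np.log(pred) + (1 - target) * np.log(1 - pred))

def hybrid_loss(probs, multi_gt, w_c=0.75, w_s=0.25):
    """probs: (H, W, 3) softmax over {background, non-neoplastic, neoplastic}.
    multi_gt: (H, W) labels in {0, 1, 2, 3}; 3 = unknown."""
    known = (multi_gt != 3)
    # L_c: average of BCE and Focal Tversky, one-vs-rest per known class.
    l_c = 0.0
    for cls in range(3):
        p = probs[..., cls][known]
        t = (multi_gt[known] == cls).astype(float)
        l_c += 0.5 * (bce(p, t) + focal_tversky_loss(p, t))
    l_c /= 3
    # L_s: binary polyp-vs-background loss over ALL pixels, unknowns included.
    fg_prob = (1.0 - probs[..., 0]).ravel()
    fg_gt = (multi_gt > 0).astype(float).ravel()
    l_s = 0.5 * (bce(fg_prob, fg_gt) + tversky_loss(fg_prob, fg_gt))
    return w_c * l_c + w_s * l_s
```

A perfect prediction drives both terms toward zero, and a pixel labeled "unknown" only affects the binary term, mirroring the masking described in item 5.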

The authors curated a new dataset called NeoPolyp, consisting of 7,466 annotated endoscopic images from various lighting modes (WLI, FICE, BLI, LCI) and Paris classifications. Each image has pixel-wise annotations for the four PSND classes (background, non-neoplastic, neoplastic, unknown). A filtered version, NeoPolyp-Clean (6,630 images), excludes images with "unknown" labels. The dataset exhibits significant class imbalance, with neoplastic polyps being the majority.

Implementation Details & Training:

  • Implemented in PyTorch.
  • Trained using SGD with Nesterov momentum, learning rate 0.001 with warmup and cosine annealing schedule.
  • Oversampling of images containing non-neoplastic polyps to mitigate class imbalance ($P_{non} \approx P_{neo}$).
  • Multi-scale input training (448x448, 352x352, 256x256).
  • On-the-fly data augmentation (rotate, horizontal/vertical flip, motion blur, color jittering) with 0.7 probability.
  • Pretraining on existing polyp segmentation datasets (Kvasir-SEG and CVC-ClinicDB).
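The warmup-plus-cosine-annealing schedule above can be sketched as a simple step-indexed function. The warmup length and step granularity here are assumptions, since the summary does not specify them:

```python
import math

def lr_at(step, total_steps, warmup_steps, base_lr=1e-3):
    """Linear warmup to base_lr, then cosine annealing toward zero
    (a sketch of the schedule described in the paper summary)."""
    if step < warmup_steps:
        # Linear ramp over the warmup phase.
        return base_lr * (step + 1) / warmup_steps
    # Cosine decay over the remaining steps.
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return 0.5 * base_lr * (1 + math.cos(math.pi * progress))
```

In PyTorch, the same shape is usually obtained by chaining a warmup scheduler with `CosineAnnealingLR`.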

Experimental Results:

Experiments comparing NeoUNet against U-Net [ronneberger2015u], PraNet [fan2020pranet], and HarDNet-MSEG [huang2021hardnet] on the NeoPolyp-Clean dataset (for PSND) show that NeoUNet significantly outperforms the baselines in Dice and IoU metrics for overall segmentation ($\text{Dice}_{seg}$, $\text{IoU}_{seg}$) and class-specific segmentation ($\text{Dice}_{non}$, $\text{IoU}_{non}$, $\text{Dice}_{neo}$, $\text{IoU}_{neo}$). For example, NeoUNet achieved a $\text{Dice}_{seg}$ of 0.911 and a $\text{Dice}_{neo}$ of 0.889, compared to PraNet's 0.895 and 0.873, respectively. Performance on the non-neoplastic class remains lower across all models, attributed to dataset imbalance despite oversampling and Focal Loss.
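The Dice and IoU metrics reported above reduce to simple overlap statistics on binary masks; the per-class variants (e.g., the neoplastic scores) apply the same computation to that class's mask alone. A minimal sketch:

```python
import numpy as np

def dice_iou(pred, gt):
    """Dice and IoU for two binary masks of equal shape."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    inter = np.logical_and(pred, gt).sum()
    union = np.logical_or(pred, gt).sum()
    total = pred.sum() + gt.sum()
    dice = 2 * inter / total if total else 1.0
    iou = inter / union if union else 1.0
    return dice, iou
```

Both metrics penalize the same disagreements, but Dice weights the intersection more heavily, so Dice is always at least as large as IoU on the same pair of masks.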

Inference speed on a Tesla V100 GPU showed NeoUNet running at 68.3 FPS, slower than HarDNet-MSEG (77.1 FPS) but faster than PraNet (55.6 FPS), offering a good balance of speed and accuracy.
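As a rough illustration of how such throughput numbers are obtained, a naive FPS measurement might look like the following hypothetical helper; a rigorous benchmark would also warm up the model and synchronize the GPU before and after timing:

```python
import time

def measure_fps(run_inference, n_frames=100):
    """Rough frames-per-second estimate for a single-image inference callable."""
    start = time.perf_counter()
    for _ in range(n_frames):
        run_inference()
    elapsed = time.perf_counter() - start
    return n_frames / elapsed
```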

A comparison training NeoUNet on NeoPolyp (with "unknown" labels) vs. NeoPolyp-Clean (without "unknown" labels) showed a slight improvement in all metrics when training on the full NeoPolyp dataset. This suggests that utilizing the "unknown" data via the secondary segmentation loss helps the model learn better general segmentation features which, in turn, benefit classification.

Practical Applications & Implementation Considerations:

  • Real-time CAD systems: NeoUNet's inference speed is adequate for real-time application in colonoscopy, providing simultaneous segmentation and neoplasm risk indication. This could significantly aid endoscopists, potentially reducing miss rates and improving decision-making during procedures.
  • Clinical Workflow Integration: The model outputs a pixel-wise map with distinct labels for non-neoplastic and neoplastic regions. This output format can be directly overlaid onto the live video feed, highlighting suspicious areas to the physician.
  • Data Annotation Efficiency: The hybrid loss and handling of "unknown" labels allow the use of partially labeled data. Images where neoplasm classification is ambiguous can still contribute to training the segmentation component, potentially reducing the need for highly detailed, fully classified annotations for every image.
  • Computational Resources: Training requires a GPU (tested on an RTX 3090). Inference is relatively fast on modern GPUs like the Tesla V100 or RTX 3090, making it feasible for deployment on dedicated hardware in clinical settings or potentially edge devices optimized for inference.
  • Limitations: The model still struggles with accurately classifying non-neoplastic polyps, likely due to inherent visual similarity with neoplastic polyps and data imbalance. Further work is needed to improve fine-grained classification accuracy.

In summary, NeoUNet provides a practical approach to the PSND problem by extending UNet with efficient components and a tailored loss function that effectively leverages partially labeled data, offering a significant improvement in simultaneous polyp segmentation and neoplasm detection accuracy suitable for real-world clinical assistance.