- The paper demonstrates a significant improvement in segmentation accuracy by integrating an ImageNet-pretrained VGG11 encoder into the U-Net framework.
- It compares three weight-initialization schemes, showing that transfer learning boosts IoU from 0.593 to as high as 0.687 on urban aerial images.
- The approach offers practical benefits for domains with limited annotated data, such as medical diagnostics and autonomous driving.
TernausNet: U-Net with VGG11 Encoder for Image Segmentation
The paper "TernausNet: U-Net with VGG11 Encoder Pre-Trained on ImageNet for Image Segmentation" by Vladimir Iglovikov and Alexey Shvets presents an enhancement to the U-Net architecture for image segmentation. By replacing the encoder with a VGG11 network pretrained on ImageNet, the authors aim to improve segmentation accuracy in domains that demand high precision, such as medical imaging and autonomous driving.
Overview of the Model
The core innovation of this work is the integration of a VGG11 network, pretrained on the ImageNet dataset, as the encoder within the U-Net framework. The U-Net architecture, renowned for its success in pixel-wise image segmentation, is modified by incorporating VGG11, known for its capacity to extract hierarchical features efficiently.
Methodology and Experimental Design
The paper explores three distinct weight initialization schemes:
- LeCun Uniform Initialization: This serves as a baseline model without pretrained weights.
- VGG11 Pre-trained on ImageNet: Only the encoder utilizes pretrained weights.
- Fully Pre-trained Network on Carvana Dataset: Both encoder and decoder are pretrained.
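The baseline scheme above is easy to state concretely: LeCun uniform draws each weight from a uniform distribution whose limit depends on the layer's fan-in, so that every weight has variance 1/fan_in. The sketch below shows the initializer itself; the bound sqrt(3/fan_in) follows because a uniform variable on [-a, a] has variance a^2/3.

```python
import numpy as np

def lecun_uniform(fan_in: int, shape, rng=None):
    """Draw weights uniformly from [-limit, limit] with limit = sqrt(3 / fan_in),
    giving each weight variance 1 / fan_in (LeCun-style initialization)."""
    rng = rng or np.random.default_rng(0)
    limit = np.sqrt(3.0 / fan_in)
    return rng.uniform(-limit, limit, size=shape)

# A 3x3 conv layer with 64 input channels has fan_in = 64 * 3 * 3 = 576
w = lecun_uniform(576, shape=(128, 64, 3, 3))
```

The two pretrained schemes replace such randomly drawn tensors with weights copied from a network already trained on ImageNet or Carvana.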
The experiments were conducted on the Inria Aerial Image Labeling Dataset, which targets urban-area segmentation: 150 images were used for training and 30 images from varied urban environments for validation. The Jaccard index (intersection over union, IoU) served as the primary evaluation metric.
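The Jaccard index reported in the paper is the intersection-over-union of the predicted and ground-truth masks; for binary masks it reduces to a few NumPy operations. The toy masks below are hypothetical, for illustration only.

```python
import numpy as np

def jaccard_index(pred: np.ndarray, target: np.ndarray) -> float:
    """IoU = |A intersect B| / |A union B| for binary masks (1 = foreground)."""
    pred, target = pred.astype(bool), target.astype(bool)
    union = np.logical_or(pred, target).sum()
    if union == 0:  # both masks empty: define IoU as 1
        return 1.0
    inter = np.logical_and(pred, target).sum()
    return float(inter / union)

pred = np.array([[1, 1, 0], [0, 1, 0]])
truth = np.array([[1, 0, 0], [0, 1, 1]])
print(jaccard_index(pred, truth))  # 2 overlapping pixels / 4 in union = 0.5
```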
Results
The results demonstrated clear advantages of employing a pretrained encoder. While the baseline model achieved an IoU of 0.593, incorporating the VGG11-pretrained encoder improved the IoU to 0.686. Moreover, the fully pretrained model on the Carvana dataset achieved a slightly higher IoU of 0.687. These findings underscore the efficacy of transfer learning in enhancing model performance and convergence speed.
Implications and Future Directions
The implications of this research are significant for domains where data annotation is laborious and datasets are limited, such as medical diagnostics. Incorporating pretrained models can improve performance, reduce training time, and mitigate the risk of overfitting.
Going forward, this methodology invites further exploration with more sophisticated encoders. Integrating networks such as VGG16 or deeper ResNet architectures could potentially yield additional improvements. The paper also suggests the utility of fine-tuning techniques for tasks beyond image classification, advocating for broader adoption in segmentation challenges.
Conclusion
The TernausNet approach exemplifies an effective strategy for improving segmentation tasks by combining the standard U-Net architecture with a pretrained VGG11 encoder. The method achieves superior segmentation accuracy while highlighting the practical benefits of pretrained models in scenarios with constrained data. The authors' open-source code makes the approach readily available for ongoing development and application across computer vision domains.