- The paper introduces a fully convolutional encoder-decoder network leveraging a pre-trained VGG-16 to enhance object contour detection accuracy.
- It refines ground truth contour annotations using a dense CRF, effectively aligning noisy labels with true image boundaries.
- The network generalizes to unseen object classes and improves object proposal generation, reaching state-of-the-art contour F-scores and average recall.
Object Contour Detection with a Fully Convolutional Encoder-Decoder Network
The paper "Object Contour Detection with a Fully Convolutional Encoder-Decoder Network" presents a deep learning approach for object contour detection using a fully convolutional encoder-decoder network (CEDN). This method diverges from traditional approaches that focus on low-level edge detection by emphasizing the identification of high-level object contours, thus enhancing precision compared to previous methods.
Summary of Contributions
- Network Architecture and Methodology: The authors introduce a fully convolutional encoder-decoder network whose encoder is initialized from the pre-trained VGG-16 model and kept largely intact to preserve its generalization capability. The decoder alternates unpooling and convolution layers: unpooling reuses the max-pooling switches recorded by the encoder to upsample feature maps back to a dense, full-resolution prediction, which enables fine object contour delineation (a minimal architecture sketch appears after this list).
- Ground Truth Refinement: Because the available contour annotations are imprecise, the authors refine the ground truth contours with a dense Conditional Random Field (CRF) before training. This step aligns the annotations with true image boundaries and improves the quality of the training signal (a refinement sketch appears after this list).
- Generalization and Fine-Tuning Capability: The model generalizes well to unseen object classes from related super-categories, indicating robust adaptability beyond its training set. Further, fine-tuning the pre-trained model on BSDS500, which contains general natural edges rather than only object contours, yields performance comparable to state-of-the-art edge detectors such as HED.
- Implications for Object Proposal Generation: By integrating the CEDN-generated contours with multiscale combinatorial grouping (MCG), the paper reports improvements in the generation of object proposals, evidenced by a significant increase in average recall and reduction in computational demand compared to previous methodologies.
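To make the encoder-decoder design concrete, below is a minimal sketch of a CEDN-style network in PyTorch. It assumes torchvision's pre-trained VGG-16 as the encoder, records the max-pooling switches there, and replays them in the decoder via `nn.MaxUnpool2d`. The decoder channel widths, kernel sizes, and single-channel sigmoid head are illustrative assumptions, not the authors' exact released configuration.

```python
import torch
import torch.nn as nn
from torchvision.models import vgg16


class ContourEncoderDecoder(nn.Module):
    """Sketch of a CEDN-style contour network: a VGG-16 convolutional encoder
    whose pooling switches are saved, and a decoder that unpools with those
    switches and refines with convolutions. Widths/kernels are assumptions."""

    def __init__(self):
        super().__init__()
        features = vgg16(weights="IMAGENET1K_V1").features
        # Re-create the max-pool layers so they return the switch indices
        # that nn.MaxUnpool2d needs in the decoder.
        self.encoder = nn.ModuleList([
            nn.MaxPool2d(kernel_size=2, stride=2, return_indices=True)
            if isinstance(m, nn.MaxPool2d) else m
            for m in features
        ])
        # Decoder: one unpool + conv/ReLU block per encoder pooling stage.
        in_ch = [512, 512, 256, 128, 64]   # illustrative widths
        out_ch = [512, 256, 128, 64, 32]
        self.decoder = nn.ModuleList([
            nn.ModuleDict({
                "unpool": nn.MaxUnpool2d(kernel_size=2, stride=2),
                "conv": nn.Sequential(
                    nn.Conv2d(ci, co, kernel_size=5, padding=2),
                    nn.ReLU(inplace=True),
                ),
            })
            for ci, co in zip(in_ch, out_ch)
        ])
        self.head = nn.Conv2d(32, 1, kernel_size=5, padding=2)  # contour logit

    def forward(self, x):
        switches, sizes = [], []
        for m in self.encoder:
            if isinstance(m, nn.MaxPool2d):
                sizes.append(x.size())      # remember pre-pool size
                x, idx = m(x)
                switches.append(idx)        # remember pooling switches
            else:
                x = m(x)
        for block in self.decoder:
            idx, size = switches.pop(), sizes.pop()
            x = block["unpool"](x, idx, output_size=size)
            x = block["conv"](x)
        return torch.sigmoid(self.head(x))  # dense per-pixel contour map
```

Recording the pooling switches in the encoder and replaying them in the decoder is what lets the network recover spatially precise, dense contour maps despite the aggressive downsampling in VGG-16.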
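The ground-truth refinement step can be illustrated with a dense CRF over the image. The sketch below uses the pydensecrf package, which is an assumption (the paper does not prescribe a specific implementation), and the kernel parameters (`sxy`, `srgb`, `compat`, `gt_prob`) are illustrative values. The appearance-driven pairwise term pulls the boundary of a coarse object mask toward real image edges; refined contours can then be re-extracted from the refined mask.

```python
import numpy as np
import pydensecrf.densecrf as dcrf
from pydensecrf.utils import unary_from_labels


def refine_mask_with_dense_crf(image, coarse_mask, n_iters=5):
    """Align a coarse binary object mask with image boundaries.

    image: H x W x 3 uint8 RGB image.
    coarse_mask: H x W array of {0, 1} (background / object), possibly misaligned.
    Returns a refined H x W binary mask whose boundary can serve as a contour label.
    All parameters below are illustrative, not the paper's values.
    """
    h, w = coarse_mask.shape
    d = dcrf.DenseCRF2D(w, h, 2)  # two labels: background, object

    # Unary term: trust the noisy annotation with probability gt_prob.
    unary = unary_from_labels(coarse_mask.astype(np.int32), 2,
                              gt_prob=0.7, zero_unsure=False)
    d.setUnaryEnergy(unary)

    # Pairwise terms: spatial smoothness plus color-dependent affinity,
    # which encourages label changes to coincide with image edges.
    d.addPairwiseGaussian(sxy=3, compat=3)
    d.addPairwiseBilateral(sxy=60, srgb=10,
                           rgbim=np.ascontiguousarray(image), compat=5)

    q = np.array(d.inference(n_iters)).reshape(2, h, w)
    return q.argmax(axis=0).astype(np.uint8)
```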
Findings and Results
- The trained CEDN model attains substantially higher precision in object contour detection on PASCAL VOC than previous methods, raising the F-score to 0.57 and producing well-delineated object contours (a simplified boundary F-score computation is sketched after this list).
- Generalization is demonstrated on MS COCO, where the network detects contours of object classes absent from its training set, a capability supported in part by the pre-trained encoder; fine-tuning on BSDS500 further extends the model to general edge detection.
- Used for object proposal generation, the CEDN contours yield an average recall of 0.67 on PASCAL VOC with considerably fewer proposals per image, improving both efficiency and proposal quality over previous methods.
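For context on how contour F-scores such as the 0.57 figure are measured, the snippet below computes a simplified boundary precision/recall/F-measure with a pixel-distance tolerance. The official BSDS-style benchmark matches predicted and ground-truth boundary pixels one-to-one and reports the optimal-dataset-scale (ODS) F-score over thresholds; this version skips the bipartite matching and is only a per-image approximation, and the tolerance value is an assumption.

```python
import numpy as np
from scipy.ndimage import distance_transform_edt


def boundary_prf(pred, gt, tol=2.0):
    """Approximate boundary precision/recall/F for binary boundary maps.

    pred, gt: H x W boolean arrays of predicted / ground-truth contour pixels.
    tol: matching tolerance in pixels (illustrative value).
    Note: the official benchmark uses one-to-one pixel matching; this
    distance-transform shortcut can slightly overcount matches.
    """
    pred = pred.astype(bool)
    gt = gt.astype(bool)

    # Distance from every pixel to the nearest ground-truth / predicted contour pixel.
    dist_to_gt = distance_transform_edt(~gt)
    dist_to_pred = distance_transform_edt(~pred)

    # A predicted pixel counts as correct if a ground-truth pixel lies within tol.
    tp_pred = np.count_nonzero(pred & (dist_to_gt <= tol))
    precision = tp_pred / max(np.count_nonzero(pred), 1)

    # A ground-truth pixel counts as recovered if a predicted pixel lies within tol.
    tp_gt = np.count_nonzero(gt & (dist_to_pred <= tol))
    recall = tp_gt / max(np.count_nonzero(gt), 1)

    f = 2 * precision * recall / max(precision + recall, 1e-8)
    return precision, recall, f
```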
Implications and Future Direction
This research advances the field of computer vision by demonstrating the efficacy of deep networks in high-level perception tasks like object contour detection, moving beyond mere edge detection. Notably, it provides a foundation for enhancements in segmented object proposal techniques, which are crucial for applications requiring object recognition and segmentation.
Speculatively, future developments might investigate methodologies capable of training on larger, more complex datasets like MS COCO, despite annotation challenges. Moreover, advancements in semi-supervised learning may augment training from datasets with noisy annotations, thereby broadening the scope and applicability of such models across diverse categories of objects.
In summary, this paper aptly showcases how deep learning, specifically through a well-structured encoder-decoder network, significantly propels the capabilities of object contour detection and its application in generating accurate object proposals, thereby enriching the toolkit available to researchers and practitioners in computer vision.