
Fully Convolutional Instance-aware Semantic Segmentation (1611.07709v2)

Published 23 Nov 2016 in cs.CV

Abstract: We present the first fully convolutional end-to-end solution for instance-aware semantic segmentation task. It inherits all the merits of FCNs for semantic segmentation and instance mask proposal. It performs instance mask prediction and classification jointly. The underlying convolutional representation is fully shared between the two sub-tasks, as well as between all regions of interest. The proposed network is highly integrated and achieves state-of-the-art performance in both accuracy and efficiency. It wins the COCO 2016 segmentation competition by a large margin. Code would be released at \url{https://github.com/daijifeng001/TA-FCN}.

Authors (5)
  1. Yi Li (482 papers)
  2. Haozhi Qi (22 papers)
  3. Jifeng Dai (131 papers)
  4. Xiangyang Ji (159 papers)
  5. Yichen Wei (47 papers)
Citations (982)

Summary

Fully Convolutional Instance-aware Semantic Segmentation: An Analytical Summary

Introduction

The paper "Fully Convolutional Instance-aware Semantic Segmentation" by Yi Li et al. presents a novel methodology for addressing the instance-aware semantic segmentation task using a fully convolutional network (FCN). The authors extend the conventional FCN framework with position-sensitive score maps, enabling joint object detection and segmentation within an efficient, integrated network architecture. The proposed method, dubbed Fully Convolutional Instance-aware Semantic Segmentation (FCIS), achieves state-of-the-art performance in both accuracy and efficiency, evidenced by its win in the COCO 2016 segmentation competition.

Technical Approach

The core innovation of FCIS lies in its unique use of position-sensitive score maps. These maps introduce a translation-variant property to handle the challenges of instance-aware segmentation, a problem that conventional FCNs, with their translation-invariant nature, are inadequate to address. Specifically, FCIS employs k × k position-sensitive score maps for encoding relative positions of object parts within a region of interest (ROI). Each score map, covering the entire image at a lower resolution, represents the likelihood of a pixel belonging to an object instance at a specific relative position, thereby allowing differentiated responses for the same pixel in different instances.
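The assembling step implied above can be sketched in a few lines of NumPy. This is a minimal illustration, not the paper's implementation: the function name, shapes, and the copy-per-cell strategy are assumptions, and the real network performs this assembling on convolutional feature maps inside the forward pass.

```python
import numpy as np

def assemble_roi_score(score_maps, roi, k=3):
    """Assemble a per-ROI score map from k*k position-sensitive maps.

    score_maps: array of shape (k*k, H, W) -- one full-image map per
                relative cell position (top-left, ..., bottom-right).
    roi:        (x0, y0, x1, y1) in feature-map coordinates.
    Returns an (roi_h, roi_w) map whose (i, j)-th grid cell is copied
    from the (i, j)-th position-sensitive map, so the same image pixel
    can respond differently depending on its position inside the ROI.
    """
    x0, y0, x1, y1 = roi
    roi_h, roi_w = y1 - y0, x1 - x0
    out = np.empty((roi_h, roi_w), dtype=score_maps.dtype)
    # Split the ROI into a k x k grid of cells.
    ys = np.linspace(0, roi_h, k + 1).astype(int)
    xs = np.linspace(0, roi_w, k + 1).astype(int)
    for i in range(k):
        for j in range(k):
            m = score_maps[i * k + j]  # map dedicated to cell (i, j)
            out[ys[i]:ys[i + 1], xs[j]:xs[j + 1]] = \
                m[y0 + ys[i]:y0 + ys[i + 1], x0 + xs[j]:x0 + xs[j + 1]]
    return out
```

Because each cell reads from a different map, two overlapping ROIs produce different assembled scores for the same pixel, which is exactly the translation-variant behavior the paper requires.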

Joint Mask Prediction and Classification

The FCIS architecture integrates object segmentation and detection into a single framework through a joint formulation. This formulation fuses the detection and segmentation tasks, leveraging the strong correlation between them to predict object instance masks and their corresponding category probabilities. The network operates on box proposals rather than sliding windows, benefiting from recent advances in object detection. By constructing inside/outside score maps and applying simple operations such as softmax and max, FCIS keeps the per-ROI computation free of additional parameters.
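The inside/outside fusion described above can be sketched as follows. This is a simplified single-category, single-ROI sketch under stated assumptions (function name and shapes are hypothetical); in the paper the detection score additionally passes through a softmax across categories.

```python
import numpy as np

def fuse_inside_outside(inside, outside):
    """Fuse per-ROI inside/outside score maps into a mask and a
    detection score, following the joint formulation sketched above.

    inside, outside: (H, W) score maps for one ROI and one category.
    Returns (mask_prob, det_score):
      mask_prob -- per-pixel softmax over the two scores, i.e. the
                   probability that a pixel lies inside the instance
      det_score -- per-pixel max of the two scores, averaged over the
                   ROI, serving as the category likelihood before the
                   final softmax across categories
    """
    # Per-pixel two-way softmax -> foreground (mask) probability.
    mask_prob = np.exp(inside) / (np.exp(inside) + np.exp(outside))
    # Per-pixel max, averaged over the ROI -> classification score.
    det_score = np.maximum(inside, outside).mean()
    return mask_prob, det_score
```

Note that both outputs are computed from the same two score maps with parameter-free operations, which is why the segmentation and classification sub-tasks can share the entire convolutional representation.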

Experimental Results

COCO Dataset

Experiments on the COCO dataset showcase the superior performance of FCIS compared to prior methods. The network achieves significantly higher accuracy than the previous state-of-the-art method, MNC, particularly for larger objects, reflecting its ability to capture fine spatial detail. On the COCO test-dev set, FCIS reaches an mAP of 29.2% without OHEM and 29.6% with OHEM.

PASCAL VOC Dataset

Ablation studies on the PASCAL VOC dataset further validate the effectiveness of the position-sensitive score maps and the joint formulation for the segmentation and detection tasks. Comparisons with baseline methods such as naïve MNC and the combination of InstFCN and R-FCN demonstrate the critical contributions of the novel components in FCIS. The results indicate substantial improvements in mAP scores, underscoring the importance of translation-variant properties and the integrated network design.

Implications and Future Prospects

The FCIS framework sets a new benchmark for instance-aware semantic segmentation, demonstrating how fully convolutional architectures can be leveraged for complex, multi-component tasks. The proposed method not only enhances accuracy but also significantly reduces inference time, marking a crucial step towards real-time instance segmentation applications.

From a theoretical perspective, the utilization of position-sensitive score maps opens new avenues for exploring translation-variant properties in convolutional networks. Future research could extend this approach to other computer vision tasks that similarly benefit from translation variance.

Practically, the implications of FCIS extend to various domains, including autonomous driving, robotics, and medical imaging, where accurate and efficient instance-aware segmentation is vital. Continued improvements in network architectures and training methodologies, as prompted by the success of FCIS, will likely drive further advancements in these fields.

Conclusion

In summary, the "Fully Convolutional Instance-aware Semantic Segmentation" paper introduces a sophisticated and efficient approach to instance segmentation, leveraging position-sensitive score maps and a joint segmentation-detection formulation. The method achieves remarkable accuracy and efficiency, leading to its top performance in the COCO 2016 segmentation competition. This research not only advances the field of semantic segmentation but also provides a foundation for future innovations in convolutional network design and application. For further details and implementation, the authors have made their code available on GitHub.
