RefineMask: Towards High-Quality Instance Segmentation with Fine-Grained Features (2104.08569v1)

Published 17 Apr 2021 in cs.CV

Abstract: The two-stage methods for instance segmentation, e.g. Mask R-CNN, have achieved excellent performance recently. However, the segmented masks are still very coarse due to the downsampling operations in both the feature pyramid and the instance-wise pooling process, especially for large objects. In this work, we propose a new method called RefineMask for high-quality instance segmentation of objects and scenes, which incorporates fine-grained features during the instance-wise segmenting process in a multi-stage manner. Through fusing more detailed information stage by stage, RefineMask is able to refine high-quality masks consistently. RefineMask succeeds in segmenting hard cases such as bent parts of objects that are over-smoothed by most previous methods and outputs accurate boundaries. Without bells and whistles, RefineMask yields significant gains of 2.6, 3.4, 3.8 AP over Mask R-CNN on COCO, LVIS, and Cityscapes benchmarks respectively at a small amount of additional computational cost. Furthermore, our single-model result outperforms the winner of the LVIS Challenge 2020 by 1.3 points on the LVIS test-dev set and establishes a new state-of-the-art. Code will be available at https://github.com/zhanggang001/RefineMask.

Authors (7)

Gang Zhang
Xin Lu
Jingru Tan
Jianmin Li
Zhaoxiang Zhang
Quanquan Li
Xiaolin Hu

Citations (85)


Summary

RefineMask: Advancements in Instance Segmentation via Fine-Grained Features

The paper "RefineMask: Towards High-Quality Instance Segmentation with Fine-Grained Features" addresses a limitation of two-stage instance segmentation methods such as Mask R-CNN. These methods are effective, but their masks are often coarse because detail is lost to downsampling in the feature pyramid and in the instance-wise pooling operation, and the loss is most visible on large objects. RefineMask integrates fine-grained features into the mask prediction over multiple stages, which yields more accurate masks, particularly at object boundaries.
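The effect of fixed-size pooling on large objects can be illustrated with a toy example. The sketch below is not the paper's code: it assumes a 28x28 mask grid (the default output size of the Mask R-CNN mask head), uses plain average pooling as a stand-in for RoIAlign, and the helper names `thin_bar` and `downsample_then_upsample` are hypothetical. A 4-pixel-wide structure survives when the object is 56 pixels across but disappears when the object is 448 pixels across, because each grid cell then covers 16x16 image pixels.

```python
import numpy as np


def thin_bar(size: int, width: int = 4) -> np.ndarray:
    """Square binary mask containing one horizontal bar of the given width."""
    mask = np.zeros((size, size), dtype=np.uint8)
    start = size // 2 - width // 2
    mask[start:start + width, :] = 1
    return mask


def downsample_then_upsample(mask: np.ndarray, grid: int = 28) -> np.ndarray:
    """Average-pool a square mask to grid x grid, threshold at 0.5,
    then upsample back to the input size by pixel repetition.

    Assumption: average pooling stands in for RoIAlign plus mask prediction.
    """
    size = mask.shape[0]
    assert mask.shape == (size, size) and size % grid == 0
    cell = size // grid  # image pixels covered by one grid cell, per side
    pooled = mask.reshape(grid, cell, grid, cell).mean(axis=(1, 3))
    coarse = (pooled >= 0.5).astype(np.uint8)
    return np.repeat(np.repeat(coarse, cell, axis=0), cell, axis=1)


small_object = downsample_then_upsample(thin_bar(56))   # 2x2 pixels per cell
large_object = downsample_then_upsample(thin_bar(448))  # 16x16 pixels per cell

print("bar pixels kept, 56 px object: ", int(small_object.sum()))
print("bar pixels kept, 448 px object:", int(large_object.sum()))
```

The same thin structure is recovered exactly for the small object and lost entirely for the large one, which is the kind of detail the refinement stages are meant to restore.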

RefineMask keeps the standard two-stage framework and adds a semantic head attached to the feature pyramid. The semantic head operates on high-resolution features and produces what the authors call fine-grained features. The mask head then refines its prediction over a sequence of stages, fusing these fine-grained features at each stage to restore detail lost during pooling. The design aims to keep the ability of two-stage methods to distinguish instances while recovering the detail needed for precise boundaries.
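The data flow of such a multi-stage refinement can be sketched in a few lines. This is only an illustration of the idea, not the authors' architecture: the starting resolution of 14, the three stages, the 2x upsampling per stage, and fusion by simple addition are all assumptions made for the sketch, and `resize_nearest`, `upsample2x`, and `refine` are hypothetical helper names. In the actual model the fusion step would be a learned module.

```python
import numpy as np


def resize_nearest(x: np.ndarray, size: int) -> np.ndarray:
    """Nearest-neighbour resize of a (C, H, W) array to (C, size, size)."""
    h, w = x.shape[-2:]
    rows = (np.arange(size) * h) // size
    cols = (np.arange(size) * w) // size
    return x[..., rows[:, None], cols[None, :]]


def upsample2x(x: np.ndarray) -> np.ndarray:
    """Double the spatial resolution of a (C, H, W) array by repetition."""
    return np.repeat(np.repeat(x, 2, axis=-2), 2, axis=-1)


def refine(instance_feat: np.ndarray, fine_feat_crop: np.ndarray,
           num_stages: int = 3):
    """Fuse fine-grained features into instance features stage by stage.

    instance_feat:  (C, S, S) pooled features for one instance.
    fine_feat_crop: (C, H, W) high-resolution features cropped to the same box.
    Returns the refined features and the resolution seen at each stage.
    """
    x = instance_feat
    sizes = [x.shape[-1]]
    for _ in range(num_stages):
        # Bring the fine-grained features to the current stage resolution.
        fine = resize_nearest(fine_feat_crop, x.shape[-1])
        x = x + fine          # placeholder fusion (assumption, not the SFM)
        x = upsample2x(x)     # move to the next, finer stage
        sizes.append(x.shape[-1])
    return x, sizes


rng = np.random.default_rng(0)
coarse = rng.standard_normal((8, 14, 14))
fine = rng.standard_normal((8, 112, 112))
refined, sizes = refine(coarse, fine)
print(refined.shape, sizes)
```

Each stage sees the fine-grained features at a higher resolution than the previous one, so later stages can draw on detail that the initial pooled features no longer contain.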

The paper reports results on the COCO, LVIS, and Cityscapes benchmarks. RefineMask improves on Mask R-CNN by 2.6, 3.4, and 3.8 points of Average Precision (AP) on COCO, LVIS, and Cityscapes respectively. Its single-model result also exceeds the winner of the LVIS Challenge 2020 by 1.3 points on the LVIS test-dev set. The abstract describes the additional computational cost as small.

The paper also introduces a boundary-aware refinement mechanism. By focusing explicitly on boundary regions, it produces more precise boundary predictions, addressing a common weakness of earlier two-stage methods. The design also aims to limit computational overhead while achieving these gains. Among the architectural modules is the Semantic Fusion Module (SFM), which integrates multi-resolution features.
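One way to make "boundary region" concrete is to take the band of pixels within a few steps of a mask's contour. The sketch below is an assumption about how such a region could be computed, not the paper's definition: it uses a 4-neighbour dilation and erosion written with NumPy shifts, and the names `dilate`, `erode`, and `boundary_region` are hypothetical.

```python
import numpy as np


def _dilate_once(mask: np.ndarray) -> np.ndarray:
    """One step of binary dilation with a 4-neighbour (cross) structuring element."""
    p = np.pad(mask, 1, constant_values=False)
    return (p[1:-1, 1:-1] | p[:-2, 1:-1] | p[2:, 1:-1]
            | p[1:-1, :-2] | p[1:-1, 2:])


def dilate(mask: np.ndarray, steps: int) -> np.ndarray:
    out = mask.astype(bool)
    for _ in range(steps):
        out = _dilate_once(out)
    return out


def erode(mask: np.ndarray, steps: int) -> np.ndarray:
    # Erosion is the complement of dilating the background.
    # Note: pixels outside the image are treated as foreground here,
    # so a mask touching the image border is not eroded from that side.
    return ~dilate(~mask.astype(bool), steps)


def boundary_region(mask: np.ndarray, width: int = 1) -> np.ndarray:
    """Band of pixels within `width` steps of the mask contour, on both sides."""
    return dilate(mask, width) & ~erode(mask, width)


mask = np.zeros((9, 9), dtype=bool)
mask[2:7, 2:7] = True            # a 5x5 square
band = boundary_region(mask, width=1)
print(band.astype(int))
```

A refinement stage that attends to such a band concentrates its effort where coarse masks are most often wrong, while the object interior is left to the earlier, cheaper prediction.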

The implications of RefineMask are multifaceted. Practically, the method promises improvements in applications requiring detailed mask predictions, such as autonomous driving, robotic vision, and medical imaging. Theoretically, RefineMask introduces a framework that can potentially be adapted and extended in future research to explore more refined integration of multi-scale and multi-resolution features in deep learning architectures.

The evaluation reports the standard AP as well as AP$^\star$, which uses the higher-quality annotations of the LVIS dataset to measure fine segmentation detail. The boundary-aware refinement approach could also be adapted to other segmentation tasks where precision around object edges is critical.
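AP on these benchmarks is built on mask intersection-over-union (IoU): COCO-style AP averages precision over IoU thresholds from 0.50 to 0.95 in steps of 0.05, so a prediction counts as correct at the stricter thresholds only if its mask closely matches the annotation. The sketch below shows just that IoU and threshold step, not a full AP computation; `mask_iou` and `thresholds_passed` are hypothetical helper names.

```python
import numpy as np

# COCO-style IoU thresholds: 0.50, 0.55, ..., 0.95
IOU_THRESHOLDS = np.linspace(0.5, 0.95, 10)


def mask_iou(pred: np.ndarray, gt: np.ndarray) -> float:
    """Intersection-over-union of two binary masks of the same shape."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    union = np.logical_or(pred, gt).sum()
    if union == 0:
        return 0.0
    return float(np.logical_and(pred, gt).sum() / union)


def thresholds_passed(iou: float) -> int:
    """Number of IoU thresholds at which this prediction counts as a match."""
    return int((iou >= IOU_THRESHOLDS).sum())


gt = np.zeros((20, 20), dtype=bool)
gt[5:15, 5:15] = True                 # 10x10 ground-truth square

pred = np.zeros_like(gt)
pred[5:15, 6:16] = True               # same square, shifted by one pixel

iou = mask_iou(pred, gt)
print(round(iou, 3), thresholds_passed(iou))
```

A one-pixel shift on a 10x10 object already drops the prediction below the strictest thresholds, which is why boundary quality has a visible effect on AP and why finer annotations make the metric more sensitive to it.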

As a future research trajectory, further exploration could pursue optimizing the computational trade-offs for deployment in resource-constrained environments, such as mobile devices. Additionally, extending the multi-stage integration of features could enhance segmentation performance in more complex scenes involving occluded or densely packed objects.

In conclusion, RefineMask advances high-quality instance segmentation by integrating fine-grained semantic features and refining masks over multiple stages to produce detailed, accurate results. The paper provides a strong reference point for future research and applications in instance segmentation.

Related Papers

Mask R-CNN (2017)
Boundary-preserving Mask R-CNN (2020)
MaskLab: Instance Segmentation by Refining Object Detection with Semantic and Direction Features (2017)
Mask Scoring R-CNN (2019)
Instance-aware Semantic Segmentation via Multi-task Network Cascades (2015)


GitHub

GitHub - zhanggang001/RefineMask: RefineMask: Towards High-Quality Instance Segmentation with Fine-Grained Features (CVPR 2021) (212 stars)
