
CenterMask : Real-Time Anchor-Free Instance Segmentation (1911.06667v6)

Published 15 Nov 2019 in cs.CV

Abstract: We propose a simple yet efficient anchor-free instance segmentation method, called CenterMask, that adds a novel spatial attention-guided mask (SAG-Mask) branch to an anchor-free one-stage object detector (FCOS) in the same vein as Mask R-CNN. Plugged into the FCOS object detector, the SAG-Mask branch predicts a segmentation mask on each box with a spatial attention map that helps to focus on informative pixels and suppress noise. We also present an improved backbone network, VoVNetV2, with two effective strategies: (1) a residual connection for alleviating the optimization problem of larger VoVNet (Lee et al., 2019) and (2) effective Squeeze-Excitation (eSE) dealing with the channel information loss problem of the original SE. With SAG-Mask and VoVNetV2, we design CenterMask and CenterMask-Lite, targeted at large and small models, respectively. Using the same ResNet-101-FPN backbone, CenterMask achieves 38.3% mask AP, surpassing all previous state-of-the-art methods at a much faster speed. CenterMask-Lite also outperforms the state-of-the-art by large margins at over 35 fps on a Titan Xp. We hope that CenterMask and VoVNetV2 can serve as a solid baseline of real-time instance segmentation and a backbone network for various vision tasks, respectively. The code is available at https://github.com/youngwanLEE/CenterMask.

Authors (2)
  1. Youngwan Lee (18 papers)
  2. Jongyoul Park (7 papers)
Citations (497)

Summary


The research paper "CenterMask: Real-Time Anchor-Free Instance Segmentation" introduces a novel approach to instance segmentation through the CenterMask model, which integrates a spatial attention-guided mask (SAG-Mask) branch into an anchor-free object detection framework, improving both the speed and accuracy of segmentation.

Overview and Methodology

The core innovation of CenterMask is an anchor-free instance segmentation architecture built on top of the FCOS object detector. The SAG-Mask branch predicts a segmentation mask within each detected box, using a spatial attention map to focus on informative pixels and suppress noise, which sharpens the resulting mask predictions.
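The spatial attention gating described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the paper's implementation: the paper applies a learned 3x3 convolution to the channel-pooled maps, which is stood in for here by a fixed weighted sum, and all weights are placeholder values.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sag_attention(features, w=(0.5, 0.5), b=0.0):
    """Spatial attention gate over a (C, H, W) feature map.

    Pools along the channel axis (average and max), combines the two
    pooled maps with placeholder weights (standing in for the paper's
    learned 3x3 conv), and gates the input with the sigmoid map so
    informative spatial positions are emphasized and noise suppressed.
    """
    p_avg = features.mean(axis=0)                      # (H, W)
    p_max = features.max(axis=0)                       # (H, W)
    attn = sigmoid(w[0] * p_avg + w[1] * p_max + b)    # (H, W), in (0, 1)
    return features * attn                             # broadcast over channels

# Toy RoI-aligned feature map: 8 channels on a 14x14 grid
x = np.random.default_rng(0).standard_normal((8, 14, 14))
y = sag_attention(x)
```

Because the gate values lie strictly in (0, 1), the branch can only down-weight positions, never amplify them, which matches the noise-suppression role the paper assigns to it.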

Another significant contribution is the improved backbone network, VoVNetV2. It adds two strategies to the original VoVNet: a residual (identity) connection that eases optimization of deeper networks, and an effective Squeeze-Excitation (eSE) module that avoids the channel information loss of the original SE block. VoVNetV2 comes in configurations of varying depth, serving both lightweight and large models.
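The two VoVNetV2 ideas can be sketched together. The key point of eSE is that it keeps all C channels in its single fully connected layer, whereas the original SE block squeezes C down to C/r and back, losing channel information. In this NumPy sketch the FC weight is an identity placeholder, and the OSA convolutions are replaced by a caller-supplied function; both are illustrative assumptions, not the paper's trained parameters.

```python
import numpy as np

def hsigmoid(z):
    # Hard sigmoid used by eSE: clip((z + 3) / 6, 0, 1)
    return np.clip((z + 3.0) / 6.0, 0.0, 1.0)

def ese_block(x, w=None, b=None):
    """Effective Squeeze-Excitation over a (C, H, W) feature map.

    A single FC layer maps the globally pooled descriptor from C
    channels back to C channels (no reduction ratio), so no channel
    information is discarded before the gate is computed.
    """
    c = x.shape[0]
    w = np.eye(c) if w is None else w        # placeholder C x C FC weight
    b = np.zeros(c) if b is None else b
    gap = x.mean(axis=(1, 2))                # global average pool -> (C,)
    gate = hsigmoid(w @ gap + b)             # per-channel gate in [0, 1]
    return x * gate[:, None, None]           # reweight channels

def osa_v2(x, osa_fn):
    """VoVNetV2 OSA module: OSA features -> eSE -> identity shortcut."""
    return x + ese_block(osa_fn(x))

x = np.random.default_rng(1).standard_normal((16, 7, 7))
y = osa_v2(x, lambda t: t)                   # identity stands in for OSA convs
```

The identity shortcut in `osa_v2` is exactly the residual connection credited with making deeper VoVNet variants trainable: the module only has to learn a correction on top of its input.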

Results and Performance

CenterMask demonstrates substantial improvements in both accuracy and speed over existing methods. With the ResNet-101-FPN backbone, CenterMask achieves 38.3% mask AP, surpassing previous state-of-the-art instance segmentation methods while running faster. CenterMask-Lite, designed for resource-constrained settings, also delivers significant gains, maintaining over 35 FPS on a Titan Xp while achieving competitive accuracy.

The empirical evaluations provide robust evidence that CenterMask, coupled with VoVNetV2, serves as a pragmatic and effective choice for real-time instance segmentation tasks. The improvements are consistent across scale-specific metrics (AP_S, AP_M, and AP_L, denoting performance on small, medium, and large objects, respectively).

Implications and Future Directions

This paper provides a compelling case for moving towards anchor-free instance segmentation models, presenting a significant step forward in balancing computational efficiency with predictive accuracy. From a theoretical standpoint, the attention mechanism and improved backbone architectures offer a pathway to refining deep learning model architectures more generally.

Looking ahead, further exploration of adaptive feature-map utilization and enhanced attention mechanisms could unlock even greater efficiency and capability, maintaining the momentum toward real-time, high-accuracy vision systems. The strides made in VoVNetV2 might also inspire further innovations in backbone architectures across vision tasks beyond instance segmentation.

Overall, the methodologies and results presented in this paper are likely to stimulate future research and application development in real-time computer vision, particularly in domains where computational resources are at a premium.
