BlendMask: Top-Down Meets Bottom-Up for Instance Segmentation (2001.00309v3)

Published 2 Jan 2020 in cs.CV

Abstract: Instance segmentation is one of the fundamental vision tasks. Recently, fully convolutional instance segmentation methods have drawn much attention as they are often simpler and more efficient than two-stage approaches like Mask R-CNN. To date, almost all such approaches fall behind the two-stage Mask R-CNN method in mask precision when models have similar computation complexity, leaving great room for improvement. In this work, we achieve improved mask prediction by effectively combining instance-level information with semantic information with lower-level fine-granularity. Our main contribution is a blender module which draws inspiration from both top-down and bottom-up instance segmentation approaches. The proposed BlendMask can effectively predict dense per-pixel position-sensitive instance features with very few channels, and learn attention maps for each instance with merely one convolution layer, thus being fast in inference. BlendMask can be easily incorporated with the state-of-the-art one-stage detection frameworks and outperforms Mask R-CNN under the same training schedule while being 20% faster. A light-weight version of BlendMask achieves 34.2% mAP at 25 FPS evaluated on a single 1080Ti GPU card. Because of its simplicity and efficacy, we hope that our BlendMask could serve as a simple yet strong baseline for a wide range of instance-wise prediction tasks. Code is available at https://git.io/AdelaiDet

Analysis of BlendMask: Top-Down Meets Bottom-Up for Instance Segmentation

The paper "BlendMask: Top-Down Meets Bottom-Up for Instance Segmentation" introduces an approach to instance segmentation that combines top-down and bottom-up methodologies. The authors aim to improve mask precision while keeping inference efficient, addressing two shortcomings of existing methods: the computational overhead of two-stage approaches such as Mask R-CNN and the lower mask quality of fully convolutional ones.

Key Contributions

The principal contribution of this work is the BlendMask framework, which combines instance-level information with fine-grained, lower-level semantic information to improve mask prediction. Its blender module draws on both top-down and bottom-up approaches: the network predicts dense per-pixel position-sensitive instance features using very few channels and learns an attention map for each instance with a single convolution layer. This design keeps inference fast while outperforming Mask R-CNN under the same training schedule.

Methodology

BlendMask is built on a one-stage detection framework, utilizing a blender module that merges semantic feature maps (bases) with instance-aware attentions to generate high-quality masks. The crucial components are:

Bottom Module: Produces score maps, called bases, from either backbone or FPN features; the bases carry the fine-grained, lower-level semantic information.
Top Layer: Predicts attentions using a single convolution layer, mapping instance-level information to the bases.
Blender Module: Combines bases with attentions through region cropping and linear combination, forming the final mask logits.
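
The blending step can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names, the nearest-neighbour resizing (used here in place of RoI alignment and smoother interpolation), the softmax normalisation across bases, and the sizes K, M and R are all assumptions chosen to keep the example short. It shows only the idea summarised above: crop the bases to an instance's box, resize the instance's attention maps to the same resolution, and take a per-pixel weighted sum over the bases to obtain mask logits.

```python
import numpy as np

def resize_nearest(x, out_h, out_w):
    """Nearest-neighbour resize of a (K, H, W) array to (K, out_h, out_w)."""
    _, h, w = x.shape
    rows = np.minimum(((np.arange(out_h) + 0.5) * h / out_h).astype(int), h - 1)
    cols = np.minimum(((np.arange(out_w) + 0.5) * w / out_w).astype(int), w - 1)
    return x[:, rows[:, None], cols[None, :]]

def softmax(x, axis):
    """Numerically stable softmax along one axis."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def blend(bases, attention, box, R=4):
    """Blend K bases into one (R, R) map of mask logits for a single instance.

    bases:     (K, H, W) score maps from the bottom module
    attention: (K, M, M) per-instance attention from the top layer
    box:       (x0, y0, x1, y1) integer box in base-map pixels (hypothetical format)
    """
    x0, y0, x1, y1 = box
    # Region cropping: cut the box out of every base and resize it to R x R.
    cropped = resize_nearest(bases[:, y0:y1, x0:x1], R, R)
    # Resize attention to R x R and normalise so the K weights sum to 1 per pixel.
    weights = softmax(resize_nearest(attention, R, R), axis=0)
    # Linear combination: per-pixel weighted sum over the K bases.
    return (cropped * weights).sum(axis=0)

# Toy usage: two constant bases, attention favouring base 0 on the left half.
bases = np.stack([np.ones((8, 8)), -np.ones((8, 8))])
attention = np.array([[[10.0, -10.0], [10.0, -10.0]],
                      [[-10.0, 10.0], [-10.0, 10.0]]])
logits = blend(bases, attention, box=(2, 2, 6, 6), R=4)
print(logits.shape)
```

Because the attention has a spatial extent (M x M) rather than being a single scalar per base, different parts of the same instance can draw on different bases, which is what makes the combination position-sensitive.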

Experimental Results

BlendMask improves on Mask R-CNN in both speed and accuracy. Reported results include:

A lightweight version achieving 34.2% mAP at 25 FPS on a single 1080Ti GPU.
Under the same training schedule, the standard model outperforming Mask R-CNN in mask precision while being 20% faster.
Higher-resolution mask output, which yields more precise object boundaries.

Practical and Theoretical Implications

The BlendMask framework provides a balanced approach to instance segmentation, handling overlapping and complex scenes while maintaining computational efficiency. Its simple architecture suggests that it could serve as a strong baseline for a range of instance-level prediction tasks, including panoptic segmentation.

On the theoretical side, BlendMask highlights the effectiveness of leveraging detailed instance-level attention maps in conjunction with position-sensitive features, challenging prevailing paradigms that favor either top-down or bottom-up approaches in isolation. The work opens pathways for future explorations in integrating multi-level semantic information within mask prediction models.

Future Directions

The research suggests multiple avenues for development:

Integrating BlendMask with a wider range of detection frameworks, which could broaden its versatility and adoption.
Extending the framework to other instance-level predictions such as keypoint detection and depth estimation.
Investigating alternative architectures for the bottom module, which may yield further performance gains.
Analyzing how attention map resolution and alignment techniques affect performance.

Overall, BlendMask represents a significant step forward in the pursuit of efficient and effective instance segmentation, setting a noteworthy precedent for future advancements in the field.

Authors

Hao Chen
Kunyang Sun
Zhi Tian
Chunhua Shen
Yongming Huang
Youliang Yan
