HAISTA-NET: Human Assisted Instance Segmentation Through Attention (2305.03105v3)
Abstract: Instance segmentation is a form of image detection that has a range of applications, such as object refinement, medical image analysis, and image/video editing, all of which demand a high degree of accuracy. However, this precision is often beyond the reach of even state-of-the-art, fully automated instance segmentation algorithms. The performance gap becomes particularly prohibitive for small and complex objects, so practitioners typically resort to fully manual annotation, which can be a laborious process. To overcome this problem, we propose a novel approach that enables more precise predictions and generates higher-quality segmentation masks for high-curvature, complex, and small-scale objects. Our human-assisted segmentation model, HAISTA-NET, augments the existing Strong Mask R-CNN network to incorporate human-specified partial boundaries. We also present the Partial Sketch Object Boundaries (PSOB) dataset, which contains hand-drawn partial object boundaries, referred to as human attention maps, that represent the curvatures of an object's ground-truth mask with several pixels. Through extensive evaluation on the PSOB dataset, we show that HAISTA-NET outperforms state-of-the-art methods such as Mask R-CNN, Strong Mask R-CNN, and Mask2Former, improving AP-Mask by +36.7, +29.6, and +26.5 points, respectively. We hope that our approach will set a baseline for future human-aided deep learning models by combining fully automated and interactive instance segmentation architectures.
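The abstract does not specify how a hand-drawn partial boundary is encoded for the network. The following is a minimal sketch that assumes the boundary arrives as a polyline of (x, y) pixel coordinates, is rasterized into a binary map the size of the image, and is then appended to the RGB image as a fourth input channel. The helper names `rasterize_polyline` and `add_attention_channel` are hypothetical, and the sketch is not presented as HAISTA-NET's actual input pipeline.

```python
import numpy as np


def rasterize_polyline(points, height, width):
    """Draw a polyline of (x, y) pixel coordinates into a binary H x W map.

    Hypothetical helper (an assumption, not the paper's code): each segment is
    sampled at max(|dx|, |dy|) + 1 evenly spaced positions, so rounding to the
    nearest pixel leaves no gaps along the stroke.
    """
    attention = np.zeros((height, width), dtype=np.float32)
    for (x0, y0), (x1, y1) in zip(points[:-1], points[1:]):
        steps = int(max(abs(x1 - x0), abs(y1 - y0))) + 1
        xs = np.rint(np.linspace(x0, x1, steps)).astype(int)
        ys = np.rint(np.linspace(y0, y1, steps)).astype(int)
        # Drop any samples that fall outside the image.
        keep = (xs >= 0) & (xs < width) & (ys >= 0) & (ys < height)
        attention[ys[keep], xs[keep]] = 1.0
    return attention


def add_attention_channel(image, points):
    """Append the rasterized partial boundary as a 4th channel of an H x W x 3 image.

    Assumed encoding: the "human attention map" is a binary stroke mask
    concatenated with the RGB channels.
    """
    h, w, _ = image.shape
    attention = rasterize_polyline(points, h, w)
    return np.concatenate([image.astype(np.float32), attention[..., None]], axis=-1)


# Usage: an L-shaped partial boundary on a blank 8 x 8 image.
image = np.zeros((8, 8, 3))
stacked = add_attention_channel(image, [(1, 1), (5, 1), (5, 4)])
print(stacked.shape)  # (8, 8, 4)
```

Keeping the map at the image's resolution lets the same spatial augmentations (flips, resizes) be applied to the image and its attention channel together. Whether HAISTA-NET handles its attention maps this way is not stated in the abstract.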