HAISTA-NET: Human Assisted Instance Segmentation Through Attention (2305.03105v3)

Published 4 May 2023 in cs.CV and cs.AI

Abstract: Instance segmentation is a form of image detection with a range of applications, such as object refinement, medical image analysis, and image/video editing, all of which demand a high degree of accuracy. However, this precision is often beyond what even state-of-the-art, fully automated instance segmentation algorithms can deliver. The performance gap becomes particularly prohibitive for small and complex objects, so practitioners typically resort to fully manual annotation, which can be a laborious process. To overcome this problem, we propose a novel approach that enables more precise predictions and generates higher-quality segmentation masks for high-curvature, complex, and small-scale objects. Our human-assisted segmentation model, HAISTA-NET, augments the existing Strong Mask R-CNN network to incorporate human-specified partial boundaries. We also present a dataset of hand-drawn partial object boundaries, which we refer to as human attention maps. In addition, the Partial Sketch Object Boundaries (PSOB) dataset contains hand-drawn partial object boundaries that represent curvatures of an object's ground truth mask with several pixels. Through extensive evaluation on the PSOB dataset, we show that HAISTA-NET outperforms state-of-the-art methods such as Mask R-CNN, Strong Mask R-CNN, and Mask2Former, with AP-Mask gains of +36.7, +29.6, and +26.5 points over these three models, respectively. We hope that our novel approach will set a baseline for future human-aided deep learning models by combining fully automated and interactive instance segmentation architectures.
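
To make the idea of a human-specified partial boundary concrete, the following is a minimal sketch of how a hand-drawn stroke could be rasterized into a binary attention map and stacked onto an RGB image as an extra input channel. The abstract does not say how HAISTA-NET consumes the boundaries, so the channel-stacking scheme and the helper names `rasterize_partial_boundary` and `stack_image_and_attention` are illustrative assumptions, not the authors' implementation.

```python
import numpy as np


def rasterize_partial_boundary(points, height, width):
    """Draw a polyline of (row, col) points into a binary H x W map.

    Hypothetical helper: the stroke format is an assumption for illustration.
    """
    attention = np.zeros((height, width), dtype=np.float32)
    for (r0, c0), (r1, c1) in zip(points[:-1], points[1:]):
        # Sample enough points that consecutive samples are at most 1 pixel apart.
        steps = max(abs(r1 - r0), abs(c1 - c0)) + 1
        rows = np.rint(np.linspace(r0, r1, steps)).astype(int)
        cols = np.rint(np.linspace(c0, c1, steps)).astype(int)
        # Ignore any part of the stroke that falls outside the image.
        inside = (rows >= 0) & (rows < height) & (cols >= 0) & (cols < width)
        attention[rows[inside], cols[inside]] = 1.0
    return attention


def stack_image_and_attention(image, attention):
    """Append the attention map as a fourth channel: (H, W, 3) -> (H, W, 4).

    Assumed fusion scheme; the paper may combine the two inputs differently.
    """
    return np.concatenate([image, attention[..., None]], axis=-1)


# Toy example: an 8x8 blank image and an L-shaped partial boundary stroke.
image = np.zeros((8, 8, 3), dtype=np.float32)
stroke = [(1, 1), (1, 5), (4, 5)]
attention = rasterize_partial_boundary(stroke, 8, 8)
augmented = stack_image_and_attention(image, attention)
```

Here `augmented` is a four-channel array that a segmentation backbone with a matching input layer could accept; a partial stroke marks only a few boundary pixels, which is consistent with the abstract's description of boundaries drawn "with several pixels".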
