SqueezeSAM: User friendly mobile interactive segmentation

Published 11 Dec 2023 in cs.CV (arXiv:2312.06736v3)

Abstract: The Segment Anything Model (SAM) has been a cornerstone in the field of interactive segmentation, propelling significant progress in generative AI, computational photography, and medical imaging. Despite its ability to process arbitrary user input and generate corresponding segmentation masks, SAM's 600 million parameter architecture, based on ViT-H, is not compatible with current mobile hardware due to its high computational demands and large model size. Our research aims to adapt SAM for use in mobile photography applications. To this end, we have developed a fully convolutional SqueezeSAM model architecture, which is 62.5 times faster and 31.6 times smaller than the original SAM, making it a viable solution for mobile applications. Furthermore, our tiny model achieves an mIoU within 1% of the original ViT-H architecture. Automated segmentation holds significant value in the creation flow for photography applications, as evidenced by its adoption by leading industry players like Apple and CapCut. To facilitate this automation, we employ salient object detection and simulate potential user clicks for foreground object selection, generating an initial segmentation mask that users can subsequently edit interactively. A common user expectation is that a click on a specific part of an object will result in the segmentation of the entire object. For example, a click on a person's t-shirt in a photo should ideally segment the entire person, not just the t-shirt. However, SAM typically segments only the clicked area. We address this limitation through a novel data augmentation scheme. Consequently, if a user clicks on a person holding a basketball, both the person and the basketball are segmented together, aligning with user expectations and enhancing the overall user experience.
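The abstract describes generating an initial mask by running salient object detection and then simulating a user click inside the salient region. The paper does not specify the exact heuristic, so the sketch below is an assumption: it thresholds a saliency map and picks the saliency peak within the foreground region as the simulated click prompt.

```python
import numpy as np

def simulate_click(saliency: np.ndarray, threshold: float = 0.5) -> tuple:
    """Pick a simulated foreground click (row, col) from a saliency map.

    Hypothetical heuristic, not the paper's exact procedure:
    threshold the saliency map, then return the location of the
    strongest saliency response inside the foreground region.
    """
    fg = saliency >= threshold
    if not fg.any():
        # No pixel clears the threshold: fall back to the global peak.
        return tuple(int(i) for i in np.unravel_index(np.argmax(saliency), saliency.shape))
    # Suppress background pixels so argmax only considers the foreground.
    masked = np.where(fg, saliency, -np.inf)
    return tuple(int(i) for i in np.unravel_index(np.argmax(masked), masked.shape))

# Toy saliency map with a bright blob peaking at (2, 3).
sal = np.zeros((5, 5))
sal[2, 2] = 0.6
sal[2, 3] = 0.9
click = simulate_click(sal)
```

The resulting point can then be fed to the segmentation model as a click prompt, producing the initial mask that the user refines interactively.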
