Task-Specific Adaptation of Segmentation Foundation Model via Prompt Learning (2403.09199v2)

Published 14 Mar 2024 in cs.CV and cs.AI

Abstract: Recently, foundation models trained on massive datasets to adapt to a wide range of tasks have attracted considerable attention and are actively being explored within the computer vision community. Among these, the Segment Anything Model (SAM) stands out for its remarkable progress in generalizability and flexibility for image segmentation tasks, achieved through prompt-based object mask generation. However, despite its strength, SAM faces two key limitations when applied to instance segmentation that segments specific objects or those in unique environments (e.g., task-specific adaptation for out-of-distribution objects) not typically present in the training data: 1) the ambiguity inherent in input prompts and 2) the necessity for extensive additional training to achieve optimal segmentation. To address these challenges, we propose a task-specific adaptation (i.e., customization) of the segmentation foundation model via prompt learning tailored to SAM. Our method involves a prompt learning module (PLM), which adjusts input prompts into the embedding space to better align with peculiarities of the target task, thereby enabling more efficient training. Furthermore, we introduce a point matching module (PMM) to enhance the feature representation for finer segmentation by ensuring detailed alignment with ground truth boundaries. Experimental results on various customized segmentation scenarios demonstrate the effectiveness of the proposed method.
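
As a rough illustration of what "adjusting input prompts in the embedding space" could look like, the sketch below adds a learned offset to a prompt embedding through a small residual adapter. Everything in it is an assumption made for illustration: the function names (`make_plm`, `adjust_prompt`), the two-layer ReLU design, and the zero initialization are hypothetical and are not taken from the paper, whose actual PLM architecture is not described in the abstract.

```python
import random


def make_plm(dim, hidden, seed=0):
    """Create parameters for a hypothetical two-layer residual adapter."""
    rng = random.Random(seed)
    w1 = [[rng.gauss(0.0, 0.02) for _ in range(hidden)] for _ in range(dim)]
    # Assumption: the second layer starts at zero, so the adapter is an
    # identity map before any training has happened.
    w2 = [[0.0] * dim for _ in range(hidden)]
    return {"w1": w1, "w2": w2}


def matvec(vec, mat):
    """Multiply a row vector by a matrix stored as a list of rows."""
    return [sum(v * row[j] for v, row in zip(vec, mat))
            for j in range(len(mat[0]))]


def adjust_prompt(params, prompt_embedding):
    """Return the prompt embedding plus a learned offset (residual form)."""
    hidden = [max(0.0, h) for h in matvec(prompt_embedding, params["w1"])]  # ReLU
    offset = matvec(hidden, params["w2"])
    return [p + o for p, o in zip(prompt_embedding, offset)]
```

In this sketch only the adapter parameters would be trained while the segmentation model stays frozen, which is one common way to keep adaptation cheap; whether the paper follows this exact scheme is not stated in the abstract.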

Authors (4)
  1. Hyung-Il Kim
  2. Kimin Yun
  3. Jun-Seok Yun
  4. Yuseok Bae