BLO-SAM: Bi-level Optimization Based Overfitting-Preventing Finetuning of SAM (2402.16338v4)

Published 26 Feb 2024 in cs.CV

Abstract: The Segment Anything Model (SAM), a foundation model pretrained on millions of images and segmentation masks, has significantly advanced semantic segmentation, a fundamental task in computer vision. Despite its strengths, SAM encounters two major challenges. Firstly, it struggles with segmenting specific objects autonomously, as it relies on users to manually input prompts like points or bounding boxes to identify targeted objects. Secondly, SAM faces challenges in excelling at specific downstream tasks, like medical imaging, due to a disparity between the distribution of its pretraining data, which predominantly consists of general-domain images, and the data used in downstream tasks. Current solutions to these problems, which involve finetuning SAM, often lead to overfitting, a notable issue in scenarios with very limited data, like in medical imaging. To overcome these limitations, we introduce BLO-SAM, which finetunes SAM based on bi-level optimization (BLO). Our approach allows for automatic image segmentation without the need for manual prompts, by optimizing a learnable prompt embedding. Furthermore, it significantly reduces the risk of overfitting by training the model's weight parameters and the prompt embedding on two separate subsets of the training dataset, each at a different level of optimization. We apply BLO-SAM to diverse semantic segmentation tasks in general and medical domains. The results demonstrate BLO-SAM's superior performance over various state-of-the-art image semantic segmentation methods.
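
The abstract outlines the core recipe at a high level: a learnable prompt embedding replaces manual prompts, and the prompt embedding and the model's weight parameters are trained on two disjoint subsets of the training data at two different levels of optimization. The sketch below illustrates that two-subset, two-level idea in PyTorch with a toy promptable segmenter. Everything here is an illustrative assumption rather than the actual SAM or BLO-SAM code: `TinySegmenter`, the random tensors, and the hyperparameters are placeholders, and the upper level uses a simple first-order alternation, whereas a full bi-level method would typically differentiate the upper-level loss through the lower-level weight update.

```python
import torch
import torch.nn as nn

# Minimal sketch of the alternating two-subset finetuning idea described in the
# abstract. TinySegmenter and all data/hyperparameters are hypothetical
# stand-ins, not the BLO-SAM implementation; the prompt update below is a
# first-order approximation of the bi-level scheme.

class TinySegmenter(nn.Module):
    """Stand-in for a promptable segmentation model (e.g., a SAM-style decoder)."""
    def __init__(self, dim=16):
        super().__init__()
        self.backbone = nn.Conv2d(3, dim, 3, padding=1)
        self.head = nn.Conv2d(dim, 1, 1)

    def forward(self, images, prompt):
        # The learnable prompt embedding conditions the features, mimicking how
        # a prompt token steers the mask prediction in a promptable model.
        feats = torch.relu(self.backbone(images))
        feats = feats + prompt.view(1, -1, 1, 1)
        return self.head(feats)

model = TinySegmenter()
prompt_embedding = nn.Parameter(torch.zeros(16))  # learnable prompt (upper level)
loss_fn = nn.BCEWithLogitsLoss()

# Two disjoint halves of a (small) training set: one for the lower-level weight
# update, one for the upper-level prompt update.
images = torch.randn(8, 3, 32, 32)
masks = torch.randint(0, 2, (8, 1, 32, 32)).float()
lower_imgs, upper_imgs = images[:4], images[4:]
lower_masks, upper_masks = masks[:4], masks[4:]

opt_weights = torch.optim.AdamW(model.parameters(), lr=1e-3)
opt_prompt = torch.optim.AdamW([prompt_embedding], lr=1e-2)

for step in range(100):
    # Lower level: update the model weights on subset 1 with the prompt frozen.
    opt_weights.zero_grad()
    loss_fn(model(lower_imgs, prompt_embedding.detach()), lower_masks).backward()
    opt_weights.step()

    # Upper level: update only the prompt embedding on subset 2.
    # (Weight gradients accumulated here are cleared at the next lower step.)
    opt_prompt.zero_grad()
    loss_fn(model(upper_imgs, prompt_embedding), upper_masks).backward()
    opt_prompt.step()
```

The point of the split, as the abstract argues, is that the prompt embedding is tuned against data the weights were not fitted on, which acts as a built-in validation signal and curbs overfitting in the low-data regimes (such as medical imaging) the paper targets.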
