RefSAM: Efficiently Adapting Segmenting Anything Model for Referring Video Object Segmentation (2307.00997v3)

Published 3 Jul 2023 in cs.CV and cs.AI

Abstract: The Segment Anything Model (SAM) has gained significant attention for its impressive performance in image segmentation. However, it lacks proficiency in referring video object segmentation (RVOS) due to its reliance on precise user-interactive prompts and its limited understanding of modalities beyond vision, such as language. This paper presents the RefSAM model, which explores the potential of SAM for RVOS by incorporating multi-view information from diverse modalities and successive frames at different timestamps in an online manner. Our proposed approach adapts the original SAM model to enhance cross-modality learning by employing a lightweight Cross-Modal MLP that projects the text embedding of the referring expression into sparse and dense embeddings, serving as user-interactive prompts. Additionally, we introduce a hierarchical dense attention module that fuses hierarchical visual semantic information with sparse embeddings to obtain fine-grained dense embeddings, and an implicit tracking module that generates a tracking token and provides historical information for the mask decoder. Furthermore, we employ a parameter-efficient tuning strategy to align and fuse the language and vision features effectively. Through comprehensive ablation studies, we demonstrate the practicality and effectiveness of our design choices. Extensive experiments conducted on Refer-Youtube-VOS, Ref-DAVIS17, and three referring image segmentation datasets validate the superiority and effectiveness of our RefSAM model over existing methods.
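
The core adaptation described in the abstract is the lightweight Cross-Modal MLP, which projects the text embedding of a referring expression into sparse and dense embeddings that stand in for SAM's interactive point/box/mask prompts. The sketch below illustrates that idea in PyTorch; the class name, layer sizes, token count, and the 64x64 dense grid are illustrative assumptions, not the authors' implementation.

```python
# Minimal, hypothetical sketch of a Cross-Modal MLP in the spirit of RefSAM:
# a text embedding is mapped to sparse (token-like) and dense (map-like)
# prompt embeddings for a SAM-style mask decoder. All dimensions and names
# here are assumptions for illustration only.
import torch
import torch.nn as nn


class CrossModalMLP(nn.Module):
    def __init__(self, text_dim: int = 768, prompt_dim: int = 256,
                 num_sparse_tokens: int = 4, dense_grid: int = 64):
        super().__init__()
        self.num_sparse_tokens = num_sparse_tokens
        self.dense_grid = dense_grid
        # Lightweight shared trunk: a single linear layer plus nonlinearity.
        self.trunk = nn.Sequential(
            nn.Linear(text_dim, prompt_dim),
            nn.GELU(),
        )
        # Separate heads for sparse and dense prompt embeddings.
        self.sparse_head = nn.Linear(prompt_dim, num_sparse_tokens * prompt_dim)
        self.dense_head = nn.Linear(prompt_dim, prompt_dim)

    def forward(self, text_emb: torch.Tensor):
        """text_emb: (B, text_dim) pooled embedding of the referring expression."""
        h = self.trunk(text_emb)
        b = h.shape[0]
        # Sparse embeddings play the role of SAM's point/box prompt tokens.
        sparse = self.sparse_head(h).view(b, self.num_sparse_tokens, -1)
        # Dense embedding is broadcast over the image feature grid,
        # analogous to SAM's mask-prompt pathway.
        g = self.dense_grid
        dense = self.dense_head(h)[:, :, None, None].expand(-1, -1, g, g)
        return sparse, dense


if __name__ == "__main__":
    # Usage sketch with a random tensor in place of a real text encoder output.
    mlp = CrossModalMLP()
    text_emb = torch.randn(2, 768)
    sparse, dense = mlp(text_emb)
    print(sparse.shape, dense.shape)  # (2, 4, 256) (2, 256, 64, 64)
```

In the full model, these two tensors would be fed to SAM's mask decoder in place of user-interactive prompts; this sketch only verifies the shapes of the projected embeddings.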

Authors (5)
  1. Yonglin Li (9 papers)
  2. Jing Zhang (731 papers)
  3. Xiao Teng (5 papers)
  4. Long Lan (38 papers)
  5. Xinwang Liu (101 papers)
Citations (13)
