
Video Object Segmentation via SAM 2: The 4th Solution for LSVOS Challenge VOS Track (2408.10125v2)

Published 19 Aug 2024 in cs.CV

Abstract: The Video Object Segmentation (VOS) task aims to segment a particular object instance throughout an entire video sequence given only the object mask in the first frame. The recently proposed Segment Anything Model 2 (SAM 2) is a foundation model for promptable visual segmentation in images and videos. SAM 2 builds a data engine, which improves the model and data via user interaction, to collect the largest video segmentation dataset to date. SAM 2 is a simple transformer architecture with streaming memory for real-time video processing; trained on this data, it provides strong performance across a wide range of tasks. In this work, we evaluate the zero-shot performance of SAM 2 on the more challenging VOS datasets MOSE and LVOS. Without fine-tuning on the training set, SAM 2 achieved 75.79 J&F on the test set and ranked 4th in the 6th LSVOS Challenge VOS Track.
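
The J&F score quoted in the abstract is conventionally the average of region similarity J (the intersection over union of the predicted and ground-truth masks) and contour accuracy F (an F-measure over boundary pixels). The following is a minimal single-frame sketch of that idea, not the challenge's official evaluation code. The function names are hypothetical, and the boundary matching here is exact, whereas standard benchmark toolkits match boundaries within a small pixel tolerance, so the F values it produces will be lower than official ones.

```python
import numpy as np

def region_similarity(pred, gt):
    """J: intersection over union of two boolean masks."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    union = np.logical_or(pred, gt).sum()
    if union == 0:
        return 1.0  # both masks empty: treat as a perfect match
    return float(np.logical_and(pred, gt).sum() / union)

def boundary(mask):
    """Mask pixels with at least one 4-neighbour outside the mask."""
    mask = mask.astype(bool)
    padded = np.pad(mask, 1, constant_values=False)
    # A pixel is interior if its up, down, left and right neighbours are all set.
    interior = (padded[:-2, 1:-1] & padded[2:, 1:-1]
                & padded[1:-1, :-2] & padded[1:-1, 2:])
    return mask & ~interior

def contour_accuracy(pred, gt):
    """F: F-measure between boundary pixel sets (assumption: zero tolerance)."""
    bp, bg = boundary(pred), boundary(gt)
    n_pred, n_gt = bp.sum(), bg.sum()
    if n_pred == 0 and n_gt == 0:
        return 1.0
    if n_pred == 0 or n_gt == 0:
        return 0.0
    hits = np.logical_and(bp, bg).sum()
    precision, recall = hits / n_pred, hits / n_gt
    if precision + recall == 0:
        return 0.0
    return float(2 * precision * recall / (precision + recall))

def j_and_f(pred, gt):
    """Mean of region similarity and contour accuracy for one frame."""
    return 0.5 * (region_similarity(pred, gt) + contour_accuracy(pred, gt))

# Toy example: a 4x4 square and the same square shifted one pixel right.
gt = np.zeros((8, 8), dtype=bool)
gt[2:6, 2:6] = True
pred = np.zeros((8, 8), dtype=bool)
pred[2:6, 3:7] = True
score = j_and_f(pred, gt)
```

A reported benchmark number such as the one above is obtained by averaging these per-frame, per-object scores over the whole test set.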

Authors (5)
  1. Feiyu Pan (5 papers)
  2. Hao Fang (88 papers)
  3. Runmin Cong (60 papers)
  4. Wei Zhang (1492 papers)
  5. Xiankai Lu (21 papers)
Citations (1)
