PM-VIS: High-Performance Box-Supervised Video Instance Segmentation (2404.13863v1)
Abstract: Labeling pixel-wise object masks in videos is a resource-intensive and laborious process, and box-supervised Video Instance Segmentation (VIS) methods have emerged as a viable way to reduce this annotation burden. In practical applications, a two-step approach (first generating instance pseudo masks from box annotations, then training a fully supervised model on them) is not only more flexible but also achieves higher recognition accuracy. Inspired by the recent success of the Segment Anything Model (SAM), we introduce a novel approach that harnesses instance box annotations from multiple perspectives to generate high-quality instance pseudo masks, thereby enriching the information contained in instance annotations. We leverage the ground-truth boxes to create three types of pseudo masks, using the HQ-SAM model, a box-supervised VIS model (IDOL-BoxInst), and a VOS model (DeAOT) respectively, each paired with a corresponding optimization mechanism. Additionally, we introduce two ground-truth data filtering methods, assisted by the high-quality pseudo masks, to further improve the training dataset quality and the performance of fully supervised VIS methods. To fully capitalize on the resulting high-quality pseudo masks, we introduce a novel algorithm, PM-VIS, which integrates mask losses into IDOL-BoxInst. Trained with these high-quality pseudo-mask annotations, our PM-VIS model demonstrates strong instance mask prediction, achieving state-of-the-art performance on the YouTube-VIS 2019, YouTube-VIS 2021, and OVIS validation sets and notably narrowing the gap between box-supervised and fully supervised VIS methods.
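The abstract outlines the pseudo-mask pipeline without giving implementation details, so the sketch below is only an illustration of two ingredients it implies: a soft dice mask loss in the style of V-Net (cited below), of the kind used when integrating mask losses into a model like IDOL-BoxInst, and a hypothetical box-consistency filter, `select_pseudo_mask`, showing one plausible way per-instance pseudo-mask candidates from HQ-SAM, IDOL-BoxInst, and DeAOT could be checked against the ground-truth boxes. The function names, the selection rule, and the `iou_thresh` default are assumptions made for illustration, not PM-VIS's actual mechanisms.

```python
import torch


def dice_loss(logits: torch.Tensor, targets: torch.Tensor, eps: float = 1.0) -> torch.Tensor:
    """Soft dice loss over (N, H, W) mask logits and binary pseudo masks."""
    probs = logits.sigmoid().flatten(1)   # (N, H*W)
    tgts = targets.flatten(1)             # (N, H*W)
    inter = (probs * tgts).sum(-1)
    denom = probs.sum(-1) + tgts.sum(-1)
    return (1.0 - (2.0 * inter + eps) / (denom + eps)).mean()


def mask_to_box(mask: torch.Tensor) -> torch.Tensor:
    """Tight (x1, y1, x2, y2) box around a non-empty binary mask."""
    ys, xs = torch.nonzero(mask, as_tuple=True)
    return torch.stack([xs.min(), ys.min(), xs.max(), ys.max()]).float()


def box_iou(a: torch.Tensor, b: torch.Tensor) -> float:
    """IoU of two (x1, y1, x2, y2) boxes."""
    lt = torch.maximum(a[:2], b[:2])
    rb = torch.minimum(a[2:], b[2:])
    inter = (rb - lt).clamp(min=0).prod()
    area = lambda box: (box[2] - box[0]) * (box[3] - box[1])
    return float(inter / (area(a) + area(b) - inter + 1e-6))


def select_pseudo_mask(candidates, gt_box, iou_thresh=0.7):
    """Hypothetical filter: among the pseudo-mask candidates for one
    instance (e.g. from HQ-SAM, IDOL-BoxInst, and DeAOT), keep the mask
    whose tight box best agrees with the annotated box; drop the instance
    if no candidate clears `iou_thresh`, so only reliable masks are kept.
    """
    best, best_iou = None, 0.0
    for mask in candidates:
        if mask.sum() == 0:   # skip empty predictions
            continue
        iou = box_iou(mask_to_box(mask), gt_box)
        if iou > best_iou:
            best, best_iou = mask, iou
    return best if best_iou >= iou_thresh else None
```

In an actual training setup, such a dice term is typically combined with a pixel-wise classification loss (e.g. the focal loss cited below); the exact loss weighting and filtering thresholds used by PM-VIS are not specified in this excerpt.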
- B. Cheng, A. Choudhuri, I. Misra, A. Kirillov, R. Girdhar, and A. G. Schwing, “Mask2former for video instance segmentation,” arXiv preprint arXiv:2112.10764, 2021.
- J. Wu, Q. Liu, Y. Jiang, S. Bai, A. Yuille, and X. Bai, “In defense of online models for video instance segmentation,” in European Conference on Computer Vision. Springer, 2022, pp. 588–605.
- M. Heo, S. Hwang, S. W. Oh, J.-Y. Lee, and S. J. Kim, “Vita: Video instance segmentation via object token association,” Advances in Neural Information Processing Systems, vol. 35, pp. 23109–23120, 2022.
- M. Heo, S. Hwang, J. Hyun, H. Kim, S. W. Oh, J.-Y. Lee, and S. J. Kim, “A generalized framework for video instance segmentation,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023, pp. 14623–14632.
- T. Hannan, R. Koner, M. Bernhard, S. Shit, B. Menze, V. Tresp, M. Schubert, and T. Seidl, “Gratt-vis: Gated residual attention for auto rectifying video instance segmentation,” arXiv preprint arXiv:2305.17096, 2023.
- L. Yang, Y. Fan, and N. Xu, “Video instance segmentation,” in Proceedings of the IEEE/CVF International Conference on Computer Vision, 2019, pp. 5188–5197.
- L. Yang, Y. Fan, Y. Fu, and N. Xu, “The 3rd large-scale video object segmentation challenge - video instance segmentation track,” Jun. 2021.
- J. Qi, Y. Gao, Y. Hu, X. Wang, X. Liu, X. Bai, S. Belongie, A. Yuille, P. H. Torr, and S. Bai, “Occluded video instance segmentation: A benchmark,” International Journal of Computer Vision, vol. 130, no. 8, pp. 2022–2039, 2022.
- L. Ke, M. Danelljan, H. Ding, Y.-W. Tai, C.-K. Tang, and F. Yu, “Mask-free video instance segmentation,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023, pp. 22857–22866.
- T. Cheng, X. Wang, S. Chen, Q. Zhang, and W. Liu, “Boxteacher: Exploring high-quality pseudo labels for weakly supervised instance segmentation,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023, pp. 3145–3154.
- X. Wang, J. Feng, B. Hu, Q. Ding, L. Ran, X. Chen, and W. Liu, “Weakly-supervised instance segmentation via class-agnostic learning with salient images,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2021, pp. 10225–10235.
- Z. Tian, C. Shen, X. Wang, and H. Chen, “Boxinst: High-performance instance segmentation with box annotations,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2021, pp. 5443–5452.
- S. Lan, Z. Yu, C. Choy, S. Radhakrishnan, G. Liu, Y. Zhu, L. S. Davis, and A. Anandkumar, “Discobox: Weakly supervised instance segmentation and semantic correspondence from box supervision,” in Proceedings of the IEEE/CVF International Conference on Computer Vision, 2021, pp. 3406–3416.
- A. Kirillov, E. Mintun, N. Ravi, H. Mao, C. Rolland, L. Gustafson, T. Xiao, S. Whitehead, A. C. Berg, W.-Y. Lo et al., “Segment anything,” arXiv preprint arXiv:2304.02643, 2023.
- L. Ke, M. Ye, M. Danelljan, Y. Liu, Y.-W. Tai, C.-K. Tang, and F. Yu, “Segment anything in high quality,” arXiv preprint arXiv:2306.01567, 2023.
- F. Milletari, N. Navab, and S.-A. Ahmadi, “V-net: Fully convolutional neural networks for volumetric medical image segmentation,” in 2016 Fourth International Conference on 3D Vision (3DV). IEEE, 2016, pp. 565–571.
- T.-Y. Lin, P. Goyal, R. Girshick, K. He, and P. Dollár, “Focal loss for dense object detection,” in Proceedings of the IEEE International Conference on Computer Vision, 2017, pp. 2980–2988.
- Z. Yang and Y. Yang, “Decoupling features in hierarchical propagation for video object segmentation,” Advances in Neural Information Processing Systems, vol. 35, pp. 36324–36336, 2022.
- B. Cheng, I. Misra, A. G. Schwing, A. Kirillov, and R. Girdhar, “Masked-attention mask transformer for universal image segmentation,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022, pp. 1290–1299.
- X. Li, H. Yuan, W. Zhang, G. Cheng, J. Pang, and C. C. Loy, “Tube-link: A flexible cross tube baseline for universal video segmentation,” arXiv preprint arXiv:2303.12782, 2023.
- J. Wu, Y. Jiang, S. Bai, W. Zhang, and X. Bai, “Seqformer: Sequential transformer for video instance segmentation,” in European Conference on Computer Vision. Springer, 2022, pp. 553–569.
- G. Bertasius and L. Torresani, “Classifying, segmenting, and tracking object instances in video with mask propagation,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020, pp. 9739–9748.
- H. Lin, R. Wu, S. Liu, J. Lu, and J. Jia, “Video instance segmentation with a propose-reduce paradigm,” in Proceedings of the IEEE/CVF International Conference on Computer Vision, 2021, pp. 1739–1748.
- X. Li, J. Wang, X. Li, and Y. Lu, “Video instance segmentation by instance flow assembly,” IEEE Transactions on Multimedia, vol. 25, pp. 7469–7479, 2023.
- N. Carion, F. Massa, G. Synnaeve, N. Usunier, A. Kirillov, and S. Zagoruyko, “End-to-end object detection with transformers,” in European Conference on Computer Vision. Springer, 2020, pp. 213–229.
- D.-A. Huang, Z. Yu, and A. Anandkumar, “Minvis: A minimal video instance segmentation framework without video-based training,” Advances in Neural Information Processing Systems, vol. 35, pp. 31265–31277, 2022.
- X. Li, H. He, Y. Yang, H. Ding, K. Yang, G. Cheng, Y. Tong, and D. Tao, “Improving video instance segmentation via temporal pyramid routing,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 45, no. 5, pp. 6594–6601, 2023.
- K. He, G. Gkioxari, P. Dollár, and R. Girshick, “Mask r-cnn,” in Proceedings of the IEEE International Conference on Computer Vision, 2017, pp. 2961–2969.
- X. Zhu, W. Su, L. Lu, B. Li, X. Wang, and J. Dai, “Deformable detr: Deformable transformers for end-to-end object detection,” arXiv preprint arXiv:2010.04159, 2020.
- T. Zhou, S. Wang, Y. Zhou, Y. Yao, J. Li, and L. Shao, “Motion-attentive transition for zero-shot video object segmentation,” in Proceedings of the AAAI Conference on Artificial Intelligence, vol. 34, no. 7, 2020, pp. 13066–13073.
- M. Siam, R. Karim, H. Zhao, and R. Wildes, “Multiscale memory comparator transformer for few-shot video segmentation,” arXiv preprint arXiv:2307.07812, 2023.
- S. Ren, W. Liu, Y. Liu, H. Chen, G. Han, and S. He, “Reciprocal transformations for unsupervised video object segmentation,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2021, pp. 15455–15464.
- W. Liu, G. Lin, T. Zhang, and Z. Liu, “Guided co-segmentation network for fast video object segmentation,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 31, no. 4, pp. 1607–1617, 2021.
- H. K. Cheng and A. G. Schwing, “Xmem: Long-term video object segmentation with an atkinson-shiffrin memory model,” in European Conference on Computer Vision. Springer, 2022, pp. 640–658.
- Z. Yang, Y. Wei, and Y. Yang, “Associating objects with transformers for video object segmentation,” Advances in Neural Information Processing Systems, vol. 34, pp. 2491–2502, 2021.
- Y. Chen, D. Zhang, Y. Zheng, Z.-X. Yang, E. Wu, and H. Zhao, “Boosting video object segmentation via robust and efficient memory network,” IEEE Transactions on Circuits and Systems for Video Technology, pp. 1–1, 2023.
- H. Rezatofighi, N. Tsoi, J. Gwak, A. Sadeghian, I. Reid, and S. Savarese, “Generalized intersection over union: A metric and a loss for bounding box regression,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2019, pp. 658–666.
- N. Xu, L. Yang, Y. Fan, D. Yue, Y. Liang, J. Yang, and T. Huang, “Youtube-vos: A large-scale video object segmentation benchmark,” arXiv preprint arXiv:1809.03327, 2018.
- L. Yang, Y. Fan, and N. Xu, “The 2nd large-scale video object segmentation challenge - video object segmentation track,” Oct. 2019.
- J. Pont-Tuset, F. Perazzi, S. Caelles, P. Arbeláez, A. Sorkine-Hornung, and L. Van Gool, “The 2017 davis challenge on video object segmentation,” arXiv preprint arXiv:1704.00675, 2017.
- Y. Wu, A. Kirillov, F. Massa, W.-Y. Lo, and R. Girshick, “Detectron2,” https://github.com/facebookresearch/detectron2, 2019.
- T.-Y. Lin, M. Maire, S. Belongie, J. Hays, P. Perona, D. Ramanan, P. Dollár, and C. L. Zitnick, “Microsoft coco: Common objects in context,” in Computer Vision–ECCV 2014: 13th European Conference, Zurich, Switzerland, September 6-12, 2014, Proceedings, Part V. Springer, 2014, pp. 740–755.
- K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 770–778.
- Z. Liu, Y. Lin, Y. Cao, H. Hu, Y. Wei, Z. Zhang, S. Lin, and B. Guo, “Swin transformer: Hierarchical vision transformer using shifted windows,” in Proceedings of the IEEE/CVF International Conference on Computer Vision, 2021, pp. 10012–10022.
- Zhangjing Yang
- Dun Liu
- Wensheng Cheng
- Jinqiao Wang
- Yi Wu