Collaborative Spatial-Temporal Modeling for Language-Queried Video Actor Segmentation (2105.06818v1)

Published 14 May 2021 in cs.CV and cs.MM

Abstract: Language-queried video actor segmentation aims to predict the pixel-level mask of the actor which performs the actions described by a natural language query in the target frames. Existing methods adopt 3D CNNs over the video clip as a general encoder to extract a mixed spatio-temporal feature for the target frame. Though 3D convolutions are amenable to recognizing which actor is performing the queried actions, they also inevitably introduce misaligned spatial information from adjacent frames, which confuses the features of the target frame and yields inaccurate segmentation. Therefore, we propose a collaborative spatial-temporal encoder-decoder framework which contains a 3D temporal encoder over the video clip to recognize the queried actions, and a 2D spatial encoder over the target frame to accurately segment the queried actors. In the decoder, a Language-Guided Feature Selection (LGFS) module is proposed to flexibly integrate spatial and temporal features from the two encoders. We also propose a Cross-Modal Adaptive Modulation (CMAM) module to dynamically recombine spatial- and temporal-relevant linguistic features for multimodal feature interaction in each stage of the two encoders. Our method achieves new state-of-the-art performance on two popular benchmarks with less computational overhead than previous approaches.
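
To make the idea of language-guided integration of spatial and temporal features concrete, here is a minimal sketch in plain Python. It is not the paper's LGFS module, whose internals the abstract does not specify. It assumes one common way to do such a fusion: a per-channel sigmoid gate computed from the language feature. The names `language_gate` and `fuse`, the linear projection, and the gating formula are all hypothetical and chosen only for illustration.

```python
import math
from typing import List, Sequence


def sigmoid(x: float) -> float:
    """Standard logistic function, mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def language_gate(lang: Sequence[float],
                  weights: Sequence[Sequence[float]]) -> List[float]:
    """Hypothetical gate: project the language vector to one value per channel.

    `weights` has one row per feature channel; each row has len(lang) entries.
    This linear-plus-sigmoid form is an assumption, not taken from the paper.
    """
    return [sigmoid(sum(w * l for w, l in zip(row, lang))) for row in weights]


def fuse(spatial: Sequence[float],
         temporal: Sequence[float],
         gate: Sequence[float]) -> List[float]:
    """Blend the two encoders' features channel by channel.

    A gate near 1 keeps the 2D spatial feature; a gate near 0 keeps the
    3D temporal feature. Again, an illustrative assumption.
    """
    return [g * s + (1.0 - g) * t for s, t, g in zip(spatial, temporal, gate)]


# Toy features at a single pixel of the target frame (made-up numbers).
spatial_feat = [2.0, 4.0]    # from the 2D spatial encoder
temporal_feat = [0.0, 0.0]   # from the 3D temporal encoder
lang_feat = [1.0, -1.0]      # sentence-level language feature
zero_weights = [[0.0, 0.0], [0.0, 0.0]]  # untrained gate: every channel gets 0.5

gate = language_gate(lang_feat, zero_weights)
fused = fuse(spatial_feat, temporal_feat, gate)
```

With all-zero weights the gate is 0.5 for every channel, so the fused feature is the plain average of the two inputs. A trained projection would instead let the query shift each channel toward the spatial or the temporal encoder.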

Authors (8)
  1. Tianrui Hui (15 papers)
  2. Shaofei Huang (19 papers)
  3. Si Liu (130 papers)
  4. Zihan Ding (38 papers)
  5. Guanbin Li (177 papers)
  6. Wenguan Wang (103 papers)
  7. Jizhong Han (48 papers)
  8. Fei Wang (574 papers)
Citations (44)
