Video Relation Detection via Tracklet based Visual Transformer (2108.08669v1)

Published 19 Aug 2021 in cs.CV

Abstract: Video Visual Relation Detection (VidVRD) has received significant attention from the community in recent years. In this paper, we apply the state-of-the-art video object tracklet detection pipeline, MEGA and deepSORT, to generate tracklet proposals. We then perform VidVRD in a tracklet-based manner without any pre-cutting operations. Specifically, we design a tracklet-based visual Transformer. It contains a temporal-aware decoder that performs feature interactions between the tracklets and learnable predicate query embeddings, and then predicts the relations. Experimental results demonstrate the superiority of our method, which outperforms other methods by a large margin on the Video Relation Understanding (VRU) Grand Challenge in ACM Multimedia 2021. Code is released at https://github.com/Dawn-LX/VidVRD-tracklets.
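The interaction between tracklets and predicate query embeddings described in the abstract can be illustrated with a minimal sketch of single-head scaled dot-product cross-attention. This is not the authors' implementation. The function names, dimensions, and random values are hypothetical, and the sketch omits the learned projections, the temporal awareness, and the relation classification head that the paper's decoder would need. In a trained model the query embeddings would be learned parameters; here they are random placeholders.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, tracklet_feats):
    """Each query attends over all tracklet features (keys == values here).

    queries:        list of Q vectors, each of dimension d
    tracklet_feats: list of N vectors, each of dimension d
    Returns (outputs, weights): Q attended vectors and Q x N attention weights.
    """
    d = len(queries[0])
    scale = 1.0 / math.sqrt(d)
    outputs, weights = [], []
    for q in queries:
        # Scaled dot product between this query and every tracklet feature.
        scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in tracklet_feats]
        w = softmax(scores)
        # Weighted sum of tracklet features.
        out = [
            sum(w[n] * tracklet_feats[n][j] for n in range(len(tracklet_feats)))
            for j in range(d)
        ]
        outputs.append(out)
        weights.append(w)
    return outputs, weights


# Hypothetical sizes, chosen only for illustration.
random.seed(0)
DIM, NUM_TRACKLETS, NUM_QUERIES = 8, 5, 3
tracklet_feats = [[random.gauss(0, 1) for _ in range(DIM)] for _ in range(NUM_TRACKLETS)]
predicate_queries = [[random.gauss(0, 1) for _ in range(DIM)] for _ in range(NUM_QUERIES)]

attended, attn = cross_attention(predicate_queries, tracklet_feats)
```

Each row of `attn` shows how strongly one query attends to each tracklet. In a full model, each attended vector would feed a classifier that predicts a predicate and links it to a subject and object tracklet.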

Citations (29)

GitHub

Dawn-LX/VidVRD-tracklets: Video Visual Relation Detection (VidVRD) tracklet generation; also used for the ACM MM Visual Relation Understanding Grand Challenge (37 stars)