Unified Pretraining Target Based Video-music Retrieval With Music Rhythm And Video Optical Flow Information (2309.09421v1)

Published 18 Sep 2023 in cs.MM

Abstract: Background music (BGM) can enhance a video's emotional impact. However, selecting an appropriate BGM often requires domain knowledge. This has led to the development of video-music retrieval techniques. Most existing approaches use pretrained video and music feature extractors, trained with different target sets, to obtain averaged video-level and music-level embeddings. This has two drawbacks. First, using different target sets for video and music pretraining may make the generated embeddings difficult to match. Second, the underlying temporal correlation between video and music is ignored. In this paper, our proposed approach leverages a unified target set to perform video/music pretraining and produces clip-level embeddings to preserve temporal information. The downstream cross-modal matching is based on the clip-level features with embedded music rhythm and optical flow information. Experiments demonstrate that our proposed method outperforms state-of-the-art methods by a significant margin.

Authors (5)
  1. Tianjun Mao (4 papers)
  2. Shansong Liu (19 papers)
  3. Yunxuan Zhang (5 papers)
  4. Dian Li (28 papers)
  5. Ying Shan (252 papers)
Citations (2)
