
VMCML: Video and Music Matching via Cross-Modality Lifting (2303.12379v1)

Published 22 Mar 2023 in cs.CV

Abstract: We propose a content-based system for matching video and background music. The system aims to address the challenges in music recommendation for new users or new music given short-form videos. To this end, we propose a cross-modal framework, VMCML, that finds a shared embedding space between video and music representations. To ensure the embedding space can be effectively shared by both representations, we leverage the CosFace loss, a margin-based cosine similarity loss. Furthermore, we establish a large-scale dataset called MSVD, in which we provide 390 individual music tracks and 150,000 corresponding matched videos. We conduct extensive experiments on YouTube-8M and our MSVD dataset. Our quantitative and qualitative results demonstrate the effectiveness of the proposed framework, which achieves state-of-the-art video and music matching performance.
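
To make the margin-based cosine similarity loss named in the abstract concrete, here is a minimal sketch of the general CosFace formulation in plain Python. It is not the authors' implementation. The function names, the `scale` and `margin` defaults, and the idea that each music track acts as one class with its own weight vector are assumptions made for illustration only.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length so dot products become cosines."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def cosine(a, b):
    """Cosine similarity between two vectors."""
    a, b = l2_normalize(a), l2_normalize(b)
    return sum(x * y for x, y in zip(a, b))


def cosface_loss(embedding, class_weights, label, scale=30.0, margin=0.35):
    """CosFace loss for a single sample.

    embedding     : feature vector (hypothetically a video or music embedding)
    class_weights : one weight vector per class (assumed: one per music track)
    label         : index of the correct class
    scale, margin : hyperparameters s and m; the defaults are illustrative
    """
    logits = []
    for j, w in enumerate(class_weights):
        cos = cosine(embedding, w)
        if j == label:
            cos -= margin  # subtract the margin from the target class only
        logits.append(scale * cos)

    # Numerically stable cross-entropy: log-sum-exp minus the target logit.
    mx = max(logits)
    log_sum = mx + math.log(sum(math.exp(z - mx) for z in logits))
    return log_sum - logits[label]


if __name__ == "__main__":
    weights = [[1.0, 0.0], [0.0, 1.0]]  # two hypothetical track classes
    video = [0.8, 0.6]                  # hypothetical video embedding
    print(cosface_loss(video, weights, label=0))
    print(cosface_loss(video, weights, label=0, margin=0.0))
```

Because the margin is subtracted only from the target-class cosine, a sample has to be closer to its own class than plain softmax would require before the loss becomes small. That pressure is what pushes embeddings of different classes apart in the shared space.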

Authors (4)
  1. Yi-Shan Lee (5 papers)
  2. Wei-Cheng Tseng (19 papers)
  3. Fu-En Wang (12 papers)
  4. Min Sun (108 papers)
