
Textless Speech-to-Music Retrieval Using Emotion Similarity (2303.10539v1)

Published 19 Mar 2023 in cs.SD, cs.IR, cs.MM, and eess.AS

Abstract: We introduce a framework that recommends music based on the emotions of speech. In content creation and daily life, speech contains information about human emotions, which can be enhanced by music. Our framework focuses on a cross-domain retrieval system to bridge the gap between speech and music via emotion labels. We explore different speech representations and report their impact on different speech types, including acting voice and wake-up words. We also propose an emotion similarity regularization term in cross-domain retrieval tasks. By incorporating the regularization term into training, similar speech-and-music pairs in the emotion space are closer in the joint embedding space. Our comprehensive experimental results show that the proposed model is effective in textless speech-to-music retrieval.
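The abstract does not give the form of the emotion similarity regularization term, so the following is only an illustrative sketch of the general idea, not the authors' formulation. It assumes each speech and music item has an embedding in the joint space and a vector in an emotion space (for example a valence-arousal pair), and it penalizes the squared gap between cosine similarity in the two spaces. The function names `cosine` and `emotion_similarity_penalty`, and the choice of cosine similarity with a squared-error penalty, are all assumptions made for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def emotion_similarity_penalty(speech_emb, music_emb, speech_emo, music_emo):
    """Hypothetical regularizer: mean squared gap between joint-space similarity
    and emotion-space similarity over all speech/music pairs in a batch.

    speech_emb, music_emb: lists of joint-space embedding vectors.
    speech_emo, music_emo: lists of emotion-space vectors for the same items.
    """
    total = 0.0
    count = 0
    for s_vec, s_emo in zip(speech_emb, speech_emo):
        for m_vec, m_emo in zip(music_emb, music_emo):
            joint_sim = cosine(s_vec, m_vec)  # similarity in the joint embedding space
            emo_sim = cosine(s_emo, m_emo)    # similarity in the emotion space
            total += (joint_sim - emo_sim) ** 2
            count += 1
    return total / count if count else 0.0
```

Under these assumptions, the penalty is small when pairs that are close in the emotion space are also close in the joint space, and it would be added to the main retrieval loss with a weighting factor during training.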

Authors (4)
  1. SeungHeon Doh (18 papers)
  2. Minz Won (19 papers)
  3. Keunwoo Choi (42 papers)
  4. Juhan Nam (64 papers)
Citations (2)
