ALCAP: Alignment-Augmented Music Captioner (2212.10901v3)

Published 21 Dec 2022 in cs.SD, cs.CL, cs.IR, cs.MM, and eess.AS

Abstract: Music captioning has gained significant attention with the rising prominence of streaming media platforms. Traditional approaches often prioritize either the audio or the lyrics of a piece of music, overlooking the interplay between the two. A comprehensive understanding of music, however, requires integrating both elements. In this study, we address this gap by introducing a method that systematically learns multimodal alignment between audio and lyrics through contrastive learning. This not only captures the synergy between audio and lyrics but also enables models to achieve deeper cross-modal coherence, thereby producing high-quality captions. We provide both theoretical and empirical results demonstrating the advantage of the proposed method, which achieves new state-of-the-art results on two music captioning datasets.
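The abstract says audio and lyrics are aligned "through contrastive learning" but does not state the exact objective. A minimal sketch, assuming a CLIP-style symmetric InfoNCE loss over a batch of paired embeddings, illustrates the general idea. Row i of the audio batch and row i of the lyrics batch form a positive pair, and every other row in the batch serves as a negative. The function names, the pure-Python list representation, and the default temperature of 0.07 are hypothetical choices for illustration. The audio and lyrics encoders that would produce these embeddings are not shown, and this is not presented as the authors' implementation.

```python
import math
from typing import List

Vector = List[float]


def l2_normalize(v: Vector) -> Vector:
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def logsumexp(xs: List[float]) -> float:
    """Numerically stable log(sum(exp(x))) over a list of logits."""
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))


def symmetric_info_nce(
    audio: List[Vector], lyrics: List[Vector], temperature: float = 0.07
) -> float:
    """Hypothetical symmetric InfoNCE loss between paired audio/lyrics embeddings.

    audio[i] and lyrics[i] are assumed to come from the same song (positive pair);
    all other pairings in the batch are treated as negatives.
    """
    if len(audio) != len(lyrics) or not audio:
        raise ValueError("audio and lyrics batches must be non-empty and equal in size")
    a = [l2_normalize(v) for v in audio]
    lyr = [l2_normalize(v) for v in lyrics]
    n = len(a)
    # logits[i][j] = cosine similarity between audio i and lyrics j, scaled by temperature
    logits = [
        [sum(x * y for x, y in zip(a[i], lyr[j])) / temperature for j in range(n)]
        for i in range(n)
    ]
    # Audio -> lyrics: cross-entropy of each row against its diagonal (matching) entry
    loss_a2l = sum(logsumexp(logits[i]) - logits[i][i] for i in range(n)) / n
    # Lyrics -> audio: cross-entropy of each column against its diagonal entry
    loss_l2a = sum(
        logsumexp([logits[i][j] for i in range(n)]) - logits[j][j] for j in range(n)
    ) / n
    # Average both directions so the objective is symmetric in the two modalities
    return 0.5 * (loss_a2l + loss_l2a)


if __name__ == "__main__":
    audio = [[1.0, 0.0], [0.0, 1.0]]
    aligned_lyrics = [[1.0, 0.0], [0.0, 1.0]]
    shuffled_lyrics = [[0.0, 1.0], [1.0, 0.0]]
    print(symmetric_info_nce(audio, aligned_lyrics))   # small: pairs already match
    print(symmetric_info_nce(audio, shuffled_lyrics))  # large: pairs are swapped
```

Minimizing this loss pulls each song's audio and lyrics embeddings together while pushing apart mismatched pairs in the batch, which is one common way to obtain the kind of cross-modal alignment the abstract describes. Averaging both directions keeps either modality from dominating the objective.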

Authors (6)
  1. Zihao He (31 papers)
  2. Weituo Hao (16 papers)
  3. Wei-Tsung Lu (12 papers)
  4. Changyou Chen (108 papers)
  5. Kristina Lerman (197 papers)
  6. Xuchen Song (20 papers)
Citations (1)