Accompaniment Prompt Adherence: A measure for evaluating music accompaniment systems (2404.00775v3)
Published 31 Mar 2024 in cs.SD and eess.AS
Abstract: Generative systems for musical accompaniment are developing rapidly, yet there are no standardized metrics for evaluating how well generated accompaniments align with the conditional audio prompt. We introduce a distribution-based measure called Accompaniment Prompt Adherence (APA) and validate it through objective experiments on synthetic data perturbations as well as human listening tests. Results show that APA aligns well with human judgments of adherence and is sensitive to transformations that degrade adherence. We release a Python implementation of the metric using the widely adopted pre-trained CLAP embedding model, offering a valuable tool for evaluating and comparing accompaniment generation systems.
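The abstract describes APA as a distribution-based measure computed over CLAP embeddings, but does not spell out the formula. As a rough illustration only (not the paper's actual definition), one common distribution-based comparison is the squared maximum mean discrepancy (MMD) between embeddings of real (prompt, accompaniment) pairs and embeddings of generated pairs; the RBF kernel, the `gamma` bandwidth, and the random stand-in embeddings below are all assumptions made for the sketch:

```python
import numpy as np

def rbf_kernel(x, y, gamma=0.5):
    # Pairwise RBF kernel between rows of x and rows of y.
    d2 = np.sum(x**2, axis=1)[:, None] + np.sum(y**2, axis=1)[None, :] - 2 * x @ y.T
    return np.exp(-gamma * d2)

def mmd2(a, b, gamma=0.5):
    """Biased estimate of squared MMD between two sets of embedding vectors."""
    return (rbf_kernel(a, a, gamma).mean()
            + rbf_kernel(b, b, gamma).mean()
            - 2 * rbf_kernel(a, b, gamma).mean())

# Stand-ins for CLAP embeddings of (prompt, accompaniment) pairs; in practice
# these would come from a pre-trained CLAP model applied to audio.
rng = np.random.default_rng(0)
ref_pairs = rng.normal(size=(256, 16))   # embeddings of real pairs
gen_pairs = rng.normal(size=(256, 16))   # embeddings of generated pairs
print(mmd2(ref_pairs, gen_pairs))
```

A lower MMD indicates that generated pairs are distributed more like real ones; an adherence-degrading transformation (e.g. shuffling which accompaniment goes with which prompt) should push the score up. The released APA implementation should be consulted for the metric's actual definition.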