Text Conditioned Symbolic Drumbeat Generation using Latent Diffusion Models
Abstract: This study introduces a text-conditioned approach to generating drumbeats with Latent Diffusion Models (LDMs). It uses informative conditioning text extracted from the filenames of the training data. By pretraining a text encoder and a drumbeat encoder through contrastive learning within a multimodal network, following CLIP, we closely align the text and music modalities. Additionally, we examine an alternative text encoder based on multi-hot text encodings. Inspired by the multi-resolution nature of music, we propose a novel LSTM variant, MultiResolutionLSTM, designed to operate at various resolutions independently. As with recent LDMs in the image domain, our approach speeds up generation by running diffusion in a latent space provided by a pretrained unconditional autoencoder. We demonstrate the originality and variety of the generated drumbeats by measuring distances (both over binary pianorolls and in the latent space) to the training dataset and among the generated drumbeats. We also assess the generated drumbeats through a listening test focused on quality, aptness for the prompt text, and novelty. We show that the generated drumbeats are novel, apt for the prompt text, and comparable in quality to those created by human musicians.
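The contrastive pretraining described in the abstract follows CLIP: in each batch, matching text/drumbeat pairs are pulled together and all other pairings are pushed apart. Below is a minimal sketch of a CLIP-style symmetric contrastive (InfoNCE) loss, assuming both encoders already output fixed-size embedding vectors. The function names and the fixed temperature of 0.07 (CLIP's initial value, which CLIP then learns) are illustrative assumptions, not the paper's implementation.

```python
import math
from typing import List

Vector = List[float]

# Sketch (assumption): CLIP-style symmetric InfoNCE loss with a fixed temperature.
# Not the paper's code; real training would use a tensor library and a learned temperature.


def l2_normalize(v: Vector) -> Vector:
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0  # avoid division by zero
    return [x / norm for x in v]


def log_softmax(row: Vector) -> Vector:
    """Numerically stable log-softmax over one row of logits."""
    m = max(row)
    lse = m + math.log(sum(math.exp(x - m) for x in row))
    return [x - lse for x in row]


def clip_contrastive_loss(text_emb: List[Vector], drum_emb: List[Vector],
                          temperature: float = 0.07) -> float:
    """Symmetric contrastive loss; row i of text_emb is assumed to describe row i of drum_emb."""
    t = [l2_normalize(v) for v in text_emb]
    d = [l2_normalize(v) for v in drum_emb]
    n = len(t)
    # logits[i][j]: cosine similarity of text i and drumbeat j, divided by the temperature
    logits = [[sum(a * b for a, b in zip(t[i], d[j])) / temperature for j in range(n)]
              for i in range(n)]
    # Text -> drumbeat direction: each text should select its own drumbeat (the diagonal)
    loss_t2d = -sum(log_softmax(logits[i])[i] for i in range(n)) / n
    # Drumbeat -> text direction: the same objective over the columns
    cols = [[logits[i][j] for i in range(n)] for j in range(n)]
    loss_d2t = -sum(log_softmax(cols[j])[j] for j in range(n)) / n
    return (loss_t2d + loss_d2t) / 2
```

The diagonal of the similarity matrix holds the true pairs, so a batch of N pairs supplies N-1 negatives per example; minimizing this loss over both encoders is what places text and drumbeats in a shared embedding space.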
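The abstract does not state which distance is computed over binary pianorolls. As a hedged illustration, the sketch below assumes Hamming distance (the number of differing cells) between pianorolls of equal shape, indexed as [time step][drum instrument]. The helper names are hypothetical, and the latent-space distance is not covered.

```python
from typing import List, Sequence

# Assumption: a drumbeat is a binary pianoroll indexed [time_step][drum_instrument],
# and "distance" means Hamming distance. Helper names are hypothetical.
Pianoroll = List[List[int]]


def hamming(a: Pianoroll, b: Pianoroll) -> int:
    """Count the (time step, instrument) cells where two equal-shape pianorolls differ."""
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        raise ValueError("pianorolls must have the same shape")
    return sum(x != y for ra, rb in zip(a, b) for x, y in zip(ra, rb))


def nearest_training_distance(generated: Pianoroll, training: Sequence[Pianoroll]) -> int:
    """Distance from a generated beat to its closest training beat; 0 means an exact copy."""
    return min(hamming(generated, t) for t in training)


def mean_pairwise_distance(beats: Sequence[Pianoroll]) -> float:
    """Average Hamming distance over all unordered pairs; higher means more variety."""
    pairs = [(i, j) for i in range(len(beats)) for j in range(i + 1, len(beats))]
    if not pairs:
        return 0.0
    return sum(hamming(beats[i], beats[j]) for i, j in pairs) / len(pairs)
```

The first measure probes originality (distance to the training set), and the second probes variety among the generated drumbeats, mirroring the two comparisons named in the abstract.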