Symbolic Music Generation with Non-Differentiable Rule Guided Diffusion (2402.14285v4)
Abstract: We study the problem of symbolic music generation (e.g., generating piano rolls), with a technical focus on non-differentiable rule guidance. Musical rules are often expressed in symbolic form on note characteristics, such as note density or chord progression. Many of these rules are non-differentiable, which poses a challenge when using them for guided diffusion. We propose Stochastic Control Guidance (SCG), a novel guidance method that requires only forward evaluation of rule functions and works with pre-trained diffusion models in a plug-and-play way, thus achieving training-free guidance for non-differentiable rules for the first time. Additionally, we introduce a latent diffusion architecture for symbolic music generation with high time resolution, which can be composed with SCG in a plug-and-play fashion. Compared with strong standard baselines in symbolic music generation, this framework demonstrates marked advancements in music quality and rule-based controllability, outperforming current state-of-the-art generators in a variety of settings. For detailed demonstrations, code, and model checkpoints, please visit our project website: https://scg-rule-guided-music.github.io/.
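To make the idea of guidance that needs only forward evaluation of a rule more concrete, here is a minimal sketch in Python. It is not the authors' implementation, and the abstract does not spell out the algorithm. The sketch assumes one simple strategy: at each sampling step, draw several candidate next states, score each with a non-differentiable rule, and keep the best one. All names (`note_density`, `rule_loss`, `select_best`, `toy_denoiser_mean`, `guided_step`, `sample`) are hypothetical, and `toy_denoiser_mean` is a stand-in for a pre-trained diffusion model.

```python
import numpy as np


def note_density(piano_roll, threshold=0.5):
    """Average number of active pitches per time step.

    The hard threshold makes this rule non-differentiable with respect
    to the piano roll values. Shape convention (assumed): (pitches, time).
    """
    active = piano_roll > threshold
    return float(active.sum(axis=0).mean())


def rule_loss(piano_roll, target_density):
    """Distance between the measured and the desired note density."""
    return abs(note_density(piano_roll) - target_density)


def select_best(candidates, target_density):
    """Index of the candidate with the lowest rule loss (forward evaluation only)."""
    losses = [rule_loss(c, target_density) for c in candidates]
    return int(np.argmin(losses))


def toy_denoiser_mean(x, step):
    """Hypothetical stand-in for the mean predicted by a pre-trained model."""
    return 0.9 * x


def guided_step(x, step, target_density, rng, n_candidates=8, noise_scale=0.3):
    """One reverse step: sample several candidates, keep the best-scoring one."""
    mean = toy_denoiser_mean(x, step)
    noise = rng.standard_normal((n_candidates,) + x.shape)
    candidates = mean + noise_scale * noise
    return candidates[select_best(candidates, target_density)]


def sample(shape, num_steps, target_density, seed=0, n_candidates=8):
    """Run the guided sampling loop starting from Gaussian noise."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(shape)
    for step in reversed(range(num_steps)):
        x = guided_step(x, step, target_density, rng, n_candidates)
    return x


if __name__ == "__main__":
    roll = sample(shape=(4, 8), num_steps=5, target_density=1.5)
    print("rule loss of guided sample:", rule_loss(roll, 1.5))
```

Because the rule is only called, never differentiated, any black-box function of the sample could be swapped in for `rule_loss` in this sketch. The cost is extra rule evaluations per step, which grow with `n_candidates`.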
- Yujia Huang
- Adishree Ghatare
- Yuanzhe Liu
- Ziniu Hu
- Qinsheng Zhang
- Chandramouli S Sastry
- Siddharth Gururani
- Sageev Oore
- Yisong Yue