GETMusic: Generating Any Music Tracks with a Unified Representation and Diffusion Framework (2305.10841v2)
Abstract: Symbolic music generation aims to create musical notes, which can help users compose music, for example by generating target instrument tracks based on provided source tracks. In practical scenarios with a predefined ensemble of tracks and diverse composition needs, an efficient and effective generative model that can generate any target tracks conditioned on the remaining tracks becomes crucial. However, previous efforts have fallen short of this goal due to limitations in their music representations and models. In this paper, we introduce a framework named GETMusic, where "GET" stands for "GEnerate music Tracks." The framework comprises a novel music representation, "GETScore," and a diffusion model, "GETDiff." GETScore represents musical notes as tokens organized in a 2D structure, with tracks stacked vertically and progressing horizontally over time. At each training step, every track of a music piece is randomly assigned as either target or source. Training involves two processes: in the forward process, target tracks are corrupted by masking their tokens, while source tracks remain as ground truth; in the denoising process, GETDiff learns to predict the masked target tokens conditioned on the source tracks. The proposed representation, coupled with the non-autoregressive generative model, enables GETMusic to generate music with arbitrary source-target track combinations. Our experiments demonstrate that the versatile GETMusic outperforms prior works designed for specific composition tasks.
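To make the described training setup concrete, below is a minimal Python sketch of a GETScore-like 2D token grid and the masking-based forward process. The token IDs, grid sizes, and single-step masking are simplifying assumptions for illustration, not the paper's actual vocabulary or discrete diffusion schedule.

```python
import numpy as np

# Hypothetical illustration of a GETScore-like 2D token grid and the
# masking-based corruption described in the abstract. Token IDs, grid
# sizes, and the single-step masking are simplifying assumptions.

PAD, MASK = 0, 1                 # assumed special token IDs
NUM_TRACKS, NUM_STEPS = 4, 16    # tracks stacked vertically, time runs horizontally

rng = np.random.default_rng(0)

# A toy "GETScore": each cell holds a note token (here: random IDs >= 2).
score = rng.integers(2, 100, size=(NUM_TRACKS, NUM_STEPS))

# Randomly split tracks into source (kept as ground truth) and target (to be generated).
is_target = rng.random(NUM_TRACKS) < 0.5
if not is_target.any():          # ensure at least one target track
    is_target[rng.integers(NUM_TRACKS)] = True

# Forward process (simplified to one step): corrupt target tracks by masking
# their tokens; source tracks are left untouched as conditioning context.
corrupted = score.copy()
corrupted[is_target, :] = MASK

# A denoiser such as GETDiff would then be trained to recover `score` from
# `corrupted`, predicting all masked target tokens non-autoregressively,
# conditioned on the unmasked source tracks.
print("target tracks:", np.flatnonzero(is_target))
print(corrupted)
```

At inference time, the same interface supports any source-target combination: whichever tracks the user provides are left unmasked as conditioning, and the remaining tracks are filled with mask tokens and generated.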
Authors: Ang Lv, Xu Tan, Peiling Lu, Wei Ye, Shikun Zhang, Jiang Bian, Rui Yan