
Multi-view MidiVAE: Fusing Track- and Bar-view Representations for Long Multi-track Symbolic Music Generation (2401.07532v1)

Published 15 Jan 2024 in cs.SD, cs.AI, and eess.AS

Abstract: Variational Autoencoders (VAEs) are a crucial component of neural symbolic music generation, and several VAE-based works have yielded outstanding results and attracted considerable attention. Nevertheless, previous VAEs still suffer from overly long feature sequences, and their generated results lack contextual coherence, so the challenge of modeling long multi-track symbolic music remains unaddressed. To this end, we propose Multi-view MidiVAE, one of the first VAE methods to effectively model and generate long multi-track symbolic music. Multi-view MidiVAE uses the two-dimensional (2-D) representation OctupleMIDI to capture relationships among notes while reducing feature-sequence length. Moreover, we capture instrumental characteristics and harmony, as well as global and local information about the composition, by employing a hybrid variational encoding-decoding strategy that integrates both Track- and Bar-view MidiVAE features. Objective and subjective experiments on the CocoChorales dataset demonstrate that, compared to the baseline, Multi-view MidiVAE significantly improves the modeling of long multi-track symbolic music.
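The sequence-length reduction comes from the 2-D OctupleMIDI encoding, which packs each note into a single fixed-length tuple instead of several note-on/note-off events. The sketch below is purely illustrative: the field names and ordering are assumptions based on the general description of OctupleMIDI, not the paper's implementation.

```python
from dataclasses import dataclass, astuple

# Illustrative OctupleMIDI-style note token (hypothetical field names/order):
# each note becomes ONE 8-element token rather than multiple events,
# which shortens the feature sequence the VAE must model.
@dataclass
class OctupleToken:
    bar: int         # bar index within the piece
    position: int    # onset position within the bar (e.g., in 1/64-note steps)
    instrument: int  # MIDI program number / track id
    pitch: int       # MIDI pitch (0-127)
    duration: int    # note length in position steps
    velocity: int    # MIDI velocity (0-127)
    tempo: int       # quantized tempo bin
    time_sig: int    # quantized time-signature index

def encode_notes(notes):
    """Map per-note attribute dicts to fixed-length 8-tuples."""
    return [astuple(OctupleToken(**n)) for n in notes]

notes = [
    dict(bar=0, position=0, instrument=40, pitch=60, duration=16,
         velocity=80, tempo=8, time_sig=3),
    dict(bar=0, position=16, instrument=40, pitch=64, duration=16,
         velocity=80, tempo=8, time_sig=3),
]
tokens = encode_notes(notes)
print(len(tokens), len(tokens[0]))  # 2 notes -> 2 tokens of length 8
```

Because every note occupies exactly one token, a piece with N notes yields a sequence of length N, regardless of how many tracks or events it contains.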

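The Track- and Bar-view split can be pictured as two reshapings of the same piece: the track view groups all bars of one instrument (global, per-instrument context), while the bar view groups all instruments within one bar (local, harmonic context). The toy sketch below illustrates that idea only; the tensor shapes, linear projections, and mean-pooled fusion are assumptions for illustration, not the paper's architecture.

```python
import numpy as np

rng = np.random.default_rng(0)
T, B, S, P = 4, 8, 16, 128   # tracks, bars, steps per bar, pitch dims (assumed)
piece = rng.random((T, B, S, P))  # stand-in for an encoded multi-track piece

# Track view: one long sequence per track across all bars (instrumental line).
track_view = piece.reshape(T, B * S * P)
# Bar view: one sequence per bar across all tracks (vertical harmony).
bar_view = piece.transpose(1, 0, 2, 3).reshape(B, T * S * P)

d = 32  # toy latent size
W_track = rng.standard_normal((track_view.shape[1], d)) * 0.01
W_bar = rng.standard_normal((bar_view.shape[1], d)) * 0.01

z_track = track_view @ W_track   # (T, d): one latent per track
z_bar = bar_view @ W_bar         # (B, d): one latent per bar

# Fuse the two views into a single global latent by mean-pooling each
# view and concatenating (an assumed, simplistic fusion for illustration).
z = np.concatenate([z_track.mean(axis=0), z_bar.mean(axis=0)])
print(z.shape)  # (64,)
```

The point of the two views is complementary coverage: the track view sees long-range structure within an instrument, while the bar view sees inter-instrument harmony at each moment; a hybrid encoder-decoder can condition on both.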
