A Survey of Music Generation in the Context of Interaction (2402.15294v1)
Abstract: In recent years, machine learning, and in particular generative adversarial neural networks (GANs) and attention-based neural networks (transformers), have been successfully used to compose and generate music, both melodies and polyphonic pieces. Current research focuses foremost on style replication (e.g., generating a Bach-style chorale) or style transfer (e.g., classical to jazz) based on large amounts of recorded or transcribed music, which in turn also allows for fairly straightforward "performance" evaluation. However, most of these models are not suitable for human-machine co-creation through live interaction, nor is it clear how such models and the resulting creations would be evaluated. This article presents a thorough review of music representation, feature analysis, heuristic algorithms, statistical and parametric modelling, and human and automatic evaluation measures, along with a discussion of which approaches and models seem most suitable for live interaction.