
EmoGen: Eliminating Subjective Bias in Emotional Music Generation (2307.01229v1)

Published 3 Jul 2023 in cs.SD, cs.AI, cs.LG, cs.MM, and eess.AS

Abstract: Music is used to convey emotions, so generating emotional music is important in automatic music generation. Previous work on emotional music generation directly uses annotated emotion labels as control signals, which suffers from subjective bias: different people may annotate different emotions for the same music, and one person may feel different emotions in different situations. Directly mapping emotion labels to music sequences in an end-to-end way therefore confuses the learning process and hinders the model from generating music with general emotions. In this paper, we propose EmoGen, an emotional music generation system that leverages a set of emotion-related music attributes as the bridge between emotion and music, dividing generation into two stages: emotion-to-attribute mapping with supervised clustering, and attribute-to-music generation with self-supervised learning. Both stages are beneficial: in the first stage, the attribute values around the clustering center represent the general emotion of the samples, which helps eliminate the subjective bias of emotion labels; in the second stage, generation is completely disentangled from emotion labels and thus free from that bias. Both subjective and objective evaluations show that EmoGen outperforms previous methods in emotion control accuracy and music quality, demonstrating its strength in generating emotional music. Music samples generated by EmoGen are available at https://ai-muzic.github.io/emogen/, and the code is available at https://github.com/microsoft/muzic/.
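
To make the first stage concrete, here is a minimal sketch of the emotion-to-attribute mapping idea. It assumes precomputed per-song attribute vectors (e.g., note density, pitch range, velocity) and coarse emotion labels such as the four quadrants of Russell's valence-arousal model, and it uses a simple per-class center as a stand-in for the paper's supervised clustering; the names (`emotion_to_attributes`, `X`, `y`) are illustrative and not taken from the authors' code.

```python
import numpy as np

def emotion_to_attributes(X, y, n_emotions=4):
    """For each emotion class, average the attribute vectors of the songs
    annotated with that emotion. The per-class center acts as the "general
    emotion" control signal handed to the attribute-to-music stage,
    smoothing out the subjective bias of individual annotations."""
    centers = {}
    for e in range(n_emotions):
        members = X[y == e]                # all songs annotated with emotion e
        centers[e] = members.mean(axis=0)  # clustering center for this emotion
    return centers

# Toy usage: 100 songs, 5 attributes, 4 emotion quadrants (synthetic data).
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = rng.integers(0, 4, size=100)
centers = emotion_to_attributes(X, y)
target_attributes = centers[1]  # attribute vector for the generation stage
```

Because the second stage conditions only on attribute vectors like `target_attributes`, the generative model never sees emotion labels during training, which is how the design keeps generation disentangled from annotator bias.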

Authors (7)
  1. Chenfei Kang (2 papers)
  2. Peiling Lu (8 papers)
  3. Botao Yu (13 papers)
  4. Xu Tan (164 papers)
  5. Wei Ye (110 papers)
  6. Shikun Zhang (82 papers)
  7. Jiang Bian (229 papers)