Emotion-driven Piano Music Generation via Two-stage Disentanglement and Functional Representation (2407.20955v1)
Abstract: Managing emotion remains a challenge in automatic music generation. Prior works attempt to learn all emotion dimensions at once, which leads to inadequate modeling. This paper explores the disentanglement of emotions in piano performance generation through a two-stage framework: the first stage models valence at the lead-sheet level, and the second stage models arousal by introducing performance-level attributes. To better capture the features that shape valence, an aspect less explored by previous approaches, we introduce a novel functional representation of symbolic music that encodes the emotional impact of major-minor tonality as well as the interactions among notes, chords, and key signatures. Objective and subjective experiments validate the effectiveness of our framework in modeling both valence and arousal. We further apply the framework to a novel emotion-control application, showing its broad potential for emotion-driven music generation.
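To make the idea of a functional representation concrete, the sketch below tokenizes notes as scale degrees relative to the key and chords as Roman-numeral-style symbols, so that major-minor tonality and the note-chord-key interactions are explicit in the token vocabulary. This is a minimal illustration under assumed names and spellings (`degree_token`, `chord_token`, natural-minor degrees), not the paper's actual tokenization.

```python
# A minimal sketch of one way to realize a "functional" representation:
# notes become scale degrees relative to the key, and chords become
# Roman-numeral-style symbols, so major/minor tonality is explicit in the
# tokens. All names below are illustrative assumptions, not the paper's
# exact vocabulary.

MAJOR_SCALE = [0, 2, 4, 5, 7, 9, 11]   # semitone offsets of degrees 1-7
MINOR_SCALE = [0, 2, 3, 5, 7, 8, 10]   # natural minor
PITCH_CLASS = {"C": 0, "C#": 1, "D": 2, "D#": 3, "E": 4, "F": 5,
               "F#": 6, "G": 7, "G#": 8, "A": 9, "A#": 10, "B": 11}
ROMAN = ["I", "bII", "II", "bIII", "III", "IV", "#IV",
         "V", "bVI", "VI", "bVII", "VII"]


def degree_token(midi_pitch: int, key: str, mode: str) -> str:
    """Encode an absolute MIDI pitch as a scale degree relative to the key."""
    scale = MAJOR_SCALE if mode == "major" else MINOR_SCALE
    rel = (midi_pitch - PITCH_CLASS[key]) % 12       # pitch class above the tonic
    if rel in scale:
        return f"Degree_{scale.index(rel) + 1}"
    # Chromatic tone: spell it as a raised version of the nearest lower degree.
    lower = max(d for d in scale if d < rel)
    return f"Degree_#{scale.index(lower) + 1}"


def chord_token(root: str, quality: str, key: str) -> str:
    """Encode a chord as a Roman-numeral token relative to the key."""
    rel = (PITCH_CLASS[root] - PITCH_CLASS[key]) % 12
    numeral = ROMAN[rel]
    if quality in ("min", "dim"):                     # lowercase for minor-type chords
        numeral = numeral.lower()
    return f"{numeral}:{quality}"


if __name__ == "__main__":
    # The same pitch E (MIDI 64) reads as degree 3 in C major but degree 5
    # in A minor, and a G major chord is the V of C major.
    print(degree_token(64, "C", "major"))  # Degree_3
    print(degree_token(64, "A", "minor"))  # Degree_5
    print(chord_token("G", "maj", "C"))    # V:maj
    print(chord_token("E", "min", "A"))    # v:min
```

One plausible benefit of such a key-relative vocabulary is that the same melodic or harmonic function maps to the same tokens across keys, so mode (major vs. minor) becomes a compact, explicit signal that a model can associate with valence.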
Authors: Jingyue Huang, Ke Chen, Yi-Hsuan Yang