LOAF-M2L: Joint Learning of Wording and Formatting for Singable Melody-to-Lyric Generation (2307.02146v2)
Abstract: Despite previous efforts in melody-to-lyric generation research, there is still a significant compatibility gap between generated lyrics and melodies, which harms the singability of the outputs. This paper bridges the singability gap with a novel approach to generating singable lyrics by jointly Learning wOrding And Formatting during Melody-to-Lyric training. After general-domain pretraining, our proposed model first acquires length awareness from a large text-only lyric corpus. Then, during melody-to-lyric training, we introduce a new objective informed by musicological research on the relationship between melody and lyrics, which enables the model to learn the fine-grained format requirements of the melody. Compared to naive fine-tuning, our model achieves absolute accuracy gains of 3.75% and 21.44% on the number-of-lines and syllables-per-line requirements of the outputs, respectively, without sacrificing text fluency. Furthermore, in the subjective evaluation, our model demonstrates relative improvements of 63.92% and 74.18% in music-lyric compatibility and overall quality, respectively, compared to the state-of-the-art melody-to-lyric generation model, highlighting the significance of formatting learning.
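The abstract reports format accuracy as absolute gains and subjective quality as relative improvements, but does not define the metrics here. The following is a minimal sketch of one plausible reading, not the authors' evaluation code. It assumes each lyric is represented as a list of per-line syllable counts, and all function names (`line_count_accuracy`, `syllable_count_accuracy`, `absolute_gain`, `relative_gain`) are hypothetical.

```python
from typing import List

Song = List[int]  # assumed representation: one syllable count per lyric line


def line_count_accuracy(outputs: List[Song], targets: List[Song]) -> float:
    """Fraction of songs whose output has exactly the required number of lines."""
    hits = sum(len(out) == len(tgt) for out, tgt in zip(outputs, targets))
    return hits / len(targets)


def syllable_count_accuracy(outputs: List[Song], targets: List[Song]) -> float:
    """Fraction of required lines whose output line has the required syllable count.

    Lines are compared position by position; required lines with no
    generated counterpart count as misses.
    """
    total = sum(len(tgt) for tgt in targets)
    hits = sum(o == t for out, tgt in zip(outputs, targets) for o, t in zip(out, tgt))
    return hits / total


def absolute_gain(new: float, base: float) -> float:
    """Difference in percentage points (when inputs are fractions of 1)."""
    return new - base


def relative_gain(new: float, base: float) -> float:
    """Improvement expressed as a fraction of the baseline score."""
    return (new - base) / base


# Toy data, invented purely for illustration.
targets = [[8, 6, 8, 6], [5, 5]]
outputs = [[8, 6, 7, 6], [5, 5, 4]]

print(line_count_accuracy(outputs, targets))
print(syllable_count_accuracy(outputs, targets))
print(absolute_gain(0.75, 0.5), relative_gain(0.75, 0.5))
```

Under this reading, an absolute gain compares two accuracies directly, while a relative improvement divides the difference by the baseline score, which is why the two kinds of figures in the abstract are not directly comparable.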