ScripTONES: Sentiment-Conditioned Music Generation for Movie Scripts
Abstract: Film scores are an essential part of the cinematic experience, but composing them is often expensive and infeasible for small-scale creators. Automating film score composition would provide useful starting points for music in small projects. In this paper, we propose a two-stage pipeline for generating music from a movie script. The first stage, Sentiment Analysis, encodes the sentiment of a scene from the film script into the continuous valence-arousal space. The second stage, Conditional Music Generation, takes the valence-arousal vector as input and conditionally generates piano MIDI music to match the sentiment. We study the efficacy of various music generation architectures through a qualitative user survey and propose methods to improve sentiment conditioning in VAE architectures.
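The two-stage pipeline can be sketched as follows. This is a minimal illustrative mock-up, not the paper's actual models: the tiny valence-arousal lexicon, the `scene_to_va` averaging scheme, and the rule-based `generate_notes` stub are all hypothetical stand-ins for the learned sentiment encoder and the conditional VAE/Transformer generators the paper evaluates.

```python
# Stage 1: map scene text to a (valence, arousal) pair in [-1, 1]^2.
# VA_LEXICON is a toy stand-in for a learned sentiment model.
VA_LEXICON = {
    "joy": (0.8, 0.6), "calm": (0.6, -0.5),
    "fear": (-0.7, 0.7), "grief": (-0.8, -0.4),
}

def scene_to_va(text):
    """Average the VA values of known emotion words; neutral fallback."""
    hits = [VA_LEXICON[w] for w in text.lower().split() if w in VA_LEXICON]
    if not hits:
        return (0.0, 0.0)
    valence = sum(h[0] for h in hits) / len(hits)
    arousal = sum(h[1] for h in hits) / len(hits)
    return (valence, arousal)

# Stage 2: conditionally generate a toy note sequence. Higher valence
# selects a major-like pitch set; higher arousal shortens note durations.
def generate_notes(va, n_notes=8):
    valence, arousal = va
    major = [60, 62, 64, 65, 67, 69, 71]   # C major pitches
    minor = [60, 62, 63, 65, 67, 68, 70]   # C minor pitches
    scale = major if valence >= 0 else minor
    duration = 240 if arousal >= 0 else 480  # MIDI ticks per note
    return [(scale[i % len(scale)], duration) for i in range(n_notes)]

va = scene_to_va("A scene of quiet grief and fear")   # → negative valence
notes = generate_notes(va)                            # minor, short notes
```

In the actual system, Stage 1 would be a fine-tuned language model regressing valence-arousal from scene text, and Stage 2 a generative model whose latent space or decoder is conditioned on that vector.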