The NES Video-Music Database: A Dataset of Symbolic Video Game Music Paired with Gameplay Videos (2404.04420v1)
Abstract: Neural models are one of the most popular approaches for music generation, yet there are no standard large datasets tailored for learning music directly from game data. To address this research gap, we introduce a novel dataset named NES-VMDB, containing 98,940 gameplay videos from 389 NES games, each paired with its original soundtrack in symbolic format (MIDI). NES-VMDB is built upon the Nintendo Entertainment System Music Database (NES-MDB), which encompasses 5,278 music pieces from 397 NES games. Our approach involves collecting long-play videos for 389 games of the original dataset, slicing them into 15-second clips, and extracting the audio from each clip. We then apply an audio fingerprinting algorithm (similar to Shazam) to automatically identify the corresponding piece in the NES-MDB dataset. Additionally, we introduce a baseline method based on the Controllable Music Transformer (CMT) to generate NES music conditioned on gameplay clips. We evaluated this approach with objective metrics, and the results showed that the conditional CMT improves musical structural quality compared to its unconditional counterpart. Moreover, we used a neural classifier to predict the game genre of the generated pieces. Results showed that the CMT generator can learn correlations between gameplay videos and game genres, but further research is needed to achieve human-level performance.
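The slicing step described in the abstract can be illustrated with a minimal sketch. It only computes the time boundaries of consecutive 15-second clips for a video of known duration. The function name `clip_intervals` is hypothetical, and dropping a trailing remainder shorter than 15 seconds is an assumption; the abstract does not say how the authors handle the end of a video.

```python
from typing import List, Tuple

CLIP_SECONDS = 15  # clip length stated in the abstract


def clip_intervals(duration_s: float, clip_s: float = CLIP_SECONDS) -> List[Tuple[float, float]]:
    """Return (start, end) times in seconds for consecutive fixed-length clips.

    Assumption (not from the paper): a trailing remainder shorter than
    clip_s is dropped rather than kept as a short clip.
    """
    if clip_s <= 0:
        raise ValueError("clip_s must be positive")
    n_full = int(duration_s // clip_s)  # number of complete clips that fit
    return [(i * clip_s, (i + 1) * clip_s) for i in range(n_full)]


if __name__ == "__main__":
    # A 50-second video yields three full clips; the last 5 seconds are dropped.
    print(clip_intervals(50))
```

Each resulting interval would then be cut from the video and its audio passed to the fingerprinting step; those stages depend on external tools and are not sketched here.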