
Leveraging AI to Generate Audio for User-generated Content in Video Games (2404.17018v1)

Published 25 Apr 2024 in cs.HC and cs.AI

Abstract: In video game design, audio (both environmental background music and object sound effects) plays a critical role. Sounds are typically pre-created assets designed for specific locations or objects in a game. However, user-generated content is becoming increasingly popular in modern games (e.g., building custom environments or crafting unique objects). Since the possibilities are virtually limitless, it is impossible for game creators to pre-create audio for user-generated content. We explore the use of generative artificial intelligence to create music and sound effects on the fly based on user-generated content. We investigate two avenues for audio generation: 1) text-to-audio: using a text description of user-generated content as input to the audio generator, and 2) image-to-audio: using a rendering of the created environment or object as input to an image-to-text generator, then piping the resulting text description into the audio generator. In this paper, we discuss the ethical implications of using generative artificial intelligence for user-generated content and highlight two prototype games in which audio is generated for user-created environments and objects.
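
The two avenues described in the abstract share a single audio generator, with the image path adding one captioning step in front of it. The following is a minimal sketch of that structure, not the authors' implementation: `AudioPipeline`, `stub_captioner`, and `stub_audio_generator` are hypothetical names, and the stubs stand in for real image-to-text and text-to-audio models, which the abstract does not specify.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical interfaces: any image-to-text or text-to-audio model
# could be wrapped to match these signatures.
Captioner = Callable[[bytes], str]       # rendered image -> text description
AudioGenerator = Callable[[str], bytes]  # text description -> audio data


@dataclass
class AudioPipeline:
    generate_audio: AudioGenerator
    caption_image: Optional[Captioner] = None

    def text_to_audio(self, description: str) -> bytes:
        """Avenue 1: a text description goes straight to the audio generator."""
        prompt = description.strip()
        if not prompt:
            raise ValueError("description must not be empty")
        return self.generate_audio(prompt)

    def image_to_audio(self, image: bytes) -> bytes:
        """Avenue 2: caption the rendering, then reuse the text path."""
        if self.caption_image is None:
            raise RuntimeError("no image captioner configured")
        return self.text_to_audio(self.caption_image(image))


# Stand-in models (assumptions for illustration only, no real inference).
def stub_captioner(image: bytes) -> str:
    return f"an image of {len(image)} bytes"


def stub_audio_generator(prompt: str) -> bytes:
    return f"AUDIO<{prompt}>".encode("utf-8")


if __name__ == "__main__":
    pipeline = AudioPipeline(stub_audio_generator, stub_captioner)
    print(pipeline.text_to_audio("a creaky wooden door"))
    print(pipeline.image_to_audio(b"\x89PNG"))
```

Routing the image path through `text_to_audio` mirrors the "piping" described in the abstract: the audio generator only ever sees text, so either model can be swapped independently.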

