SonifyAR: Context-Aware Sound Generation in Augmented Reality (2405.07089v3)

Published 11 May 2024 in cs.HC

Abstract: Sound plays a crucial role in enhancing user experience and immersion in Augmented Reality (AR). However, current platforms lack support for AR sound authoring due to limited interaction types, challenges in collecting and specifying context information, and difficulty in acquiring matching sound assets. We present SonifyAR, an LLM-based AR sound authoring system that generates context-aware sound effects for AR experiences. SonifyAR expands the current design space of AR sound and implements a Programming by Demonstration (PbD) pipeline to automatically collect contextual information about AR events, including virtual content semantics and real-world context. This context information is then processed by an LLM to acquire sound effects via Recommendation, Retrieval, Generation, and Transfer methods. To evaluate the usability and performance of our system, we conducted a user study with eight participants and created five example applications, including an AR-based science experiment, a case for improving AR headset safety, and an assistive example for low-vision AR users.
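
The pipeline described in the abstract can be pictured with a minimal, hypothetical Python sketch (not the authors' implementation): context collected by the PbD step is flattened into a prompt, handed to an LLM, and the reply is routed to one of the four sound-acquisition methods the paper names. All class, function, and field names here (AREvent, build_prompt, acquire_sound, llm_call) are illustrative assumptions, and the LLM is stubbed out so the example runs standalone.

# Hypothetical sketch of the context-to-sound flow described in the abstract.
# Not the authors' code; names and structure are assumptions for illustration.

from dataclasses import dataclass
from enum import Enum, auto


class AcquisitionMethod(Enum):
    RECOMMENDATION = auto()  # pick from a curated sound library
    RETRIEVAL = auto()       # search an existing asset database
    GENERATION = auto()      # synthesize with a text-to-audio model
    TRANSFER = auto()        # adapt an existing asset to the new context


@dataclass
class AREvent:
    """Context gathered by the PbD pipeline for one AR interaction."""
    interaction: str          # e.g. "virtual ball collides with real table"
    virtual_semantics: str    # e.g. "rubber ball"
    real_world_context: str   # e.g. "wooden surface, indoor room"


def build_prompt(event: AREvent) -> str:
    """Flatten the collected context into a single text prompt for the LLM."""
    return (
        "An AR event occurred:\n"
        f"- interaction: {event.interaction}\n"
        f"- virtual content: {event.virtual_semantics}\n"
        f"- real-world context: {event.real_world_context}\n"
        "Suggest a matching sound effect and choose one acquisition method: "
        "recommendation, retrieval, generation, or transfer."
    )


def acquire_sound(event: AREvent, llm_call) -> str:
    """Query the LLM (llm_call is any prompt -> text function) and dispatch."""
    reply = llm_call(build_prompt(event)).lower()
    if "generation" in reply:
        method = AcquisitionMethod.GENERATION
    elif "transfer" in reply:
        method = AcquisitionMethod.TRANSFER
    elif "retrieval" in reply:
        method = AcquisitionMethod.RETRIEVAL
    else:
        method = AcquisitionMethod.RECOMMENDATION
    return f"{method.name.lower()}: {reply}"


if __name__ == "__main__":
    # Stand-in LLM so the sketch runs without any API key or network call.
    fake_llm = lambda prompt: "retrieval: short wooden thud from an asset library"
    event = AREvent(
        interaction="virtual ball collides with real table",
        virtual_semantics="rubber ball",
        real_world_context="wooden surface, indoor room",
    )
    print(acquire_sound(event, fake_llm))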
