ActiveRIR: Active Audio-Visual Exploration for Acoustic Environment Modeling (2404.16216v1)

Published 24 Apr 2024 in cs.CV, cs.RO, cs.SD, and eess.AS

Abstract: An environment acoustic model represents how sound is transformed by the physical characteristics of an indoor environment, for any given source/receiver location. Traditional methods for constructing acoustic models involve expensive and time-consuming collection of large quantities of acoustic data at dense spatial locations in the space, or rely on privileged knowledge of scene geometry to intelligently select acoustic data sampling locations. We propose active acoustic sampling, a new task for efficiently building an environment acoustic model of an unmapped environment in which a mobile agent equipped with visual and acoustic sensors jointly constructs the environment acoustic model and the occupancy map on the fly. We introduce ActiveRIR, a reinforcement learning (RL) policy that leverages information from audio-visual sensor streams to guide agent navigation and determine optimal acoustic data sampling positions, yielding a high-quality acoustic model of the environment from a minimal set of acoustic samples. We train our policy with a novel RL reward based on information gain in the environment acoustic model. Evaluated on diverse unseen indoor environments from a state-of-the-art acoustic simulation platform, ActiveRIR outperforms an array of methods, from traditional navigation agents based on spatial novelty and visual exploration to existing state-of-the-art approaches.
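
The abstract describes an RL reward based on information gain in the environment acoustic model. As a rough illustration only, the sketch below scores a newly collected acoustic sample by how much it reduces the model's room impulse response (RIR) prediction error on a held-out set of source/receiver probes. All names here (`predict_rir`, `update`, `query_set`) are hypothetical placeholders, not the paper's actual interface or reward definition.

```python
import numpy as np

# Minimal sketch of an information-gain style reward for active acoustic
# sampling. The `model` object is assumed to expose two hypothetical methods:
#   model.predict_rir(source, receiver) -> predicted impulse response (1-D array)
#   model.update(sample)                -> condition the model on a new sample

def acoustic_model_error(model, query_set):
    """Mean squared RIR prediction error over held-out source/receiver probes."""
    errors = []
    for source, receiver, true_rir in query_set:
        pred_rir = model.predict_rir(source, receiver)
        errors.append(np.mean((pred_rir - true_rir) ** 2))
    return float(np.mean(errors))

def information_gain_reward(model, new_sample, query_set):
    """Reward = reduction in acoustic-model error after adding one new sample.

    new_sample: (source, receiver, measured_rir) gathered at the agent's
    current pose; query_set: probes used to score the model before and after.
    """
    error_before = acoustic_model_error(model, query_set)
    model.update(new_sample)            # refit / condition on the new measurement
    error_after = acoustic_model_error(model, query_set)
    return error_before - error_after   # positive when the sample was informative
```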
