
Measuring Acoustics with Collaborative Multiple Agents (2310.05368v1)

Published 9 Oct 2023 in cs.AI, cs.MA, cs.SD, and eess.AS

Abstract: As humans, we hear sound every second of our lives. The sound we hear is often affected by the acoustics of the environment surrounding us. For example, a spacious hall leads to more reverberation. Room Impulse Responses (RIRs) are commonly used to characterize environment acoustics as a function of the scene geometry, materials, and source/receiver locations. Traditionally, RIRs are measured by setting up a loudspeaker and microphone in the environment for all source/receiver locations, which is time-consuming and inefficient. We propose to let two robots measure the environment's acoustics by actively moving and emitting/receiving sweep signals. We also devise a collaborative multi-agent policy where these two robots are trained to explore the environment's acoustics while being rewarded for wide exploration and accurate prediction. We show that the robots learn to collaborate and move to explore environment acoustics while minimizing the prediction error. To the best of our knowledge, we present the first problem formulation and solution to the task of collaborative environment acoustics measurement with multiple agents.
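The sweep-signal measurement the abstract refers to is commonly done with an exponential sine sweep followed by deconvolution: the emitter plays the sweep, the receiver records it, and dividing the spectra recovers the RIR between the two positions. The sketch below illustrates that idea under simplifying assumptions (noiseless recording, a single channel); the function names and the regularization constant are illustrative, not taken from the paper.

```python
import numpy as np

def exponential_sweep(f0, f1, duration, fs):
    """Exponential (log) sine sweep from f0 to f1 Hz, sampled at fs."""
    t = np.arange(int(duration * fs)) / fs
    R = np.log(f1 / f0)  # sweep rate
    return np.sin(2 * np.pi * f0 * duration / R * (np.exp(t * R / duration) - 1))

def estimate_rir(recorded, sweep):
    """Estimate an RIR by FFT deconvolution of the recorded sweep.

    Zero-pads both signals so circular convolution matches linear
    convolution, then divides spectra with a small regularizer to
    avoid division by near-zero bins outside the sweep's band.
    """
    n = len(recorded) + len(sweep) - 1
    H = np.fft.rfft(recorded, n) / (np.fft.rfft(sweep, n) + 1e-12)
    return np.fft.irfft(H, n)

# Illustrative check: simulate a toy "room" as a known impulse response
# (direct path plus one echo), record the sweep through it, and verify
# that deconvolution recovers the impulse response.
fs = 16000
sweep = exponential_sweep(100.0, 8000.0, 1.0, fs)
h_true = np.zeros(400)
h_true[0], h_true[160] = 1.0, 0.5      # direct sound + echo at 10 ms
recorded = np.convolve(sweep, h_true)  # what the receiving robot hears
h_est = estimate_rir(recorded, sweep)
```

In the paper's setting the two robots would repeat this emit/record/deconvolve cycle at many source/receiver positions chosen by the learned policy, rather than at a fixed loudspeaker/microphone pair.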
