SOAF: Scene Occlusion-aware Neural Acoustic Field (2407.02264v2)

Published 2 Jul 2024 in cs.CV, cs.SD, and eess.AS

Abstract: This paper tackles the problem of novel-view audio-visual synthesis along an arbitrary trajectory in an indoor scene, given audio-video recordings from other known trajectories in the same scene. Existing methods often overlook the effect of room geometry on sound propagation, particularly occlusion by walls, which makes them less accurate in multi-room environments. In this work, we propose a new approach called Scene Occlusion-aware Acoustic Field (SOAF) for accurate sound generation. Our approach derives a prior for the sound energy field using distance-aware parametric sound-propagation modelling and then transforms it based on scene transmittance learned from the input video. We extract features from the local acoustic field centred around the receiver using a Fibonacci Sphere to generate binaural audio for novel views with a direction-aware attention mechanism. Extensive experiments on the real-world dataset RWAVS and the synthetic dataset SoundSpaces demonstrate that our method outperforms previous state-of-the-art techniques in audio generation. Project page: https://github.com/huiyu-gao/SOAF/.
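
The abstract mentions sampling the local acoustic field around the receiver with a Fibonacci Sphere. As a rough illustration of that sampling idea only, the sketch below places near-uniform points on a sphere using the standard golden-angle construction. The function name `fibonacci_sphere`, its parameters, and the point count and radius in the usage lines are assumptions made for this example; the abstract does not state the values or the exact variant the authors use.

```python
import math


def fibonacci_sphere(n, radius=1.0, center=(0.0, 0.0, 0.0)):
    """Return n roughly evenly spaced points on a sphere (hypothetical helper).

    Uses the golden-angle spiral: heights are spaced evenly along z and
    each successive point is rotated by the golden angle around the z axis.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    golden_angle = math.pi * (3.0 - math.sqrt(5.0))
    cx, cy, cz = center
    points = []
    for i in range(n):
        # Evenly spaced heights strictly inside (-1, 1), avoiding the poles.
        z = 1.0 - (2.0 * i + 1.0) / n
        # Radius of the horizontal circle at height z on the unit sphere.
        r = math.sqrt(1.0 - z * z)
        theta = golden_angle * i
        x = r * math.cos(theta)
        y = r * math.sin(theta)
        points.append((cx + radius * x, cy + radius * y, cz + radius * z))
    return points


# Assumed values for illustration: 64 sample directions, 0.5 m radius,
# centred on a receiver position chosen arbitrarily.
receiver = (1.0, 2.0, 1.5)
samples = fibonacci_sphere(64, radius=0.5, center=receiver)
```

Each returned point could serve as a query location for a field evaluated around the receiver; how SOAF aggregates the resulting features is described only at a high level in the abstract (a direction-aware attention mechanism) and is not modelled here.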

