SOAF: Scene Occlusion-aware Neural Acoustic Field (2407.02264v2)
Abstract: This paper tackles the problem of novel-view audio-visual synthesis along an arbitrary trajectory in an indoor scene, given audio-video recordings from other known trajectories in the same scene. Existing methods often overlook the effect of room geometry on sound propagation, particularly occlusion by walls, which makes them less accurate in multi-room environments. In this work, we propose a new approach called Scene Occlusion-aware Acoustic Field (SOAF) for accurate sound generation. Our approach derives a prior for the sound energy field using distance-aware parametric sound-propagation modelling and then transforms it based on scene transmittance learned from the input video. We extract features from the local acoustic field centred around the receiver using a Fibonacci Sphere, and generate binaural audio for novel views with a direction-aware attention mechanism. Extensive experiments on the real dataset RWAVS and the synthetic dataset SoundSpaces demonstrate that our method outperforms previous state-of-the-art techniques in audio generation. Project page: https://github.com/huiyu-gao/SOAF/.
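The Fibonacci Sphere sampling mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not specify the number of samples, the sphere radius, or the exact lattice variant, so the function name `fibonacci_sphere` and its parameters `num_points`, `radius`, and `center` are assumptions made for illustration only, not the authors' implementation. The sketch places nearly uniform points on a sphere centred at a receiver position using the standard golden-angle construction.

```python
import numpy as np


def fibonacci_sphere(num_points, radius=1.0, center=(0.0, 0.0, 0.0)):
    """Return (num_points, 3) nearly uniform points on a sphere.

    Hypothetical helper: the name, defaults, and lattice offset are
    illustrative assumptions, not taken from the paper.
    """
    if num_points < 1:
        raise ValueError("num_points must be at least 1")

    i = np.arange(num_points)
    # Golden angle in radians; successive points rotate by this amount.
    golden_angle = np.pi * (3.0 - np.sqrt(5.0))

    # Heights evenly spaced in (-1, 1), offset by half a step so that
    # no sample sits exactly on a pole.
    z = 1.0 - (2.0 * i + 1.0) / num_points
    ring_radius = np.sqrt(1.0 - z * z)
    theta = golden_angle * i

    unit = np.stack(
        [ring_radius * np.cos(theta), ring_radius * np.sin(theta), z],
        axis=1,
    )
    # Scale to the requested radius and shift to the receiver position.
    return np.asarray(center, dtype=float) + radius * unit


if __name__ == "__main__":
    # Example: 64 sample positions on a 0.5 m sphere around a receiver.
    points = fibonacci_sphere(64, radius=0.5, center=(1.0, 2.0, 3.0))
    print(points.shape)
```

In a pipeline like the one the abstract describes, each returned position would be a location at which the local acoustic field is queried for features; how those features are computed and attended over is not covered by this sketch.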