NeRAF: 3D Scene Infused Neural Radiance and Acoustic Fields (2405.18213v2)
Abstract: Sound plays a major role in human perception. Along with vision, it provides essential information for understanding our surroundings. Despite advances in neural implicit representations, learning acoustics that align with visual scenes remains a challenge. We propose NeRAF, a method that jointly learns acoustic and radiance fields. NeRAF synthesizes both novel views and spatialized room impulse responses (RIRs) at new positions by conditioning the acoustic field on 3D scene geometric and appearance priors from the radiance field. The generated RIRs can be applied to auralize any audio signal. Each modality can be rendered independently and at spatially distinct positions, offering greater versatility. We demonstrate that NeRAF generates high-quality audio on the SoundSpaces and RAF datasets, achieving significant performance improvements over prior methods while being more data-efficient. Additionally, NeRAF enhances novel view synthesis of complex scenes trained with sparse data through cross-modal learning. NeRAF is designed as a Nerfstudio module, providing convenient access to realistic audio-visual generation.
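To make the auralization step concrete: applying an RIR to an audio signal amounts to convolving the dry (anechoic) signal with each channel of the impulse response. The sketch below is a minimal illustration of that operation only, not NeRAF's code. It assumes the RIR is already available as a time-domain NumPy array of shape (channels, taps); the helper name `auralize` and the toy values are hypothetical.

```python
import numpy as np


def auralize(dry: np.ndarray, rir: np.ndarray) -> np.ndarray:
    """Convolve a mono dry signal with a (channels, taps) RIR.

    Hypothetical helper for illustration; returns an array of shape
    (channels, len(dry) + taps - 1).
    """
    rir = np.atleast_2d(rir)  # accept a single-channel 1D RIR as well
    # Full linear convolution, one output channel per RIR channel.
    return np.stack([np.convolve(dry, h, mode="full") for h in rir])


# Toy input: a unit impulse as the dry signal and a made-up 2-channel RIR.
dry = np.zeros(8)
dry[0] = 1.0
rir = np.array([[1.0, 0.0, 0.5],
                [0.8, 0.2, 0.0]])

wet = auralize(dry, rir)
print(wet.shape)
```

Because the dry signal here is a unit impulse, each output channel begins with the corresponding RIR channel, which is a quick sanity check for the convolution. For long signals, an FFT-based convolution would usually be preferred for speed.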