Neural Steerer: Novel Steering Vector Synthesis with a Causal Neural Field over Frequency and Source Positions (2305.04447v4)
Abstract: We address the problem of accurately interpolating measured anechoic steering vectors with a deep learning framework called the neural field. This task plays a pivotal role in reducing the resource-intensive measurements required for precise sound source separation and localization, essential as the front-end of speech recognition. Classical approaches to interpolation rely on linear weighting of nearby measurements in space on a fixed, discrete set of frequencies. Drawing inspiration from the success of neural fields for novel view synthesis in computer vision, we introduce the neural steerer, a continuous complex-valued function that takes both frequency and direction as input and produces the corresponding steering vector. Importantly, it incorporates inter-channel phase difference information and a regularization term enforcing filter causality, essential for accurate steering vector modeling. Our experiments, conducted using a dataset of real measured steering vectors, demonstrate the effectiveness of our resolution-free model in interpolating such measurements.
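The two ideas in the abstract, a steering vector defined continuously over frequency and direction and a penalty on non-causal filters, can be illustrated with a minimal sketch. This is not the paper's neural network. It uses the idealized free-field plane-wave model as a stand-in for the learned function, and the causality measure (the share of impulse-response energy at negative times) is an assumed illustration rather than the paper's regularization term. The names `free_field_steering_vector`, `acausal_energy_ratio`, and `SPEED_OF_SOUND` are hypothetical.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s; assumed value for air at room temperature


def free_field_steering_vector(freq_hz, azimuth_rad, mic_xy):
    """Idealized far-field steering vector for a planar microphone array.

    freq_hz and azimuth_rad may be any real values, so the function is
    continuous in both inputs (no fixed frequency or direction grid).
    mic_xy has shape (num_mics, 2), in metres, relative to the array origin.
    """
    direction = np.array([np.cos(azimuth_rad), np.sin(azimuth_rad)])
    # Microphones closer to the source receive the wavefront earlier,
    # hence a negative delay relative to the origin.
    delays = -(mic_xy @ direction) / SPEED_OF_SOUND
    return np.exp(-2j * np.pi * freq_hz * delays)


def acausal_energy_ratio(freq_response):
    """Fraction of impulse-response energy located at negative times.

    freq_response holds samples on a full FFT grid of length N. After the
    inverse FFT, indices above N // 2 wrap around to negative time, so a
    causal filter yields a ratio close to 0.
    """
    impulse_response = np.fft.ifft(freq_response)
    n = len(impulse_response)
    acausal_part = impulse_response[n // 2 + 1:]
    total_energy = np.sum(np.abs(impulse_response) ** 2)
    return float(np.sum(np.abs(acausal_part) ** 2) / total_energy)


# Usage: a two-microphone array queried at an arbitrary frequency and angle.
mics = np.array([[0.0, 0.0], [0.1, 0.0]])
vector = free_field_steering_vector(1234.5, np.deg2rad(37.0), mics)

# A 3-sample delay is causal; a 3-sample advance is not.
bins = np.arange(64)
delayed = np.exp(-2j * np.pi * bins * 3 / 64)
advanced = np.exp(+2j * np.pi * bins * 3 / 64)
causal_ratio = acausal_energy_ratio(delayed)
acausal_ratio = acausal_energy_ratio(advanced)
```

In this toy setting, a ratio like the one above could serve as a differentiable penalty during training. Measured steering vectors also include scattering and microphone responses, which the free-field model ignores.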