Neural Steerer: Novel Steering Vector Synthesis with a Causal Neural Field over Frequency and Source Positions (2305.04447v4)

Published 8 May 2023 in eess.AS and cs.SD

Abstract: We address the problem of accurately interpolating measured anechoic steering vectors with a deep learning framework called the neural field. This task plays a pivotal role in reducing the resource-intensive measurements required for precise sound source separation and localization, essential as the front-end of speech recognition. Classical approaches to interpolation rely on linear weighting of nearby measurements in space on a fixed, discrete set of frequencies. Drawing inspiration from the success of neural fields for novel view synthesis in computer vision, we introduce the neural steerer, a continuous complex-valued function that takes both frequency and direction as input and produces the corresponding steering vector. Importantly, it incorporates inter-channel phase difference information and a regularization term enforcing filter causality, essential for accurate steering vector modeling. Our experiments, conducted using a dataset of real measured steering vectors, demonstrate the effectiveness of our resolution-free model in interpolating such measurements.
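The classical baseline mentioned in the abstract, linear weighting of nearby measurements at a fixed frequency, can be sketched as follows. This is not the paper's neural field. It is a minimal illustration that assumes an ideal far-field (plane-wave) model in place of measured steering vectors, a hypothetical 4-microphone linear array, a speed of sound of 343 m/s, and one particular sign convention for the phase. The function names `steering_vector` and `linear_interpolate` are made up for this example.

```python
import cmath
import math

SPEED_OF_SOUND = 343.0  # m/s (assumed value, not from the paper)


def steering_vector(freq_hz, azimuth_rad, mic_xy):
    """Ideal plane-wave steering vector: one unit-modulus gain per microphone.

    Assumption: free-field, far-field source; real measured vectors also
    include scattering and microphone directivity, which this ignores.
    """
    ux, uy = math.cos(azimuth_rad), math.sin(azimuth_rad)
    gains = []
    for x, y in mic_xy:
        # Arrival delay at this microphone relative to the array origin.
        delay = -(x * ux + y * uy) / SPEED_OF_SOUND
        gains.append(cmath.exp(-2j * math.pi * freq_hz * delay))
    return gains


def linear_interpolate(az, az_lo, az_hi, sv_lo, sv_hi):
    """Linearly weight two steering vectors known at az_lo and az_hi."""
    w = (az - az_lo) / (az_hi - az_lo)
    return [(1.0 - w) * a + w * b for a, b in zip(sv_lo, sv_hi)]


# Hypothetical array: 4 microphones on the x-axis, 5 cm apart.
mics = [(0.00, 0.0), (0.05, 0.0), (0.10, 0.0), (0.15, 0.0)]
freq = 4000.0
az_lo, az_hi = math.radians(30.0), math.radians(60.0)
az_mid = math.radians(45.0)

sv_lo = steering_vector(freq, az_lo, mics)
sv_hi = steering_vector(freq, az_hi, mics)
sv_true = steering_vector(freq, az_mid, mics)
sv_interp = linear_interpolate(az_mid, az_lo, az_hi, sv_lo, sv_hi)

# Per-microphone error of the interpolated vector against the ideal one.
errors = [abs(a - b) for a, b in zip(sv_interp, sv_true)]
print(errors)
```

Under this idealized model every true steering-vector entry has unit modulus, while a linear blend of two such entries can only have modulus at most one. That shrinkage, together with the restriction to a fixed frequency grid, is one way to see why a continuous model over both frequency and direction is attractive.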

