Ambisonics Networks -- The Effect Of Radial Functions Regularization (2402.18968v1)
Abstract: Ambisonics, a popular spatial audio format, is the spherical harmonic (SH) representation of the plane-wave density function of a sound field. Many algorithms operate in the SH domain and take Ambisonics signals as their input. Encoding Ambisonics from a spherical microphone array involves dividing by the radial functions, which can amplify noise at low frequencies. Regularization mitigates this amplification, but it introduces errors into the Ambisonics encoding. This paper investigates how different regularization methods affect Deep Neural Network (DNN) training and performance. Ideally, such networks should be robust to the choice of regularization. This robustness is evaluated on an example algorithm, speaker localization based on the direct-path dominance (DPD) test, using simulated data of a single speaker in a room and experimental data from the LOCATA challenge. Results show that performance may be sensitive to the choice of regularization. An informed approach is also proposed and investigated, and its results highlight the importance of regularization information.
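The abstract's central technical point is that encoding divides by the radial functions b_n(kr), that this division blows up at low kr (low frequency) and high order, and that regularization trades this noise gain for encoding error. A minimal numerical sketch follows. It is an illustration under stated assumptions, not the paper's method, because the abstract does not say which regularizations were compared. The assumptions are: a rigid-sphere array with microphones on the sphere surface, the b_n expression with the spherical Hankel function of the second kind, ACN channel ordering, and two common regularizers (Tikhonov and a hard gain limit). The helper names `radial_function_rigid`, `tikhonov_inverse`, `hard_limit_inverse`, and `encode_ambisonics`, along with the parameter values, are hypothetical.

```python
import numpy as np
from scipy.special import spherical_jn, spherical_yn


def radial_function_rigid(n, kr):
    """Rigid-sphere radial function b_n(kr), microphones on the sphere surface.

    Assumption: b_n = 4*pi*i^n * (j_n - j_n' / h_n' * h_n), using the spherical
    Hankel function of the second kind, h_n = j_n - i*y_n.
    """
    kr = np.asarray(kr, dtype=float)
    jn = spherical_jn(n, kr)
    jn_d = spherical_jn(n, kr, derivative=True)
    hn = jn - 1j * spherical_yn(n, kr)
    hn_d = jn_d - 1j * spherical_yn(n, kr, derivative=True)
    return 4 * np.pi * (1j ** n) * (jn - jn_d / hn_d * hn)


def tikhonov_inverse(b, lam):
    """Tikhonov-regularized 1/b: conj(b) / (|b|^2 + lam^2); gain never exceeds 1/(2*lam)."""
    return np.conj(b) / (np.abs(b) ** 2 + lam ** 2)


def hard_limit_inverse(b, max_gain):
    """Exact 1/b with its magnitude clipped to max_gain (phase preserved)."""
    inv = 1.0 / b
    return inv * np.minimum(1.0, max_gain / np.abs(inv))


def encode_ambisonics(p_nm, kr, order, inverse_fn):
    """Estimate a_nm = p_nm / b_n(kr) using a (possibly regularized) inverse.

    p_nm: pressure SH coefficients, shape ((order + 1)**2, len(kr)), in ACN
    ordering, so order n occupies rows n**2 .. (n + 1)**2 - 1.
    """
    a_nm = np.empty(p_nm.shape, dtype=complex)
    for n in range(order + 1):
        rows = slice(n ** 2, (n + 1) ** 2)
        a_nm[rows] = p_nm[rows] * inverse_fn(radial_function_rigid(n, kr))
    return a_nm


if __name__ == "__main__":
    kr = np.linspace(0.05, 3.0, 60)  # small kr corresponds to low frequency
    for n in range(4):
        b = radial_function_rigid(n, kr)
        raw_db = 20 * np.log10(np.max(np.abs(1.0 / b)))
        tik_db = 20 * np.log10(np.max(np.abs(tikhonov_inverse(b, lam=1e-2))))
        print(f"n={n}: max |1/b_n| = {raw_db:6.1f} dB, Tikhonov = {tik_db:6.1f} dB")
```

Tikhonov caps the gain at 1/(2λ), but wherever |b_n| is comparable to λ it also scales and distorts the estimated a_nm. This is the encoding error the abstract refers to. As a result, a network trained on Ambisonics encoded with one λ (or one regularizer) may see differently distributed inputs under another.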