Spatial Upsampling of Head-Related Transfer Functions Using a Physics-Informed Neural Network (2307.14650v2)
Abstract: Head-related transfer functions (HRTFs) capture the information a person uses to localize sound sources in space, and are thus crucial for creating personalized virtual acoustic experiences. However, practical HRTF measurement systems may only sample a person's HRTFs sparsely, which necessitates HRTF upsampling. This paper proposes a physics-informed neural network (PINN) method for HRTF upsampling. The PINN exploits the Helmholtz equation, the governing equation of acoustic wave propagation, to regularize the upsampling process. This helps the network generate physically valid upsampled HRTFs that generalize beyond the measurements. Furthermore, the size (width and depth) of the PINN is set according to the Helmholtz equation and its solutions, the spherical harmonics (SHs). This gives the PINN an appropriate level of expressive power, so it does not suffer from over-fitting. Since the PINN is designed independently of any specific HRTF dataset, it offers better generalizability than purely data-driven methods. Numerical experiments confirm that the PINN method outperforms the SH method and the HRTF field method for HRTF upsampling in both interpolation and extrapolation scenarios.
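The regularization idea in the abstract, penalizing the Helmholtz residual so that the network's output stays a physically valid sound field, can be illustrated with a minimal self-contained check (not the paper's code; all names and values below are illustrative): a plane wave is an exact solution of the Helmholtz equation, so its residual is near zero up to finite-difference error.

```python
import cmath

# Illustrative sketch, not the paper's implementation. A PINN for sound
# fields penalizes the Helmholtz residual L[p] = lap(p) + k^2 * p, which is
# zero for any physically valid field. Here we evaluate that residual by
# central finite differences for a plane wave, an exact solution, and see
# that it is tiny compared with the field's own scale k^2 * |p|.

C = 343.0                      # speed of sound in air, m/s
K = 2 * cmath.pi * 1000 / C    # wavenumber at 1 kHz
D = (1.0, 0.0, 0.0)            # unit propagation direction (illustrative)

def pressure(x, y, z):
    """Complex plane-wave pressure exp(i k d.x), an exact Helmholtz solution."""
    return cmath.exp(1j * K * (D[0] * x + D[1] * y + D[2] * z))

def helmholtz_residual(x, y, z, h=1e-4):
    """Central-difference estimate of lap(p) + k^2 * p at a point."""
    lap = (
        pressure(x + h, y, z) + pressure(x - h, y, z)
        + pressure(x, y + h, z) + pressure(x, y - h, z)
        + pressure(x, y, z + h) + pressure(x, y, z - h)
        - 6 * pressure(x, y, z)
    ) / h**2
    return lap + K**2 * pressure(x, y, z)

res = helmholtz_residual(0.1, -0.05, 0.2)
print(abs(res))  # far below K**2 ≈ 335, i.e. the residual nearly vanishes
```

In the actual PINN, this residual would be evaluated with automatic differentiation at collocation points and added to the data-fitting loss as a regularization term, steering the upsampled HRTF toward fields that satisfy the wave physics.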
- S. Li and J. Peissig, “Measurement of head-related transfer functions: a review”, Appl. Sci., vol. 10, no. 14, pp. 5014, 2020.
- W. Zhang, P. N. Samarasinghe, H. Chen, and T. D. Abhayapala, “Surround by sound: a review of spatial audio recording and reproduction”, Appl. Sci., vol. 7, no. 6, pp. 532, May 2017.
- J. G. Richter and J. Fels, “On the influence of continuous subject rotation during high-resolution head-related transfer function measurements”, IEEE/ACM Trans. on Audio, Speech, and Lang. Process., vol. 27, no. 4, pp. 730-741, April 2019.
- G. Enzner, “3D-continuous-azimuth acquisition of head-related impulse responses using multi-channel adaptive filtering”, Proc. IEEE Workshop Appl. Signal Process. Audio Acoust., pp. 325-328, 2009.
- Z. Ben-Hur, D. L. Alon, R. Mehra, and B. Rafaely, “Efficient representation and sparse sampling of head-related transfer functions using phase-correction based on ear alignment”, IEEE/ACM Trans. on Audio, Speech, and Lang. Process., vol. 27, no. 12, pp. 2249-2262, Dec. 2019.
- J. M. Arend, C. Pörschmann, S. Weinzierl, and F. Brinkmann, “Magnitude-Corrected and Time-Aligned Interpolation of Head-Related Transfer Functions”, arXiv preprint arXiv:2303.09966.
- L. S. Zhou, C. C. Bao, M. S. Jia, and B. Bu, “Range extrapolation of head-related transfer function using improved higher order ambisonics”, APSIPA ASC 2014, pp. 1-4, 2014.
- H. Gamper, “Head-related transfer function interpolation in azimuth, elevation, and distance”, J. Acoust. Soc. Am., vol. 134, no. 6, pp. 533–547, 2013.
- M. Pollow, K. V. Nguyen, O. Warusfel, T. Carpentier, M. Muller-Trapet, M. Vorlander, and M. Noisternig, “Calculation of head-related transfer functions for arbitrary field points using spherical harmonics decomposition”, Acta Acustica united with Acustica, vol. 98, no. 1, pp. 72–82, 2012.
- S. Spors and J. Ahrens, “Interpolation and range extrapolation of head-related transfer functions using virtual local sound-field synthesis”, 130th Conv. AES, May 2011.
- R. Duraiswami, D. N. Zotkin, and N. A. Gumerov, “Interpolation and range extrapolation of head related transfer functions”, IEEE Int. Conf. Acoust., Speech Signal Process. (ICASSP), vol. 4, pp. iv–iv, 2004.
- M. J. Evans, J. A. Angus, and A. I. Tew, “Analyzing head-related transfer function measurements using surface spherical harmonics”, J. Acoust. Soc. Am., vol. 104, no. 4, pp. 2400–2411, 1998.
- M. Aussal, F. Alouges, and B. Katz, “HRTF interpolation and ITD personalization for binaural synthesis using spherical harmonics”, J. Audio Eng. Soc., 2012.
- B. Xie, “Recovery of individual head-related transfer functions from a small set of measurements”, J. Acoust. Soc. Am., vol. 132, no. 1, pp. 282–294, 2012.
- L. Wang, F. Yin, and Z. Chen, “Head-related transfer function interpolation through multivariate polynomial fitting of principal component weights”, Acoust. Sci. Tech., vol. 30, no. 6, pp. 395–403, 2009.
- M. Zhang, Z. Ge, T. Liu, X. Wu, and T. Qu, “Modeling of individual HRTFs based on spatial principal component analysis”, IEEE/ACM Trans. on Audio, Speech, and Lang. Process., vol. 28, pp. 785-797, 2020.
- K. Hartung, J. Braasch, and S. J. Sterbing, “Comparison of different methods for the interpolation of head-related transfer functions”, Proc. 16th Int. Audio Eng. Soc. Conf. Spatial Sound Reproduction, pp. 319–329, 1999.
- J. C. B. Torres and M. R. Petraglia, “HRTF interpolation in the wavelet transform domain”, IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA), pp. 293-296, 2009.
- T.-Y. Chen, T.-H. Kuo, and T.-S. Chi, “Autoencoding HRTFs for DNN based HRTF personalization using anthropometric features”, IEEE Int. Conf. Acoust. Speech Signal Process. (ICASSP 2019), pp. 271-275, May 2019.
- R. Miccini and S. Spagnol, “HRTF individualization using deep learning”, IEEE Conference on Virtual Reality and 3D User Interfaces Abstracts and Workshops (VRW), pp. 390-395, Mar. 2020.
- Y. Ito, T. Nakamura, S. Koyama, and H. Saruwatari, “Head-related transfer function interpolation from spatially sparse measurements using autoencoder with source position conditioning”, International Workshop on Acoustic Signal Enhancement (IWAENC), pp. 1-5, 2022.
- P. Siripornpitak, I. Engel, I. Squires, S. J. Cooper, and L. Picinali, “Spatial up-sampling of HRTF sets using generative adversarial networks: a pilot study”, Frontiers in Signal Processing, vol. 2, 2022.
- A. O. Hogg, M. Jenkins, H. Liu, I. Squires, S. J. Cooper, and L. Picinali, “HRTF upsampling with a generative adversarial network using a gnomonic equiangular projection”, arXiv preprint arXiv:2306.05812.
- J. W. Lee, S. Lee, and K. Lee, “Global HRTF interpolation via learned affine transformation of hyper-conditioned features”, IEEE Int. Conf. Acoust., Speech Signal Process. (ICASSP), pp. 1-5, Jun. 2023.
- B. Zhi, D. N. Zotkin, and R. Duraiswami, “Towards fast and convenient end-to-end HRTF personalization”, IEEE Int. Conf. on Acoust. Speech and Signal Proces. (ICASSP), pp. 441-445, 2022.
- Y. Zhang, Y. Wang, and Z. Duan, “HRTF field: unifying measured HRTF magnitude representation with neural fields”, IEEE Int. Conf. Acoust., Speech Signal Process. (ICASSP), pp. 1–5, Jun. 2023.
- W. Zhang, R. A. Kennedy, and T. D. Abhayapala, “Iterative extrapolation algorithm for data reconstruction over sphere”, IEEE Int. Conf. Acoust., Speech Signal Process. (ICASSP), pp. 3733–3736, Mar. 2008.
- U. Elahi, Z. Khalid, and R. A. Kennedy, “An improved iterative algorithm for band-limited signal extrapolation on the sphere”, IEEE Int. Conf. Acoust., Speech Signal Process. (ICASSP), pp. 4619-4623, Mar. 2018.
- J. Ahrens, M. R. P. Thomas, and I. Tashev, “HRTF magnitude modeling using a non-regularized least-squares fit of spherical harmonics coefficients on incomplete data”, APSIPA, Dec. 2012.
- C. Pörschmann, J. M. Arend, and F. Brinkmann, “Directional equalization of sparse head-related transfer function sets for spatial upsampling”, IEEE/ACM Trans. on Audio, Speech, and Lang. Process., vol. 27, no. 6, pp. 1060-1071, 2019.
- M. Raissi, P. Perdikaris, and G. Em Karniadakis, “Physics informed deep learning (part I): data-driven solutions of nonlinear partial differential equations”, arXiv preprint arXiv:1711.10561.
- M. Raissi, P. Perdikaris, and G. Em Karniadakis, “Physics informed deep learning (part II): data-driven discovery of nonlinear partial differential equations”, arXiv preprint arXiv:1711.10566.
- G. E. Karniadakis, I. G. Kevrekidis, L. Lu, P. Perdikaris, S. Wang, and L. Yang, “Physics-informed machine learning,” Nature Reviews Physics, vol. 3, no. 6, pp. 422–440, May 2021.
- S. Cuomo, V. D. Di Cola, F. Giampaolo, G. Rozza, M. Raissi, and F. Piccialli, “Scientific machine learning through physics-informed neural networks: where we are and what’s next”, arXiv preprint arXiv:2201.05624.
- C. Song, T. Alkhalifah, and U. B. Waheed, “Solving the frequency-domain acoustic VTI wave equation using physics-informed neural networks”, Geophys. J. Int., vol. 225, no. 2, pp. 846-859, 2021.
- P. Ren, C. Rao, H. Sun, and Y. Liu, “SeismicNet: physics-informed neural networks for seismic wave modeling in semi-infinite domain”, arXiv preprint arXiv:2210.14044.
- Y. Wang, K. Wang, and M. Abdel-Maksoud, “NoiseNet: a neural network to predict marine propellers’ underwater radiated noise”, Ocean Engineering, vol. 236, pp. 109542, 2021.
- B. Moseley, A. Markham, and T. Nissen-Meyer, “Solving the wave equation with physics-informed deep learning”, arXiv preprint arXiv:2006.11894.
- N. Borrel-Jensen, A. P. Engsig-Karup, and C. H. Jeong, “Physics-informed neural networks for one-dimensional sound-field predictions with parameterized sources and impedance boundaries”, JASA Express Lett., 2021.
- M. Rasht-Behesht, C. Huber, K. Shukla, and G. E. Karniadakis, “Physics-informed neural networks for wave propagation and full waveform inversions”, Journal of Geophysical Research: Solid Earth, 2022.
- K. Shigemi, S. Koyama, T. Nakamura, and H. Saruwatari, “Physics-informed convolutional neural network with bicubic spline interpolation for sound-field estimation”, International Workshop on Acoustic Signal Enhancement (IWAENC), pp. 1-5, 2022.
- R. Leiteritz, and D. Pflüger, “How to avoid trivial solutions in physics-informed neural networks”, arXiv preprint arXiv:2112.05620.
- S. Wang, Y. Teng, and P. Perdikaris, “Understanding and mitigating gradient pathologies in physics-informed neural networks”, SIAM Journal on Scientific Computing, vol. 43, no. 5, pp. A3055-A3081, 2021.
- F. M. Rohrhofer, S. Posch, C. Gößnitzer, and B. C. Geiger, “Understanding the difficulty of training physics-informed neural networks on dynamical systems”, arXiv preprint arXiv:2203.13648.
- W. Zhang, T. D. Abhayapala, R. A. Kennedy, and R. Duraiswami, “Insights into head-related transfer function: spatial dimensionality and continuous representation”, J. Acoust. Soc. Amer., vol. 127, pp. 2347–2357, 2010.
- D. B. Ward and T. D. Abhayapala, “Reproduction of a plane-wave sound-field using an array of loudspeakers”, IEEE Trans. Speech Audio Process., vol. 9, no. 6, pp. 697–707, 2001.
- Z. Lu, H. Pu, F. Wang, Z. Hu, and L. Wang, “The expressive power of neural networks: a view from the width”, Adv. Neural Inf. Process. Syst., pp. 6231–6239, 2017.
- F. Brinkmann, M. Dinakaran, R. Pelzer, P. Grosche, D. Voss, and S. Weinzierl, “A cross-evaluated database of measured and simulated HRTFs including 3D head meshes, anthropometric features, and headphone impulse responses”, AES, 2019.
- C. Lee, H. Hasegawa, and S. Gao, “Complex-valued neural networks: A comprehensive survey”, IEEE/CAA Journal of Automatica Sinica, vol. 9, no. 8, pp. 1406-1426, 2022.
- O. Sener and V. Koltun, “Multi-task learning as multi-objective optimization”, Adv. Neural Inf. Process. Syst., pp. 525–536, 2018.
- X. Glorot and Y. Bengio, “Understanding the difficulty of training deep feedforward neural networks”, AISTATS, pp. 249–256, 2010.