NIIRF: Neural IIR Filter Field for HRTF Upsampling and Personalization (2402.17907v1)
Abstract: Head-related transfer functions (HRTFs) are important for immersive audio, and their spatial interpolation has been studied as a way to upsample a finite set of measurements. Recently, neural fields (NFs), which map from sound source direction to HRTF, have gained attention. Existing NF-based methods focus on estimating the magnitude of the HRTF for a given sound source direction, and the magnitude is then converted to a finite impulse response (FIR) filter. We propose the neural infinite impulse response filter field (NIIRF) method, which instead estimates the coefficients of cascaded IIR filters. IIR filters mimic the modal nature of HRTFs and therefore need fewer coefficients than FIR filters to approximate them well. We find that our method can match the performance of existing NF-based methods on multiple datasets, and even outperforms them when measurements are sparse. We also explore approaches to personalize the NF to a subject and experimentally find low-rank adaptation to be effective.
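To make the idea of "cascaded IIR filters" concrete, the following is a minimal sketch of how the magnitude response of a cascade of second-order sections (biquads) can be evaluated. It is an illustration only, not the authors' implementation: the peaking-filter parametrization (the widely used audio EQ cookbook form), the function names `peak_biquad` and `cascade_response`, and all numeric values are assumptions chosen for the example. In an NIIRF-style setup, a neural field would presumably output such filter parameters for each source direction.

```python
import numpy as np

def peak_biquad(fc, gain_db, q, fs):
    """Peaking-EQ biquad (audio EQ cookbook form), normalized so a[0] == 1.

    Hypothetical helper: the parametrization is an assumption for illustration.
    """
    A = 10.0 ** (gain_db / 40.0)
    w0 = 2.0 * np.pi * fc / fs
    alpha = np.sin(w0) / (2.0 * q)
    b = np.array([1.0 + alpha * A, -2.0 * np.cos(w0), 1.0 - alpha * A])
    a = np.array([1.0 + alpha / A, -2.0 * np.cos(w0), 1.0 - alpha / A])
    return b / a[0], a / a[0]

def cascade_response(sections, n_freqs, fs):
    """Complex frequency response of cascaded biquads on a grid from 0 to fs/2."""
    freqs = np.linspace(0.0, fs / 2.0, n_freqs)
    z_inv = np.exp(-1j * 2.0 * np.pi * freqs / fs)  # z^{-1} on the unit circle
    h = np.ones(n_freqs, dtype=complex)
    for b, a in sections:
        num = b[0] + b[1] * z_inv + b[2] * z_inv ** 2
        den = a[0] + a[1] * z_inv + a[2] * z_inv ** 2
        h *= num / den  # cascading sections multiplies their responses
    return freqs, h

fs = 48000.0
# Two sections with made-up parameters: a +6 dB peak and a -10 dB notch.
sections = [
    peak_biquad(4000.0, 6.0, 4.0, fs),
    peak_biquad(8000.0, -10.0, 6.0, fs),
]
freqs, h = cascade_response(sections, 481, fs)
mag_db = 20.0 * np.log10(np.abs(h))
```

After normalization each section carries five free coefficients, so this two-section cascade describes a peak and a notch with ten numbers in total, which illustrates why resonant, mode-like spectral shapes can be represented compactly with IIR sections.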