Ambisonics Super-Resolution Using A Waveform-Domain Neural Network (2508.00240v1)

Published 1 Aug 2025 in eess.AS and cs.SD

Abstract: Ambisonics is a spatial audio format describing a sound field. First-order Ambisonics (FOA) is a popular format comprising only four channels. This limited channel count comes at the expense of spatial accuracy. Ideally, one would retain the efficiency of the FOA format without its limitations. We have devised a data-driven spatial audio solution that retains the efficiency of the FOA format but achieves quality that surpasses conventional renderers. Utilizing a fully convolutional time-domain audio neural network (Conv-TasNet), we created a solution that takes an FOA input and provides a higher-order Ambisonics (HOA) output. This data-driven approach is novel when compared to typical physics- and psychoacoustics-based renderers. Quantitative evaluations showed a 0.6 dB average positional mean squared error difference between predicted and actual third-order HOA. The median qualitative rating showed an 80% improvement in perceived quality over the traditional rendering approach.
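
To make the channel counts in the abstract concrete, here is a minimal sketch of the standard relation between Ambisonics order N and channel count, (N + 1)^2, for full-sphere (3D) Ambisonics. The function name `ambisonic_channel_count` is hypothetical and does not come from the paper. The sketch only illustrates the input and output sizes that an FOA-to-HOA mapping works with, and it makes no claim about how the authors' network is structured.

```python
def ambisonic_channel_count(order: int) -> int:
    """Number of channels in full-sphere Ambisonics of the given order.

    Uses the standard relation (order + 1) ** 2. The function name is
    illustrative only and is not taken from the paper.
    """
    if order < 0:
        raise ValueError("Ambisonics order must be non-negative")
    return (order + 1) ** 2


foa_channels = ambisonic_channel_count(1)  # first-order input
hoa_channels = ambisonic_channel_count(3)  # third-order output

# Channels present in third-order HOA that FOA does not carry.
extra_channels = hoa_channels - foa_channels

print(foa_channels, hoa_channels, extra_channels)
```

Under this relation, the four FOA channels named in the abstract correspond to order 1, and the third-order HOA used in the evaluation has 16 channels, so 12 channels' worth of spatial detail is absent from the FOA signal.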

Authors (8)
