Zimtohrli: An Efficient Psychoacoustic Audio Similarity Metric

Published 30 Sep 2025 in eess.AS (arXiv:2509.26133v1)

Abstract: This paper introduces Zimtohrli, a novel, full-reference audio similarity metric designed for efficient and perceptually accurate quality assessment. In an era dominated by computationally intensive deep learning models and proprietary legacy standards, there is a pressing need for an interpretable, psychoacoustically-grounded metric that balances performance with practicality. Zimtohrli addresses this gap by combining a 128-bin gammatone filterbank front-end, which models the frequency resolution of the cochlea, with a unique non-linear resonator model that mimics the human eardrum's response to acoustic stimuli. Similarity is computed by comparing perceptually-mapped spectrograms using modified Dynamic Time Warping (DTW) and Neurogram Similarity Index Measure (NSIM) algorithms, which incorporate novel non-linearities to better align with human judgment. Zimtohrli achieves superior performance to the baseline open-source ViSQOL metric, and significantly narrows the performance gap with the latest commercial POLQA metric. It offers a compelling balance of perceptual relevance and computational efficiency, positioning it as a strong alternative for modern audio engineering applications, from codec development to the evaluation of generative audio systems.
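For readers unfamiliar with the alignment step named in the abstract, the following is a minimal sketch of classic Dynamic Time Warping over two spectrograms, each represented as a list of per-frame feature vectors. It is not Zimtohrli's modified DTW, whose non-linearities the abstract does not specify. The function names `frame_distance` and `dtw_align`, the Euclidean frame distance, and the toy one-bin "spectrograms" are all assumptions made for illustration.

```python
from math import inf, sqrt


def frame_distance(a, b):
    """Euclidean distance between two spectrogram frames (an assumed choice)."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dtw_align(ref, deg):
    """Classic DTW: return (total cost, warping path) between two frame lists.

    This is textbook DTW, not the modified variant used by the paper.
    """
    n, m = len(ref), len(deg)
    # cost[i][j] = minimal accumulated cost aligning ref[:i] with deg[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = frame_distance(ref[i - 1], deg[j - 1])
            cost[i][j] = d + min(cost[i - 1][j - 1], cost[i - 1][j], cost[i][j - 1])

    # Backtrack from the end to recover which frames were matched.
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        steps = [
            (cost[i - 1][j - 1], i - 1, j - 1),  # diagonal: advance both
            (cost[i - 1][j], i - 1, j),          # advance reference only
            (cost[i][j - 1], i, j - 1),          # advance degraded only
        ]
        _, i, j = min(steps)
    path.reverse()
    return cost[n][m], path


# Toy example: the degraded signal repeats its first frame.
reference = [[0.0], [1.0], [2.0]]
degraded = [[0.0], [0.0], [1.0], [2.0]]
total_cost, path = dtw_align(reference, degraded)
print(total_cost, path)
```

In a full-reference metric, the aligned frame pairs from such a path would then be passed to a frame-wise similarity measure (NSIM, in the abstract's description); that stage is omitted here.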
