Semi-intrusive audio evaluation: Casting non-intrusive assessment as a multi-modal text prediction task (2409.14069v2)

Published 21 Sep 2024 in eess.AS and cs.SD

Abstract: Human perception has the unique ability to focus on specific events in a mixture of signals--a challenging task for existing non-intrusive assessment methods. In this work, we introduce semi-intrusive assessment that emulates human attention by framing audio assessment as a text-prediction task with audio-text inputs. To this end, we extend the multi-modal PENGI model through instruction fine-tuning for MOS and SNR estimation. For MOS, our approach achieves absolute Pearson correlation gains of 0.06 and 0.20 over the re-trained MOSRA model and the pre-trained PAM model, respectively. We further propose a novel SNR estimator that can focus on a specific audio source in a mixture, outperforming a random baseline and the fixed-prompt counterpart. Our findings suggest that semi-intrusive assessment can effectively capture human-like selective listening capabilities. Samples are available at https://jozefcoldenhoff.github.io/semi-intrusive-assessment.
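To make the two quantities in the abstract concrete, the following minimal sketch shows why the SNR of a mixture depends on which source is treated as the target, and how a Pearson correlation between predicted and reference scores can be computed. It is not the paper's PENGI-based estimator. The function names `snr_db` and `pearson` and the toy signals are hypothetical, chosen only for illustration.

```python
import math

def snr_db(target, interference):
    """SNR in dB of the chosen target relative to the rest of the mixture."""
    p_target = sum(x * x for x in target) / len(target)
    p_interf = sum(x * x for x in interference) / len(interference)
    return 10.0 * math.log10(p_target / p_interf)

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)

# Hypothetical toy sources that make up one mixture (not real audio).
speech = [1.0, -1.0, 1.0, -1.0]
music = [0.1, 0.1, -0.1, -0.1]

# The same mixture yields different SNRs depending on the attended source.
snr_speech_target = snr_db(speech, music)  # speech is the target
snr_music_target = snr_db(music, speech)   # music is the target
```

In this toy case the two SNR values have equal magnitude and opposite sign, which is the ambiguity a prompt naming the target source is meant to resolve.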
