A Textless Metric for Speech-to-Speech Comparison (2210.11835v2)

Published 21 Oct 2022 in cs.CL, cs.SD, and eess.AS

Abstract: In this paper, we introduce a new and simple method for comparing speech utterances without relying on text transcripts. Our speech-to-speech comparison metric uses state-of-the-art speech2unit encoders such as HuBERT to convert speech utterances into discrete acoustic units. We then propose a simple and easily replicable neural architecture that learns a speech-based metric that closely corresponds to its text-based counterpart. This textless metric has numerous potential applications, including evaluating speech-to-speech translation for oral languages or for languages without dependable ASR systems, or avoiding the need for ASR transcription altogether. This paper also shows that, for speech-to-speech translation evaluation, ASR-BLEU (which consists of automatically transcribing both the speech hypothesis and the reference and computing sentence-level BLEU between the transcripts) is a poor proxy for real text-BLEU even when the ASR system is strong.
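
To make the ASR-BLEU baseline mentioned in the abstract concrete, the following is a minimal sketch of sentence-level BLEU computed between two transcripts. It is an illustration under stated assumptions, not the authors' implementation: the ASR step is not shown, the two transcript strings are hypothetical, tokenization is plain whitespace splitting, and add-one smoothing on every n-gram order is one of several possible smoothing choices. The function name `sentence_bleu` is introduced here for illustration only.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams of order n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu(hypothesis, reference, max_n=4):
    """Smoothed sentence-level BLEU between two transcripts (score in [0, 1]).

    Assumptions: whitespace tokenization, lowercasing, and add-one
    smoothing applied to every n-gram order (an illustrative choice).
    """
    hyp = hypothesis.lower().split()
    ref = reference.lower().split()
    if not hyp or not ref:
        return 0.0

    log_precision_sum = 0.0
    for n in range(1, max_n + 1):
        hyp_counts = ngrams(hyp, n)
        ref_counts = ngrams(ref, n)
        # Clipped matches: each hypothesis n-gram counts at most as often
        # as it appears in the reference.
        overlap = sum(min(c, ref_counts[g]) for g, c in hyp_counts.items())
        total = sum(hyp_counts.values())
        log_precision_sum += math.log((overlap + 1) / (total + 1))

    # Brevity penalty: penalize hypotheses shorter than the reference.
    if len(hyp) >= len(ref):
        brevity_penalty = 1.0
    else:
        brevity_penalty = math.exp(1 - len(ref) / len(hyp))

    return brevity_penalty * math.exp(log_precision_sum / max_n)


# Hypothetical ASR transcripts of a reference and a hypothesis utterance.
reference_transcript = "the cat sat on the mat"
hypothesis_transcript = "the cat sat on a mat"
print(sentence_bleu(hypothesis_transcript, reference_transcript))
```

In an ASR-BLEU setup, both strings would come from an ASR system, so recognition errors on either side feed directly into the n-gram counts. This is one plausible reason the score can drift from BLEU computed on true text.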

Authors (4)

Laurent Besacier (76 papers)
Swen Ribeiro (1 paper)
Olivier Galibert (4 papers)
Ioan Calapodescu (12 papers)

Citations (5)
