On the Use of Semantically-Aligned Speech Representations for Spoken Language Understanding (2210.05291v1)

Published 11 Oct 2022 in cs.CL, cs.SD, and eess.AS

Abstract: In this paper, we examine the use of semantically-aligned speech representations for end-to-end spoken language understanding (SLU). We employ the recently introduced SAMU-XLSR model, which is designed to generate a single utterance-level embedding that captures semantics and is aligned across languages. This model combines the acoustic frame-level speech representation learning model (XLS-R) with the Language Agnostic BERT Sentence Embedding (LaBSE) model. We show that using SAMU-XLSR instead of the initial XLS-R model significantly improves performance in end-to-end SLU. Finally, we present the benefits of using this model for language portability in SLU.

Authors (5)

Valentin Pelloin
Themos Stafylakis
Yannick Estève
Gaëlle Laperrière
Mickaël Rouvier

Citations (9)
