
A dual task learning approach to fine-tune a multilingual semantic speech encoder for Spoken Language Understanding (2406.12141v1)

Published 17 Jun 2024 in cs.CL, cs.SD, and eess.AS

Abstract: Self-Supervised Learning (SSL) is widely used to efficiently represent speech for Spoken Language Understanding (SLU), gradually replacing conventional approaches. Meanwhile, textual SSL models have been proposed to encode language-agnostic semantics. The SAMU-XLSR framework employed this semantic information to enrich multilingual speech representations. A recent study investigated SAMU-XLSR's in-domain semantic enrichment by specializing it on downstream transcriptions, leading to state-of-the-art results on a challenging SLU task. Our interest lies in the loss of multilingual performance, and the lack of task-specific semantic training, that such specialization induces for closely related languages with no SLU involvement. We also consider SAMU-XLSR's loss of its initial cross-lingual abilities caused by a separate SLU fine-tuning step. This paper therefore proposes a dual-task learning approach that improves SAMU-XLSR's semantic enrichment while considering distant languages, evaluated through multilingual and language-portability experiments.
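
As a rough illustration of what a dual-task objective of this kind can look like, the sketch below combines a SAMU-XLSR-style semantic-alignment term (pulling a pooled speech embedding toward a sentence embedding from a frozen textual SSL encoder) with a CTC term for the SLU transcription branch. This is a minimal sketch under assumptions, not the paper's implementation: the class name, the `alpha` weighting, and the specific choice of cosine similarity plus CTC are illustrative.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DualTaskLoss(nn.Module):
    """Hypothetical dual-task objective (illustrative, not the paper's code):
    a semantic-alignment term that keeps pooled speech embeddings close to
    sentence embeddings from a textual SSL model, plus a CTC term for the
    SLU transcription task, blended by a single weight `alpha`."""

    def __init__(self, alpha: float = 0.5, blank_id: int = 0):
        super().__init__()
        self.alpha = alpha  # weight between alignment and SLU terms
        self.ctc = nn.CTCLoss(blank=blank_id, zero_infinity=True)

    def forward(self, speech_emb, text_emb, log_probs, targets,
                input_lengths, target_lengths):
        # Semantic alignment: maximize cosine similarity between the pooled
        # speech embedding (B, D) and the frozen text sentence embedding (B, D).
        align = 1.0 - F.cosine_similarity(speech_emb, text_emb, dim=-1).mean()
        # SLU branch: CTC over frame-level log-probabilities (T, B, V)
        # against semantically annotated target transcriptions.
        slu = self.ctc(log_probs, targets, input_lengths, target_lengths)
        return self.alpha * align + (1.0 - self.alpha) * slu
```

How the two terms are weighted, and whether the text encoder stays frozen during fine-tuning, are exactly the kind of design choices a dual-task setup like this has to settle experimentally.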

Authors (4)
  1. Sahar Ghannay
  2. Bassam Jabaian
  3. Yannick Estève
  4. Gaëlle Laperrière