
Building Robust Spoken Language Understanding by Cross Attention between Phoneme Sequence and ASR Hypothesis (2203.12067v1)

Published 22 Mar 2022 in cs.CL, cs.SD, and eess.AS

Abstract: Building Spoken Language Understanding (SLU) that is robust to Automatic Speech Recognition (ASR) errors is an essential issue for various voice-enabled virtual assistants. Since most ASR errors are caused by phonetic confusion between similar-sounding expressions, leveraging the phoneme sequence of the speech can intuitively complement the ASR hypothesis and enhance the robustness of SLU. This paper proposes a novel model with Cross Attention for SLU (denoted as CASLU). The cross attention block is devised to capture the fine-grained interactions between phoneme and word embeddings, so that the joint representations capture the phonetic and semantic features of the input simultaneously and help overcome ASR errors in downstream natural language understanding (NLU) tasks. Extensive experiments are conducted on three datasets, showing the effectiveness and competitiveness of our approach. Additionally, we validate the universality of CASLU and demonstrate its complementarity when combined with other robust SLU techniques.
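To make the idea of cross attention between phoneme and word embeddings concrete, here is a minimal sketch of generic scaled dot-product cross attention in plain Python. It is not the CASLU architecture: the abstract does not specify projections, attention heads, or how the two streams are fused, so the function names, the toy embeddings, and the concatenation used to form a joint representation are all assumptions made for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys, values):
    """Scaled dot-product attention: each query attends over all keys.

    Returns (context vectors, attention weights). No learned projections
    are applied; a real model would normally include them (assumption).
    """
    d = len(keys[0])
    outputs, weights = [], []
    for q in queries:
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys
        ]
        w = softmax(scores)
        context = [
            sum(wj * v[c] for wj, v in zip(w, values))
            for c in range(len(values[0]))
        ]
        weights.append(w)
        outputs.append(context)
    return outputs, weights


# Toy inputs with made-up numbers: 2 word embeddings and 3 phoneme
# embeddings, all of dimension 2.
word_emb = [[1.0, 0.0], [0.0, 1.0]]
phoneme_emb = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

# Words attend over phonemes, and phonemes attend over words.
word_ctx, word_to_phoneme = cross_attention(word_emb, phoneme_emb, phoneme_emb)
phoneme_ctx, phoneme_to_word = cross_attention(phoneme_emb, word_emb, word_emb)

# Hypothetical fusion step: concatenate each word embedding with its
# phoneme-derived context to get a joint representation.
joint_word = [w + c for w, c in zip(word_emb, word_ctx)]
```

Each row of `word_to_phoneme` is a probability distribution over the phoneme positions, which is the sense in which the interaction is fine-grained: every word position gets its own weighting of every phoneme position, and vice versa.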

Authors (7)
  1. Zexun Wang (2 papers)
  2. Yuquan Le (3 papers)
  3. Yi Zhu (233 papers)
  4. Yuming Zhao (14 papers)
  5. Mingchao Feng (2 papers)
  6. Meng Chen (98 papers)
  7. Xiaodong He (162 papers)
Citations (5)