
End-to-end spoken language understanding using transformer networks and self-supervised pre-trained features (2011.08238v1)

Published 16 Nov 2020 in cs.CL, cs.SD, and eess.AS

Abstract: Transformer networks and self-supervised pre-training have consistently delivered state-of-the-art results in the field of NLP; however, their merits in the field of spoken language understanding (SLU) still need further investigation. In this paper we introduce a modular End-to-End (E2E) SLU transformer network based architecture which allows the use of self-supervised pre-trained acoustic features, pre-trained model initialization and multi-task training. Several SLU experiments for predicting intent and entity labels/values using the ATIS dataset are performed. These experiments investigate the interaction of pre-trained model initialization and multi-task training with either traditional filterbank or self-supervised pre-trained acoustic features. Results show not only that self-supervised pre-trained acoustic features outperform filterbank features in almost all the experiments, but also that when these features are used in combination with multi-task training, they almost eliminate the necessity of pre-trained model initialization.
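The abstract describes multi-task training over two SLU objectives: utterance-level intent classification and token-level entity labeling. A common way to realize this (the paper's exact loss weighting is not given here, so the interpolation weight `alpha` below is an assumption for illustration) is to optimize a weighted sum of the two cross-entropy losses. A minimal NumPy sketch:

```python
import numpy as np

def cross_entropy(logits, label):
    """Numerically stable softmax cross-entropy for one prediction."""
    z = logits - logits.max()
    log_probs = z - np.log(np.exp(z).sum())
    return -log_probs[label]

def multitask_loss(intent_logits, intent_label,
                   entity_logits, entity_labels, alpha=0.5):
    """Weighted sum of the utterance-level intent loss and the
    mean token-level entity loss. `alpha` is a hypothetical
    interpolation weight, not a value from the paper."""
    intent_loss = cross_entropy(intent_logits, intent_label)
    entity_loss = np.mean([cross_entropy(l, y)
                           for l, y in zip(entity_logits, entity_labels)])
    return alpha * intent_loss + (1.0 - alpha) * entity_loss

# Toy example: 3 intent classes, 2 tokens with 4 entity tags each.
loss = multitask_loss(np.zeros(3), 0, np.zeros((2, 4)), [0, 1])
```

With uniform (all-zero) logits each cross-entropy term reduces to log of the number of classes, so the combined loss is 0.5·log 3 + 0.5·log 4.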

Authors (5)
  1. Edmilson Morais (7 papers)
  2. Hong-Kwang J. Kuo (11 papers)
  3. Samuel Thomas (42 papers)
  4. Brian Kingsbury (54 papers)
  5. Zoltan Tuske (14 papers)
Citations (11)