
UniverSLU: Universal Spoken Language Understanding for Diverse Tasks with Natural Language Instructions (2310.02973v2)

Published 4 Oct 2023 in cs.CL, cs.SD, and eess.AS

Abstract: Recent studies leverage LLMs with multi-tasking capabilities, using natural language prompts to guide the model's behavior and surpassing the performance of task-specific models. Motivated by this, we ask: can we build a single model that jointly performs various spoken language understanding (SLU) tasks? We start by adapting a pre-trained automatic speech recognition model to additional tasks using single-token task specifiers. We enhance this approach through instruction tuning, i.e., fine-tuning by describing the task with a natural language instruction followed by the list of label options. At inference time, our approach can generalize to new descriptions of seen tasks, enhancing its user-friendliness. We demonstrate the efficacy of our single multi-task learning model "UniverSLU" on 12 speech classification and sequence generation task types spanning 17 datasets and 9 languages. On most tasks, UniverSLU achieves competitive performance and often even surpasses task-specific models. Additionally, we assess its zero-shot capabilities, finding that the model generalizes to new datasets and languages for seen task types.
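The abstract contrasts two ways of conditioning the model: a single-token task specifier versus a natural language instruction followed by the list of label options. The sketch below, a minimal illustration rather than the authors' code, shows how such decoder prompts might be constructed; all function names, prompt templates, and label values are hypothetical, since the paper's exact formats are not reproduced here.

```python
# Hypothetical sketch of the two prompting styles described in the abstract.
# Not the UniverSLU implementation; templates and labels are illustrative.

def task_specifier_prompt(task_token: str) -> str:
    """Style 1: condition the decoder on a single special task token."""
    return f"<{task_token}>"

def instruction_prompt(instruction: str, label_options: list[str]) -> str:
    """Style 2: instruction tuning -- a natural language task description
    followed by the closed set of label options the model may emit."""
    options = ", ".join(label_options)
    return f"{instruction} Options: {options}."

if __name__ == "__main__":
    # Single-token specifier, e.g. for an intent classification task.
    print(task_specifier_prompt("IC"))

    # Natural language instruction for the same task; per the abstract,
    # the model generalizes to new phrasings of seen task descriptions.
    print(instruction_prompt(
        "Classify the intent of the spoken utterance.",
        ["set_alarm", "play_music", "get_weather"],
    ))
```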

Authors (9)
  1. Siddhant Arora (50 papers)
  2. Hayato Futami (24 papers)
  3. Jee-weon Jung (69 papers)
  4. Yifan Peng (147 papers)
  5. Roshan Sharma (24 papers)
  6. Yosuke Kashiwagi (29 papers)
  7. Emiru Tsunoo (34 papers)
  8. Shinji Watanabe (416 papers)
  9. Karen Livescu (89 papers)
Citations (4)