Are Soft Prompts Good Zero-shot Learners for Speech Recognition? (2309.09413v1)

Published 18 Sep 2023 in cs.SD and eess.AS

Abstract: Large self-supervised pre-trained speech models require computationally expensive fine-tuning for downstream tasks. Soft prompt tuning offers a simple, parameter-efficient alternative: minimal soft prompt guidance improves portability while maintaining competitive performance. However, how and why soft prompts work remains poorly understood. In this study, we aim to deepen the understanding of this emerging method by investigating the role of soft prompts in automatic speech recognition (ASR). Our findings highlight their role as zero-shot learners in improving ASR performance, but also reveal their vulnerability to malicious modifications. Soft prompts aid generalization but are not obligatory for inference. We identify two primary roles of soft prompts: content refinement and noise information enhancement, the latter of which improves robustness against background noise. Additionally, we propose an effective modification to the noise prompts to show that they are capable of zero-shot adaptation to out-of-distribution noise environments.
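
For readers unfamiliar with the technique, soft prompt tuning prepends a small set of trainable embedding vectors to the input of a frozen pre-trained model, so only the prompts are updated during training. The sketch below illustrates this general idea in PyTorch; the class name, dimensions, and the assumption that the encoder consumes a (batch, time, hidden) feature tensor are illustrative and not the paper's actual implementation.

import torch
import torch.nn as nn

class SoftPromptASR(nn.Module):
    """Minimal sketch: learnable soft prompts on a frozen speech encoder."""

    def __init__(self, encoder: nn.Module, num_prompts: int = 16, hidden_dim: int = 768):
        super().__init__()
        self.encoder = encoder
        # Freeze the pre-trained model; only the prompt vectors are trained.
        for p in self.encoder.parameters():
            p.requires_grad = False
        # Learnable soft prompts, prepended to the input feature sequence.
        self.prompts = nn.Parameter(torch.randn(num_prompts, hidden_dim) * 0.02)

    def forward(self, features: torch.Tensor) -> torch.Tensor:
        # features: (batch, time, hidden_dim) frame-level speech features.
        batch = features.size(0)
        prompts = self.prompts.unsqueeze(0).expand(batch, -1, -1)
        # Concatenate prompts ahead of the features along the time axis.
        return self.encoder(torch.cat([prompts, features], dim=1))

At fine-tuning time, only self.prompts receives gradient updates, which is what makes the method parameter-efficient and portable across downstream tasks.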

Authors (10)
  1. Dianwen Ng (21 papers)
  2. Chong Zhang (137 papers)
  3. Ruixi Zhang (7 papers)
  4. Yukun Ma (33 papers)
  5. Fabian Ritter-Gutierrez (7 papers)
  6. Trung Hieu Nguyen (12 papers)
  7. Chongjia Ni (18 papers)
  8. Shengkui Zhao (21 papers)
  9. Eng Siong Chng (112 papers)
  10. Bin Ma (78 papers)
Citations (1)
