Contextual Biasing of Named-Entities with Large Language Models (2309.00723v2)

Published 1 Sep 2023 in cs.CL, cs.AI, cs.LG, cs.SD, and eess.AS

Abstract: This paper studies contextual biasing with LLMs, where additional contextual information is provided to an LLM during second-pass rescoring to boost Automatic Speech Recognition (ASR) performance. We propose to leverage prompts for an LLM, without fine-tuning it, during rescoring; the prompts incorporate a biasing list and few-shot examples that serve as additional information when calculating the score for a hypothesis. In addition to few-shot prompt learning, we propose multi-task training of the LLM to predict both the entity class and the next token. To improve the efficiency of contextual biasing and to avoid exceeding the LLM's maximum sequence length, we propose dynamic prompting: we select the most likely class using the class-tag prediction, and use only the entities in that class as context for next-token prediction. Word Error Rate (WER) evaluation is performed on i) an internal calling, messaging, and dictation dataset, and ii) the SLUE-Voxpopuli dataset. Results indicate that biasing lists and few-shot examples achieve 17.8% and 9.6% relative WER improvement over the first-pass ASR, and that multi-task training and dynamic prompting achieve 20.0% and 11.3% relative WER improvement, respectively.
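
The rescoring and dynamic-prompting steps described above lend themselves to a short sketch. Below is a minimal illustration, assuming a HuggingFace-style causal LM; the model name, prompt format, interpolation weight lam, and the classify stand-in are all hypothetical choices for illustration, not the paper's exact setup.

  import torch
  from transformers import AutoModelForCausalLM, AutoTokenizer

  # Hypothetical model choice; the paper rescores with an LLM.
  MODEL = "meta-llama/Llama-2-7b-hf"
  tokenizer = AutoTokenizer.from_pretrained(MODEL)
  model = AutoModelForCausalLM.from_pretrained(MODEL)
  model.eval()

  def build_prompt(biasing_entities, few_shot_examples):
      # Prompt = few-shot examples plus the contextual biasing list.
      shots = "\n".join(few_shot_examples)
      entities = ", ".join(biasing_entities)
      return f"{shots}\nRelevant entities: {entities}\nTranscript: "

  def llm_log_score(prompt, hypothesis):
      # Sum of log-probabilities the LLM assigns to the hypothesis tokens,
      # conditioned on the prompt. Assumes tokenization splits cleanly at
      # the prompt/hypothesis boundary.
      prompt_len = tokenizer(prompt, return_tensors="pt").input_ids.shape[1]
      ids = tokenizer(prompt + hypothesis, return_tensors="pt").input_ids
      with torch.no_grad():
          log_probs = torch.log_softmax(model(ids).logits, dim=-1)
      score = 0.0
      for pos in range(prompt_len, ids.shape[1]):
          # Logits at position pos-1 predict the token at position pos.
          score += log_probs[0, pos - 1, ids[0, pos]].item()
      return score

  def dynamic_biasing_list(utterance_context, entities_by_class, classify):
      # Dynamic prompting: predict the most likely entity class first and
      # keep only that class's entities, so the prompt stays short.
      return entities_by_class.get(classify(utterance_context), [])

  def rescore(nbest, first_pass_scores, biasing_entities, few_shot_examples,
              lam=0.5):
      # Interpolate first-pass ASR scores with the contextual LLM score;
      # the weight lam is illustrative.
      prompt = build_prompt(biasing_entities, few_shot_examples)
      scored = [(hyp, (1 - lam) * s + lam * llm_log_score(prompt, hyp))
                for hyp, s in zip(nbest, first_pass_scores)]
      return max(scored, key=lambda x: x[1])[0]

In the paper's dynamic-prompting variant, the class tag comes from the multi-task-trained LLM itself rather than an external classifier; the classify argument above stands in for that prediction head.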

Authors (7)
  1. Chuanneng Sun
  2. Zeeshan Ahmed
  3. Yingyi Ma
  4. Zhe Liu
  5. Lucas Kabela
  6. Yutong Pang
  7. Ozlem Kalinli
Citations (8)