
Towards a World-English Language Model for On-Device Virtual Assistants (2403.18783v1)

Published 27 Mar 2024 in cs.CL

Abstract: Neural Network Language Models (NNLMs) for Virtual Assistants (VAs) are generally language-, region-, and in some cases, device-dependent, which increases the effort to scale and maintain them. Combining NNLMs across one or more of these categories is one way to improve scalability. In this work, we combine regional variants of English to build a "World English" NNLM for on-device VAs. In particular, we investigate the application of adapter bottlenecks to model dialect-specific characteristics in our existing production NNLMs and enhance the multi-dialect baselines. We find that adapter modules are more effective in modeling dialects than specializing entire sub-networks. Based on this insight and leveraging the design of our production models, we introduce a new architecture for the World English NNLM that meets the accuracy, latency, and memory constraints of our single-dialect models.
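
The core technique named in the abstract, adapter bottlenecks that capture dialect-specific characteristics on top of a shared multi-dialect backbone, can be sketched in a few lines. The PyTorch module below is a generic residual adapter, not the paper's production architecture; the hidden size, bottleneck size, and dialect labels are illustrative assumptions.

```python
import torch
import torch.nn as nn

class AdapterBottleneck(nn.Module):
    """Generic residual adapter: down-project, non-linearity, up-project.
    Sizes are illustrative, not the paper's production configuration."""

    def __init__(self, hidden_dim: int = 256, bottleneck_dim: int = 32):
        super().__init__()
        self.norm = nn.LayerNorm(hidden_dim)
        self.down = nn.Linear(hidden_dim, bottleneck_dim)
        self.act = nn.ReLU()
        self.up = nn.Linear(bottleneck_dim, hidden_dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Residual connection: the shared backbone's representation passes
        # through unchanged; the small adapter adds a dialect-specific shift.
        return x + self.up(self.act(self.down(self.norm(x))))

# Hypothetical setup: one adapter per English dialect, inserted into a shared
# backbone whose parameters stay frozen while only the adapters are trained.
dialect_adapters = nn.ModuleDict({
    dialect: AdapterBottleneck() for dialect in ["en_US", "en_GB", "en_IN"]
})

hidden = torch.randn(4, 10, 256)           # (batch, time, hidden) from a shared layer
out = dialect_adapters["en_GB"](hidden)    # route through the British-English adapter
```

Because each adapter is tiny relative to the shared backbone, this style of design keeps the per-dialect parameter and memory overhead low, which is consistent with the on-device accuracy, latency, and memory constraints the abstract emphasizes.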

Authors (6)
  1. Rricha Jalota (7 papers)
  2. Lyan Verwimp (11 papers)
  3. Amr Mousa (2 papers)
  4. Arturo Argueta (5 papers)
  5. Youssef Oualil (11 papers)
  6. Markus Nussbaum-Thom (1 paper)
