A Chat About Boring Problems: Studying GPT-based text normalization (2309.13426v2)

Published 23 Sep 2023 in cs.CL and cs.AI

Abstract: Text normalization - the conversion of text from written to spoken form - is traditionally assumed to be an ill-formed task for language models. In this work, we argue otherwise. We empirically show the capacity of large language models (LLMs) for text normalization in few-shot scenarios. Combining self-consistency reasoning with linguistically-informed prompt engineering, we find LLM-based text normalization to achieve error rates around 40% lower than top normalization systems. Further, upon error analysis, we note key limitations in the conventional design of text normalization tasks. We create a new taxonomy of text normalization errors and apply it to results from GPT-3.5-Turbo and GPT-4.0. Through this new framework, we can identify strengths and weaknesses of GPT-based TN, opening opportunities for future work.
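
As a rough illustration of the approach the abstract describes (few-shot prompting combined with self-consistency), the sketch below samples several candidate normalizations and takes a majority vote. The `call_llm` stand-in, the few-shot examples, and the sampling parameters are illustrative assumptions, not the paper's actual prompt or settings.

```python
# Minimal sketch of few-shot text normalization with self-consistency
# (majority vote over sampled generations). `call_llm` is a hypothetical
# stand-in for whatever chat-completion client you actually use.
from collections import Counter
from typing import Callable

FEW_SHOT_PROMPT = """Convert the written-form sentence to its spoken form.

Written: The meeting is on May 12, 2021 at 3 PM.
Spoken: The meeting is on may twelfth twenty twenty one at three p m.

Written: It costs $5.50.
Spoken: It costs five dollars fifty cents.

Written: {sentence}
Spoken:"""


def normalize_with_self_consistency(
    sentence: str,
    call_llm: Callable[[str, float], str],  # (prompt, temperature) -> completion
    num_samples: int = 5,
    temperature: float = 0.7,
) -> str:
    """Sample several normalizations and return the most frequent one."""
    prompt = FEW_SHOT_PROMPT.format(sentence=sentence)
    candidates = [call_llm(prompt, temperature).strip() for _ in range(num_samples)]
    # Self-consistency: the answer produced most often is taken as final.
    most_common, _count = Counter(candidates).most_common(1)[0]
    return most_common


if __name__ == "__main__":
    # Toy stand-in for an LLM call so the sketch runs end to end.
    def fake_llm(prompt: str, temperature: float) -> str:
        return "It weighs two point five kilograms."

    print(normalize_with_self_consistency("It weighs 2.5 kg.", fake_llm))
```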

Authors (6)
  1. Yang Zhang (1129 papers)
  2. Travis M. Bartley (3 papers)
  3. Mariana Graterol-Fuenmayor (1 paper)
  4. Vitaly Lavrukhin (32 papers)
  5. Evelina Bakhturina (21 papers)
  6. Boris Ginsburg (111 papers)
Citations (2)
