Resilience of Large Language Models for Noisy Instructions (2404.09754v2)

Published 15 Apr 2024 in cs.CL

Abstract: Within the rapidly advancing domain of NLP, LLMs have emerged as powerful tools for interpreting human commands and generating text across various tasks. Nonetheless, the resilience of LLMs when handling text containing inherent errors, stemming from human interactions and collaborative systems, has not been thoroughly explored. Our study investigates the resilience of LLMs against five common types of disruptions: 1) ASR (Automatic Speech Recognition) errors, 2) OCR (Optical Character Recognition) errors, 3) grammatical mistakes, 4) typographical errors, and 5) distractive content. We investigate how these models react by deliberately embedding these errors into instructions. Our findings reveal that while some LLMs show a degree of resistance to certain types of noise, their overall performance suffers significantly. This underscores the importance of further work on enhancing model resilience. In response to the observed decline in performance, our study also evaluates a "re-pass" strategy, designed to purify the instructions of noise before the LLMs process them. Our analysis indicates that correcting noisy instructions, particularly for open-source LLMs, presents significant challenges.
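To make the setup concrete, the sketch below (not from the paper) shows one way such perturbations and the "re-pass" cleanup step might be implemented: a simple character-level typo injector stands in for the paper's noise pipeline, and `chat` is an assumed helper wrapping whichever LLM endpoint is being evaluated.

```python
import random
import string

def add_typos(text: str, rate: float = 0.05, seed: int = 0) -> str:
    """Corrupt a fraction of alphabetic characters with swap/substitute/delete
    edits to mimic typographical noise (a simplified stand-in for the paper's
    perturbation setup)."""
    rng = random.Random(seed)
    chars = list(text)
    out, i = [], 0
    while i < len(chars):
        c = chars[i]
        if c.isalpha() and rng.random() < rate:
            op = rng.choice(["swap", "sub", "del"])
            if op == "swap" and i + 1 < len(chars):
                out.extend([chars[i + 1], c])  # transpose adjacent characters
                i += 2
                continue
            elif op == "sub":
                out.append(rng.choice(string.ascii_lowercase))  # wrong key
            # "del": drop the character entirely
        else:
            out.append(c)
        i += 1
    return "".join(out)

# Hypothetical re-pass step: ask a model to restore a clean instruction
# before it is actually answered. The prompt wording is illustrative only.
REPASS_PROMPT = (
    "The following instruction may contain ASR/OCR, grammatical, or "
    "typographical errors. Rewrite it as a clean, faithful instruction "
    "without answering it:\n\n{noisy}"
)

def repass(noisy_instruction: str, chat) -> str:
    """`chat` is an assumed callable: prompt string in, completion string out."""
    return chat(REPASS_PROMPT.format(noisy=noisy_instruction))

if __name__ == "__main__":
    clean = "Summarize the main findings of the report in three bullet points."
    noisy = add_typos(clean, rate=0.1)
    print("noisy:", noisy)
    # answer = some_llm(repass(noisy, chat=some_llm))  # then score as usual
```

In this framing, evaluating resilience amounts to comparing task scores on clean versus perturbed instructions, with and without the re-pass correction step.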

Authors (5)
  1. Bin Wang (750 papers)
  2. Chengwei Wei (17 papers)
  3. Zhengyuan Liu (41 papers)
  4. Geyu Lin (10 papers)
  5. Nancy F. Chen (97 papers)
Citations (9)