Large Language Models Lack Understanding of Character Composition of Words (2405.11357v3)
Abstract: LLMs have demonstrated remarkable performance on a wide range of natural language tasks. Yet their successes have been largely restricted to tasks concerning words, sentences, or documents, and it remains questionable how well they understand the minimal units of text, namely characters. In this paper, we examine contemporary LLMs' ability to understand the character composition of words and show that most of them fail to reliably carry out even simple tasks that humans handle with perfect accuracy. We analyze their behavior in comparison with token-level performance and discuss potential directions for future research.
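The character-composition tasks the abstract refers to are ones a reader can generate and verify programmatically, since the ground truth is trivial at the character level. The sketch below is a minimal, hypothetical illustration of such probes (the specific prompts, the probe set, and the `query_llm` stub are assumptions for illustration, not the paper's actual evaluation harness).

```python
# Hypothetical character-composition probes of the kind described in the abstract.
# Prompts, task selection, and the query_llm stub are illustrative assumptions,
# not the paper's evaluation code.

def make_probes(word: str) -> list[tuple[str, str]]:
    """Return (prompt, ground_truth) pairs that are trivial at the character level."""
    return [
        (f"How many letters are in the word '{word}'?", str(len(word))),
        (f"What is the third letter of '{word}'?", word[2]),
        (f"Spell '{word}' backwards.", word[::-1]),
        (f"How many times does the letter '{word[0]}' appear in '{word}'?",
         str(word.count(word[0]))),
    ]


def query_llm(prompt: str) -> str:
    """Stub standing in for a call to an LLM; replace with a real API client."""
    raise NotImplementedError


if __name__ == "__main__":
    # Print each probe with its character-level ground truth; an LLM's answers
    # would be compared against these expected strings.
    for prompt, truth in make_probes("strawberry"):
        print(f"{prompt}\n  expected: {truth}")
```

A human answers every one of these probes perfectly, whereas an LLM operating on subword tokens never observes the individual characters directly, which is the contrast the paper's token-level comparison targets.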