Contextual Multilingual Spellchecker for User Queries (2305.01082v2)
Abstract: Spellchecking is one of the most fundamental and widely used search features. Correcting misspelled user queries not only enhances the user experience but is expected by users. However, most widely available spellchecking solutions are either less accurate than state-of-the-art solutions or too slow for search use cases, where latency is a key requirement. Furthermore, most recent innovative architectures focus on English and are not trained in a multilingual fashion. They are also trained for spell correction in longer text, which is a different paradigm from spell correction for user queries, where context is sparse (most queries are 1-2 words long). Finally, since most enterprises have unique vocabularies such as product names, off-the-shelf spelling solutions fall short of users' needs. In this work, we build a multilingual spellchecker that is extremely fast and scalable and that adapts its vocabulary, and hence its output, to a specific product's needs. Furthermore, our speller outperforms general-purpose spellers by a wide margin on in-domain datasets. Our multilingual speller is used in search in Adobe products, powering autocomplete in various applications.