
Contextual Multilingual Spellchecker for User Queries (2305.01082v2)

Published 1 May 2023 in cs.CL, cs.IR, and cs.LG

Abstract: Spellchecking is one of the most fundamental and widely used search features. Correcting misspelled user queries not only enhances the user experience but is expected by the user. However, most widely available spellchecking solutions are either less accurate than state-of-the-art solutions or too slow for search use cases, where latency is a key requirement. Furthermore, most recent innovative architectures focus on English, are not trained in a multilingual fashion, and are trained for spell correction in longer text. That is a different paradigm from spell correction for user queries, where context is sparse (most queries are 1-2 words long). Finally, since most enterprises have unique vocabularies such as product names, off-the-shelf spelling solutions fall short of users' needs. In this work, we build a multilingual spellchecker that is extremely fast and scalable and that adapts its vocabulary, and hence its output, to a specific product's needs. Furthermore, our speller outperforms general-purpose spellers by a wide margin on in-domain datasets. Our multilingual speller is used in search in Adobe products, powering autocomplete in various applications.
