Spelling Correction with Denoising Transformer (2105.05977v1)

Published 12 May 2021 in cs.CL, cs.AI, and cs.LG

Abstract: We present a novel method of performing spelling correction on short input strings, such as search queries or individual words. At its core lies a procedure for generating artificial typos which closely follow the error patterns manifested by humans. This procedure is used to train the production spelling correction model based on a transformer architecture. This model is currently served in the HubSpot product search. We show that our approach to typo generation is superior to the widespread practice of adding noise, which ignores human patterns. We also demonstrate how our approach may be extended to resource-scarce settings and train spelling correction models for Arabic, Greek, Russian, and Setswana languages, without using any labeled data.
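The contrast the abstract draws, between typos that follow human error patterns and uniformly random noise, can be illustrated with a minimal sketch. This is not the authors' procedure: the keyboard-adjacency map, the operation weights, and the names `QWERTY_NEIGHBORS`, `OP_WEIGHTS`, `make_typo`, and `make_pairs` are all hypothetical, chosen only to show how a biased typo generator might produce (noisy, clean) training pairs without labeled data.

```python
import random

# Hypothetical, partial QWERTY adjacency map (an assumption, not from the paper).
QWERTY_NEIGHBORS = {
    "a": "qwsz", "e": "wrsd", "i": "uojk", "o": "ipkl",
    "s": "awedxz", "t": "rfgy", "n": "bhjm", "r": "edft",
}

# Hypothetical operation weights; a real system would estimate these
# from observed human typos rather than hard-code them.
OP_WEIGHTS = {"substitute": 0.4, "delete": 0.25, "insert": 0.2, "transpose": 0.15}


def make_typo(word, rng):
    """Apply one character-level edit to `word` and return the noisy string."""
    if len(word) < 2:
        return word  # too short to corrupt meaningfully
    ops = list(OP_WEIGHTS)
    op = rng.choices(ops, weights=[OP_WEIGHTS[o] for o in ops], k=1)[0]
    i = rng.randrange(len(word))
    if op == "delete":
        return word[:i] + word[i + 1:]
    if op == "transpose":
        i = min(i, len(word) - 2)  # keep i + 1 inside the string
        return word[:i] + word[i + 1] + word[i] + word[i + 2:]
    # Substitutions and insertions draw from keys adjacent to the intended one.
    # Characters missing from the map fall back to themselves, so a
    # substitution there leaves the word unchanged.
    ch = rng.choice(QWERTY_NEIGHBORS.get(word[i], word[i]))
    if op == "insert":
        return word[:i] + ch + word[i:]
    return word[:i] + ch + word[i + 1:]


def make_pairs(words, seed=0):
    """Build (noisy, clean) pairs from clean words, reproducibly for a given seed."""
    rng = random.Random(seed)
    return [(make_typo(w, rng), w) for w in words]
```

Calling `make_pairs(["search", "query"])` yields a list of (noisy, clean) tuples that a sequence-to-sequence model could be trained on; replacing the adjacency lookup with a uniform draw over the alphabet would give the pattern-agnostic noise baseline the abstract compares against.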

Authors (2)
  1. Alex Kuznetsov (1 paper)
  2. Hector Urdiales (2 papers)
Citations (17)
