
Proofread: Fixes All Errors with One Tap (2406.04523v1)

Published 6 Jun 2024 in cs.CL and cs.LG

Abstract: The impressive capabilities in LLMs provide a powerful approach to reimagine users' typing experience. This paper demonstrates Proofread, a novel Gboard feature powered by a server-side LLM in Gboard, enabling seamless sentence-level and paragraph-level corrections with a single tap. We describe the complete system in this paper, from data generation, metrics design to model tuning and deployment. To obtain models with sufficient quality, we implement a careful data synthetic pipeline tailored to online use cases, design multifaceted metrics, employ a two-stage tuning approach to acquire the dedicated LLM for the feature: the Supervised Fine Tuning (SFT) for foundational quality, followed by the Reinforcement Learning (RL) tuning approach for targeted refinement. Specifically, we find sequential tuning on Rewrite and proofread tasks yields the best quality in SFT stage, and propose global and direct rewards in the RL tuning stage to seek further improvement. Extensive experiments on a human-labeled golden set showed our tuned PaLM2-XS model achieved 85.56\% good ratio. We launched the feature to Pixel 8 devices by serving the model on TPU v5 in Google Cloud, with thousands of daily active users. Serving latency was significantly reduced by quantization, bucket inference, text segmentation, and speculative decoding. Our demo could be seen in \href{https://youtu.be/4ZdcuiwFU7I}{Youtube}.

Summary

  • The paper introduces a one-tap error correction system that leverages an LLM to reduce grammatical mistakes on Gboard.
  • It details a comprehensive methodology including synthetic data generation, precise metrics design, and sequential tuning with supervised and reinforcement learning.
  • Empirical results achieve an 85.56% Good Ratio, demonstrating significant improvements in error correction and practical system viability.

Proofread: Fixes All Errors with One Tap

The paper "Proofread: Fixes All Errors with One Tap" by Liu et al. presents a comprehensive system designed to significantly enhance the typing experience on Gboard through an advanced grammatical error correction feature powered by an LLM. The Proofread feature offers sentence-level and paragraph-level corrections with a single tap, aiming to alleviate the cognitive load and inefficiency of traditional error correction on mobile keyboards.

System Overview

The architecture of the Proofread feature is delineated into four primary components: data generation, metrics design, model tuning, and model serving. The underlying model operates on Gboard, leveraging a server-side LLM to provide high-quality grammatical corrections that are deployable in real-world scenarios.

  1. Data Generation: The authors designed a detailed synthetic data pipeline to generate a robust training dataset. This pipeline integrates typical keyboard input errors, such as character omission, insertion, transposition, and others. The generated data is subsequently refined through Gboard's built-in functionalities and heuristic filtering utilizing LLM diagnostics to ensure alignment with real user scenarios. This careful generation and filtration process results in a dataset that closely mimics the actual input patterns observed in Gboard usage.
  2. Metrics Design: To effectively evaluate the model, the authors defined several specific metrics: Exact Match Ratio (EM), Normalized Exact Match Ratio (NEM), Error Ratio, Diff Meaning Ratio, Good Ratio, and Bad Ratio. Among these, the Good and Bad ratios serve as the primary evaluation metrics due to their robustness, combining grammar error detection and meaning preservation checks based on LLMs. This multifaceted metrics framework ensures a comprehensive evaluation of the model's performance on user-relevant dimensions.
  3. Model Tuning: The model tuning process followed a two-stage approach. First, supervised fine-tuning (SFT) was conducted on a rewrite dataset, followed by further fine-tuning on the synthetic proofreading dataset; this design was inspired by the success of instruction tuning in InstructGPT. Experiments showed that sequential tuning on the Rewrite and Proofread datasets yielded the best results. The authors then employed reinforcement learning (RL) with heuristic rewards to further refine the model. The use of global and direct rewards in RL led to significant reductions in the grammar error rate, improving the model's robustness and performance.
  4. Model Serving: Deployment of the model was optimized for efficiency on TPU v5 in Google Cloud. Techniques such as 8-bit quantization, bucket inference, text segmentation, and speculative decoding were used to minimize serving latency without sacrificing quality. Notably, speculative decoding alone reduced the median latency by 39.4%, demonstrating the practical viability of the system for real-world usage.
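The error types named in the data-generation step (character omission, insertion, transposition, substitution) can be mimicked with a small corruption function. The sketch below is purely illustrative, with a hypothetical function name and error rate; the paper's actual pipeline additionally simulates Gboard's decoder and applies LLM-based heuristic filtering, which is not reproduced here:

```python
import random
import string

def inject_typos(text: str, rate: float = 0.05, seed: int = 0) -> str:
    """Corrupt clean text with keyboard-style errors: character
    omission, insertion, transposition, and substitution.
    Illustrative sketch only, not the paper's pipeline."""
    rng = random.Random(seed)
    chars = list(text)
    out = []
    i = 0
    while i < len(chars):
        c = chars[i]
        if c.isalpha() and rng.random() < rate:
            op = rng.choice(["omit", "insert", "transpose", "substitute"])
            if op == "omit":
                pass  # drop the character entirely
            elif op == "insert":
                out.append(c)
                out.append(rng.choice(string.ascii_lowercase))
            elif op == "transpose" and i + 1 < len(chars):
                out.append(chars[i + 1])  # swap with the next character
                out.append(c)
                i += 1
            else:  # substitute with a random letter
                out.append(rng.choice(string.ascii_lowercase))
        else:
            out.append(c)
        i += 1
    return "".join(out)
```

Pairing each corrupted string with its clean original yields (noisy input, target) training examples of the kind the synthetic pipeline produces at scale.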
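The Good and Bad Ratios from the metrics-design step reduce to a simple aggregate once the two LLM-based checks are abstracted away. In the sketch below, `has_grammar_error` and `same_meaning` are hypothetical caller-supplied predicates standing in for the paper's LLM-based grammar and meaning-preservation judges:

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Example:
    model_output: str  # the proofread text produced by the model
    source: str        # the original (noisy) user input

def good_bad_ratios(
    examples: List[Example],
    has_grammar_error: Callable[[str], bool],
    same_meaning: Callable[[str, str], bool],
):
    """Good = no grammar error AND meaning preserved vs. the source;
    Bad = the complement. Assumes a non-empty example list. The two
    predicates stand in for LLM-based checkers."""
    good = sum(
        1
        for ex in examples
        if not has_grammar_error(ex.model_output)
        and same_meaning(ex.model_output, ex.source)
    )
    n = len(examples)
    return good / n, (n - good) / n
```

Because both checks must pass for an example to count as Good, the metric is stricter than grammar accuracy alone, which is why the authors treat it as their primary quality signal.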
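Speculative decoding, the largest single latency win in the serving step, can be illustrated with a greedy toy variant: a cheap draft model proposes several tokens, and the target model verifies them in one pass, keeping the longest agreeing prefix. The sketch below uses plain next-token functions over integer tokens; the production system operates on an LLM's token distributions, and all names here are hypothetical:

```python
def speculative_decode(target_next, draft_next, prompt, max_len, k=4):
    """Greedy speculative decoding sketch. Output is identical to
    decoding with target_next alone; the draft only saves target calls
    when its guesses agree."""
    seq = list(prompt)
    while len(seq) < max_len:
        # 1) The cheap draft model proposes k tokens autoregressively.
        draft = []
        for _ in range(k):
            draft.append(draft_next(seq + draft))
        # 2) The target model checks each proposal; in a real system
        #    these k checks run as a single batched forward pass.
        for i in range(k):
            t = target_next(seq + draft[:i])
            if t != draft[i]:
                # Keep the agreeing prefix, substitute the target's token.
                seq.extend(draft[:i])
                seq.append(t)
                break
        else:
            seq.extend(draft)  # all k proposals accepted
    return seq[:max_len]
```

The better the draft model predicts the target, the more tokens are accepted per target pass, which is the mechanism behind the reported 39.4% median latency reduction.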

Experimental Results

The empirical results underscore the efficacy of the proposed system. The PaLM2-XS model tuned with supervised fine-tuning and RL achieved an impressive 85.56% Good Ratio and a 14.44% Bad Ratio on a human-labeled golden dataset. These metrics indicate a substantial improvement over baseline models and validate the effectiveness of the proposed tuning strategies.

Implications and Future Directions

This research has significant practical implications for enhancing the typing experience on mobile devices. The deployment of the Proofread feature can dramatically reduce the cognitive load associated with error correction, enabling users to type more quickly and with fewer interruptions. The methodology described could be extended to other applications requiring high-accuracy text correction and synthesis.

The theoretical implications lie in the demonstrated effectiveness of combining SFT and RL strategies for LLM tuning. By optimizing different facets of model performance through these sequential stages, the authors provided a blueprint for achieving high-quality outputs from LLMs in specific application domains.

Future research directions could explore the integration of real-user feedback for continuous improvement, the extension of the system to support multiple languages, the adaptation to diverse writing styles, and the development of privacy-preserving methods for on-device deployment.

Conclusion

This paper elucidates a novel approach to enhancing user typing experiences through advanced grammatical error correction powered by an LLM. The careful design of the data generation process, the multifaceted evaluation metrics, the sequential model tuning approach, and the efficient deployment techniques collectively contribute to a robust and practical solution. The Proofread feature exemplifies the potential of LLMs to transform everyday user interactions, opening avenues for further advancements in AI-driven text processing technologies.
