Gmail Smart Compose: Real-Time Assisted Writing

Published 17 May 2019 in cs.CL and cs.LG | arXiv:1906.00080v1

Abstract: In this paper, we present Smart Compose, a novel system for generating interactive, real-time suggestions in Gmail that assists users in writing emails by reducing repetitive typing. In the design and deployment of such a large-scale and complicated system, we faced several challenges including model selection, performance evaluation, serving and other practical issues. At the core of Smart Compose is a large-scale neural language model. We leveraged state-of-the-art machine learning techniques for language model training which enabled high-quality suggestion prediction, and constructed novel serving infrastructure for high-throughput and real-time inference. Experimental results show the effectiveness of our proposed system design and deployment approach. This system is currently being served in Gmail.

Citations (196)

Summary

  • The paper presents a real-time assisted writing system that leverages neural language models to generate predictive email suggestions and reduce repetitive typing.
  • It details the use of optimized beam search and TPU acceleration to meet stringent latency requirements, keeping the 90th percentile response time under 60ms.
  • The evaluation compares Transformer and RNN models, highlighting trade-offs in scalability, personalization, and bias mitigation for large-scale deployment.

Smart Compose: Enhancing Email Composition with Real-Time Suggestions

The paper "Gmail Smart Compose: Real-Time Assisted Writing," presented at the KDD 2019 conference, focuses on the development and deployment of a system called Smart Compose within Gmail to enhance email composition through the provision of real-time, interactive text suggestions. As a sophisticated neural LLM, Smart Compose fundamentally aims to reduce the repetitive nature of email writing by offering predictive text completions, thereby improving user efficiency without compromising the coherence of the message.

Core Components and System Architecture

At the heart of Smart Compose lies a large-scale neural language model, chosen after extensive evaluation of architectures including RNN-based and Transformer-based models. The language model predicts the sequence of tokens most likely to follow a user's partial message. The system embeds contextual signals such as previous email content, the subject line, user locale, and date/time information, which improves the relevance and appropriateness of suggestions.
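
The paper does not spell out the full model wiring here, but the following is a minimal sketch of how such context conditioning might look: each context field is embedded, the field embeddings are averaged into a single context vector, and that vector is concatenated to every token embedding of the typed prefix. Field names, vocabulary sizes, dimensions, and the averaging scheme are illustrative assumptions, not the production configuration.

```python
import torch
import torch.nn as nn

VOCAB, EMB, HID = 32000, 256, 1024  # assumed sizes for illustration

class ContextConditionedLM(nn.Module):
    def __init__(self):
        super().__init__()
        self.tok_emb = nn.Embedding(VOCAB, EMB)    # shared word embeddings
        self.locale_emb = nn.Embedding(300, EMB)   # assumed locale vocabulary
        self.hour_emb = nn.Embedding(24, EMB)      # hour-of-day bucket
        self.rnn = nn.LSTM(2 * EMB, HID, batch_first=True)
        self.out = nn.Linear(HID, VOCAB)

    def forward(self, prefix_ids, prev_email_ids, subject_ids, locale_id, hour_id):
        # Embed each context field and average into one context vector.
        ctx = torch.stack([
            self.tok_emb(prev_email_ids).mean(dim=1),  # previous email body
            self.tok_emb(subject_ids).mean(dim=1),     # subject line
            self.locale_emb(locale_id),                # user locale
            self.hour_emb(hour_id),                    # time-of-day bucket
        ], dim=1).mean(dim=1)                          # [batch, EMB]
        # Concatenate the context vector to every prefix token embedding.
        tok = self.tok_emb(prefix_ids)                 # [batch, T, EMB]
        ctx = ctx.unsqueeze(1).expand(-1, tok.size(1), -1)
        h, _ = self.rnn(torch.cat([tok, ctx], dim=-1))
        return self.out(h)                             # next-token logits
```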

The system architecture involves implementing Smart Compose as a real-time interactive tool within Gmail. This requires orchestrating a complex serving infrastructure capable of handling high-throughput requests with minimal latency to ensure seamless user interaction. Critical to its success is the optimization of beam search techniques for efficiently generating and selecting high-probability text completions.
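
As a rough illustration of the decoding step, here is a generic, length-normalized beam search sketch. The `next_token_log_probs` callback, beam width, maximum suggestion length, and stopping rule are assumptions standing in for the served language model and its production heuristics.

```python
from typing import Callable, List, Tuple

def beam_search(next_token_log_probs: Callable[[List[int]], List[Tuple[int, float]]],
                eos_id: int,
                beam_width: int = 4,
                max_len: int = 10) -> List[Tuple[List[int], float]]:
    """Generic beam search over token sequences.

    next_token_log_probs(prefix) is assumed to return candidate
    (token_id, log_prob) pairs for the next position, sorted by score;
    in Smart Compose this would be backed by the served model conditioned
    on the email prefix and context.
    """
    beams = [([], 0.0)]            # (token sequence, cumulative log-prob)
    finished = []
    for _ in range(max_len):
        candidates = []
        for seq, score in beams:
            for tok, lp in next_token_log_probs(seq)[:beam_width]:
                if tok == eos_id:
                    finished.append((seq, score + lp))
                else:
                    candidates.append((seq + [tok], score + lp))
        if not candidates:
            break
        # Keep only the highest-scoring partial sequences.
        beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:beam_width]
    finished.extend(beams)
    # Rank completions by length-normalized log-probability.
    return sorted(finished, key=lambda c: c[1] / max(len(c[0]), 1), reverse=True)
```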

Challenges and Solutions

Designing and deploying Smart Compose necessitated addressing several challenges unique to real-time, large-scale systems:

  • Latency: Real-time interaction imposes strict requirements on processing speed. The system's design aimed for the 90th percentile latency to remain below 60ms, achieved through strategic use of TPU accelerators and optimized batch processing of requests.
  • Scalability: With over 1.5 billion diverse users, the model needed to offer personalized suggestions that capture individual writing styles without breaching user privacy constraints.
  • Fairness and Bias: Care was taken to ensure that suggestions do not propagate gender or occupation biases, with specific attention to excluding suggestions containing gender pronouns and adhering to rigorous data privacy standards.
  • Personalization: A lightweight n-gram language model with Katz back-off, trained per user, was blended with the global model to better capture personal linguistic nuances (see the sketch after this list).
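
A minimal sketch of that blending, assuming a linear interpolation between the global and per-user next-word distributions with an illustrative weight `alpha` and a simple dictionary representation:

```python
def blended_next_word_probs(global_probs: dict, personal_probs: dict,
                            alpha: float = 0.3) -> dict:
    """Linearly interpolate a per-user n-gram distribution with the global
    model's next-word distribution. The weight alpha and the dictionary
    representation are illustrative assumptions; the paper blends a Katz
    back-off n-gram model trained on the user's own mail with the shared
    neural model.
    """
    vocab = set(global_probs) | set(personal_probs)
    return {w: (1 - alpha) * global_probs.get(w, 0.0)
               + alpha * personal_probs.get(w, 0.0)
            for w in vocab}
```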

System Evaluation

Evaluation metrics included log perplexity, which measures how well the model predicts held-out text, and ExactMatch@N, which measures how often an N-word suggestion exactly matches the first N words of what the user actually wrote. Results indicated that Transformer models delivered superior quality but incurred higher latency costs compared to RNN models.
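
The sketch below shows simplified versions of both metrics; the exact tokenization and normalization rules used in the paper's evaluation are assumptions here.

```python
def log_perplexity(token_log_probs):
    """Average negative log-likelihood per token; lower is better."""
    return -sum(token_log_probs) / len(token_log_probs)

def exact_match_at_n(suggestions, references):
    """Fraction of suggestions whose words all match the first N words of
    the ground-truth continuation, where N is the suggestion length.
    Whitespace tokenization is a simplifying assumption.
    """
    hits = 0
    for suggestion, reference in zip(suggestions, references):
        words = suggestion.split()
        if words == reference.split()[:len(words)]:
            hits += 1
    return hits / max(len(suggestions), 1)
```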

The production deployment favored an RNN-based model due to its balanced trade-off between model quality and inference efficiency, making it viable under stringent operational latency constraints.

Implications and Future Directions

Smart Compose represents a significant advancement in the intersection of language modeling and user interaction systems in AI. Its successful deployment in Gmail showcases the practical complexities of delivering cutting-edge machine learning solutions at scale.

Future advancements are poised to focus on further reducing latency while maintaining model quality, exploring strategies such as localized attention within Transformer models, adapting pre-training techniques, and incorporating variational methods for enhanced text generation diversity. Such developments could further refine the user experience while broadening the applicability of assisted writing technologies beyond email composition.

In summary, Smart Compose contributes to the ongoing evolution of AI-driven productivity tools, setting a robust foundation for future innovation in automatic text completion and interactive writing assistance systems.
