
Federated Learning for Mobile Keyboard Prediction (1811.03604v2)

Published 8 Nov 2018 in cs.CL

Abstract: We train a recurrent neural network language model using a distributed, on-device learning framework called federated learning for the purpose of next-word prediction in a virtual keyboard for smartphones. Server-based training using stochastic gradient descent is compared with training on client devices using the Federated Averaging algorithm. The federated algorithm, which enables training on a higher-quality dataset for this use case, is shown to achieve better prediction recall. This work demonstrates the feasibility and benefit of training language models on client devices without exporting sensitive user data to servers. The federated learning environment gives users greater control over the use of their data and simplifies the task of incorporating privacy by default with distributed training and aggregation across a population of client devices.

Citations (1,415)

Summary

  • The paper introduces a federated learning framework that trains CIFG RNN language models on mobile devices, outperforming centralized training while protecting user data.
  • It employs a CIFG variant of LSTM with 1.4M parameters optimized for on-device inference via TensorFlow Lite in a mobile keyboard setting.
  • Live experiments showed a 10% increase in click-through rate and improvements in top-1 and top-3 recall compared to server-based models.

Federated Learning for Mobile Keyboard Prediction

The paper "Federated learning for mobile keyboard prediction" by Andrew Hard et al. presents an investigation into the application of federated learning for training recurrent neural network (RNN) LLMs in a mobile keyboard setting, specifically Gboard. The paper compares performance metrics between a neural model trained on a centralized server and one trained using a federated approach, demonstrating that federated learning can yield superior results while enhancing user privacy.

Introduction

Gboard is a widely used virtual keyboard for touchscreen devices, offering features such as auto-correction, word completion, and next-word prediction. These features matter increasingly as mobile typing comes to dominate text entry. Traditionally, Gboard's next-word prediction relied on a word n-gram finite state transducer (FST). This paper focuses on training a neural language model with federated learning, thereby leveraging user-generated data stored on client devices without transferring sensitive information to centralized servers.

Model Architecture

The paper employs a Coupled Input-Forget Gate (CIFG) variant of the Long Short-Term Memory (LSTM) RNN for next-word prediction. By tying the input gate to the forget gate, CIFG reduces the number of parameters per cell, which is advantageous in mobile environments where computational resources are constrained. The models are trained with TensorFlow and support on-device inference via TensorFlow Lite. The model uses a vocabulary of 10,000 words and a single-layer CIFG with 670 units, for a total of roughly 1.4 million parameters.
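To make the gate coupling concrete, here is a minimal NumPy sketch of a single CIFG step. It assumes the standard LSTM formulation with the input gate replaced by the complement of the forget gate; the names and shapes are illustrative, not taken from the paper's implementation.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cifg_step(x_t, h_prev, c_prev, W_f, W_o, W_c, b_f, b_o, b_c):
    """One step of a Coupled Input-Forget Gate (CIFG) LSTM cell.

    The input gate is tied to the forget gate (i_t = 1 - f_t), so the
    cell needs one fewer weight matrix than a standard LSTM. Each W_*
    acts on the concatenated [x_t, h_prev] vector.
    """
    z = np.concatenate([x_t, h_prev])
    f_t = sigmoid(W_f @ z + b_f)          # forget gate
    i_t = 1.0 - f_t                       # coupled input gate
    c_tilde = np.tanh(W_c @ z + b_c)      # candidate cell state
    c_t = f_t * c_prev + i_t * c_tilde    # new cell state
    o_t = sigmoid(W_o @ z + b_o)          # output gate
    h_t = o_t * np.tanh(c_t)              # new hidden state
    return h_t, c_t
```

Dropping the separate input gate removes one weight matrix and bias per cell relative to a standard LSTM, roughly a 25% reduction in recurrent parameters, which is the saving that makes the cell attractive on-device.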

Federated Learning

Federated learning is introduced as a decentralized approach where client devices perform local computations and only share model updates with a central server, which aggregates these updates to form a new global model. This method improves data privacy and security, as raw user data is never transmitted to the server. The paper employs the FederatedAveraging algorithm to aggregate the client updates, showcasing that this method can handle data that are not independently and identically distributed across clients.
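The following is a minimal sketch of the FederatedAveraging loop on toy linear models. The client sampling scheme, the local epoch count, and the `local_sgd` helper are illustrative assumptions, not details drawn from the paper's production system.

```python
import numpy as np

def local_sgd(w, x, y, epochs, lr):
    """Toy on-device update: a linear model trained with plain SGD.
    (Stands in for the client-side training the paper runs on the CIFG model.)"""
    W, b = w
    for _ in range(epochs):
        grad = x @ W + b - y              # residuals of the linear model
        W -= lr * x.T @ grad / len(y)
        b -= lr * grad.mean()
    return [W, b]

def federated_averaging(global_w, clients, rounds, per_round, epochs, lr, seed=0):
    """Sketch of FederatedAveraging: sample clients, train locally on each,
    then average the returned models weighted by local dataset size."""
    rng = np.random.default_rng(seed)
    for _ in range(rounds):
        sampled = rng.choice(len(clients), size=per_round, replace=False)
        updates, sizes = [], []
        for k in sampled:
            x, y = clients[k]
            w = [np.copy(p) for p in global_w]   # client starts from the global model
            updates.append(local_sgd(w, x, y, epochs, lr))
            sizes.append(len(y))
        total = sum(sizes)
        # Weighted average: clients with more local data contribute more.
        global_w = [sum((n / total) * u[i] for u, n in zip(updates, sizes))
                    for i in range(len(global_w))]
    return global_w
```

In the paper's setting, each `clients[k]` would be a device-local cache of typed text and the local step would run SGD on the CIFG language model; raw data never leaves the device, only the updated weights.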

Experiments

The experiments cover both server-based and federated training of the CIFG language model. Server-based training uses data collected from Gboard usage logs, while federated training draws on caches stored locally on client devices. Key metrics such as top-1 and top-3 recall are evaluated on both server-hosted logs and client-held data. The experiments also include live user evaluations to measure real-world performance.
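As a reference for the headline metric: top-k recall for next-word prediction counts how often the true next word appears among the model's k highest-scoring candidates. A minimal sketch, with illustrative array names and shapes:

```python
import numpy as np

def recall_at_k(logits, targets, k):
    """Fraction of positions where the true next word is among the model's
    top-k predictions. `logits` is (num_examples, vocab_size); `targets`
    is (num_examples,) of true word ids."""
    topk = np.argsort(-logits, axis=1)[:, :k]          # k best word ids per example
    hits = (topk == targets[:, None]).any(axis=1)      # true word in the top k?
    return hits.mean()

# Toy batch over a 3-word vocabulary.
logits = np.array([[0.1, 0.7, 0.2],
                   [0.5, 0.3, 0.2]])
targets = np.array([1, 2])
print(recall_at_k(logits, targets, 1))  # 0.5
print(recall_at_k(logits, targets, 3))  # 1.0
```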

Results

The results indicate that the CIFG model trained via federated learning outperforms its server-trained counterpart and the baseline n-gram FST model. Detailed performance metrics are provided:

  • Server CIFG: Achieved a top-1 recall of 16.5% and top-3 recall of 27.1% on server-hosted logs data.
  • Federated CIFG: Showed top-1 recall improvements (5% relative) over the server CIFG evaluated on client-held data.
  • Live Production Metrics: In live experiments, the federated model improved top-1 and top-3 prediction impression recall by roughly 1% relative over the server-trained model, and increased prediction click-through rate (CTR) by 10% compared with the n-gram baseline.

Conclusion

The paper concludes that federated learning not only safeguards user privacy but also enhances model performance for next-word prediction in mobile keyboards. This research exemplifies the potential of federated learning in commercial applications, especially within environments demanding stringent privacy controls and computational efficiency. Future work could explore more sophisticated aggregation algorithms and further integrations of privacy-preserving techniques like differential privacy and secure aggregation.

Acknowledgements

The paper acknowledges the contributions of the Google AI team for providing the federated learning framework and valuable discussions throughout the research.

In summary, the research demonstrates the practical benefits and feasibility of adopting federated learning for language modeling in mobile keyboards, offering both performance gains and improved user data privacy.
