- The paper introduces the DP-FTRL algorithm to provide formal differential privacy guarantees in federated learning without uniform client sampling.
- It integrates adaptive clipping with quantile-based estimation and periodic restarts to optimize the privacy-utility trade-off in Gboard LM training.
- Over twenty deployed models achieve good utility under formal privacy guarantees (ρ-zCDP between 0.2 and 2), with secure aggregation providing additional data minimization for two of them.
Analysis of Federated Learning of Gboard Language Models with Differential Privacy
The publication under review examines the methodologies and practical implementation of training language models (LMs) for the Google Keyboard (Gboard) using federated learning (FL) combined with differential privacy (DP). It focuses on the challenges and solutions involved in protecting user privacy without compromising model utility in a large-scale production system.
The main contribution of this work is the implementation of the DP-Follow-the-Regularized-Leader (DP-FTRL) algorithm, which provides formal DP guarantees without requiring uniform client device sampling. A critical aspect of this implementation is a client participation criterion that balances the privacy-utility trade-off in large production systems. The paper also applies quantile-based clip estimation to select the clip norm adaptively during training, reducing the hyperparameter tuning effort; a sketch of this update is given below.
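The adaptive clip selection can be illustrated with a minimal sketch in the style of quantile-based adaptive clipping: the clip norm is nudged geometrically toward a target quantile of the observed update norms. The function name, parameters, and the optional noise term below are illustrative assumptions, not the production implementation.

```python
import numpy as np

def adaptive_clip_update(clip_norm, update_norms, target_quantile=0.5,
                         learning_rate=0.2, noise_std=0.0, rng=None):
    """One geometric update of the L2 clip norm toward a target quantile.

    clip_norm:        current clip norm C_t
    update_norms:     per-client L2 update norms observed this round
    target_quantile:  desired fraction of updates that fall below the clip
    noise_std:        std of DP noise added to the unclipped-count estimate
    """
    rng = rng or np.random.default_rng()
    norms = np.asarray(update_norms, dtype=float)
    # Fraction of clients whose update was not clipped this round.
    below = float(np.mean(norms <= clip_norm))
    # In a private deployment this count is itself noised; the optional
    # Gaussian term mimics that step for illustration.
    below += rng.normal(0.0, noise_std) / max(len(norms), 1)
    # Grow the clip if too many updates were clipped, shrink it otherwise.
    return clip_norm * np.exp(-learning_rate * (below - target_quantile))
```

For example, if only 30% of updates fall below the current clip while the target quantile is 50%, the clip norm grows by a factor of exp(0.2 · 0.2) ≈ 1.04 on that round.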
Notably, the authors deploy more than twenty Gboard LMs that achieve good utility with ρ-zCDP guarantees ranging from 0.2 to 2, corresponding to varying strengths of differential privacy protection. Two of these models were additionally trained with secure aggregation to strengthen data minimization. Gboard now requires DP guarantees for all next word prediction (NWP) neural network LMs, marking a significant shift toward stringent privacy standards in model training and deployment.
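Since the guarantees are stated in ρ-zCDP, the standard conversion to (ε, δ)-DP helps when interpreting the 0.2–2 range; the δ value below is purely illustrative.

```latex
% A \rho-zCDP mechanism satisfies (\varepsilon,\delta)-DP for every \delta > 0 with
\varepsilon = \rho + 2\sqrt{\rho \,\ln(1/\delta)}
% e.g. \rho = 0.2, \delta = 10^{-10} gives \varepsilon \approx 4.5.
```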
From a technical perspective, the paper integrates adaptive clipping into DP-FTRL by using periodic restarts, enabling the algorithm to maintain strong privacy guarantees while adjusting the clip norm to changing data conditions. This flexibility is crucial in a dynamic environment such as Gboard, where client data is constantly evolving.
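To make the mechanism concrete, the following is a minimal sketch of tree-aggregation noise of the kind DP-FTRL relies on, with a restart hook such as might be invoked when the clip norm is re-estimated. The class, its interface, and the lazy node generation are assumptions for illustration, not the production code.

```python
import numpy as np

class TreeNoise:
    """Sketch of tree-aggregation noise for DP-FTRL prefix sums.

    Each node of an implicit binary tree over rounds holds one Gaussian
    sample; the noise released with the prefix sum at round t is the sum
    over the O(log t) dyadic segments that exactly cover rounds 1..t.
    """

    def __init__(self, noise_std, dim, seed=0):
        self.noise_std = noise_std
        self.dim = dim
        self.rng = np.random.default_rng(seed)
        self.nodes = {}  # (start_round, segment_length) -> noise vector

    def _node(self, start, length):
        key = (start, length)
        if key not in self.nodes:
            self.nodes[key] = self.rng.normal(0.0, self.noise_std, self.dim)
        return self.nodes[key]

    def prefix_noise(self, t):
        """Noise to add to the sum of the first t per-round updates."""
        noise = np.zeros(self.dim)
        start, remaining = 1, t
        length = 1 << t.bit_length()
        while remaining > 0:
            if length <= remaining:            # consume this dyadic segment
                noise += self._node(start, length)
                start += length
                remaining -= length
            length //= 2
        return noise

    def restart(self):
        """Reset the tree, e.g. when adaptive clipping changes the clip norm."""
        self.nodes.clear()

# Usage sketch: at round t the server releases cumulative_update_sum +
# tree.prefix_noise(t), and calls tree.restart() when the clip norm changes.
```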
The authors' analysis of how several critical factors affect privacy and utility, including the noise multiplier, the total number of training rounds, and client participation frequency, adds considerable value to the field. By employing a timer-driven participation mechanism to limit how often a device contributes, and by configuring the system to maximize the minimum separation (MinS) between a client's consecutive participations, the paper demonstrates how these parameters affect the strength of the privacy guarantees.
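As a rough illustration of the timer-driven mechanism, the device-side sketch below only lets a client volunteer again after a fixed waiting period; the class name and parameters are hypothetical, and in practice server-side pacing of rounds is also needed to translate wall-clock separation into round separation (MinS).

```python
import time

class ParticipationTimer:
    """Device-side sketch: a client volunteers for training only if enough
    wall-clock time has passed since its last contribution."""

    def __init__(self, min_separation_secs):
        self.min_separation_secs = min_separation_secs
        self.last_participation = None  # timestamp of last upload, if any

    def may_participate(self, now=None):
        now = time.time() if now is None else now
        if self.last_participation is None:
            return True
        return now - self.last_participation >= self.min_separation_secs

    def record_participation(self, now=None):
        self.last_participation = time.time() if now is None else now
```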
Interestingly, the paper also addresses the role of pretraining on public data in improving the utility and privacy of production models, emphasizing that a careful pretraining regimen helps mitigate the utility drop that can occur in the subsequent private training rounds.
Importantly, the use of secure aggregation alongside DP-FTRL provides a robust defense against data inspection, ensuring that individual client updates cannot be examined, even internally, before aggregation. The accompanying sensitivity analysis for secure aggregation underscores the care required to keep model updates bounded within DP constraints after aggregation and scaling.
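The underlying calibration can be summarized with the standard relation below, assuming each client update is clipped to L2 norm C and Gaussian noise with multiplier z is added to the securely aggregated sum; the paper's SecAgg-specific discretization and scaling details are not reproduced here.

```latex
% Clipping each client update to L2 norm C bounds the sum's sensitivity
% to any single client by C, so the Gaussian noise on the sum is
\sigma = z \cdot C,
% and rescaling the sum by 1/n (averaging over n clients) leaves an
% effective noise of z \cdot C / n on the model update.
```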
Looking forward, this paper establishes a framework for continued exploration in areas of AI where privacy-preserving techniques are paramount. By prioritizing strong DP guarantees in LM training, its implications extend across AI applications, pointing toward a future where data privacy is integrated into AI systems without eroding model performance or scalability.
Overall, the findings and methodologies elucidated in the paper contribute meaningfully to the discipline, presenting actionable insights and innovative strategies for practitioners navigating the complexity of federated learning under strict privacy frameworks. This paper stands as a valuable resource for researchers striving to balance high utility and stringent privacy in AI systems, laying a foundation for future advancements in this critical area.