Improved Algorithms for Differentially Private LLM Alignment
Differential privacy (DP) has become a cornerstone for protecting sensitive data in machine learning, yet its integration with LLM alignment techniques has so far seen limited efficacy. The paper "Improved Algorithms for Differentially Private LLM Alignment" addresses this gap by introducing novel algorithms tailored to privacy-preserving alignment, focusing on two prominent techniques: Direct Preference Optimization (DPO) and Reinforcement Learning from Human Feedback (RLHF).
The authors propose a unified framework designed to safeguard the privacy of user data during model alignment while maintaining state-of-the-art performance. A key component of their approach is a new optimizer, DP-AdamW, which extends DP-Adam by incorporating decoupled weight decay. The new optimizer outperforms traditional DP-SGD, with experimental results indicating up to a 15% improvement in alignment quality under moderate privacy budgets (ε=2–5).
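Since the summary above does not spell out the update rule, the following is a minimal sketch of how a DP-AdamW step could look: per-sample gradients are clipped and noised as in DP-SGD, the Adam moments are computed on the privatized gradient, and decoupled weight decay is applied directly to the weights rather than being folded into the gradient. The function `dp_adamw_step`, its hyperparameters, and the NumPy setting are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of one DP-AdamW step (illustrative; names and hyperparameters
# are assumptions, not the paper's exact formulation).
import numpy as np

def dp_adamw_step(params, per_sample_grads, state, lr=1e-4, betas=(0.9, 0.999),
                  eps=1e-8, weight_decay=0.01, clip_norm=1.0, noise_multiplier=1.0):
    """params: 1-D array; per_sample_grads: (batch, dim); state: dict with t, m, v."""
    batch_size = per_sample_grads.shape[0]

    # 1. Clip each example's gradient so its influence is bounded by clip_norm.
    norms = np.linalg.norm(per_sample_grads, axis=1)
    scale = np.minimum(1.0, clip_norm / (norms + 1e-12))
    clipped = per_sample_grads * scale[:, None]

    # 2. Sum, add Gaussian noise calibrated to the clipping norm, then average.
    noise = np.random.normal(0.0, noise_multiplier * clip_norm, size=params.shape)
    grad = (clipped.sum(axis=0) + noise) / batch_size

    # 3. Adam moment updates on the privatized gradient.
    state["t"] += 1
    state["m"] = betas[0] * state["m"] + (1 - betas[0]) * grad
    state["v"] = betas[1] * state["v"] + (1 - betas[1]) * grad ** 2
    m_hat = state["m"] / (1 - betas[0] ** state["t"])
    v_hat = state["v"] / (1 - betas[1] ** state["t"])

    # 4. Decoupled weight decay: applied to the weights directly,
    #    not mixed into the noisy gradient (the AdamW idea).
    params = params - lr * (m_hat / (np.sqrt(v_hat) + eps) + weight_decay * params)
    return params, state

# Toy usage with random gradients standing in for per-sample backprop results.
dim, batch = 8, 4
params = np.zeros(dim)
state = {"t": 0, "m": np.zeros(dim), "v": np.zeros(dim)}
params, state = dp_adamw_step(params, np.random.randn(batch, dim), state)
```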
The experimental evaluation spans large-scale LLMs, including LLaMA-8B and GPT-2, and three differentially private optimizers (DP-Adam, DP-AdamW, and DP-SGD) across a range of privacy budgets. The results show that adaptive optimizers such as DP-AdamW substantially outperform DP-SGD, especially for large model architectures, which prove markedly more robust to privacy noise. Among the alignment algorithms tested, DPO consistently outperforms PPO in stability and performance metrics, achieving effective alignment while adhering to differential privacy constraints.
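For context on the DPO side of the comparison, the sketch below shows the standard (non-private) DPO loss that such a training loop would backpropagate through; the DP machinery (per-sample clipping and noising) sits on top of it in the optimizer. The function name `dpo_loss` and the random tensors are illustrative assumptions.

```python
# Standard DPO preference loss: -log sigmoid(beta * (chosen log-ratio - rejected log-ratio)).
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    # Log-probability ratios of the policy against the frozen reference model.
    chosen_ratio = policy_chosen_logps - ref_chosen_logps
    rejected_ratio = policy_rejected_logps - ref_rejected_logps
    logits = beta * (chosen_ratio - rejected_ratio)
    return -F.logsigmoid(logits).mean()

# Example with random log-probabilities standing in for model outputs.
torch.manual_seed(0)
loss = dpo_loss(torch.randn(4), torch.randn(4), torch.randn(4), torch.randn(4))
print(loss.item())
```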
These findings suggest that a favorable balance between privacy protection and model utility can be achieved at moderate privacy budgets. The paper distills practical guidance on selecting privacy budgets and optimizers, recommending DP-AdamW as particularly promising because it preserves alignment quality even under stringent privacy settings. The results also highlight the benefits of scaling model architectures, which show greater resilience to privacy-induced noise.
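To make the notion of a privacy budget concrete, the back-of-the-envelope sketch below converts a noise multiplier and step count into an (ε, δ) bound via Rényi DP composition of the Gaussian mechanism. It deliberately ignores subsampling amplification, so it overestimates ε relative to the accountants used in practice; the function name and parameter values are assumptions, not the paper's accounting.

```python
# Loose epsilon bound for T compositions of the Gaussian mechanism,
# via Renyi DP (ignores subsampling amplification, so it is an upper bound
# on what a proper DP-SGD accountant would report).
import math

def epsilon_upper_bound(noise_multiplier, steps, delta, orders=range(2, 128)):
    best = float("inf")
    for alpha in orders:
        # RDP of the Gaussian mechanism at order alpha, composed over `steps`.
        rdp = steps * alpha / (2 * noise_multiplier ** 2)
        # Standard RDP -> (epsilon, delta) conversion.
        eps = rdp + math.log(1 / delta) / (alpha - 1)
        best = min(best, eps)
    return best

print(epsilon_upper_bound(noise_multiplier=1.0, steps=1000, delta=1e-5))
```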
Future directions proposed by the authors include hybrid optimization strategies that combine the strengths of multiple optimizers, dynamically adapting privacy budgets during training, and extending these methods to even larger model architectures. Exploring alternative privacy-preserving mechanisms and integrating model compression techniques could further improve the scalability and efficiency of private LLM alignment.
By introducing new algorithms and a comprehensive evaluation framework, this research advances the practical deployment of privately aligned LLMs. The practical guidelines it establishes for balancing privacy and alignment quality mark an important step toward robust privacy protection without compromising the utility of aligned models.