Can LLMs get help from other LLMs without revealing private information? (2404.01041v2)
Abstract: Cascades are a common type of machine learning system in which a large, remote model can be queried if a local model is not able to accurately label a user's data by itself. Serving stacks for LLMs increasingly use cascades due to their ability to preserve task performance while dramatically reducing inference costs. However, applying cascade systems in situations where the local model has access to sensitive data constitutes a significant privacy risk for users, since such data could be forwarded to the remote model. In this work, we show the feasibility of applying cascade systems in such setups by equipping the local model with privacy-preserving techniques that reduce the risk of leaking private information when querying the remote model. To quantify information leakage in such setups, we introduce two privacy measures. We then propose a system that leverages the recently introduced social learning paradigm, in which LLMs collaboratively learn from each other by exchanging natural language. Using this paradigm, we demonstrate on several datasets that our methods minimize privacy loss while simultaneously improving task performance compared to a non-cascade baseline.
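To make the cascade setup concrete, below is a minimal sketch in Python. The `local_model`, `remote_model`, and `sanitize` callables and the confidence-threshold deferral rule are illustrative assumptions, not the paper's actual interfaces: the abstract does not specify how the deferral decision or the privacy-preserving rewrite is implemented.

```python
# Minimal sketch of a privacy-preserving cascade. All names here
# (local_model, remote_model, sanitize, confidence_threshold) are
# hypothetical placeholders, not the paper's actual implementation.
from typing import Callable, Tuple


def run_cascade(
    query: str,
    local_model: Callable[[str], Tuple[str, float]],  # -> (answer, confidence)
    remote_model: Callable[[str], str],               # -> answer
    sanitize: Callable[[str], str],                   # privacy-preserving rewrite
    confidence_threshold: float = 0.8,
) -> str:
    # 1. Try to answer locally; sensitive data never leaves the device.
    answer, confidence = local_model(query)
    if confidence >= confidence_threshold:
        return answer

    # 2. Otherwise, rewrite the query before forwarding it, e.g. by having
    #    the local model produce a similar but non-sensitive example, in the
    #    spirit of the social learning paradigm (exchanging natural language
    #    rather than raw user data).
    safe_query = sanitize(query)

    # 3. Only the sanitized text is sent to the remote model.
    return remote_model(safe_query)
```

Under this decomposition, the paper's two privacy measures would quantify how much information about the original `query` still leaks through `safe_query`.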