- The paper introduces PRIV-QA, a multi-stage framework that safeguards user privacy during cloud-based QA by sanitizing sensitive information.
- It utilizes fine-tuned models to detect, substitute, and preserve key words while ensuring the original query context remains intact.
- Experiments on the SensitiveQA dataset show high recall (89.40%, English) for sensitive-information detection and an 85.83% defense rate against extraction attacks, with moderate latency overhead.
This paper introduces PRIV-QA, a framework designed to protect user privacy when interacting with cloud-based LLMs for question-answering tasks (2502.13564). The core problem addressed is the risk of exposing sensitive personal information when user queries, often containing background context and specific questions, are sent to third-party LLM providers.
To facilitate research and evaluation in this area, the authors first construct SensitiveQA, a new bilingual (Chinese and English) dataset. It contains over 57,000 interactions, each comprising background text rich in personal sensitive information (names, dates, locations, personal details, sensitive numbers) and a related question (covering tasks like information extraction, open-ended QA, summarization).
The proposed PRIV-QA framework operates as a pipeline with two main modules:
- Hide Module (H): This module processes the user query (X) before sending it to the cloud LLM. It employs a multi-stage text sanitization strategy based on classifying words/tokens into three levels: High-Risk, Low-Risk, and Key-Words.
  - Sensitive Information Detection: A fine-tuned generative model (SenM, based on Qwen2-0.5B-Chat) identifies "High-Risk" words containing sensitive information according to GDPR guidelines. To handle long texts, the input query is split into chunks, processed individually, and the results are aggregated.
  - Sensitive Words Substitution: Another model (SubM, also Qwen2-0.5B-Chat) replaces each detected sensitive word (si) with a semantically similar but distinct placeholder word (pi). This creates a privacy-protected version of the query (Xs). The substitution pairs (si:pi) are stored.
  - Important Words Preservation: A third model (ImpM, Qwen2-0.5B-Chat) identifies "Key-Words" crucial for understanding the query's context and intent, ensuring they are not obfuscated.
  - (Optional) Non-Privacy Text Obfuscation: For enhanced protection, remaining "Low-Risk" tokens (excluding Key-Words and placeholders pi) can be further obfuscated using a token substitution method based on differential privacy principles (similar to InferDPT (2310.12214)), generating the final query X′ sent to the cloud.
- Recover Module (R): After the cloud LLM processes the sanitized query X′ and returns a response A′, this module restores the original meaning.
  - A generative model (RcvM, based on Qwen2-1.5B-Chat) takes the original query X, the sanitized query X′, and the LLM's response A′ as input.
  - It restores the original sensitive words (si) by reversing the substitution (pi→si) and corrects potential reasoning errors or inaccuracies introduced in A′ due to the sanitization process.
  - The output is the final, corrected response A presented to the user.
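The substitution bookkeeping shared by the Hide and Recover modules can be illustrated with a minimal sketch. The `hide`/`recover` functions and the rule-based `detect`/`substitute` stand-ins below are hypothetical simplifications, not the paper's fine-tuned SenM/SubM models:

```python
def hide(query, detect_sensitive, propose_placeholder):
    """Replace each detected sensitive word s_i with a placeholder p_i,
    returning the sanitized query and the stored substitution pairs."""
    pairs = {}
    sanitized = query
    for word in detect_sensitive(query):          # stand-in for SenM
        placeholder = propose_placeholder(word)   # stand-in for SubM
        pairs[word] = placeholder
        sanitized = sanitized.replace(word, placeholder)
    return sanitized, pairs

def recover(response, pairs):
    """Reverse the substitution (p_i -> s_i) in the cloud response."""
    for word, placeholder in pairs.items():
        response = response.replace(placeholder, word)
    return response

# Toy rule-based stand-ins for the fine-tuned models
detect = lambda text: [w for w in ("Alice", "Paris") if w in text]
substitute = {"Alice": "Carol", "Paris": "Lyon"}.get

x_s, pairs = hide("Alice lives in Paris.", detect, substitute)  # "Carol lives in Lyon."
a = recover("Carol's city is Lyon.", pairs)                     # "Alice's city is Paris."
```

Note that the paper's RcvM additionally corrects reasoning errors, which a pure string reversal like this cannot do.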
The workflow is depicted in Algorithm 1 and Figure 3 of the paper.
Algorithm PRIV-QA Workflow (Simplified):

```text
Input: user query X (background T + question Q)
# --- Hide Module ---
Split X into chunks x_i
Detect sensitive words S = Union(Sen_M(x_i)) over all chunks
Generate substitution pairs P = Sub_M(S) = {(s_i : p_i)}
Substitute sensitive words in X using P -> X_s
Identify important words I = Imp_M(X_s)
(Optional) Obfuscate non-private tokens in T_s (excluding I, P) -> T_{s,o}
Construct final sanitized query X' = T_{s,o} + Q_s (or T_s + Q_s if no obfuscation)
# --- Cloud Interaction ---
Send X' to the cloud LLM -> response A' = LLM(X')
# --- Recover Module ---
Recover original info and correct errors -> A = Rcv_M(X', X, A')
Output: final response A
```
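The steps above can be sketched in Python. Every callable here (`sen_m`, `sub_m`, `imp_m`, `rcv_m`, `cloud_llm`, `obfuscate`) is a hypothetical stand-in for the paper's fine-tuned models and APIs, and the recovery step is approximated by passing the stored pairs rather than running a generative model:

```python
def priv_qa(query, sen_m, sub_m, imp_m, rcv_m, cloud_llm,
            obfuscate=None, chunk_size=512):
    """Hedged sketch of the PRIV-QA pipeline; all model calls are stand-ins."""
    # --- Hide module ---
    # Chunk the query so a small detection model can handle long inputs.
    chunks = [query[i:i + chunk_size] for i in range(0, len(query), chunk_size)]
    sensitive = set()
    for chunk in chunks:                          # SenM: detect High-Risk words
        sensitive |= set(sen_m(chunk))
    pairs = {s: sub_m(s) for s in sensitive}      # SubM: s_i -> p_i
    x_s = query
    for s, p in pairs.items():
        x_s = x_s.replace(s, p)
    keep = set(imp_m(x_s)) | set(pairs.values())  # ImpM: Key-Words to preserve
    # Optional DP-style obfuscation of remaining Low-Risk tokens.
    x_prime = obfuscate(x_s, keep) if obfuscate else x_s
    # --- Cloud interaction ---
    a_prime = cloud_llm(x_prime)
    # --- Recover module ---
    # RcvM sees (X', X, A'); this simplification also hands it the pairs.
    return rcv_m(x_prime, query, a_prime, pairs)
```

In the paper the recovery model both reverses the substitution and repairs sanitization-induced errors; a deployment would plug the four fine-tuned checkpoints into these slots.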
Implementation and Evaluation:
- The SenM, SubM, and ImpM models were fine-tuned from Qwen2-0.5B-Chat, and RcvM from Qwen2-1.5B-Chat.
- Experiments were conducted using GPT-4-turbo and Qwen-Plus as cloud LLMs.
- Evaluation on the SensitiveQA dataset showed:
  - High performance in sensitive information detection (e.g., 89.40% recall for English).
  - Strong query protection, measured by Extraction Defense Rate (EDR): PRIV-QA resisted 85.83% of extraction attacks in English with obfuscation enabled.
  - High quality of recovered responses, outperforming baselines (CUSTEXT+, SANTEXT+, HaS) on BLEU, METEOR, ROUGE, and model-based evaluation with GPT-4o. For example, PRIV-QA achieved a BLEU score of 0.563 (English, GPT-4-turbo with obfuscation) and a 74.49% win+tie rate against the ground truth.
- The framework demonstrates a favorable trade-off between privacy protection (security) and the utility/quality of the final response.
- Time analysis indicated a latency overhead of roughly 30-60%, with the relative overhead shrinking for longer inputs/outputs.
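The EDR metric reported above can be made concrete with a small sketch. The definition used here (an attack is "defended" if its output recovers none of the original sensitive words) is an assumption for illustration, not necessarily the paper's exact scoring rule:

```python
def extraction_defense_rate(attack_outputs, sensitive_sets):
    """Assumed EDR: share of attacks whose output contains none of the
    original sensitive words from the corresponding query."""
    defended = sum(
        1 for out, sens in zip(attack_outputs, sensitive_sets)
        if not any(s in out for s in sens)
    )
    return defended / len(attack_outputs)

# Toy check: one of two simulated attacks leaks a sensitive word.
rate = extraction_defense_rate(
    ["no names recovered", "the user is Alice"],
    [{"Alice"}, {"Alice"}],
)  # 0.5
```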
Contributions:
- The SensitiveQA dataset for privacy-preserving QA research.
- The PRIV-QA framework, offering a practical multi-stage approach to balance privacy and response quality for cloud LLMs.
- Demonstrated effectiveness through comprehensive experiments.