You Don't Know My Favorite Color: Preventing Dialogue Representations from Revealing Speakers' Private Personas (2205.10228v1)

Published 26 Apr 2022 in cs.CL, cs.CR, and cs.LG

Abstract: Social chatbots, also known as chit-chat chatbots, are evolving rapidly with large pretrained language models. Despite this progress, privacy concerns have arisen recently: the training data of large language models can be extracted via model inversion attacks. Moreover, the datasets used for training chatbots contain many private conversations between two individuals. In this work, we further investigate the privacy leakage of the hidden states of chatbots trained by language modeling, which has not been well studied yet. We show that speakers' personas can be inferred with high accuracy through a simple neural network. To address this, we propose effective defense objectives that prevent persona leakage from hidden states. We conduct extensive experiments to demonstrate that our proposed defense objectives can greatly reduce the attack accuracy from 37.6% to 0.5%. Meanwhile, the proposed objectives preserve language models' powerful generation ability.

Authors (3)
  1. Haoran Li (166 papers)
  2. Yangqiu Song (196 papers)
  3. Lixin Fan (77 papers)
Citations (16)