Information Leakage in Embedding Models: An Analysis
The paper "Information Leakage in Embedding Models" addresses crucial privacy concerns associated with embedding models. As machine learning systems increasingly rely on embeddings for transfer learning, understanding potential leaks of sensitive information through embeddings becomes vital. This paper provides a comprehensive analysis of three privacy threats in embedding models: embedding inversion, sensitive attribute inference, and membership inference.
Key Insights and Methods
Embedding Inversion: The authors demonstrate that embeddings are susceptible to inversion attacks that partially reconstruct the original input. Both white-box and black-box scenarios are explored, showing that embedding models retain more information about the raw input than a purely semantic representation would require. Notably, the inversion attacks recover between 50% and 70% of the input words from a sentence embedding, as measured by F1 score. The techniques include gradient-based optimization against white-box models and a learning-based inversion model in the black-box setting, underscoring how directly the raw input is encoded in the embedding.
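To make the black-box setting concrete, here is a minimal sketch of a learning-based inversion attack, assuming the adversary can query the target encoder to collect (embedding, input) pairs and then trains a multi-label classifier to predict which vocabulary words appeared in the input. All class and function names are illustrative, and the paper's actual attack architecture is more elaborate; treat this as a simplified instance of the idea.

```python
# Simplified sketch of a learning-based (black-box) embedding inversion attack.
# Assumption: the attacker queries the target encoder to build a dataset of
# (embedding, multi-hot word presence) pairs, then trains a classifier that
# predicts which vocabulary words were present in the original input.
import torch
import torch.nn as nn

class InversionMLP(nn.Module):
    def __init__(self, emb_dim: int, vocab_size: int, hidden: int = 512):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(emb_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, vocab_size),  # one logit per vocabulary word
        )

    def forward(self, emb: torch.Tensor) -> torch.Tensor:
        return self.net(emb)

def train_step(model, optimizer, embeddings, word_targets):
    """embeddings: (B, emb_dim); word_targets: (B, vocab_size) multi-hot floats."""
    optimizer.zero_grad()
    logits = model(embeddings)
    # Multi-label objective: each word's presence is predicted independently.
    loss = nn.functional.binary_cross_entropy_with_logits(logits, word_targets)
    loss.backward()
    optimizer.step()
    return loss.item()

def recover_words(model, emb, threshold=0.5):
    """Return predicted word ids for one embedding (the recovered word set)."""
    with torch.no_grad():
        probs = torch.sigmoid(model(emb))
    return (probs > threshold).nonzero(as_tuple=True)[-1]
```

Recovered word ids can then be scored against the true input words with precision, recall, and F1, mirroring the evaluation described above.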
Attribute Inference: The paper illustrates how embeddings may inadvertently reveal sensitive attributes of the input that are irrelevant to the intended task, such as authorship. Using dual-encoder models trained with contrastive learning as a case study, the authors show that even subtle latent classes, like author identity, can be inferred from the embeddings. An adversary needs only a small labeled set to train a classifier that predicts the attribute from embeddings, significantly outperforming conventional baselines given the same limited supervision.
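As an illustration, below is a minimal sketch of such an attribute-inference attack. It assumes black-box access to the embedding model through a hypothetical embed(text) helper and a small labeled set of (text, author) pairs; logistic regression here stands in for whatever classifier the adversary prefers.

```python
# Simplified sketch of a sensitive-attribute inference attack.
# Assumptions: `embed` is a hypothetical helper wrapping black-box access to
# the target embedding model; `labeled_texts`/`labels` form the adversary's
# small supervised set (e.g. texts with known authors).
import numpy as np
from sklearn.linear_model import LogisticRegression

def attribute_inference_attack(embed, labeled_texts, labels, target_texts):
    """Train a simple classifier on embeddings to predict a sensitive
    attribute (e.g. author identity), then apply it to unseen inputs."""
    X_train = np.stack([embed(t) for t in labeled_texts])
    clf = LogisticRegression(max_iter=1000).fit(X_train, labels)
    X_target = np.stack([embed(t) for t in target_texts])
    return clf.predict(X_target)  # inferred attribute for each target input
```

The point of the sketch is how little machinery is required: the leakage comes from the embeddings themselves, not from a sophisticated attack model.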
Membership Inference: Inferring whether particular data was part of the training set is a well-recognized privacy risk, typically evaluated through membership inference attacks. The paper extends this analysis to embedding models, showing that even word-level embeddings leak membership information, especially for infrequent training data. Using simple similarity metrics, the authors demonstrate roughly a 30% improvement over random guessing on membership inference, indicating a nontrivial degree of data memorization.
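A similarity-threshold attack of this kind can be sketched as follows. This is a simplified illustration, assuming a hypothetical embed(...) helper and that higher average similarity among the embedded parts of a candidate record (for instance, words from the same training window) signals memorization; the threshold would have to be calibrated on data known to be inside and outside the training set.

```python
# Simplified sketch of a similarity-based membership inference attack.
# Assumption: if a record's parts were seen together during training, their
# embeddings tend to be more similar than for unseen records.
import numpy as np

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def membership_score(embed, record_parts):
    """Average pairwise cosine similarity of the embedded parts of one record."""
    vecs = [embed(p) for p in record_parts]
    sims = [cosine(vecs[i], vecs[j])
            for i in range(len(vecs)) for j in range(i + 1, len(vecs))]
    return float(np.mean(sims))

def infer_membership(embed, record_parts, threshold=0.8):
    # The threshold is a placeholder; it would be tuned on known member and
    # non-member records rather than fixed a priori.
    return membership_score(embed, record_parts) >= threshold
```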
Implications and Future Directions
This research uncovers substantial privacy risks in embedding models and urges more robust defenses in their deployment. The ability to recover sensitive input data and to infer latent attributes should inform the development of embedding architectures with privacy enforced by design. Notably, the authors propose adversarial training defenses that reduce information leakage, albeit with some utility loss, marking an initial step towards more secure practice.
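To give a flavor of how such a defense might be wired up, here is a minimal PyTorch-style sketch of adversarial training against attribute leakage: an auxiliary adversary tries to predict the sensitive attribute from the embedding, and the encoder is penalized whenever it succeeds. The module names, the single lambda weight, and the alternating update schedule are assumptions for illustration rather than the paper's exact formulation.

```python
# Illustrative sketch of an adversarial-training defense against attribute
# leakage. Assumptions: `encoder`, `task_head`, and `adversary` are nn.Modules;
# `opt_main` optimizes encoder + task head, `opt_adv` optimizes the adversary.
import torch
import torch.nn as nn

def defense_step(encoder, task_head, adversary, opt_main, opt_adv,
                 x, task_labels, sensitive_labels, lam=1.0):
    ce = nn.functional.cross_entropy

    # 1) Train the adversary to predict the sensitive attribute from embeddings.
    with torch.no_grad():
        emb = encoder(x)
    adv_loss = ce(adversary(emb), sensitive_labels)
    opt_adv.zero_grad()
    adv_loss.backward()
    opt_adv.step()

    # 2) Train encoder + task head: do well on the task while fooling the
    #    adversary (stale adversary gradients are cleared at the next call).
    emb = encoder(x)
    task_loss = ce(task_head(emb), task_labels)
    leakage = ce(adversary(emb), sensitive_labels)
    total = task_loss - lam * leakage  # push the adversary's loss up
    opt_main.zero_grad()
    total.backward()
    opt_main.step()
    return task_loss.item(), leakage.item()
```

The lambda weight makes the privacy-utility trade-off explicit: larger values suppress more attribute information from the embedding at a greater cost to downstream task performance.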
The theoretical and empirical insights suggest several avenues for future work. Researchers may explore embedding techniques that naturally obscure sensitive aspects of the input, or apply privacy-preserving paradigms such as differential privacy in ways that scale better. As neural architectures continue to advance, new models will need comprehensive evaluation under the same privacy scrutiny.
In conclusion, while embeddings are versatile tools for downstream ML tasks, they must be handled with care where privacy is concerned. Models trained on sensitive data should incorporate safeguards against leakage, helping preserve public and organizational trust in AI deployments. This paper's contributions provide a foundational understanding of these risks and of potential defensive strategies.