Training Natural Language Processing Models on Encrypted Text for Enhanced Privacy (2305.03497v1)

Published 3 May 2023 in cs.CL, cs.AI, and cs.CR

Abstract: With the increasing use of cloud-based services for training and deploying machine learning models, data privacy has become a major concern. This is particularly important for NLP models, which often process sensitive information such as personal communications and confidential documents. In this study, we propose a method for training NLP models on encrypted text data to mitigate data privacy concerns while maintaining similar performance to models trained on non-encrypted data. We demonstrate our method using two different architectures, namely Doc2Vec+XGBoost and Doc2Vec+LSTM, and evaluate the models on the 20 Newsgroups dataset. Our results indicate that both encrypted and non-encrypted models achieve comparable performance, suggesting that our encryption method is effective in preserving data privacy without sacrificing model accuracy. In order to replicate our experiments, we have provided a Colab notebook at the following address: https://t.ly/lR-TP
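The abstract names the Doc2Vec+XGBoost pipeline and the 20 Newsgroups dataset but does not describe the encryption scheme itself. The sketch below is a minimal illustration of that pipeline under one plausible assumption: a deterministic token-level substitution (hashing each token), which keeps the corpus's co-occurrence structure intact for Doc2Vec. The `encrypt_token` helper and all hyperparameters are illustrative stand-ins, not taken from the paper.

```python
# Minimal sketch of a Doc2Vec+XGBoost pipeline on 20 Newsgroups, trained on
# "encrypted" text. encrypt_token() is a hypothetical stand-in for the paper's
# unspecified encryption: a deterministic substitution that maps each plaintext
# token to a fixed ciphertext token, preserving co-occurrence statistics.
import hashlib

import numpy as np
from sklearn.datasets import fetch_20newsgroups
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from gensim.models.doc2vec import Doc2Vec, TaggedDocument
from gensim.utils import simple_preprocess
from xgboost import XGBClassifier

def encrypt_token(token: str) -> str:
    # Same plaintext token always yields the same ciphertext token.
    return hashlib.sha256(token.encode("utf-8")).hexdigest()[:12]

data = fetch_20newsgroups(subset="all", remove=("headers", "footers", "quotes"))
docs = [[encrypt_token(t) for t in simple_preprocess(text)] for text in data.data]

X_train, X_test, y_train, y_test = train_test_split(
    docs, data.target, test_size=0.2, random_state=42
)

# Train Doc2Vec on the encrypted corpus, then embed both splits.
tagged = [TaggedDocument(words=d, tags=[i]) for i, d in enumerate(X_train)]
d2v = Doc2Vec(tagged, vector_size=100, epochs=20, min_count=2, workers=4)
train_vecs = np.array([d2v.dv[i] for i in range(len(X_train))])
test_vecs = np.array([d2v.infer_vector(d) for d in X_test])

# Classify the document embeddings; hyperparameters are illustrative.
clf = XGBClassifier(n_estimators=200, max_depth=6)
clf.fit(train_vecs, y_train)
print("accuracy:", accuracy_score(y_test, clf.predict(test_vecs)))
```

Because this substitution is one-to-one, the encrypted corpus has the same distributional structure as the plaintext, which would be consistent with the paper's finding that encrypted and non-encrypted models achieve comparable accuracy.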

Authors (2)
  1. Ceren Ocal Tasar
  2. Davut Emre Tasar
