Training Natural Language Processing Models on Encrypted Text for Enhanced Privacy (2305.03497v1)

Published 3 May 2023 in cs.CL, cs.AI, and cs.CR

Abstract: With the increasing use of cloud-based services for training and deploying machine learning models, data privacy has become a major concern. This is particularly important for NLP models, which often process sensitive information such as personal communications and confidential documents. In this study, we propose a method for training NLP models on encrypted text data to mitigate data privacy concerns while maintaining similar performance to models trained on non-encrypted data. We demonstrate our method using two different architectures, namely Doc2Vec+XGBoost and Doc2Vec+LSTM, and evaluate the models on the 20 Newsgroups dataset. Our results indicate that both encrypted and non-encrypted models achieve comparable performance, suggesting that our encryption method is effective in preserving data privacy without sacrificing model accuracy. In order to replicate our experiments, we have provided a Colab notebook at the following address: https://t.ly/lR-TP
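The abstract names the Doc2Vec+XGBoost pipeline and the 20 Newsgroups dataset but does not describe the encryption scheme itself. The sketch below is a minimal illustration of that pipeline under one plausible assumption: a deterministic token-level substitution (hashing each token), which keeps the corpus's co-occurrence structure intact for Doc2Vec. The `encrypt_token` helper and all hyperparameters are illustrative stand-ins, not taken from the paper.

```python
# Minimal sketch of a Doc2Vec+XGBoost pipeline on 20 Newsgroups, trained on
# "encrypted" text. encrypt_token() is a hypothetical stand-in for the paper's
# unspecified encryption: a deterministic substitution that maps each plaintext
# token to a fixed ciphertext token, preserving co-occurrence statistics.
import hashlib

import numpy as np
from sklearn.datasets import fetch_20newsgroups
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from gensim.models.doc2vec import Doc2Vec, TaggedDocument
from gensim.utils import simple_preprocess
from xgboost import XGBClassifier

def encrypt_token(token: str) -> str:
    # Same plaintext token always yields the same ciphertext token.
    return hashlib.sha256(token.encode("utf-8")).hexdigest()[:12]

data = fetch_20newsgroups(subset="all", remove=("headers", "footers", "quotes"))
docs = [[encrypt_token(t) for t in simple_preprocess(text)] for text in data.data]

X_train, X_test, y_train, y_test = train_test_split(
    docs, data.target, test_size=0.2, random_state=42
)

# Train Doc2Vec on the encrypted corpus, then embed both splits.
tagged = [TaggedDocument(words=d, tags=[i]) for i, d in enumerate(X_train)]
d2v = Doc2Vec(tagged, vector_size=100, epochs=20, min_count=2, workers=4)
train_vecs = np.array([d2v.dv[i] for i in range(len(X_train))])
test_vecs = np.array([d2v.infer_vector(d) for d in X_test])

# Classify the document embeddings; hyperparameters are illustrative.
clf = XGBClassifier(n_estimators=200, max_depth=6)
clf.fit(train_vecs, y_train)
print("accuracy:", accuracy_score(y_test, clf.predict(test_vecs)))
```

Because this substitution is one-to-one, the encrypted corpus has the same distributional structure as the plaintext, which would be consistent with the paper's finding that encrypted and non-encrypted models achieve comparable accuracy.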

Authors (2)
  1. Ceren Ocal Tasar
  2. Davut Emre Tasar
