Evaluating the Efficacy of AI Techniques in Textual Anonymization: A Comparative Study (2405.06709v1)
Abstract: In the digital era, with escalating privacy concerns, it is imperative to devise robust strategies that protect private data while maintaining the intrinsic value of textual information. This research embarks on a comprehensive examination of text anonymisation methods, focusing on Conditional Random Fields (CRF), Long Short-Term Memory (LSTM) networks, Embeddings from Language Models (ELMo), and the transformative capabilities of the Transformer architecture. Each model presents unique strengths: LSTMs model long-term dependencies, CRFs capture dependencies among word sequences, ELMo delivers contextual word representations using deep bidirectional language models, and Transformers introduce self-attention mechanisms that provide enhanced scalability. Our study is positioned as a comparative analysis of these models, emphasising their synergistic potential in addressing text anonymisation challenges. Preliminary results indicate that CRF, LSTM, and ELMo individually outperform traditional methods. The inclusion of Transformers, when compared alongside the other models, offers a broader perspective on achieving optimal text anonymisation in contemporary settings.
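All of the compared models (CRF, LSTM, ELMo, Transformer) can be framed as sequence taggers that emit per-token entity labels, which a shared post-processing step then turns into an anonymised text. As a minimal sketch of that final step, the hypothetical function below (not from the paper) replaces each BIO-tagged entity span with a category placeholder; the tags are assumed to come from any of the taggers above.

```python
# Hypothetical post-processing shared by all compared taggers:
# given tokens and their BIO tags (e.g. from a CRF, BiLSTM, ELMo,
# or Transformer model), replace each entity span with a placeholder.

def anonymize(tokens, bio_tags):
    """Collapse each B-X/I-X span into a single [X] placeholder."""
    out = []
    for token, tag in zip(tokens, bio_tags):
        if tag.startswith("B-"):
            out.append(f"[{tag[2:]}]")   # start of an entity span
        elif tag.startswith("I-"):
            continue                      # continuation: already replaced
        else:
            out.append(token)             # O tag: keep the token
    return " ".join(out)

tokens = ["John", "Smith", "lives", "in", "Oslo", "."]
tags = ["B-PER", "I-PER", "O", "O", "B-LOC", "O"]
print(anonymize(tokens, tags))  # → [PER] lives in [LOC] .
```

The comparative question the paper studies is, in this framing, which tagger produces the most accurate BIO labels, since the masking step itself is model-agnostic.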