Evaluating the Efficacy of AI Techniques in Textual Anonymization: A Comparative Study (2405.06709v1)

Published 9 May 2024 in cs.CL and cs.AI

Abstract: In the digital era, with escalating privacy concerns, it is imperative to devise robust strategies that protect private data while maintaining the intrinsic value of textual information. This research embarks on a comprehensive examination of text anonymisation methods, focusing on Conditional Random Fields (CRF), Long Short-Term Memory (LSTM), Embeddings from Language Models (ELMo), and the transformative capabilities of the Transformer architecture. Each model presents unique strengths: LSTM excels at modeling long-term dependencies, CRF captures dependencies among word sequences, ELMo delivers contextual word representations using deep bidirectional language models, and Transformers introduce self-attention mechanisms that provide enhanced scalability. Our study is positioned as a comparative analysis of these models, emphasising their synergistic potential in addressing text anonymisation challenges. Preliminary results indicate that CRF, LSTM, and ELMo individually outperform traditional methods. The inclusion of Transformers, compared alongside the other models, offers a broader perspective on achieving optimal text anonymisation in contemporary settings.
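
The paper's concrete pipeline is not reproduced on this page. As a hedged illustration of the Transformer-based direction the abstract describes, the sketch below frames anonymisation as named-entity recognition followed by redaction, using the Hugging Face transformers library; the checkpoint dslim/bert-base-NER and the bracket-label redaction scheme are illustrative assumptions, not the authors' configuration.

```python
# Minimal sketch: transformer-based NER followed by redaction, as one way
# to realise text anonymisation. Assumes the Hugging Face `transformers`
# library; the checkpoint and redaction format are illustrative, not the
# paper's actual setup.
from transformers import pipeline

ner = pipeline(
    "ner",
    model="dslim/bert-base-NER",    # hypothetical model choice
    aggregation_strategy="simple",  # merge sub-word tokens into entity spans
)

def anonymise(text: str) -> str:
    """Replace each detected entity span with its entity-type label."""
    entities = ner(text)
    # Redact right-to-left so earlier character offsets stay valid.
    for ent in sorted(entities, key=lambda e: e["start"], reverse=True):
        text = text[: ent["start"]] + f"[{ent['entity_group']}]" + text[ent["end"]:]
    return text

print(anonymise("John Smith flew from Athens to Berlin on 9 May 2024."))
# Expected shape of output: "[PER] flew from [LOC] to [LOC] on 9 May 2024."
```

A CRF-, LSTM-, or ELMo-based variant would swap out the tagger while keeping the same detect-then-redact structure, which is what makes the comparative framing across these models natural.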

