Papers
Topics
Authors
Recent
Gemini 2.5 Flash
Gemini 2.5 Flash
125 tokens/sec
GPT-4o
53 tokens/sec
Gemini 2.5 Pro Pro
42 tokens/sec
o3 Pro
4 tokens/sec
GPT-4.1 Pro
47 tokens/sec
DeepSeek R1 via Azure Pro
28 tokens/sec
2000 character limit reached

Towards De-identification of Legal Texts (1910.03739v1)

Published 9 Oct 2019 in cs.CL

Abstract: In many countries, personal information that can be published or shared between organizations is regulated and, therefore, documents must undergo a process of de-identification to eliminate or obfuscate confidential data. Our work focuses on the de-identification of legal texts, where the goal is to hide the names of the actors involved in a lawsuit without losing the sense of the story. We present a first evaluation on our corpus of NLP tools in tasks such as segmentation, tokenization and recognition of named entities, and we analyze several evaluation measures for our de-identification task. Results are meager: 84% of the documents have at least one name not covered by NER tools, something that might lead to the re-identification of involved names. We conclude that tools must be strongly adapted for processing texts of this particular domain.

Citations (1)

Summary

We haven't generated a summary for this paper yet.