Multilingual Normalization of Temporal Expressions with Masked Language Models (2205.10399v2)
Published 20 May 2022 in cs.CL and cs.LG
Abstract: The detection and normalization of temporal expressions is an important task and preprocessing step for many applications. However, prior work on normalization is rule-based, which severely limits the applicability in real-world multilingual settings, due to the costly creation of new rules. We propose a novel neural method for normalizing temporal expressions based on masked language modeling. Our multilingual method outperforms prior rule-based systems in many languages, and in particular, for low-resource languages with performance improvements of up to 33 F1 on average compared to the state of the art.