CJaFr-v3 : A Freely Available Filtered Japanese-French Aligned Corpus (2208.13170v1)
Published 28 Aug 2022 in cs.CL
Abstract: We present a freely available Japanese-French parallel corpus. It contains 15M aligned segments and was obtained by compiling and filtering several existing resources. In this paper, we describe those resources, their quantity and quality, the filtering we applied to improve the quality of the corpus, and the content of the ready-to-use corpus. We also evaluate the usefulness of the corpus and the quality of our filtering by training and evaluating standard MT systems on it.