Metric Learning in Multilingual Sentence Similarity Measurement for Document Alignment (2108.09495v1)
Abstract: Document alignment techniques based on multilingual sentence representations have recently shown state-of-the-art results. However, these techniques rely on unsupervised distance measurement techniques, which cannot be fine-tuned to the task at hand. In this paper, instead of these unsupervised distance measurement techniques, we employ Metric Learning to derive task-specific distance measurements. These measurements are supervised: the distance metric is trained on a parallel dataset. Using a dataset covering English, Sinhala, and Tamil, which belong to three different language families, we show that these task-specific, supervised distance metrics outperform their unsupervised counterparts for document alignment.
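The contrast described in the abstract, between an unsupervised distance and a distance learned from parallel data, can be illustrated with a small sketch. This is not the paper's method. It uses KISSME (Köstinger et al., 2012), a closed-form Mahalanobis metric-learning technique, chosen only because it is short. Cosine distance stands in for the unsupervised baseline. The embeddings are synthetic and hypothetical: a shared "meaning" part plus a large language-specific part. All function names are illustrative.

```python
import numpy as np


def make_parallel_embeddings(n, rng, d_meaning=4, d_lang=12):
    """Hypothetical synthetic 'sentence embeddings': a shared meaning part
    plus a large language-specific part that differs between src and tgt."""
    meaning = rng.normal(size=(n, d_meaning))
    src = np.hstack([meaning, 3.0 * rng.normal(size=(n, d_lang))])
    tgt = np.hstack([meaning + 0.1 * rng.normal(size=(n, d_meaning)),
                     3.0 * rng.normal(size=(n, d_lang))])
    return src, tgt


def learn_kissme_metric(src, tgt, rng, eps=1e-3):
    """KISSME-style Mahalanobis matrix M = inv(cov_sim) - inv(cov_dis),
    projected onto the PSD cone. Parallel pairs form the 'similar' set;
    randomly shuffled pairs stand in for the 'dissimilar' set."""
    d = src.shape[1]
    diff_sim = src - tgt
    diff_dis = src - tgt[rng.permutation(len(tgt))]
    cov_sim = diff_sim.T @ diff_sim / len(diff_sim) + eps * np.eye(d)
    cov_dis = diff_dis.T @ diff_dis / len(diff_dis) + eps * np.eye(d)
    m = np.linalg.inv(cov_sim) - np.linalg.inv(cov_dis)
    # Clip negative eigenvalues so (x - y)^T M (x - y) is never negative.
    vals, vecs = np.linalg.eigh(m)
    return (vecs * np.clip(vals, 0.0, None)) @ vecs.T


def mahalanobis_distances(src, tgt, m):
    """All-pairs squared Mahalanobis distance (x - y)^T M (x - y)."""
    diff = src[:, None, :] - tgt[None, :, :]
    return np.einsum("ijk,kl,ijl->ij", diff, m, diff)


def cosine_distances(src, tgt):
    """All-pairs unsupervised baseline: 1 - cosine similarity."""
    s = src / np.linalg.norm(src, axis=1, keepdims=True)
    t = tgt / np.linalg.norm(tgt, axis=1, keepdims=True)
    return 1.0 - s @ t.T


def retrieval_accuracy(dist):
    """Fraction of source rows whose nearest target is the true translation."""
    return float(np.mean(np.argmin(dist, axis=1) == np.arange(len(dist))))


rng = np.random.default_rng(0)
train_src, train_tgt = make_parallel_embeddings(500, rng)
test_src, test_tgt = make_parallel_embeddings(200, rng)

# The metric is fitted on training pairs and evaluated on held-out pairs.
m = learn_kissme_metric(train_src, train_tgt, rng)
cos_acc = retrieval_accuracy(cosine_distances(test_src, test_tgt))
learned_acc = retrieval_accuracy(mahalanobis_distances(test_src, test_tgt, m))
print(f"cosine: {cos_acc:.2f}  learned metric: {learned_acc:.2f}")
```

In this toy setup, cosine distance is dominated by the high-variance language-specific dimensions. The supervised metric instead learns from the parallel pairs to down-weight those dimensions. Real multilingual sentence embeddings are not structured this simply, so the sketch illustrates the principle only. It makes no claim about the paper's results.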
- Charith Rajitha
- Lakmali Piyarathne
- Dilan Sachintha
- Surangika Ranathunga