
Metric Learning in Multilingual Sentence Similarity Measurement for Document Alignment (2108.09495v1)

Published 21 Aug 2021 in cs.CL

Abstract: Document alignment techniques based on multilingual sentence representations have recently shown state-of-the-art results. However, these techniques rely on unsupervised distance measures, which cannot be fine-tuned to the task at hand. In this paper, we instead employ metric learning to derive task-specific distance measures. These measures are supervised: the distance metric is trained on a parallel dataset. Using a dataset covering English, Sinhala, and Tamil, which belong to three different language families, we show that these task-specific supervised distance metrics outperform their unsupervised counterparts for document alignment.
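
To make the contrast between an unsupervised distance and a supervised, task-specific one concrete, the following is a minimal sketch and not the method used in the paper. It assumes synthetic 8-dimensional vectors standing in for multilingual sentence embeddings, and it swaps in a simple closed-form Mahalanobis metric (the inverse covariance of the differences between parallel pairs, in the spirit of Relevant Component Analysis) for whatever metric learning algorithms the authors evaluated. All function names (`make_toy_pairs`, `fit_mahalanobis`, `retrieval_accuracy`) and the noise model are hypothetical and chosen only for illustration.

```python
import numpy as np


def make_toy_pairs(n, dim=8, seed=0):
    """Synthetic 'parallel' pairs (assumption: not real sentence embeddings).

    The target is the source plus noise that is large in the first
    dimension and small in all others, so a plain Euclidean distance
    is dominated by an uninformative direction.
    """
    rng = np.random.default_rng(seed)
    src = rng.normal(size=(n, dim))
    scale = np.array([3.0] + [0.1] * (dim - 1))
    tgt = src + rng.normal(size=(n, dim)) * scale
    return src, tgt


def fit_mahalanobis(src, tgt, reg=1e-3):
    """Learn a Mahalanobis matrix M from parallel pairs.

    M is the inverse of the (regularised) covariance of pair
    differences, which down-weights directions in which true
    pairs are known to differ.
    """
    diffs = src - tgt
    cov = diffs.T @ diffs / len(diffs)
    cov += reg * np.eye(cov.shape[0])
    return np.linalg.inv(cov)


def retrieval_accuracy(src, tgt, M=None):
    """Fraction of source vectors whose nearest target is their true pair.

    With M=None the distance is squared Euclidean (the unsupervised
    baseline); otherwise it is the squared Mahalanobis distance under M.
    """
    if M is None:
        M = np.eye(src.shape[1])
    diff = src[:, None, :] - tgt[None, :, :]           # shape (n, n, dim)
    dist = np.einsum("ijk,kl,ijl->ij", diff, M, diff)  # shape (n, n)
    return float(np.mean(dist.argmin(axis=1) == np.arange(len(src))))


if __name__ == "__main__":
    train_src, train_tgt = make_toy_pairs(500, seed=0)
    test_src, test_tgt = make_toy_pairs(200, seed=1)

    M = fit_mahalanobis(train_src, train_tgt)
    print("Euclidean accuracy:  ", retrieval_accuracy(test_src, test_tgt))
    print("Mahalanobis accuracy:", retrieval_accuracy(test_src, test_tgt, M))
```

The metric is fitted on one set of pairs and evaluated on a held-out set, mirroring the supervised setup the abstract describes. On this toy data the learned metric should retrieve the correct pair far more often than the Euclidean baseline, because the noise model was constructed to favour it; this says nothing about the size of the gains reported in the paper.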

Authors (4)
  1. Charith Rajitha (2 papers)
  2. Lakmali Piyarathne (1 paper)
  3. Dilan Sachintha (2 papers)
  4. Surangika Ranathunga (34 papers)
Citations (3)