Toward Interpretable Semantic Textual Similarity via Optimal Transport-based Contrastive Sentence Learning (2202.13196v2)

Published 26 Feb 2022 in cs.AI

Abstract: Recently, finetuning a pretrained LLM to capture the similarity between sentence embeddings has shown state-of-the-art performance on the semantic textual similarity (STS) task. However, the absence of an interpretation method for sentence similarity makes it difficult to explain the model output. In this work, we explicitly describe the sentence distance as a weighted sum of contextualized token distances on the basis of a transportation problem, and then present an optimal transport-based distance measure, named RCMD, that identifies and leverages semantically aligned token pairs. Finally, we propose CLRCMD, a contrastive learning framework that optimizes the RCMD of sentence pairs, enhancing both the quality of sentence similarity and its interpretation. Extensive experiments demonstrate that our learning framework outperforms other baselines on both STS and interpretable-STS benchmarks, indicating that it computes effective sentence similarity and provides interpretation consistent with human judgement. The code and checkpoint are publicly available at https://github.com/sh0416/clrcmd.
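The abstract describes sentence distance as a transport problem over contextualized token embeddings, where each token is matched to a semantically aligned token in the other sentence. The sketch below illustrates that idea with a relaxed (greedy best-match) transport over cosine similarities; the encoder name, uniform token weighting, and symmetric averaging are illustrative assumptions, not the paper's exact RCMD formulation or checkpoint, which are available in the linked repository.

```python
# Minimal sketch: relaxed optimal-transport-style similarity over contextualized
# token embeddings. Assumptions: a BERT-style encoder ("bert-base-uncased"),
# uniform token weights, and symmetric averaging of the two relaxed directions.
import torch
from transformers import AutoModel, AutoTokenizer

MODEL_NAME = "bert-base-uncased"  # assumption: any encoder with token-level outputs
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
encoder = AutoModel.from_pretrained(MODEL_NAME)

def token_embeddings(sentence: str) -> torch.Tensor:
    """Return L2-normalized contextualized token embeddings, shape (tokens, dim)."""
    inputs = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        hidden = encoder(**inputs).last_hidden_state.squeeze(0)
    return torch.nn.functional.normalize(hidden, dim=-1)

def relaxed_ot_similarity(s1: str, s2: str) -> float:
    """Relaxed transport: each token sends all its mass to its most similar
    counterpart; sentence similarity is the mean of those best-match scores."""
    x, y = token_embeddings(s1), token_embeddings(s2)
    cos = x @ y.T                              # pairwise cosine similarities (n1, n2)
    forward = cos.max(dim=1).values.mean()     # best match for each token of s1
    backward = cos.max(dim=0).values.mean()    # best match for each token of s2
    return 0.5 * (forward + backward).item()

print(relaxed_ot_similarity("A man is playing a guitar.",
                            "Someone plays the guitar."))
```

Because each token's best-matching counterpart is computed explicitly, the per-token contributions can be inspected directly, which is the kind of token-level interpretability the paper targets.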

Authors (4)
  1. Seonghyeon Lee (14 papers)
  2. Dongha Lee (63 papers)
  3. Seongbo Jang (7 papers)
  4. Hwanjo Yu (57 papers)
Citations (17)
