
Collective Human Opinions in Semantic Textual Similarity (2308.04114v1)

Published 8 Aug 2023 in cs.CL

Abstract: Despite the subjective nature of semantic textual similarity (STS) and pervasive disagreements in STS annotation, existing benchmarks have used averaged human ratings as the gold standard. Averaging masks the true distribution of human opinions on examples of low agreement, and prevents models from capturing the semantic vagueness that the individual ratings represent. In this work, we introduce USTS, the first Uncertainty-aware STS dataset with ~15,000 Chinese sentence pairs and 150,000 labels, to study collective human opinions in STS. Analysis reveals that neither a scalar nor a single Gaussian fits a set of observed judgements adequately. We further show that current STS models cannot capture the variance caused by human disagreement on individual instances, but rather reflect the predictive confidence over the aggregate dataset.
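The following minimal Python sketch makes the averaging problem concrete. It compares two sentence pairs whose annotator ratings have the same mean. The ratings, the 0 to 5 scale, and the `summarize` helper are hypothetical illustrations. They are not data or code from USTS. The sketch only shows why a single averaged score, or a single Gaussian centred on it, can misrepresent a split judgement.

```python
import statistics

# Hypothetical ratings from 10 annotators per sentence pair (0-5 scale assumed).
high_agreement = [3, 3, 3, 2.5, 3, 3.5, 3, 3, 2.5, 3.5]
low_agreement = [1, 1, 1, 1, 1, 5, 5, 5, 5, 5]


def summarize(ratings):
    """Return (mean, population std dev, share of ratings within 1 point of the mean)."""
    mean = statistics.fmean(ratings)
    std = statistics.pstdev(ratings)
    near_mean = sum(1 for r in ratings if abs(r - mean) <= 1.0) / len(ratings)
    return mean, std, near_mean


for name, ratings in [("high agreement", high_agreement), ("low agreement", low_agreement)]:
    mean, std, near = summarize(ratings)
    print(f"{name}: mean={mean:.2f}, std={std:.2f}, within 1 of mean={near:.0%}")
# high agreement: mean=3.00, std=0.32, within 1 of mean=100%
# low agreement: mean=3.00, std=2.00, within 1 of mean=0%
```

Averaging gives both pairs the same gold score of 3.00. For the second pair, however, no annotator gave a rating within one point of that score. A Gaussian centred on the mean therefore puts its peak where no judgement was observed.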

Authors (6)
  1. Yuxia Wang (41 papers)
  2. Shimin Tao (31 papers)
  3. Ning Xie (57 papers)
  4. Hao Yang (328 papers)
  5. Timothy Baldwin (125 papers)
  6. Karin Verspoor (34 papers)
Citations (3)