Papers
Topics
Authors
Recent
Gemini 2.5 Flash
Gemini 2.5 Flash
126 tokens/sec
GPT-4o
47 tokens/sec
Gemini 2.5 Pro Pro
43 tokens/sec
o3 Pro
4 tokens/sec
GPT-4.1 Pro
47 tokens/sec
DeepSeek R1 via Azure Pro
28 tokens/sec
2000 character limit reached

Latte-Mix: Measuring Sentence Semantic Similarity with Latent Categorical Mixtures (2010.11351v1)

Published 21 Oct 2020 in cs.CL

Abstract: Measuring sentence semantic similarity using pre-trained LLMs such as BERT generally yields unsatisfactory zero-shot performance, and one main reason is ineffective token aggregation methods such as mean pooling. In this paper, we demonstrate under a Bayesian framework that distance between primitive statistics such as the mean of word embeddings are fundamentally flawed for capturing sentence-level semantic similarity. To remedy this issue, we propose to learn a categorical variational autoencoder (VAE) based on off-the-shelf pre-trained LLMs. We theoretically prove that measuring the distance between the latent categorical mixtures, namely Latte-Mix, can better reflect the true sentence semantic similarity. In addition, our Bayesian framework provides explanations for why models finetuned on labelled sentence pairs have better zero-shot performance. We also empirically demonstrate that these finetuned models could be further improved by Latte-Mix. Our method not only yields the state-of-the-art zero-shot performance on semantic similarity datasets such as STS, but also enjoy the benefits of fast training and having small memory footprints.

Summary

We haven't generated a summary for this paper yet.