Papers
Topics
Authors
Recent
Gemini 2.5 Flash
Gemini 2.5 Flash
194 tokens/sec
GPT-4o
7 tokens/sec
Gemini 2.5 Pro Pro
45 tokens/sec
o3 Pro
4 tokens/sec
GPT-4.1 Pro
38 tokens/sec
DeepSeek R1 via Azure Pro
28 tokens/sec
2000 character limit reached

Cross-Domain Generalization Through Memorization: A Study of Nearest Neighbors in Neural Duplicate Question Detection (2011.11090v1)

Published 22 Nov 2020 in cs.CL

Abstract: Duplicate question detection (DQD) is important to increase efficiency of community and automatic question answering systems. Unfortunately, gathering supervised data in a domain is time-consuming and expensive, and our ability to leverage annotations across domains is minimal. In this work, we leverage neural representations and study nearest neighbors for cross-domain generalization in DQD. We first encode question pairs of the source and target domain in a rich representation space and then using a k-nearest neighbour retrieval-based method, we aggregate the neighbors' labels and distances to rank pairs. We observe robust performance of this method in different cross-domain scenarios of StackExchange, Spring and Quora datasets, outperforming cross-entropy classification in multiple cases.

Citations (1)

Summary

We haven't generated a summary for this paper yet.