Papers
Topics
Authors
Recent
Gemini 2.5 Flash
Gemini 2.5 Flash
119 tokens/sec
GPT-4o
56 tokens/sec
Gemini 2.5 Pro Pro
43 tokens/sec
o3 Pro
6 tokens/sec
GPT-4.1 Pro
47 tokens/sec
DeepSeek R1 via Azure Pro
28 tokens/sec
2000 character limit reached

CollaborER: A Self-supervised Entity Resolution Framework Using Multi-features Collaboration (2108.08090v2)

Published 18 Aug 2021 in cs.DB

Abstract: Entity Resolution (ER) aims to identify whether two tuples refer to the same real-world entity and is well-known to be labor-intensive. It is a prerequisite to anomaly detection, as comparing the attribute values of two matched tuples from two different datasets provides one effective way to detect anomalies. Existing ER approaches, due to insufficient feature discovery or error-prone inherent characteristics, are not able to achieve stable performance. In this paper, we present CollaborER, a self-supervised entity resolution framework via multi-features collaboration. It is capable of (i) obtaining reliable ER results with zero human annotations and (ii) discovering adequate tuples' features in a fault-tolerant manner. CollaborER consists of two phases, i.e., automatic label generation (ALG) and collaborative ER training (CERT). In the first phase, ALG is proposed to generate a set of positive tuple pairs and a set of negative tuple pairs. ALG guarantees the high quality of the generated tuples and hence ensures the training quality of the subsequent CERT. In the second phase, CERT is introduced to learn the matching signals by discovering graph features and sentence features of tuples collaboratively. Extensive experimental results over eight real-world ER benchmarks show that CollaborER outperforms all the existing unsupervised ER approaches and is comparable or even superior to the state-of-the-art supervised ER methods.

User Edit Pencil Streamline Icon: https://streamlinehq.com
Authors (6)
  1. Congcong Ge (4 papers)
  2. Pengfei Wang (176 papers)
  3. Lu Chen (245 papers)
  4. Xiaoze Liu (22 papers)
  5. Baihua Zheng (22 papers)
  6. Yunjun Gao (67 papers)
Citations (3)

Summary

We haven't generated a summary for this paper yet.