
MultiEM: Efficient and Effective Unsupervised Multi-Table Entity Matching (2308.01927v1)

Published 2 Aug 2023 in cs.DB, cs.CL, and cs.IR

Abstract: Entity Matching (EM), which aims to identify all entity pairs referring to the same real-world entity from relational tables, is one of the most important tasks in real-world data management systems. Because the labeling process for EM is extremely labor-intensive, unsupervised EM is more applicable than supervised EM in practical scenarios. Traditional unsupervised EM assumes that all entities come from two tables; in practical applications, however, it is more common to match entities from multiple tables, that is, multi-table entity matching (multi-table EM). Unfortunately, effective and efficient unsupervised multi-table EM remains under-explored. To fill this gap, this paper formally studies the problem of unsupervised multi-table entity matching and proposes an effective and efficient solution, termed MultiEM. MultiEM is a parallelizable pipeline of enhanced entity representation, table-wise hierarchical merging, and density-based pruning. Extensive experimental results on six real-world benchmark datasets demonstrate the superiority of MultiEM in terms of effectiveness and efficiency.
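
To make the three pipeline stages named in the abstract more concrete, here is a minimal toy sketch in Python. It is not the authors' implementation, and the abstract does not specify any of these details. Character-trigram counts stand in for the enhanced entity representation, mutual-nearest-neighbor matching between two tables at a time stands in for table-wise hierarchical merging, and a mean-similarity filter stands in for density-based pruning. All function names (`embed`, `merge_two`, `hierarchical_merge`, `prune`, `match`) and both thresholds are hypothetical choices made for illustration.

```python
from collections import Counter
from math import sqrt


def embed(record):
    """Toy representation (assumption): trigram counts of the lowercased record."""
    text = " ".join(str(value).lower() for value in record.values())
    return Counter(text[i:i + 3] for i in range(len(text) - 2))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def centroid(cluster):
    """Sum of member vectors; cosine ignores scale, so no averaging is needed."""
    total = Counter()
    for _, vec in cluster:
        total.update(vec)
    return total


def merge_two(left, right, threshold):
    """Merge two lists of clusters using mutual nearest neighbors above a threshold."""
    if not left or not right:
        return left + right
    lc = [centroid(c) for c in left]
    rc = [centroid(c) for c in right]
    best_r = [max(range(len(right)), key=lambda j: cosine(lc[i], rc[j]))
              for i in range(len(left))]
    best_l = [max(range(len(left)), key=lambda i: cosine(lc[i], rc[j]))
              for j in range(len(right))]
    merged, used_right = [], set()
    for i, j in enumerate(best_r):
        if best_l[j] == i and cosine(lc[i], rc[j]) >= threshold:
            merged.append(left[i] + right[j])
            used_right.add(j)
        else:
            merged.append(left[i])
    merged.extend(right[j] for j in range(len(right)) if j not in used_right)
    return merged


def hierarchical_merge(tables, threshold):
    """Merge tables pairwise, level by level, until a single table remains."""
    level = [[[(rid, embed(rec))] for rid, rec in table] for table in tables]
    while len(level) > 1:
        nxt = [merge_two(level[k], level[k + 1], threshold)
               for k in range(0, len(level) - 1, 2)]
        if len(level) % 2:
            nxt.append(level[-1])  # odd table out is carried to the next level
        level = nxt
    return level[0] if level else []


def prune(cluster, min_sim):
    """Drop members whose mean similarity to the other members is below min_sim."""
    if len(cluster) < 2:
        return cluster
    kept = []
    for idx, (rid, vec) in enumerate(cluster):
        others = [v for k, (_, v) in enumerate(cluster) if k != idx]
        if sum(cosine(vec, o) for o in others) / len(others) >= min_sim:
            kept.append((rid, vec))
    return kept


def match(tables, threshold=0.5, min_sim=0.4):
    """Return groups of record ids judged to refer to the same entity."""
    clusters = [prune(c, min_sim) for c in hierarchical_merge(tables, threshold)]
    return [sorted(rid for rid, _ in c) for c in clusters if len(c) > 1]
```

Each table is passed as a list of `(record_id, record_dict)` pairs, and `match` returns only the groups that contain more than one record. The pairwise merges within a level are independent of each other, which is one plausible reading of why such a pipeline can be parallelized; this sketch runs them sequentially.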

Authors (6)
  1. Xiaocan Zeng (2 papers)
  2. Pengfei Wang (176 papers)
  3. Yuren Mao (17 papers)
  4. Lu Chen (245 papers)
  5. Xiaoze Liu (22 papers)
  6. Yunjun Gao (67 papers)
Citations (1)