Interpretable and Low-Resource Entity Matching via Decoupling Feature Learning from Decision Making (2106.04174v1)

Published 8 Jun 2021 in cs.CL and cs.AI

Abstract: Entity Matching (EM) aims at recognizing entity records that denote the same real-world object. Neural EM models learn vector representations of entity descriptions and match entities end-to-end. Though robust, these methods require many resources for training and lack interpretability. In this paper, we propose a novel EM framework that consists of Heterogeneous Information Fusion (HIF) and Key Attribute Tree (KAT) Induction to decouple feature representation from matching decision. Using self-supervised learning and the mask mechanism of pre-trained language modeling, HIF learns the embeddings of noisy attribute values by inter-attribute attention with unlabeled data. Using a set of comparison features and a limited amount of annotated data, KAT Induction learns an efficient decision tree that can be interpreted by generating entity matching rules whose structure is advocated by domain experts. Experiments on 6 public datasets and 3 industrial datasets show that our method is highly efficient and outperforms SOTA EM models in most cases. Our code and datasets can be obtained from https://github.com/THU-KEG/HIF-KAT.
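
The decoupling described in the abstract can be pictured as a two-stage pipeline: first produce per-attribute embeddings for each record pair, then induce a shallow, human-readable decision tree over per-attribute comparison features. The sketch below is only an illustration of that idea under simplified assumptions: random vectors stand in for HIF's self-supervised, attention-based embeddings, cosine similarity stands in for the paper's comparison features, and scikit-learn's DecisionTreeClassifier stands in for KAT Induction. None of the names or shapes below come from the released code.

```python
# Minimal, hypothetical sketch of a decoupled EM pipeline:
# stage 1 (feature learning) -> per-attribute embeddings,
# stage 2 (decision making)  -> interpretable decision tree over comparison features.
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

rng = np.random.default_rng(0)

# --- Stage 1 stand-in: per-attribute embeddings for left/right records.
# In the paper these would come from HIF (masked self-supervision with
# inter-attribute attention over unlabeled data); here they are synthetic.
num_pairs, num_attrs, dim = 200, 4, 16
labels = rng.integers(0, 2, size=num_pairs)            # 1 = same real-world entity
left = rng.normal(size=(num_pairs, num_attrs, dim))
noise = rng.normal(scale=0.1, size=left.shape)
right = np.where(labels[:, None, None] == 1,            # matching pairs are near-duplicates
                 left + noise,
                 rng.normal(size=left.shape))

# --- Comparison features: one similarity score per attribute.
def cosine(a, b):
    return (a * b).sum(-1) / (np.linalg.norm(a, axis=-1) *
                              np.linalg.norm(b, axis=-1) + 1e-9)

features = cosine(left, right)                          # shape: (num_pairs, num_attrs)

# --- Stage 2 stand-in: a shallow tree whose splits read as matching rules
# over "key" attributes (the interpretable decision-making component).
tree = DecisionTreeClassifier(max_depth=3, random_state=0)
tree.fit(features, labels)
print(export_text(tree, feature_names=[f"attr_{i}_sim" for i in range(num_attrs)]))
```

Because the matcher is just a small tree over per-attribute similarities, its branches can be read off directly as "if attribute X is similar enough, then match" rules, which is the interpretability argument the abstract makes; the learned feature representations never need to be retrained to inspect or adjust those rules.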

Authors (9)
  1. Zijun Yao (50 papers)
  2. Chengjiang Li (4 papers)
  3. Tiansi Dong (5 papers)
  4. Xin Lv (38 papers)
  5. Jifan Yu (49 papers)
  6. Lei Hou (127 papers)
  7. Juanzi Li (144 papers)
  8. Yichi Zhang (184 papers)
  9. Zelin Dai (6 papers)
Citations (9)
GitHub: https://github.com/THU-KEG/HIF-KAT