Interpretable and Low-Resource Entity Matching via Decoupling Feature Learning from Decision Making (2106.04174v1)

Published 8 Jun 2021 in cs.CL and cs.AI

Abstract: Entity Matching (EM) aims at recognizing entity records that denote the same real-world object. Neural EM models learn vector representations of entity descriptions and match entities end-to-end. Though robust, these methods require substantial resources for training and lack interpretability. In this paper, we propose a novel EM framework that consists of Heterogeneous Information Fusion (HIF) and Key Attribute Tree (KAT) Induction to decouple feature representation from matching decisions. Using self-supervised learning and the mask mechanism of pre-trained language modeling, HIF learns the embeddings of noisy attribute values through inter-attribute attention on unlabeled data. Using a set of comparison features and a limited amount of annotated data, KAT Induction learns an efficient decision tree that can be interpreted by generating entity matching rules whose structure is advocated by domain experts. Experiments on 6 public datasets and 3 industrial datasets show that our method is highly efficient and outperforms SOTA EM models in most cases. Our code and datasets can be obtained from https://github.com/THU-KEG/HIF-KAT.
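The abstract describes the decision-making half (comparison features plus a small labeled set, producing an interpretable tree) only at a high level. The sketch below illustrates that general idea under stated assumptions; it is not the authors' KAT Induction or the code in the HIF-KAT repository. It swaps in Python's `difflib` string similarity as a stand-in for the learned HIF embeddings, induces a plain depth-limited Gini (CART-style) tree rather than the paper's key-attribute tree, and reads matching rules off root-to-leaf paths. The helpers `compare`, `induce`, `predict`, and `rules`, and the toy records, are all hypothetical.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import Dict, Iterator, List, Optional, Tuple

Record = Dict[str, str]


def compare(a: Record, b: Record, attrs: List[str]) -> List[float]:
    """One comparison feature per attribute: difflib similarity in [0, 1].

    Assumption: plain string similarity stands in for learned HIF embeddings;
    a missing value on either side scores 0.0.
    """
    feats = []
    for k in attrs:
        x, y = a.get(k, "").lower(), b.get(k, "").lower()
        feats.append(SequenceMatcher(None, x, y).ratio() if x and y else 0.0)
    return feats


@dataclass
class Node:
    label: Optional[int] = None  # leaf prediction: 1 = match, 0 = non-match
    feat: int = -1               # index of the compared attribute
    thr: float = 0.0             # go left if feature <= thr
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def gini(ys: List[int]) -> float:
    if not ys:
        return 0.0
    p = sum(ys) / len(ys)
    return 2 * p * (1 - p)


def majority(ys: List[int]) -> int:
    return int(2 * sum(ys) >= len(ys))


def induce(X: List[List[float]], y: List[int], depth: int = 0, max_depth: int = 2) -> Node:
    """Greedy, depth-limited tree induction minimizing weighted Gini impurity."""
    if depth == max_depth or len(set(y)) <= 1:
        return Node(label=majority(y))
    best: Optional[Tuple[float, int, float]] = None  # (impurity, feature, threshold)
    for f in range(len(X[0])):
        values = sorted({row[f] for row in X})
        for lo, hi in zip(values, values[1:]):
            thr = (lo + hi) / 2  # midpoint between adjacent observed values
            left = [t for row, t in zip(X, y) if row[f] <= thr]
            right = [t for row, t in zip(X, y) if row[f] > thr]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if best is None or score < best[0]:
                best = (score, f, thr)
    if best is None:  # all feature vectors identical: no split possible
        return Node(label=majority(y))
    _, f, thr = best
    li = [i for i, row in enumerate(X) if row[f] <= thr]
    ri = [i for i, row in enumerate(X) if row[f] > thr]
    return Node(
        feat=f,
        thr=thr,
        left=induce([X[i] for i in li], [y[i] for i in li], depth + 1, max_depth),
        right=induce([X[i] for i in ri], [y[i] for i in ri], depth + 1, max_depth),
    )


def predict(node: Node, x: List[float]) -> int:
    while node.label is None:
        node = node.left if x[node.feat] <= node.thr else node.right
    return node.label


def rules(node: Node, attrs: List[str], path: Tuple[str, ...] = ()) -> Iterator[str]:
    """Yield one conjunctive rule per root-to-leaf path that ends in 'match'."""
    if node.label is not None:
        if node.label == 1:
            yield " AND ".join(path) or "TRUE"
        return
    name = attrs[node.feat]
    yield from rules(node.left, attrs, path + (f"sim({name}) <= {node.thr:.2f}",))
    yield from rules(node.right, attrs, path + (f"sim({name}) > {node.thr:.2f}",))


# Hypothetical toy data: a handful of labeled record pairs.
attrs = ["title", "brand"]
pairs = [
    ({"title": "iPhone 12 Pro", "brand": "Apple"}, {"title": "Apple iPhone 12 Pro", "brand": "Apple"}, 1),
    ({"title": "Galaxy S21", "brand": "Samsung"}, {"title": "Samsung Galaxy S21", "brand": "samsung"}, 1),
    ({"title": "iPhone 12 Pro", "brand": "Apple"}, {"title": "Galaxy S21", "brand": "Samsung"}, 0),
    ({"title": "Pixel 5", "brand": "Google"}, {"title": "ThinkPad X1", "brand": "Lenovo"}, 0),
]
X = [compare(a, b, attrs) for a, b, _ in pairs]
y = [label for _, _, label in pairs]
tree = induce(X, y)
for rule in rules(tree, attrs):
    print("MATCH IF", rule)
```

Each printed rule is a conjunction of per-attribute similarity thresholds, which is what makes a shallow tree readable by a domain expert; the depth limit (`max_depth=2`, an arbitrary choice here) keeps every rule short, and only the comparison features, not the tree, would need to change if learned embeddings replaced `difflib`.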



Authors (9)

Zijun Yao
Chengjiang Li
Tiansi Dong
Xin Lv
Jifan Yu
Lei Hou
Juanzi Li
Yichi Zhang
Zelin Dai

Citations (9)


GitHub

GitHub - THU-KEG/HIF-KAT (5 stars)

