Interpretable and Low-Resource Entity Matching via Decoupling Feature Learning from Decision Making (2106.04174v1)
Abstract: Entity Matching (EM) aims at recognizing entity records that denote the same real-world object. Neural EM models learn vector representations of entity descriptions and match entities end-to-end. Though robust, these methods require many resources for training and lack interpretability. In this paper, we propose a novel EM framework that consists of Heterogeneous Information Fusion (HIF) and Key Attribute Tree (KAT) Induction to decouple feature representation from matching decision. Using self-supervised learning and the mask mechanism in pre-trained language modeling, HIF learns the embeddings of noisy attribute values by inter-attribute attention with unlabeled data. Using a set of comparison features and a limited amount of annotated data, KAT Induction learns an efficient decision tree that can be interpreted by generating entity matching rules whose structure is advocated by domain experts. Experiments on 6 public datasets and 3 industrial datasets show that our method is highly efficient and outperforms SOTA EM models in most cases. Our code and datasets can be obtained from https://github.com/THU-KEG/HIF-KAT.
- Zijun Yao (50 papers)
- Chengjiang Li (4 papers)
- Tiansi Dong (5 papers)
- Xin Lv (38 papers)
- Jifan Yu (49 papers)
- Lei Hou (127 papers)
- Juanzi Li (144 papers)
- Yichi Zhang (184 papers)
- Zelin Dai (6 papers)
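
The decoupling described in the abstract (comparison features first, then a small decision tree that can be read as matching rules) can be illustrated with a minimal sketch. This is not the HIF-KAT implementation: token-level Jaccard similarity stands in for the learned HIF embeddings, a plain CART-style greedy split stands in for KAT Induction, and the records, attribute names, and all function names (`jaccard`, `compare`, `build_tree`, `predict`, `rules`) are hypothetical.

```python
from collections import Counter

def jaccard(a, b):
    """Token-level Jaccard similarity (a stand-in for learned comparison features)."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb) if ta | tb else 1.0

def compare(left, right, attributes):
    """One similarity feature per attribute for a candidate record pair."""
    return [jaccard(left[a], right[a]) for a in attributes]

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values()) if n else 0.0

def build_tree(X, y, depth=0, max_depth=2):
    """Greedy CART-style induction; returns a nested dict or a leaf label."""
    if depth == max_depth or len(set(y)) <= 1:
        return Counter(y).most_common(1)[0][0]
    best = None
    for f in range(len(X[0])):
        values = sorted(set(row[f] for row in X))
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2  # candidate threshold between adjacent values
            low = [lab for row, lab in zip(X, y) if row[f] <= t]
            high = [lab for row, lab in zip(X, y) if row[f] > t]
            score = (len(low) * gini(low) + len(high) * gini(high)) / len(y)
            if best is None or score < best[0]:
                best = (score, f, t)
    if best is None:  # no feature has two distinct values to split on
        return Counter(y).most_common(1)[0][0]
    _, f, t = best
    lo_idx = [i for i, row in enumerate(X) if row[f] <= t]
    hi_idx = [i for i, row in enumerate(X) if row[f] > t]
    return {
        "feature": f,
        "threshold": t,
        "low": build_tree([X[i] for i in lo_idx], [y[i] for i in lo_idx], depth + 1, max_depth),
        "high": build_tree([X[i] for i in hi_idx], [y[i] for i in hi_idx], depth + 1, max_depth),
    }

def predict(tree, x):
    """Walk the tree until a leaf label is reached."""
    while isinstance(tree, dict):
        tree = tree["high"] if x[tree["feature"]] > tree["threshold"] else tree["low"]
    return tree

def rules(tree, names, path=()):
    """Flatten the tree into (condition, label) pairs that a person can read."""
    if not isinstance(tree, dict):
        return [(" AND ".join(path) or "TRUE", tree)]
    name, t = names[tree["feature"]], tree["threshold"]
    return (rules(tree["low"], names, path + (f"sim({name}) <= {t:.2f}",))
            + rules(tree["high"], names, path + (f"sim({name}) > {t:.2f}",)))

# Hypothetical toy data: (left record, right record, label), 1 = same entity.
ATTRIBUTES = ["title", "brand"]
PAIRS = [
    ({"title": "apple iphone 12 64gb", "brand": "apple"},
     {"title": "iphone 12 64gb apple", "brand": "apple"}, 1),
    ({"title": "apple iphone 12 64gb", "brand": "apple"},
     {"title": "apple iphone 11 64gb", "brand": "apple"}, 0),
    ({"title": "sony wh-1000xm4 headphones", "brand": "sony"},
     {"title": "sony wh-1000xm4 wireless headphones", "brand": "sony"}, 1),
    ({"title": "sony wh-1000xm4 headphones", "brand": "sony"},
     {"title": "bose qc35 headphones", "brand": "bose"}, 0),
]

X = [compare(left, right, ATTRIBUTES) for left, right, _ in PAIRS]
y = [label for _, _, label in PAIRS]
tree = build_tree(X, y)

for condition, label in rules(tree, ATTRIBUTES):
    print(f"IF {condition} THEN {'match' if label else 'non-match'}")
```

Because the tree only ever sees per-attribute comparison features, each root-to-leaf path reads directly as a rule over attribute similarities. That is the kind of interpretability the abstract attributes to KAT Induction, shown here on made-up data.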