
Towards Interpretable and Learnable Risk Analysis for Entity Resolution (1912.02947v1)

Published 6 Dec 2019 in cs.DB and cs.LG

Abstract: Machine-learning-based entity resolution has been widely studied. However, machine learning models may mislabel some entity pairs, and existing studies do not address the risk analysis problem: predicting and interpreting which entity pairs are mislabeled. In this paper, we propose an interpretable and learnable framework for risk analysis that ranks the labeled pairs by their risk of being mislabeled. We first describe how to automatically generate interpretable risk features, and then present a learnable risk model and its training technique. Finally, we empirically evaluate the proposed approach on real data. Our extensive experiments show that the learnable risk model identifies mislabeled pairs with considerably higher accuracy than existing alternatives.
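To make the task concrete, here is a minimal sketch of risk ranking that uses a naive uncertainty score. It is not the paper's learnable risk model. `LabeledPair`, `naive_risk`, and `rank_by_risk` are hypothetical names, and the sketch assumes the entity resolution classifier exposes a match probability for each labeled pair.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class LabeledPair:
    pair_id: str
    match_prob: float  # classifier's estimated probability that the two records match (assumed available)
    label: bool        # machine label: True = "match", False = "unmatch"


def naive_risk(pair: LabeledPair) -> float:
    """Hypothetical baseline: probability mass the classifier assigns to the opposite label."""
    return 1.0 - pair.match_prob if pair.label else pair.match_prob


def rank_by_risk(pairs: list[LabeledPair]) -> list[LabeledPair]:
    """Sort pairs from most to least likely to be mislabeled under the naive score."""
    return sorted(pairs, key=naive_risk, reverse=True)


pairs = [
    LabeledPair("p1", 0.97, True),   # confident match, risk 0.03
    LabeledPair("p2", 0.60, True),   # uncertain match, risk 0.40
    LabeledPair("p3", 0.10, False),  # confident unmatch, risk 0.10
    LabeledPair("p4", 0.45, False),  # uncertain unmatch, risk 0.45
]
print([p.pair_id for p in rank_by_risk(pairs)])  # ['p4', 'p2', 'p3', 'p1']
```

A score like this can only flag pairs the classifier is already unsure about; a confidently wrong label receives low risk. The paper instead generates interpretable risk features and trains a separate risk model on them.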

Authors (6)
  1. Zhaoqiang Chen
  2. Qun Chen
  3. Boyi Hou
  4. Tianyi Duan
  5. Zhanhuai Li
  6. Guoliang Li
Citations (20)