
ERBlox: Combining Matching Dependencies with Machine Learning for Entity Resolution (1602.02334v3)

Published 7 Feb 2016 in cs.DB, cs.AI, and cs.LG

Abstract: Entity resolution (ER), an important and common data cleaning problem, is about detecting duplicate data representations of the same external entities and merging them into single representations. Relatively recently, declarative rules called "matching dependencies" (MDs) have been proposed for specifying similarity conditions under which attribute values in database records are merged. In this work we show the process and the benefits of integrating four components of ER: (a) building a classifier for duplicate/non-duplicate record pairs using ML techniques; (b) the use of MDs for supporting the blocking phase of ML; (c) record merging on the basis of the classifier results; and (d) the use of the declarative language "LogiQL" (an extended form of Datalog supported by the "LogicBlox" platform) for all activities related to data processing, and for the specification and enforcement of MDs.
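
The three processing stages named in the abstract (blocking, pairwise classification, merging) can be illustrated with a minimal Python sketch. This is not the ERBlox implementation, which the abstract says is written in LogiQL and uses a classifier built with ML techniques. Everything here is an assumption made for illustration: the `sim` similarity function, the thresholds, the `title` and `author` attributes, the sample records, the fixed-threshold rule standing in for a trained classifier, and the "keep the longest value" merge policy.

```python
from difflib import SequenceMatcher
from itertools import combinations


def sim(a: str, b: str) -> float:
    """String similarity in [0, 1]; a stand-in for a domain similarity relation."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


class UnionFind:
    """Tracks which record ids have been identified with each other."""

    def __init__(self, items):
        self.parent = {x: x for x in items}

    def find(self, x):
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, x, y):
        self.parent[self.find(x)] = self.find(y)

    def groups(self):
        out = {}
        for x in self.parent:
            out.setdefault(self.find(x), []).append(x)
        return list(out.values())


def block(records, threshold=0.6):
    """MD-style blocking rule (hypothetical): similar titles => same block."""
    uf = UnionFind(records)
    for i, j in combinations(records, 2):
        if sim(records[i]["title"], records[j]["title"]) >= threshold:
            uf.union(i, j)
    return uf.groups()


def is_duplicate(r1, r2, threshold=0.75):
    """Placeholder for a trained classifier: thresholds the mean similarity."""
    features = [sim(r1[k], r2[k]) for k in ("title", "author")]
    return sum(features) / len(features) >= threshold


def resolve(records):
    """Block, classify pairs within each block, then merge duplicates."""
    uf = UnionFind(records)
    for blk in block(records):
        for i, j in combinations(blk, 2):
            if is_duplicate(records[i], records[j]):
                uf.union(i, j)
    merged = []
    for group in uf.groups():
        # Assumed merge policy: keep the longest value of each attribute.
        merged.append({
            k: max((records[i][k] for i in group), key=len)
            for k in ("title", "author")
        })
    return merged


# Hypothetical sample data, invented for this sketch.
records = {
    1: {"title": "Entity Resolution with Matching Dependencies",
        "author": "L. Bertossi"},
    2: {"title": "Entity resolution with matching dependencies",
        "author": "Leopoldo Bertossi"},
    3: {"title": "Query Optimization", "author": "A. Smith"},
}
merged = resolve(records)
```

Only pairs that share a block are passed to the classifier, which is the point of blocking: it avoids comparing every record with every other record at the more expensive classification step.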

Authors (3)
  1. Zeinab Bahmani (3 papers)
  2. Leopoldo Bertossi (57 papers)
  3. Nikolaos Vasiloglou (12 papers)
Citations (30)
