
Discovering Matching Dependencies (0903.3317v2)

Published 19 Mar 2009 in cs.DB

Abstract: The concept of matching dependencies (MDs) was recently proposed for specifying matching rules for object identification. Similar to functional dependencies (with conditions), MDs can also be applied to various data quality applications such as violation detection. In this paper, we study the problem of discovering matching dependencies from a given database instance. First, we formally define the measures, support and confidence, for evaluating the utility of MDs in the given database instance. Then, we study the discovery of MDs with certain utility requirements of support and confidence. Exact algorithms are developed, together with pruning strategies to improve the time performance. Since the exact algorithm has to traverse all the data during the computation, we propose an approximate solution which uses only some of the data. A bound on the relative error introduced by the approximation is also developed. Finally, our experimental evaluation demonstrates the efficiency of the proposed methods.
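
To make the support and confidence measures concrete, here is a minimal sketch in Python. It assumes an association-rule style reading that the abstract does not spell out: support is the fraction of all tuple pairs that are similar on both the left-hand and right-hand attributes, and confidence is the fraction of left-hand-similar pairs that are also right-hand-similar. The function names `similar` and `md_support_confidence`, the use of `difflib.SequenceMatcher` as the similarity measure, and the per-attribute thresholds are illustrative assumptions, not the paper's definitions or algorithms.

```python
from difflib import SequenceMatcher
from itertools import combinations


def similar(a, b, threshold):
    """Hypothetical similarity test: string ratio in [0, 1] against a threshold."""
    return SequenceMatcher(None, a, b).ratio() >= threshold


def md_support_confidence(rows, lhs, rhs):
    """Estimate support and confidence of a candidate MD by brute force.

    rows: list of dicts (tuples of the database instance)
    lhs, rhs: dicts mapping attribute name -> similarity threshold
    Assumed definitions (not taken from the paper):
      support    = pairs similar on lhs and rhs / all pairs
      confidence = pairs similar on lhs and rhs / pairs similar on lhs
    """
    total = lhs_count = both_count = 0
    # Exact computation: every unordered pair of tuples is examined once.
    for r1, r2 in combinations(rows, 2):
        total += 1
        if all(similar(r1[a], r2[a], t) for a, t in lhs.items()):
            lhs_count += 1
            if all(similar(r1[a], r2[a], t) for a, t in rhs.items()):
                both_count += 1
    support = both_count / total if total else 0.0
    confidence = both_count / lhs_count if lhs_count else 0.0
    return support, confidence


if __name__ == "__main__":
    sample = [
        {"name": "John Smith", "phone": "555-1234"},
        {"name": "Jon Smith", "phone": "555-1234"},
        {"name": "Mary Jones", "phone": "555-9999"},
        {"name": "Mary Jones", "phone": "555-0000"},
    ]
    # Candidate rule: similar names imply identical phone numbers.
    print(md_support_confidence(sample, {"name": 0.9}, {"phone": 1.0}))
```

The pairwise loop touches every pair of tuples, which illustrates why an exact computation has to traverse all the data; running the same function on a random sample of `rows` would be one simple way to approximate the measures, though the paper's own approximation scheme and error bound may differ.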

Citations (56)
