Papers
Topics
Authors
Recent
Gemini 2.5 Flash
Gemini 2.5 Flash
119 tokens/sec
GPT-4o
56 tokens/sec
Gemini 2.5 Pro Pro
43 tokens/sec
o3 Pro
6 tokens/sec
GPT-4.1 Pro
47 tokens/sec
DeepSeek R1 via Azure Pro
28 tokens/sec
2000 character limit reached

Performance Bounds for Graphical Record Linkage (1703.02679v1)

Published 8 Mar 2017 in math.ST, cs.IT, math.IT, stat.ME, stat.ML, and stat.TH

Abstract: Record linkage involves merging records in large, noisy databases to remove duplicate entities. It has become an important area because of its widespread occurrence in bibliometrics, public health, official statistics production, political science, and beyond. Traditional linkage methods directly linking records to one another are computationally infeasible as the number of records grows. As a result, it is increasingly common for researchers to treat record linkage as a clustering task, in which each latent entity is associated with one or more noisy database records. We critically assess performance bounds using the Kullback-Leibler (KL) divergence under a Bayesian record linkage framework, making connections to Kolchin partition models. We provide an upper bound using the KL divergence and a lower bound on the minimum probability of misclassifying a latent entity. We give insights for when our bounds hold using simulated data and provide practical user guidance.

User Edit Pencil Streamline Icon: https://streamlinehq.com
Authors (3)
  1. Rebecca C. Steorts (28 papers)
  2. Matt Barnes (12 papers)
  3. Willie Neiswanger (68 papers)
Citations (9)

Summary

We haven't generated a summary for this paper yet.