
A Permutation-based Model for Crowd Labeling: Optimal Estimation and Robustness (1606.09632v3)

Published 30 Jun 2016 in cs.LG, cs.AI, cs.IT, math.IT, and stat.ML

Abstract: The task of aggregating and denoising crowd-labeled data has gained increased significance with the advent of crowdsourcing platforms and massive datasets. We propose a permutation-based model for crowd-labeled data that is a significant generalization of the classical Dawid-Skene model, and introduce a new error metric by which to compare different estimators. We derive global minimax rates for the permutation-based model that are sharp up to logarithmic factors, and match the minimax lower bounds derived under the simpler Dawid-Skene model. We then design two computationally efficient estimators: the WAN estimator for the setting where the ordering of workers in terms of their abilities is approximately known, and the OBI-WAN estimator for the setting where that ordering is not known. For each of these estimators, we provide non-asymptotic bounds on their performance. We conduct synthetic simulations and experiments on real-world crowdsourcing data, and the experimental results corroborate our theoretical findings.
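
To make the setting concrete, here is a minimal sketch of the kind of data the abstract refers to. It is not the paper's WAN or OBI-WAN estimator. It assumes the common "one-coin" simplification of the Dawid-Skene model, in which worker i answers every binary question correctly with a fixed probability `q_i`, and it aggregates with plain majority voting as a baseline. The function names, the ±1 label encoding, and the use of Hamming error are all illustrative assumptions, not taken from the paper.

```python
import random


def simulate_one_coin(true_labels, worker_accuracies, seed=0):
    """Simulate responses under an assumed one-coin Dawid-Skene model.

    true_labels: list of +1/-1 ground-truth answers, one per question.
    worker_accuracies: list of probabilities q_i, one per worker.
    Returns responses[i][j], the answer of worker i to question j.
    """
    rng = random.Random(seed)
    return [
        [y if rng.random() < q else -y for y in true_labels]
        for q in worker_accuracies
    ]


def majority_vote(responses):
    """Baseline aggregator: sign of the column sum (ties go to +1)."""
    n_questions = len(responses[0])
    estimates = []
    for j in range(n_questions):
        total = sum(worker[j] for worker in responses)
        estimates.append(1 if total >= 0 else -1)
    return estimates


def hamming_error(estimates, true_labels):
    """Fraction of questions whose estimated label is wrong."""
    wrong = sum(e != y for e, y in zip(estimates, true_labels))
    return wrong / len(true_labels)


if __name__ == "__main__":
    truth = [1, -1, 1, 1, -1, -1, 1, -1]
    accuracies = [0.9, 0.8, 0.7, 0.6, 0.55]  # hypothetical worker abilities
    responses = simulate_one_coin(truth, accuracies, seed=42)
    estimates = majority_vote(responses)
    print("Hamming error of majority vote:", hamming_error(estimates, truth))
```

Majority voting ignores differences in worker ability. Per the abstract, the permutation-based model generalizes the Dawid-Skene setting, and the proposed estimators are designed for the cases where the ordering of workers by ability is approximately known or unknown.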

Authors (3)
  1. Nihar B. Shah (73 papers)
  2. Sivaraman Balakrishnan (80 papers)
  3. Martin J. Wainwright (141 papers)
Citations (44)
