A Permutation-based Model for Crowd Labeling: Optimal Estimation and Robustness (1606.09632v3)

Published 30 Jun 2016 in cs.LG, cs.AI, cs.IT, math.IT, and stat.ML

Abstract: The task of aggregating and denoising crowd-labeled data has gained increased significance with the advent of crowdsourcing platforms and massive datasets. We propose a permutation-based model for crowd-labeled data that is a significant generalization of the classical Dawid-Skene model, and introduce a new error metric by which to compare different estimators. We derive global minimax rates for the permutation-based model that are sharp up to logarithmic factors, and match the minimax lower bounds derived under the simpler Dawid-Skene model. We then design two computationally efficient estimators: the WAN estimator for the setting where the ordering of workers in terms of their abilities is approximately known, and the OBI-WAN estimator for the setting where it is not. For each of these estimators, we provide non-asymptotic bounds on their performance. We conduct synthetic simulations and experiments on real-world crowdsourcing data, and the experimental results corroborate our theoretical findings.

Authors (3)

Nihar B. Shah
Sivaraman Balakrishnan
Martin J. Wainwright

Citations (44)

