Adaptive Crowdsourcing Via Self-Supervised Learning (2401.13239v2)

Published 24 Jan 2024 in cs.LG and cs.HC

Abstract: Common crowdsourcing systems average estimates of a latent quantity of interest provided by many crowdworkers to produce a group estimate. We develop a new approach -- predict-each-worker -- that leverages self-supervised learning and a novel aggregation scheme. This approach adapts weights assigned to crowdworkers based on estimates they provided for previous quantities. When skills vary across crowdworkers or their estimates correlate, the weighted sum offers a more accurate group estimate than the average. Existing algorithms such as expectation maximization can, at least in principle, produce similarly accurate group estimates. However, their computational requirements become onerous when complex models, such as neural networks, are required to express relationships among crowdworkers. Predict-each-worker accommodates such complexity as well as many other practical challenges. We analyze the efficacy of predict-each-worker through theoretical and computational studies. Among other things, we establish asymptotic optimality as the number of engagements per crowdworker grows.

Authors (5)
  1. Anmol Kagrecha
  2. Henrik Marklund
  3. Benjamin Van Roy
  4. Hong Jun Jeon
  5. Richard Zeckhauser

Summary

Introduction

The aggregation of crowdworker estimates plays a pivotal role across diverse application domains, from intelligence forecasting to the development of AI systems. Traditional approaches hinge on equal weighting of such inputs, a method known as averaging, which presumes independent and identically skilled crowdworkers. This assumption often fails in practice, leading to suboptimal consensus estimates, particularly when the available crowdworker pool is small and expanding it is costly or impractical. The work under discussion presents a new methodology, termed 'predict-each-worker', which seeks to refine aggregate estimates by using self-supervised learning (SSL) to uncover patterns in crowdworker data.

Self-Supervised Learning Framework

The predict-each-worker method introduces a two-phase approach that uses SSL to uncover inherent patterns in crowdworker estimates. First, an individual SSL model is trained for each crowdworker to predict that worker's estimate from the responses of the other workers. The learned models aim to approximate the clairvoyant conditional distribution of each crowdworker's estimate given the estimates of the others, implicitly capturing skill levels and dependencies among workers. Unlike expectation-maximization (EM) algorithms, predict-each-worker can capture such complex dependencies through flexible machine learning architectures, including neural networks, without imposing burdensome computational demands.
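As a rough illustration of this first phase, the sketch below fits, for each crowdworker, a model that predicts that worker's past estimates from everyone else's. The function name, the use of scikit-learn, and the choice of a linear model are assumptions made for a compact example; the framework described here would admit richer models such as neural networks in place of the regression.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Illustrative sketch (not the paper's code): for each crowdworker, fit a
# model that predicts that worker's estimates from the other workers'
# estimates on previously answered quantities.

def fit_per_worker_predictors(estimates: np.ndarray):
    """estimates: array of shape (num_tasks, num_workers) of past estimates."""
    _, num_workers = estimates.shape
    models = []
    for i in range(num_workers):
        others = np.delete(estimates, i, axis=1)  # features: everyone except worker i
        target = estimates[:, i]                  # self-supervised target: worker i's own estimates
        models.append(LinearRegression().fit(others, target))
    return models
```

No external labels are needed: each worker's own responses serve as the prediction targets, which is what makes the procedure self-supervised.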

Aggregation Mechanism

Aggregation goes beyond prediction: the fitted per-worker models feed a novel weighting scheme that reflects each crowdworker's estimated skill and degree of independence. The weights adapt as more estimates are observed, so crowdworkers whose responses are highly predictable from others', or who appear less skilled, receive less influence on the group estimate. Under a Gaussian data-generating model, predict-each-worker is asymptotically optimal, a finding supported by both theoretical proofs and simulation studies.
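The summary does not spell out the exact update rule, but a textbook version of the kind of weighted sum it alludes to, assuming jointly Gaussian worker estimates of a latent quantity, derives weights from the inverse of the workers' covariance matrix. Function and variable names below are illustrative assumptions, not the paper's notation:

```python
import numpy as np

# Minimal sketch of precision-weighted aggregation under a Gaussian assumption.
# This is a standard construction, not the paper's exact scheme.

def aggregate(new_estimates: np.ndarray, past_estimates: np.ndarray) -> float:
    """new_estimates: (num_workers,); past_estimates: (num_tasks, num_workers)."""
    cov = np.cov(past_estimates, rowvar=False)  # empirical covariance across workers
    precision = np.linalg.pinv(cov)             # inverse covariance (precision) matrix
    raw = precision.sum(axis=1)                 # row sums: skilled, independent workers score higher
    weights = raw / raw.sum()                   # normalize so weights sum to one
    return float(weights @ new_estimates)       # weighted group estimate
```

Averaging is the special case in which every weight equals 1/num_workers; when skills differ or errors correlate, precision-based weights down-weight redundant or noisy workers, which is the effect predict-each-worker pursues adaptively from the learned per-worker predictors.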

Empirical Evaluation

Simulated experiments substantiate the superiority of predict-each-worker over averaging, showing performance comparable to EM in large-sample regimes and near-optimal performance asymptotically. The method also accommodates more complex models and adapts flexibly to different scenarios, for example by taking context and crowdworker metadata into account.

Extensions and Applications

Predict-each-worker combines theoretical innovation with practical versatility, making it well suited to a range of real-world crowdsourcing tasks. Future research might extend the method to categorical estimates, combine it with mechanisms that reduce error covariance among crowdworkers, or apply it to real-world datasets. The use of SSL within the framework opens avenues to enrich the understanding and utility of crowdsourced data in AI development and beyond.

In conclusion, the predict-each-worker approach marks a significant step forward in methods for aggregating crowdworker information. It outperforms traditional averaging by exploiting patterns within crowdworker estimates and by wielding self-supervised learning models to produce more accurate and reliable aggregated outputs.
