Adaptive Crowdsourcing Via Self-Supervised Learning (2401.13239v2)
Abstract: Common crowdsourcing systems average estimates of a latent quantity of interest provided by many crowdworkers to produce a group estimate. We develop a new approach -- predict-each-worker -- that leverages self-supervised learning and a novel aggregation scheme. This approach adapts weights assigned to crowdworkers based on estimates they provided for previous quantities. When skills vary across crowdworkers or their estimates correlate, the weighted sum offers a more accurate group estimate than the average. Existing algorithms such as expectation maximization can, at least in principle, produce similarly accurate group estimates. However, their computational requirements become onerous when complex models, such as neural networks, are required to express relationships among crowdworkers. Predict-each-worker accommodates such complexity as well as many other practical challenges. We analyze the efficacy of predict-each-worker through theoretical and computational studies. Among other things, we establish asymptotic optimality as the number of engagements per crowdworker grows.
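The claim that a weighted sum can beat the plain average when skills vary or estimates correlate can be illustrated with a small calculation. The sketch below is not predict-each-worker itself: it assumes the covariance of the workers' errors is already known, whereas the approach described above adapts weights from estimates provided for previous quantities. The matrix `COV` and the helper names `solve`, `min_variance_weights`, and `estimate_variance` are hypothetical, chosen only for illustration. It uses the standard minimum-variance unbiased weights, proportional to the inverse covariance times the all-ones vector.

```python
from typing import List


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix [a | b], copied so the inputs are not modified.
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    # Back substitution.
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def min_variance_weights(cov: List[List[float]]) -> List[float]:
    """Weights proportional to cov^{-1} 1, normalized to sum to one."""
    z = solve(cov, [1.0] * len(cov))
    total = sum(z)
    return [v / total for v in z]


def estimate_variance(weights: List[float], cov: List[List[float]]) -> float:
    """Error variance w^T cov w of the weighted group estimate."""
    n = len(weights)
    return sum(weights[i] * cov[i][j] * weights[j]
               for i in range(n) for j in range(n))


# Hypothetical error covariance (an assumption, not data from the paper):
# workers 0 and 1 are noisy and strongly correlated; worker 2 is more
# skilled and independent of the others.
COV = [
    [4.0, 3.0, 0.0],
    [3.0, 4.0, 0.0],
    [0.0, 0.0, 1.0],
]

uniform = [1.0 / 3.0] * 3
weights = min_variance_weights(COV)

var_uniform = estimate_variance(uniform, COV)
var_weighted = estimate_variance(weights, COV)

print("weights:", [round(w, 4) for w in weights])
print("variance of plain average:", round(var_uniform, 4))
print("variance of weighted sum: ", round(var_weighted, 4))
```

For this particular covariance, the weights come out to (1/9, 1/9, 7/9): the two correlated workers are discounted and the skilled, independent worker dominates. The error variance drops from 15/9 for the plain average to 7/9 for the weighted sum.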
- Anmol Kagrecha
- Henrik Marklund
- Benjamin Van Roy
- Hong Jun Jeon
- Richard Zeckhauser