Diversified Ensembling: An Experiment in Crowdsourced Machine Learning (2402.10795v1)

Published 16 Feb 2024 in cs.LG, cs.CY, and cs.HC

Abstract: Crowdsourced machine learning on competition platforms such as Kaggle is a popular and often effective method for generating accurate models. Typically, teams vie for the most accurate model, as measured by overall error on a holdout set, and it is common towards the end of such competitions for teams at the top of the leaderboard to ensemble or average their models outside the platform mechanism to get the final, best global model. In arXiv:2201.10408, the authors developed an alternative crowdsourcing framework in the context of fair machine learning, in order to integrate community feedback into models when subgroup unfairness is present and identifiable. There, unlike in classical crowdsourced ML, participants deliberately specialize their efforts by working on subproblems, such as demographic subgroups in the service of fairness. Here, we take a broader perspective on this work: we note that within this framework, participants may both specialize in the service of fairness and simply to cater to their particular expertise (e.g., focusing on identifying bird species in an image classification task). Unlike traditional crowdsourcing, this allows for the diversification of participants' efforts and may provide a participation mechanism to a larger range of individuals (e.g. a machine learning novice who has insight into a specific fairness concern). We present the first medium-scale experimental evaluation of this framework, with 46 participating teams attempting to generate models to predict income from American Community Survey data. We provide an empirical analysis of teams' approaches, and discuss the novel system architecture we developed. From here, we give concrete guidance for how best to deploy such a framework.

References (25)
  1. Benetech - Making Graphs Accessible. https://kaggle.com/competitions/benetech-making-graphs-accessible
  2. Avrim Blum and Moritz Hardt. 2015. The Ladder: A Reliable Leaderboard for Machine Learning Competitions. In Proceedings of the 32nd International Conference on Machine Learning (Proceedings of Machine Learning Research, Vol. 37), Francis Bach and David Blei (Eds.). PMLR, Lille, France, 1006–1014. https://proceedings.mlr.press/v37/blum15.html
  3. ICR - Identifying Age-Related Conditions. https://kaggle.com/competitions/icr-identify-age-related-conditions
  4. US Census Bureau. 2022. 2021 ACS PUMS Data Dictionary (October 20, 2022). https://www2.census.gov/programs-surveys/acs/tech_docs/pums/data_dict/PUMS_Data_Dictionary_2021.pdf
  5. Democratizing data science. In Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD 2014), New York, NY, USA. 24–27.
  6. Google - American Sign Language Fingerspelling Recognition. https://kaggle.com/competitions/asl-fingerspelling
  7. Image Matching Challenge 2023. https://kaggle.com/competitions/image-matching-challenge-2023
  8. Rumman Chowdhury and Jutta Williams. 2021. Introducing Twitter’s First Algorithmic Bias Bounty Challenge. https://blog.twitter.com/engineering/en_us/topics/insights/2021/algorithmic-bias-bounty-challenge
  9. Emily Diana, Wesley Gill, Michael Kearns, Krishnaram Kenthapadi, and Aaron Roth. 2021. Minimax group fairness: Algorithms and experiments. In Proceedings of the 2021 AAAI/ACM Conference on AI, Ethics, and Society. 66–76.
  10. Frances Ding, Moritz Hardt, John Miller, and Ludwig Schmidt. 2021. Retiring Adult: New Datasets for Fair Machine Learning. Advances in Neural Information Processing Systems 34 (2021).
  11. Predict Student Performance from Game Play. https://kaggle.com/competitions/predict-student-performance-from-game-play
  12. Ira Globus-Harris, Michael Kearns, and Aaron Roth. 2022. An algorithmic framework for bias bounties. In Proceedings of the 2022 ACM Conference on Fairness, Accountability, and Transparency. 1106–1124.
  13. GoDaddy - Microbusiness Density Forecasting. https://kaggle.com/competitions/godaddy-microbusiness-density-forecasting
  14. HuBMAP - Hacking the Human Vasculature. https://kaggle.com/competitions/hubmap-hacking-the-human-vasculature
  15. Shachar Kaufman, Saharon Rosset, Claudia Perlich, and Ori Stitelman. 2012. Leakage in data mining: Formulation, detection, and avoidance. ACM Transactions on Knowledge Discovery from Data (TKDD) 6, 4 (2012), 1–21.
  16. Using the Open Meta Kaggle Dataset to Evaluate Tripartite Recommendations in Data Markets. arXiv preprint arXiv:1908.04017 (2019).
  17. Vesuvius Challenge - Ink Detection. https://kaggle.com/competitions/vesuvius-challenge-ink-detection
  18. Natalia Martinez, Martin Bertran, and Guillermo Sapiro. 2020. Minimax Pareto fairness: A multi objective perspective. In International Conference on Machine Learning. PMLR, 6755–6764.
  19. Link prediction by de-anonymization: How We Won the Kaggle Social Network Challenge. In The 2011 International Joint Conference on Neural Networks. 1825–1834. https://doi.org/10.1109/IJCNN.2011.6033446
  20. Google Research - Identify Contrails to Reduce Global Warming. https://kaggle.com/competitions/google-research-identify-contrails-reduce-global-warming
  21. Amit Elazari Bar On. 2018. We Need Bug Bounties for Bad Algorithms. https://www.vice.com/en/article/8xkyj3/we-need-bug-bounties-for-bad-algorithms
  22. Rebecca Roelofs, Vaishaal Shankar, Benjamin Recht, Sara Fridovich-Keil, Moritz Hardt, John Miller, and Ludwig Schmidt. 2019. A meta-analysis of overfitting in machine learning. Advances in Neural Information Processing Systems 32 (2019).
  23. Sven Cattell, Rumman Chowdhury, and Austin Carson. 2023. Generative Red Team. AI Village. https://aivillage.org/generative%20red%20team/generative-red-team/
  24. Christopher J Tosh and Daniel Hsu. 2022. Simple and near-optimal algorithms for hidden stratification and multi-group learning. In International Conference on Machine Learning. PMLR, 21633–21657.
  25. 2023 Kaggle AI Report. https://kaggle.com/competitions/2023-kaggle-ai-report

Summary

  • The paper shows that an ensembled global model outperforms individual submissions by integrating specialized subgroup improvements.
  • It highlights how teams used domain expertise to focus on specific model weaknesses, enhancing fairness and performance.
  • The study empirically validates a previously proposed algorithmic framework, introduces a supporting system design that democratizes model development, and distills concrete guidance for future ML competitions.

Diversifying Approaches in Crowdsourced Machine Learning through Specialization and Ensembling

Introduction

The proliferation of machine learning competitions on platforms like Kaggle represents a significant shift in how predictive models are developed, harnessing the collective expertise of a global community. However, the traditional competitive framework, in which the single best model wins, may not make optimal use of participants' diverse strengths. The paper "Diversified Ensembling: An Experiment in Crowdsourced Machine Learning" examines an alternative crowdsourcing model that encourages participants to specialize, using ensemble techniques to integrate these specialized models into a cohesive and potentially superior global model. This approach not only democratizes participation by leveraging unique insights from a broader spectrum of individuals, but also addresses fairness concerns by letting competitors target subgroup improvements directly.

Experimental Setup and Key Findings

The paper presents an empirical evaluation of this crowdsourcing framework through a medium-scale experiment in which 46 teams built models to predict income from American Community Survey data. The competition diverged from the standard format by having teams submit model-group pairs, with the idea that these pairs could be ensembled into a global model that benefits from specialized local improvements.
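
This model-group mechanism follows the bias-bounty framework of arXiv:2201.10408, in which the global model is a decision list that routes each example to the most recently accepted specialist whose group contains it. The following is a minimal sketch of that structure, assuming simple NumPy-style predictors; the names Submission and DecisionList are ours, not the authors' implementation.

```python
# Minimal sketch of the pointer-decision-list ensemble from the bias-bounty
# framework (arXiv:2201.10408). Submission and DecisionList are illustrative
# names, not the paper's actual API.
from dataclasses import dataclass, field
from typing import Callable, List
import numpy as np

Predictor = Callable[[np.ndarray], np.ndarray]       # features -> predicted labels
GroupIndicator = Callable[[np.ndarray], np.ndarray]  # features -> {0, 1} membership

@dataclass
class Submission:
    group: GroupIndicator  # the subgroup a team chose to specialize on
    model: Predictor       # a model claimed to do better on that subgroup

@dataclass
class DecisionList:
    """Global model: a base predictor whose outputs are overridden, subgroup
    by subgroup, by accepted submissions. Applying pairs in acceptance order
    means the most recently accepted pair wins where groups overlap."""
    base: Predictor
    accepted: List[Submission] = field(default_factory=list)

    def predict(self, X: np.ndarray) -> np.ndarray:
        y = np.array(self.base(X))         # copy so overrides don't alias
        for sub in self.accepted:          # oldest first, newest overrides
            mask = sub.group(X).astype(bool)
            if mask.any():
                y[mask] = sub.model(X[mask])
        return y
```

Under this structure a team never needs a globally competitive model; a submission that beats the current decision list only on its own subgroup can still be folded into the global model.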

The key outcomes underscore the potential of this diversified ensemble approach:

  • Performance Gains: The ensembled global model outperformed all individual models, suggesting that the whole is greater than the sum of its parts in machine learning competitions.
  • Specialization Benefits: Teams specialized using both algorithmic methods and domain insight, addressing specific subgroups where they suspected or identified model underperformance.
  • Quality Insights Through Diversity: By diversifying the types of improvements sought (across subgroups), the competition unearthed a richer set of insights into model performance, particularly regarding fairness.

Theoretical and System Contributions

Beyond the empirical findings, this work offers several contributions to the field of machine learning and crowdsourcing:

  • Algorithmic Framework: It empirically validates the theoretical framework put forth in prior work on bias bounties (arXiv:2201.10408), underscoring the value of specialization and ensembling for improving both model performance and fairness.
  • System Design: A novel system architecture capable of handling this new competition format is described, providing a blueprint for future implementations; a toy version of the acceptance check such a backend must perform is sketched after this list.
  • Practical Guidance: The paper offers concrete guidance for deploying such a framework effectively, informing both academic experiments and industry applications.
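
While the paper describes its own architecture in detail, the gating step any such backend must perform can be illustrated with a simple acceptance check against a hidden holdout set. This sketch is our assumption of the flow, mirroring the update rule of the underlying framework; accept_submission and its exact criteria are illustrative, not the deployed system's code.

```python
# Hypothetical server-side gate, mirroring the framework's update rule:
# fold a (group, model) pair into the ensemble only if it improves error on
# its claimed subgroup without degrading overall holdout error. The function
# name and criteria are our assumptions, not the deployed system's code.
import numpy as np

def accept_submission(global_predict, candidate_predict, group_indicator,
                      X_holdout: np.ndarray, y_holdout: np.ndarray) -> bool:
    mask = group_indicator(X_holdout).astype(bool)
    if not mask.any():
        return False                       # claimed group is empty on holdout
    current = np.array(global_predict(X_holdout))
    patched = current.copy()               # candidate used on the group only
    patched[mask] = candidate_predict(X_holdout[mask])
    group_gain = (patched[mask] != y_holdout[mask]).mean() \
                 < (current[mask] != y_holdout[mask]).mean()
    overall_ok = (patched != y_holdout).mean() <= (current != y_holdout).mean()
    return bool(group_gain and overall_ok)
```

Keeping the holdout hidden from participants is what lets the platform run such checks repeatedly without the leaderboard overfitting that plagues standard competitions.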

Implications and Future Directions

This research has notable implications for both the theory and practice of machine learning:

  • Broader Participation: By enabling specialization, this framework can engage a wider community in model development, including individuals with unique domain knowledge but perhaps limited machine learning expertise.
  • Focus on Fairness: The ability to target subgroup improvements directly provides a structured way to address fairness, a growing concern in AI applications.
  • Methodological Innovation: The approach encourages methodological innovation, as participants experiment with both manual and automated techniques for identifying subgroups and improving model performance on them (one automated heuristic is sketched below).
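
As one toy example of the automated route (our illustration, not a method the paper prescribes), a team could fit a shallow decision tree to the current global model's mistakes and read a candidate subgroup off its highest-error leaf:

```python
# Toy automated subgroup discovery: fit a shallow tree to the current global
# model's mistakes and treat its highest-error leaf as a candidate group.
# Illustrative only; teams' actual techniques varied.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def find_candidate_group(global_predict, X: np.ndarray, y: np.ndarray,
                         max_depth: int = 3, min_leaf: int = 50):
    """Return a group-indicator function selecting the tree leaf on which
    the current global model errs most often."""
    mistakes = (global_predict(X) != y).astype(int)
    tree = DecisionTreeClassifier(max_depth=max_depth,
                                  min_samples_leaf=min_leaf).fit(X, mistakes)
    leaves = tree.apply(X)                               # leaf id per example
    leaf_ids = np.unique(leaves)
    error_rates = [mistakes[leaves == l].mean() for l in leaf_ids]
    worst = leaf_ids[int(np.argmax(error_rates))]
    return lambda X_new: (tree.apply(X_new) == worst).astype(int)
```

The returned indicator, paired with a specialist model trained on just the rows it selects, is exactly the kind of (group, model) submission the framework accepts.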

Looking ahead, the paper also highlights areas for future exploration, including adapting the framework for more complex prediction tasks and further refining the system design for scalability and security.

Conclusion

The "Diversified Ensembling" experiment provides a compelling case for rethinking how machine learning competitions are structured, offering a more inclusive and potentially more effective model for crowdsourced AI development. By harnessing the collective strengths of a diverse set of participants and emphasizing fairness through direct subgroup targeting, this approach represents a meaningful step towards democratizing machine learning and addressing some of its most pressing ethical challenges.