Ecosystem-level Analysis of Deployed Machine Learning Reveals Homogeneous Outcomes (2307.05862v2)
Abstract: Machine learning is traditionally studied at the model level: researchers measure and improve the accuracy, robustness, bias, efficiency, and other dimensions of specific models. In practice, the societal impact of machine learning is determined by the surrounding context of machine learning deployments. To capture this, we introduce ecosystem-level analysis: rather than analyzing a single model, we consider the collection of models that are deployed in a given context. For example, ecosystem-level analysis in hiring recognizes that a job candidate's outcomes are determined not by a single hiring algorithm or firm but by the collective decisions of all the firms to which they applied. Across three modalities (text, images, speech) and 11 datasets, we establish a clear trend: deployed machine learning is prone to systemic failure, meaning some users are misclassified by every model available. Even when individual models improve at the population level over time, we find these improvements rarely reduce the prevalence of systemic failure. Instead, the benefits of these improvements predominantly accrue to individuals who are already correctly classified by other models. In light of these trends, we consider medical imaging for dermatology, where the costs of systemic failure are especially high. While traditional analyses reveal racial performance disparities for both models and humans, ecosystem-level analysis reveals new forms of racial disparity in model predictions that are not present in human predictions. These examples demonstrate that ecosystem-level analysis has unique strengths for characterizing the societal impact of machine learning.
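The systemic-failure metric described above can be made concrete with a small sketch. This is not the paper's code; it simply assumes we have, for each user, a row of booleans recording whether each deployed model classified that user correctly, and it computes the fraction of users failed by every model in the ecosystem:

```python
def systemic_failure_rate(correct):
    """Fraction of users misclassified by ALL models in the ecosystem.

    correct: list of rows, one per user; correct[i][j] is True if
    model j classifies user i correctly.
    """
    if not correct:
        raise ValueError("need at least one user")
    # A user experiences systemic failure when no model gets them right.
    failed_by_all = [not any(row) for row in correct]
    return sum(failed_by_all) / len(correct)


# Toy ecosystem: 4 users evaluated against 3 deployed models.
correct = [
    [True,  True,  False],
    [False, False, False],  # systemic failure: every model is wrong
    [True,  False, True],
    [False, False, False],  # systemic failure
]
print(systemic_failure_rate(correct))  # 0.5
```

Note how the metric differs from per-model accuracy: each model above is wrong on at least half the users, yet the ecosystem-level question is whether *any* model serves a given user, which is what `not any(row)` captures.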