REFORMS: Reporting Standards for Machine Learning Based Science (2308.07832v2)
Abstract: Machine learning (ML) methods are proliferating in scientific research. However, the adoption of these methods has been accompanied by failures of validity, reproducibility, and generalizability. These failures can hinder scientific progress, lead to false consensus around invalid claims, and undermine the credibility of ML-based science. ML methods are often applied and fail in similar ways across disciplines. Motivated by this observation, our goal is to provide clear reporting standards for ML-based science. Drawing from an extensive review of past literature, we present the REFORMS checklist ($\textbf{Re}$porting Standards $\textbf{For}$ $\textbf{M}$achine Learning Based $\textbf{S}$cience). It consists of 32 questions and a paired set of guidelines. REFORMS was developed based on a consensus of 19 researchers across computer science, data science, mathematics, social sciences, and biomedical sciences. REFORMS can serve as a resource for researchers when designing and implementing a study, for referees when reviewing papers, and for journals when enforcing standards for transparency and reproducibility.
Authors:
- Sayash Kapoor
- Emily Cantrell
- Kenny Peng
- Thanh Hien Pham
- Christopher A. Bail
- Odd Erik Gundersen
- Jake M. Hofman
- Jessica Hullman
- Michael A. Lones
- Momin M. Malik
- Priyanka Nanayakkara
- Inioluwa Deborah Raji
- Michael Roberts
- Matthew J. Salganik
- Marta Serra-Garcia
- Brandon M. Stewart
- Gilles Vandewiele
- Arvind Narayanan
- Russell A. Poldrack