How to avoid machine learning pitfalls: a guide for academic researchers (2108.02497v5)
Abstract: Mistakes in machine learning practice are commonplace, and can result in a loss of confidence in the findings and products of machine learning. This guide outlines common mistakes that occur when using machine learning, and what can be done to avoid them. Whilst it should be accessible to anyone with a basic understanding of machine learning techniques, it focuses on issues that are of particular concern within academic research, such as the need to do rigorous comparisons and reach valid conclusions. It covers five stages of the machine learning process: what to do before model building, how to reliably build models, how to robustly evaluate models, how to compare models fairly, and how to report results.
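One of the evaluation pitfalls in the spirit of this guide is data leakage: fitting preprocessing (e.g. feature scaling) on the full dataset before cross-validation lets test-fold information contaminate training, inflating scores. A minimal sketch of the safe pattern, using scikit-learn on a synthetic dataset (the dataset and model choice here are illustrative assumptions, not taken from the paper):

```python
# Avoiding test-set leakage: fit preprocessing inside each CV fold,
# not once on the whole dataset.
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression

# Synthetic tabular data, purely for illustration.
X, y = make_classification(n_samples=500, n_features=20, random_state=0)

# Placing the scaler inside the pipeline means it is re-fitted on the
# training portion of every fold; scaling X up front would leak
# test-fold statistics into training.
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
scores = cross_val_score(model, X, y, cv=5)
print(f"mean accuracy over 5 folds: {scores.mean():.3f}")
```

The same pipeline idiom applies to any fitted preprocessing step (imputation, feature selection, over-sampling), all of which are common leakage sources in published results.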