Machine Learning Predicts Upper Secondary Education Dropout as Early as the End of Primary School (2403.14663v1)
Abstract: Education plays a pivotal role in alleviating poverty, driving economic growth, and empowering individuals, thereby significantly influencing societal and personal development. However, the persistent issue of school dropout poses a significant challenge, with its effects extending beyond the individual. While previous research has employed machine learning for dropout classification, these studies often suffer from a short-term focus, relying on data collected only a few years into the study period. This study expanded the modeling horizon by utilizing a 13-year longitudinal dataset, encompassing data from kindergarten to Grade 9. Our methodology incorporated a comprehensive range of parameters, including students' academic and cognitive skills, motivation, behavior, well-being, and officially recorded dropout data. The machine learning models developed in this study demonstrated notable classification ability, achieving a mean area under the curve (AUC) of 0.61 with data up to Grade 6 and an improved AUC of 0.65 with data up to Grade 9. Further data collection and independent correlational and causal analyses are crucial. In future iterations, such models may have the potential to proactively support educators' processes and existing protocols for identifying at-risk students, thereby potentially aiding in the reinvention of student retention and success strategies and ultimately contributing to improved educational outcomes.
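To make the reported metric concrete, the following is a minimal sketch of how a mean AUC over evaluation folds can be computed. It is not the authors' pipeline: the functions `roc_auc` and `mean_auc` and the toy folds are hypothetical and introduced only for illustration. AUC is computed here in its rank form, as the probability that a randomly chosen positive case (a dropout) receives a higher score than a randomly chosen negative case, with ties counted as one half.

```python
from statistics import mean


def roc_auc(labels, scores):
    """Rank-based AUC: share of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def mean_auc(folds):
    """Average the per-fold AUC over an iterable of (labels, scores) pairs."""
    return mean(roc_auc(y, s) for y, s in folds)


# Hypothetical toy folds (label 1 = dropout); these are NOT data from the study.
folds = [
    ([1, 0, 0, 0, 1, 0], [0.9, 0.2, 0.4, 0.1, 0.3, 0.35]),
    ([0, 1, 0, 0, 0, 1], [0.5, 0.6, 0.1, 0.7, 0.2, 0.4]),
]

fold_aucs = [roc_auc(y, s) for y, s in folds]
print("per-fold AUC:", fold_aucs)
print("mean AUC:", mean_auc(folds))
```

An AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect ranking, which is the scale on which the reported values of 0.61 and 0.65 should be read.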
Authors:
- Maria Psyridou
- Fabi Prezja
- Minna Torppa
- Marja-Kristiina Lerkkanen
- Anna-Maija Poikkeus
- Kati Vasalampi