Temporal and Between-Group Variability in College Dropout Prediction (2401.06498v1)
Abstract: Large-scale administrative data is a common input in early warning systems for college dropout in higher education. Still, the terminology and methodology vary significantly across existing studies, and the implications of different modeling decisions are not fully understood. This study provides a systematic evaluation of contributing factors and predictive performance of machine learning models over time and across different student groups. Drawing on twelve years of administrative data at a large public university in the US, we find that a Random Forest model predicting dropout at the end of the second year achieves a 20% higher AUC than one predicting dropout at the time of enrollment. Also, most predictive factors at the time of enrollment, including demographics and high school performance, are quickly superseded in predictive importance by college performance and, in later stages, by enrollment behavior. Regarding variability across student groups, college GPA has more predictive value for students from traditionally disadvantaged backgrounds than for their peers. These results can help researchers and administrators understand the comparative value of different data sources when building early warning systems and optimizing decisions under specific policy goals.
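The abstract compares models by AUC at different time points and across student groups. The sketch below is a minimal illustration of those two comparisons, not the authors' pipeline: it does not train a Random Forest, it only scores predictions. It computes ROC AUC with the Mann-Whitney rank formulation (AUC is the probability that a randomly chosen dropout receives a higher risk score than a randomly chosen persister, with ties counted as one half), the relative AUC change between two snapshots (assuming "20% higher" denotes a relative difference), and AUC within each subgroup. The function names, group labels, and all scores are hypothetical toy values.

```python
from collections import defaultdict


def roc_auc(labels, scores):
    """ROC AUC via the Mann-Whitney U statistic, averaging ranks over ties."""
    pairs = sorted(zip(scores, labels))
    n = len(pairs)
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        # Extend j across a run of tied scores.
        while j + 1 < n and pairs[j + 1][0] == pairs[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank for the tie run
        for k in range(i, j + 1):
            ranks[k] = avg_rank
        i = j + 1
    n_pos = sum(1 for _, y in pairs if y == 1)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC needs both positive and negative examples")
    rank_sum_pos = sum(r for r, (_, y) in zip(ranks, pairs) if y == 1)
    u = rank_sum_pos - n_pos * (n_pos + 1) / 2
    return u / (n_pos * n_neg)


def relative_change(new, old):
    """Relative difference, e.g. 0.2 would mean '20% higher'."""
    return (new - old) / old


def auc_by_group(labels, scores, groups):
    """AUC computed separately within each (hypothetical) student group."""
    buckets = defaultdict(lambda: ([], []))
    for y, s, g in zip(labels, scores, groups):
        buckets[g][0].append(y)
        buckets[g][1].append(s)
    return {g: roc_auc(ys, ss) for g, (ys, ss) in buckets.items()}


if __name__ == "__main__":
    # Hypothetical toy cohort: 1 = dropped out, 0 = persisted.
    labels = [1, 1, 0, 0, 0, 0]
    # Hypothetical risk scores from models fit at two points in time.
    enroll_scores = [0.6, 0.3, 0.5, 0.4, 0.2, 0.1]
    year2_scores = [0.9, 0.45, 0.5, 0.3, 0.2, 0.1]
    groups = ["group_a", "group_b", "group_a", "group_b", "group_a", "group_b"]

    auc_enroll = roc_auc(labels, enroll_scores)
    auc_year2 = roc_auc(labels, year2_scores)
    print("AUC at enrollment:", auc_enroll)
    print("AUC after year 2:", auc_year2)
    print("relative change:", relative_change(auc_year2, auc_enroll))
    print("AUC by group at enrollment:", auc_by_group(labels, enroll_scores, groups))
```

On this toy data the pooled AUC at enrollment is 0.75, yet it splits into 1.0 for one group and 0.5 for the other, which is why reporting a single pooled AUC can hide the kind of between-group differences the study examines.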
- Dominik Glandorf
- Hye Rin Lee
- Gabe Avakian Orona
- Marina Pumptow
- Renzhe Yu
- Christian Fischer