
Temporal and Between-Group Variability in College Dropout Prediction (2401.06498v1)

Published 12 Jan 2024 in cs.CY and cs.LG

Abstract: Large-scale administrative data is a common input in early warning systems for college dropout in higher education. Still, the terminology and methodology vary significantly across existing studies, and the implications of different modeling decisions are not fully understood. This study provides a systematic evaluation of contributing factors and predictive performance of machine learning models over time and across different student groups. Drawing on twelve years of administrative data at a large public university in the US, we find that dropout prediction at the end of the second year has a 20% higher AUC than at the time of enrollment in a Random Forest model. Also, most predictive factors at the time of enrollment, including demographics and high school performance, are quickly superseded in predictive importance by college performance and in later stages by enrollment behavior. Regarding variability across student groups, college GPA has more predictive value for students from traditionally disadvantaged backgrounds than their peers. These results can help researchers and administrators understand the comparative value of different data sources when building early warning systems and optimizing decisions under specific policy goals.
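
The abstract compares AUC at two prediction time points. As a rough illustration of how such a comparison can be computed, the following sketch calculates AUC from scratch and then the relative gain between an early and a late model. The labels and scores are made-up toy values, not data or results from the paper, and the function names `auc` and `relative_gain` are hypothetical. Reading "20% higher" as a relative rather than absolute difference is also an assumption here.

```python
def auc(labels, scores):
    """Area under the ROC curve, computed as the fraction of
    (positive, negative) pairs where the positive case gets the
    higher score. Ties count as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def relative_gain(auc_early, auc_late):
    """Relative improvement of the later AUC over the earlier one."""
    return (auc_late - auc_early) / auc_early


# Toy example (assumed values): 1 = dropped out, 0 = persisted.
labels = [1, 1, 1, 0, 0, 0]
# Hypothetical predicted dropout risk at enrollment and after year two.
scores_enrollment = [0.6, 0.4, 0.3, 0.5, 0.2, 0.35]
scores_year_two = [0.9, 0.7, 0.4, 0.5, 0.2, 0.3]

auc_early = auc(labels, scores_enrollment)
auc_late = auc(labels, scores_year_two)
print(f"AUC at enrollment: {auc_early:.3f}")
print(f"AUC after year two: {auc_late:.3f}")
print(f"Relative gain: {relative_gain(auc_early, auc_late):.1%}")
```

The pairwise formulation is quadratic in the number of students, which is fine for a small illustration; for real administrative data a rank-based implementation from an established library would be the more practical choice.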

Authors (6)
  1. Dominik Glandorf
  2. Hye Rin Lee
  3. Gabe Avakian Orona
  4. Marina Pumptow
  5. Renzhe Yu
  6. Christian Fischer