Moderating effects of pre-trained and zero-shot ML on automatic scoring performance
Determine how pre-trained language models (e.g., BERT, fine-tuned GPT-3.5) and zero-shot learning approaches (e.g., Matching Exemplar as Next Sentence Prediction, MeNSP) moderate the performance of machine learning-based automatic scoring systems for science assessments, specifically in terms of machine-human score agreements and scoring accuracy across tasks and contexts.
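To make the zero-shot idea concrete, the sketch below illustrates the matching intuition behind MeNSP: a student response is assigned the score label whose exemplar responses it best "matches." This is a toy proxy only; a bag-of-words cosine similarity stands in for the next-sentence-prediction probability that the actual MeNSP method obtains from a pre-trained BERT NSP head, and the exemplar texts and labels are hypothetical.

```python
# Toy sketch of zero-shot scoring by exemplar matching (MeNSP-style).
# Assumption: cosine similarity over bag-of-words vectors substitutes for
# BERT's next-sentence-prediction probability used by the real method.
from collections import Counter
import math

def cosine(a: str, b: str) -> float:
    """Cosine similarity between bag-of-words vectors of two texts."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[w] * vb[w] for w in va)
    norm = (math.sqrt(sum(v * v for v in va.values()))
            * math.sqrt(sum(v * v for v in vb.values())))
    return dot / norm if norm else 0.0

def zero_shot_score(response: str, exemplars: dict[int, list[str]]) -> int:
    """Assign the score label whose exemplars best match the response."""
    best = {label: max(cosine(response, ex) for ex in texts)
            for label, texts in exemplars.items()}
    return max(best, key=best.get)

# Hypothetical exemplars for a two-level science item (labels 0 and 1).
exemplars = {
    1: ["the ice melts because heat energy transfers from the warm air"],
    0: ["the ice melts because it is cold"],
}
print(zero_shot_score("heat from the air transfers to the ice and melts it",
                      exemplars))  # matches the label-1 exemplar more closely
```

Because no labeled training data for the target item is needed, only scored exemplars, this framing is what lets MeNSP operate zero-shot; the research question above asks how such approaches affect machine-human agreement relative to fine-tuned models.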
References
Although several technical features have been examined by Zhai, Shi et al. (2021), the most updated ML, such as pre-trained or zero-shot approaches, has not been thoroughly investigated. As such, little is currently known about how the most updated ML algorithms moderate machine-based assessment performance.
— AI and Machine Learning for Next Generation Science Assessments (2405.06660 - Zhai, 23 Apr 2024), Section: A Framework Accounting for Automatic Scoring Accuracy