Augmenting traditional REF score prediction with ChatGPT

Determine whether ChatGPT can augment the traditional machine learning approach for predicting Research Excellence Framework (REF) 2021 output quality scores, specifically the metadata- and citation-based model of Thelwall et al. (2023a), by (i) providing predictions for articles for which the machine learning model reports low confidence in its score, and (ii) providing predictions for Units of Assessment (UoAs) in which the traditional approach does not work at all.

Background

The paper evaluates ChatGPT-4’s ability to score research outputs under REF 2021 criteria and finds only a moderate correlation with the author’s judgments, with evidence that averaging multiple runs improves stability. Prior work by Thelwall et al. (2023a) shows that a traditional machine learning model using citation data and metadata (not full text) can predict REF scores with higher correlations in some UoAs.
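
The stability finding suggests a simple mitigation: score each output several times and average. The following Python sketch illustrates that pattern using the official OpenAI client; the model name, prompt wording, run count, and score-parsing logic are illustrative assumptions, not the paper's exact protocol.

```python
import re
from statistics import mean

from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Hypothetical prompt; the paper's actual REF instructions are more detailed.
PROMPT = (
    "Acting as a REF 2021 assessor, rate the following journal article for "
    "originality, significance and rigour on the 1*-4* REF quality scale. "
    "Reply with a single number.\n\n{text}"
)


def score_once(text: str, model: str = "gpt-4") -> float:
    """Request one REF-style score and parse the first number in the reply."""
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": PROMPT.format(text=text)}],
    )
    match = re.search(r"\d+(\.\d+)?", response.choices[0].message.content)
    if match is None:
        raise ValueError("no numeric score in the model's reply")
    return float(match.group())


def averaged_score(text: str, runs: int = 5) -> float:
    """Average several independent runs, since single-run scores fluctuate."""
    return mean(score_once(text) for _ in range(runs))
```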

Given these complementary strengths and limitations, the authors explicitly note that it is not known whether ChatGPT could be used alongside the traditional model, for example to handle low-confidence cases or UoAs where the traditional model performs poorly. Establishing whether such hybridization is effective remains unresolved.
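
Read this way, the proposed hybrid is essentially a routing rule: keep the traditional model's prediction where it is available and confident, and fall back to a ChatGPT-based score otherwise. A minimal sketch follows; the ml_predict and chatgpt_predict helpers, the set of supported UoAs, and the confidence threshold are all hypothetical placeholders, since the paper leaves the combination scheme open.

```python
from typing import Callable, Set, Tuple


def hybrid_score(
    article: dict,
    uoa: str,
    supported_uoas: Set[str],
    ml_predict: Callable[[dict, str], Tuple[float, float]],  # -> (score, confidence)
    chatgpt_predict: Callable[[dict], float],  # e.g. an averaged ChatGPT score
    confidence_threshold: float = 0.7,  # illustrative cut-off, not from the paper
) -> float:
    """Use the metadata/citation model when confident; otherwise use ChatGPT."""
    if uoa in supported_uoas:
        score, confidence = ml_predict(article, uoa)
        if confidence >= confidence_threshold:
            return score
    # Low-confidence articles and unsupported UoAs fall through to ChatGPT.
    return chatgpt_predict(article)
```

Whether such routing actually improves accuracy over either component alone is exactly the open question stated above.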

References

“It is not clear whether ChatGPT could augment the traditional machine learning approach, for example by providing score predictions for articles that the machine learning reports a low confidence in its score or for UoAs where the traditional approach does not work at all.”

Thelwall, M. (8 Feb 2024). Can ChatGPT evaluate research quality? arXiv:2402.05519, Section 5.3 (Potential applications).