Cause of near-maximal alignment between ChatGPT and departmental REF correlations
Determine why, in some cases, the Spearman correlation between ChatGPT 4o-mini’s article-level quality scores (derived from titles and abstracts) and departmental average REF2021 scores is implausibly close to the estimated Spearman correlation between actual individual article REF scores and departmental averages. Assess whether this closeness is driven by content-based evaluation of abstracts, department-linked metadata, or other field-specific factors.
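The comparison at issue can be sketched on synthetic data. The following Python is a minimal illustration, not the paper's method: the helper names (`average_ranks`, `spearman`, `departmental_correlations`), the data-generating model, and all parameter values are assumptions made here. Unlike REF2021, where individual article scores are not published and the article-level correlation has to be estimated, the synthetic data has known article scores. The hypothetical proxy score shares only the department-level signal with the true score, which mimics the "department-linked metadata" explanation.

```python
import random
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks, giving tied values the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation, computed as the Pearson correlation of average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


def departmental_correlations(records):
    """records: list of (department, true_score, proxy_score) tuples.

    Returns (rho_true, rho_proxy): the Spearman correlation of each score
    with the mean true score of the article's department.
    """
    by_dept = {}
    for dept, true_score, _ in records:
        by_dept.setdefault(dept, []).append(true_score)
    dept_mean = {d: mean(scores) for d, scores in by_dept.items()}
    means = [dept_mean[d] for d, _, _ in records]
    rho_true = spearman([t for _, t, _ in records], means)
    rho_proxy = spearman([p for _, _, p in records], means)
    return rho_true, rho_proxy


# Synthetic data (assumed model): each score is a department effect plus noise.
rng = random.Random(0)
records = []
for dept in range(20):
    dept_quality = rng.gauss(0, 1)
    for _ in range(30):
        true_score = dept_quality + rng.gauss(0, 1)
        # The proxy sees the department effect but nothing article-specific.
        proxy_score = dept_quality + rng.gauss(0, 1)
        records.append((dept, true_score, proxy_score))

rho_true, rho_proxy = departmental_correlations(records)
print(f"true scores vs departmental mean:  {rho_true:.3f}")
print(f"proxy scores vs departmental mean: {rho_proxy:.3f}")
```

Under this assumed model, a proxy carrying no article-specific information can still correlate with departmental averages about as strongly as the true scores do, so closeness of the two correlations alone would not distinguish content-based evaluation from department-linked signals.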
References
It is not clear why the ChatGPT correlations with departmental averages were sometimes implausibly close to the estimated correlations between article scores and departmental averages (Figure 1).
— In which fields can ChatGPT detect journal article quality? An evaluation of REF2021 results
(2409.16695 - Thelwall et al., 25 Sep 2024) in Section: High ChatGPT correlation with departmental average scores compared to estimated correlation between article scores and departmental averages (Figure 5 discussion)