Did ChatGPT leverage public REF2021 departmental score profiles when assigning quality scores?

Establish whether ChatGPT 4o-mini, when assigning article-level quality scores from titles and abstracts, leverages public REF2021 departmental score profiles or other external institutional information, and develop empirical tests or auditing methods to detect or rule out such behavior.

Background

Because REF2021 departmental score profiles are publicly available, one alternative explanation for the positive correlations between ChatGPT's scores and REF-based quality scores is that ChatGPT indirectly used institutional information rather than evaluating abstract content alone.

Although ChatGPT's outputs did not explicitly reference institutional quality, the authors cannot rule out the possibility that it connected articles to institutions and their REF performance. A method is therefore needed to determine whether external metadata influences the scores.
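One possible audit, sketched here as an illustration and not as a method from the paper, is a paired perturbation test. Each article is scored twice: once from its original title and abstract, and once from a perturbed version (for example, a paraphrase that is harder to match to a known publication, or a version with any institutional cues removed). A sign-flip permutation test then checks whether the scores shift systematically. The function name, the perturbation strategy, and the example scores below are all hypothetical. A non-significant result would not prove that no institutional information was used; it would only fail to detect an effect of this particular perturbation.

```python
import random
from statistics import mean


def paired_sign_flip_test(original, perturbed, n_permutations=10000, seed=0):
    """Two-sided sign-flip permutation test on paired score differences.

    original, perturbed: equal-length lists of scores for the same articles,
    obtained from the original and the perturbed inputs (hypothetical setup).
    Returns (mean difference, approximate p-value).
    """
    if len(original) != len(perturbed) or not original:
        raise ValueError("score lists must be non-empty and of equal length")

    diffs = [o - p for o, p in zip(original, perturbed)]
    observed = mean(diffs)

    rng = random.Random(seed)
    extreme = 0
    for _ in range(n_permutations):
        # Under the null hypothesis the sign of each difference is arbitrary.
        flipped = [d if rng.random() < 0.5 else -d for d in diffs]
        if abs(mean(flipped)) >= abs(observed) - 1e-12:
            extreme += 1

    # Add-one correction keeps the estimated p-value strictly positive.
    p_value = (extreme + 1) / (n_permutations + 1)
    return observed, p_value


if __name__ == "__main__":
    # Made-up scores for illustration only; not data from the study.
    original_scores = [3.0, 2.5, 3.5, 3.0, 2.0, 3.5, 3.0, 2.5]
    perturbed_scores = [3.0, 2.5, 3.0, 3.0, 2.0, 3.5, 2.5, 2.5]
    diff, p = paired_sign_flip_test(original_scores, perturbed_scores)
    print(f"mean difference = {diff:.3f}, p = {p:.3f}")
```

A permutation test is used here because it makes no distributional assumption about the scores; a paired t-test or Wilcoxon signed-rank test would be reasonable alternatives.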

References

Recall that, for the current study, it couldn't be shown that ChatGPT did not cheat by leveraging indirect information about departmental REF score profiles when assigning quality scores.

In which fields can ChatGPT detect journal article quality? An evaluation of REF2021 results (2409.16695 - Thelwall et al., 25 Sep 2024) in Discussion