Whether ChatGPT directly detects technical excellence in UoA 4 (Psychology, Psychiatry and Neuroscience)

Determine whether ChatGPT 4o-mini’s quality scores within REF2021 Unit of Assessment 4 directly reflect detection of the technical excellence of neuroscience article abstracts (e.g., those from Imperial College London’s Division of Brain Sciences), or whether they are instead driven by non-content cues such as claim definiteness, abstract style, or institutional characteristics.

Background

UoA 4 exhibited the highest reported correlation between ChatGPT scores and departmental average scores. The top-scoring department’s outputs were predominantly technical neuroscience articles from Imperial College London, in contrast to the more social-science-oriented outputs of lower-scoring institutions.

The authors note that although these neuroscience articles appear exceptionally impressive at an inexpert subjective level, it remains unclear whether ChatGPT is directly detecting this technical impressiveness or is instead relying on other cues present in the abstracts.
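
The paper does not propose a specific test, but one minimal way to separate content from cues would be an ablation-style probe: score each abstract verbatim and again with a suspected non-content cue masked, then compare the two score distributions. The sketch below is illustrative, not the authors’ method; the prompt wording, the model identifier gpt-4o-mini, and the masking heuristic are all assumptions.

```python
# Hypothetical probe (not from the paper): score each abstract twice, once
# verbatim and once with a suspected non-content cue removed, then compare.
from openai import OpenAI  # assumes the OpenAI Python SDK (>= 1.0)

client = OpenAI()  # reads OPENAI_API_KEY from the environment

SCORING_PROMPT = (
    "You are an expert REF2021 assessor for UoA 4 (Psychology, Psychiatry "
    "and Neuroscience). Rate the research quality of the following journal "
    "article abstract on the REF scale (1* to 4*). Reply with the score only.\n\n"
    "Abstract:\n{abstract}"
)

def score_abstract(abstract: str, model: str = "gpt-4o-mini") -> str:
    """Ask the model for a REF-style quality score for one abstract."""
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": SCORING_PROMPT.format(abstract=abstract)}],
    )
    return response.choices[0].message.content.strip()

def mask_institutional_cues(abstract: str) -> str:
    """Crude illustrative ablation: drop sentences naming the institution."""
    sentences = abstract.split(". ")
    kept = [s for s in sentences if "Imperial College" not in s]
    return ". ".join(kept)

# If scores for masked abstracts differ systematically from scores for the
# originals, institutional cues (rather than technical content) may be at work.
```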

References

At an inexpert subjective level, the ICL articles look extremely impressive, but it is not clear that ChatGPT is directly detecting this.

Thelwall et al. (25 Sep 2024). “In which fields can ChatGPT detect journal article quality? An evaluation of REF2021 results” (arXiv:2409.16695), Section: Reasons for high correlations (UoA 4 case analysis).