Susceptibility of ChatGPT’s research quality estimates to training-data gaming

Determine the extent to which ChatGPT 4o-mini’s research quality scores can be manipulated through alterations to its training data (e.g., targeted web content injection) that bias scores for specific articles, approaches, or institutions, and develop detection and mitigation strategies.

Background

Thelwall et al. (2024) highlight a lack of transparency in LLM training data, raising the concern that malicious or strategic content uploads could bias ChatGPT’s evaluations.

Quantifying and mitigating this potential for gaming is critical before ChatGPT-derived quality indicators are considered for use in research assessment.
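One way to make "quantifying" concrete is to compare scores for the same articles under two conditions and ask whether the average shift is larger than chance would explain. The sketch below is illustrative only and is not a method from the source paper. It assumes paired scores are available from a baseline condition and from a hypothetical manipulated condition; the function names, the sign-flip test, and the example scores are all assumptions introduced here.

```python
import random
from statistics import mean


def paired_shift(baseline, treated):
    """Mean per-article score change (treated minus baseline)."""
    if len(baseline) != len(treated):
        raise ValueError("score lists must be the same length")
    return mean(t - b for b, t in zip(baseline, treated))


def sign_flip_p_value(baseline, treated, n_resamples=10_000, seed=0):
    """Two-sided sign-flip permutation test on the paired differences.

    Under the null hypothesis of no systematic shift, each difference is
    equally likely to be positive or negative, so signs are flipped at random.
    """
    if len(baseline) != len(treated):
        raise ValueError("score lists must be the same length")
    rng = random.Random(seed)
    diffs = [t - b for b, t in zip(baseline, treated)]
    observed = abs(mean(diffs))
    hits = 0
    for _ in range(n_resamples):
        flipped = [d if rng.random() < 0.5 else -d for d in diffs]
        if abs(mean(flipped)) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (n_resamples + 1)


# Hypothetical scores invented for illustration; not data from any study.
baseline_scores = [2.0, 3.0, 2.5, 3.0, 2.0, 3.5]
treated_scores = [2.5, 3.0, 3.0, 3.5, 2.5, 3.5]

print("mean shift:", paired_shift(baseline_scores, treated_scores))
print("p-value:", sign_flip_p_value(baseline_scores, treated_scores))
```

A paired design is used here because article quality varies widely, so comparing each article with itself isolates the effect of the manipulation from differences between articles.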

References

"Moreover, the extent to which ChatGPT can be gamed through its training data to inflate or deflate article scores is unknown."

Source: In which fields can ChatGPT detect journal article quality? An evaluation of REF2021 results (2409.16695 - Thelwall et al., 2024), Conclusion
