
Can LLMs detect originality, rigour, and impact across fields?

Evaluate the capability, accuracy, and biases of large language models (LLMs) in detecting and assessing originality, methodological rigour, and impact in journal articles across diverse disciplines, relative to expert judgements.


Background

Traditional machine learning based on bibliometric features has shown some promise for predicting article quality, especially in certain fields. Whether LLMs can meaningfully appraise key quality dimensions directly from text remains unclear, and may depend on complex field norms and tacit disciplinary knowledge.

Robust evaluation requires field-stratified benchmarks and careful bias analysis.
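One way to make the field-stratified comparison concrete is to compute agreement between LLM scores and expert judgements separately within each field, so that a strong correlation in one discipline cannot mask a weak one in another. The sketch below is illustrative only: the records, field names, and scores are hypothetical, and Spearman rank correlation is one reasonable agreement measure among several (no specific metric is prescribed by the question).

```python
from collections import defaultdict

def rank(xs):
    """Assign average ranks (1-based), sharing ranks among ties."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    ranks = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of tied positions, 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman(a, b):
    """Spearman correlation = Pearson correlation of the ranks."""
    ra, rb = rank(a), rank(b)
    n = len(ra)
    ma, mb = sum(ra) / n, sum(rb) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(ra, rb))
    sa = sum((x - ma) ** 2 for x in ra) ** 0.5
    sb = sum((y - mb) ** 2 for y in rb) ** 0.5
    return cov / (sa * sb)

# Hypothetical benchmark records: (field, llm_score, expert_score),
# e.g. 1-5 quality ratings on one dimension such as rigour.
records = [
    ("physics", 3, 4), ("physics", 5, 5), ("physics", 2, 1), ("physics", 4, 3),
    ("history", 4, 2), ("history", 1, 3), ("history", 5, 4), ("history", 2, 1),
]

by_field = defaultdict(lambda: ([], []))
for field, llm, expert in records:
    by_field[field][0].append(llm)
    by_field[field][1].append(expert)

# Per-field agreement: divergent values across fields would flag field bias.
for field, (llm_scores, expert_scores) in sorted(by_field.items()):
    print(f"{field}: rho = {spearman(llm_scores, expert_scores):.2f}")
```

Reporting the per-field correlations side by side, rather than a single pooled figure, is what makes the bias analysis possible: a model that tracks expert judgement well in the sciences but poorly in the humanities would show up immediately.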

References

It is not clear whether LLMs could be reasonably effective at detecting originality, rigour, or impact in any or all fields.