Google Scholar is manipulatable

Published 7 Feb 2024 in cs.CE, cs.DL, cs.SI, and physics.soc-ph | (2402.04607v1)

Abstract: Citations are widely considered in scientists' evaluation. As such, scientists may be incentivized to inflate their citation counts. While previous literature has examined self-citations and citation cartels, it remains unclear whether scientists can purchase citations. Here, we compile a dataset of ~1.6 million profiles on Google Scholar to examine instances of citation fraud on the platform. We survey faculty at highly-ranked universities, and confirm that Google Scholar is widely used when evaluating scientists. Intrigued by a citation-boosting service that we unravelled during our investigation, we contacted the service while undercover as a fictional author, and managed to purchase 50 citations. These findings provide conclusive evidence that citations can be bought in bulk, and highlight the need to look beyond citation counts.

Summary

- The paper demonstrates that Google Scholar's citation metrics are vulnerable to manipulation, both through citation purchasing and through citations planted in non-peer-reviewed sources.
- The authors analyzed over 1.6 million profiles to uncover suspicious citation behaviors and anomalies linked to external platforms.
- The study highlights the need for robust moderation and new evaluation indices to uphold academic integrity in research assessments.

Analysis of Citation Manipulation on Google Scholar

The paper "Google Scholar is manipulatable" by Hazem Ibrahim, Fengyuan Liu, Yasir Zaki, and Talal Rahwan investigates how easily citation metrics on Google Scholar can be manipulated. It examines the integrity of academic citation counts, with particular attention to a form of malfeasance the authors call "citation purchasing."

Key Findings

The paper presents several critical findings:

Survey Results on Citation Usage:
- A survey of faculty members at top-ranked universities revealed that Google Scholar is the predominant source of citation metrics, used more than all other databases combined.
- Over 60% of the surveyed faculty members reported using Google Scholar for citation information, which highlights the critical reliance on this platform in scientific evaluations.
Identification of Suspicious Authors:
- By analyzing over 1.6 million Google Scholar profiles, the authors identified patterns indicative of citation manipulation: visibly inflated citation counts that could not be attributed solely to self-citations or citation cartels.
- Five authors with highly irregular citation patterns were further scrutinized and found to exhibit significant anomalies suggestive of purchased citations.
Evaluation of the Citation Manipulation Mechanism:
- The study identified the prevalent use of non-peer-reviewed sources for manipulating citations. It demonstrated that citations can be obtained through documents uploaded to platforms that host pre-prints, such as ResearchGate, Authorea, and OSF.
- By creating a fictional author profile and generating AI-synthesized research articles, the authors illustrated the ease of inflating citation counts through such pre-print servers.
- Citations continued to count even after the citing articles were deleted from the pre-print servers, which exposes a gap in Google Scholar’s moderation.
Citation Purchasing:
- The authors uncovered a citation-boosting service that sold citations in bulk. By engaging with this service, they purchased 50 citations for a fictional author, thereby providing direct evidence of the feasibility of buying citations.
- Analysis of the purchased citations revealed manuscripts in which up to 90% of the entries in the reference list were never cited in the main text, suggesting that these references were planted.
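
The last check above, the share of reference-list entries that the main text never cites, can be sketched in a few lines of Python. This is an illustrative sketch and not the authors' pipeline: the function name `uncited_reference_share` is hypothetical, and it assumes the manuscript uses numeric bracket citations such as `[3]`, `[1, 4]`, or `[2-5]` and that the number of reference-list entries is already known.

```python
import re


def uncited_reference_share(body_text: str, num_references: int) -> float:
    """Return the fraction of reference-list entries never cited in the body.

    Assumption (not from the paper): citations are numeric and bracketed,
    e.g. [3], [1, 4] or [2-5], and references are numbered 1..num_references.
    """
    if num_references <= 0:
        raise ValueError("num_references must be positive")

    cited = set()
    # Grab the contents of every bracket that holds only digits, commas,
    # whitespace, and hyphens.
    for group in re.findall(r"\[([\d,\s\-]+)\]", body_text):
        for part in group.split(","):
            part = part.strip()
            bounds = part.split("-")
            if len(bounds) == 2 and all(b.strip().isdigit() for b in bounds):
                # A range such as 2-5, clamped to the valid reference numbers.
                low = max(int(bounds[0]), 1)
                high = min(int(bounds[1]), num_references)
                cited.update(range(low, high + 1))
            elif part.isdigit() and 1 <= int(part) <= num_references:
                cited.add(int(part))

    return 1 - len(cited) / num_references


if __name__ == "__main__":
    body = "Prior work [1] and later studies [2, 3] agree; see also [5-6]."
    # Five of ten references are cited in the body, so half are uncited.
    print(uncited_reference_share(body, num_references=10))
```

A high share on its own is only a signal worth a closer look; author-year citation styles or extraction errors would need different parsing.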

Implications and Future Directions

The implications of this research are both practical and theoretical, extending beyond the particular case of Google Scholar:

Practical Implications:
- Research Evaluation: The findings call into question the reliance on citation metrics for research evaluation, a practice prevalent in hiring, promotion, and funding decisions. They underscore the need for more robust evaluation criteria that go beyond simplistic quantitative metrics.
- Database Moderation: There is an urgent need for better moderation practices on platforms like Google Scholar. Closer scrutiny of citation sources, together with algorithms capable of identifying anomalous citation patterns, is imperative.
Theoretical Implications:
- Scientometrics: This study contributes to the field of scientometrics by proposing novel indices such as the citation concentration index ($c^2$-index) and the adjusted $c^2$-index, which can serve as indicators of potential citation manipulation.
- Academic Integrity: The work also highlights the broader issue of academic integrity and the susceptibility of bibliometric databases to novel forms of abuse facilitated by technological advancements.
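
To make the idea of a citation concentration measure concrete, here is a minimal Python sketch. This summary does not give the formula for the $c^2$-index, so the definition below is an assumption made purely for illustration, modeled on the h-index: the largest n such that n citing documents each cite the author at least n times. The function name `concentration_index` and the input format are hypothetical and may not match the authors' definition.

```python
from collections import Counter


def concentration_index(citing_sources: list[str]) -> int:
    """Largest n such that n citing documents each cite the author >= n times.

    `citing_sources` holds one entry per citation received, naming the
    document that made the citation. This h-index-style definition is an
    assumption for illustration, not necessarily the paper's formula.
    """
    # Citations contributed by each citing document, largest first.
    counts = sorted(Counter(citing_sources).values(), reverse=True)
    n = 0
    for rank, count in enumerate(counts, start=1):
        if count >= rank:
            n = rank
        else:
            break
    return n


if __name__ == "__main__":
    # Citations spread thinly across many documents.
    spread = ["p1", "p2", "p3", "p4"]
    # Citations concentrated in a few documents that cite the author repeatedly.
    concentrated = ["p1"] * 5 + ["p2"] * 3 + ["p3"] * 3 + ["p4"]
    print(concentration_index(spread), concentration_index(concentrated))
```

Under this assumed definition, a profile whose citations come from a handful of documents that each cite it many times scores higher than one with the same number of citations spread across many documents.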

Speculation on Future Developments in AI

The research suggests that with the advancement of AI technologies, the potential for generating realistic but fraudulent research articles will only increase. Future developments may need to focus on creating AI tools for detecting such fraudulent activities. This includes:

- Developing robust AI-driven systems capable of identifying patterns of citation manipulation.
- Enhancing the scrutiny of AI-generated content on pre-print servers and in peer-reviewed journals to mitigate the risks of citation and publication fraud.

Conclusion

This paper provides substantial evidence of citation manipulation on Google Scholar, with significant implications for the academic community. By demonstrating vulnerabilities and proposing new metrics for detecting suspicious activities, the authors call for a reevaluation of how citations are used in assessing scholarly impact. The study underscores the importance of integrity in academic publishing, an issue that is becoming increasingly complex with the advent of advanced AI technologies. Future efforts should augment the transparency and reliability of bibliometric databases to uphold the credibility of scientific research and evaluations.