Systematic Comparison of Citation Databases: Google Scholar, Web of Science, and Scopus
Introduction
The systematic comparison of citation counts among Google Scholar (GS), Web of Science (WoS), and Scopus has become a central topic in bibliometrics and scientometrics. The breadth and depth of each database's coverage often shape decisions in research evaluation and academic assessment. In their paper, Martín-Martín et al. investigate the disparities and overlaps in the citations found by these three major bibliographic databases, providing insight into how comprehensive and useful each is for academic purposes.
Study Overview
The authors analysed 2,448,055 citations to 2,299 highly-cited documents across 252 subject categories, drawn from Google Scholar Classic Papers (GSCP), and compared how GS, WoS, and Scopus covered them. The sample spanned multiple disciplines and included only English-language documents published in 2006. The analysis addresses the overlap in citations, the types of citing documents, their language distribution, and the correlation of citation counts across the databases.
Key Findings
Citation Overlap and Coverage
Google Scholar had the highest coverage, finding 93%-96% of all citations in the dataset depending on subject area, well ahead of both Scopus (35%-77%) and WoS (27%-73%). GS proved to be a near superset of the other two, capturing 95% of the citations found by WoS and 92% of those found by Scopus. GS’s extensive crawling of non-journal sources, including theses, books, conference papers, and unpublished materials, largely explains this coverage: such sources accounted for 48%-65% of the citations found only by GS. Overlap between GS and the other databases nonetheless varied by subject, and was higher in the natural sciences and lower in the humanities and social sciences.
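The coverage and overlap percentages above are, in essence, set arithmetic over the citations each database finds. The following is a minimal sketch of that arithmetic, not the authors' actual procedure: the citation identifiers and the helper names `coverage` and `share_also_in` are hypothetical, and the toy data are invented for illustration only.

```python
# Hypothetical toy data: each set holds identifiers of the citing documents
# that one database found for the same cited paper.
gs = {"c1", "c2", "c3", "c4", "c5", "c6", "c7", "c8"}
wos = {"c1", "c2", "c3", "c9"}
scopus = {"c1", "c2", "c4", "c5", "c10"}


def coverage(found, universe):
    """Share of all known citations that one database found."""
    return len(found & universe) / len(universe)


def share_also_in(source, other):
    """Share of the citations in `other` that `source` also found."""
    return len(source & other) / len(other)


# The "universe" is every citation found by at least one database.
universe = gs | wos | scopus

gs_coverage = coverage(gs, universe)
wos_coverage = coverage(wos, universe)
scopus_coverage = coverage(scopus, universe)

# How close GS is to being a superset of each other database.
gs_finds_wos = share_also_in(gs, wos)
gs_finds_scopus = share_also_in(gs, scopus)

# Citations found only by GS.
gs_unique = gs - (wos | scopus)

print(f"GS coverage: {gs_coverage:.0%}")
print(f"GS finds {gs_finds_wos:.0%} of WoS and {gs_finds_scopus:.0%} of Scopus citations")
print(f"Citations unique to GS: {sorted(gs_unique)}")
```

In a real study the hard part is matching the same citing document across databases, since identifiers such as DOIs are often missing for non-journal sources; this sketch assumes that matching has already been done.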
Document Types and Languages of Unique Citations
Most citations found by more than one database came from journal publications, reflecting the databases' shared coverage of traditional academic publishing. In contrast, the citations found only by GS included a high percentage of theses, books, conference papers, and unpublished documents not indexed by WoS or Scopus. Language analysis showed that 19%-38% of GS’s unique citations were non-English, demonstrating GS’s broader linguistic coverage.
Citation Counts Correlation
High Spearman correlations (0.78-0.99) between GS and WoS/Scopus citation counts suggest substantial consistency in citation rankings across databases, despite GS’s broader citation base. These correlations diminished slightly in humanities disciplines but generally indicated that while GS captures more citations, the relative scholarly impact indicated by citation rankings remains fairly stable across databases.
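To illustrate why a rank correlation can stay high even when one database reports far more citations, here is a small sketch that computes Spearman's coefficient as the Pearson correlation of average ranks. The citation counts are invented for illustration, and the function names are hypothetical helpers rather than anything taken from the paper.

```python
def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sum((a - mean_x) ** 2 for a in x) ** 0.5
    sd_y = sum((b - mean_y) ** 2 for b in y) ** 0.5
    return cov / (sd_x * sd_y)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical citation counts for five papers in two databases.
# The GS counts are much larger, but the ordering is nearly the same.
gs_counts = [1200, 950, 400, 310, 150]
wos_counts = [600, 610, 180, 120, 90]

rho = spearman(gs_counts, wos_counts)
print(f"Spearman correlation: {rho:.2f}")
```

Only the top two papers swap places between the two toy lists, so the coefficient stays close to 1 even though the GS counts are roughly double the WoS counts. In practice a library routine such as `scipy.stats.spearmanr` would normally be used instead of hand-written helpers.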
Implications for Research Evaluation
These findings have several implications for research evaluation:
- Inclusivity of Citation Sources: GS’s capability to capture a wider array of document types and languages presents a more holistic view of academic influence, particularly for disciplines or regions underrepresented in WoS and Scopus.
- Citation Analysis Robustness: Given the high correlation in citation counts, GS could be a valuable tool for large-scale citation analysis, providing robust results akin to those obtained from WoS and Scopus.
- Data Accessibility and Use: Despite its comprehensive coverage, GS offers limited metadata and restricts bulk data extraction, which hampers systematic citation analysis. This makes it less suitable for large-scale automated studies than WoS and Scopus, whose data are easier to extract.
Future Research Directions
Further developments in AI and natural language processing could improve the extraction and processing of GS data, making it a more practical tool for comprehensive bibliometric studies. Additionally, examining the qualitative differences between the citations found only in GS and those in the more established databases could provide deeper insight into the academic impact of open-access publications, preprints, and non-traditional academic outputs.
Conclusion
Martín-Martín et al.'s paper shows that Google Scholar leads in citation coverage, finding more citations than WoS and Scopus across disciplines. Although GS draws on a wider array of citation sources, the practical challenges of data extraction and the variability in the types of citing documents merit careful consideration in citation-based research evaluation. The paper’s findings reinforce the need to draw on multiple citation databases to ensure comprehensive and equitable academic assessment.