
Google Scholar, Web of Science, and Scopus: a systematic comparison of citations in 252 subject categories (1808.05053v4)

Published 15 Aug 2018 in cs.DL

Abstract: Despite citation counts from Google Scholar (GS), Web of Science (WoS), and Scopus being widely consulted by researchers and sometimes used in research evaluations, there is no recent or systematic evidence about the differences between them. In response, this paper investigates 2,448,055 citations to 2,299 English-language highly-cited documents from 252 GS subject categories published in 2006, comparing GS, the WoS Core Collection, and Scopus. GS consistently found the largest percentage of citations across all areas (93%-96%), far ahead of Scopus (35%-77%) and WoS (27%-73%). GS found nearly all the WoS (95%) and Scopus (92%) citations. Most citations found only by GS were from non-journal sources (48%-65%), including theses, books, conference papers, and unpublished materials. Many were non-English (19%-38%), and they tended to be much less cited than citing sources that were also in Scopus or WoS. Despite the many unique GS citing sources, Spearman correlations between citation counts in GS and WoS or Scopus are high (0.78-0.99). They are lower in the Humanities, and lower between GS and WoS than between GS and Scopus. The results suggest that in all areas GS citation data is essentially a superset of WoS and Scopus, with substantial extra coverage.

Authors (4)
Citations (1,219)

Summary

Systematic Comparison of Citation Databases: Google Scholar, Web of Science, and Scopus

Introduction

Comparing citation counts from Google Scholar (GS), Web of Science (WoS), and Scopus is an important topic in bibliometrics and scientometrics, because the coverage of these databases often shapes research evaluations and academic assessments. In their paper, Martín-Martín et al. investigate the differences and overlaps in the citations found by these three major bibliographic databases, providing insight into how comprehensive and useful each one is for academic purposes.

Study Overview

Using a sizeable dataset comprising 2,448,055 citations to 2,299 highly-cited documents across 252 subject categories from Google Scholar Classic Papers (GSCP), the authors conducted an extensive comparative analysis between GS, WoS, and Scopus. The selected sample spanned multiple disciplines and included only English-language documents published in 2006. Key aspects addressed include the overlap in citations, the types of citing documents, language distribution, and correlation in citation counts across the databases.

Key Findings

Citation Overlap and Coverage

Google Scholar had the highest coverage, finding 93%-96% of all citations across subject areas, far ahead of both Scopus (35%-77%) and WoS (27%-73%). GS was close to a superset of the other two, capturing 95% of the citations found by WoS and 92% of those found by Scopus. Much of this additional coverage comes from GS’s extensive crawling of non-journal sources, including theses, books, conference papers, and unpublished materials, which accounted for 48%-65% of the citations found only by GS. The overlap between GS and the other databases varied by subject, with higher overlap in the natural sciences and lower overlap in the humanities and social sciences.
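The coverage and overlap percentages in this section are set calculations over citation records. The following is a minimal sketch of that arithmetic, assuming each citation has already been matched across databases and given a shared identifier. The function names and the toy identifiers (`c1` to `c10`) are hypothetical and are not the authors' data or code.

```python
def coverage(citations_by_db):
    """Share of all known citations (the union over databases) that each database found."""
    universe = set().union(*citations_by_db.values())
    return {db: len(found) / len(universe) for db, found in citations_by_db.items()}


def share_found_by(reference, other):
    """Fraction of the citations in `other` that also appear in `reference`."""
    return len(reference & other) / len(other)


# Toy data (hypothetical identifiers, not taken from the paper).
toy = {
    "GS": {"c1", "c2", "c3", "c4", "c5", "c6", "c7", "c8", "c9"},
    "WoS": {"c1", "c2", "c3", "c10"},
    "Scopus": {"c1", "c2", "c4", "c5", "c10"},
}

toy_coverage = coverage(toy)                               # per-database coverage of the union
gs_share_of_wos = share_found_by(toy["GS"], toy["WoS"])    # how much of WoS is also in GS
gs_share_of_scopus = share_found_by(toy["GS"], toy["Scopus"])
unique_to_gs = toy["GS"] - toy["WoS"] - toy["Scopus"]      # citations found only by GS
```

In this sketch the denominator for coverage is the union of citations found by any database, while the denominator for `share_found_by` is the other database's own citation set, which is why the two kinds of percentage are not directly comparable.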

Document Types and Languages of Unique Citations

Most citations found by more than one database came from journal publications. In contrast, GS’s unique citations included a high percentage of theses, books, conference papers, and unpublished documents not indexed by WoS or Scopus. Language analysis showed that 19%-38% of GS’s unique citations were non-English, reflecting GS’s broader linguistic coverage. These unique citing sources also tended to be much less cited than citing sources that were also in Scopus or WoS.
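Shares such as "non-journal" or "non-English" among GS's unique citations are simple proportions over tagged records. A minimal sketch follows, assuming each unique citation carries a document-type and a language label. The field names `doc_type` and `lang` and the five toy records are hypothetical, chosen only for illustration.

```python
from collections import Counter


def share_by(records, key):
    """Proportion of records falling under each value of `key`."""
    counts = Counter(record[key] for record in records)
    total = sum(counts.values())
    return {value: n / total for value, n in counts.items()}


# Toy records standing in for citations found only by GS (not real data).
unique_gs = [
    {"doc_type": "thesis", "lang": "en"},
    {"doc_type": "book", "lang": "es"},
    {"doc_type": "journal", "lang": "en"},
    {"doc_type": "journal", "lang": "zh"},
    {"doc_type": "conference", "lang": "en"},
]

type_shares = share_by(unique_gs, "doc_type")
lang_shares = share_by(unique_gs, "lang")

# Aggregate shares of the kind reported in the summary.
non_journal_share = sum(r["doc_type"] != "journal" for r in unique_gs) / len(unique_gs)
non_english_share = sum(r["lang"] != "en" for r in unique_gs) / len(unique_gs)
```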

Citation Counts Correlation

High Spearman correlations (0.78-0.99) between GS and WoS/Scopus citation counts indicate that the databases rank documents very similarly, despite GS’s broader citation base. The correlations were lower in the Humanities, and lower between GS and WoS than between GS and Scopus. Overall, while GS captures more citations, the relative scholarly impact indicated by citation rankings remains fairly stable across databases.
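Spearman correlation compares the rank order of documents rather than their raw citation counts, which is why it can stay high even when GS reports many more citations per document. A minimal standard-library sketch follows, computing it as the Pearson correlation of average ranks (ties share the mean of their positions). The citation counts are made-up toy values, not figures from the paper.

```python
def average_ranks(values):
    """1-based ranks; tied values receive the average of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sum((a - mean_x) ** 2 for a in x) ** 0.5
    spread_y = sum((b - mean_y) ** 2 for b in y) ** 0.5
    return cov / (spread_x * spread_y)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Toy citation counts for five documents in two databases (not real data).
gs_counts = [120, 300, 45, 80, 210]
wos_counts = [60, 150, 30, 20, 100]
rho = spearman(gs_counts, wos_counts)
```

Here the two databases disagree only on the order of the two least-cited documents, so `rho` is close to but below 1 even though every GS count is larger than its WoS counterpart.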

Implications for Research Evaluation

These findings have several implications for research evaluation:

  • Inclusivity of Citation Sources: GS’s capability to capture a wider array of document types and languages presents a more holistic view of academic influence, particularly for disciplines or regions underrepresented in WoS and Scopus.
  • Citation Analysis Robustness: Given the high correlation in citation counts, GS could be a valuable tool for large-scale citation analysis, providing robust results akin to those obtained from WoS and Scopus.
  • Data Accessibility and Use: Despite its comprehensive coverage, the use of GS for systematic citation analysis faces obstacles due to limited metadata and bulk data extraction restrictions. This hinders its suitability for large-scale automated studies compared to the more accessible WoS and Scopus datasets.

Future Research Directions

Further developments in AI and natural language processing could enhance the extraction and processing of GS data, making it a more practical tool for comprehensive bibliometric studies. Additionally, examining the qualitative differences between unique citations found in GS versus those in more established databases could provide deeper insights into the academic impact of various free-access publications, preprints, and non-traditional academic outputs.

Conclusion

Martín-Martín et al.'s paper underscores Google Scholar’s dominance in citation coverage, outperforming WoS and Scopus across disciplines. While GS encompasses a wider array of citation sources, the practical challenges of data extraction and the variability in citation types merit careful consideration when conducting citation-based research evaluations. The paper’s findings reinforce the need for diversified citation databases to ensure comprehensive and equitable academic assessments.