
Large-scale comparison of bibliographic data sources: Scopus, Web of Science, Dimensions, Crossref, and Microsoft Academic (2005.10732v2)

Published 21 May 2020 in cs.DL

Abstract: We present a large-scale comparison of five multidisciplinary bibliographic data sources: Scopus, Web of Science, Dimensions, Crossref, and Microsoft Academic. The comparison considers scientific documents from the period 2008-2017 covered by these data sources. Scopus is compared in a pairwise manner with each of the other data sources. We first analyze differences between the data sources in the coverage of documents, focusing for instance on differences over time, differences per document type, and differences per discipline. We then study differences in the completeness and accuracy of citation links. Based on our analysis, we discuss strengths and weaknesses of the different data sources. We emphasize the importance of combining a comprehensive coverage of the scientific literature with a flexible set of filters for making selections of the literature.

Analysis of Differences in Coverage and Citation Linking in Major Bibliographic Data Sources

The paper presents a comparative analysis of five major multidisciplinary bibliographic data sources: Scopus, Web of Science (WoS), Dimensions, Crossref, and Microsoft Academic. It covers scientific documents from the period 2008 to 2017 and aims to identify differences in document coverage across these sources, as well as differences in the completeness and accuracy of the citation links they provide. Scopus serves as the baseline throughout: it is compared in a pairwise manner with each of the other four sources. The results help researchers judge which data source is best suited to a particular analytic purpose.

The analysis first considers the breadth of document coverage in each data source. Microsoft Academic has the most extensive coverage, significantly surpassing the other sources. A closer inspection indicates that Microsoft Academic also includes non-scientific content, though the proportion is relatively small. When only scientific content is considered, Microsoft Academic, followed by Dimensions and Crossref, covers more documents than Scopus or WoS, including more book chapters and proceedings papers that would otherwise be omitted. WoS, on the other hand, is the most selective source, and its focused curation can be beneficial for specialized academic inquiries.
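The pairwise coverage comparison described above can be illustrated with a minimal sketch. It assumes each data source has already been reduced to a collection of DOIs and that documents are matched on exact (normalized) DOI. The summary does not say how the paper matches documents, so the matching rule, the function name `coverage_overlap`, and the sample identifiers are all hypothetical.

```python
def normalize_doi(doi):
    """Lowercase and trim a DOI so trivial formatting differences do not block a match."""
    return doi.strip().lower()


def coverage_overlap(baseline_dois, other_dois):
    """Count documents covered by both sources, by the baseline only, and by the other source only.

    Assumption: exact DOI matching. Real matching would also need to handle
    documents without a DOI, which this sketch ignores.
    """
    baseline = {normalize_doi(d) for d in baseline_dois}
    other = {normalize_doi(d) for d in other_dois}
    return {
        "both": len(baseline & other),
        "baseline_only": len(baseline - other),
        "other_only": len(other - baseline),
    }


# Made-up identifiers, for illustration only.
scopus = ["10.1000/a", "10.1000/b", "10.1000/c"]
other_source = ["10.1000/B", "10.1000/c", "10.1000/d", "10.1000/e"]

print(coverage_overlap(scopus, other_source))
# {'both': 2, 'baseline_only': 1, 'other_only': 2}
```

Running the same function once per non-baseline source gives one overlap summary per pair, mirroring the Scopus-versus-each-other-source design of the comparison.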

With respect to citation links, Scopus and WoS offer more accurate citation data than Dimensions and Microsoft Academic. This is attributed to the more stringent data curation policies of the former two databases, which improve the reliability of citation linking at the cost of potentially narrower document coverage. The difference shows up as discrepancies in citation link coverage: Microsoft Academic and Dimensions lack many citation links that are present in Scopus and WoS. Part of this stems from Microsoft Academic and Dimensions not systematically recording references that have not been directly matched to a cited document.
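One way to picture a citation link comparison between two sources is to restrict both to the documents they share and then compare the sets of (citing, cited) pairs. The sketch below does this under stated assumptions: documents are already matched to common identifiers, and the function name `compare_citation_links` and the sample data are hypothetical. It is not the procedure used in the paper, which the summary does not describe in detail.

```python
def compare_citation_links(links_a, links_b, shared_docs):
    """Compare citation links of two sources among documents covered by both.

    links_a, links_b: iterables of (citing_id, cited_id) pairs.
    shared_docs: identifiers of documents present in both sources.
    Only links whose citing and cited documents are both shared are counted,
    so differences reflect linking, not coverage.
    """
    shared = set(shared_docs)

    def restrict(links):
        return {
            (citing, cited)
            for citing, cited in links
            if citing in shared and cited in shared
        }

    a = restrict(links_a)
    b = restrict(links_b)
    return {
        "in_both": len(a & b),
        "only_in_a": len(a - b),
        "only_in_b": len(b - a),
    }


# Made-up identifiers, for illustration only.
shared = ["d1", "d2", "d3"]
source_a_links = [("d1", "d2"), ("d2", "d3"), ("d1", "d4")]  # ("d1", "d4") is dropped: d4 is not shared
source_b_links = [("d1", "d2"), ("d3", "d1")]

print(compare_citation_links(source_a_links, source_b_links, shared))
# {'in_both': 1, 'only_in_a': 1, 'only_in_b': 1}
```

A link counted under `only_in_a` may be either a link missing from the second source or an incorrect link in the first, so a count alone cannot separate completeness from accuracy; that distinction requires inspecting the individual links.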

From a broader research perspective, the implications of this paper are multi-faceted. In practical terms, researchers are provided with data to help decide which resource strikes the desired balance between comprehensive literature coverage and rigorous citation verification for their work. Moreover, the findings suggest that Microsoft Academic and Dimensions may serve as alternative sources for emerging areas or regions where comprehensive coverage is more valuable than citation precision.

Theoretically, the paper highlights the inherent trade-offs in data source selection and points to the potential value of combining data sources to achieve both comprehensiveness and citation precision. Importantly, developing effective filters for selecting subsets of the literature, like those suggested in Dimensions, could further bridge the gap between comprehensive coverage and selective precision. Future advances may involve integrating citations from open access initiatives into these platforms, improving citation link accuracy without sacrificing document diversity.

In conclusion, this paper illuminates the distinctive features and inherent trade-offs among these major bibliographic data sources. Researchers are encouraged to carefully consider these factors based on the specific demands of their bibliometric analyses and academic inquiries. Moving forward, the field could benefit from increased transparency and accessibility, possibly driven by initiatives aimed at open scholarly data exchange, which would enhance the versatility and applicability of these datasets.

Authors (3)
  1. Martijn Visser (2 papers)
  2. Nees Jan van Eck (43 papers)
  3. Ludo Waltman (58 papers)
Citations (442)