
Reference Coverage Analysis of OpenAlex compared to Web of Science and Scopus (2401.16359v3)

Published 29 Jan 2024 in cs.DL

Abstract: OpenAlex is a promising open source of scholarly metadata, and competitor to established proprietary sources, such as the Web of Science and Scopus. As OpenAlex provides its data freely and openly, it permits researchers to perform bibliometric studies that can be reproduced in the community without licensing barriers. However, as OpenAlex is a rapidly evolving source and the data contained within is expanding and also quickly changing, the question naturally arises as to the trustworthiness of its data. In this report, we will study the reference coverage and selected metadata within each database and compare them with each other to help address this open question in bibliometrics. In our large-scale study, we demonstrate that, when restricted to a cleaned dataset of 16.8 million recent publications shared by all three databases, OpenAlex has average source reference numbers and internal coverage rates comparable to both Web of Science and Scopus. We further analyse the metadata in OpenAlex, the Web of Science and Scopus by journal, finding a similarity in the distribution of source reference counts in the Web of Science and Scopus as compared to OpenAlex. We also demonstrate that the comparison of other core metadata covered by OpenAlex shows mixed results when broken down by journal, capturing more ORCID identifiers, fewer abstracts and a similar number of Open Access status indicators per article when compared to both the Web of Science and Scopus.

Summary

  • The paper finds that OpenAlex has reference coverage comparable to WoS and Scopus for recent publications (2015-2022) in a cleaned, shared dataset of 16.8 million records.
  • It uses a quantitative analysis of DOI-matched publications to compare reference and metadata details across the three databases.
  • The findings underscore OpenAlex’s potential as a viable open bibliometric tool while also revealing lower abstract coverage and questions about the accuracy of its ORCID assignments.

Reference Coverage Analysis of OpenAlex compared to Web of Science and Scopus

The paper "Reference Coverage Analysis of OpenAlex compared to Web of Science and Scopus" provides a comprehensive empirical investigation of OpenAlex, a scholarly metadata source, in comparison with two established proprietary databases, Web of Science (WoS) and Scopus. The authors aim to assess the efficacy of OpenAlex in terms of reference and metadata coverage, ultimately determining its viability as an open and free alternative for bibliometric research.

Key Findings

The paper centers on a quantitative analysis of reference and metadata coverage across the three databases. Its core comparison is the average source reference count in a cleaned dataset of 16,788,282 publications common to all three platforms. Within this shared dataset, which covers recent publications from 2015 to 2022, OpenAlex performs comparably to, and on some measures slightly better than, both WoS and Scopus in reference count.

Reference Coverage Findings:

  • Average Source Reference Count: OpenAlex achieved a higher average source reference count in the shared dataset than both WoS and Scopus. However, this advantage diminishes when the count is restricted to references published between 1996 and 2022, where Scopus slightly outperforms OpenAlex.
  • Internal Coverage: Although OpenAlex has a large corpus, its internal coverage does not surpass that of Scopus, which points to possible limitations in its current reference-matching approach.
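
The two reference metrics above can be illustrated with a minimal Python sketch. It assumes that internal coverage means the share of a publication's source references that resolve to a record indexed in the same database, pooled over all publications; the summary does not spell out the paper's exact formula, and the field names `n_refs` and `n_matched` are hypothetical.

```python
def reference_metrics(publications):
    """Return (average source reference count, internal coverage rate).

    Each publication is a dict with two hypothetical fields:
      "n_refs":    number of source references listed for the publication
      "n_matched": number of those references resolved to a record
                   in the same database
    Internal coverage is pooled here (matched / total), an assumption.
    """
    total_refs = sum(p["n_refs"] for p in publications)
    total_matched = sum(p["n_matched"] for p in publications)
    avg_refs = total_refs / len(publications) if publications else 0.0
    internal_coverage = total_matched / total_refs if total_refs else 0.0
    return avg_refs, internal_coverage


# Made-up toy data, not figures from the paper.
sample = [
    {"n_refs": 40, "n_matched": 30},
    {"n_refs": 20, "n_matched": 18},
    {"n_refs": 0, "n_matched": 0},
]
avg, cov = reference_metrics(sample)
print(avg, cov)  # 20.0 0.8
```

Running the same function on the records of each database for the same set of publications gives directly comparable numbers, which is the point of restricting the analysis to a shared dataset.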

Metadata Coverage Findings:

  • Abstract Availability: OpenAlex lags behind WoS and Scopus in abstract coverage: WoS and Scopus each provide abstracts for over 92% of articles, versus 87% for OpenAlex.
  • ORCID Identifier Coverage: OpenAlex excels in ORCID coverage, with over 92% of articles having at least one ORCID present. However, the paper finds that this high coverage may be inflated by generous, and potentially erroneous, author disambiguation practices.
  • Open Access Information: Open Access status coverage in OpenAlex is slightly better than in WoS and Scopus, possibly because WoS and Scopus update their Unpaywall-derived data less frequently, although all three databases hover around 49% coverage.
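
Coverage rates like those above are simple proportions over the article set. The following sketch shows one way to compute them; the record layout (`abstract`, `authors`, `orcid`, `oa_status`) is a hypothetical simplification and not the schema of any of the three databases.

```python
def coverage_rates(records):
    """Share of records carrying each metadata field (0.0 to 1.0).

    Field names are hypothetical. An article counts towards ORCID
    coverage if at least one of its authors has an ORCID.
    """
    n = len(records)
    if n == 0:
        return {"abstract": 0.0, "orcid": 0.0, "oa_status": 0.0}
    has_abstract = sum(1 for r in records if r.get("abstract"))
    has_orcid = sum(
        1 for r in records
        if any(a.get("orcid") for a in r.get("authors", []))
    )
    has_oa = sum(1 for r in records if r.get("oa_status") is not None)
    return {
        "abstract": has_abstract / n,
        "orcid": has_orcid / n,
        "oa_status": has_oa / n,
    }


# Made-up toy records, not data from the paper.
records = [
    {"abstract": "text", "authors": [{"orcid": "0000-0002-1825-0097"}], "oa_status": "gold"},
    {"abstract": None, "authors": [{"orcid": None}, {"orcid": "0000-0001-5109-3700"}], "oa_status": "closed"},
    {"abstract": "text", "authors": [{"orcid": None}], "oa_status": None},
    {"abstract": "text", "authors": [], "oa_status": "green"},
]
print(coverage_rates(records))
# {'abstract': 0.75, 'orcid': 0.5, 'oa_status': 0.75}
```

Note that a presence check of this kind cannot tell whether an ORCID is assigned to the right author, which is exactly the caveat the paper raises about OpenAlex's high ORCID coverage.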

Data and Methodological Considerations

The paper builds a Shared Corpus of DOI-matched publications to ensure a fair cross-comparison between OpenAlex, WoS, and Scopus. It also identifies discrepancies in reported reference numbers, specifically in Scopus and OpenAlex, and attributes them to data ingestion irregularities and the presence of deleted source references in OpenAlex.
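
The DOI-based matching described above can be sketched as a set intersection over normalized DOIs. This is an illustrative sketch only: the normalization rules, the helper names `normalize_doi` and `shared_corpus`, and the record layout are assumptions, and the paper's actual cleaning steps are not described in this summary.

```python
def normalize_doi(raw):
    """Return a lower-cased bare DOI, or None if the value is empty.

    The prefixes stripped here are an assumed, minimal set.
    """
    if not raw:
        return None
    doi = raw.strip().lower()
    prefixes = (
        "https://doi.org/",
        "http://doi.org/",
        "https://dx.doi.org/",
        "http://dx.doi.org/",
        "doi:",
    )
    for prefix in prefixes:
        if doi.startswith(prefix):
            doi = doi[len(prefix):]
            break
    return doi or None


def shared_corpus(*databases):
    """Intersect the normalized DOI sets of several record lists."""
    doi_sets = [
        {d for d in (normalize_doi(rec.get("doi")) for rec in records) if d}
        for records in databases
    ]
    return set.intersection(*doi_sets)


# Made-up toy records with hypothetical DOIs.
openalex = [{"doi": "https://doi.org/10.1000/ABC"}, {"doi": "10.1000/xyz"}]
wos = [{"doi": "10.1000/abc"}, {"doi": "10.1000/only-wos"}]
scopus = [{"doi": "doi:10.1000/abc"}, {"doi": None}]
print(shared_corpus(openalex, wos, scopus))  # {'10.1000/abc'}
```

Normalizing before intersecting matters because the same DOI can appear as a URL, with a `doi:` prefix, or in mixed case depending on the source.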

Implications and Future Prospects

This research offers evidence that OpenAlex is a viable open alternative for large-scale bibliometric analysis. Its reference coverage is competitive with that of the proprietary databases, which is particularly valuable for researchers and institutions without access to closed data sources.

From a theoretical perspective, the paper contributes to the discussion of why the strengths and limitations of open bibliometric data sources need to be thoroughly understood. Practically, the findings underscore the need for continued development of OpenAlex to resolve the identified discrepancies and improve metadata richness and accuracy.

Future research could refine reference-matching algorithms and improve data integrity, particularly given the diverse range of publications covered by OpenAlex. Addressing ORCID assignment accuracy would also make author disambiguation in OpenAlex more reliable and strengthen its standing as a resource for reproducible bibliometrics.

In sum, this comparative paper positions OpenAlex as a substantial competitor in the landscape of scholarly metadata databases, with promising avenues for contributing to the evolution of open science.