Analysis of Differences in Coverage and Citation Linking in Major Bibliographic Data Sources
The paper presents a comparative analysis of five major multidisciplinary bibliographic data sources: Scopus, Web of Science (WoS), Dimensions, Crossref, and Microsoft Academic. It focuses on the period from 2008 to 2017 and aims to identify variations in document coverage across these platforms, as well as differences in the completeness and accuracy of the citation links they provide. Using Scopus as a consistent baseline for the comparisons, the paper helps researchers understand which data source is best suited to their particular analytical purposes.
The analysis first considers the breadth of document coverage in each data source. Microsoft Academic has the most extensive coverage, significantly surpassing the other sources. However, a closer inspection indicates that Microsoft Academic also includes non-scientific content, though the proportion is relatively small. When only scientific content is considered, Microsoft Academic, followed by Dimensions and Crossref, covers more documents than Scopus or WoS, notably including more book chapters and proceedings papers that might otherwise be omitted. WoS, by contrast, is the most selective source, and its focused curation could be beneficial for specialized academic inquiries.
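The kind of pairwise coverage comparison described above can be illustrated with a minimal sketch. It assumes that documents in two sources can be matched on a shared identifier such as a normalized DOI, which is a simplification and not necessarily the matching procedure used in the paper. The function name coverage_vs_baseline and the toy identifiers are hypothetical.

```python
def coverage_vs_baseline(baseline_ids, other_ids):
    """Compare document coverage of one source against a baseline source.

    Both arguments are iterables of document identifiers (assumed here to be
    normalized DOIs; real matching would also need bibliographic metadata).
    """
    baseline = set(baseline_ids)
    other = set(other_ids)
    shared = baseline & other
    return {
        "shared": len(shared),
        "only_baseline": len(baseline - other),
        "only_other": len(other - baseline),
        # Fraction of the baseline's documents that the other source also covers.
        "share_of_baseline_covered": len(shared) / len(baseline) if baseline else 0.0,
    }


# Toy data, invented purely for illustration.
baseline_docs = {"10.1/a", "10.1/b", "10.1/c", "10.1/d"}
other_docs = {"10.1/b", "10.1/c", "10.1/d", "10.1/e", "10.1/f"}

coverage = coverage_vs_baseline(baseline_docs, other_docs)
print(coverage)
```

In this toy case the second source covers three of the four baseline documents and adds two that the baseline lacks, which mirrors the pattern of a broader source compared against a more selective one.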
With respect to citation links, Scopus and WoS offer more accurate citation data than Dimensions and Microsoft Academic. This is attributed to the more stringent data curation policies of the former two databases, which improve the reliability of citation linking at the cost of potentially narrower document coverage. The difference shows up as discrepancies in citation link coverage: Microsoft Academic and Dimensions lack many citation links that are present in Scopus and WoS. Part of this stems from Microsoft Academic and Dimensions not systematically recording references that have not been directly matched to a cited document.
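A comparison of citation links such as the one described above can be sketched as follows. The sketch assumes each citation link is represented as a (citing, cited) pair of document identifiers and that only links between documents covered by both sources are compared, so that coverage differences do not masquerade as missing links. The function name compare_citation_links and the toy data are hypothetical and are not taken from the paper.

```python
def compare_citation_links(baseline_links, other_links, shared_docs):
    """Find citation links present in one source but absent from the other.

    Links are (citing_id, cited_id) pairs. Only links whose two endpoints are
    both in shared_docs (documents covered by both sources) are considered.
    """
    shared = set(shared_docs)

    def restrict(links):
        # Keep only links between documents that both sources cover.
        return {(citing, cited) for citing, cited in links
                if citing in shared and cited in shared}

    base = restrict(baseline_links)
    other = restrict(other_links)
    return {
        "in_both": len(base & other),
        "only_baseline": sorted(base - other),
        "only_other": sorted(other - base),
    }


# Toy data, invented purely for illustration.
shared_documents = {"A", "B", "C"}
baseline_citations = {("A", "B"), ("A", "C"), ("B", "C"), ("A", "D")}
other_citations = {("A", "B"), ("C", "B")}

link_report = compare_citation_links(
    baseline_citations, other_citations, shared_documents
)
print(link_report)
```

The link from A to D is ignored because D is not covered by both sources. Links that appear only in one source are candidates for manual inspection, since a set difference alone cannot tell whether a link is missing from one source or wrongly recorded in the other.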
From a broader research perspective, the implications of this paper are multi-faceted. In practical terms, researchers are provided with data to help decide which resource strikes the desired balance between comprehensive literature coverage and rigorous citation verification for their work. Moreover, the findings suggest that Microsoft Academic and Dimensions may serve as alternative sources for emerging areas or regions where comprehensive coverage is more valuable than citation precision.
Theoretically, the paper highlights the inherent trade-offs in data source selection and points to the potential utility of combining data sources to achieve both comprehensiveness and citation precision. Importantly, developing effective data filters, such as those suggested for Dimensions, could further bridge the gap between comprehensive coverage and selective precision. Future advancements may revolve around integrating citations from open access initiatives into these platforms, enhancing citation link accuracy without sacrificing document diversity.
In conclusion, this paper illuminates the distinctive features and inherent trade-offs among these major bibliographic data sources. Researchers are encouraged to carefully consider these factors based on the specific demands of their bibliometric analyses and academic inquiries. Moving forward, the field could benefit from increased transparency and accessibility, possibly driven by initiatives aimed at open scholarly data exchange, which would enhance the versatility and applicability of these datasets.