Validating and monitoring bibliographic and citation data in OpenCitations collections (2504.12195v1)

Published 16 Apr 2025 in cs.DL

Abstract:

Purpose. The increasing emphasis on data quantity in research infrastructures has highlighted the need for equally robust mechanisms ensuring data quality, particularly in bibliographic and citation datasets. This paper addresses the challenge of maintaining high-quality open research information within OpenCitations, a community-guided Open Science Infrastructure, by introducing tools for validating and monitoring bibliographic metadata and citation data.

Methods. We developed a custom validation tool tailored to the OpenCitations Data Model (OCDM), designed to detect and explain ingestion errors from heterogeneous sources, whether due to upstream data inconsistencies or internal software bugs. Additionally, a quality monitoring tool was created to track known data issues post-publication. These tools were applied in two scenarios: (1) validating metadata and citations from Matilda, a potential future source, and (2) monitoring data quality in the existing OpenCitations Meta dataset.

Results. The validation tool successfully identified a variety of structural and semantic issues in the Matilda dataset, demonstrating its precision. The monitoring tool enabled the detection of recurring problems in the OpenCitations Meta collection, as well as their quantification. Together, these tools proved effective in enhancing the reliability of OpenCitations' published data.

Conclusion. The presented validation and monitoring tools represent a step toward ensuring high-quality bibliographic data in open research infrastructures, though they are limited to the data model adopted by OpenCitations. Future developments are aimed at expanding to additional data sources, with particular regard to crowdsourced data.