Toward FAIR Semantic Publishing of Research Dataset Metadata in the Open Research Knowledge Graph (2404.08443v1)
Abstract: Search engines these days can serve datasets as search results. Datasets get picked up by search technologies based on structured descriptions on their official web pages, informed by metadata ontologies such as the Dataset content type of schema.org. Despite this promotion of the dataset content type as a first-class citizen of search results, a vast proportion of datasets, particularly research datasets, are still not discoverable and therefore remain largely unused. This is due to the sheer volume of datasets released every day and the inability of metadata to reflect a dataset's content and context accurately. This work seeks to improve this situation for a specific class of datasets, namely research datasets, which are the result of research endeavors and are accompanied by a scholarly publication. We propose the ORKG-Dataset content type, a specialized branch of the Open Research Knowledge Graph (ORKG) platform, which provides descriptive information and a semantic model for research datasets, integrating them with their accompanying scholarly publications. This work aims to establish a standardized framework for recording and reporting research datasets within the ORKG-Dataset content type. This, in turn, increases the transparency of research datasets on the web, improving their discoverability and applied use. In this paper, we present a proposal: the minimum FAIR, comparable, semantic description of research datasets in terms of salient properties of their supporting publication. We design a specific application of the ORKG-Dataset semantic model based on 40 diverse research datasets on scientific information extraction.