Knowledge Graph for Microdata of Statistics Netherlands (2101.07622v1)

Published 19 Jan 2021 in cs.DL and cs.DB

Abstract: Statistics Netherlands (CBS) hosts a large amount of data, not only at the statistical level but also at the individual level. With the development of data science technologies, more and more researchers ask to conduct their research using high-quality individual-level data from CBS (called CBS Microdata), alone or combined with other data sources. Making good use of these data for research and scientific purposes can greatly benefit society as a whole. However, CBS Microdata has been collected and maintained in different ways by different departments inside and outside CBS, and the representation, quality, and metadata of the datasets are not sufficiently harmonized. The project converts the descriptions of all CBS Microdata sets into one knowledge graph with comprehensive metadata in Dutch and English, using text mining and semantic web technologies. Researchers can easily query the metadata, explore the relations among multiple datasets, and find the variables they need. For example, if a researcher searches for a dataset about "Age at Death" in the Health and Well-being category, all information related to this dataset appears, including keywords and variable names. The "Age at Death" dataset has the keyword "Death", which leads to other datasets such as "Date of Death", "Cause of Death", and "Production statistics Health and welfare" from the Population, Business, and Health and Well-being categories. This saves considerable time and cost for both data requesters and data maintainers.
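
The keyword-based navigation in the abstract's example can be illustrated with a small sketch. This is not the project's implementation: the abstract says the project uses semantic web technologies, while the code below substitutes a plain Python dictionary index to show the idea. The dataset titles and the "Death" keyword come from the abstract; the category assigned to each related dataset, the record layout, and the function names `build_keyword_index` and `related_datasets` are assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical metadata records: (dataset title, category, keywords).
# Titles and the "Death" keyword are taken from the abstract; which
# category each related dataset belongs to is an assumption.
DATASETS = [
    ("Age at Death", "Health and Well-being", {"Death"}),
    ("Date of Death", "Population", {"Death"}),
    ("Cause of Death", "Health and Well-being", {"Death"}),
    ("Production statistics Health and welfare", "Business", {"Death"}),
]


def build_keyword_index(datasets):
    """Map each keyword to the set of dataset titles that carry it."""
    index = defaultdict(set)
    for title, _category, keywords in datasets:
        for keyword in keywords:
            index[keyword].add(title)
    return index


def related_datasets(title, datasets, index):
    """Return the other datasets that share a keyword with `title`, sorted."""
    keywords = next(kws for t, _c, kws in datasets if t == title)
    related = set()
    for keyword in keywords:
        related |= index[keyword]
    related.discard(title)  # a dataset is not related to itself
    return sorted(related)


index = build_keyword_index(DATASETS)
print(related_datasets("Age at Death", DATASETS, index))
```

In a real knowledge graph the same lookup would typically be a graph query over dataset and keyword nodes rather than a dictionary scan, but the traversal is the same: from a dataset to its keywords, then from each keyword to the other datasets that share it.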
