Fostering the integration of European Open Data into Data Spaces through High-Quality Metadata (2402.06693v1)

Published 8 Feb 2024 in cs.DB

Abstract: The term Data Space, understood as the secure exchange of data in distributed systems while ensuring openness, transparency, decentralization, sovereignty, and interoperability of information, has gained importance in recent years. However, Data Spaces are still at an early stage of definition, and new research is necessary to address their requirements. The Open Data ecosystem can be understood as one of the precursors of Data Spaces, as it provides mechanisms to ensure the interoperability of information through resource discovery, information exchange, and aggregation via metadata. However, Data Spaces require more advanced capabilities, including the automatic and scalable generation and publication of high-quality metadata. In this work, we present a set of software tools that facilitate the automatic generation and publication of metadata, the modeling of datasets through standards, and the assessment of the quality of the generated metadata. We validate all these tools through the YODA Open Data Portal, showing how they can be connected to integrate Open Data into Data Spaces.

Authors (5)
  1. Javier Conde (28 papers)
  2. Alejandro Pozo (6 papers)
  3. Johnny Choque (2 papers)
  4. Álvaro Alonso (18 papers)
  5. Andrés Munoz-Arcentales (3 papers)