
Standardizing Knowledge Engineering Practices with a Reference Architecture (2404.03624v1)

Published 4 Apr 2024 in cs.AI and cs.SE

Abstract: Knowledge engineering is the process of creating and maintaining knowledge-producing systems. Throughout the history of computer science and AI, knowledge engineering workflows have been widely used given the importance of high-quality knowledge for reliable intelligent agents. Meanwhile, the scope of knowledge engineering, as apparent from its target tasks and use cases, has been shifting, together with its paradigms such as expert systems, semantic web, and language modeling. The intended use cases and supported user requirements between these paradigms have not been analyzed globally, as new paradigms often satisfy prior pain points while possibly introducing new ones. The recent abstraction of systemic patterns into a boxology provides an opening for aligning the requirements and use cases of knowledge engineering with the systems, components, and software that can satisfy them best. This paper proposes a vision of harmonizing the best practices in the field of knowledge engineering by leveraging the software engineering methodology of creating reference architectures. We describe how a reference architecture can be iteratively designed and implemented to associate user needs with recurring systemic patterns, building on top of existing knowledge engineering workflows and boxologies. We provide a six-step roadmap that can enable the development of such an architecture, providing an initial design and outcome of the definition of architectural scope, selection of information sources, and analysis. We expect that following through on this vision will lead to well-grounded reference architectures for knowledge engineering, will advance the ongoing initiatives of organizing the neurosymbolic knowledge engineering space, and will build new links to the software architectures and data science communities.
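The abstract's central idea, associating user needs with recurring systemic patterns and refining that association iteratively, can be pictured as a simple traceability table. The Python sketch below is purely illustrative and is not taken from the paper: the classes `Requirement`, `Pattern` and `ReferenceArchitecture`, their method names, and the example requirements and pattern names are all hypothetical. Each iteration links requirements to candidate patterns, and `uncovered()` lists the needs that no pattern addresses yet.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Requirement:
    """Hypothetical user need, e.g. one elicited from a knowledge engineering use case."""
    rid: str
    description: str


@dataclass(frozen=True)
class Pattern:
    """Hypothetical recurring systemic pattern, e.g. one design drawn from a boxology."""
    name: str
    components: tuple[str, ...]


@dataclass
class ReferenceArchitecture:
    """Hypothetical traceability model linking requirements to patterns."""
    requirements: dict[str, Requirement] = field(default_factory=dict)
    patterns: dict[str, Pattern] = field(default_factory=dict)
    # requirement id -> names of the patterns assumed to satisfy it
    links: dict[str, set[str]] = field(default_factory=dict)

    def add_requirement(self, req: Requirement) -> None:
        self.requirements[req.rid] = req

    def add_pattern(self, pattern: Pattern) -> None:
        self.patterns[pattern.name] = pattern

    def link(self, rid: str, pattern_name: str) -> None:
        # Refuse links to unknown entries so the mapping stays consistent.
        if rid not in self.requirements:
            raise KeyError(f"unknown requirement: {rid}")
        if pattern_name not in self.patterns:
            raise KeyError(f"unknown pattern: {pattern_name}")
        self.links.setdefault(rid, set()).add(pattern_name)

    def patterns_for(self, rid: str) -> list[str]:
        # Patterns currently linked to one requirement, in a stable order.
        return sorted(self.links.get(rid, set()))

    def uncovered(self) -> list[str]:
        # Requirements with no supporting pattern: candidates for the next iteration.
        return sorted(rid for rid in self.requirements if not self.links.get(rid))


if __name__ == "__main__":
    arch = ReferenceArchitecture()
    arch.add_requirement(Requirement("R1", "Trace every fact back to its source"))
    arch.add_requirement(Requirement("R2", "Populate a knowledge graph from text"))
    arch.add_pattern(Pattern("store-with-provenance", ("knowledge graph", "provenance store")))
    arch.add_pattern(Pattern("extract-then-store", ("language model", "knowledge graph")))
    arch.link("R2", "extract-then-store")
    print(arch.uncovered())  # ['R1']
    arch.link("R1", "store-with-provenance")
    print(arch.uncovered())  # []
```

In this sketch a requirement counts as covered as soon as one pattern is linked to it; a fuller model would presumably also record why a pattern satisfies a need and which trade-offs it introduces, since the abstract notes that new paradigms can create new pain points.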
