DomiKnowS: A Library for Integration of Symbolic Domain Knowledge in Deep Learning

Published 27 Aug 2021 in cs.LG, cs.AI, and cs.CL | (2108.12370v1)

Abstract: We demonstrate a library for the integration of domain knowledge in deep learning architectures. Using this library, the structure of the data is expressed symbolically via graph declarations and the logical constraints over outputs or latent variables can be seamlessly added to the deep models. The domain knowledge can be defined explicitly, which improves the models' explainability in addition to the performance and generalizability in the low-data regime. Several approaches for such an integration of symbolic and sub-symbolic models have been introduced; however, there is no library to facilitate the programming for such an integration in a generic way while various underlying algorithms can be used. Our library aims to simplify programming for such an integration in both training and inference phases while separating the knowledge representation from learning algorithms. We showcase various NLP benchmark tasks and beyond. The framework is publicly available on GitHub (https://github.com/HLR/DomiKnowS).


Authors (6)

Citations (17)


Summary

The paper introduces DomiKnowS, a library that integrates symbolic domain knowledge into deep learning to improve model performance, especially in low-data scenarios.
It leverages a Python-based specification language to represent knowledge through graphs and logical constraints, clearly separating domain definitions from learning algorithms.
Its integration algorithms, including ILP-based prediction-time inference and inference-based loss masking, are reported to improve results on NLP benchmark tasks such as entity and relation extraction.

Integration of Domain Knowledge in Deep Learning Architectures

The paper presents DomiKnowS, a library that facilitates integrating domain knowledge into deep learning architectures, addressing limitations in generalizability and explainability that often affect such models. The focus is on expressing domain knowledge symbolically and integrating it into these architectures, with the goal of improving performance, especially in low-data scenarios.

The library represents knowledge symbolically through graph declarations, supplemented with logical constraints that can be added to deep learning models with little extra code. This separates knowledge representation from learning algorithms, which makes knowledge integration more straightforward for developers in both the training and inference stages. Existing practice tends to hard-code this integration for each model, whereas the library abstracts it into a more general approach.
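To make the separation between knowledge and learning concrete, the following plain-Python sketch declares concepts and logical constraints independently of any model and checks crisp predictions against them. It is a hypothetical illustration: `Concept`, `Graph`, `add_constraint`, and `violations` are invented names that do not reproduce DomiKnowS's actual declaration syntax, and the entity/relation example (`people`, `organization`, `work_for`) is assumed for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

# Hypothetical stand-ins for a concept graph; NOT the DomiKnowS API.
# An assignment maps (concept name, argument tuple) to a 0/1 label.
Assignment = Dict[Tuple[str, Tuple[str, ...]], int]


@dataclass(frozen=True)
class Concept:
    name: str
    arity: int = 1  # 1 = property of one entity, 2 = relation between two


@dataclass
class Graph:
    concepts: Dict[str, Concept] = field(default_factory=dict)
    constraints: List[Tuple[str, Callable[[Assignment], bool]]] = field(default_factory=list)

    def concept(self, name: str, arity: int = 1) -> Concept:
        self.concepts[name] = Concept(name, arity)
        return self.concepts[name]

    def add_constraint(self, description: str, check: Callable[[Assignment], bool]) -> None:
        self.constraints.append((description, check))

    def violations(self, assignment: Assignment) -> List[str]:
        # Descriptions of every constraint that the crisp assignment breaks.
        return [d for d, check in self.constraints if not check(assignment)]


def build_graph(entities: List[str]) -> Graph:
    g = Graph()
    g.concept("people")
    g.concept("organization")
    g.concept("work_for", arity=2)

    def work_for_types(a: Assignment) -> bool:
        # work_for(x, y) -> people(x) and organization(y)
        return all(
            not a[("work_for", (x, y))]
            or (a[("people", (x,))] == 1 and a[("organization", (y,))] == 1)
            for x in entities for y in entities if x != y
        )

    def disjoint(a: Assignment) -> bool:
        # not (people(x) and organization(x))
        return all(not (a[("people", (x,))] and a[("organization", (x,))]) for x in entities)

    g.add_constraint("work_for(x, y) -> people(x) and organization(y)", work_for_types)
    g.add_constraint("not (people(x) and organization(x))", disjoint)
    return g


entities = ["Alice", "Acme"]
graph = build_graph(entities)
prediction: Assignment = {
    ("people", ("Alice",)): 1, ("organization", ("Alice",)): 0,
    ("people", ("Acme",)): 1, ("organization", ("Acme",)): 0,
    ("work_for", ("Alice", "Acme")): 1, ("work_for", ("Acme", "Alice")): 0,
}
print(graph.violations(prediction))
```

Because the graph never references a model, the same declarations could be reused unchanged whether the constraints are enforced at inference time or during training.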

Core Components and Methodologies

The framework introduces several core components for integrating symbolic and sub-symbolic models. These include:

Learning Problem Specification: A specification language in Python lets users describe the problem domain as a conceptual graph, in which nodes represent concepts and edges describe relationships between them. Constraints can be declared within this graph and then used as either soft or hard constraints.
Knowledge Representation: Domain knowledge is expressed as constraints, either generated from standard ontologies or written explicitly in the library's logical constraint language. These constraints are mapped to algebraic inequalities (for solvers such as ILP) or to soft-logic expressions (for differentiable training), which the integration algorithms then use to relate inputs and outputs.
Integration Algorithms: Domain knowledge can be integrated at prediction time, during training, or both, through several methods:
- Learning with Prediction-Time Inference (L+I): Models are trained without the constraints, and an ILP solver performs global inference over their outputs under linear constraints at prediction time.
- Hard Constraints During Training: The Inference-Masked Loss (IML) approach uses the result of constrained global inference to build a mask over the local predictions, so that the loss does not penalize local errors that the inference already corrects.
- Soft Constraints During Training: Using a primal-dual formulation, this method adds constraint violations to the loss function, weighted by Lagrange multipliers, and optimizes the result as a min-max problem: minimizing over the model parameters and maximizing over the multipliers.
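The L+I method in the list above can be sketched as follows, assuming independent local classifiers that output one probability per binary variable. Each logical constraint is mapped to a linear inequality over 0/1 variables (a -> b becomes a - b <= 0; not (a and b) becomes a + b <= 1), and inference selects the feasible assignment with the highest joint log-likelihood. Exhaustive enumeration stands in for the ILP solver so the example runs with the standard library alone; it is exponential in the number of variables and only suitable for toy inputs. All variable names and probabilities are made up.

```python
import itertools
import math
from typing import Dict, List, Tuple

# A linear constraint sum(coef * z[var]) <= bound over binary variables z.
LinearConstraint = Tuple[Dict[str, int], int]


def implies(a: str, b: str) -> LinearConstraint:
    # a -> b over 0/1 variables:  a - b <= 0
    return ({a: 1, b: -1}, 0)


def nand(a: str, b: str) -> LinearConstraint:
    # not (a and b) over 0/1 variables:  a + b <= 1
    return ({a: 1, b: 1}, 1)


def satisfied(z: Dict[str, int], constraints: List[LinearConstraint]) -> bool:
    return all(sum(c * z[v] for v, c in coefs.items()) <= bound for coefs, bound in constraints)


def log_likelihood(z: Dict[str, int], probs: Dict[str, float]) -> float:
    # Objective of global inference: joint log-probability under independent local classifiers.
    return sum(math.log(p) if z[v] else math.log(1.0 - p) for v, p in probs.items())


def infer(probs: Dict[str, float], constraints: List[LinearConstraint]) -> Dict[str, int]:
    # Exhaustive search over all 0/1 assignments: a stand-in for an ILP solver (toy sizes only).
    names = sorted(probs)
    best: Dict[str, int] = {}
    best_score = -math.inf
    for bits in itertools.product((0, 1), repeat=len(names)):
        z = dict(zip(names, bits))
        if satisfied(z, constraints):
            score = log_likelihood(z, probs)
            if score > best_score:
                best, best_score = z, score
    return best


# Hypothetical local classifier outputs for one sentence.
PROBS = {
    "people(Alice)": 0.9, "organization(Alice)": 0.2,
    "people(Acme)": 0.6, "organization(Acme)": 0.55,
    "work_for(Alice,Acme)": 0.8,
}
CONSTRAINTS = [
    implies("work_for(Alice,Acme)", "people(Alice)"),
    implies("work_for(Alice,Acme)", "organization(Acme)"),
    nand("people(Alice)", "organization(Alice)"),
    nand("people(Acme)", "organization(Acme)"),
]

local = {v: int(p > 0.5) for v, p in PROBS.items()}
print("local argmax feasible:", satisfied(local, CONSTRAINTS))
print("global inference:", infer(PROBS, CONSTRAINTS))
```

Thresholding each probability at 0.5 labels Acme as both a person and an organization, which breaks a constraint; the constrained inference keeps `work_for(Alice,Acme)` and resolves the conflict by setting `people(Acme)` to 0.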
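The primal-dual method for soft constraints described in the list above can be illustrated with a toy model that has one logit per concept. Only concept `a` has a label, and the constraint a -> b is relaxed with the Łukasiewicz implication, whose violation is max(0, p_a - p_b); this particular relaxation is an assumption, since the summary only says "soft logic". The Lagrangian loss + λ · violation is minimized over the logits by gradient descent and maximized over λ >= 0 by projected gradient ascent. The function names, learning rates, and initial values are hypothetical, and this is not the paper's implementation.

```python
import math
from typing import Tuple


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def violation(pa: float, pb: float) -> float:
    # Lukasiewicz relaxation of a -> b: truth = min(1, 1 - pa + pb); violation = 1 - truth.
    return max(0.0, pa - pb)


def train(use_constraint: bool, steps: int = 3000, lr: float = 0.1,
          lr_dual: float = 0.1) -> Tuple[float, float, float]:
    ua, ub, lam = 0.0, -2.0, 0.0  # logits for concepts a and b; Lagrange multiplier
    for _ in range(steps):
        pa, pb = sigmoid(ua), sigmoid(ub)
        v = violation(pa, pb)
        # Task loss -log(pa): only a is labeled (a = 1); b is unlabeled.
        grad_ua = -(1.0 - pa)
        grad_ub = 0.0
        if use_constraint and v > 0.0:
            # Gradient of lam * (pa - pb) through the sigmoids.
            grad_ua += lam * pa * (1.0 - pa)
            grad_ub -= lam * pb * (1.0 - pb)
        # Primal step: gradient descent on the model parameters.
        ua -= lr * grad_ua
        ub -= lr * grad_ub
        # Dual step: projected gradient ascent on the multiplier, kept >= 0.
        if use_constraint:
            lam = max(0.0, lam + lr_dual * v)
    return sigmoid(ua), sigmoid(ub), lam


for flag in (False, True):
    pa, pb, lam = train(flag)
    print(f"constraint={flag}: p(a)={pa:.3f} p(b)={pb:.3f} "
          f"violation={violation(pa, pb):.3f} lambda={lam:.2f}")
```

Without the constraint, nothing moves p(b) away from its initial value; in the constrained run the multiplier grows only while the constraint is violated and pulls p(b) up alongside p(a).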

Significant Results and Implications

The paper demonstrates the library on several NLP tasks and beyond. For instance, on an entity and relation extraction benchmark, integrating ILP-based inference with training led to a notable improvement. In addition, the framework's flexible program composition lets researchers switch between training methodologies, such as end-to-end learning or staged training, with few code changes.

These capabilities have practical implications: by integrating explicit symbolic knowledge, DomiKnowS can improve the explainability and generalizability of models. It also offers a pathway for extending this functionality to other domains, particularly those where labeled data is scarce.

Future Directions

Although the paper does not discuss future research directions in detail, this work lays the groundwork for further exploration of symbolic integration in deep learning. Future research could refine and extend the integration techniques, explore how constraints can be embedded dynamically, and apply the library to other complex machine learning problems.

In summary, DomiKnowS offers a practical way to address long-standing challenges in deep learning, providing a tool for research and development that makes domain knowledge integration more accessible and effective. Its structured approach has the potential to improve both the performance and the transparency of models.
