DomiKnowS: Declarative Neuro-Symbolic Library
- DomiKnowS is a declarative neuro-symbolic library that integrates explicit domain knowledge into deep learning models via symbolic, graph-oriented representations.
- It supports diverse tasks across NLP, vision, and reasoning by attaching neural modules to graph nodes and enforcing both hard and soft logical constraints.
- Its modular design combines ILP-based inference and LLM-assisted program induction to enhance explainability, sample efficiency, and automation.
DomiKnowS is a Python-based, declarative neuro-symbolic library designed for the integration of explicit, symbolic domain knowledge into deep learning models, leveraging modular graph-oriented representations and logic-based constraints. It unifies high-level knowledge representation, model construction, and algorithmic integration in a PyTorch ecosystem with support for both classic and cutting-edge neuro-symbolic methodologies. DomiKnowS is notable for its expressive symbolic abstraction, tight PyTorch integration, support for hard and soft constraint enforcement, and an ecosystem enabling both manual and automated (LLM-assisted) program generation. Its architecture and workflows address core challenges in explainability, sample efficiency, and constraint-driven learning across NLP, vision, and general machine learning domains (Faghihi et al., 2021, Sinha et al., 8 Sep 2025, Nafar et al., 2 Jan 2026, Faghihi et al., 2024).
1. Architectural Foundations and Core Abstractions
DomiKnowS is architected around a separation of concerns between:
- Knowledge Representation Layer: Users declaratively specify a “conceptual graph” whose nodes are concepts (types, labels, domain objects) and whose edges are relations (contains, is_a, has_a, etc.) using a symbolic Python API. This layer is independent of model implementation and solely describes task structure and logical relationships.
- Learning & Inference Layer: Neural modules (“learners” and “sensors”) attach to graph nodes or edges, defining how features are ingested and predictions are produced. A “Program” or “Model” object unifies the declared graph, annotated sensors/learners, training configuration (loss, metrics), data hooks, and an explicit choice of constraint-integration algorithm. Optimizers, ILP solvers, and inference backends are abstracted as plug-and-play modules (Faghihi et al., 2021, Sinha et al., 8 Sep 2025).
The core DomiKnowS API enables users to:
- Define and relate discrete or continuous concepts via symbolic declarations.
- Express first-order logical constraints using a builder API (e.g., ifL, andL, orL, existsL, notL).
- Attach PyTorch modules (e.g., CNNs, LLMClassifiers) as learners to concepts, with a clear interface for feature provision (sensors), prediction, and backpropagation.
- Switch among varied constraint satisfaction/training algorithms at the program level through a single argument.
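The separation between the two layers can be illustrated without the library itself. The following is a minimal pure-Python sketch, not DomiKnowS code: `ToyConcept`, `ToyProgram`, and the capitalization-based "learner" are hypothetical stand-ins chosen only to show structure being declared independently of the model that is later attached to it.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class ToyConcept:
    """Hypothetical stand-in for a graph node; NOT the DomiKnowS Concept class."""
    name: str
    children: List["ToyConcept"] = field(default_factory=list)

    def contains(self, child: "ToyConcept") -> None:
        # Record a 'contains' edge from this concept to the child concept.
        self.children.append(child)


@dataclass
class ToyProgram:
    """Hypothetical container that binds learners (plain callables) to concepts."""
    learners: Dict[str, Callable[[str], float]] = field(default_factory=dict)

    def attach(self, concept: ToyConcept, learner: Callable[[str], float]) -> None:
        self.learners[concept.name] = learner

    def predict(self, concept: ToyConcept, item: str) -> float:
        return self.learners[concept.name](item)


# Knowledge representation layer: structure only, no model code.
sentence = ToyConcept("sentence")
word = ToyConcept("word")
sentence.contains(word)

# Learning layer: a trivial scorer stands in for a neural module.
program = ToyProgram()
program.attach(word, lambda token: 1.0 if token.istitle() else 0.0)

score = program.predict(word, "Paris")
```

Because the graph objects carry no model logic, the scorer attached to `word` could be swapped for any other callable without touching the declarations.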
2. Symbolic Graph Language and Logical Constraint Formalism
The core symbolic language of DomiKnowS is an embedded domain-specific language (DSL) for Python that unifies concept and relation declaration and constraint specification:
- Graph Declarations: Example syntax defines concepts and their relationships as:

      word = Concept(name='word')
      sentence = Concept(name='sentence')
      sentence.contains(word)

  Edges such as 'contains', 'is_a', and 'has_a' capture hierarchical, compositional, or semantic-relational structure. Derived concepts automatically infer “is_a” relations unless overridden (Faghihi et al., 2021, Faghihi et al., 2024).
- Logical Constraints: Constraints use a builder API for first-order–style rules:

      ifL(
          work_for('x'),
          andL(
              people(path=('x', 'arg1')),
              organization(path=('x', 'arg2'))
          )
      )

  These FOL-like constraints are then compiled under the hood into linear constraints over binary or multi-valued decision variables, which can be leveraged in both training and inference, either as hard constraints (global feasibility) or as soft penalties (“primal–dual” or “IML” strategies) (Faghihi et al., 2021).
- Expressivity: The symbolic layer supports arbitrary logical/arithmetic compositions, existential/universal quantification, and mixture of constraint arities, supporting a wide array of structured learning and reasoning tasks.
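To make the compilation of a logical rule into linear constraints concrete, the sketch below applies the standard textbook linearization of an implication with a conjunctive consequent to the work_for rule and checks it by enumeration. It illustrates the general technique only and is not DomiKnowS's internal compiler; the function names and the reduction of the rule to three binary variables are assumptions made for illustration.

```python
from itertools import product


def rule_holds(w: int, p: int, o: int) -> bool:
    """Logical form: work_for -> (people AND organization)."""
    return bool((not w) or (p and o))


def linear_form_holds(w: int, p: int, o: int) -> bool:
    """Linearized form over 0/1 variables: w <= p and w <= o."""
    return w <= p and w <= o


# Enumerate all 8 binary assignments and confirm both forms agree everywhere.
assignments = list(product((0, 1), repeat=3))
agree = all(rule_holds(*a) == linear_form_holds(*a) for a in assignments)

# Assignments that an ILP solver would be allowed to return.
feasible = [a for a in assignments if linear_form_holds(*a)]
```

Five of the eight assignments are feasible: every assignment with `w = 0`, plus the single assignment in which all three variables are 1.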
3. Integration Algorithms and Constraint Solving
DomiKnowS decouples semantic declaration from computational enforcement by abstracting over several constraint-integration mechanisms:
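- ILP-based Global Inference: All constraints are converted to integer linear program (ILP) formulations. At inference, DomiKnowS solves a problem of the general form
  $$\hat{y} = \arg\max_{y \in \{0,1\}^n} \; s^\top y \quad \text{subject to} \quad A y \le b,$$
  where $s$ collects the local scores produced by the learners and $A y \le b$ encodes the constraints. This ensures hard satisfaction at prediction time (Faghihi et al., 2021, Sinha et al., 8 Sep 2025, Nafar et al., 2 Jan 2026).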
- Inference-Masked Loss (IML): During training, an ILP solution for each example is used to mask out loss gradients for local predictions that can be globally corrected, improving respect for hard constraints (Faghihi et al., 2021).
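- Primal–Dual (Lagrangian Relaxation): Following Nandwani et al., the loss takes the general form
  $$\mathcal{L}(\theta, \lambda) = \mathcal{L}_{\text{task}}(\theta) + \sum_k \lambda_k \, H\big(c_k(\theta)\big)$$
  (where $H(x) = \max(0, x)$ is the hinge function and $c_k(\theta) \le 0$ expresses the $k$-th relaxed constraint), allowing soft constraint enforcement with alternating updates to both parameters $\theta$ and multipliers $\lambda$ (Sinha et al., 8 Sep 2025, Faghihi et al., 2021).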
- Sampling Loss (as in Scallop): Constraint violation likelihood is estimated via sampling from predicted marginals (Sinha et al., 8 Sep 2025).
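The ILP-based global inference step listed above can be sketched on a toy problem. The example below replaces the ILP solver with exhaustive enumeration, which finds the same optimum on problems this small; the scores, variable names, and the single constraint are invented for illustration and do not come from any DomiKnowS benchmark.

```python
from itertools import product
from typing import Dict, Optional, Tuple

# Made-up local scores, e.g. log-odds from three independent classifiers.
scores: Dict[str, float] = {"work_for": 1.2, "people": -0.4, "organization": 0.9}
names = list(scores)


def satisfies(y: Dict[str, int]) -> bool:
    # Hard constraint: work_for implies both argument types are present.
    return y["work_for"] <= y["people"] and y["work_for"] <= y["organization"]


def objective(y: Dict[str, int]) -> float:
    # Linear objective: sum of the scores of all variables set to 1.
    return sum(scores[n] * y[n] for n in names)


def global_inference() -> Tuple[Optional[Dict[str, int]], float]:
    """Return the highest-scoring assignment that satisfies the constraint."""
    best, best_val = None, float("-inf")
    for bits in product((0, 1), repeat=len(names)):
        y = dict(zip(names, bits))
        if satisfies(y) and objective(y) > best_val:
            best, best_val = y, objective(y)
    return best, best_val


# Local decisions threshold each score independently and may violate the rule.
local = {n: int(scores[n] > 0) for n in names}
best, best_val = global_inference()
```

Here the independent decisions predict `work_for` without `people`, which violates the rule, while the constrained optimum turns `people` on because the gain from keeping `work_for` outweighs the small negative score.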
Selection between these inference/training paradigms is unified under a single interface, with shared symbolic grounding but distinct solver back-ends (e.g., Gurobi, custom DP), enhancing modularity for both research and production.
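The primal-dual option described above can likewise be illustrated with a small alternating-update loop. This is a sketch under stated assumptions rather than the library's training code: two probabilities are parameterized by logits `a` and `b`, noisy targets (0.9 and 0.2) conflict with the rule "A implies B", the rule is relaxed to `p_a - p_b <= 0`, and gradients are written out by hand so that only the standard library is needed. All names, targets, and step sizes are hypothetical.

```python
import math
from typing import Tuple


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def train(use_constraint: bool, steps: int = 2000,
          lr: float = 0.1, lr_dual: float = 0.1) -> Tuple[float, float, float]:
    """Minimize (p_a - 0.9)^2 + (p_b - 0.2)^2 + lam * max(0, p_a - p_b)."""
    a = b = 0.0   # logits (primal parameters)
    lam = 0.0     # Lagrange multiplier (dual variable)
    for _ in range(steps):
        pa, pb = sigmoid(a), sigmoid(b)
        g = pa - pb  # constraint value; positive means the rule is violated
        # Subgradient of the hinge is 1 where g > 0 and 0 elsewhere.
        active = 1.0 if (use_constraint and g > 0) else 0.0
        grad_a = (2 * (pa - 0.9) + lam * active) * pa * (1 - pa)
        grad_b = (2 * (pb - 0.2) - lam * active) * pb * (1 - pb)
        # Primal step: gradient descent on the parameters.
        a -= lr * grad_a
        b -= lr * grad_b
        # Dual step: gradient ascent on the multiplier.
        if use_constraint:
            lam += lr_dual * max(0.0, g)
    return sigmoid(a), sigmoid(b), lam


pa_c, pb_c, lam_c = train(use_constraint=True)
pa_u, pb_u, _ = train(use_constraint=False)
```

Without the penalty the model simply fits the conflicting targets; with it, the multiplier grows while the rule is violated until the two probabilities are pulled together. In a PyTorch setting the hand-written gradients would be replaced by autograd.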
4. Usage Patterns, Automated Program Induction, and Human-in-the-Loop Design
DomiKnowS has evolved from purely manual specification to enable highly automated, LLM-driven program synthesis:
- Manual Construction: Researchers define graph and constraint structure via a compact API, assign sensors/learners, and select from multiple loss and inference regimes (Faghihi et al., 2021, Sinha et al., 8 Sep 2025).
- Agentic and Conversational Front-ends: AgenticDomiKnowS (ADS) and Prompt2DeModel frameworks leverage LLM agents (GNN, LangGraph, RAG) to synthesize DomiKnowS programs directly from natural language prompts or task descriptions. These systems iteratively design the graph, generate and test code, and refine outputs with optional domain-expert intervention (Nafar et al., 2 Jan 2026, Faghihi et al., 2024).
- Human-in-the-Loop Correction: Both ADS and Prompt2DeModel incorporate feedback mechanisms for intervention after failed auto-attempts (e.g., GraphReviewer, SensorHumanCoder), dramatically reducing semantic errors and enabling non-specialists to produce high-quality neuro-symbolic programs in under 20 minutes per task (Nafar et al., 2 Jan 2026, Faghihi et al., 2024).
This convergence of symbolic expressivity and automation enables rapid, data-efficient injection of domain constraints into deep models, lowering the knowledge engineering barrier and improving correctness and interpretability.
5. Empirical Capabilities, Tasks, and Performance
DomiKnowS supports a broad range of learning settings:
- Target Domains: Structured prediction (NER, event extraction, sequence labeling), vision tasks (image classification with disjoint constraints, MNIST Sum), multi-relational reasoning (VQA, Math inference), and arbitrary hybrid tasks combining perception and symbolic logic (Sinha et al., 8 Sep 2025).
- Low-Resource Regimes: Use of symbolic constraints enables measurable gains in F1 and accuracy with reduced labeled data. For example, on EMR using 25% of data, ILP provides +1.3% overall F1; with full data, the gain persists at +0.6%. For WIQA, integrating ILP lifts absolute accuracy from ≈74% to >79% (Faghihi et al., 2021).
- Sample Efficiency & Model Explainability: Because all constraint firings and intermediate concept predictions are recorded, researchers can audit the provenance of each output, inspect Lagrange multipliers, and obtain a full symbolic trace.
- Empirical Benchmarks: Compared to DeepProbLog and Scallop, DomiKnowS matches or slightly exceeds inference speed (when optimized), is comparable or slower in training (especially under ILP), and remains memory-efficient, with a manageable footprint even on large reasoning tasks (Sinha et al., 8 Sep 2025).
- Program Induction Efficiency: With ADS and Prompt2DeModel, end-to-end system generation time for non-experts is typically 10–20 minutes, a marked reduction from manual DomiKnowS coding (2–4 hours), with high success rates and interpretability (Nafar et al., 2 Jan 2026, Faghihi et al., 2024).
6. Expressivity, Extensibility, and Limitations
DomiKnowS fundamentally supports:
- Full Pythonic Symbolic Layer: Arbitrary graph/constraint structure, fine-grained supervision (losses attached to intermediate nodes), and dynamic concept typing.
- Pluggable Solvers & Neural Modules: Any PyTorch module can be used as a learner; solvers for ILP, dynamic programming, or probabilistic inference can be swapped with minimal surface-code changes (Faghihi et al., 2021, Sinha et al., 8 Sep 2025).
- Combinatorial and Arithmetic Constraints: Arithmetic compositions (MNIST Sum), relational patterns, existential and universal quantifications, and multi-task learning within a unified graph architecture.
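The MNIST Sum setting mentioned above is a convenient example of an arithmetic constraint: two digit predictions are tied together only through their known sum. A minimal sketch of decoding under that constraint follows, using made-up probability vectors over the digits 0 to 2 in place of real classifier outputs; `decode_with_sum` is a hypothetical helper, not a DomiKnowS function.

```python
from typing import List, Optional, Tuple


def decode_with_sum(p1: List[float], p2: List[float],
                    target_sum: int) -> Optional[Tuple[int, int]]:
    """Return the most probable digit pair (d1, d2) with d1 + d2 == target_sum."""
    best, best_prob = None, 0.0
    for d1, q1 in enumerate(p1):
        d2 = target_sum - d1
        if 0 <= d2 < len(p2):  # skip pairs outside the label range
            joint = q1 * p2[d2]
            if joint > best_prob:
                best, best_prob = (d1, d2), joint
    return best


# Made-up classifier outputs over digits 0, 1, 2.
p1 = [0.5, 0.4, 0.1]
p2 = [0.1, 0.3, 0.6]

# Independent argmax decoding ignores the sum.
unconstrained = (p1.index(max(p1)), p2.index(max(p2)))

# Constrained decoding with the known sum 3.
constrained = decode_with_sum(p1, p2, target_sum=3)
```

The independent decoding yields a pair summing to 2, whereas the constrained decoding revises the first digit so that the pair is consistent with the observed sum.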
Limitations include:
- Potential training slowdown due to repeated ILP solves on large graphs.
- Lack of commitment to a single formal semantics (unlike Prolog or Datalog), complicating formal verification.
- Requirement for advance declaration of all concepts, even if some remain inactive during training (Sinha et al., 8 Sep 2025).
- Automated program induction systems (ADS, Prompt2DeModel) occasionally require 2–3 refinement loops due to LLM unfamiliarity with the DSL, though feedback pipelines mitigate the majority of such issues (Nafar et al., 2 Jan 2026, Faghihi et al., 2024).
7. Installation, Ecosystem, and Comparative Positioning
DomiKnowS is distributed as an open-source repository:
- Installation: Standard setup includes cloning the repository, installing requirements, and an editable pip install:

      git clone https://github.com/HLR/DomiKnowS.git
      cd DomiKnowS
      pip install -r requirements.txt
      pip install -e .

  Key modules include domiknows/graph.py (concept and edge API), constraints.py (logical builders), program.py (core class), and backends for inference and algorithmic integration (Faghihi et al., 2021).
- Comparative Analysis: Benchmarked against DeepProbLog and Scallop, DomiKnowS offers greater symbolic expressivity and modularity, at the cost of slower training in some settings, especially under ILP. It does not commit to a fixed logical grounding, trading some formalism for practical flexibility. Its ecosystem, including ADS and Prompt2DeModel, positions it as a leading platform for researchers seeking to combine declarative domain knowledge with state-of-the-art neural learners (Sinha et al., 8 Sep 2025, Nafar et al., 2 Jan 2026, Faghihi et al., 2024).
- Resources and Documentation: Tutorials, interactive Colab demos, and extended documentation are available at https://hlr.github.io/domiknows-nlp/ and the main repository.
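After the installation steps listed above, a quick way to confirm that the editable install is visible to the interpreter is to look the package up on the import path. The sketch below uses only the standard library; the import name `domiknows` is an assumption inferred from the `domiknows/graph.py` module path mentioned above.

```python
import importlib.util


def is_installed(package: str) -> bool:
    """Return True if `package` can be located on the current import path."""
    return importlib.util.find_spec(package) is not None


# Assumed import name, inferred from the module path domiknows/graph.py.
status = is_installed("domiknows")
print("domiknows importable:", status)
```

The lookup does not import the package, so it is safe to run even when optional dependencies such as an ILP solver are missing.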
DomiKnowS provides a comprehensive, flexible, and extensible framework for neuro-symbolic learning, distinguished by its symbolic abstraction, modular implementation, and ecosystem supporting both manual and fully automated knowledge-driven model generation. Its impact is documented in rigorous empirical evaluations and comparative studies spanning the field of neuro-symbolic AI.