- The paper introduces a hybrid XAI approach that integrates LLMs with knowledge graphs to generate contextualized, post-hoc explanations for manufacturing ML predictions.
- It employs a dialog-driven, multi-turn graph traversal protocol to iteratively gather domain facts, ensuring explanations are grounded in ontological structure.
- Empirical evaluations with expert users indicate high explanation quality and improved trust, confirming the framework's practical industrial value.
Improving Interpretability of Machine Learning in Manufacturing with LLMs and Knowledge Graphs
Introduction
Interpretability in machine learning is a critical challenge in the manufacturing domain, where practical deployment depends on the transparency of model outputs and their connection to domain knowledge. The paper "Using LLMs and Knowledge Graphs to Improve the Interpretability of Machine Learning Models in Manufacturing" (2604.16280) proposes a hybrid XAI framework leveraging structured knowledge graphs and LLM-based Graph-RAG to generate post-hoc, contextually rich explanations of ML predictions. This methodology addresses key limitations of conventional XAI, notably the lack of domain context and the technical inaccessibility of most existing explanations.
Methodological Framework
The authors extend the widely adopted ML-Schema ontology, encoding not only standard artifacts (models, datasets, tasks) but also post-hoc explanations (including Shapley values and global insights). Instances are constructed to capture the relations among datasets, ML models, preprocessing steps, tasks, and the specific explanations relevant to each outcome. Classes operate as semantic anchors, while instantiated individuals represent the operational elements of a manufacturing pipeline.
Figure 1: The core ontology structure, displaying relationships between high-level classes (colored nodes) and instances (gray nodes) in the KG.
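The class/instance split can be illustrated with a small in-memory triple list. This is only a sketch: the class names, relation names (`trainedOn`, `solves`, `explains`, `topFeature`), and instance identifiers are hypothetical placeholders, not the paper's actual ontology terms or ML-Schema IRIs.

```python
# Toy knowledge graph as (subject, predicate, object) triples.
# Every identifier here is a hypothetical placeholder, not the paper's KG.
TRIPLES = [
    # Class-level statements: the semantic anchors.
    ("Dataset", "rdf:type", "owl:Class"),
    ("Model", "rdf:type", "owl:Class"),
    ("Task", "rdf:type", "owl:Class"),
    ("Explanation", "rdf:type", "owl:Class"),
    # Instance-level statements: elements of one pipeline.
    ("dataset_a", "rdf:type", "Dataset"),
    ("model_a", "rdf:type", "Model"),
    ("task_a", "rdf:type", "Task"),
    ("shap_a", "rdf:type", "Explanation"),
    # Relations linking the instances.
    ("model_a", "trainedOn", "dataset_a"),
    ("model_a", "solves", "task_a"),
    ("shap_a", "explains", "model_a"),
    ("shap_a", "topFeature", "feature_x"),
]


def classes():
    """Return the names of all declared classes, sorted."""
    return sorted(s for s, p, o in TRIPLES if p == "rdf:type" and o == "owl:Class")


def instances_of(cls):
    """Return all individuals typed with the given class, sorted."""
    return sorted(s for s, p, o in TRIPLES if p == "rdf:type" and o == cls)


def describe(node):
    """Return every triple in which the node appears as subject or object."""
    return [t for t in TRIPLES if node in (t[0], t[2])]


if __name__ == "__main__":
    print(classes())
    print(instances_of("Explanation"))
    for triple in describe("model_a"):
        print(triple)
```

Storing the post-hoc explanation (`shap_a`) as a node of its own lets it be retrieved through the same graph relations as models and datasets.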
A distinctive contribution lies in the traversal and retrieval mechanism. Rather than depending on static, fine-tuned LLMs or on LLM-generated SPARQL queries, the system uses a multi-turn dialog protocol in which an LLM incrementally queries the KG, expanding from user-specified concepts via Branch-and-Bound search to assemble a focused context for explanation generation. The traversal starts with class identification, proceeds to instance discovery, and then iteratively retrieves relevant connected nodes, terminating when coverage is deemed sufficient. Every generated explanation is thereby grounded in both global and instance-level domain facts.
Graph-RAG Retrieval and Explanation Generation
When a user provides a query (e.g., "List all tasks which are influenced by the dataset niryo september 2024"), the system passes this input and the KG ontology to the LLM via prompt engineering. The LLM first identifies the relevant classes, finds concrete KG instances, and retrieves subgraphs through iterative expansion. Information is accumulated in context and provided to the LLM for explanation synthesis, balancing the technical fidelity of post-hoc explanation with domain-relevant detail.
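The retrieval loop described above (class identification, instance discovery, iterative expansion, context accumulation) can be sketched as follows. This is a simplified illustration under stated assumptions: it uses plain breadth-first expansion rather than the Branch-and-Bound search the paper describes, the LLM turn is replaced by a stub (`select_relevant`) that keeps every candidate, and the graph, node names, and relation names are hypothetical.

```python
# Hypothetical toy graph: node -> list of (relation, neighbor).
EDGES = {
    "dataset_a": [("usedBy", "model_a"), ("usedBy", "model_b")],
    "model_a": [("solves", "task_a")],
    "model_b": [("solves", "task_b")],
    "task_a": [],
    "task_b": [],
}

# Hypothetical class assignment for each instance.
NODE_CLASS = {
    "dataset_a": "Dataset",
    "model_a": "Model",
    "model_b": "Model",
    "task_a": "Task",
    "task_b": "Task",
}


def select_relevant(query, candidates):
    """Stand-in for one LLM dialog turn.

    A real system would ask the LLM which candidate edges matter for the
    query; this stub keeps all of them (an assumption for illustration).
    """
    return list(candidates)


def traverse(query, start_class, start_name, target_class, max_turns=5):
    """Expand outward from a start instance and collect the visited edges."""
    # Class identification and instance discovery.
    frontier = [
        n for n, c in NODE_CLASS.items() if c == start_class and n == start_name
    ]
    visited = set(frontier)
    context = []  # accumulated (source, relation, target) facts

    # Iterative expansion, one "dialog turn" per loop iteration.
    for _ in range(max_turns):
        if not frontier:
            break  # nothing left to expand
        candidates = [
            (node, rel, nb)
            for node in frontier
            for rel, nb in EDGES.get(node, [])
            if nb not in visited
        ]
        chosen = select_relevant(query, candidates)
        context.extend(chosen)
        frontier = []
        for _, _, nb in chosen:
            if nb not in visited:
                visited.add(nb)
                frontier.append(nb)

    answers = sorted(n for n in visited if NODE_CLASS.get(n) == target_class)
    return context, answers


if __name__ == "__main__":
    ctx, tasks = traverse(
        "List all tasks influenced by dataset_a", "Dataset", "dataset_a", "Task"
    )
    print(ctx)
    print(tasks)
```

In a full pipeline, the accumulated `context` would be serialized into the prompt for explanation synthesis. Here the loop simply stops when the frontier is empty or the turn budget is exhausted, which stands in for the "coverage is sufficient" decision.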
Not delegating traversal logic to LLM-generated SPARQL removes a potential source of error and supports multi-hop, dialog-driven expansion. The authors cite prior work favoring retrieval over fine-tuning for knowledge injection, which matters in volatile industrial settings where facts must be updated dynamically [Ovadia et al.].
Figure 2: Schematic example of graph traversal for query answering; the colored steps illustrate iterative expansion from a dataset node through associated models and the final task node.
Evaluation: Quantitative, Qualitative, and Stress Tests
The evaluation employs both user-based studies and system-oriented stress tests. Twenty professionally experienced AI users rated explanations for both developer and worker scenarios on multiple axes (helpfulness, understandability, structure, and length appropriateness), using a five-point Likert scale. Results for the developer role show consistently higher median values and lower variance, indicating that the explanations are more directly actionable and less ambiguous in the developer scenario than in the worker scenario. This is an expected result given the inclusion of technical abstractions in responses.
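The median and variance comparison can be computed directly with Python's standard library. The ratings below are illustrative placeholders chosen only to show the calculation; they are not data from the study.

```python
from statistics import median, pvariance

# Illustrative placeholder Likert ratings (1-5), NOT data from the study.
ratings = {
    "developer": [5, 4, 5, 5, 4],
    "worker": [5, 3, 4, 2, 4],
}


def summarize(values):
    """Return the median and population variance of a list of ratings."""
    return {"median": median(values), "variance": pvariance(values)}


summary = {role: summarize(values) for role, values in ratings.items()}

if __name__ == "__main__":
    for role, stats in summary.items():
        print(role, stats)
```

The median is the usual summary for ordinal Likert data because it does not assume equal spacing between scale points. Variance is shown only because the text reports it.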
Structured robustness tests examined KG-based extraction under ambiguity, contradiction, out-of-scope and adversarial queries. The system reliably grounds answers in ontological structure, corrects false premises, and resists most adversarial inputs. Ambiguity is typically resolved via the most plausible implicit assumption rather than explicit clarification, and the system occasionally overestimates its own actuation capabilities—for example, engaging in task coaching beyond what is semantically supported.
Numerical and Empirical Outcomes
- Median user ratings for explanation quality consistently exceeded 4 (out of 5) across all criteria, with developers showing marginally higher acceptance than workers.
- Kendall's τ analysis reveals high internal consistency, particularly for understandability and structure, across rater groups. Exceptions were isolated, pointing primarily to the complexity of certain queries.
- Robustness tests showed that the KG-RAG pipeline is highly effective for well-scoped and ontologically represented queries, with errors primarily due to insufficient clarification or implicit bias in the ontology schema.
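For readers unfamiliar with Kendall's τ, the following is a minimal pure-Python sketch of the tie-corrected variant (τ-b), which suits Likert data because ratings on a five-point scale are often tied. Which variant the authors used, and how ratings were paired, is not stated here; both are assumptions of this example.

```python
from itertools import combinations
from math import sqrt


def kendall_tau_b(x, y):
    """Kendall's tau-b rank correlation between two equal-length sequences."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    concordant = discordant = ties_x = ties_y = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        dx, dy = x1 - x2, y1 - y2
        if dx == 0 and dy == 0:
            continue  # tied in both variables: excluded from all counts
        if dx == 0:
            ties_x += 1  # tied only in x
        elif dy == 0:
            ties_y += 1  # tied only in y
        elif dx * dy > 0:
            concordant += 1  # pair ordered the same way in x and y
        else:
            discordant += 1  # pair ordered in opposite ways
    untied = concordant + discordant
    denom = sqrt((untied + ties_x) * (untied + ties_y))
    if denom == 0:
        return float("nan")  # undefined when one variable is constant
    return (concordant - discordant) / denom


if __name__ == "__main__":
    # Placeholder ratings for the same four items from two raters.
    print(kendall_tau_b([1, 2, 2, 3], [1, 2, 3, 3]))
```

Values near 1 indicate that two raters rank the items in nearly the same order, values near -1 indicate reversed rankings, and values near 0 indicate no consistent ordering.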
Figure 3: User group ratings (developer vs. worker) on helpfulness, structure, and length appropriateness for responses to eight representative XAI questions.
Implications and Future Developments
On a theoretical level, this work demonstrates that graph-based RAG with modular, dialog-driven KG traversal provides a scalable alternative to static fine-tuning and aligns with current trends in LLM-KG synergistic architectures [Pan et al.]. The explicit ontology expands the potential for traceable, reproducible explainability and supports the continuous integration of new domain facts without model retraining—a critical property in manufacturing where process evolution is continual.
Practically, the demonstrated prototype shows strong robustness, factual grounding, and capability for actionable explanation generation in real-world manufacturing tasks. It suggests that role-aware explanation customization (e.g., for expert vs. non-expert operators) and summarization control (e.g., leading with concise insights, elaborating as needed) will further improve user acceptance.
Key limitations pertain to scope enforcement, the handling of underspecified or biased queries, and the lack of built-in mechanisms to adapt explanations to different technical backgrounds or information density preferences. In addition, capability awareness (of what the model/system can legitimately do) requires tighter constraint and better self-representation.
Future research directions include:
- Automated and adaptive prompt construction, facilitating explicit length and abstraction-level constraints.
- Automated KG maintenance and extension, lowering the overhead for integrating novel manufacturing processes or machine configurations.
- Incorporating smaller, open-source LLMs fine-tuned specifically for ontology querying, improving both interpretability and deployment flexibility.
- Stronger alignment techniques to enforce scope correctness and adversarial resistance in dialog-driven settings.
Conclusion
This paper presents a comprehensive, ontology-driven post-hoc XAI pipeline using Graph-RAG and LLMs, validated via both qualitative and quantitative means in a manufacturing setting. Empirical evidence demonstrates that combining LLMs with dynamically traversed knowledge graphs reliably improves both the clarity and contextual depth of ML explanations, supporting enhanced trust and usability. The architectural pattern of inference-time KG integration suggests a viable pathway for scalable, actionable interpretability in industrial AI applications, conditioned on further advances in prompt adaptivity, role-aware explanation, and robust capability modeling.