Knowledge Graphs as the Missing Data Layer for LLM-Based Industrial Asset Operations

Published 26 May 2026 in cs.DB, cs.AI, and cs.LG | (2605.26874v1)

Abstract: LLM-based agents for industrial asset operations show limited accuracy when reasoning over flat document stores. AssetOpsBench (KDD 2026) establishes that GPT-4 agents achieve 65% on 139 industrial maintenance scenarios backed by CouchDB, YAML, and CSV. It compares LLM orchestration paradigms (Agent-As-Tool vs Plan-Execute) on a fixed data layer; we ask a complementary, orthogonal question: how much does the data model behind the tools affect agent performance? Building on the same scenarios, we introduce a knowledge graph layer (781 nodes, 955 edges, 16 relationship types) and evaluate three architectures: (1) deterministic graph handlers (no LLM) at 99% (137/139); (2) LLM-generated Cypher over the graph at 82-83% with the same GPT-4 model the baseline uses; and (3) the original tool-augmented LLM baseline at 65% (91/139, matching the published KDD 2026 leaderboard ceiling). Our key finding is inverted LLM usage: rather than asking the LLM to reason over raw data, we ask it to generate structured queries from a typed schema. The graph executes deterministically. We additionally contribute 40 graph-native scenarios (multi-hop dependency, vector similarity, PageRank criticality), and evaluate against the expanded HuggingFace AssetOpsBench release (467 scenarios, 6 domains), where deterministic handlers achieve 100% (467/467) with average score 0.848. These results suggest that for structured operational domains, the data layer -- not the LLM orchestration -- is the primary bottleneck, and that knowledge graphs serve as an integration layer between raw industrial data and LLM-based reasoning.


Authors (2)

Summary

The paper reports that replacing flat document stores with a structured knowledge graph raises same-model (GPT-4) task completion from 65% to 82–83%, a lift of roughly 17 percentage points.
It compares three architectures and shows that deterministic graph handlers, which use no LLM, reach 99% (137/139) on the original benchmark by leveraging explicit relational modeling.
The study highlights that integrating graphs reduces cost and latency while enabling multi-hop queries and real-time scalability in industrial asset management.

Knowledge Graphs as the Foundational Data Layer for LLM-Based Industrial Asset Operations

Motivation and Benchmark Context

This paper critically examines the operational bottlenecks in LLM-driven industrial asset management, focusing on the AssetOpsBench benchmark (KDD 2026). AssetOpsBench systematically evaluates LLM-based agents (e.g., GPT-4.1, llama-4-maverick, mistral-large, granite-3-8b), originally on 139 industrial maintenance scenarios and, in the expanded HuggingFace release, on 467 scenarios spanning six domains. Previous studies primarily varied agent orchestration (Agent-As-Tool vs. Plan-Execute) on a fixed, flat, document-based data layer and observed a persistent ceiling in task completion rates, with no model exceeding 70%. The paper asks why accuracy stagnates and attributes failures less to LLM reasoning limitations than to fundamental constraints of the underlying data model (CouchDB, YAML, CSV). These constraints include the need for cross-document aggregation, relationships that are implicit rather than traversable, and the inability to execute graph algorithms or structured queries deterministically.

Knowledge Graph Construction and Architecture

Through an 8-stage ETL process, the authors construct a typed knowledge graph comprising over 1,360 nodes across 14 labels and 2,500 edges spanning 21 relationship types (the abstract reports 781 nodes, 955 edges, and 16 relationship types for the graph layer built on the original scenarios). The graph draws on the entity hierarchy from EAM systems, sensor metadata, structured failure modes (backed by Sentence-BERT embeddings in HNSW vector indices), and event histories. The dependency topology is modeled explicitly with edge types such as depends_on and shares_system_with, enabling cascade and criticality analysis.
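To make the cascade analysis concrete, the following is a minimal in-memory sketch, not the authors' implementation. It assumes a handful of hypothetical assets linked by depends_on edges and walks those edges in reverse, breadth-first, to find everything affected when one asset fails. The asset names and the `cascade` helper are illustrative assumptions; a graph database would express the same traversal as a variable-length path query.

```python
from collections import deque

# Hypothetical edges: (dependent, dependency) means "dependent depends_on dependency".
DEPENDS_ON = [
    ("chiller_1", "pump_a"),
    ("chiller_2", "pump_a"),
    ("ahu_1", "chiller_1"),
    ("ahu_2", "chiller_2"),
    ("pump_a", "transformer_t1"),
]


def build_reverse_index(edges):
    """Map each asset to the set of assets that directly depend on it."""
    dependents = {}
    for dependent, dependency in edges:
        dependents.setdefault(dependency, set()).add(dependent)
    return dependents


def cascade(failed_asset, edges):
    """Return {asset: hop_distance} for every asset transitively affected by a failure."""
    dependents = build_reverse_index(edges)
    hops = {}
    queue = deque([(failed_asset, 0)])
    while queue:
        asset, depth = queue.popleft()
        for child in sorted(dependents.get(asset, ())):
            # Skip assets already reached (also guards against dependency cycles).
            if child not in hops and child != failed_asset:
                hops[child] = depth + 1
                queue.append((child, depth + 1))
    return hops


impact = cascade("pump_a", DEPENDS_ON)
print(impact)
```

In this toy topology a failure of pump_a reaches both chillers in one hop and both air handlers in two, which is the kind of multi-hop answer a flat document store cannot return without custom aggregation code.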

Three architectures are evaluated:

A. Tool-Augmented LLM (Baseline): LLMs execute multi-step reasoning directly over flat document stores, performing intent parsing, tool selection, argument crafting, and result synthesis.
B. NLQ + Knowledge Graph: LLMs generate Cypher queries from a provided schema, delegating execution to the graph engine for deterministic traversal, aggregation, and relationship reasoning.
C. Deterministic Handlers: Pre-coded handlers match query patterns to Cypher queries with no LLM involvement, serving as a ceiling for structured query performance.
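As an illustration of Architecture C, the sketch below shows one plausible shape for a deterministic handler: a regular expression recognizes a question pattern and maps it to a parameterized Cypher template, with no LLM in the loop. The patterns, the HAS_SENSOR relationship, the Asset and Sensor labels, and the `route` function are assumptions made for illustration; only depends_on is named in the text above, and the authors' actual handlers may look quite different.

```python
import re

# Hypothetical question patterns paired with parameterized Cypher templates.
HANDLERS = [
    (
        re.compile(r"what sensors (?:are|is) (?:on|attached to) (?P<asset>[\w\- ]+?)\??$", re.I),
        "MATCH (a:Asset {name: $asset})-[:HAS_SENSOR]->(s:Sensor) RETURN s.name",
    ),
    (
        re.compile(r"which assets depend on (?P<asset>[\w\- ]+?)\??$", re.I),
        "MATCH (d:Asset)-[:depends_on*1..]->(a:Asset {name: $asset}) RETURN DISTINCT d.name",
    ),
]


def route(question):
    """Return (cypher, params) for the first matching pattern, or None if nothing matches."""
    text = question.strip()
    for pattern, cypher in HANDLERS:
        match = pattern.search(text)
        if match:
            # Named groups become query parameters, so user text is never spliced into Cypher.
            params = {key: value.strip() for key, value in match.groupdict().items()}
            return cypher, params
    return None


print(route("What sensors are on Chiller 6?"))
print(route("Tell me a joke"))
```

Passing the asset name as a query parameter rather than formatting it into the string keeps the handler deterministic and avoids injection. The trade-off, as the caveats later note, is that every supported question shape has to be coded in advance.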

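The key insight, termed "inverted LLM usage", is to constrain the LLM to generating structured queries from a typed schema rather than performing open-ended reasoning over raw data, which aligns the task with its code generation capabilities. The graph engine then executes the query deterministically. In the same-model comparison, this raises task completion from 65% to 82–83%.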
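The inverted pattern can be sketched as two small steps around the LLM call: render the typed schema into a prompt, then check the generated query before it reaches the graph. The schema slice, the `build_prompt` and `is_read_only` helpers, and the read-only guard itself are assumptions for illustration; the source does not describe the authors' prompt or any validation step, and the LLM call is deliberately left out.

```python
import re

# Hypothetical slice of a typed schema; only depends_on is named in the text.
SCHEMA = {
    "nodes": {
        "Asset": ["name", "site"],
        "Sensor": ["name", "unit"],
        "FailureMode": ["name"],
    },
    "relationships": [
        ("Asset", "depends_on", "Asset"),
        ("Asset", "has_sensor", "Sensor"),
        ("Asset", "has_failure_mode", "FailureMode"),
    ],
}


def build_prompt(question, schema):
    """Render the schema and the question into a query-generation prompt."""
    lines = ["Translate the question into one read-only Cypher query.", "Node labels:"]
    for label, props in schema["nodes"].items():
        lines.append(f"  (:{label}) properties: {', '.join(props)}")
    lines.append("Relationships:")
    for src, rel, dst in schema["relationships"]:
        lines.append(f"  (:{src})-[:{rel}]->(:{dst})")
    lines.append(f"Question: {question}")
    lines.append("Cypher:")
    return "\n".join(lines)


# Crude keyword screen; a production guard would parse the query instead.
WRITE_CLAUSES = re.compile(r"\b(CREATE|MERGE|DELETE|DETACH|SET|REMOVE|DROP)\b", re.I)


def is_read_only(cypher):
    """Return True if the generated query contains no write clauses."""
    return WRITE_CLAUSES.search(cypher) is None


prompt = build_prompt("Which assets depend on pump_a?", SCHEMA)
print(prompt)
print(is_read_only("MATCH (a:Asset) RETURN a.name"))
```

The point of the design is that the model only ever sees labels, properties, and relationship types, never the raw records, so its output is a short program whose execution is fully deterministic.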

Empirical Results and Analysis

The paper reports head-to-head evaluations that isolate the impact of the data layer from orchestration and model version:

On the original 139-scenario benchmark, deterministic handlers (Architecture C) achieve 99% success; NLQ with GPT-4 family models achieves 82–83%; the LLM-over-documents baseline (Architecture A) remains at 65% (matching published leaderboard results).
The roughly 17 pp performance lift in the same-model comparison (Architecture B vs. Architecture A, both GPT-4) is attributed solely to the data layer, not to changes in agent orchestration or model version.
Per-type breakdowns reveal that NLQ architectures struggle only where scenarios require ML pipeline execution beyond the scope of Cypher queries. Deterministic handlers address these via domain-specific code.
On the full 467-scenario HuggingFace AssetOpsBench, deterministic handlers deliver 100% task completion with an average score of 0.848, demonstrating scalability and preservation of accuracy across multiple industrial domains.
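The percentages above can be checked against the pass counts given in the abstract (91/139 for the baseline, 137/139 for deterministic handlers). The short calculation below is only arithmetic on those reported figures; the variable names are illustrative, and the 82–83% range for Architecture B is taken as reported because no pass count is given for it.

```python
def completion_rate(passed, total):
    """Task completion as a percentage."""
    return 100.0 * passed / total


baseline = completion_rate(91, 139)        # Architecture A, tool-augmented LLM
deterministic = completion_rate(137, 139)  # Architecture C, no LLM
nlq_low, nlq_high = 82.0, 83.0             # Architecture B, reported range

# Same-model lift of B over A, in percentage points.
lift_low = nlq_low - baseline
lift_high = nlq_high - baseline

print(f"baseline: {baseline:.1f}%")            # 65.5%
print(f"deterministic: {deterministic:.1f}%")  # 98.6%
print(f"lift: {lift_low:.1f} to {lift_high:.1f} pp")  # 16.5 to 17.5 pp
```

The unrounded baseline is about 65.5%, so the lift spans roughly 16.5 to 17.5 points, consistent with the "roughly 17 pp" figure.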

The authors also introduce 40 graph-native scenarios demanding multi-hop dependency analysis, vector similarity (via failure mode embeddings), criticality ranking (PageRank), and optimization—tasks structurally impossible with flat document stores. For these, the knowledge graph architecture delivers significant gains (e.g., +0.401 score for failure similarity queries), confirming the necessity of explicit relational modeling and deterministic graph operations.
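To illustrate what PageRank-based criticality ranking means on a dependency graph, here is a small pure-Python power iteration. It is a sketch under stated assumptions: the five edges are hypothetical, each edge points from a dependent asset to the asset it depends on so that rank accumulates on widely depended-upon equipment, and the damping factor of 0.85 is the conventional default rather than a value taken from the paper.

```python
# Hypothetical edges: (dependent, dependency).
EDGES = [
    ("chiller_1", "pump_a"),
    ("chiller_2", "pump_a"),
    ("ahu_1", "chiller_1"),
    ("ahu_2", "chiller_2"),
    ("pump_a", "transformer_t1"),
]


def pagerank(edges, damping=0.85, iterations=100):
    """Power-iteration PageRank; rank held by dangling nodes is spread uniformly."""
    nodes = sorted({node for edge in edges for node in edge})
    count = len(nodes)
    out_links = {node: [] for node in nodes}
    for src, dst in edges:
        out_links[src].append(dst)

    rank = {node: 1.0 / count for node in nodes}
    for _ in range(iterations):
        # Nodes with no outgoing edges would otherwise leak rank out of the system.
        dangling = sum(rank[node] for node in nodes if not out_links[node])
        base = (1.0 - damping) / count + damping * dangling / count
        new_rank = {node: base for node in nodes}
        for src in nodes:
            for dst in out_links[src]:
                new_rank[dst] += damping * rank[src] / len(out_links[src])
        rank = new_rank
    return rank


ranks = pagerank(EDGES)
for name in sorted(ranks, key=ranks.get, reverse=True):
    print(f"{name}: {ranks[name]:.3f}")
```

In this toy graph the transformer and the shared pump rank highest because every other asset depends on them directly or transitively, which matches the intuition behind using PageRank as a criticality score.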

Theoretical Implications and Design Principles

The study argues that for structured operational domains, the data layer is the principal bottleneck in LLM-based reasoning, outweighing orchestration and model selection. LLMs do well at generation tasks against explicit schemas but fail on operations that require aggregation, traversal, and algorithmic reasoning over implicit, unstructured data. Partitioning the pipeline into "LLMs at the edges, graph in the center" aligns each component with its strengths: LLMs prepare unstructured data (entity extraction, resolution, classification) and synthesize queries; the graph handles deterministic storage, traversal, and computation.

Furthermore, knowledge graphs enable capabilities unattainable via document stores or vanilla RAG schemes—cascade analysis, multi-hop queries, vector search, and algorithmic criticality assessment—extending not only accuracy but expressive power for industrial operations.

Practical Deployment and Scalability

The cost asymmetry between architectures is pronounced: high-frequency query loads in industrial environments (10K queries/day) accrue substantial operational costs in token-heavy LLM architectures ($300–500/day), whereas deterministic graph queries are virtually free post-ingestion. Latency also favors graphs (63 ms vs. 6–11 s). The approach supports real-time streaming and complex multi-hop queries deterministically, facilitating scalability for asset fleets exceeding 10K nodes.
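The figures in this paragraph imply a per-query cost and a latency ratio that are easy to work out. The snippet below does only that arithmetic on the numbers quoted above; the constant names are illustrative and no additional pricing data is assumed.

```python
QUERIES_PER_DAY = 10_000
LLM_DAILY_COST_USD = (300.0, 500.0)  # reported range for token-heavy architectures
GRAPH_LATENCY_S = 0.063              # 63 ms
LLM_LATENCY_S = (6.0, 11.0)          # reported range

# Implied LLM cost per query, in USD.
cost_per_query = tuple(cost / QUERIES_PER_DAY for cost in LLM_DAILY_COST_USD)

# How many times slower the LLM path is than a graph query.
slowdown = tuple(latency / GRAPH_LATENCY_S for latency in LLM_LATENCY_S)

print(f"cost per query: ${cost_per_query[0]:.2f} to ${cost_per_query[1]:.2f}")  # $0.03 to $0.05
print(f"latency ratio: {slowdown[0]:.0f}x to {slowdown[1]:.0f}x")               # 95x to 175x
```

At these rates the LLM path costs about three to five cents per query and is roughly two orders of magnitude slower than the graph query.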

The authors note caveats: deterministic handlers are pre-coded rather than autonomous, LLM outputs are stochastic, and the assumption of clean data may not hold in real-world deployments. Handling noisy, unstructured industrial data requires LLM-assisted ingestion and preparation.

Implications for Future Developments

The study suggests fundamental revisions to benchmarking and agent design in industrial AI. The data model should be treated as a primary axis of optimization, independent of LLM orchestration. Hybrid architectures combining deterministic handlers and NLQ modules are recommended. Extensions to large-scale, multi-site environments, full Pass_k evaluations, and integration with unstructured data preparation remain as promising directions.
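One way to read the hybrid recommendation is as a simple router: try the pre-coded handlers first and fall back to LLM-generated Cypher only when none applies. The sketch below is an assumption about how such a router could be wired, not a description of the authors' system. The `make_keyword_handler`, `answer`, and `fake_nlq` names are hypothetical, and `fake_nlq` is a stand-in that returns a fixed query instead of calling a model.

```python
def make_keyword_handler(keyword, cypher):
    """Return a handler that fires when the keyword appears in the question."""
    def handler(question):
        return cypher if keyword in question.lower() else None
    return handler


def answer(question, handlers, nlq_fallback):
    """Try pre-coded handlers first; fall back to generated Cypher otherwise."""
    for handler in handlers:
        cypher = handler(question)
        if cypher is not None:
            return {"source": "deterministic", "cypher": cypher}
    return {"source": "nlq", "cypher": nlq_fallback(question)}


def fake_nlq(question):
    # Placeholder for an LLM call; always returns the same query.
    return "MATCH (n) RETURN count(n)"


handlers = [make_keyword_handler("sensor", "MATCH (s:Sensor) RETURN s.name")]
hit = answer("List every sensor", handlers, fake_nlq)
miss = answer("How many nodes exist?", handlers, fake_nlq)
print(hit["source"], miss["source"])
```

Recording which path produced each query also makes it possible to audit how often the stochastic fallback is exercised.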

Beyond industrial asset operations, the inverted LLM usage pattern and schema-aware query generation may generalize to other domains with structured relational data, positioning knowledge graphs as the missing layer for robust, scalable LLM agent interaction.

Conclusion

The paper presents evidence that knowledge graphs, when employed as the central data layer, enable deterministic reasoning, algorithmic traversal, and efficient aggregation for LLM-based industrial asset operations. In these experiments the data layer, rather than agent orchestration or model selection, emerges as the primary lever for accuracy and scalability. The architectural principle of separating data operations (graphs) from structured input/output generation (LLMs) is presented as transferable to a broad class of structured domains. For industrial AI, the practical recommendation is explicit relational modeling and a clear division of tasks between symbolic and neural components.
