KEO: Knowledge Extraction on OMIn
- KEO is a domain-specific knowledge extraction and reasoning framework that constructs a structured knowledge graph from the OMIn dataset to support safety-critical aviation maintenance.
- It employs a retrieval-augmented generation pipeline that integrates the knowledge graph for coherent multi-hop reasoning across global trends and detailed procedural data.
- Evaluation shows that KEO improves systematic insight and offers transparent traceability, while its local deployment supports stringent data privacy in high-stakes settings.
Knowledge Extraction on OMIn (KEO) is a domain-specific knowledge extraction and reasoning framework developed for high-stakes, safety-critical contexts, particularly in aviation maintenance. It operates by constructing a structured knowledge graph (KG) from the OMIn dataset and integrating this representation as a central resource in a retrieval-augmented generation (RAG) pipeline. This allows LLMs to perform coherent, multi-hop reasoning over the full corpus rather than relying on isolated, text-based retrievals. KEO’s architecture is designed for local deployment, ensuring compliance with data privacy and security requirements, and is evaluated using a rigorous benchmark that covers both global sensemaking and actionable procedural knowledge.
1. Framework Architecture
The KEO framework is composed of four primary components:
- KG Creation: Extracts structured entities and relations (triplets) from raw maintenance incident records in the OMIn dataset.
- KG-based RAG Pipeline: Integrates the constructed KG into a retrieval-augmented generation process, enabling LLMs to generate answers informed by global graph context.
- QA Benchmark Creation: Automatically generates a suite of QA tasks covering global trends and actionable maintenance operations to evaluate system reasoning.
- LLM-based Evaluation: Employs strong LLMs as judges for both absolute (score-based) and comparative (pairwise) assessment.
The knowledge graph is formalized as a weighted set of triplets $G = \{(h, r, t, w)\}$, where $h$ and $t$ are entities, $r$ is the relation, and $w$ encodes the prominence of the fact. This explicit and frequency-aware structuring supports reasoning over entity connectivity and pattern detection.
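As a minimal sketch of the weighted-triplet structure described above, the following Python snippet stores each fact as a (head, relation, tail) tuple and uses its extraction count as the weight. Treating the weight as a raw count is an assumption made for illustration (the text only says the weight encodes prominence and that the structure is frequency-aware), and the `Triplet` type, the `build_weighted_kg` helper, and the example facts are hypothetical rather than taken from KEO or OMIn.

```python
from collections import Counter
from typing import Iterable, NamedTuple


class Triplet(NamedTuple):
    """One extracted fact: head entity, relation, tail entity."""
    head: str
    relation: str
    tail: str


def build_weighted_kg(triplets: Iterable[Triplet]) -> Counter:
    """Count how often each fact occurs; the count serves as its weight (assumption)."""
    return Counter(triplets)


# Made-up facts for illustration only; these are not OMIn records.
extracted = [
    Triplet("engine", "experienced", "power loss"),
    Triplet("engine", "experienced", "power loss"),
    Triplet("power loss", "led to", "forced landing"),
]

kg = build_weighted_kg(extracted)
for (head, relation, tail), weight in kg.most_common():
    print(f"({head}, {relation}, {tail}) weight={weight}")
```

Keeping the weight alongside each triplet lets later stages prefer frequently observed facts without re-reading the raw records.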
2. Dataset and Task Formulation
KEO is constructed atop the Operations and Maintenance Intelligence (OMIn) dataset, which comprises 2,748 carefully curated aviation incident reports sourced from FAA records. Each record consists of one to three sentences describing a maintenance or operational event. Crucially, the dataset is processed and served locally, ensuring suitability for safety-critical applications and data privacy by entirely avoiding remote or external API dependencies.
The QA benchmark generated from OMIn is partitioned into two major types:
- Global Sensemaking Questions: Require synthesis of trends, patterns, and overarching relationships across multiple records.
- Knowledge-to-Action Questions: Focus on extracting and applying detailed procedural steps or diagnostics relevant to specific maintenance actions.
This dual-focus benchmark enables KEO to assess both strategic overview capabilities and fine-grained retrieval performance.
3. Knowledge Graph Construction
KEO builds its knowledge graph in a two-stage iterative process:
- LLM-based Fact Extraction: An LLM (e.g., GPT-4o or Phi-4-mini in zero-shot settings) is prompted to extract facts from each aviation record, yielding (head entity, relation, tail entity) triplets.
- Node Deduplication and Linkage: To reduce noise and improve interpretability, the system performs iterative prompts that propose new entity candidates based on current graph nodes, minimizing duplication and enhancing graph connectivity.
The final KG is represented either as plain triplets $(h, r, t)$ of head entity, relation, and tail entity, or as the weighted set described in Section 1 for downstream processing.
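The two construction stages can be sketched as a loop that calls an extractor on each record and merges entity mentions into existing nodes. This is an illustrative sketch under stated assumptions, not KEO's implementation: `toy_extract` is a hypothetical stand-in for the LLM call (it parses a `head | relation | tail` line format invented here), and `canonical` replaces the LLM-prompted deduplication with simple case and whitespace normalization.

```python
from collections import Counter
from typing import Callable, Iterable

Triplet = tuple[str, str, str]


def canonical(name: str, known: dict[str, str]) -> str:
    """Reuse an existing node when the normalized name matches (stand-in for LLM dedup)."""
    key = " ".join(name.lower().split())
    return known.setdefault(key, name.strip())


def build_kg(records: Iterable[str],
             extract: Callable[[str], list[Triplet]]) -> Counter:
    """Extract triplets per record, merge duplicate nodes, and count each fact."""
    known: dict[str, str] = {}
    kg: Counter = Counter()
    for record in records:
        for head, relation, tail in extract(record):
            h = canonical(head, known)
            t = canonical(tail, known)
            kg[(h, relation.strip().lower(), t)] += 1
    return kg


def toy_extract(record: str) -> list[Triplet]:
    """Hypothetical extractor: parses 'head | relation | tail' lines instead of calling an LLM."""
    out = []
    for line in record.splitlines():
        parts = [p.strip() for p in line.split("|")]
        if len(parts) == 3 and all(parts):
            out.append((parts[0], parts[1], parts[2]))
    return out


# Made-up inputs; the two lines differ only in case and spacing.
records = [
    "Nose Gear | collapsed during | landing",
    "nose  gear | collapsed during | Landing",
]
kg = build_kg(records, toy_extract)
print(kg)
```

Passing the extractor in as a callable keeps the graph-building logic independent of which model produces the triplets.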
4. Retrieval-Augmented Generation with KG
KEO’s RAG pipeline leverages the KG in a multi-step workflow:
- Semantic Node Selection: Uses dense embeddings for each KG node; nodes are ranked by semantic similarity to the input query, and top-k are selected as seeds.
- Importance-aware Expansion: From the seed set, an m-hop subgraph is constructed. Importance is distilled by computing a maximum spanning tree (MST) over the undirected form of the subgraph. This filters for relationships most critical to answering the query.
- Graph-based Context Reconstruction: The MST paths are traversed (e.g., by depth-first search), and corresponding entity and relation sequences are linearized into context for the answer-generating LLM.
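The first two steps above, seed selection and m-hop expansion, can be sketched in plain Python. The snippet assumes node embeddings and a query embedding are already available as lists of floats; the function names, the toy 2-dimensional vectors, and the adjacency list are hypothetical, and cosine similarity is one common choice of semantic similarity rather than a detail confirmed by the source.

```python
import math
from collections import deque


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def select_seeds(query_vec, node_vecs, k):
    """Rank nodes by similarity to the query and keep the top k."""
    ranked = sorted(node_vecs,
                    key=lambda n: cosine(query_vec, node_vecs[n]),
                    reverse=True)
    return ranked[:k]


def m_hop_nodes(adjacency, seeds, m):
    """Breadth-first expansion: all nodes within m edges of any seed."""
    seen = set(seeds)
    frontier = deque((s, 0) for s in seeds)
    while frontier:
        node, depth = frontier.popleft()
        if depth == m:
            continue  # do not expand beyond m hops
        for neighbor in adjacency.get(node, ()):
            if neighbor not in seen:
                seen.add(neighbor)
                frontier.append((neighbor, depth + 1))
    return seen


# Toy embeddings and an undirected adjacency list (illustrative values only).
node_vecs = {
    "engine": [1.0, 0.0],
    "power loss": [0.9, 0.1],
    "forced landing": [0.5, 0.5],
    "nose gear": [0.0, 1.0],
}
adjacency = {
    "engine": ["power loss"],
    "power loss": ["engine", "forced landing"],
    "forced landing": ["power loss", "nose gear"],
    "nose gear": ["forced landing"],
}

seeds = select_seeds([1.0, 0.0], node_vecs, k=1)
subgraph_nodes = m_hop_nodes(adjacency, seeds, m=2)
print(seeds, sorted(subgraph_nodes))
```

Here `k` and `m` trade coverage against context length: larger values pull in more of the graph at the cost of a longer prompt.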
This pipeline produces context that maintains both local incident detail and global dataset structure, in contrast to conventional text-chunk RAG, which lacks global connectivity and often yields fragmented, redundant information.
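The importance filtering and context reconstruction steps can be illustrated with a small, dependency-free sketch. It assumes the spanning tree is computed over the triplet weights (the source does not state which edge weights are used), applies Kruskal's algorithm in descending weight order as one standard way to obtain a maximum spanning tree, and linearizes the result with a depth-first walk. All function names and example edges are hypothetical.

```python
def maximum_spanning_forest(edges):
    """Kruskal's algorithm on edges sorted by descending weight.

    edges: list of (head, relation, tail, weight), treated as undirected.
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    kept = []
    for head, relation, tail, weight in sorted(edges, key=lambda e: -e[3]):
        root_h, root_t = find(head), find(tail)
        if root_h != root_t:  # edge joins two components, so it cannot form a cycle
            parent[root_h] = root_t
            kept.append((head, relation, tail, weight))
    return kept


def linearize(tree_edges, root):
    """Depth-first walk from root, emitting one 'head relation tail' line per edge."""
    neighbors = {}
    for head, relation, tail, _ in tree_edges:
        fact = f"{head} {relation} {tail}"
        neighbors.setdefault(head, []).append((tail, fact))
        neighbors.setdefault(tail, []).append((head, fact))
    lines, visited = [], {root}

    def visit(node):
        for other, fact in neighbors.get(node, []):
            if other not in visited:
                visited.add(other)
                lines.append(fact)
                visit(other)

    visit(root)
    return lines


# Made-up weighted edges for illustration; not OMIn data.
edges = [
    ("engine", "experienced", "power loss", 5),
    ("power loss", "led to", "forced landing", 3),
    ("engine", "preceded", "forced landing", 1),
    ("forced landing", "damaged", "nose gear", 2),
]

tree = maximum_spanning_forest(edges)
context = linearize(tree, root="engine")
print("\n".join(context))
```

In this toy graph the weight-1 edge is dropped because it would close a cycle among nodes already connected by heavier edges, which is how the tree keeps only the most prominent relationships. The resulting lines would then be placed in the prompt of the answer-generating LLM.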
5. Evaluation and Empirical Findings
KEO’s performance is measured on a QA set (133 questions: 83 sensemaking, 50 action-oriented). Both answer generation (via open LLMs like Gemma-3, Phi-4, Mistral-Nemo) and evaluation (via stronger LLMs such as GPT-4o, Llama-3.3-Instruct) are conducted entirely locally to ensure reproducibility and security.
Key results:
- KEO’s KG-based RAG approach substantially improves global sensemaking by enabling LLMs to reason over dataset-spanning trends, causal links, and system-level phenomena.
- For knowledge-to-action questions, which depend on precise procedural recall, traditional text-chunk RAG remains competitive or slightly superior, suggesting localized retrieval suffices for atomic tasks.
- Evaluation combines absolute grading (1–5 across multiple reasoning axes) and pairwise win-rate scoring, revealing clear performance separation between structured (KEO) and unstructured (text-RAG) retrieval (see the win-rate matrix and summary tables in the source).
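The two scoring modes mentioned above can be sketched as follows. The snippet aggregates pairwise judge verdicts into a win-rate table and averages absolute 1–5 scores per system. The tie handling (half a win to each side), the system labels `A` and `B`, and every number below are assumptions invented for illustration; none of them are results reported for KEO.

```python
from collections import defaultdict
from statistics import mean


def win_rate_matrix(judgments):
    """Aggregate pairwise verdicts into win rates.

    judgments: iterable of (system_a, system_b, winner); winner is None for a tie.
    Returns {(x, y): fraction of comparisons in which x beat y}, ties counting 0.5.
    """
    wins = defaultdict(float)
    games = defaultdict(int)
    for a, b, winner in judgments:
        games[(a, b)] += 1
        games[(b, a)] += 1
        if winner is None:
            wins[(a, b)] += 0.5
            wins[(b, a)] += 0.5
        else:
            loser = b if winner == a else a
            wins[(winner, loser)] += 1.0
    return {pair: wins[pair] / n for pair, n in games.items()}


# Made-up judge outputs for two anonymous systems.
judgments = [
    ("A", "B", "A"),
    ("A", "B", "A"),
    ("A", "B", "B"),
    ("A", "B", None),
]
rates = win_rate_matrix(judgments)

# Made-up absolute scores on a 1-5 scale.
absolute_scores = {"A": [4, 5, 3], "B": [3, 4, 2]}
mean_scores = {system: mean(scores) for system, scores in absolute_scores.items()}

print(rates)
print(mean_scores)
```

Reporting both views is useful because absolute scores show how good each answer is on its own, while win rates show which system a judge prefers when the answers are placed side by side.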
6. Safety-Critical Applications and Advantages
KEO’s approach addresses essential requirements for safety-critical aviation maintenance:
- Systemic Insight: By structuring event and action knowledge, KEO enables proactive identification of recurring failure patterns, root causes, and long-term operational anomalies.
- Trust and Auditability: Explicit graph representations and stepwise context assembly allow for verifiability and traceability, supporting high-consequence decision support.
- Procedural Clarity: Mapping maintenance incidents to standardized entities and relationships supports coherent recommendations and reduces the likelihood of ambiguity, a critical factor in aviation safety.
- Local Operation: KEO’s design for local deployment mitigates data leakage risk and aligns with regulatory obligations in sensitive domains.
7. Prospective Research Directions
Future work for KEO includes:
- Scaling to Multimodal and Larger Data: Extension to incorporate technical schematics, sensor logs, and multimodal data, creating a more comprehensive operational intelligence framework.
- Cross-Domain Generalization: Adapting the architecture for other safety- and mission-critical settings beyond aviation, such as healthcare, power, and defense.
- Robustness and Security: Developing enhanced adversarial and anomaly detection mechanisms for both extraction (to prevent misleading facts) and inference (to guard against prompt attack).
- Human-in-the-Loop Evaluation: Augmenting LLM-based assessment with expert and operator feedback to further safeguard reasoning quality and operational relevance.
KEO exemplifies the integration of structured extraction, graph-based retrieval, and advanced LLMs for high-stakes, domain-specific sensemaking and decision support in aviation maintenance. Its construction of dataset-wide, navigable KGs, combined with KG-RAG, enables coherent global reasoning and actionable insight, establishing a blueprint for trustworthy AI deployment in safety-critical industries (Ai et al., 7 Oct 2025).