Knowledge Graph Construction and Reasoning

Updated 6 November 2025
  • Knowledge Graph Construction is the process of extracting, disambiguating, and fusing entities and relationships from diverse, noisy data into a semantically rich graph.
  • It employs advanced methods such as local extraction, incremental fusion, and multimodal integration driven by LLMs and graph neural networks.
  • KG Reasoning uses algorithmic, logical, and neural approaches to enable inference, completion, and explainable decision support for complex applications.

Knowledge graphs (KGs) formalize and structure complex, multi-relational knowledge in domains ranging from NLP and biomedicine to industrial cyber-physical systems. At the core, KG construction encompasses the processes by which entities and relationships are extracted, disambiguated, and fused from heterogeneous, often noisy sources into a coherent, semantically rich graph. KG reasoning spans algorithmic, logical, and neural methodologies that exploit this structure for inference, completion, and explainable decision support. While classical pipelines relied heavily on hand-crafted ontologies and rule-based or supervised extraction, recent advancements leverage LLMs, hybrid retrieval, scalable graph mining, and multimodal fusion to dramatically increase coverage, adaptability, and reasoning depth. This article reviews foundational concepts, automated construction paradigms, schema and fusion strategies, recent trends in LLM-empowered construction, and the state-of-the-art in KG reasoning.

1. Foundations and Motivations for Knowledge Graphs

Knowledge graphs are defined as collections of nodes (entities) and edges (relations or properties), typically represented as ordered triples (h, r, t), with h, t ∈ E (entities) and r ∈ R (relations or predicates) (Mohamed et al., 17 Dec 2024, Hur et al., 2021). The formalism supports flexible extensions such as attributes, multimodal objects, and higher-arity relations.
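The triple formalism can be sketched directly in code. A minimal representation (the entities and relations below are toy examples, not drawn from the cited papers):

```python
from typing import NamedTuple

class Triple(NamedTuple):
    """An ordered (head, relation, tail) triple."""
    h: str
    r: str
    t: str

# Toy KG: a set of triples, with E and R derived from it.
kg = {
    Triple("aspirin", "treats", "headache"),
    Triple("aspirin", "instance_of", "drug"),
    Triple("headache", "instance_of", "symptom"),
}

entities = {x for tr in kg for x in (tr.h, tr.t)}   # E
relations = {tr.r for tr in kg}                      # R

def tails(kg, head):
    """All (relation, tail) pairs reachable from a given head entity."""
    return {(tr.r, tr.t) for tr in kg if tr.h == head}
```

Extensions mentioned above (attributes, multimodal objects, higher-arity relations) would generalize the `Triple` type rather than the set-based storage.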

The key motivations for constructing KGs include:

  • Unification of heterogeneous information sources: Integration from structured, semi-structured, and unstructured domains, including tabular, text, and multimedia content (Mohamed et al., 17 Dec 2024).
  • Scalable, explainable reasoning and inference: Deductive, inductive, and probabilistic mechanisms for knowledge base completion, question answering, recommendation, and discovery (Hur et al., 2021, Mohamed et al., 17 Dec 2024).
  • Semantic enrichment and retrieval: Enabling semantic search, dynamic analytics, and cross-source explanations (Choudhury et al., 2016, Purohit et al., 29 Oct 2024).

Construction pipelines operate along two main axes: ontology-constrained approaches, with predefined classes and relationships, and open-world extraction, which imposes no fixed schema.

2. Methodologies for Automated Knowledge Graph Construction

2.1 Local Extraction and Early Pipelines

Initial KG construction leveraged named entity recognition (NER) and relation extraction (RE) applied to sentences or document fragments, using pattern-based, statistical, or supervised neural architectures. For example, biomedical KG construction achieved high NER and RE F1 scores through fine-tuned BioClinicalBERT+CRF models, supporting graph-based QA and analytics with domain-specific schemas (Harnoune et al., 2023).
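A heavily simplified sketch of the pattern-based end of this spectrum: hand-written lexical patterns standing in for learned NER+RE models (real pipelines such as the fine-tuned BioClinicalBERT+CRF system learn entity boundaries and relation types rather than matching templates):

```python
import re

# Toy pattern-based relation extraction. Patterns and relation names are
# illustrative only; supervised neural extractors replace this step in
# modern pipelines.
PATTERNS = [
    (re.compile(r"(\w+) treats (\w+)"), "treats"),
    (re.compile(r"(\w+) causes (\w+)"), "causes"),
]

def extract_triples(sentence):
    """Return (head, relation, tail) triples matched in one sentence."""
    triples = []
    for pattern, relation in PATTERNS:
        for head, tail in pattern.findall(sentence):
            triples.append((head, relation, tail))
    return triples
```

Applied sentence-by-sentence, this illustrates why local extraction fragments knowledge: coreference and cross-sentence relations are invisible to it, motivating the global methods in 2.2.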

2.2 Global and Incremental KG Construction

Recent frameworks address fragmentation and duplication by moving beyond local sentence-level extraction:

  • Incremental, topic-independent methods: iText2KG incrementally extracts, deduplicates, and fuses entities and relations with zero-shot LLM-driven extraction guided by flexible, schema-less blueprints, incrementally maintaining a globally unique set of entities and relations and avoiding post-processing (Lairgi et al., 5 Sep 2024).
  • Document-level retrieval-augmented generation: RAKG extracts “pre-entities” from text chunks, uses these to drive RAG-style retrieval from the entire document, then fuses contextual relations, solving long-context forgetting and coreference at scale (Zhang et al., 14 Apr 2025). Evaluation is performed by comparing against ideal, manually constructed KGs, and LLMs are used for precision filtering.
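The incremental deduplication idea behind iText2KG-style pipelines can be sketched as embedding-similarity merging; the class, threshold, and toy 2-d embeddings below are assumptions for illustration, not the published method:

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

class IncrementalEntitySet:
    """Maintains a globally unique entity set: a new mention is merged into
    an existing entity when embedding similarity exceeds a threshold,
    otherwise it becomes a new canonical entity."""
    def __init__(self, threshold=0.9):
        self.threshold = threshold
        self.entities = {}  # canonical name -> embedding

    def add(self, name, embedding):
        for canon, emb in self.entities.items():
            if cosine(embedding, emb) >= self.threshold:
                return canon          # merged into an existing entity
        self.entities[name] = embedding
        return name                   # registered as a new entity
```

Because merging happens at insertion time, no global post-processing pass is needed, which is the property the incremental frameworks above emphasize.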

2.3 Dynamic and Multisource Construction

NOUS and similar frameworks demonstrate distributed, incremental fusion of curated KGs (e.g., Freebase) with extractions from streaming or batch unstructured text. Triple extraction (OpenIE), entity disambiguation (context-sensitive, e.g., AIDA), and mapping via embedding models enable both real-time updates and holistic provenance tracking (Choudhury et al., 2016).

2.4 Fusion and Disambiguation

Entity alignment, merging, conflict resolution, and schema reconciliation are critical for producing coherent graphs:

  • Fusion modules (Graphusion): LLM-based downstream modules merge semantically equivalent nodes, resolve conflicted relations (prompted with explicit fusion logic and background context), and propose novel triplets discovered via zero-shot LLM inference (Yang et al., 15 Jul 2024).
  • Instance- and schema-level alignment: Recent surveys document transition from rule-driven to semantics-oriented LLM/embedding-based canonicalization, supporting both data- and ontology-driven fusion scenarios (Bian, 23 Oct 2025).
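The fusion logic described above (node merging plus conflict resolution) can be sketched with a plain alias map and confidence scores; in Graphusion-style systems an LLM supplies both the equivalences and the resolution, so everything here is a hand-rolled stand-in:

```python
def fuse_triples(extractions, alias_map):
    """Merge semantically equivalent nodes via an alias map, then resolve
    conflicting relations for the same (head, tail) pair by keeping the
    highest-confidence one.

    extractions: list of (head, relation, tail, confidence) tuples.
    alias_map:   mention -> canonical entity name.
    """
    best = {}  # (head, tail) -> (relation, confidence)
    for h, r, t, conf in extractions:
        h, t = alias_map.get(h, h), alias_map.get(t, t)
        key = (h, t)
        if key not in best or conf > best[key][1]:
            best[key] = (r, conf)
    return {(h, r, t) for (h, t), (r, _) in best.items()}
```
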

2.5 Multimodal and Domain-centric KG Construction

VaLiK demonstrates end-to-end, zero-shot, text-free multimodal KG construction—cascading vision–LLMs (VLMs) to produce entity-aligned, cross-modal triplets, using cross-modal similarity filters to maintain semantic consistency and storage efficiency (Liu et al., 17 Mar 2025). Domain-centric pipelines integrate domain-specific thesauri, term embeddings, and ontology mapping to support cyber-physical systems and engineering domains, outperforming generic tools like GraphGPT or REBEL (Wawrzik et al., 30 Sep 2024).
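The cross-modal similarity filtering step can be sketched as a threshold on embedding agreement between the image side and the text side of a candidate triplet; the embeddings, threshold, and triplet shape below are illustrative assumptions, not VaLiK's actual interface:

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def filter_cross_modal(triplets, image_emb, text_emb, threshold=0.3):
    """Keep an (image_id, relation, entity) triplet only when the two
    modality embeddings agree; a stand-in for a CLIP-style filter."""
    kept = []
    for img_id, rel, ent in triplets:
        if cosine(image_emb[img_id], text_emb[ent]) >= threshold:
            kept.append((img_id, rel, ent))
    return kept
```
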

| Method     | Zero-shot? | Fusion Logic  | Incremental? | Multimodal? |
|------------|------------|---------------|--------------|-------------|
| Graphusion | Yes        | LLM-prompted  | No           | No          |
| iText2KG   | Yes        | Embedding+LLM | Yes          | No          |
| RAKG       | Yes        | LLM+RAG       | Yes          | No          |
| VaLiK      | Yes        | VLM+CLIP      | Yes          | Yes         |

3. LLM-Empowered Pipelines: Schema-Based and Schema-Free Paradigms

The classical three-layered KG pipeline (ontology engineering, extraction, fusion) is now complemented and in some cases superseded by LLM-driven approaches (Bian, 23 Oct 2025):

  • Schema-based (top-down) methods: LLMs assist in translating domain requirements in natural language into formal ontologies, with frameworks such as Ontogenia and CQbyCQ emphasizing normalization and expert-in-the-loop validation. Extraction and fusion are guided by statically or dynamically learned schema constraints.
  • Schema-free (bottom-up) methods: LLMs are prompted to extract open triples, define concepts and relations, and consolidate knowledge on the fly, using chain-of-thought prompting, conversational extraction (e.g., ChatIE), and staged modular pipelines. EDC and AutoSchemaKG unify schema-based and -free elements.
  • Instance-level vs. schema-level fusion: LLMs are used for contextual reasoning, multiple-choice based alignment, or hierarchical multi-stage pipelines (COMEM, Graphusion).
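The schema-based side of this split can be illustrated with a minimal domain/range check over extracted triples; the schema and type assignments below are toy assumptions, not from any of the cited frameworks:

```python
# Toy ontology-guided validation: each relation constrains the types of
# its head (domain) and tail (range) entities.
SCHEMA = {"treats": ("Drug", "Disease"), "causes": ("Disease", "Symptom")}
TYPES = {"aspirin": "Drug", "migraine": "Disease", "nausea": "Symptom"}

def validate(triple, schema=SCHEMA, types=TYPES):
    """Accept a triple only if its relation is known and its head/tail
    entity types match the relation's declared domain and range."""
    h, r, t = triple
    if r not in schema:
        return False  # unknown relation: rejected under the closed schema
    domain, range_ = schema[r]
    return types.get(h) == domain and types.get(t) == range_
```

A schema-free pipeline would instead accept the open triple and induce the schema afterward, which is exactly the trade-off (flexibility vs. coherence) the hybrid systems above try to balance.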

Limitations include semantic drift, schema evolution challenges, and the computational cost of LLM-based fusion at scale.

4. Benchmarking, Evaluation Frameworks, and Reasoning Tasks

4.1 Benchmarking Construction and Reasoning

Emergent benchmarks focus on global, system-level KG construction and reasoning:

  • TutorQA: A 1,200-QA set assessing six tasks (relation judgment, prerequisite prediction, path searching, subgraph completion, concept similarity, and open-ended project ideas) over constructed KGs, employing both binary accuracies and semantic similarity scores between embeddings (Yang et al., 15 Jul 2024).
  • WikiCausal: Defines entity- and instance-level recall and LLM-based precision metrics on causal KGs extracted from Wikipedia, leveraging Wikidata for gold-standard alignment and LLM instruction prompts for scalable precision assessment (Hassanzadeh, 31 Aug 2024).
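The entity-level recall idea used by such benchmarks reduces to a set comparison against a gold KG; the normalization here (lowercased exact match) is a simplifying assumption, whereas the benchmarks above use richer alignment to Wikidata or manual gold graphs:

```python
def entity_recall(predicted, gold):
    """Entity-level recall of a constructed KG against a gold KG:
    the fraction of gold entities recovered, after lowercasing.

    Both arguments are iterables of (head, relation, tail) triples.
    """
    pred_entities = {x.lower() for h, _, t in predicted for x in (h, t)}
    gold_entities = {x.lower() for h, _, t in gold for x in (h, t)}
    if not gold_entities:
        return 1.0
    return len(pred_entities & gold_entities) / len(gold_entities)
```
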

4.2 Reasoning Mechanisms

  • Deductive and probabilistic frameworks: Rule mining, Markov Logic Networks (MLN), Probabilistic Soft Logic (PSL), and weighted MaxSat are extensively surveyed for inferential reasoning, consistency checking, and contradiction handling (Hur et al., 2021).
  • Embeddings and neural reasoning: Approaches such as TransE encode entities and relations in a shared vector space such that h + r ≈ t for true triples (h, r, t), with extensions for textual co-training (DKRL) and multimodal integration.
  • Graph neural networks (GNNs): Used for path-based and multi-hop reasoning, including document-aware GNNs in G-reasoner and semantic reasoning via GCNs over AMR-based semantic graphs (Luo et al., 29 Sep 2025, Xu et al., 2021).
  • Explainability and provenance: Graph paths, AMR subgraphs, and witness terms (in dependently typed KGs) enable explicit, auditable reasoning trails (Lai et al., 2020, Xu et al., 2021).
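The TransE scoring idea in the list above can be made concrete in a few lines; the 3-d embeddings are hand-picked toy values chosen so the true triple scores well, whereas real systems learn them by minimizing a margin loss:

```python
import math

def transe_score(h, r, t):
    """TransE plausibility score: negative L2 distance ||h + r - t||.
    Larger (closer to zero) means a more plausible triple."""
    return -math.sqrt(sum((hi + ri - ti) ** 2 for hi, ri, ti in zip(h, r, t)))

# Illustrative embeddings where "paris + capital_of ≈ france" holds.
emb = {
    "paris":      [0.1, 0.2, 0.0],
    "france":     [0.6, 0.2, 0.5],
    "berlin":     [0.9, 0.9, 0.9],
    "capital_of": [0.5, 0.0, 0.5],
}

good = transe_score(emb["paris"], emb["capital_of"], emb["france"])
bad = transe_score(emb["berlin"], emb["capital_of"], emb["france"])
```

Ranking candidate tails by this score is the basis of the KG completion task discussed throughout this section.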

4.3 Graph-Augmented LLM Reasoning

Modern paradigms "ground" LLM reasoning at inference time by anchoring each step in explicit KG search or retrieval:

  • Agentic and automated stepwise retrieval: Each CoT/ToT/GoT reasoning step is paired with KG action or expansion, with answer chains directly referencing subgraphs or nodes, achieving up to 54.7% performance improvements over ungrounded baselines (Amayuelas et al., 18 Feb 2025).
  • Process-oriented mathematical KGs: KG-RAR constructs stepwise procedural KGs, iteratively refining retrieval and reasoning steps with frozen LLMs and universal, training-free reward models, substantially enhancing small LLMs’ math reasoning accuracy (Wu et al., 3 Mar 2025).
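The grounding loop common to these systems (expand the KG one hop per reasoning step, keep the evidence chain) can be sketched as a breadth-first traversal; the graph shape and traversal policy are illustrative assumptions, since the actual systems let an LLM choose each expansion:

```python
def grounded_chain(kg, start, target, max_steps=3):
    """Expand one hop per step from `start`, returning the chain of
    evidence triples if `target` is reached within `max_steps`, else None.

    kg: dict mapping node -> list of (relation, neighbour) pairs.
    """
    frontier = [(start, [])]
    for _ in range(max_steps):
        next_frontier = []
        for node, chain in frontier:
            for rel, nbr in kg.get(node, []):
                step_chain = chain + [(node, rel, nbr)]
                if nbr == target:
                    return step_chain  # answer grounded in explicit triples
                next_frontier.append((nbr, step_chain))
        frontier = next_frontier
    return None
```

The returned chain is exactly the kind of subgraph-referencing answer trail the agentic approaches above report.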

5. System Architectures and Scaling Considerations

Efficient, scalable KG construction and querying at enterprise and open-domain scale is achieved through several architectural choices:

  • Lightweight, dependency parsing-based extraction: Eliminates LLM bottlenecks, retaining 94% of LLM-KG performance at orders-of-magnitude lower cost/latency (Min et al., 4 Jul 2025).
  • Distributed, streaming, and Spark-based systems: NOUS and similar frameworks employ streaming graph mining and distributed entity disambiguation for dynamic, high-velocity domains (Choudhury et al., 2016).
  • Hybrid retrieval and graph expansion: Efficient one-hop traversal, hybrid semantic and graph retrieval, and reciprocal rank fusion optimize both recall and precision in subgraph selection (Purohit et al., 29 Oct 2024, Min et al., 4 Jul 2025).
  • Foundation models for graphs: G-reasoner’s GFM integrates query-dependent GNNs (up to 2B parameters, mixed-precision, distributed message passing) with LLM reasoning prompts, achieving superior retrieval, answer accuracy, and explanation quality (Luo et al., 29 Sep 2025).
| System     | Extraction    | KG Storage  | Retrieval/Query        | Scaling     |
|------------|---------------|-------------|------------------------|-------------|
| Graphusion | LLM-CoT/RAG   | Triple set  | KG reasoning + TutorQA | LLM cost    |
| iText2KG   | LLM zero-shot | Neo4j       | Classical/traversal    | Linear      |
| NOUS       | OpenIE/Spark  | YAGO+stream | Path coherence/LDA     | Distributed |
| GraphAide  | LLM+ontology  | Neo4j       | Vector+subgraph (RAG)  | Modular     |
| RAKG       | LLM+RAG       | Custom      | RAG evaluation         | N/A         |
| G-reasoner | QuadGraph+GFM | QuadGraph   | GFM+LLM                | Multi-GPU   |
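The reciprocal rank fusion step used to combine semantic and graph retrieval has a standard formula, score(d) = Σ 1/(k + rank(d)) across the input rankings; a minimal sketch (the k = 60 default is the commonly used constant, and the document lists are toy inputs):

```python
def rrf(rankings, k=60):
    """Reciprocal rank fusion over multiple ranked lists, e.g. one from
    vector retrieval and one from graph traversal. Returns documents
    sorted by fused score, best first."""
    scores = {}
    for ranking in rankings:
        for rank, doc in enumerate(ranking, start=1):
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)
```

Because RRF only uses ranks, not raw scores, it fuses retrievers with incomparable scoring scales, which is why it suits the hybrid semantic-plus-graph setting described above.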

6. Challenges and Open Frontiers

Beyond classical issues of correctness, scalability, and semantic drift (Hur et al., 2021, Bian, 23 Oct 2025), several frontiers merit close attention:

  • Multimodal and cross-domain KG fusion: Zero-shot MMKG construction pipelines (e.g., VaLiK) and custom, ontology-aware frameworks demonstrate extensibility but require advances in cross-modal alignment and robust noise filtering (Liu et al., 17 Mar 2025, Wawrzik et al., 30 Sep 2024).
  • Dynamic schema evolution: Hybrid schema-based/schema-free systems (AutoSchemaKG, AdaKGC) enable adaptation to new data but risk incoherence or completeness loss (Bian, 23 Oct 2025).
  • Explainable, agentic reasoning: LLMs as both constructors and reasoners over KGs blur the boundary between extraction and inference, motivating end-to-end, auditable, and adaptive reasoning loops.
  • Evaluation and benchmarking: There is a shift toward large-scale, open, modular benchmarks (WikiCausal, TutorQA, MINE) assessing both construction and reasoning metrics in standardized ways (Yang et al., 15 Jul 2024, Hassanzadeh, 31 Aug 2024, Zhang et al., 14 Apr 2025).
  • Scalability and cost: Lightweight, industrial NLP (SpaCy, dependency parsing) and parallel GNNs (G-reasoner) enable real-world deployment at scale, crucial for enterprise and scientific applications (Min et al., 4 Jul 2025, Luo et al., 29 Sep 2025).
  • Full automation vs. human-in-the-loop: Automated pipelines have become dominant, but expert validation, schema checkpointing, and explainability measures remain critical for domain-sensitive or safety-critical contexts (Harnoune et al., 2023, Bian, 23 Oct 2025).

7. Outlook and Community Directions

Contemporary research trajectories emphasize persistent, agentic KG memory for LLMs, KG-augmented multimodal reasoning, continual knowledge evolution, and closed feedback cycles between construction and inference (Bian, 23 Oct 2025). Standardized, open-source corpora and leaderboards (WikiCausal, TutorQA, Graphusion, RAKG) underpin systematic benchmarking and facilitate reproducible progress. While LLMs have advanced schema induction, instance fusion, and reasoning, scalability, efficiency, explainability, and compositional hybrid methods remain open research frontiers.

A plausible implication is that future systems will increasingly treat KGs as both a construction product and unified cognitive substrate—integral to explainable, dynamic, and cross-modal AI systems in academic, industrial, and scientific domains.
