CORE-KG: An LLM-Driven Knowledge Graph Construction Framework for Human Smuggling Networks (2506.21607v1)

Published 20 Jun 2025 in cs.CL, cs.AI, and cs.LG

Abstract: Human smuggling networks are increasingly adaptive and difficult to analyze. Legal case documents offer valuable insights but are unstructured, lexically dense, and filled with ambiguous or shifting references, posing challenges for automated knowledge graph (KG) construction. Existing KG methods often rely on static templates and lack coreference resolution, while recent LLM-based approaches frequently produce noisy, fragmented graphs due to hallucinations and duplicate nodes caused by a lack of guided extraction. We propose CORE-KG, a modular framework for building interpretable KGs from legal texts. It uses a two-step pipeline: (1) type-aware coreference resolution via sequential, structured LLM prompts, and (2) entity and relationship extraction using domain-guided instructions, built on an adapted GraphRAG framework. CORE-KG reduces node duplication by 33.28% and legal noise by 38.37% compared to a GraphRAG-based baseline, resulting in cleaner and more coherent graph structures. These improvements make CORE-KG a strong foundation for analyzing complex criminal networks.

Summary

The paper introduces CORE-KG, a modular LLM-based framework that significantly reduces node duplication and legal noise in knowledge graphs from human smuggling cases.
It employs type-aware coreference resolution and structured domain prompts to resolve aliasing issues and improve graph coherence.
Experimental results show that CORE-KG reduces node duplication by 33.28% and legal noise by 38.37% relative to a GraphRAG-based baseline.

This paper introduces CORE-KG, a modular LLM-based framework designed to construct knowledge graphs (KGs) from legal case documents, specifically focusing on human smuggling networks. The framework addresses challenges such as inconsistent entity references, noisy data, and the need for interpretable graph structures. By integrating type-aware coreference resolution and domain-guided instructions, CORE-KG aims to improve the precision and coherence of KGs extracted from complex legal texts. The paper demonstrates significant improvements over a GraphRAG-based baseline in terms of node duplication and noise reduction.

Addressing Knowledge Graph Construction Challenges

The paper identifies key challenges in constructing KGs from legal texts related to human smuggling, including:

Inconsistent Entity References: Legal documents often use aliases, abbreviations, and role-based titles, complicating coreference resolution and entity normalization.
Lack of Coreference Resolution: Existing KG methods often lack coreference resolution, leading to fragmented graph representations.
Hallucinations and Noise: LLM-based approaches can produce noisy graphs due to hallucinations and misclassifications.
Node Duplication: Failure to consolidate semantically equivalent mentions results in redundant nodes.

CORE-KG addresses these challenges through a modular framework that integrates coreference resolution and structured prompts.

CORE-KG Framework Components

The CORE-KG framework consists of two primary modules:

Type-Aware Coreference Resolution: This module consolidates semantically and contextually equivalent mentions within each entity type (e.g., Person, Location, Organization). It operates sequentially, resolving one entity type at a time to minimize cross-type interference. Prompts are tailored to each entity type, incorporating specific resolution rules and few-shot examples.
Knowledge Graph Construction: This module extracts entities and relationships using structured prompts with domain-specific filtering instructions. It leverages an adapted GraphRAG framework and includes sequential type-wise extraction, in-prompt type definitions, and explicit filtering instructions to reduce noise and ambiguity.

The modular design allows for targeted optimization of each component, improving the overall quality of the constructed KGs.
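The sequential, type-by-type resolution described above can be sketched as a simple loop. This is a minimal illustration, not the authors' implementation: the prompt wording, the `resolve_coreferences` helper, and the `fake_llm` stand-in are all assumptions made for the example, and the entity types are only the three the summary mentions.

```python
from typing import Callable, List

# Entity types named as examples in the summary; the paper may use more.
ENTITY_TYPES: List[str] = ["Person", "Location", "Organization"]

# Any function that maps a prompt string to a completion string.
LLM = Callable[[str], str]


def resolve_coreferences(text: str, llm: LLM,
                         entity_types: List[str] = ENTITY_TYPES) -> str:
    """Run one resolution pass per entity type.

    Each pass receives the output of the previous pass, so only one
    entity type is in focus at a time.
    """
    for entity_type in entity_types:
        # Hypothetical prompt wording, for illustration only.
        prompt = (
            f"Resolve coreferences for entities of type {entity_type} only. "
            "Return the full text with each mention replaced by its "
            f"canonical name.\n\n{text}"
        )
        text = llm(prompt)
    return text


def fake_llm(prompt: str) -> str:
    """Stand-in for a real model: applies fixed string replacements."""
    header, body = prompt.split("\n\n", 1)
    if "type Person" in header:
        return body.replace("the defendant", "John Doe")
    if "type Location" in header:
        return body.replace("the border town", "Laredo")
    return body


resolved = resolve_coreferences(
    "the defendant drove to the border town", fake_llm
)
```

In a real pipeline `fake_llm` would be replaced by a call to an actual model, and the resolved text would then be passed to the extraction module.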

Implementation Details

Coreference Resolution Module

The coreference resolution module is designed to unify different surface forms and references to the same real-world entity. It processes the "Opinion" section of legal cases, performing coreference resolution separately for each entity type. The prompts used in this module include key components:

Persona definition to assign the LLM the role of a coreference resolution expert.
Clear task description to ensure that the LLM resolves coreferences without altering the input text.
Contextual information about the downstream use of the resolved text.
Entity-type-specific resolution rules.
Few-shot examples to show how to correctly resolve coreferences.
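One way to keep these five components explicit is to assemble the prompt from separate parts. The sketch below is an assumption about how such a prompt could be organized; the `CorefPrompt` class, its field names, and all prompt wording are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class CorefPrompt:
    """Hypothetical container for one entity type's coreference prompt."""
    entity_type: str
    rules: List[str]
    # Few-shot examples as (input text, resolved text) pairs.
    examples: List[Tuple[str, str]] = field(default_factory=list)

    def render(self, text: str) -> str:
        parts = [
            # 1. Persona definition
            "You are an expert in coreference resolution for legal text.",
            # 2. Task description, including the "do not alter" constraint
            f"Task: replace every mention of each {self.entity_type} entity "
            "with one canonical name. Do not change any other wording.",
            # 3. Context about downstream use
            "Context: the resolved text will be used to build a knowledge graph.",
            # 4. Entity-type-specific rules
            "Rules:\n" + "\n".join(f"- {rule}" for rule in self.rules),
        ]
        # 5. Few-shot examples, if any were supplied
        if self.examples:
            shots = "\n\n".join(
                f"Input: {src}\nOutput: {dst}" for src, dst in self.examples
            )
            parts.append("Examples:\n" + shots)
        parts.append("Text:\n" + text)
        return "\n\n".join(parts)


person_prompt = CorefPrompt(
    entity_type="Person",
    rules=["Prefer the full legal name over aliases and role titles."],
    examples=[("He left.", "John Doe left.")],
)
rendered = person_prompt.render("The defendant left.")
```

Keeping rules and examples as data makes it straightforward to define one prompt per entity type while sharing the persona, task, and context text.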

Knowledge Graph Construction Module

The Knowledge Graph Construction module takes the coreference-resolved "Opinion" section as input and uses a GraphRAG framework to extract entities and relationships. GraphRAG is a modular retrieval-augmented generation system that constructs knowledge graphs and leverages them for response generation. The module extracts entity-relationship triples, which are then aggregated and assembled into a graph using the NetworkX library. The module also uses a prompt that incorporates:

Sequential entity extraction, one entity type at a time, so that the model's attention is not spread across all types in a single pass.
Filtering of high-frequency irrelevant entities.
Entity type definitions to mitigate overgeneralization bias.
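The aggregation of extracted triples into a NetworkX graph might look roughly like the following. This is a sketch under stated assumptions: the `build_graph` helper, the `(source, relation, target)` triple layout, the `weight` attribute, and the post-hoc `stop_entities` filter are illustrative choices. The summary describes filtering as a prompt instruction, so the filter here is only a stand-in for that behavior.

```python
from typing import Collection, Iterable, Tuple

import networkx as nx

# Assumed triple layout: (source entity, relation label, target entity).
Triple = Tuple[str, str, str]


def build_graph(triples: Iterable[Triple],
                stop_entities: Collection[str] = ()) -> nx.MultiDiGraph:
    """Aggregate triples into a directed multigraph.

    Repeated triples increase an edge weight instead of adding
    parallel edges; triples touching a stop entity are skipped.
    """
    blocked = set(stop_entities)
    graph = nx.MultiDiGraph()
    for source, relation, target in triples:
        if source in blocked or target in blocked:
            continue  # drop generic legal entities such as "Court"
        if graph.has_edge(source, target, key=relation):
            graph[source][target][relation]["weight"] += 1
        else:
            graph.add_edge(source, target, key=relation, weight=1)
    return graph


# Toy triples, invented for illustration.
toy_triples = [
    ("John Doe", "paid", "Driver A"),
    ("John Doe", "paid", "Driver A"),
    ("Court", "ruled_on", "John Doe"),
    ("Driver A", "drove_to", "Laredo"),
]
toy_graph = build_graph(toy_triples, stop_entities={"Court"})
```

A multigraph is used so that two entities can be linked by more than one relation type without one edge overwriting another.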

Experimental Evaluation and Results

The paper evaluates CORE-KG on U.S. federal and state court cases related to human smuggling. The evaluation focuses on:

Node Duplication Reduction: CORE-KG reduces node duplication by 33.28% compared to the GraphRAG baseline.
Noise Reduction: CORE-KG reduces legal noise by 38.37% compared to the baseline.

The improvements are attributed to the type-aware coreference resolution and domain-specific prompting strategies employed by CORE-KG.
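The summary does not state how the duplication rate is defined or whether the reported percentages are relative or absolute. As a rough illustration only, the sketch below assumes the rate is the share of nodes that are redundant copies of another node's entity, and that the reduction is relative to the baseline rate. The function names and the toy data are hypothetical and do not reproduce the paper's figures.

```python
from typing import Dict


def duplication_rate(node_to_entity: Dict[str, str]) -> float:
    """Share of nodes that are redundant, given a node -> entity mapping.

    Assumed definition: (nodes - distinct entities) / nodes.
    """
    total = len(node_to_entity)
    if total == 0:
        return 0.0
    distinct = len(set(node_to_entity.values()))
    return (total - distinct) / total


def relative_reduction(baseline: float, improved: float) -> float:
    """Percentage drop of `improved` relative to `baseline`."""
    if baseline == 0:
        raise ValueError("baseline rate must be non-zero")
    return (baseline - improved) / baseline * 100


# Toy annotations, invented for illustration: node label -> real-world entity.
baseline_nodes = {"J. Doe": "doe", "John Doe": "doe",
                  "Laredo": "laredo", "Laredo, TX": "laredo"}
improved_nodes = {"J. Doe": "doe", "John Doe": "doe",
                  "Laredo": "laredo", "Driver A": "driver_a"}

baseline_rate = duplication_rate(baseline_nodes)
improved_rate = duplication_rate(improved_nodes)
reduction = relative_reduction(baseline_rate, improved_rate)
```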

Figure 1: Node duplication rate comparison between baseline and CORE-KG across 20 legal cases.

The results indicate that CORE-KG produces cleaner and more coherent graph structures, which are essential for downstream analysis of complex criminal networks.

Qualitative Analysis

Qualitative analysis further highlights the benefits of CORE-KG. For instance, in a representative case, CORE-KG identifies key actors, transit routes, and communication methods, providing structural insights into the smuggling network.

Figure 2: Knowledge graph generated by CORE-KG for a representative legal case. The graph demonstrates resolved coreference, improved coherence, reduced legal noise, and more precise entity linking compared to the baseline output.

In contrast, the baseline GraphRAG system produces a more fragmented and noisy graph, with redundant nodes and irrelevant legal boilerplate.

Figure 3: Baseline knowledge graph generated using GraphRAG for a representative legal case. The graph contains several redundant nodes, generic entities, and visually dense connections, which reduce overall clarity. Key duplicate nodes are highlighted using rectangles of the same color, indicating repeated entities that fragment the graph structure.

Implications and Future Directions

The CORE-KG framework demonstrates the potential of LLMs for constructing high-quality knowledge graphs from complex legal texts. The modular design and domain-specific prompting strategies can be adapted to other domains and tasks. Future research directions include:

Automated analysis of human smuggling networks
Group discovery
Entity role identification within groups
Temporal graph evolution
Event prediction from legal texts

Conclusion

CORE-KG offers a robust and interpretable framework for constructing knowledge graphs from legal documents related to human smuggling. The framework's modularity and the use of type-aware coreference resolution and domain-specific prompting contribute to significant improvements in graph quality. The results underscore the importance of structured prompting and domain adaptation for leveraging LLMs in knowledge graph construction.
