From Text to Network: Constructing a Knowledge Graph of Taiwan-Based China Studies Using Generative AI

Published 15 May 2025 in cs.AI and cs.CL (arXiv:2505.10093v1)

Abstract: Taiwanese China Studies (CS) has developed into a rich, interdisciplinary research field shaped by Taiwan's unique geopolitical position and long-standing academic engagement with Mainland China. This study responds to the growing need to systematically revisit and reorganize decades of Taiwan-based CS scholarship by proposing an AI-assisted approach that transforms unstructured academic texts into structured, interactive knowledge representations. We apply generative AI (GAI) techniques and large language models (LLMs) to extract and standardize entity-relation triples from 1,367 peer-reviewed CS articles published between 1996 and 2019. These triples are then visualized through a lightweight D3.js-based system, forming the foundation of a domain-specific knowledge graph and vector database for the field. This infrastructure allows users to explore conceptual nodes and semantic relationships across the corpus, revealing previously uncharted intellectual trajectories, thematic clusters, and research gaps. By decomposing textual content into graph-structured knowledge units, our system enables a paradigm shift from linear text consumption to network-based knowledge navigation. In doing so, it enhances scholarly access to CS literature while offering a scalable, data-driven alternative to traditional ontology construction. This work not only demonstrates how generative AI can augment area studies and digital humanities but also highlights its potential to support a reimagined scholarly infrastructure for regional knowledge systems.


Author

Hsuan-Lei Shao

Summary

Constructing a Knowledge Graph of Taiwan-Based China Studies Using Generative AI

This paper presents an AI-assisted method for constructing a knowledge graph with generative AI (GAI) and large language models (LLMs). Its goal is to reorganize research in Taiwan's China Studies (CS) into a structured, interactive knowledge representation built from relations extracted from a large corpus of peer-reviewed articles.

The dataset comprises 1,367 CS articles published from 1996 to 2019, from which LLMs systematically extract entities and the relationships between them. These entities and relations form the backbone of a domain-specific knowledge graph, shifting the mode of engagement from linear reading of individual texts to network-based exploration of the corpus.
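The paper does not publish its prompt or output schema. As a minimal sketch, assume the LLM is asked to return a JSON array of objects with `subject`, `relation`, and `object` keys (a hypothetical contract, not the paper's format). The parser below turns that output into typed triples and skips malformed records, since model output is not guaranteed to be well-formed. The `Triple` type, `parse_triples` function, and sample text are all illustrative.

```python
import json
from typing import NamedTuple


class Triple(NamedTuple):
    """One (subject, relation, object) fact extracted from an article."""
    subject: str
    relation: str
    obj: str


def parse_triples(llm_output: str) -> list[Triple]:
    """Parse a JSON array of {"subject", "relation", "object"} records.

    The schema is a hypothetical prompt contract. Records with missing,
    empty, or non-string fields are skipped instead of raising.
    """
    try:
        records = json.loads(llm_output)
    except json.JSONDecodeError:
        return []  # malformed model output: treat as "no triples"
    if not isinstance(records, list):
        return []
    triples = []
    for rec in records:
        if not isinstance(rec, dict):
            continue
        parts = []
        for key in ("subject", "relation", "object"):
            value = rec.get(key)
            parts.append(value.strip() if isinstance(value, str) else "")
        if all(parts):  # keep only fully specified triples
            triples.append(Triple(*parts))
    return triples


# Invented toy output; the second record has an empty object and is dropped.
raw = """[
  {"subject": "Cross-Strait relations", "relation": "influenced by", "object": "economic integration"},
  {"subject": "township elections", "relation": "studied in", "object": ""}
]"""
print(parse_triples(raw))
```

Skipping rather than raising keeps a long batch run over many articles from failing on a single bad response; a real pipeline would likely also log skipped records for manual review.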

The study's main methodological contributions are triple extraction and a set of preprocessing steps: semantic label merging, frequency-based consolidation, and redundancy elimination. These steps reduce inconsistencies in the extracted data, which in turn improves the accuracy and interpretability of the knowledge graph. The resulting conceptual network is rendered through a D3.js-based visualization system that lets users explore it interactively.
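The three preprocessing steps can be illustrated on toy data. This is a sketch under stated assumptions, not the paper's implementation: label merging is modeled as a hypothetical hand-written alias table (`ALIASES`), frequency-based consolidation as a `min_count` threshold on how often a normalized triple occurs, and redundancy elimination as collapsing duplicate triples into one weighted edge. The output uses the `nodes`/`links` JSON shape commonly passed to D3's force layout, where `d3.forceLink().id(d => d.id)` matches links to nodes by name.

```python
import json
import re
from collections import Counter

# Hypothetical alias table standing in for semantic label merging; the paper
# does not specify this mechanism (an LLM or embedding step could build it).
ALIASES = {
    "prc": "people's republic of china",
    "mainland china": "people's republic of china",
    "ccp": "chinese communist party",
}


def clean(label: str) -> str:
    """Lowercase and collapse runs of whitespace."""
    return re.sub(r"\s+", " ", label.strip().lower())


def canonical(entity: str) -> str:
    """Map an entity label to its canonical form via the alias table."""
    key = clean(entity)
    return ALIASES.get(key, key)


def consolidate(triples, min_count=1):
    """Merge labels, drop self-loops and duplicates, keep frequent triples.

    Returns {(subject, relation, object): count} for every distinct triple
    seen at least `min_count` times after normalization.
    """
    counts = Counter()
    for subj, rel, obj in triples:
        s, r, o = canonical(subj), clean(rel), canonical(obj)
        if s != o:  # merging aliases can turn a triple into a self-loop
            counts[(s, r, o)] += 1
    return {t: c for t, c in counts.items() if c >= min_count}


def to_d3_json(weighted):
    """Serialize as {nodes, links}, usable with d3.forceLink().id(d => d.id)."""
    names = sorted({n for (s, _, o) in weighted for n in (s, o)})
    links = [
        {"source": s, "target": o, "label": r, "weight": c}
        for (s, r, o), c in sorted(weighted.items())
    ]
    return json.dumps({"nodes": [{"id": n} for n in names], "links": links}, indent=2)


# Invented sample: two spellings of the same fact, plus an alias self-loop.
sample = [
    ("PRC", "governs", "Guangdong"),
    ("Mainland  China", "governs", "Guangdong"),
    ("CCP", "leads", "PRC"),
    ("PRC", "is", "Mainland China"),
]
print(to_d3_json(consolidate(sample)))
```

Raising `min_count` to 2 keeps only the `governs` edge, which the sample asserts twice once `PRC` and `Mainland  China` merge. Lowercasing every label is a simplifying choice here; a real system would probably keep a display form for proper nouns.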

This shift toward graph-based knowledge navigation makes it possible to identify new intellectual trajectories, thematic clusters, and research gaps within the CS field. Instead of reading articles one at a time, researchers can traverse a web of interconnected concepts and uncover cross-disciplinary connections that would otherwise remain obscured in traditional text-based formats.
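One simple way to make "thematic clusters" concrete is to group entities into connected components of the undirected graph. This is a deliberately crude stand-in assumed for illustration; the source does not say how clusters are found, and community-detection methods such as Louvain are a more common choice for dense graphs. The function name and sample edges are hypothetical.

```python
from collections import defaultdict, deque


def thematic_clusters(edges):
    """Group entities into connected components of the undirected graph.

    Every entity reachable from another through any chain of relations
    lands in the same cluster. Clusters are returned largest first.
    """
    adj = defaultdict(set)
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    seen, clusters = set(), []
    for start in sorted(adj):  # sorted for deterministic output
        if start in seen:
            continue
        seen.add(start)
        queue, component = deque([start]), []
        while queue:  # breadth-first search from `start`
            node = queue.popleft()
            component.append(node)
            for nxt in adj[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        clusters.append(sorted(component))
    # Small, isolated clusters may be worth inspecting as possible research gaps.
    return sorted(clusters, key=len, reverse=True)


# Invented entity pairs (relation labels omitted for clustering).
edges = [
    ("cross-strait trade", "taishang investment"),
    ("taishang investment", "guangdong"),
    ("village elections", "grassroots governance"),
]
for cluster in thematic_clusters(edges):
    print(cluster)
```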

Potential applications extend beyond Taiwan-based China Studies. The methodology offers a scalable template for constructing knowledge graphs in other complex interdisciplinary domains. Using LLMs to link concepts automatically across document corpora also marks a step forward for scholarly infrastructure, enabling more intuitive interaction with large collections of texts.
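The abstract pairs the graph with a vector database. A minimal sketch of embedding-based semantic linkage, swapped in as one plausible mechanism since the source does not specify how linkage is computed: assume each entity already has an embedding vector (the three-dimensional toy vectors below are invented; real ones would come from an embedding model), and propose a link between any two entities whose cosine similarity meets a threshold. `cosine`, `link_similar`, and the threshold value are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def link_similar(embeddings, threshold=0.8):
    """Propose (a, b, score) links for entity pairs at or above `threshold`."""
    names = sorted(embeddings)
    links = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:  # each unordered pair once
            score = cosine(embeddings[a], embeddings[b])
            if score >= threshold:
                links.append((a, b, round(score, 3)))
    return links


# Invented toy embeddings: the first two labels point in similar directions.
toy = {
    "unification policy": [0.9, 0.1, 0.0],
    "reunification discourse": [0.85, 0.15, 0.05],
    "rural land reform": [0.0, 0.2, 0.95],
}
print(link_similar(toy))
```

The all-pairs loop is quadratic in the number of entities; a vector database typically replaces it with an approximate nearest-neighbor index so that similarity lookups stay fast at corpus scale.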

The study offers valuable insights into domain-specific knowledge systems, and several directions remain open: user-centered evaluations of the system and deeper integration with LLM-powered interfaces for richer semantic mapping. Applying centrality measures from social network analysis to the knowledge graph could also help identify influential thematic nodes and inform future research agendas.
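Degree centrality is the simplest of the social-network-analysis measures alluded to here: in an undirected graph with n nodes, C_D(v) = deg(v) / (n - 1). The sketch below computes it over entity pairs, counting each neighbor once even when several relations connect the same two entities and ignoring self-loops; betweenness or eigenvector centrality would be natural next steps. The function name and topic labels are invented for illustration.

```python
from collections import defaultdict


def degree_centrality(edges):
    """Normalized degree centrality C_D(v) = deg(v) / (n - 1), undirected.

    Parallel edges collapse into one neighbor; self-loops are ignored.
    """
    neighbors = defaultdict(set)
    for a, b in edges:
        if a != b:
            neighbors[a].add(b)
            neighbors[b].add(a)
    n = len(neighbors)
    if n < 2:
        return {v: 0.0 for v in neighbors}
    return {v: len(nbrs) / (n - 1) for v, nbrs in neighbors.items()}


# Invented topic graph: "state capacity" touches every other node.
edges = [
    ("state capacity", "local governance"),
    ("state capacity", "fiscal reform"),
    ("state capacity", "anti-corruption"),
    ("local governance", "fiscal reform"),
]
cent = degree_centrality(edges)
ranking = sorted(cent.items(), key=lambda kv: kv[1], reverse=True)
print(ranking[0])
```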

Overall, the paper's main contribution is a practical approach to applying AI in area studies and digital humanities, producing a richly connected scholarly infrastructure that can be extended as the field evolves.
