- The paper demonstrates that LLMs can effectively convert unstructured procedural text into ontology-compliant knowledge graphs.
- The methodology employs a two-stage process with chain-of-thought prompting to annotate and transform procedures into RDF format.
- Human evaluations indicate that LLM outputs are comparable in quality to manual extraction, though skepticism remains about their practical usefulness.
The paper "Human Evaluation of Procedural Knowledge Graph Extraction from Text with LLMs" examines the application of LLMs to extracting Procedural Knowledge (PK) from natural language text in order to build robust Knowledge Graphs (KGs). Procedural knowledge is essential in many domains and is traditionally conveyed in natural language, documented in sources such as manuals and guidelines. The paper undertakes the challenging task of translating this unstructured procedural text into structured, machine-readable KGs using LLMs.
Methodology
The core methodology leverages LLMs to interpret procedure-oriented text and automatically generate structured KGs. The authors implemented an iterative prompt engineering framework that breaks the overall task into manageable subtasks an LLM can execute reliably. The process follows a Chain-of-Thought (CoT) prompting approach applied in two distinct stages:
- Step Annotation and Description Generation (P1): The LLM, prompted to act as an information extraction expert, rephrases the procedure from unformatted text into structured annotations covering elements such as actions, direct objects, equipment, and temporal information.
- Ontology-Based Knowledge Graph Construction (P2): The LLM, prompted as an ontology expert, converts the annotations from the previous step into RDF in Turtle syntax, following a predefined ontology (a minimal sketch of this two-stage flow follows the list).
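The snippet below is a minimal sketch of how such a two-stage pipeline could be wired up with a generic chat-completion API. The prompt wording, the `gpt-4o` model name, and the example procedure are assumptions made for illustration; they are not the paper's actual prompts or configuration.

```python
# Sketch of the two-stage (P1 -> P2) prompting flow described above.
# Prompts, model choice, and the sample procedure are illustrative assumptions.
from openai import OpenAI

client = OpenAI()
MODEL = "gpt-4o"  # assumption: any capable chat model would do

P1_SYSTEM = (
    "You are an information extraction expert. Given a procedure written in "
    "plain text, list each step and annotate its action, direct object, "
    "equipment, and temporal information."
)
P2_SYSTEM = (
    "You are an ontology expert. Convert the annotated steps into RDF in "
    "Turtle syntax, modelling the procedure as a plan with ordered steps."
)

def chat(system: str, user: str) -> str:
    """One chat-completion round trip; returns the model's text reply."""
    resp = client.chat.completions.create(
        model=MODEL,
        messages=[
            {"role": "system", "content": system},
            {"role": "user", "content": user},
        ],
    )
    return resp.choices[0].message.content

def extract_procedural_kg(procedure_text: str) -> str:
    """Stage P1 annotates the steps; stage P2 turns the annotations into Turtle."""
    annotations = chat(P1_SYSTEM, procedure_text)  # P1: step annotation
    return chat(P2_SYSTEM, annotations)            # P2: KG construction

if __name__ == "__main__":
    text = "Boil water in a kettle. After five minutes, pour it over the tea bag."
    print(extract_procedural_kg(text))
```

Keeping P1 and P2 as separate calls mirrors the paper's division of the task: annotation output can be inspected on its own before any RDF is generated.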
The experiments use wikiHow articles as the dataset and draw on several existing ontologies, such as P-Plan, K-Hub, FRAPO, and the Time Ontology, to support this transformation.
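As a hedged illustration of what a P2-style Turtle output might look like against the P-Plan ontology, the following sketch parses a hand-written snippet with `rdflib`; the specific steps and triples are invented for this example and do not come from the paper.

```python
# Parse a tiny, invented Turtle snippet that models one procedure with P-Plan,
# then count the resulting triples. A malformed LLM output would fail to parse.
from rdflib import Graph

TURTLE = """
@prefix p-plan: <http://purl.org/net/p-plan#> .
@prefix ex:     <http://example.org/procedure/> .

ex:MakeTea   a p-plan:Plan .
ex:BoilWater a p-plan:Step ;
             p-plan:isStepOfPlan ex:MakeTea .
ex:PourWater a p-plan:Step ;
             p-plan:isStepOfPlan ex:MakeTea ;
             p-plan:isPrecededBy ex:BoilWater .
"""

g = Graph()
g.parse(data=TURTLE, format="turtle")
print(f"Parsed {len(g)} triples")
```

Parsing the output this way is only a syntactic sanity check; whether the triples also respect the ontology's intended semantics still requires constraint validation or human review.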
Findings
The human evaluation was conducted through a crowdsourcing campaign focused on three main dimensions: perceived quality, comparative quality, and perceived usefulness of the LLM-extracted procedural knowledge. Condensed findings are summarized below; a small sketch of how such per-dimension ratings might be aggregated follows the list.
- Perceived Quality: Participants generally agreed on the correctness and relevance of the procedural steps extracted by the LLMs, although individual ratings varied slightly.
- Comparative Quality: Participants tended to believe that their own manual extraction would differ somewhat from the algorithm's output, reflecting a common perception that humans remain better at certain interpretive or creative aspects of the task.
- Perceived Usefulness: There was some skepticism about the practical utility of the automatically extracted knowledge in real-world contexts; evaluators often rated usefulness lower than the other dimensions, suggesting a gap in understanding the end-use scenarios.
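The sketch below shows, purely hypothetically, how per-dimension crowd ratings could be aggregated; the scores are placeholders rather than the paper's data, and the 1-5 Likert scale is an assumption.

```python
# Hypothetical aggregation of crowdsourced ratings per evaluation dimension.
# Scores are invented placeholders; a 1-5 Likert scale is assumed.
import pandas as pd

ratings = pd.DataFrame({
    "dimension": ["perceived_quality", "perceived_quality",
                  "comparative_quality", "comparative_quality",
                  "perceived_usefulness", "perceived_usefulness"],
    "score": [4, 5, 3, 4, 3, 2],
})

# Mean, spread, and number of responses per dimension,
# mirroring the per-dimension reporting above.
summary = ratings.groupby("dimension")["score"].agg(["mean", "std", "count"])
print(summary)
```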
Implications and Future Work
The paper's implications point toward greater automation in knowledge graph construction, particularly in domains that rely heavily on procedural knowledge. The results suggest that LLMs hold potential as an auxiliary annotation tool, although human verification remains critical, especially for tasks demanding high accuracy and contextual understanding.
The paper acknowledges that human bias in evaluation persists, as participants were generally less forgiving of machine-generated outputs. This highlights opportunities to improve human-computer interaction paradigms and to educate users about AI's evolving capabilities. Future work could extend the evaluation to more complex procedural documents in varying formats to better gauge the robustness and flexibility of LLMs. There is also room for integrating context retrieval techniques and for fine-tuning LLMs with domain-specific knowledge to further boost performance.
In conclusion, this work contributes valuable insights to the knowledge engineering field, demonstrating that, even without a definitive ground truth for procedural tasks, LLMs perform comparably to human annotators in extracting structured procedural knowledge. This positions LLMs as promising tools for procedural knowledge translation, with further enhancements on the horizon.