Knowledge Graph Structure as Prompt: Enhancing Small LLMs for Knowledge-based Causal Discovery
The paper "Knowledge Graph Structure as Prompt: Improving Small LLMs Capabilities for Knowledge-based Causal Discovery" by Yuni Susanti and Michael Färber presents a novel approach designed to augment the performance of Small LLMs (SLMs) in knowledge-based causal discovery tasks. The authors propose a method known as "KG Structure as Prompt," which incorporates structural information from knowledge graphs (KGs) into prompt-based learning to enhance the reasoning capabilities of SLMs.
Introduction to the Task
Causal discovery aims to uncover causal relationships between variables using observational data, resulting in a causal graph where nodes represent variables and edges represent causal relationships. Traditional methods, such as covariance-based causal discovery, infer these relationships based on data values. Recent advancements in LLMs have introduced metadata-based approaches, focusing on variables' metadata rather than their data values for causal reasoning. This paper extends this concept to SLMs, defined as LLMs with fewer than 1 billion parameters.
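To make the metadata-based formulation concrete, below is a minimal sketch of how a variable pair and its textual context could be framed as a classification input; the class and function names are illustrative assumptions, not from the paper.

```python
# Minimal sketch of the metadata-based formulation: causal discovery as
# pairwise classification over variable metadata rather than data values.
# All names here are illustrative, not taken from the paper.

from dataclasses import dataclass

@dataclass
class VariablePair:
    head: str      # e.g., a gene name
    tail: str      # e.g., a disease name
    context: str   # text mentioning both variables

def to_classification_input(pair: VariablePair) -> str:
    """Frame the pair as a text input; a model then predicts whether
    a causal relation holds (1) or not (0)."""
    return f"{pair.context} Does {pair.head} cause {pair.tail}?"

example = VariablePair(
    "BRCA1", "breast cancer",
    "Mutations in BRCA1 are frequently observed in patients.",
)
print(to_classification_input(example))
```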
Methodology
The authors present a structured methodology leveraging KGs such as Wikidata and Hetionet. The innovative aspect of their approach lies in transforming KG structural information into natural language prompts that can be understood and processed by SLMs. They explore three types of KG structural information: neighbor nodes, common neighbor nodes, and metapaths. Each type of structural information offers a different dimension of relational context that can aid SLMs in causal inference.
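As an illustration (not the authors' code), the sketch below extracts the three structure types from a toy graph using networkx as a stand-in for a real KG backend such as Wikidata or Hetionet. Note that metapaths in Hetionet are typed node/edge sequences; this untyped simplification ignores that.

```python
# Toy extraction of the three KG structure types (illustrative only).

import networkx as nx

kg = nx.Graph()
kg.add_edges_from([
    ("BRCA1", "DNA repair"),
    ("BRCA1", "breast cancer"),
    ("DNA repair", "breast cancer"),
    ("TP53", "breast cancer"),
])

def neighbor_nodes(g, node):
    """Nodes directly connected to `node`."""
    return set(g.neighbors(node))

def common_neighbor_nodes(g, u, v):
    """Nodes connected to both `u` and `v`."""
    return set(g.neighbors(u)) & set(g.neighbors(v))

def metapaths(g, u, v, cutoff=3):
    """Simple paths between `u` and `v`, bounded by `cutoff` hops.
    (Real metapaths are typed; this treats them as plain paths.)"""
    return list(nx.all_simple_paths(g, u, v, cutoff=cutoff))

print(neighbor_nodes(kg, "BRCA1"))
print(common_neighbor_nodes(kg, "BRCA1", "TP53"))
print(metapaths(kg, "BRCA1", "breast cancer"))
```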
Prompt Design
The design of the prompt is integral to the success of this method. The authors integrate KG-derived context into the prompt-based learning framework: a prompt combines the input text sequence, the KG-derived graph context, and the target variable pair, together with few-shot examples and task-specific instructions. This multi-faceted prompt design enables SLMs to leverage external knowledge effectively, thereby enhancing their inference capabilities.
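A hedged sketch of how such a prompt could be assembled from the ingredients named above; the template wording and helper names are assumptions for illustration, and the paper's exact templates may differ.

```python
# Illustrative prompt assembly: input text + verbalized KG context +
# target variable pair. Template wording is assumed, not the paper's.

def verbalize_graph_context(common_neighbors: set[str]) -> str:
    """Turn a set of common neighbor nodes into a natural-language hint."""
    if not common_neighbors:
        return ""
    return "Both terms are connected to: " + ", ".join(sorted(common_neighbors)) + "."

def build_prompt(text: str, head: str, tail: str, graph_context: str) -> str:
    return (
        f"{text}\n"
        f"{graph_context}\n"
        f"Question: does {head} cause {tail}? The answer is [MASK]."
    )

prompt = build_prompt(
    "Mutations in BRCA1 are frequently observed in breast cancer patients.",
    "BRCA1", "breast cancer",
    verbalize_graph_context({"DNA repair", "tumor suppression"}),
)
print(prompt)
```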
Experimental Framework
The paper evaluates the proposed method on three biomedical datasets (GENEC, DDI, COMAGC) and an open-domain dataset (SemEval-2010 Task 8). The experiments compare the proposed approach with several baselines, including traditional fine-tuning, prompt tuning without graph context, and in-context learning (ICL) with a much larger model (GPT-3.5-turbo). The evaluation is conducted under few-shot settings using precision, recall, and F1 score.
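For reference, the reported metrics can be computed with scikit-learn as below; the labels and predictions here are dummy placeholders, not results from the paper.

```python
# Computing precision, recall, and F1 for pairwise causal predictions.

from sklearn.metrics import precision_recall_fscore_support

y_true = [1, 0, 1, 1, 0]   # gold causal labels for variable pairs (dummy)
y_pred = [1, 0, 1, 0, 0]   # model predictions (dummy)

precision, recall, f1, _ = precision_recall_fscore_support(
    y_true, y_pred, average="binary"
)
print(f"P={precision:.2f} R={recall:.2f} F1={f1:.2f}")
```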
Results
The experimental results are compelling, showcasing the effectiveness of KG Structure as Prompt:
- Performance Improvement: The proposed method consistently outperformed baselines without graph context, achieving up to a 15.1-point increase in F1 score on the biomedical datasets and a 6.8-point improvement on the open-domain dataset.
- Comparison with Full Training: Even with limited training samples, the proposed approach achieved performance close to, and sometimes surpassing, models trained on full datasets.
- SLMs vs. LLMs: The proposed approach demonstrated that SLMs, when combined with prompt-based learning and KGs, can surpass larger LLMs like GPT-3.5-turbo in causal discovery tasks.
Discussion
The paper presents several insightful findings:
- Structural Information: Metapaths generally provided the best performance among the different types of KG structures, although the effectiveness varied with the dataset's characteristics.
- Model Architecture: Models with a masked language model (MLM) architecture typically performed best on these classification tasks, followed by sequence-to-sequence (Seq2SeqLM) and causal language model (CLM) architectures; a minimal sketch of MLM-style prompting follows this list.
- KG Selection: Domain-specific KGs like Hetionet generally provided better results for biomedical datasets compared to general-domain KGs like Wikidata.
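As promised above, here is a minimal sketch of MLM-style prompt classification, the architecture family reported to perform best: a masked token is scored against a yes/no verbalizer. The model choice and verbalizer words are assumptions for illustration, not the paper's exact setup.

```python
# Scoring a causal question with an MLM via a yes/no verbalizer.
# Model and verbalizer are illustrative assumptions.

from transformers import pipeline

fill_mask = pipeline("fill-mask", model="bert-base-uncased")

prompt = (
    "Mutations in BRCA1 are frequently observed in breast cancer patients. "
    "Does BRCA1 cause breast cancer? [MASK]."
)

# Score only the two verbalizer tokens and pick the higher-scoring one.
results = fill_mask(prompt, targets=["yes", "no"])
best = max(results, key=lambda r: r["score"])
label = "causal" if best["token_str"].strip() == "yes" else "not causal"
print(label, best["score"])
```

Restricting the fill-mask scores to the verbalizer tokens, rather than taking the model's top prediction over the whole vocabulary, is what turns the MLM head into a classifier without any extra parameters.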
Implications and Future Work
The findings underscore the potential of integrating external knowledge from KGs to enhance the capabilities of SLMs in specialized tasks such as causal discovery. The implications are significant, suggesting that SLMs, with appropriate contextual enhancements, can achieve high performance levels traditionally associated with more resource-intensive LLMs. Future research could extend this approach to more complex causal graphs involving multiple interconnected variables, further enriching the understanding of causal relationships.
In conclusion, "KG Structure as Prompt" offers a robust and flexible framework for leveraging knowledge graphs to augment the reasoning capabilities of Small LLMs, setting a new direction for efficient and cost-effective AI models in causal inference.