
RFKG-CoT: Adaptive Graph Reasoning for KGQA

Updated 24 December 2025
  • The paper introduces RFKG-CoT, a method that mitigates LLM hallucinations in KGQA by dynamically calibrating reasoning depth via a relation-driven adaptive hop selector.
  • RFKG-CoT improves multi-hop performance by monitoring activated KG relations to determine optimal reasoning paths and evidence depth.
  • The few-shot in-context path guidance structures prompts to align LLM reasoning with serialized KG paths, yielding significant accuracy gains across benchmarks.

RFKG-CoT is a method for enhancing knowledge-aware question answering (QA) with LLMs by leveraging adaptive graph-based reasoning and structured few-shot path guidance. Designed to address limitations in parametric knowledge and the hallucination tendencies of LLMs, RFKG-CoT combines a relation-driven, adaptive hop-count selection mechanism with an in-context learning path guidance framework. This approach improves answer faithfulness and accuracy in knowledge graph QA (KGQA) settings by dynamically calibrating the depth and evidence provided to the LLM based on relation activations within the knowledge graph (KG) (Zhang et al., 17 Dec 2025).

1. Motivation and Overview

LLMs often generate unreliable or “hallucinated” answers in knowledge-intensive QA, primarily due to limited parametric knowledge and insufficient use of external symbolic resources. Prior approaches such as KG-CoT attempt to mitigate these weaknesses by integrating KG reasoning chains, yet exhibit two central limitations: (i) rigid hop-count selection based solely on question analysis, disregarding structural cues from the KG; and (ii) underutilization of reasoning paths in prompt construction, resulting in weak supervision and continued hallucination. RFKG-CoT introduces two innovations to rectify these: a relation-driven adaptive hop-count selector and a few-shot in-context learning mechanism with explicit chain-of-thought (CoT) path guidance (Zhang et al., 17 Dec 2025).

2. Relation-Driven Adaptive Hop-Count Selection

RFKG-CoT replaces the static, question-only hop selector of prior work with a mechanism that explicitly tracks which KG relations become “activated” during search and reasoning, using this information to determine the optimal reasoning depth (hop count $H$).

The KG, denoted $\mathcal{G}$, consists of $n$ entities and $m$ relation types. The topic entity state $\mathbf{e}^0$ is a one-hot vector, and the triple-to-relation index matrix $\mathbf{M} \in \{1,\dots,m\}^{n \times n}$ encodes the relation types between pairs of entities. At each reasoning step $t$:

  • A focused question embedding $\mathbf{q}^t$ is generated by attending over question tokens:

$$(h_1,\dots,h_{|q|}),\,\mathbf{q} = \mathrm{Encoder}(q); \quad \mathbf{Q}^t = f^t([\mathbf{q}, \mathbf{rel}_{\mathrm{ctx}}])$$

$$b_i^t = \mathrm{Softmax}(\mathbf{Q}^t[h_i]), \quad \mathbf{q}^t = \sum_{i=1}^{|q|} b_i^t h_i$$

  • A soft relation-activation vector $\mathbf{R}^t = \sigma(\mathrm{MLP}_{\mathrm{KG}}(\mathbf{q}^t)) \in [0,1]^m$ is computed.
  • The transition matrix $W^t$ is compiled as

$$W^t_{ij} = \begin{cases} R^t_k, & k = M_{ij} \\ 0, & \text{otherwise} \end{cases}$$

followed by entity probability propagation $\mathbf{e}^t = \mathbf{e}^{t-1} W^t$.
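The compile-and-propagate step can be sketched in a few lines of NumPy. The toy 3-entity KG, the activation values, and the convention that relation index 0 means “no edge” are illustrative assumptions, not details from the paper:

```python
import numpy as np

def propagate(e_prev, M, R_t):
    """One relation-gated propagation step: e^t = e^{t-1} W^t.

    e_prev : (n,) entity probability vector from the previous step
    M      : (n, n) relation-index matrix; 0 marks "no edge" (an
             assumption), values 1..m index relation types
    R_t    : (m+1,) soft relation activations, with R_t[0] = 0.0
    """
    W_t = R_t[M]          # compile W^t: W[i, j] = R^t[M[i, j]]
    return e_prev @ W_t   # propagate entity probability mass

# Toy KG: entity 0 -(rel 1)-> entity 1 -(rel 2)-> entity 2.
M = np.array([[0, 1, 0],
              [0, 0, 2],
              [0, 0, 0]])
R_t = np.array([0.0, 0.9, 0.4])   # hypothetical per-relation activations
e0 = np.array([1.0, 0.0, 0.0])    # one-hot topic-entity state
e1 = propagate(e0, M, R_t)        # e1 is [0.0, 0.9, 0.0]
```

Indexing $\mathbf{R}^t$ with the integer matrix $\mathbf{M}$ compiles the full transition matrix in one vectorized operation, so each step costs a single matrix-vector product.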

A global binary relation-activation mask $\mathbf{mask} \in \{0,1\}^m$ tracks all relations activated above a minimal threshold across all steps. It is computed as follows:

  • For each step $t = 1, \dots, T$, identify all $(i,j)$ pairs with $e_i^{t-1} W_{ij}^t > 0$ and record the corresponding relations $M_{ij}$.
  • Set $\mathbf{mask}[k] = 1$ for every such relation index $k$.
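This bookkeeping can be sketched as follows; the toy matrices and the convention that relation index 0 means “no edge” are illustrative assumptions, not the authors' code:

```python
import numpy as np

def update_mask(mask, e_prev, W_t, M):
    """Mark relation k active when some edge (i, j) with M[i, j] = k
    carries probability mass at this step (illustrative sketch)."""
    active = (e_prev[:, None] * W_t) > 0    # (i, j) pairs carrying mass
    for k in np.unique(M[active]):
        if k > 0:                           # index 0 = "no edge" (assumption)
            mask[k] = 1
    return mask

# Toy step: only the edge 0 -> 1 (relation type 1) is reachable.
M = np.array([[0, 1, 0], [0, 0, 2], [0, 0, 0]])
W_t = np.array([[0.0, 0.9, 0.0], [0.0, 0.0, 0.4], [0.0, 0.0, 0.0]])
mask = update_mask(np.zeros(3, dtype=int), np.array([1.0, 0.0, 0.0]), W_t, M)
# mask is now [0, 1, 0]: relation 1 activated, relation 2 not yet
```

Calling `update_mask` once per step accumulates the global mask over all $T$ steps.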

To select the adaptive hop count, the concatenation $[\mathbf{q}, \mathbf{mask}]$ is processed:

$$\mathbf{c} = \mathrm{Softmax}(\mathrm{MLP}_T([\mathbf{q}, \mathbf{mask}])); \quad H = \arg\max_t c_t$$

The final entity score is a weighted mixture of the intermediate $\mathbf{e}^t$ vectors:

$$\bar{\mathbf{e}} = \sum_{t=1}^{T} c_t \mathbf{e}^t$$

This vector is optimized with a squared $\ell_2$ loss against the gold answer vector.
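The hop-count selection and entity-score mixture can be illustrated with NumPy. The logits stand in for the output of $\mathrm{MLP}_T([\mathbf{q}, \mathbf{mask}])$, a learned component not modeled here; all numeric values are hypothetical:

```python
import numpy as np

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

def select_hops_and_mix(logits, e_steps):
    """Pick the hop count H = argmax_t c_t and form the weighted
    mixture of per-step entity vectors (illustrative sketch)."""
    c = softmax(logits)                   # hop-count distribution c
    H = int(np.argmax(c)) + 1             # hops are 1-indexed
    e_bar = sum(ct * et for ct, et in zip(c, e_steps))
    return H, e_bar

logits = np.array([0.2, 1.5, 0.1])        # stand-in MLP_T output
e_steps = [np.array([0.0, 0.9, 0.0]),     # toy e^1 .. e^3 vectors
           np.array([0.0, 0.0, 0.36]),
           np.array([0.0, 0.0, 0.0])]
H, e_bar = select_hops_and_mix(logits, e_steps)
# H == 2: the two-hop depth dominates the distribution
```

Because the mixture uses the full distribution $\mathbf{c}$ rather than only the argmax, the entity score remains differentiable with respect to the hop-selector's parameters.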

3. Few-Shot In-Context Path Guidance

Once $H$ and the high-confidence reasoning paths are identified, RFKG-CoT serializes each path in the form:

$$E_{i_0} \to \mathrm{Rel}_{i_0,i_1} \to E_{i_1} \to \cdots \to E_{i_H}$$
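Serialization itself is simple string assembly; a minimal sketch (the `->` delimiter is an assumption, as the paper's exact formatting may differ):

```python
def serialize_path(entities, relations):
    """Interleave entities and relations into E_0 -> Rel_0 -> E_1 -> ..."""
    parts = [entities[0]]
    for rel, ent in zip(relations, entities[1:]):
        parts += [rel, ent]
    return " -> ".join(parts)

path = serialize_path(["Justin Bieber", "[Entity]"], ["brother"])
# path == "Justin Bieber -> brother -> [Entity]"
```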

Paths are then embedded into a structured prompt for few-shot in-context learning. The input to the LLM consists of:

  • Three exemplars, each in standard "Question–Paths–Think–Answer" format.
  • A new question and its candidate reasoning paths, e.g.:
    Q: <natural-language question>
    Paths:
      1. Eₐ→Rₐb→E_b→...
      2. ...
    Think:
      • Step 1: Use path 1 to find candidate.
      • ...
    Answer: <single answer>
The explicit symbolic “Think:” section instructs the LLM how to traverse and synthesize path evidence, reducing the risk of ungrounded hallucination and directing its attention to KG-derived facts during generation. Each chain prompt is thus aligned with the graph-retrieved paths.
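Prompt assembly along these lines reduces to plain string formatting; in this sketch the function name and exemplar handling are hypothetical, and only the Question-Paths-Think-Answer layout follows the paper:

```python
def build_prompt(exemplars, question, paths):
    """Concatenate pre-formatted Question-Paths-Think-Answer exemplars
    with the new question and its candidate paths (illustrative sketch)."""
    path_lines = "\n".join(f"  {i}. {p}" for i, p in enumerate(paths, 1))
    query = f"Q: {question}\nPaths:\n{path_lines}\nThink:"
    return "\n\n".join(list(exemplars) + [query])

prompt = build_prompt(
    exemplars=[],                                  # three exemplars in practice
    question="Who is Justin Bieber's brother?",
    paths=["Justin Bieber -> brother -> [Entity]"],
)
# prompt ends with "Think:", leaving the LLM to produce steps and Answer
```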

4. Empirical Evaluation

Experiments evaluate RFKG-CoT on four KGQA benchmarks: WebQSP (primarily 1–2 hops), CompWebQ (multi-hop), SimpleQuestions (single-hop), and WebQuestions (open-domain). Subgraph retrieval employs 1–2 hop expansions around the topic entity.

RFKG-CoT’s performance is compared to KG-CoT and ablated variants using Llama2-7B, Llama2-13B, ChatGPT (gpt-3.5-turbo), and GPT-4. Key findings include:

  • On WebQSP with Llama2-7B, RFKG-CoT achieves 87.1% accuracy (KG-CoT baseline: 72.4%, $\Delta$ = +14.7 pp).
  • Across LLMs, RFKG-CoT yields improvements of +14.1 pp (Llama2-13B), +7.8 pp (ChatGPT), and +6.6 pp (GPT-4) on WebQSP.
  • On CompWebQ, gains are +5.5 pp (Llama2-13B), +9.8 pp (ChatGPT), and +2.8 pp (GPT-4).
  • Ablation studies (ChatGPT, WebQSP/CompWebQ): Adding only the relation mask yields 85.5%/59.8%; only few-shot guidance: 87.7%/57.8%; full RFKG-CoT: 89.9%/61.4%.

These results indicate that the adaptive hop-count selector improves path quality, particularly on multi-hop benchmarks, while structured few-shot path guidance enables more effective utilization of available KG evidence. The combination of both yields synergistic increases in answer faithfulness (Zhang et al., 17 Dec 2025).

5. Qualitative Analysis and Discussion

Manual inspection confirms that RFKG-CoT consistently suppresses hallucinations: LLM outputs cite the serialized paths with explicit, stepwise reasoning. For instance, in response to “Who is Justin Bieber’s brother?” the model traces the path “Justin Bieber → brother → [Entity], hence the answer is [Entity],” avoiding extraneous or unsupported statements.

Several limitations are observed:

  • The maximum achievable accuracy is bounded by KG subgraph coverage: only approximately 92% of gold answers are contained in the retrieved paths.
  • Single-hop gains are occasionally constrained by the lack of surface-form mapping for entity IDs (e.g., Freebase numeric codes).
  • Using more than three few-shot exemplars overloads the LLM context window, degrading answer quality.

A plausible implication is that future improvements in subgraph retrieval, ID-to-surface-form mapping, or context-efficient prompting could further enhance model performance.

6. Significance and Future Directions

RFKG-CoT establishes a principled framework for integrating symbolic KG reasoning with LLM-based QA. Its two principal contributions—a relation-driven adaptive hop-count selection mechanism and a structured, path-guided in-context learning format—demonstrate that careful orchestration of external knowledge and prompt engineering enables off-the-shelf LLMs to produce more faithful, accurate, and interpretable answers.

Possible future directions include:

  • Expanding coverage of KG subgraphs to approach oracle upper bounds.
  • Refining entity surface-form representations for improved answer verbalization.
  • Investigating dynamic prompt sizing to maximize information content within token limits.
  • Extending adaptive reasoning to broader KGQA domains with more complex relational schemas.

RFKG-CoT represents a substantial step forward in knowledge-grounded QA, offering a robust methodology for aligning LLM output with explicit, multi-hop symbolic reasoning over knowledge graphs (Zhang et al., 17 Dec 2025).

