From Local to Global: A Graph RAG Approach to Query-Focused Summarization (2404.16130v1)

Published 24 Apr 2024 in cs.CL, cs.AI, and cs.IR

Abstract: The use of retrieval-augmented generation (RAG) to retrieve relevant information from an external knowledge source enables LLMs to answer questions over private and/or previously unseen document collections. However, RAG fails on global questions directed at an entire text corpus, such as "What are the main themes in the dataset?", since this is inherently a query-focused summarization (QFS) task, rather than an explicit retrieval task. Prior QFS methods, meanwhile, fail to scale to the quantities of text indexed by typical RAG systems. To combine the strengths of these contrasting methods, we propose a Graph RAG approach to question answering over private text corpora that scales with both the generality of user questions and the quantity of source text to be indexed. Our approach uses an LLM to build a graph-based text index in two stages: first to derive an entity knowledge graph from the source documents, then to pregenerate community summaries for all groups of closely-related entities. Given a question, each community summary is used to generate a partial response, before all partial responses are again summarized in a final response to the user. For a class of global sensemaking questions over datasets in the 1 million token range, we show that Graph RAG leads to substantial improvements over a naïve RAG baseline for both the comprehensiveness and diversity of generated answers. An open-source, Python-based implementation of both global and local Graph RAG approaches is forthcoming at https://aka.ms/graphrag.

The paper introduces a novel Graph Retrieval-Augmented Generation (RAG) approach to address query-focused summarization (QFS) over large text corpora, contrasting it with traditional RAG methods that are effective for local question answering but struggle with global, corpus-level queries. The core idea is to leverage LLMs to construct a graph-based text index, pregenerate summaries for communities of closely-related entities, and then use these summaries to generate comprehensive and diverse answers to user questions. This method aims to combine the strengths of both RAG and QFS techniques, scaling effectively with both the generality of user questions and the quantity of text indexed.

The methodology involves several key steps:

  • Source Documents to Text Chunks: The initial stage partitions the input text into chunks. Chunk size is a critical parameter, trading off fewer LLM calls (with larger chunks) against the recall degradation associated with longer context windows. The paper reports that on the HotPotQA dataset, smaller chunks extracted almost twice as many entity references as larger ones.
  • Text Chunks to Element Instances: This stage focuses on extracting graph nodes and edges from each text chunk using LLM prompts. The LLM identifies entities (name, type, description) and relationships between entities (source, target, description), outputting these as delimited tuples. Few-shot examples are used to tailor the prompt to the domain of the document corpus. Multiple rounds of "gleanings" are employed to improve recall without sacrificing precision.
  • Element Instances to Element Summaries: Instance-level descriptions are consolidated by another round of LLM summarization into a single descriptive text block for each graph element (entity node, relationship edge, claim covariate). The LLM may extract the same entity under inconsistent names across chunks, but the approach is resilient to such variations: closely-related entities end up in the same communities and are summarized together in the next steps.
  • Element Summaries to Graph Communities: The graph is modeled as a homogeneous, undirected, weighted graph, with entity nodes connected by relationship edges. Edge weights represent the normalized counts of detected relationship instances. Community detection algorithms, specifically Leiden, are then used to partition the graph into communities of nodes with stronger connections to each other. Leiden is chosen for its ability to efficiently recover hierarchical community structure in large-scale graphs.
  • Graph Communities to Community Summaries: Report-like summaries of each community in the Leiden hierarchy are generated. For leaf-level communities, element summaries (nodes, edges, covariates) are prioritized and added to the LLM context window until the token limit is reached. Higher-level communities are summarized by either including all element summaries if they fit within the token limit or by substituting sub-community summaries for element summaries until the context window limit is met.
  • Community Summaries to Community Answers to Global Answer: Given a user query, community summaries are used to generate a final answer in a multi-stage process. Community summaries are shuffled, divided into chunks, and processed in parallel to generate intermediate answers. The LLM also generates a score indicating the helpfulness of each answer. Intermediate answers are then sorted by helpfulness score and used to generate the final global answer.
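
The query-time map-reduce step above can be sketched in Python. This is an illustrative sketch, not the paper's implementation; `ask_llm` and `score_llm` are hypothetical placeholders for real LLM calls (answer generation and helpfulness scoring, respectively).

```python
import random

def ask_llm(prompt: str) -> str:
    # Placeholder: a real system would call a chat-completion API here.
    return f"answer({len(prompt)} chars)"

def score_llm(prompt: str) -> int:
    # Placeholder helpfulness score in [0, 100]; a real system would
    # ask the LLM to rate how helpful its partial answer is.
    return len(prompt) % 101

def global_answer(query: str, community_summaries: list[str],
                  chunk_size: int = 3, seed: int = 0) -> str:
    rng = random.Random(seed)
    summaries = community_summaries[:]
    rng.shuffle(summaries)  # shuffle so relevant info is spread across chunks
    chunks = [summaries[i:i + chunk_size]
              for i in range(0, len(summaries), chunk_size)]
    # Map: generate one scored intermediate answer per chunk
    # (the paper does this in parallel; shown sequentially here).
    intermediate = []
    for chunk in chunks:
        prompt = query + "\n" + "\n".join(chunk)
        intermediate.append((score_llm(prompt), ask_llm(prompt)))
    # Reduce: sort by helpfulness and synthesize the final global answer.
    intermediate.sort(key=lambda t: t[0], reverse=True)
    context = "\n".join(answer for _, answer in intermediate)
    return ask_llm(query + "\n" + context)
```

In a production system the sort step would also truncate the intermediate answers to fit the final context window, dropping the least helpful ones first.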

To evaluate the Graph RAG approach, the authors used two datasets:

  1. Podcast transcripts: Conversations between Kevin Scott, Microsoft CTO, and other technology leaders. The dataset comprised 1669 text chunks of 600 tokens each, with 100-token overlaps, totaling approximately 1 million tokens.
  2. News articles: A benchmark dataset of news articles from September 2013 to December 2023, covering various categories. The dataset consisted of 3197 text chunks of 600 tokens each, with 100-token overlaps, totaling approximately 1.7 million tokens.
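
Both datasets use 600-token chunks with 100-token overlaps, so successive windows advance by 500 tokens. A minimal sketch of that chunking scheme, assuming the text has already been tokenized into a list (the tokenizer itself is not shown):

```python
def chunk_tokens(tokens: list, size: int = 600, overlap: int = 100) -> list:
    # Each window starts `size - overlap` tokens after the previous one,
    # so consecutive chunks share `overlap` tokens of context.
    step = size - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + size])
        if start + size >= len(tokens):
            break  # the final window already covers the end of the text
    return chunks
```

At roughly 1 million tokens with a 500-token step, this yields on the order of 2000 chunks, consistent with the 1669 chunks reported for the Podcast dataset.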

The evaluation used activity-centered sensemaking questions generated by an LLM based on short descriptions of the datasets. The questions were designed to assess the system's ability to provide comprehensive and diverse answers.

The evaluation compared six conditions:

  • C0: Graph RAG using root-level community summaries.
  • C1: Graph RAG using high-level community summaries.
  • C2: Graph RAG using intermediate-level community summaries.
  • C3: Graph RAG using low-level community summaries.
  • TS: Text summarization using a map-reduce approach directly on source texts.
  • SS: A naïve "semantic search" RAG approach.

The metrics used for evaluation included:

  • Comprehensiveness: How much detail the answer provides.
  • Diversity: How varied and rich the answer is.
  • Empowerment: How well the answer helps the reader understand and make informed judgments.
  • Directness: How specifically and clearly the answer addresses the question.
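
These metrics lend themselves to head-to-head comparison between systems. The sketch below illustrates one way a pairwise win rate could be computed with an LLM judge; it is a hypothetical illustration, where `judge` stands in for a real LLM call and here deterministically prefers the longer answer.

```python
def judge(question: str, metric: str, answer_a: str, answer_b: str) -> str:
    # Placeholder: a real judge would prompt an LLM with the metric's
    # definition and both answers, asking which one wins.
    return "A" if len(answer_a) >= len(answer_b) else "B"

def win_rate(questions: list, metric: str,
             system_a: dict, system_b: dict) -> float:
    # system_a / system_b map each question to that system's answer.
    wins = sum(judge(q, metric, system_a[q], system_b[q]) == "A"
               for q in questions)
    return wins / len(questions)
```

A real evaluation would also swap the A/B order and average, to control for positional bias in the judging LLM.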

The results indicated that the global approaches (C0, C1, C2, C3, TS) consistently outperformed the naïve RAG (SS) approach in comprehensiveness and diversity. Graph RAG with intermediate- and low-level community summaries (C2 and C3) performed favorably against source text summarization (TS) on these metrics, at lower token cost. Indexing produced a graph of 8564 nodes and 20691 edges for the Podcast dataset, and a graph of 15754 nodes and 19520 edges for the News dataset.

The paper compares its work to existing RAG systems, including Selfmem [cheng2024lift], GAR [mao2020generation], Iter-RetGen [shao2023enhancing], FeB4RAG [wang2024feb4rag], CAiRE-COVID [su2020caire], ITRG [feng2023retrieval], IR-CoT [trivedi2022interleaving], DSP [khattab2022demonstrate], RAPTOR [sarthi2024raptor], and others that use knowledge graphs or graph metrics in various ways. However, it highlights that none of these systems use the natural modularity of graphs to partition data for global summarization, which is a key contribution of the Graph RAG approach.

Authors (8)
  1. Darren Edge
  2. Ha Trinh
  3. Newman Cheng
  4. Joshua Bradley
  5. Alex Chao
  6. Apurva Mody
  7. Steven Truitt
  8. Jonathan Larson
Citations (135)