Papers
Topics
Authors
Recent
2000 character limit reached

HedraRAG: Coordinating LLM Generation and Database Retrieval in Heterogeneous RAG Serving

Published 12 Jul 2025 in cs.DB and cs.LG | (2507.09138v1)

Abstract: This paper addresses emerging system-level challenges in heterogeneous retrieval-augmented generation (RAG) serving, where complex multi-stage workflows and diverse request patterns complicate efficient execution. We present HedraRAG, a runtime system built on a graph-based abstraction that exposes optimization opportunities across stage-level parallelism, intra-request similarity, and inter-request skewness. These opportunities are realized through dynamic graph transformations, such as node splitting, reordering, edge addition, and dependency rewiring, applied to wavefronts of subgraphs spanning concurrent requests. The resulting execution plans are mapped onto hybrid CPU-GPU pipelines to improve resource utilization and reduce latency. Evaluations across a wide range of RAG workflows demonstrate speedups exceeding 1.5x and reaching up to 5x over existing frameworks, showcasing the effectiveness of coordinated generation and retrieval in serving environments.

Summary

We haven't generated a summary for this paper yet.

Whiteboard

Paper to Video (Beta)

Open Problems

We haven't generated a list of open problems mentioned in this paper yet.

Continue Learning

We haven't generated follow-up questions for this paper yet.

Collections

Sign up for free to add this paper to one or more collections.

Tweets

Sign up for free to view the 1 tweet with 0 likes about this paper.