Cosmos: A CXL-Based Full In-Memory System for Approximate Nearest Neighbor Search (2505.16096v1)

Published 22 May 2025 in cs.AR

Abstract: Retrieval-Augmented Generation (RAG) is crucial for improving the quality of LLMs by injecting proper contexts extracted from external sources. RAG requires high-throughput, low-latency Approximate Nearest Neighbor Search (ANNS) over billion-scale vector databases. Conventional DRAM/SSD solutions face capacity/latency limits, whereas specialized hardware or RDMA clusters lack flexibility or incur network overhead. We present Cosmos, integrating general-purpose cores within CXL memory devices for full ANNS offload and introducing rank-level parallel distance computation to maximize memory bandwidth. We also propose an adjacency-aware data placement that balances search loads across CXL devices based on inter-cluster proximity. Evaluations on SIFT1B and DEEP1B traces show that Cosmos achieves up to 6.72x higher throughput than the baseline CXL system and 2.35x over a state-of-the-art CXL-based solution, demonstrating scalability for RAG pipelines.
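
The abstract does not spell out the adjacency-aware placement algorithm, so the following is only an illustrative sketch of the general idea, not the authors' method. It assumes a cluster-based index in which a query probes several nearby clusters, so that placing neighboring clusters on different devices spreads one query's work across devices. The function name `place_clusters`, the proximity score `1 / (1 + distance)`, and the greedy strategy are all assumptions made for illustration.

```python
import math

def place_clusters(centroids, num_devices):
    """Greedy sketch: put each cluster on the device whose already-placed
    clusters are least close to it, breaking ties by device load.

    Hypothetical helper; the scoring rule is an assumption, not from the paper.
    """
    devices = [[] for _ in range(num_devices)]  # cluster ids held by each device
    assignment = {}
    for cid, centroid in enumerate(centroids):
        def cost(dev):
            # Proximity is high when nearby clusters already live on this device.
            proximity = sum(
                1.0 / (1.0 + math.dist(centroid, centroids[other]))
                for other in devices[dev]
            )
            return (proximity, len(devices[dev]))
        best = min(range(num_devices), key=cost)
        devices[best].append(cid)
        assignment[cid] = best
    return assignment

# Two tight pairs of clusters, two devices: each pair should be split up.
centroids = [(0.0, 0.0), (0.1, 0.0), (10.0, 10.0), (10.1, 10.0)]
assignment = place_clusters(centroids, num_devices=2)
```

In this toy input, each pair of adjacent clusters ends up on different devices, so a query that probes both members of a pair touches both devices rather than overloading one.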
