Hybrid-RACA: Hybrid Retrieval-Augmented Composition Assistance for Real-time Text Prediction (2308.04215v3)

Published 8 Aug 2023 in cs.CL, cs.AI, and cs.DC

Abstract: LLMs enhanced with retrieval augmentation have shown great performance in many applications. However, the computational demands of these models pose a challenge when applying them to real-time tasks, such as composition assistance. To address this, we propose Hybrid Retrieval-Augmented Composition Assistance (Hybrid-RACA), a novel system for real-time text prediction that efficiently combines a cloud-based LLM with a smaller client-side model through retrieval-augmented memory. This integration enables the client model to generate better responses, benefiting from the LLM's capabilities and cloud-based data. Meanwhile, via a novel asynchronous memory update mechanism, the client model can deliver real-time completions to user inputs without needing to wait for responses from the cloud. Our experiments on five datasets demonstrate that Hybrid-RACA offers strong performance while maintaining low latency.
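
The asynchronous memory update described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the class `HybridClient`, the stubs `fake_cloud` and `fake_small_model`, and the thread-based design are all assumptions made for illustration. The sketch only shows the general idea that the client answers from whatever memory it currently holds while a slow cloud call refreshes that memory in the background.

```python
import threading
import time


class HybridClient:
    """Toy client (hypothetical): answers from local memory, refreshes it in the background."""

    def __init__(self, cloud_fn, small_model_fn):
        self._cloud_fn = cloud_fn              # assumed slow cloud call (retrieval + LLM)
        self._small_model_fn = small_model_fn  # assumed fast client-side model
        self._memory = []                      # local copy of the retrieval-augmented memory
        self._lock = threading.Lock()
        self._pending = None                   # in-flight background update, if any

    def _update_memory(self, text):
        new_memory = self._cloud_fn(text)      # slow; runs off the request path
        with self._lock:
            self._memory = new_memory

    def request_memory_update(self, text):
        """Start a background memory refresh unless one is already running."""
        if self._pending is not None and self._pending.is_alive():
            return False
        self._pending = threading.Thread(
            target=self._update_memory, args=(text,), daemon=True
        )
        self._pending.start()
        return True

    def complete(self, text):
        """Return a completion immediately, using whatever memory is available now."""
        with self._lock:
            memory = list(self._memory)
        return self._small_model_fn(text, memory)

    def wait_for_update(self, timeout=None):
        """Block until the in-flight update finishes (used here only for the demo)."""
        if self._pending is not None:
            self._pending.join(timeout)


def fake_cloud(text):
    """Stand-in for the cloud side: pretends to be slow and returns one memory item."""
    time.sleep(0.05)
    return [f"note about: {text}"]


def fake_small_model(text, memory):
    """Stand-in for the client model: reports how many memory items it could use."""
    return f"{text} ... [memory items: {len(memory)}]"
```

In this sketch, `complete` never waits on the cloud: a call made before the first refresh finishes simply runs with an empty memory, and later calls pick up the refreshed memory once the background thread has stored it.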

Authors (6)

Menglin Xia (14 papers)
Xuchao Zhang (44 papers)
Camille Couturier (4 papers)
Guoqing Zheng (25 papers)
Saravan Rajmohan (85 papers)
Victor Ruhle (4 papers)

Citations (4)
