Hybrid-RACA: Hybrid Retrieval-Augmented Composition Assistance for Real-time Text Prediction (2308.04215v3)

Published 8 Aug 2023 in cs.CL, cs.AI, and cs.DC

Abstract: LLMs enhanced with retrieval augmentation have shown great performance in many applications. However, the computational demands of these models pose a challenge when applying them to real-time tasks, such as composition assistance. To address this, we propose Hybrid Retrieval-Augmented Composition Assistance (Hybrid-RACA), a novel system for real-time text prediction that efficiently combines a cloud-based LLM with a smaller client-side model through retrieval-augmented memory. This integration enables the client model to generate better responses, benefiting from the LLM's capabilities and cloud-based data. Meanwhile, via a novel asynchronous memory update mechanism, the client model can deliver real-time completions to user inputs without waiting for responses from the cloud. Our experiments on five datasets demonstrate that Hybrid-RACA offers strong performance while maintaining low latency.
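
The asynchronous memory update described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: `cloud_retrieve_and_summarize`, `client_model`, and `HybridClient` are hypothetical stand-ins, and the "models" are plain string functions. The sketch only shows the control flow, in which the client answers immediately from whatever memory it already holds while a background task refreshes that memory from the cloud.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def cloud_retrieve_and_summarize(text):
    """Hypothetical stand-in for the cloud-side retrieval + LLM step."""
    return [f"memory for: {text}"]


def client_model(text, memory):
    """Hypothetical stand-in for the small client-side model."""
    hint = memory[-1] if memory else "no memory yet"
    return f"<completion of '{text}' using [{hint}]>"


class HybridClient:
    """Toy client that never blocks on the cloud when completing text."""

    def __init__(self):
        self._memory = []                 # local retrieval-augmented memory
        self._lock = threading.Lock()     # guards self._memory
        self._pool = ThreadPoolExecutor(max_workers=1)

    def _refresh(self, text):
        # Runs in the background: fetch new memory entries from the cloud.
        entries = cloud_retrieve_and_summarize(text)
        with self._lock:
            self._memory.extend(entries)

    def complete(self, text):
        # Snapshot the memory available right now.
        with self._lock:
            snapshot = list(self._memory)
        # Start the cloud update without waiting for it to finish.
        future = self._pool.submit(self._refresh, text)
        # Answer immediately from the snapshot.
        return client_model(text, snapshot), future
```

In this sketch the first call to `complete` has no memory to draw on; once the returned future finishes, later calls see the entries the cloud step produced. The future is returned only so a caller (or a test) can tell when the update has landed.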

Authors (6)
  1. Menglin Xia (14 papers)
  2. Xuchao Zhang (44 papers)
  3. Camille Couturier (4 papers)
  4. Guoqing Zheng (25 papers)
  5. Saravan Rajmohan (85 papers)
  6. Victor Ruhle (4 papers)
Citations (4)