
MeMemo: On-device Retrieval Augmentation for Private and Personalized Text Generation (2407.01972v1)

Published 2 Jul 2024 in cs.IR, cs.AI, cs.HC, and cs.LG

Abstract: Retrieval-augmented text generation (RAG) addresses the common limitations of LLMs, such as hallucination, by retrieving information from an updatable external knowledge base. However, existing approaches often require dedicated backend servers for data storage and retrieval, thereby limiting their applicability in use cases that require strict data privacy, such as personal finance, education, and medicine. To address the pressing need for client-side dense retrieval, we introduce MeMemo, the first open-source JavaScript toolkit that adapts the state-of-the-art approximate nearest neighbor search technique HNSW to browser environments. Developed with modern and native Web technologies, such as IndexedDB and Web Workers, our toolkit leverages client-side hardware capabilities to enable researchers and developers to efficiently search through millions of high-dimensional vectors in the browser. MeMemo enables exciting new design and research opportunities, such as private and personalized content creation and interactive prototyping, as demonstrated in our example application RAG Playground. Reflecting on our work, we discuss the opportunities and challenges for on-device dense retrieval. MeMemo is available at https://github.com/poloclub/mememo.


Summary

  • The paper demonstrates the feasibility of on-device dense retrieval by adapting the HNSW approximate nearest neighbor technique to the browser with IndexedDB and Web Workers, enabling efficient in-browser retrieval for text generation.
  • The paper introduces RAG Playground, a no-code environment for prototyping retrieval-augmented generation pipelines that improve response accuracy while preserving user privacy.
  • The paper offers an open-source toolkit that integrates modern web and machine learning technologies to support scalable, privacy-preserving AI applications.

MeMemo: On-device Retrieval Augmentation for Private and Personalized Text Generation

MeMemo is a JavaScript toolkit for in-browser retrieval-augmented generation (RAG) that tackles the twin challenges of data privacy and client-side computation, offering a practical solution for researchers and practitioners. Authored by Zijie J. Wang and Duen Horng Chau, the paper brings on-device dense retrieval to the browser by combining hierarchical navigable small world (HNSW) graphs, modern web technologies, and a prefetching strategy to handle large vector databases entirely on the client.

Overview of the Paper

Introduction

The paper begins by outlining the motivation for retrieval-augmented text generation (RAG): mitigating common limitations of LLMs, such as hallucination. Existing RAG systems typically depend on backend servers for data storage and retrieval, which limits their applicability in privacy-sensitive domains such as personal finance, education, and medicine. MeMemo addresses this limitation by enabling dense vector retrieval entirely within the browser.

Contributions

1. Client-Side Dense Retrieval

The centerpiece of MeMemo is its adaptation of the HNSW approximate nearest neighbor technique to JavaScript, allowing efficient search through millions of high-dimensional vectors using client-side hardware. MeMemo uses IndexedDB for persistent storage and Web Workers to keep retrieval work off the main thread, so queries stay responsive even as the index grows to millions of vectors. A simplified sketch of the underlying graph search follows.
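The snippet below illustrates the greedy graph traversal at the heart of HNSW-style search. It is a minimal TypeScript sketch, not MeMemo's actual API: the in-memory maps stand in for graph data that MeMemo would persist in IndexedDB and fetch on demand.

```typescript
// Illustrative sketch of greedy best-first search over one layer of an
// HNSW-style proximity graph (the core primitive the toolkit adapts to the browser).

type NodeId = number;

interface GraphLayer {
  neighbors: Map<NodeId, NodeId[]>;   // adjacency lists of the layer
  vectors: Map<NodeId, Float32Array>; // node embeddings
}

function cosineDistance(a: Float32Array, b: Float32Array): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return 1 - dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Greedy descent: repeatedly move to the neighbor closest to the query until
// no neighbor improves the distance, returning a locally nearest node.
function greedySearchLayer(layer: GraphLayer, query: Float32Array, entry: NodeId): NodeId {
  let current = entry;
  let currentDist = cosineDistance(layer.vectors.get(current)!, query);
  let improved = true;
  while (improved) {
    improved = false;
    for (const neighbor of layer.neighbors.get(current) ?? []) {
      const d = cosineDistance(layer.vectors.get(neighbor)!, query);
      if (d < currentDist) {
        current = neighbor;
        currentDist = d;
        improved = true;
      }
    }
  }
  return current;
}
```

In the full algorithm this traversal is run from the top layer down, with the result of each layer used as the entry point for the next, and a candidate list (rather than a single node) kept at the bottom layer.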

2. RAG Playground

The toolkit's efficacy is demonstrated through an example application, RAG Playground, which provides an interactive platform for prototyping RAG applications in the browser. This no-code tool lets stakeholders with diverse technical backgrounds explore RAG design choices: users enter a query, retrieve relevant documents, augment the prompt with the retrieved context, and run an on-device LLM to validate improvements in response reliability and accuracy. The core loop is sketched below.
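The following sketch shows the retrieve-then-generate loop the Playground walks through. The helper functions are passed in as parameters and are placeholders for an in-browser embedding model, a client-side vector index, and an on-device LLM; none of the names are RAG Playground's real API.

```typescript
// Hypothetical RAG loop: embed the query, retrieve local context, augment the
// prompt, and generate an answer entirely on-device.
async function answerWithRag(
  question: string,
  embed: (text: string) => Promise<Float32Array>,             // in-browser embedding model
  search: (v: Float32Array, k: number) => Promise<string[]>,  // client-side vector index
  generate: (prompt: string) => Promise<string>               // on-device LLM
): Promise<string> {
  // 1. Embed the user query on-device.
  const queryVector = await embed(question);

  // 2. Retrieve the k most relevant documents from the local index.
  const documents = await search(queryVector, 3);

  // 3. Augment the prompt with the retrieved context.
  const prompt =
    `Answer the question using only the context below.\n\n` +
    `Context:\n${documents.join('\n---\n')}\n\nQuestion: ${question}`;

  // 4. Generate the grounded answer.
  return generate(prompt);
}
```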

3. Open-Source Accessibility

MeMemo is released as an open-source project, complete with comprehensive documentation and an example application to facilitate adoption and adaptation by researchers and developers. Its design emphasizes minimal dependencies and usability within various web development stacks (TypeScript, JavaScript, React, Svelte, and Lit).

Results and Implementations

Performance and Integration

The authors illustrate the performance implications through usage scenarios that show how MeMemo supports efficient prototyping of client-side RAG systems. For instance, building an HNSW index over one million 384-dimensional vectors is slower than on dedicated server-side systems, but subsequent queries run in real time, demonstrating practical viability.
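A rough storage estimate helps explain why construction at this scale stresses the browser. The figures below are our own back-of-envelope numbers assuming raw float32 storage, not measurements from the paper.

```typescript
// Back-of-envelope footprint for one million 384-dimensional vectors.
const numVectors = 1_000_000;
const dimensions = 384;
const bytesPerFloat32 = 4;
const rawBytes = numVectors * dimensions * bytesPerFloat32;   // 1,536,000,000 bytes
console.log(`${(rawBytes / 1024 ** 3).toFixed(2)} GiB of raw vector data`); // ≈ 1.43 GiB
```

On top of the raw vectors, the HNSW graph adds neighbor lists per node, so persisting and paging this data through IndexedDB is a nontrivial engineering constraint for in-browser indexing.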

The paper also details how MeMemo can be combined with existing web machine learning technologies for real-time applications. By pairing the retrieval index with in-browser models, such as LLMs served through WebLLM or embedding models running on ONNX-based runtimes, developers can build private and personalized content generation workflows for domains where data privacy is paramount.
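As one concrete pairing, an in-browser embedding model can produce the vectors that feed the index. The sketch below assumes the Transformers.js feature-extraction pipeline and a publicly available 384-dimensional MiniLM model; it is an illustrative combination, not a pipeline prescribed by the paper.

```typescript
import { pipeline } from '@xenova/transformers';

// Load an in-browser embedding model once (weights are downloaded and cached).
const embedder = await pipeline('feature-extraction', 'Xenova/all-MiniLM-L6-v2');

// Embed a document or query into a normalized 384-dimensional vector that can
// be inserted into, or used to query, a client-side vector index.
async function embed(text: string): Promise<Float32Array> {
  const output = await embedder(text, { pooling: 'mean', normalize: true });
  return output.data as Float32Array;
}
```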

Challenges and Future Directions

Despite its innovative approach, MeMemo's index construction is slower in the browser than in native systems because of browser computation and storage constraints. This could be improved by parallelizing insertion, for example across Web Workers, and by refining the prefetching strategy. Future work could also explore personal information management, interactive RAG prototyping, and optimized on-device retrieval for mobile and IoT devices.
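One generic mitigation pattern, built only on standard Web APIs and not drawn from MeMemo's internals, is to move bulk distance computation into a Web Worker so the main thread stays responsive during index construction:

```typescript
// Generic Web Worker offload: compute distances off the main thread.
const workerSource = `
  onmessage = (event) => {
    const { query, vectors } = event.data;
    // Cosine distance for unit-normalized vectors reduces to 1 - dot product.
    const distances = vectors.map((v) => {
      let dot = 0;
      for (let i = 0; i < v.length; i++) dot += v[i] * query[i];
      return 1 - dot;
    });
    postMessage(distances);
  };
`;

const worker = new Worker(
  URL.createObjectURL(new Blob([workerSource], { type: 'application/javascript' }))
);

worker.onmessage = (e: MessageEvent<number[]>) => console.log('distances', e.data);
worker.postMessage({
  query: [0.6, 0.8],                      // toy unit-length query vector
  vectors: [[1, 0], [0.6, 0.8], [0, 1]],  // toy candidate vectors
});
```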

Implications

MeMemo paves the way for broader adoption of privacy-preserving AI tools. It creates opportunities to extend dense retrieval to personal devices and opens new research avenues in interactive ML systems. By bringing dense retrieval to the client, it closes a significant gap on the path to scalable, private, and personalized AI applications that run directly in users' browsers.

Conclusion

MeMemo advances retrieval-augmented generation by enabling in-browser dense retrieval while addressing privacy and performance challenges head-on. Together with RAG Playground, the toolkit serves as a versatile and accessible resource for researchers and developers who want to apply on-device RAG in their applications. With its open-source availability, MeMemo is well positioned to inspire further advances in on-device AI.

For more details, MeMemo's code and usage examples are available in the GitHub repository: https://github.com/poloclub/mememo.
