RAGViz: Diagnose and Visualize Retrieval-Augmented Generation (2411.01751v1)
Abstract: Retrieval-augmented generation (RAG) combines knowledge from domain-specific sources into LLMs to ground answer generation. Current RAG systems lack customizable visibility into the context documents and the model's attentiveness towards such documents. We propose RAGViz, a RAG diagnosis tool that visualizes the attentiveness of the generated tokens to the retrieved documents. With a built-in user interface, retrieval index, and LLM backbone, RAGViz provides two main functionalities: (1) token-level and document-level attention visualization, and (2) generation comparison upon context document addition and removal. As an open-source toolkit, RAGViz can be easily hosted with a custom embedding model and a HuggingFace-supported LLM backbone. Using a hybrid ANN (Approximate Nearest Neighbor) index, a memory-efficient LLM inference tool, and a custom context snippet method, RAGViz operates efficiently with a median query time of about 5 seconds on a moderate GPU node. Our code is available at https://github.com/cxcscmu/RAGViz. A demo video of RAGViz can be found at https://youtu.be/cTAbuTu6ur4.
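To make the idea of document-level attention concrete, here is a minimal sketch of how token-level attention might be rolled up per retrieved document. It is an illustration under stated assumptions, not RAGViz's implementation: the abstract does not specify the aggregation rule, so plain summation is assumed, and the function name `document_attention`, the `doc_spans` layout, and the toy attention values are all hypothetical.

```python
from typing import Dict, List, Tuple


def document_attention(
    attn: List[List[float]],
    doc_spans: Dict[str, Tuple[int, int]],
) -> Dict[str, float]:
    """Sum attention from all generated tokens onto each document's tokens.

    attn[i][j] is the attention weight that generated token i places on
    context token j. doc_spans maps a document id to the half-open range
    [start, end) of context-token positions that the document occupies.
    Summation is an assumed aggregation rule, chosen for simplicity.
    """
    totals: Dict[str, float] = {}
    for doc_id, (start, end) in doc_spans.items():
        totals[doc_id] = sum(sum(row[start:end]) for row in attn)
    return totals


# Toy data (hypothetical): 2 generated tokens attending over 5 context tokens.
attn = [
    [0.1, 0.2, 0.3, 0.2, 0.2],
    [0.0, 0.1, 0.4, 0.4, 0.1],
]
# Two retrieved documents occupying context positions 0-1 and 2-4.
doc_spans = {"doc_a": (0, 2), "doc_b": (2, 5)}

scores = document_attention(attn, doc_spans)
```

In this toy case `doc_b` receives most of the attention mass, which is the kind of signal a document-level view would surface. Restricting `attn` to the rows for a selected subset of generated tokens would give the token-level view for that selection.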