- The paper introduces an end-to-end toolkit that automates and simplifies the RAG workflow through a modular design and a user-friendly WebUI.
- It details comprehensive modules for knowledge management, data construction, fine-tuning with SFT/DPO, and standardized evaluation across 40+ benchmarks.
- Case studies in the legal domain show significant improvements in retrieval and generation metrics, validating UltraRAG’s adaptive capabilities.
Retrieval-Augmented Generation (RAG) is a technique used to improve the performance of LLMs by providing them with external knowledge. While effective, building and deploying RAG systems can be challenging due to diverse data formats, complex component coordination, and the rapid evolution of algorithms. Existing RAG toolkits like LangChain and LlamaIndex offer modularity but often lack user-friendly interfaces, comprehensive knowledge management features, and crucial support for adapting RAG systems to specific domains or tasks, limiting their practical applicability.
UltraRAG is introduced as a modular and automated toolkit designed to address these challenges, particularly focusing on facilitating knowledge adaptation throughout the RAG workflow. It provides an end-to-end solution covering data construction, training, evaluation, and inference, making it easier for both researchers and practitioners to build and optimize RAG systems. A key feature of UltraRAG is its user-friendly WebUI, which lowers the technical barrier, allowing users to manage knowledge bases, configure models, and run experiments without extensive coding.
The toolkit is structured around two global setting modules and three core functional modules:
- Global Setting Modules:
  - Model Management: Handles the management, deployment, and use of the models a RAG pipeline needs (retrieval, reranking, generation). It supports loading local models via vLLM (2309.06180) or HuggingFace Transformers (1910.03771), as well as integrating API-based models, and provides pre-configured environments (Docker, microservices) for seamless model integration.
  - Knowledge Management: Simplifies handling external knowledge bases. Users can upload files in various formats (TXT, PDF, Markdown, JSON, CSV), configure processing parameters such as chunk size and overlap, and have the module automatically encode and index the documents with a selected embedding model (a minimal chunk-encode-index sketch appears after this list).
- Functional Modules:
  - Data Construction: Automates the generation of training and evaluation data tailored to the RAG pipeline and a specific knowledge base. It generates queries from documents and constructs datasets for the retrieval, reranking, and generation models, including mining hard negative samples (2007.00808) for retrieval (see the mining sketch after this list) and creating SFT and DPO (2305.18290) datasets for generation models. Users can also upload custom datasets and mix data sources for multi-task training.
  - Training: Supports fine-tuning the embedding and generation models on the data produced by the Data Construction module. It currently covers Supervised Fine-Tuning (SFT) and Direct Preference Optimization (DPO) with parameter-efficient techniques such as LoRA (2106.09685), with additional strategies planned.
  - Evaluation & Inference: Provides comprehensive assessment with standard retrieval and generation metrics across more than 40 benchmark datasets, and supports custom datasets through a unified data format. The inference module offers predefined RAG workflows (Vanilla RAG, KBAlign (2411.14790), VisRAG (2410.10594), Adaptive-Note (2410.08821)) and lets users build custom pipelines (a Vanilla RAG sketch appears after this list). It also supports streaming output, visualization of intermediate steps, and local deployment via Ollama for easily creating customized RAG applications.
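To make the knowledge-management step concrete, here is a minimal sketch of the chunk-encode-index pipeline that the module automates. It assumes the `sentence-transformers` and `faiss-cpu` packages; the model name, file path, and parameter values are illustrative placeholders rather than UltraRAG's actual API.

```python
# Minimal sketch of the chunk -> encode -> index step that Knowledge Management automates.
# Assumes `sentence-transformers` and `faiss-cpu`; model name, file path, and parameters are placeholders.
import faiss
from sentence_transformers import SentenceTransformer

def chunk_text(text: str, chunk_size: int = 512, overlap: int = 64) -> list[str]:
    """Split a document into overlapping character windows."""
    step = chunk_size - overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]

# Encode the chunks with a selected embedding model and build a vector index.
encoder = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")  # placeholder embedding model
chunks = chunk_text(open("corpus.txt", encoding="utf-8").read())          # placeholder corpus file
embeddings = encoder.encode(chunks, normalize_embeddings=True)

index = faiss.IndexFlatIP(embeddings.shape[1])  # inner product equals cosine on normalized vectors
index.add(embeddings)
```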
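In the same spirit, hard-negative mining for retriever training data can be approximated as below. The sketch reuses `encoder`, `index`, and `chunks` from the previous snippet and follows the general dense-retrieval idea of treating highly ranked but non-positive passages as hard negatives (2007.00808); the example query and dataset fields are hypothetical.

```python
# Sketch of hard-negative mining for retriever training data; reuses `encoder`, `index`, `chunks`.
def mine_hard_negatives(query: str, positive_idx: int, k: int = 20, n_neg: int = 4) -> list[str]:
    """Return highly ranked but non-positive chunks to serve as hard negatives."""
    q_emb = encoder.encode([query], normalize_embeddings=True)
    _, ids = index.search(q_emb, k)
    negatives = [chunks[i] for i in ids[0] if i != positive_idx]
    return negatives[:n_neg]

# A retrieval training example pairs a (typically LLM-generated) query with its source chunk
# and the mined negatives; the query text and field names here are purely illustrative.
example = {
    "query": "What does Article 12 regulate?",
    "positive": chunks[0],
    "negatives": mine_hard_negatives("What does Article 12 regulate?", positive_idx=0),
}
```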
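Finally, the predefined Vanilla RAG workflow reduces to retrieve-then-generate. The sketch below reuses the index from above and plugs in a small HuggingFace Transformers model as a stand-in generator; in UltraRAG the backend could equally be a vLLM-served or API-based model.

```python
# Sketch of the Vanilla RAG workflow: retrieve top-k chunks, stuff them into the prompt, generate.
# The generator below is a small placeholder model; swap in any local or API-based backend.
from transformers import pipeline

generator = pipeline("text-generation", model="Qwen/Qwen2.5-0.5B-Instruct")  # placeholder model

def vanilla_rag(question: str, k: int = 5) -> str:
    q_emb = encoder.encode([question], normalize_embeddings=True)
    _, ids = index.search(q_emb, k)
    context = "\n\n".join(chunks[i] for i in ids[0])
    prompt = f"Answer using the context below.\n\nContext:\n{context}\n\nQuestion: {question}\nAnswer:"
    return generator(prompt, max_new_tokens=256, return_full_text=False)[0]["generated_text"]
```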
UltraRAG pre-implements several popular RAG methods, including Vanilla RAG, RA-DIT (2310.01352), Adaptive-Note (2410.08821), VisRAG (2410.10594), KBAlign (2411.14790), and RAG-DDR (2410.13509), aiming to accelerate research by providing reproducible baselines and facilitating fair comparisons.
To demonstrate its knowledge adaptation capability, the paper presents a case study on building a legal-domain RAG system. Using UltraRAG's modules, a legal knowledge base (1,000+ books) was ingested and indexed. The embedding model (MiniCPM-Embedding-Light) was fine-tuned on 2,800 samples synthetically generated from the legal corpus via the Data Construction module. This fine-tuning improved retrieval performance (MRR@10, NDCG@10, Recall@10) on legal-domain samples, indicating better adaptation to legal terminology and document structures.
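For reference, the reported cut-off metrics can be computed per query as in the generic sketch below (binary relevance assumed for NDCG); this illustrates the metric definitions rather than UltraRAG's evaluation code.

```python
# Generic per-query implementations of MRR@k, Recall@k, and binary-relevance NDCG@k.
import math

def mrr_at_k(ranked_ids: list[int], relevant: set[int], k: int = 10) -> float:
    for rank, doc_id in enumerate(ranked_ids[:k], start=1):
        if doc_id in relevant:
            return 1.0 / rank
    return 0.0

def recall_at_k(ranked_ids: list[int], relevant: set[int], k: int = 10) -> float:
    return len(set(ranked_ids[:k]) & relevant) / max(len(relevant), 1)

def ndcg_at_k(ranked_ids: list[int], relevant: set[int], k: int = 10) -> float:
    dcg = sum(1.0 / math.log2(r + 1) for r, d in enumerate(ranked_ids[:k], start=1) if d in relevant)
    idcg = sum(1.0 / math.log2(r + 1) for r in range(1, min(len(relevant), k) + 1))
    return dcg / idcg if idcg else 0.0
```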
For generation, the MiniCPM-3-4B model was fine-tuned with two methods implemented in UltraRAG: UltraRAG-DDR (2410.13509) (using differentiable data rewards and DPO) and UltraRAG-KBAlign (2411.14790) (using SFT with combined short- and long-range annotations). Experiments on the LawBench (2309.16289) Article Prediction and Consultation tasks showed that both fine-tuned models outperformed VanillaRAG, with UltraRAG-DDR achieving a significant relative improvement in ROUGE-L on Article Prediction. For the Consultation task, the RAG Adaptation workflow (using the UltraRAG-Embedding and UltraRAG-DDR fine-tuned models) achieved the best results, surpassing both VanillaRAG and the DeepNote workflow. Case studies further illustrate that the fine-tuned models reference specific legal articles accurately, whereas the vanilla RAG setup may return less specific or incorrect information.
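As a rough illustration of how a DPO preference dataset can be derived from a downstream reward, the sketch below ranks sampled answers by ROUGE-L against a reference and keeps the best and worst as a (chosen, rejected) pair. This is a simplified stand-in for the reward construction in RAG-DDR, assuming the `rouge-score` package and the common (prompt, chosen, rejected) DPO layout.

```python
# Hypothetical construction of a DPO preference pair from sampled answers, using ROUGE-L against
# a reference answer as the reward; a simplified stand-in for RAG-DDR's data-reward procedure.
from rouge_score import rouge_scorer

scorer = rouge_scorer.RougeScorer(["rougeL"], use_stemmer=True)

def build_dpo_pair(prompt: str, candidates: list[str], reference: str) -> dict:
    """Rank candidate answers by ROUGE-L F1 and keep the best/worst as chosen/rejected."""
    ranked = sorted(candidates, key=lambda c: scorer.score(reference, c)["rougeL"].fmeasure)
    return {"prompt": prompt, "chosen": ranked[-1], "rejected": ranked[0]}
```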
In summary, UltraRAG provides a practical, end-to-end, automated solution for developing and deploying RAG systems. Its modular design, user-friendly WebUI, and focus on knowledge adaptation through automated data construction and training make it a valuable toolkit for researchers and practitioners building high-performing, domain-specific RAG applications. The legal-domain case study effectively demonstrates its capability to enhance performance through targeted knowledge adaptation.