LocalRQA: From Generating Data to Locally Training, Testing, and Deploying Retrieval-Augmented QA Systems (2403.00982v1)

Published 1 Mar 2024 in cs.CL

Abstract: Retrieval-augmented question-answering systems combine retrieval techniques with LLMs to provide answers that are more accurate and informative. Many existing toolkits allow users to quickly build such systems using off-the-shelf models, but they fall short in supporting researchers and developers who want to customize the model training, testing, and deployment process. We propose LocalRQA, an open-source toolkit that features a wide selection of model training algorithms, evaluation methods, and deployment tools curated from the latest research. As a showcase, we build QA systems using online documentation obtained from the websites of Databricks and Faire. We find that 7B models trained and deployed with LocalRQA achieve performance comparable to systems built on OpenAI's text-embedding-ada-002 and GPT-4-turbo.
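
The retrieve-then-generate loop that the abstract refers to can be sketched in a few lines. The snippet below is an illustrative sketch only, not LocalRQA's actual API: `embed` and `generate` are hypothetical stand-ins for whichever retriever encoder and LLM a given deployment plugs in, and the cosine-similarity retrieval assumes unit-norm embeddings.

```python
import numpy as np

def embed(texts):
    """Hypothetical dense-encoder stub: maps each text to a unit-norm
    vector. In a real system this would be a retriever model such as
    Contriever, E5, or OpenAI's text-embedding-ada-002."""
    raise NotImplementedError

def generate(prompt):
    """Hypothetical generator stub: a local 7B model or a hosted LLM."""
    raise NotImplementedError

def answer(question, passages, k=4):
    # Encode the corpus and the question into the same embedding space.
    doc_vecs = np.asarray(embed(passages))    # shape: (n_passages, dim)
    q_vec = np.asarray(embed([question]))[0]  # shape: (dim,)

    # Dense retrieval: with unit-norm vectors the dot product equals
    # cosine similarity; keep the k highest-scoring passages.
    scores = doc_vecs @ q_vec
    top_k = np.argsort(scores)[::-1][:k]

    # Augment the prompt with the retrieved evidence, then generate.
    context = "\n\n".join(passages[i] for i in top_k)
    prompt = (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )
    return generate(prompt)
```

Toolkits like the one described differ mainly in which concrete models fill these two stubs and in how the training, evaluation, and serving of those components is customized.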
