From RAG to QA-RAG: Integrating Generative AI for Pharmaceutical Regulatory Compliance Process (2402.01717v1)
Abstract: Regulatory compliance in the pharmaceutical industry entails navigating complex and voluminous guidelines, often requiring significant human resources. To address these challenges, our study introduces a chatbot model that uses generative AI and the Retrieval Augmented Generation (RAG) method. The chatbot searches for guideline documents relevant to user inquiries and provides answers based on the retrieved guidelines. Recognizing the inherent need for high reliability in this domain, we propose the Question and Answer Retrieval Augmented Generation (QA-RAG) model. In comparative experiments, the QA-RAG model demonstrated a significant improvement in accuracy, outperforming all other baselines, including conventional RAG methods. This paper details QA-RAG's structure and performance evaluation, emphasizing its potential for the regulatory compliance domain in the pharmaceutical industry and beyond. We have made our work publicly available for further research and development.
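The retrieve-then-answer flow described in the abstract can be illustrated with a minimal sketch of a conventional RAG baseline. This is not the QA-RAG method itself, whose structure the abstract does not spell out. The guideline passages, the function names, and the term-frequency cosine scoring are all assumptions made for illustration; a real system would likely use a learned embedding model for retrieval and pass the assembled prompt to a language model.

```python
import math
import re
from collections import Counter

# Hypothetical stand-ins for guideline passages (not taken from the paper).
GUIDELINES = [
    "Stability testing data must cover the proposed shelf life of the drug product.",
    "Process validation batches should be manufactured at commercial scale.",
    "Labeling must list all active ingredients and their strengths.",
]

def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def cosine(a, b):
    """Cosine similarity between two term-frequency Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def retrieve(query, docs, k=1):
    """Return the k documents most similar to the query.

    Assumption: simple term-frequency vectors stand in for the dense
    embeddings a production retriever would use.
    """
    q = Counter(tokenize(query))
    ranked = sorted(docs, key=lambda d: cosine(q, Counter(tokenize(d))), reverse=True)
    return ranked[:k]

def build_prompt(question, passages):
    """Assemble the text a generator model would receive (hypothetical format)."""
    context = "\n".join(f"- {p}" for p in passages)
    return f"Answer using only these guidelines:\n{context}\n\nQuestion: {question}"

question = "What shelf life data is required for stability testing?"
top = retrieve(question, GUIDELINES, k=1)
prompt = build_prompt(question, top)
print(prompt)
```

The generation step is deliberately left out: the sketch stops at the prompt so that it does not presuppose any particular model or API.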