From RAG to QA-RAG: Integrating Generative AI for Pharmaceutical Regulatory Compliance Process (2402.01717v1)

Published 26 Jan 2024 in cs.CL, cs.AI, and cs.IR

Abstract: Regulatory compliance in the pharmaceutical industry entails navigating through complex and voluminous guidelines, often requiring significant human resources. To address these challenges, our study introduces a chatbot model that utilizes generative AI and the Retrieval Augmented Generation (RAG) method. This chatbot is designed to search for guideline documents relevant to the user inquiries and provide answers based on the retrieved guidelines. Recognizing the inherent need for high reliability in this domain, we propose the Question and Answer Retrieval Augmented Generation (QA-RAG) model. In comparative experiments, the QA-RAG model demonstrated a significant improvement in accuracy, outperforming all other baselines including conventional RAG methods. This paper details QA-RAG's structure and performance evaluation, emphasizing its potential for the regulatory compliance domain in the pharmaceutical industry and beyond. We have made our work publicly available for further research and development.


Summary

  • The paper introduces the QA-RAG model, a novel approach integrating generative AI with dual-track document retrieval to improve regulatory compliance accuracy.
  • It employs rigorous document preprocessing and dense retrieval techniques, achieving a context precision of 0.717 and enhancing answer generation performance.
  • The model streamlines compliance by reducing human dependency and handling complex guidelines, with potential applications in other regulated domains.

Integrating Generative AI for Pharmaceutical Regulatory Compliance: A Review of QA-RAG

The paper "From RAG to QA-RAG: Integrating Generative AI for Pharmaceutical Regulatory Compliance Process," authored by Jaewoong Kim and Moohong Min, introduces an approach to enhancing regulatory compliance with generative AI, specifically within the pharmaceutical industry. This review outlines the key aspects, methodology, findings, and implications of the QA-RAG model as presented in the paper.

Introduction

Regulatory compliance in the pharmaceutical sector is a complex and labor-intensive process, requiring precise interpretation of extensive guidelines from regulatory bodies such as the FDA and EMA. Misinterpretation or oversight can result in substantial financial and operational repercussions. The paper proposes the Question and Answer Retrieval Augmented Generation (QA-RAG) model as a solution to mitigate these challenges. QA-RAG is designed to leverage the capabilities of Generative AI and Retrieval Augmented Generation (RAG) methods to improve the accuracy and efficiency of navigating regulatory guidelines.

Methodology

The QA-RAG model combines elements of generative AI with RAG to provide accurate responses to user queries within the context of regulatory compliance.

Model Structure

The QA-RAG model operates through a dual-track retrieval system that gathers contextually relevant documents using both the user query and an answer generated by a fine-tuned LLM. The process encompasses several steps:

  1. Document Preprocessing:
    • The model utilizes Optical Character Recognition (OCR) to convert regulatory documents into a text corpus.
    • Dense retrieval methods are employed for embedding and retrieving pertinent documents.
  2. Dual-Track Retrieval:
    • The model retrieves half of the documents based on the user query and the other half from the answers generated by a fine-tuned LLM.
  3. Reranking:
    • A reranking model evaluates the relevance of retrieved documents, ensuring that only the most contextually pertinent documents are presented for final answer generation.
  4. Final Answer Generation:
    • The final response is generated using a fine-tuned GPT-3.5 Turbo model with a few-shot prompting technique to enhance answer accuracy.
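The dual-track retrieval and reranking steps above can be sketched as follows. This is an illustrative sketch only: a term-frequency cosine similarity stands in for the paper's dense embedding model and reranking model, and the corpus, query, and helper names (`dual_track_retrieve`, `rerank`) are hypothetical.

```python
from collections import Counter
import math

def embed(text):
    # Toy stand-in for a dense embedding model: a term-frequency vector.
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b.get(t, 0) for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def top_k(query_text, corpus, k):
    q = embed(query_text)
    return sorted(corpus, key=lambda d: cosine(q, embed(d)), reverse=True)[:k]

def dual_track_retrieve(query, draft_answer, corpus, k=4):
    # Track 1: retrieve k/2 documents against the user query.
    # Track 2: retrieve k/2 against the fine-tuned LLM's draft answer.
    half = k // 2
    hits = top_k(query, corpus, half) + top_k(draft_answer, corpus, half)
    # Deduplicate while preserving order before reranking.
    seen, merged = set(), []
    for d in hits:
        if d not in seen:
            seen.add(d)
            merged.append(d)
    return merged

def rerank(query, docs):
    # Stand-in reranker: rescore the merged candidates against the query.
    q = embed(query)
    return sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)

corpus = [
    "stability testing of new drug substances and products",
    "validation of analytical procedures methodology",
    "good manufacturing practice for active pharmaceutical ingredients",
    "impurities in new drug products qualification thresholds",
]
query = "what stability testing is required for a new drug product"
draft_answer = "stability testing of the drug product under ICH conditions"
docs = rerank(query, dual_track_retrieve(query, draft_answer, corpus))
print(docs[0])
```

In the actual model, the draft answer comes from the fine-tuned LLM and a dedicated reranking model scores query-document pairs; what the sketch illustrates is the pipeline's shape, with half of the candidates retrieved from the query and half from the draft answer.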

Experimental Setup and Results

The paper evaluates QA-RAG using a dataset comprising 1,404 FDA and ICH guideline documents. Several baselines, including traditional RAG methods, Multiquery retrieval, and HyDE, were compared against QA-RAG.

Document Retrieval Performance

QA-RAG exhibited superior performance in retrieving relevant documents with a context precision score of 0.717 and a context recall of 0.328. The model's ability to leverage both user queries and fine-tuned LLM answers proved to be a significant advantage.
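As a simplified, set-based illustration of what these retrieval metrics measure (the paper uses RAGAS-style evaluation, which relies on LLM judgments rather than exact set membership), the document names here are hypothetical:

```python
def context_precision(retrieved, relevant):
    # Fraction of retrieved contexts that are actually relevant.
    if not retrieved:
        return 0.0
    return sum(1 for d in retrieved if d in relevant) / len(retrieved)

def context_recall(retrieved, relevant):
    # Fraction of the relevant contexts that were retrieved.
    if not relevant:
        return 0.0
    return sum(1 for d in relevant if d in retrieved) / len(relevant)

retrieved = ["doc_a", "doc_b", "doc_c"]
relevant = {"doc_a", "doc_d"}
print(context_precision(retrieved, relevant))  # 1 of 3 retrieved is relevant
print(context_recall(retrieved, relevant))     # 1 of 2 relevant was retrieved
```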

Answer Generation Performance

In generating final answers, QA-RAG outperformed the other models with a precision of 0.551, recall of 0.645, and F1 score of 0.591. The use of domain-specific fine-tuned LLM responses was pivotal in achieving accurate and relevant answers.
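For intuition, answer-level precision, recall, and F1 are often computed from token overlap between the generated and reference answers, in the style of SQuAD scoring. The sketch below is illustrative only and is not the paper's exact scorer:

```python
from collections import Counter

def token_f1(prediction, reference):
    # Token-level overlap between a generated answer and a reference answer.
    pred = Counter(prediction.lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum((pred & ref).values())
    if overlap == 0:
        return 0.0, 0.0, 0.0
    precision = overlap / sum(pred.values())
    recall = overlap / sum(ref.values())
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

p, r, f = token_f1("stability testing is required under ICH Q1A",
                   "ICH Q1A requires stability testing")
```

Note the familiar harmonic-mean relationship: F1 balances precision against recall, so a model that retrieves well but paraphrases heavily (high recall, lower precision) is penalized symmetrically with one that is terse but exact.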

Implications

The implications of integrating QA-RAG in the pharmaceutical regulatory domain are multifaceted:

  1. Efficiency and Accuracy:
    • QA-RAG streamlines the compliance process by reducing the time and resources needed to interpret complex guidelines, thus enabling quicker and more precise decision-making.
  2. Reduction in Human Dependency:
    • By automating routine tasks, QA-RAG allows human experts to focus on strategic activities, enhancing overall productivity within the pharmaceutical industry.
  3. Potential for Broader Application:
    • The model's adaptable design suggests potential applications in other domains requiring specialized knowledge, such as legal compliance, financial regulation, and academic research.

Conclusion

The successful deployment of QA-RAG represents a major advancement in the application of generative AI to regulatory compliance. The model's innovative dual-track retrieval system, efficient processing, and high accuracy underline its potential to revolutionize how regulatory information is managed in the pharmaceutical industry and beyond. Future research could focus on enhancing the model's adaptability and performance to maintain its effectiveness amidst evolving regulatory landscapes. Overall, QA-RAG exemplifies the practical benefits of integrating sophisticated AI methodologies within industry-specific contexts to address complex challenges.

Ethical Considerations

The authors position QA-RAG as a complementary tool for human expertise rather than a replacement. Data privacy and security concerns are addressed by using only publicly accessible documents for model training and evaluation.

Acknowledgments

The work was supported by the National Research Foundation of Korea. The authors note that ChatGPT, developed by OpenAI, was used to generate illustrative figures in the paper.