Aggregated Knowledge Model: Enhancing Domain-Specific QA with Fine-Tuned and Retrieval-Augmented Generation Models (2410.18344v1)
Abstract: This paper introduces a novel approach to enhancing closed-domain Question Answering (QA) systems, focusing on the specific needs of the Lawrence Berkeley National Laboratory (LBL) Science Information Technology (ScienceIT) domain. Using a rich dataset derived from the ScienceIT documentation, our study presents a detailed comparison of two fine-tuned large language models (LLMs) and five retrieval-augmented generation (RAG) models. Through a data processing pipeline, we transform the documentation into structured context-question-answer triples, leveraging recent LLMs (AWS Bedrock, GCP PaLM2, Meta LLaMA2, OpenAI GPT-4, Google Gemini-Pro) for data-driven insights. We further introduce the Aggregated Knowledge Model (AKM), which synthesizes responses from the seven models above using K-means clustering to select the most representative answer. Evaluating these models across multiple metrics offers a comprehensive view of their effectiveness and suitability for the LBL ScienceIT environment. The results demonstrate the benefits of integrating fine-tuning and retrieval-augmented strategies, with the AKM achieving significant performance improvements. The insights gained from this study can be applied to develop specialized QA systems tailored to specific domains.
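The AKM's aggregation step lends itself to a compact illustration. The sketch below is a hypothetical reconstruction, not the authors' released code: it embeds candidate answers with TF-IDF (a stand-in for whatever embedding the paper actually uses), clusters them with K-means, and returns the answer nearest the centroid of the largest cluster. All function names, parameters, and the example answers are illustrative assumptions.

```python
# Hypothetical sketch of an AKM-style aggregation step. Given candidate
# answers from several QA models, cluster their embeddings with K-means
# and return the answer closest to the centroid of the largest cluster.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer

def aggregate_answers(answers: list[str], n_clusters: int = 2) -> str:
    """Select the most representative answer among model outputs."""
    # Embed the answers; TF-IDF is an assumption standing in for the
    # paper's (unspecified here) embedding model.
    X = TfidfVectorizer().fit_transform(answers).toarray()
    km = KMeans(
        n_clusters=min(n_clusters, len(answers)),
        n_init=10,
        random_state=0,
    ).fit(X)
    # Identify the largest cluster: the region where most models agree.
    labels, counts = np.unique(km.labels_, return_counts=True)
    biggest = labels[np.argmax(counts)]
    members = np.where(km.labels_ == biggest)[0]
    # Return the member answer nearest that cluster's centroid.
    dists = np.linalg.norm(X[members] - km.cluster_centers_[biggest], axis=1)
    return answers[members[np.argmin(dists)]]

if __name__ == "__main__":
    candidates = [
        "Submit jobs with sbatch on the Lawrencium cluster.",
        "Use sbatch to submit batch jobs to the cluster.",
        "Email the help desk to request an account.",
    ]
    print(aggregate_answers(candidates))
```

Selecting the largest cluster's medoid treats agreement among the seven models as a proxy for correctness, which matches the abstract's description of choosing the "most representative" answer.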
Authors: Fengchen Liu, Jordan Jung, Wei Feinstein, Jeff DAmbrogia, Gary Jung