Aggregated Knowledge Model: Enhancing Domain-Specific QA with Fine-Tuned and Retrieval-Augmented Generation Models (2410.18344v1)
Abstract: This paper introduces a novel approach to enhancing closed-domain Question Answering (QA) systems, focusing on the specific needs of the Lawrence Berkeley National Laboratory (LBL) Science Information Technology (ScienceIT) domain. Using a rich dataset derived from the ScienceIT documentation, our study presents a detailed comparison of two fine-tuned large language models (LLMs) and five retrieval-augmented generation (RAG) models. Through a data processing pipeline, we transform the documentation into structured context-question-answer triples, leveraging recent LLMs (AWS Bedrock, GCP PaLM2, Meta LLaMA2, OpenAI GPT-4, Google Gemini-Pro) for data-driven insights. We further introduce the Aggregated Knowledge Model (AKM), which synthesizes responses from the seven models above using K-means clustering to select the most representative answer. Evaluating these models across multiple metrics offers a comprehensive view of their effectiveness and suitability for the LBL ScienceIT environment. The results demonstrate the benefits of integrating fine-tuning and retrieval-augmented strategies, with the AKM achieving significant performance improvements. The insights gained from this study can be applied to develop specialized QA systems tailored to specific domains.
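The AKM's aggregation step lends itself to a compact illustration. The sketch below is a hypothetical reconstruction, not the authors' released code: it embeds candidate answers with TF-IDF (a stand-in for whatever embedding the paper actually uses), clusters them with K-means, and returns the answer nearest the centroid of the largest cluster. All function names, parameters, and the example answers are illustrative assumptions.

```python
# Hypothetical sketch of an AKM-style aggregation step. Given candidate
# answers from several QA models, cluster their embeddings with K-means
# and return the answer closest to the centroid of the largest cluster.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer

def aggregate_answers(answers: list[str], n_clusters: int = 2) -> str:
    """Select the most representative answer among model outputs."""
    # Embed the answers; TF-IDF is an assumption standing in for the
    # paper's (unspecified here) embedding model.
    X = TfidfVectorizer().fit_transform(answers).toarray()
    km = KMeans(
        n_clusters=min(n_clusters, len(answers)),
        n_init=10,
        random_state=0,
    ).fit(X)
    # Identify the largest cluster: the region where most models agree.
    labels, counts = np.unique(km.labels_, return_counts=True)
    biggest = labels[np.argmax(counts)]
    members = np.where(km.labels_ == biggest)[0]
    # Return the member answer nearest that cluster's centroid.
    dists = np.linalg.norm(X[members] - km.cluster_centers_[biggest], axis=1)
    return answers[members[np.argmin(dists)]]

if __name__ == "__main__":
    candidates = [
        "Submit jobs with sbatch on the Lawrencium cluster.",
        "Use sbatch to submit batch jobs to the cluster.",
        "Email the help desk to request an account.",
    ]
    print(aggregate_answers(candidates))
```

Selecting the largest cluster's medoid treats agreement among the seven models as a proxy for correctness, which matches the abstract's description of choosing the "most representative" answer.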
Authors: Fengchen Liu, Jordan Jung, Wei Feinstein, Jeff DAmbrogia, Gary Jung