Two-Stage Quranic QA via Ensemble Retrieval and Instruction-Tuned Answer Extraction

Published 9 Aug 2025 in cs.CL and cs.IR | (2508.06971v1)

Abstract: Quranic Question Answering presents unique challenges due to the linguistic complexity of Classical Arabic and the semantic richness of religious texts. In this paper, we propose a novel two-stage framework that addresses both passage retrieval and answer extraction. For passage retrieval, we ensemble fine-tuned Arabic LLMs to achieve superior ranking performance. For answer extraction, we employ instruction-tuned LLMs with few-shot prompting to overcome the limitations of fine-tuning on small datasets. Our approach achieves state-of-the-art results on the Quran QA 2023 Shared Task, with a MAP@10 of 0.3128 and MRR@10 of 0.5763 for retrieval, and a pAP@10 of 0.669 for extraction, substantially outperforming previous methods. These results demonstrate that combining model ensembling and instruction-tuned LLMs effectively addresses the challenges of low-resource question answering in specialized domains.
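To make the retrieval stage and its metrics concrete, here is a minimal Python sketch of score-level ensembling followed by per-question reciprocal rank and average precision at a cutoff of 10. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not say how the rankers are combined, so the min-max normalisation and simple averaging are assumed, and the function names (`ensemble_rank`, `reciprocal_rank_at_k`, `average_precision_at_k`) and passage ids are hypothetical. The average precision here is normalised by min(number of relevant passages, k), which is one common convention; the shared task's official scorer may differ.

```python
from typing import Dict, List, Sequence, Set


def ensemble_rank(score_lists: Sequence[Dict[str, float]], k: int = 10) -> List[str]:
    """Fuse several rankers by averaging their min-max-normalised scores.

    ASSUMPTION: the fusion rule is illustrative; the source does not specify it.
    A passage missing from one ranker's output contributes 0 for that ranker.
    """
    fused: Dict[str, float] = {}
    for scores in score_lists:
        lo, hi = min(scores.values()), max(scores.values())
        span = (hi - lo) or 1.0  # avoid division by zero when all scores are equal
        for pid, s in scores.items():
            fused[pid] = fused.get(pid, 0.0) + (s - lo) / span / len(score_lists)
    # Sort by fused score (descending), breaking ties by passage id for determinism.
    return sorted(fused, key=lambda pid: (-fused[pid], pid))[:k]


def reciprocal_rank_at_k(ranked: List[str], relevant: Set[str], k: int = 10) -> float:
    """Return 1/rank of the first relevant passage in the top k, or 0.0 if none."""
    for rank, pid in enumerate(ranked[:k], start=1):
        if pid in relevant:
            return 1.0 / rank
    return 0.0


def average_precision_at_k(ranked: List[str], relevant: Set[str], k: int = 10) -> float:
    """Average precision over the top k results.

    ASSUMPTION: normalised by min(|relevant|, k); other conventions exist.
    """
    if not relevant:
        return 0.0
    hits, total = 0, 0.0
    for rank, pid in enumerate(ranked[:k], start=1):
        if pid in relevant:
            hits += 1
            total += hits / rank  # precision at this rank
    return total / min(len(relevant), k)


# Toy example with two hypothetical rankers and made-up passage ids.
ranker_a = {"p1": 0.9, "p2": 0.4, "p3": 0.1}
ranker_b = {"p1": 0.2, "p2": 0.8, "p3": 0.5}
ranking = ensemble_rank([ranker_a, ranker_b])
gold = {"p1", "p3"}
rr = reciprocal_rank_at_k(ranking, gold)
ap = average_precision_at_k(ranking, gold)
```

MRR@10 and MAP@10 are then the means of these per-question values over all questions in the evaluation set.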