Unsupervised Multi-hop Question Answering by Question Generation
The paper introduces MQA-QG, a novel unsupervised framework for training multi-hop question answering (QA) models without human-labeled multi-hop question-answer pairs. Annotating multi-hop QA datasets is resource-intensive because each question requires reasoning over multiple pieces of evidence; recognizing this data-scarcity bottleneck, the authors propose generating the training data automatically.
Framework Overview
MQA-QG operates over both homogeneous (text-to-text) and heterogeneous (table-to-text) data sources to synthesize training data. The process follows a two-step approach: first select or generate relevant information from the different data sources, then integrate these pieces into a coherent multi-hop question. To do so, MQA-QG defines a set of basic operators: FindBridge selects a bridge entity shared by two contexts, DescribeEnt generates a textual description of an entity from its table row, and QGwithAns and QGwithEnt generate single-hop questions conditioned on a given answer or entity. Two fusion operators, BridgeBlend and CompBlend, then compose the single-hop outputs into bridge-style and comparison-style multi-hop questions, as sketched below.
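To make the dataflow concrete, here is a minimal Python sketch of the bridge-style, table-to-text flow. Operator names follow the paper, but the function bodies are illustrative string templates: in the original work, DescribeEnt is a fine-tuned table-to-text model and the QG operators are pretrained question-generation models, so treat this as a sketch of the pipeline shape rather than the authors' implementation.

```python
# Minimal sketch of MQA-QG's table-to-text ("bridge") pipeline.
# Operator names follow the paper; bodies are simplified placeholders.

def find_bridge(table_row: dict, passage: str) -> str:
    """Select a bridge entity shared by the table row and the linked passage
    (placeholder: first cell value that occurs verbatim in the passage)."""
    for value in table_row.values():
        if value in passage:
            return value
    raise ValueError("no bridge entity found")

def describe_ent(table_row: dict, entity: str) -> str:
    """Turn the entity's table row into a descriptive phrase
    (placeholder template; the paper uses a table-to-text model)."""
    facts = " and ".join(
        f"{col} is {val}" for col, val in table_row.items() if val != entity
    )
    return f"the entity whose {facts}"

def qg_with_ent(passage: str, entity: str) -> str:
    """Generate a single-hop question mentioning `entity`
    (placeholder; the paper fine-tunes a pretrained QG model)."""
    return f"Where was {entity} born?"

def bridge_blend(question: str, entity: str, description: str) -> str:
    """Replace the bridge-entity mention with its table description,
    composing the single-hop question into a two-hop one."""
    return question.replace(entity, description, 1)

# Toy example: one table row plus a short linked passage.
row = {"Player": "Michael Jordan", "Team": "Chicago Bulls", "Titles": "6"}
passage = "Michael Jordan was born in Brooklyn, New York."

bridge = find_bridge(row, passage)       # step 1: pick the bridge entity
simple_q = qg_with_ent(passage, bridge)  # step 2: single-hop question from text
multi_q = bridge_blend(simple_q, bridge, describe_ent(row, bridge))
print(multi_q)
# -> Where was the entity whose Team is Chicago Bulls and Titles is 6 born?
```

Answering the blended question now requires two hops: resolving the table description to Michael Jordan, then locating his birthplace in the passage.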
Experimental Evaluation
The framework was evaluated on two distinct multi-hop QA datasets: HotpotQA, which requires text-only reasoning, and HybridQA, which combines table and text sources. Trained on generated data alone, the QA models reach 61% and 83% of the fully supervised performance on HybridQA and HotpotQA, respectively, indicating that synthetic data can effectively pretrain models and reduce reliance on human annotations.
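For reference, the F1 behind these percentages is the standard token-overlap score used in extractive QA evaluation. A simplified version is below; the official evaluation scripts additionally normalize punctuation and articles before comparing tokens.

```python
from collections import Counter

def qa_f1(prediction: str, gold: str) -> float:
    """Token-overlap F1 as in SQuAD/HotpotQA-style evaluation:
    precision and recall computed over shared answer tokens."""
    pred_toks, gold_toks = prediction.lower().split(), gold.lower().split()
    common = Counter(pred_toks) & Counter(gold_toks)  # multiset intersection
    overlap = sum(common.values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_toks)
    recall = overlap / len(gold_toks)
    return 2 * precision * recall / (precision + recall)

print(qa_f1("the Chicago Bulls", "Chicago Bulls"))  # 0.8
```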
Additionally, the framework proves beneficial in few-shot learning scenarios, markedly boosting model performance when only a handful of labeled samples are available. For example, combining MQA-QG pretraining with just 50 labeled HotpotQA examples raised F1 from 21.6 to 64.6, demonstrating a substantial reduction in annotation requirements.
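The few-shot recipe itself is a plain two-stage schedule: pretrain the reader on the synthetic pairs, then fine-tune on the small labeled set. The sketch below assumes a generic Hugging Face extractive reader with toy placeholder data; the model choice, hyperparameters, and the `encode`/`train` helpers are illustrative assumptions, not the authors' setup.

```python
import torch
from torch.utils.data import DataLoader
from transformers import AutoModelForQuestionAnswering, AutoTokenizer

# bert-base-uncased is a stand-in reader; any extractive QA model fits here.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForQuestionAnswering.from_pretrained("bert-base-uncased")

def encode(question: str, context: str, answer: str) -> dict:
    """Encode one extractive-QA example, mapping the answer span to tokens."""
    enc = tokenizer(question, context, truncation=True, max_length=128,
                    padding="max_length", return_tensors="pt")
    start_char = context.index(answer)
    end_char = start_char + len(answer) - 1
    item = {k: v.squeeze(0) for k, v in enc.items()}
    # sequence_index=1 maps characters in the context (second segment).
    item["start_positions"] = torch.tensor(
        enc.char_to_token(0, start_char, sequence_index=1))
    item["end_positions"] = torch.tensor(
        enc.char_to_token(0, end_char, sequence_index=1))
    return item

# Toy stand-ins: stage 1 would use tens of thousands of MQA-QG-generated
# pairs, stage 2 the few available human-labeled examples (e.g. 50).
synthetic = [encode("Where was the player with 6 titles born?",
                    "Michael Jordan was born in Brooklyn.", "Brooklyn")]
labeled = [encode("What team drafted Jordan?",
                  "Jordan was drafted by the Chicago Bulls.", "Chicago Bulls")]

def train(dataset, epochs: int, lr: float) -> None:
    """Standard span-prediction training loop over pre-encoded examples."""
    opt = torch.optim.AdamW(model.parameters(), lr=lr)
    loader = DataLoader(dataset, batch_size=2, shuffle=True)
    model.train()
    for _ in range(epochs):
        for batch in loader:
            loss = model(**batch).loss  # QA head returns span loss
            loss.backward()
            opt.step()
            opt.zero_grad()

train(synthetic, epochs=1, lr=3e-5)  # stage 1: pretrain on generated data
train(labeled, epochs=1, lr=1e-5)    # stage 2: few-shot fine-tuning
```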
Implications and Future Research
The implications of MQA-QG for QA system development are significant, especially where data labeling is prohibitively expensive. By assembling robust training datasets with minimal human intervention, the framework paves the way for deploying QA systems in low-resource domains or over new document types.
Future research could extend the framework to additional modalities beyond text and tables, such as visual data for richer reasoning tasks. Moreover, refining the question generation process to improve the fluency and semantic coherence of the generated questions would bring them closer to human-written ones, further increasing the utility of the synthetic datasets.
In summary, the work points to a promising direction for reducing the labeled-data bottleneck in multi-hop QA through automated, unsupervised question generation.