Beyond I.I.D.: Three Levels of Generalization for Question Answering on Knowledge Bases (2011.07743v6)

Published 16 Nov 2020 in cs.CL, cs.AI, and cs.LG

Abstract: Existing studies on question answering on knowledge bases (KBQA) mainly operate with the standard i.i.d assumption, i.e., training distribution over questions is the same as the test distribution. However, i.i.d may be neither reasonably achievable nor desirable on large-scale KBs because 1) true user distribution is hard to capture and 2) randomly sample training examples from the enormous space would be highly data-inefficient. Instead, we suggest that KBQA models should have three levels of built-in generalization: i.i.d, compositional, and zero-shot. To facilitate the development of KBQA models with stronger generalization, we construct and release a new large-scale, high-quality dataset with 64,331 questions, GrailQA, and provide evaluation settings for all three levels of generalization. In addition, we propose a novel BERT-based KBQA model. The combination of our dataset and model enables us to thoroughly examine and demonstrate, for the first time, the key role of pre-trained contextual embeddings like BERT in the generalization of KBQA.

Citations (181)

Summary

  • The paper introduces GrailQA, a novel dataset designed to evaluate three levels of generalization in knowledge base question answering.
  • It proposes a BERT-based model that leverages pre-trained embeddings for enhanced language-ontology alignment and effective logical form generation.
  • The results demonstrate improved accuracy and robust performance in handling out-of-distribution queries across large-scale knowledge bases.

Three Levels of Generalization for KBQA

This paper investigates question answering over knowledge bases (KBQA) with a focus on generalization beyond the standard i.i.d. (independent and identically distributed) assumption, which is limiting when working with large-scale knowledge bases such as Freebase or the Google Knowledge Graph. The authors argue that KBQA systems should support three levels of generalization: i.i.d., compositional, and zero-shot. To support this aim, they introduce GrailQA, a dataset of 64,331 questions designed to enable evaluation at all three levels.
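The three levels can be made concrete by comparing a test question's schema items (relations and classes in its logical form) and domain against what was seen during training. The sketch below is illustrative only, not the authors' implementation; it simplifies "composition" to a tuple of schema items, whereas the paper works with full logical forms.

```python
def generalization_level(question_schema_items, question_domains,
                         train_schema_items, train_domains,
                         train_compositions, question_composition):
    """Classify a test question as 'zero-shot', 'compositional', or 'iid'.

    Simplified criteria (all names here are illustrative):
    - zero-shot: the question involves a domain or schema item never
      seen in training.
    - compositional: every schema item was seen in training, but this
      particular composition of them was not.
    - iid: both the schema items and their composition were seen.
    """
    if not question_domains <= train_domains:
        return "zero-shot"
    if not question_schema_items <= train_schema_items:
        return "zero-shot"
    if question_composition not in train_compositions:
        return "compositional"
    return "iid"
```

For example, with training data covering only `location.country.capital` and `location.country.population`, a question about `music.artist.genre` would be zero-shot, while a question combining the two seen location relations in a new way would be compositional.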

Dataset and Model Proposition

Existing KBQA models typically target only i.i.d. generalization, assuming the training and test distributions match. The paper argues that building a training set representative of all possible user queries is infeasible given the sheer size of modern knowledge bases, and that models trained under this assumption perform poorly on out-of-distribution questions. GrailQA addresses this by combining large scale with diverse question complexity and broad coverage of domains and entities.

The paper also introduces a BERT-based KBQA model that leverages pre-trained contextual embeddings to improve generalization. Empirically, the model outperforms existing systems, demonstrating BERT's utility both for logical form generation and for the robust language-ontology alignment that effective KBQA requires.

Results and Evaluation

The paper outlines several evaluations to test the efficacy of the proposed approaches:

  • I.I.D. Generalization: GrailQA tests how well systems handle questions drawn from the training distribution, where all schema items have been observed. Exact match accuracy and F1 score are the primary metrics.
  • Compositional Generalization: The dataset challenges models to interpret novel compositions of schema items observed during training. BERT-based models show notable strength in aligning language with these unseen compositions.
  • Zero-Shot Generalization: GrailQA evaluates models on entirely unseen domains and schema items, a setting earlier KBQA benchmarks did not cover. The BERT-based approach shows substantial gains on these zero-shot questions.
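The two metrics mentioned above can be sketched as simple set comparisons over answer entities. This is a hedged simplification: GrailQA's official evaluation also accounts for logical forms, whereas this version only compares predicted against gold answer sets.

```python
def exact_match(pred, gold):
    """1.0 if the predicted answer set equals the gold set, else 0.0."""
    return 1.0 if set(pred) == set(gold) else 0.0

def f1_score(pred, gold):
    """Set-level F1 between predicted and gold answer entities."""
    pred, gold = set(pred), set(gold)
    if not pred and not gold:
        return 1.0  # convention: both empty counts as a perfect match
    overlap = len(pred & gold)
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(gold)
    return 2 * precision * recall / (precision + recall)
```

F1 rewards partial credit (a prediction sharing one of two answers with the gold set scores 0.5), while exact match is all-or-nothing, which is why the two metrics are reported together.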

The results indicate that pre-trained contextual embeddings and vocabulary (search-space) pruning both substantially improve a KBQA system's generalization: the BERT-based model handles linguistic variation more robustly and aligns questions more accurately with the expansive schema vocabulary of large KBs.
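The idea behind vocabulary pruning can be illustrated with a toy example: instead of scoring the KB's entire relation vocabulary for every question, restrict candidates to relations incident to the entities linked in the question. The triples and relation names below are invented for illustration; this is not the paper's actual pruning procedure.

```python
def prune_relations(kb_edges, linked_entities):
    """Keep only relations incident to at least one linked entity.

    kb_edges: iterable of (subject, relation, object) triples.
    linked_entities: set of entity identifiers linked in the question.
    """
    candidates = set()
    for subject, relation, obj in kb_edges:
        if subject in linked_entities or obj in linked_entities:
            candidates.add(relation)
    return candidates
```

Even this one-hop restriction can shrink the candidate space from the full KB vocabulary to a handful of relations, which is what makes ranking with an expensive encoder tractable.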

Implications and Future Directions

The work underscores the importance of grappling with the inherent complexity of KBs. With GrailQA, researchers can address the pitfalls of standard datasets and explore avenues like language-ontology alignment and intelligent candidate generation for complex questions and extended domains.

Future research avenues include refining entity linking mechanisms for long-tail or low-popularity entities and pursuing sophisticated search space pruning strategies that leverage contextual knowledge more effectively. Additionally, understanding the mechanisms through which pre-trained embeddings enhance generalization across multiple domains and tasks remains a critical pursuit.

Ultimately, this work fills a gap in KBQA research, offering resources and insights for building scalable, generalizable systems. GrailQA stands as both a rigorous benchmark and a stepping stone toward practical question answering over large knowledge bases.