
Compositional Semantic Parsing on Semi-Structured Tables (1508.00305v1)

Published 3 Aug 2015 in cs.CL

Abstract: Two important aspects of semantic parsing for question answering are the breadth of the knowledge source and the depth of logical compositionality. While existing work trades off one aspect for another, this paper simultaneously makes progress on both fronts through a new task: answering complex questions on semi-structured tables using question-answer pairs as supervision. The central challenge arises from two compounding factors: the broader domain results in an open-ended set of relations, and the deeper compositionality results in a combinatorial explosion in the space of logical forms. We propose a logical-form driven parsing algorithm guided by strong typing constraints and show that it obtains significant improvements over natural baselines. For evaluation, we created a new dataset of 22,033 complex questions on Wikipedia tables, which is made publicly available.

Citations (706)

Summary

  • The paper presents a logical-form-driven parsing algorithm that uses strong typing to answer complex questions on semi-structured HTML tables.
  • It achieves a test accuracy of 37.1%, well above an information-retrieval baseline (12.7%) and a simpler semantic-parsing baseline (24.3%).
  • Each HTML table is converted into a knowledge graph, so the parser can reason over relations (column headers) that it never saw during training.
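The table-to-graph conversion can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: each row becomes a node (hypothetically named row0, row1, ...), each cell string becomes an entity node, the column header labels the edge from a row to its cell, and consecutive rows are linked by a Next edge (an assumption of this sketch). Normalization of numbers and dates is omitted, and the names table_to_graph and join are hypothetical. Answering a simple question then amounts to following edges: from an entity back to the rows that contain it, then forward along another column.

```python
from collections import defaultdict

# Hypothetical graph representation: (node, relation) -> set of target nodes.
# Rows become "row0", "row1", ...; each cell string is an entity node.
# A column header labels the row -> cell edge; "!" + header is the reverse edge.
# "Next" links consecutive rows. Number/date normalization is omitted.

def table_to_graph(header, rows):
    graph = defaultdict(set)
    for i, row in enumerate(rows):
        row_node = f"row{i}"
        for column, cell in zip(header, row):
            graph[(row_node, column)].add(cell)         # row -> cell
            graph[(cell, "!" + column)].add(row_node)   # cell -> rows containing it
        if i + 1 < len(rows):
            graph[(row_node, "Next")].add(f"row{i + 1}")
    return graph

def join(graph, nodes, relation):
    """Follow `relation` from every node in `nodes` and return the union of targets."""
    result = set()
    for node in nodes:
        result |= graph.get((node, relation), set())  # .get avoids inserting empty keys
    return result

if __name__ == "__main__":
    header = ["Year", "City"]
    rows = [["1896", "Athens"], ["1900", "Paris"], ["2004", "Athens"]]
    graph = table_to_graph(header, rows)
    # "Which year was the event held in Paris?": rows whose City is Paris, then their Year.
    print(join(graph, join(graph, {"Paris"}, "!City"), "Year"))  # {'1900'}
```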


This paper addresses two goals of semantic parsing for question answering: broadening the knowledge source and deepening logical compositionality. The authors introduce a new task: answering complex questions over semi-structured HTML tables from Wikipedia, using only question-answer pairs as supervision. The task raises two challenges: the broader domain yields an open-ended set of relations, and the deeper compositionality causes a combinatorial explosion in the space of possible logical forms.

Methodology

The authors develop a logical-form-driven parsing algorithm constrained by strong typing. It does not require constructing or relying on a pre-learned lexicon that maps phrases to relations and logical operations, a common strategy in semantic parsing. Instead, each HTML table is deterministically converted into a knowledge graph, and the question is parsed against this graph into a set of candidate logical forms. A log-linear model ranks these candidates, while beam search and type-based pruning keep the search tractable.
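The ranking step can be illustrated with a toy log-linear model. This sketch rests on assumptions and is not the paper's parser or feature set: each candidate logical form is a tuple of a readable name, an output type, its denotation (the answer it evaluates to), and a sparse feature dict; candidates whose type differs from a single expected answer type are discarded, as a stand-in for the paper's typing constraints; and the survivors receive p(z | x) ∝ exp(θ · φ(x, z)). Because supervision consists only of question-answer pairs, the update maximizes the total probability of candidates whose denotation equals the answer. Beam search is omitted (the candidate list is given), and all names, types, and features are hypothetical.

```python
import math
from collections import defaultdict

# Hypothetical candidate format: (name, output_type, denotation, features).
# Types, feature names, and candidates are illustrative, not taken from the paper.

def score(weights, features):
    """Linear score theta . phi(x, z) over a sparse feature dict."""
    return sum(weights[f] * v for f, v in features.items())

def distribution(candidates, weights, expected_type):
    """Drop candidates whose output type differs from the expected answer type,
    then return (candidate, p(z | x)) pairs with p proportional to exp(score)."""
    typed = [c for c in candidates if c[1] == expected_type]
    if not typed:
        return []
    scores = [score(weights, c[3]) for c in typed]
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [(c, e / total) for c, e in zip(typed, exps)]

def sgd_step(candidates, weights, expected_type, answer, lr=0.1):
    """One gradient-ascent step on log sum over correct z of p(z | x).
    The gradient is E[phi | denotation == answer] - E[phi]."""
    dist = distribution(candidates, weights, expected_type)
    correct_mass = sum(p for c, p in dist if c[2] == answer)
    if correct_mass == 0.0:
        return  # no candidate yields the answer, so this example gives no signal
    for c, p in dist:
        target = p / correct_mass if c[2] == answer else 0.0
        for f, v in c[3].items():
            weights[f] += lr * (target - p) * v

if __name__ == "__main__":
    weights = defaultdict(float)  # theta; unseen features default to 0.0
    # Toy question: "When did Athens last host?" Expected answer type: Number.
    candidates = [
        ("max Year where City = Athens", "Number", "2004", {"uses_max": 1.0, "city_match": 1.0}),
        ("min Year where City = Athens", "Number", "1896", {"uses_min": 1.0, "city_match": 1.0}),
        ("rows where City = Athens", "Rows", "{row0, row2}", {"city_match": 1.0}),
    ]
    for _ in range(20):
        sgd_step(candidates, weights, "Number", "2004")
    best, _ = max(distribution(candidates, weights, "Number"), key=lambda pair: pair[1])
    print(best[0])  # max Year where City = Athens
```

The update never sees which logical form is intended, only which candidates evaluate to the gold answer; this is what makes learning from question-answer pairs possible, and also why spurious logical forms that happen to produce the right answer can be rewarded.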

Dataset and Evaluation

To evaluate the proposed semantic parser, the researchers created a dataset of 2,108 HTML tables and 22,033 question-answer pairs. The parser achieved 37.1% accuracy on held-out test data, outperforming an information-retrieval baseline and a simpler semantic-parsing baseline. Because the test questions are asked about tables not seen during training, this result indicates that the parser generalizes to previously unseen tables and relations.
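Evaluating on unseen tables means splitting the data by table rather than by question. The sketch below shows one way to do this; the tuple format, the function name split_by_table, and the test fraction are hypothetical, and it does not reproduce the dataset's actual split.

```python
import random

def split_by_table(examples, test_fraction, seed=0):
    """Assign whole tables, not individual questions, to train or test.

    examples: list of (table_id, question, answer) tuples.
    Every question about a held-out table goes to the test split,
    so no test table appears during training.
    """
    table_ids = sorted({table_id for table_id, _, _ in examples})
    rng = random.Random(seed)  # fixed seed for a reproducible split
    rng.shuffle(table_ids)
    n_test = round(len(table_ids) * test_fraction)
    test_tables = set(table_ids[:n_test])
    train = [ex for ex in examples if ex[0] not in test_tables]
    test = [ex for ex in examples if ex[0] in test_tables]
    return train, test

if __name__ == "__main__":
    data = [("t1", "q1", "a"), ("t1", "q2", "b"), ("t2", "q3", "c"), ("t3", "q4", "d")]
    train, test = split_by_table(data, test_fraction=0.34)
    train_tables = {t for t, _, _ in train}
    test_tables = {t for t, _, _ in test}
    print(train_tables & test_tables)  # set()
```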

Strong Numerical Results

The test accuracy of 37.1%, compared with 12.7% for the information-retrieval baseline and 24.3% for the simpler semantic-parsing baseline, shows that the system handles complex, compositional questions considerably better than these alternatives. The oracle score of 76.6%, the fraction of examples for which at least one generated candidate logical form evaluates to the correct answer, shows that the parser often generates a suitable candidate but fails to rank it first. Improving the ranking model is therefore a clear avenue for further gains.
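The difference between accuracy and oracle score can be made concrete with a small sketch. The function name and the toy examples are hypothetical and are not data from the paper; the point is only that accuracy checks the top-ranked candidate, while the oracle checks whether any candidate produces the gold answer.

```python
def accuracy_and_oracle(examples):
    """examples: list of (candidate_denotations_in_rank_order, gold_answer).

    accuracy: fraction of examples whose top-ranked candidate gives the gold answer.
    oracle:   fraction of examples where any candidate gives the gold answer.
    """
    if not examples:
        raise ValueError("need at least one example")
    n = len(examples)
    top1 = sum(1 for ranked, gold in examples if ranked and ranked[0] == gold)
    oracle = sum(1 for ranked, gold in examples if gold in ranked)
    return top1 / n, oracle / n

if __name__ == "__main__":
    toy = [
        (["2004", "1896"], "2004"),  # correct candidate ranked first
        (["1896", "2004"], "2004"),  # correct candidate generated but ranked second
        (["Paris"], "Athens"),       # no candidate gives the answer
        ([], "3"),                   # parser produced no candidates
    ]
    print(accuracy_and_oracle(toy))  # (0.25, 0.5)
```

Examples like the second one count toward the oracle but not toward accuracy; they are the ranking failures described above.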

Implications and Future Directions

The work has both practical and theoretical implications. Practically, it improves the ability of question-answering systems to use semi-structured data such as tables, which are common on the web. Theoretically, it shows that semantic parsing can extend to broad domains without a fixed schema.

Looking forward, the methods proposed in this work could be adapted to semi-structured formats beyond HTML tables, such as lists or colon-delimited text. Integrating them with broader content-extraction systems could support more robust knowledge acquisition and question answering over heterogeneous web content. Extending the approach to general web pages is a natural direction for future research.