A Graph Representation of Semi-structured Data for Web Question Answering (2010.06801v1)

Published 14 Oct 2020 in cs.CL and cs.AI

Abstract: The abundant semi-structured data on the Web, such as HTML-based tables and lists, provide commercial search engines with a rich information source for question answering (QA). Unlike plain-text passages in Web documents, Web tables and lists have inherent structures that carry semantic correlations among their elements. Many existing studies treat tables and lists as flat documents made up of pieces of text and do not make good use of the semantic information hidden in these structures. In this paper, we propose a novel graph representation of Web tables and lists based on a systematic categorization of the components of semi-structured data and the relations among them. We also develop pre-training and reasoning techniques on the graph model for the QA task. Extensive experiments on several real datasets collected from a commercial engine verify the effectiveness of our approach. Our method improves the F1 score by 3.90 points over the state-of-the-art baselines.

Authors (6)
  1. Xingyao Zhang (17 papers)
  2. Linjun Shou (53 papers)
  3. Jian Pei (104 papers)
  4. Ming Gong (246 papers)
  5. Lijie Wen (58 papers)
  6. Daxin Jiang (138 papers)
Citations (14)