Web Question Answering with Neurosymbolic Program Synthesis (2104.07162v1)

Published 14 Apr 2021 in cs.PL

Abstract: In this paper, we propose a new technique based on program synthesis for extracting information from webpages. Given a natural language query and a few labeled webpages, our method synthesizes a program that can be used to extract similar types of information from other unlabeled webpages. To handle websites with diverse structure, our approach employs a neurosymbolic DSL that incorporates both neural NLP models and standard language constructs for tree navigation and string manipulation. We also propose an optimal synthesis algorithm that generates all DSL programs that achieve optimal F1 score on the training examples. Our synthesis technique is compositional, prunes the search space by exploiting a monotonicity property of the DSL, and uses transductive learning to select programs with good generalization power. We have implemented these ideas in a new tool called WebQA and evaluated it on 25 different tasks across multiple domains. Our experiments show that WebQA significantly outperforms existing tools such as state-of-the-art question answering models and wrapper induction systems.
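
The idea of keeping every program that ties for the best F1 score on the labeled pages can be illustrated with a small Python sketch. This is not WebQA's DSL or its synthesis algorithm: the toy "programs" are hand-written line filters, a page is modeled as a list of text lines, and averaging per-page F1 is an assumption made only for this example. The names `f1`, `optimal_programs`, `EXAMPLES`, and `CANDIDATES` are hypothetical.

```python
import math
from typing import Callable, List, Set, Tuple

Page = List[str]                      # toy stand-in for a webpage
Program = Callable[[Page], Set[str]]  # toy stand-in for a DSL program

def f1(predicted: Set[str], gold: Set[str]) -> float:
    """Set-based F1 between extracted items and labeled items."""
    true_positives = len(predicted & gold)
    if true_positives == 0:
        # Covers empty predictions and predictions with no overlap.
        return 1.0 if predicted == gold else 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(gold)
    return 2 * precision * recall / (precision + recall)

def optimal_programs(
    candidates: List[Tuple[str, Program]],
    examples: List[Tuple[Page, Set[str]]],
) -> Tuple[List[str], float]:
    """Return the names of all candidates that tie for the best mean F1."""
    scored = []
    for name, program in candidates:
        # Assumption: score a program by its mean F1 over the labeled pages.
        mean_f1 = sum(f1(program(page), gold) for page, gold in examples) / len(examples)
        scored.append((name, mean_f1))
    best = max(score for _, score in scored)
    winners = [name for name, score in scored if math.isclose(score, best)]
    return winners, best

# Two labeled pages: the task is to extract the email line.
EXAMPLES = [
    (["Name: Ada", "Email: ada@x.org", "Phone: 555"], {"Email: ada@x.org"}),
    (["Email: bob@y.org", "Office: 12"], {"Email: bob@y.org"}),
]

# Hand-written candidates; a real synthesizer would enumerate these from a grammar.
CANDIDATES = [
    ("all_lines", lambda page: set(page)),
    ("contains_at", lambda page: {line for line in page if "@" in line}),
    ("starts_email", lambda page: {line for line in page if line.startswith("Email")}),
    ("first_line", lambda page: set(page[:1])),
]

best_names, best_score = optimal_programs(CANDIDATES, EXAMPLES)
print(best_names, best_score)
```

In this toy setting two candidates tie on the training pages, which is why a separate selection step is needed to pick among programs that fit the labeled examples equally well. The abstract describes transductive learning for that step.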

Authors (6)
  1. Qiaochu Chen (12 papers)
  2. Aaron Lamoreaux (1 paper)
  3. Xinyu Wang (186 papers)
  4. Greg Durrett (118 papers)
  5. Osbert Bastani (97 papers)
  6. Isil Dillig (57 papers)
Citations (25)
