
Synthetic Datasets for Neural Program Synthesis (1912.12345v1)

Published 27 Dec 2019 in cs.LG, cs.AI, cs.PL, and stat.ML

Abstract: The goal of program synthesis is to automatically generate programs in a particular language from corresponding specifications, e.g. input-output behavior. Many current approaches achieve impressive results after training on randomly generated I/O examples in limited domain-specific languages (DSLs), as with string transformations in RobustFill. However, we empirically discover that applying test input generation techniques for languages with control flow and rich input spaces causes deep networks to generalize poorly to certain data distributions; to correct this, we propose a new methodology for controlling and evaluating the bias of synthetic data distributions over both programs and specifications. We demonstrate, using the Karel DSL and a small Calculator DSL, that training deep networks on these distributions leads to improved cross-distribution generalization performance.

Authors (7)
  1. Richard Shin (18 papers)
  2. Neel Kant (9 papers)
  3. Kavi Gupta (4 papers)
  4. Christopher Bender (2 papers)
  5. Brandon Trabucco (13 papers)
  6. Rishabh Singh (58 papers)
  7. Dawn Song (229 papers)
Citations (44)