NAPS: Natural Program Synthesis Dataset (1807.03168v1)
Published 6 Jul 2018 in cs.LG, cs.AI, cs.PL, and stat.ML
Abstract: We present a program synthesis-oriented dataset consisting of human-written problem statements and solutions to these problems. The problem statements were collected via crowdsourcing, and the program solutions were extracted from human-written solutions in programming competitions, accompanied by input/output examples. We propose using this dataset for program synthesis tasks aimed at working with real user-generated data. As a baseline, we present a few models, with the best model achieving 8.8% accuracy, showcasing both the complexity of the dataset and the large room for future research.